Copyright | (c) 2013-2021 Brendan Hay |
---|---|
License | Mozilla Public License, v. 2.0. |
Maintainer | Brendan Hay <brendan.g.hay+amazonka@gmail.com> |
Stability | auto-generated |
Portability | non-portable (GHC extensions) |
Safe Haskell | None |
Synopsis
- data TransformInput = TransformInput' {}
- newTransformInput :: TransformDataSource -> TransformInput
- transformInput_splitType :: Lens' TransformInput (Maybe SplitType)
- transformInput_compressionType :: Lens' TransformInput (Maybe CompressionType)
- transformInput_contentType :: Lens' TransformInput (Maybe Text)
- transformInput_dataSource :: Lens' TransformInput TransformDataSource
Documentation
data TransformInput Source #
Describes the input source of a transform job and the way the transform job consumes it.
See: newTransformInput smart constructor.
Instances
newTransformInput :: TransformDataSource -> TransformInput Source #
Create a value of TransformInput with all optional fields omitted.
Use generic-lens or optics to modify other optional fields.
The following record fields are available, with the corresponding lenses provided for backwards compatibility:
$sel:splitType:TransformInput', transformInput_splitType - The method to use to split the transform job's data files into smaller batches. Splitting is necessary when the total size of each object is too large to fit in a single request. You can also use data splitting to improve performance by processing multiple concurrent mini-batches. The default value for SplitType is None, which indicates that input data files are not split, and request payloads contain the entire contents of an input object. Set the value of this parameter to Line to split records on a newline character boundary. SplitType also supports a number of record-oriented binary data formats. Currently, the supported record formats are:
- RecordIO
- TFRecord
When splitting is enabled, the size of a mini-batch depends on the values of the BatchStrategy and MaxPayloadInMB parameters. When the value of BatchStrategy is MultiRecord, Amazon SageMaker sends the maximum number of records in each request, up to the MaxPayloadInMB limit. If the value of BatchStrategy is SingleRecord, Amazon SageMaker sends individual records in each request.
Some data formats represent a record as a binary payload wrapped with extra padding bytes. When splitting is applied to a binary data format, padding is removed if the value of BatchStrategy is set to SingleRecord. Padding is not removed if the value of BatchStrategy is set to MultiRecord.
For more information about RecordIO, see Create a Dataset Using RecordIO in the MXNet documentation. For more information about TFRecord, see Consuming TFRecord data in the TensorFlow documentation.
$sel:compressionType:TransformInput', transformInput_compressionType - If your transform data is compressed, specify the compression type. Amazon SageMaker automatically decompresses the data for the transform job accordingly. The default value is None.
$sel:contentType:TransformInput', transformInput_contentType - The multipurpose internet mail extension (MIME) type of the data. Amazon SageMaker uses the MIME type with each HTTP call to transfer data to the transform job.
$sel:dataSource:TransformInput', transformInput_dataSource - Describes the location of the channel data, that is, the S3 location of the input data that the model can consume.
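As a concrete illustration of the smart-constructor pattern above, the sketch below builds a TransformInput with all optional fields omitted. The S3 URI is hypothetical, and the nested constructors (newTransformDataSource, newTransformS3DataSource, S3DataType_S3Prefix) are assumed to follow the same generated smart-constructor and pattern-synonym convention; check the corresponding amazonka-sagemaker modules for their exact signatures.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Amazonka.SageMaker.Types

-- Build the required data source first, then pass it to the smart
-- constructor; splitType, compressionType, and contentType stay Nothing.
exampleInput :: TransformInput
exampleInput =
  newTransformInput
    ( newTransformDataSource
        ( newTransformS3DataSource
            S3DataType_S3Prefix          -- treat the URI as a key prefix
            "s3://my-bucket/batch-input/" -- hypothetical input location
        )
    )
```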
transformInput_splitType :: Lens' TransformInput (Maybe SplitType) Source #
The method to use to split the transform job's data files into smaller batches. Splitting is necessary when the total size of each object is too large to fit in a single request. You can also use data splitting to improve performance by processing multiple concurrent mini-batches. The default value for SplitType is None, which indicates that input data files are not split, and request payloads contain the entire contents of an input object. Set the value of this parameter to Line to split records on a newline character boundary. SplitType also supports a number of record-oriented binary data formats. Currently, the supported record formats are:
- RecordIO
- TFRecord
When splitting is enabled, the size of a mini-batch depends on the values of the BatchStrategy and MaxPayloadInMB parameters. When the value of BatchStrategy is MultiRecord, Amazon SageMaker sends the maximum number of records in each request, up to the MaxPayloadInMB limit. If the value of BatchStrategy is SingleRecord, Amazon SageMaker sends individual records in each request.
Some data formats represent a record as a binary payload wrapped with extra padding bytes. When splitting is applied to a binary data format, padding is removed if the value of BatchStrategy is set to SingleRecord. Padding is not removed if the value of BatchStrategy is set to MultiRecord.
For more information about RecordIO, see Create a Dataset Using RecordIO in the MXNet documentation. For more information about TFRecord, see Consuming TFRecord data in the TensorFlow documentation.
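Since the optional fields are Maybe-typed, they are usually set through these lenses after construction. A minimal sketch, assuming the generated pattern synonyms SplitType_Line and CompressionType_Gzip and lens-style operators (here from microlens; any lens-compatible (&) and (?~) work):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Amazonka.SageMaker.Types
import Lens.Micro ((&), (?~))

-- Start from a value with all optional fields omitted, then fill the
-- Maybe-typed fields through their lenses.
configuredInput :: TransformDataSource -> TransformInput
configuredInput ds =
  newTransformInput ds
    & transformInput_splitType ?~ SplitType_Line            -- split on newlines
    & transformInput_compressionType ?~ CompressionType_Gzip -- input is gzipped
    & transformInput_contentType ?~ "text/csv"               -- MIME type per call
```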
transformInput_compressionType :: Lens' TransformInput (Maybe CompressionType) Source #
If your transform data is compressed, specify the compression type. Amazon SageMaker automatically decompresses the data for the transform job accordingly. The default value is None.
transformInput_contentType :: Lens' TransformInput (Maybe Text) Source #
The multipurpose internet mail extension (MIME) type of the data. Amazon SageMaker uses the MIME type with each HTTP call to transfer data to the transform job.
transformInput_dataSource :: Lens' TransformInput TransformDataSource Source #
Describes the location of the channel data, that is, the S3 location of the input data that the model can consume.