Copyright | (c) 2013-2021 Brendan Hay |
---|---|
License | Mozilla Public License, v. 2.0. |
Maintainer | Brendan Hay <brendan.g.hay+amazonka@gmail.com> |
Stability | auto-generated |
Portability | non-portable (GHC extensions) |
Safe Haskell | None |
Documentation
data DataProcessing Source #
The data structure used to specify the data to be used for inference in a batch transform job and to associate the data that is relevant to the prediction results in the output. The input filter provided allows you to exclude input data that is not needed for inference in a batch transform job. The output filter provided allows you to include input data relevant to interpreting the predictions in the output from the job. For more information, see Associate Prediction Results with their Corresponding Input Records.
See: newDataProcessing
smart constructor.
DataProcessing' | |
|
Instances
newDataProcessing :: DataProcessing Source #
Create a value of DataProcessing
with all optional fields omitted.
Use generic-lens or optics to modify other optional fields.
The following record fields are available, with the corresponding lenses provided for backwards compatibility:
$sel:outputFilter:DataProcessing'
, dataProcessing_outputFilter
- A
JSONPath
expression used to select a portion of the joined dataset to save in the
output file for a batch transform job. If you want Amazon SageMaker to
store the entire input dataset in the output file, leave the default
value, $
. If you specify indexes that aren't within the dimension
size of the joined dataset, you get an error.
Examples: "$"
, "$[0,5:]"
, "$['id','SageMakerOutput']"
$sel:joinSource:DataProcessing'
, dataProcessing_joinSource
- Specifies the source of the data to join with the transformed data. The
valid values are None
and Input
. The default value is None
, which
specifies not to join the input with the transformed data. If you want
the batch transform job to join the original input data with the
transformed data, set JoinSource
to Input
. You can specify
OutputFilter
as an additional filter to select a portion of the joined
dataset and store it in the output file.
For JSON or JSONLines objects, such as a JSON array, SageMaker adds the
transformed data to the input JSON object in an attribute called
SageMakerOutput
. The joined result for JSON must be a key-value pair
object. If the input is not a key-value pair object, SageMaker creates a
new JSON file. In the new JSON file, and the input data is stored under
the SageMakerInput
key and the results are stored in
SageMakerOutput
.
For CSV data, SageMaker takes each row as a JSON array and joins the transformed data with the input by appending each transformed row to the end of the input. The joined data has the original input data followed by the transformed data and the output is a CSV file.
For information on how joining in applied, see Workflow for Associating Inferences with Input Records.
$sel:inputFilter:DataProcessing'
, dataProcessing_inputFilter
- A
JSONPath
expression used to select a portion of the input data to pass to the
algorithm. Use the InputFilter
parameter to exclude fields, such as an
ID column, from the input. If you want Amazon SageMaker to pass the
entire input dataset to the algorithm, accept the default value $
.
Examples: "$"
, "$[1:]"
, "$.features"
dataProcessing_outputFilter :: Lens' DataProcessing (Maybe Text) Source #
A
JSONPath
expression used to select a portion of the joined dataset to save in the
output file for a batch transform job. If you want Amazon SageMaker to
store the entire input dataset in the output file, leave the default
value, $
. If you specify indexes that aren't within the dimension
size of the joined dataset, you get an error.
Examples: "$"
, "$[0,5:]"
, "$['id','SageMakerOutput']"
dataProcessing_joinSource :: Lens' DataProcessing (Maybe JoinSource) Source #
Specifies the source of the data to join with the transformed data. The
valid values are None
and Input
. The default value is None
, which
specifies not to join the input with the transformed data. If you want
the batch transform job to join the original input data with the
transformed data, set JoinSource
to Input
. You can specify
OutputFilter
as an additional filter to select a portion of the joined
dataset and store it in the output file.
For JSON or JSONLines objects, such as a JSON array, SageMaker adds the
transformed data to the input JSON object in an attribute called
SageMakerOutput
. The joined result for JSON must be a key-value pair
object. If the input is not a key-value pair object, SageMaker creates a
new JSON file. In the new JSON file, and the input data is stored under
the SageMakerInput
key and the results are stored in
SageMakerOutput
.
For CSV data, SageMaker takes each row as a JSON array and joins the transformed data with the input by appending each transformed row to the end of the input. The joined data has the original input data followed by the transformed data and the output is a CSV file.
For information on how joining in applied, see Workflow for Associating Inferences with Input Records.
dataProcessing_inputFilter :: Lens' DataProcessing (Maybe Text) Source #
A
JSONPath
expression used to select a portion of the input data to pass to the
algorithm. Use the InputFilter
parameter to exclude fields, such as an
ID column, from the input. If you want Amazon SageMaker to pass the
entire input dataset to the algorithm, accept the default value $
.
Examples: "$"
, "$[1:]"
, "$.features"