Copyright | (c) 2013-2021 Brendan Hay |
---|---|
License | Mozilla Public License, v. 2.0. |
Maintainer | Brendan Hay <brendan.g.hay+amazonka@gmail.com> |
Stability | auto-generated |
Portability | non-portable (GHC extensions) |
Safe Haskell | None |
Synopsis
- data S3Settings = S3Settings' {
- parquetVersion :: Maybe ParquetVersionValue
- preserveTransactions :: Maybe Bool
- maxFileSize :: Maybe Int
- csvNoSupValue :: Maybe Text
- rfc4180 :: Maybe Bool
- parquetTimestampInMillisecond :: Maybe Bool
- includeOpForFullLoad :: Maybe Bool
- cdcMinFileSize :: Maybe Int
- csvDelimiter :: Maybe Text
- serviceAccessRoleArn :: Maybe Text
- bucketFolder :: Maybe Text
- dataFormat :: Maybe DataFormatValue
- datePartitionEnabled :: Maybe Bool
- encodingType :: Maybe EncodingTypeValue
- cdcMaxBatchInterval :: Maybe Int
- ignoreHeaderRows :: Maybe Int
- externalTableDefinition :: Maybe Text
- dictPageSizeLimit :: Maybe Int
- bucketName :: Maybe Text
- encryptionMode :: Maybe EncryptionModeValue
- enableStatistics :: Maybe Bool
- cdcInsertsOnly :: Maybe Bool
- timestampColumnName :: Maybe Text
- csvRowDelimiter :: Maybe Text
- datePartitionDelimiter :: Maybe DatePartitionDelimiterValue
- addColumnName :: Maybe Bool
- cannedAclForObjects :: Maybe CannedAclForObjectsValue
- compressionType :: Maybe CompressionTypeValue
- csvNullValue :: Maybe Text
- serverSideEncryptionKmsKeyId :: Maybe Text
- dataPageSize :: Maybe Int
- useCsvNoSupValue :: Maybe Bool
- cdcInsertsAndUpdates :: Maybe Bool
- datePartitionSequence :: Maybe DatePartitionSequenceValue
- rowGroupLength :: Maybe Int
- cdcPath :: Maybe Text
- newS3Settings :: S3Settings
- s3Settings_parquetVersion :: Lens' S3Settings (Maybe ParquetVersionValue)
- s3Settings_preserveTransactions :: Lens' S3Settings (Maybe Bool)
- s3Settings_maxFileSize :: Lens' S3Settings (Maybe Int)
- s3Settings_csvNoSupValue :: Lens' S3Settings (Maybe Text)
- s3Settings_rfc4180 :: Lens' S3Settings (Maybe Bool)
- s3Settings_parquetTimestampInMillisecond :: Lens' S3Settings (Maybe Bool)
- s3Settings_includeOpForFullLoad :: Lens' S3Settings (Maybe Bool)
- s3Settings_cdcMinFileSize :: Lens' S3Settings (Maybe Int)
- s3Settings_csvDelimiter :: Lens' S3Settings (Maybe Text)
- s3Settings_serviceAccessRoleArn :: Lens' S3Settings (Maybe Text)
- s3Settings_bucketFolder :: Lens' S3Settings (Maybe Text)
- s3Settings_dataFormat :: Lens' S3Settings (Maybe DataFormatValue)
- s3Settings_datePartitionEnabled :: Lens' S3Settings (Maybe Bool)
- s3Settings_encodingType :: Lens' S3Settings (Maybe EncodingTypeValue)
- s3Settings_cdcMaxBatchInterval :: Lens' S3Settings (Maybe Int)
- s3Settings_ignoreHeaderRows :: Lens' S3Settings (Maybe Int)
- s3Settings_externalTableDefinition :: Lens' S3Settings (Maybe Text)
- s3Settings_dictPageSizeLimit :: Lens' S3Settings (Maybe Int)
- s3Settings_bucketName :: Lens' S3Settings (Maybe Text)
- s3Settings_encryptionMode :: Lens' S3Settings (Maybe EncryptionModeValue)
- s3Settings_enableStatistics :: Lens' S3Settings (Maybe Bool)
- s3Settings_cdcInsertsOnly :: Lens' S3Settings (Maybe Bool)
- s3Settings_timestampColumnName :: Lens' S3Settings (Maybe Text)
- s3Settings_csvRowDelimiter :: Lens' S3Settings (Maybe Text)
- s3Settings_datePartitionDelimiter :: Lens' S3Settings (Maybe DatePartitionDelimiterValue)
- s3Settings_addColumnName :: Lens' S3Settings (Maybe Bool)
- s3Settings_cannedAclForObjects :: Lens' S3Settings (Maybe CannedAclForObjectsValue)
- s3Settings_compressionType :: Lens' S3Settings (Maybe CompressionTypeValue)
- s3Settings_csvNullValue :: Lens' S3Settings (Maybe Text)
- s3Settings_serverSideEncryptionKmsKeyId :: Lens' S3Settings (Maybe Text)
- s3Settings_dataPageSize :: Lens' S3Settings (Maybe Int)
- s3Settings_useCsvNoSupValue :: Lens' S3Settings (Maybe Bool)
- s3Settings_cdcInsertsAndUpdates :: Lens' S3Settings (Maybe Bool)
- s3Settings_datePartitionSequence :: Lens' S3Settings (Maybe DatePartitionSequenceValue)
- s3Settings_rowGroupLength :: Lens' S3Settings (Maybe Int)
- s3Settings_cdcPath :: Lens' S3Settings (Maybe Text)
Documentation
data S3Settings Source #
Settings for exporting data to Amazon S3.
See: newS3Settings smart constructor.
newS3Settings :: S3Settings Source #
Create a value of S3Settings with all optional fields omitted.
Use generic-lens or optics to modify other optional fields.
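As a rough illustration of how lenses of this shape are used, the sketch below builds a tiny stand-in record and two hand-written van Laarhoven lenses using only base. The Settings type, newSettings, bucketNameL, bucketFolderL, and the local set and view helpers are hypothetical stand-ins, not part of amazonka; with the real library you would typically import a lens package and the generated s3Settings_* lenses instead.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Hypothetical stand-in for S3Settings with just two optional fields.
data Settings = Settings
  { bucketName :: Maybe String
  , bucketFolder :: Maybe String
  }
  deriving (Eq, Show)

-- Plays the role of newS3Settings: every optional field starts as Nothing.
newSettings :: Settings
newSettings = Settings Nothing Nothing

-- The van Laarhoven lens shape that Lens' denotes in lens-style libraries.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

bucketNameL :: Lens' Settings (Maybe String)
bucketNameL f s = fmap (\b -> s {bucketName = b}) (f (bucketName s))

bucketFolderL :: Lens' Settings (Maybe String)
bucketFolderL f s = fmap (\b -> s {bucketFolder = b}) (f (bucketFolder s))

-- Minimal setter and getter derived from the lens alone.
set :: Lens' s a -> a -> s -> s
set l a = runIdentity . l (const (Identity a))

view :: Lens' s a -> s -> a
view l = getConst . l Const

-- Start from the all-Nothing value and fill in two fields.
example :: Settings
example =
  set bucketFolderL (Just "exports") (set bucketNameL (Just "my-bucket") newSettings)

main :: IO ()
main = print example
```

The same pattern scales to any of the optional fields: start from the empty value and set only what you need.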
The following record fields are available, with the corresponding lenses provided for backwards compatibility:
$sel:parquetVersion:S3Settings', s3Settings_parquetVersion - The version of the Apache Parquet format that you want to use: parquet_1_0 (the default) or parquet_2_0.
$sel:preserveTransactions:S3Settings', s3Settings_preserveTransactions - If set to true, DMS saves the transaction order for a change data capture (CDC) load on the Amazon S3 target specified by CdcPath. For more information, see Capturing data changes (CDC) including transaction order on the S3 target.
This setting is supported in DMS versions 3.4.2 and later.
$sel:maxFileSize:S3Settings', s3Settings_maxFileSize - A value that specifies the maximum size (in KB) of any .csv file to be created while migrating to an S3 target during full load.
The default value is 1,048,576 KB (1 GB). Valid values include 1 to 1,048,576.
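Because the field is a plain Maybe Int, nothing in the type stops an out-of-range value. A minimal sketch of a client-side check for the documented range follows; validateMaxFileSize is a hypothetical helper, not something amazonka provides.

```haskell
-- Hypothetical helper: checks the documented 1 to 1,048,576 KB range.
validateMaxFileSize :: Int -> Either String Int
validateMaxFileSize kb
  | kb < 1 = Left "MaxFileSize must be at least 1 KB"
  | kb > 1048576 = Left "MaxFileSize must not exceed 1,048,576 KB"
  | otherwise = Right kb

main :: IO ()
main = mapM_ (print . validateMaxFileSize) [0, 512000, 2000000]
```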
$sel:csvNoSupValue:S3Settings', s3Settings_csvNoSupValue - This setting only applies if your Amazon S3 output files during a change data capture (CDC) load are written in .csv format. If UseCsvNoSupValue is set to true, specify a string value that you want DMS to use for all columns not included in the supplemental log. If you do not specify a string value, DMS uses the null value for these columns regardless of the UseCsvNoSupValue setting.
This setting is supported in DMS versions 3.4.1 and later.
$sel:rfc4180:S3Settings', s3Settings_rfc4180 - For an S3 source, when this value is set to true or y, each leading double quotation mark has to be followed by an ending double quotation mark. This formatting complies with RFC 4180. When this value is set to false or n, string literals are copied to the target as is. In this case, a delimiter (row or column) signals the end of the field. Thus, you can't use a delimiter as part of the string, because it signals the end of the value.
For an S3 target, an optional parameter used to set behavior to comply with RFC 4180 for data migrated to Amazon S3 using .csv file format only. When this value is set to true or y using Amazon S3 as a target, if the data has quotation marks or newline characters in it, DMS encloses the entire column with an additional pair of double quotation marks ("). Every quotation mark within the data is repeated twice.
The default value is true. Valid values include true, false, y, and n.
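The target-side quoting described for this setting can be sketched as a small pure function. This is only an illustration of the stated rule, reading "repeated twice" as the RFC 4180 convention of writing each embedded quotation mark as two; quoteField is a hypothetical name and the function is not DMS's actual implementation.

```haskell
-- Hypothetical helper sketching the quoting rule for an S3 target.
-- A field is wrapped in double quotes only when it contains a
-- quotation mark or a newline; embedded quotes are doubled.
quoteField :: String -> String
quoteField field
  | any (`elem` "\"\n") field = "\"" ++ concatMap escape field ++ "\""
  | otherwise = field
  where
    escape '"' = "\"\""
    escape c = [c]

main :: IO ()
main = mapM_ (putStrLn . quoteField) ["plain", "say \"hi\"", "line1\nline2"]
```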
$sel:parquetTimestampInMillisecond:S3Settings', s3Settings_parquetTimestampInMillisecond - A value that specifies the precision of any TIMESTAMP column values that are written to an Amazon S3 object file in .parquet format.
DMS supports the ParquetTimestampInMillisecond parameter in versions 3.1.4 and later.
When ParquetTimestampInMillisecond is set to true or y, DMS writes all TIMESTAMP columns in a .parquet formatted file with millisecond precision. Otherwise, DMS writes them with microsecond precision.
Currently, Amazon Athena and Glue can handle only millisecond precision for TIMESTAMP values. Set this parameter to true for S3 endpoint object files that are .parquet formatted only if you plan to query or process the data with Athena or Glue.
DMS writes any TIMESTAMP column values written to an S3 file in .csv format with microsecond precision.
Setting ParquetTimestampInMillisecond has no effect on the string format of the timestamp column value that is inserted by setting the TimestampColumnName parameter.
$sel:includeOpForFullLoad:S3Settings', s3Settings_includeOpForFullLoad - A value that enables a full load to write INSERT operations to the comma-separated value (.csv) output files only to indicate how the rows were added to the source database.
DMS supports the IncludeOpForFullLoad parameter in versions 3.1.4 and later.
For full load, records can only be inserted. By default (the false setting), no information is recorded in these output files for a full load to indicate that the rows were inserted at the source database. If IncludeOpForFullLoad is set to true or y, the INSERT is recorded as an I annotation in the first field of the .csv file. This allows the format of your target records from a full load to be consistent with the target records from a CDC load.
This setting works together with the CdcInsertsOnly and the CdcInsertsAndUpdates parameters for output to .csv files only. For more information about how these settings work together, see Indicating Source DB Operations in Migrated S3 Data in the Database Migration Service User Guide.
$sel:cdcMinFileSize:S3Settings', s3Settings_cdcMinFileSize - Minimum file size, defined in megabytes, to reach for a file output to Amazon S3.
When CdcMinFileSize and CdcMaxBatchInterval are both specified, the file write is triggered by whichever parameter condition is met first within a DMS CloudFormation template.
The default value is 32 MB.
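The "whichever condition is met first" rule for CdcMinFileSize and CdcMaxBatchInterval amounts to a logical OR over the two thresholds. The following is a minimal model of that rule, assuming a batch can be summarized by its buffered size and its age; the Batch type and shouldWrite are hypothetical and say nothing about how DMS tracks batches internally.

```haskell
-- Hypothetical model of a pending CDC batch; not an amazonka type.
data Batch = Batch
  { batchSizeMb :: Int     -- data buffered so far, in megabytes
  , batchAgeSeconds :: Int -- time since the last file write, in seconds
  }

-- Write as soon as either threshold is reached, whichever comes first.
-- Arguments: CdcMinFileSize (MB), CdcMaxBatchInterval (seconds), batch.
shouldWrite :: Int -> Int -> Batch -> Bool
shouldWrite minFileSizeMb maxIntervalSeconds b =
  batchSizeMb b >= minFileSizeMb || batchAgeSeconds b >= maxIntervalSeconds

main :: IO ()
main = do
  -- Using the documented defaults: 32 MB and 60 seconds.
  print (shouldWrite 32 60 (Batch 40 5))
  print (shouldWrite 32 60 (Batch 1 75))
  print (shouldWrite 32 60 (Batch 1 10))
```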
$sel:csvDelimiter:S3Settings', s3Settings_csvDelimiter - The delimiter used to separate columns in the .csv file for both source and target. The default is a comma.
$sel:serviceAccessRoleArn:S3Settings', s3Settings_serviceAccessRoleArn - The Amazon Resource Name (ARN) of the IAM role that the service uses for access. The role must allow the iam:PassRole action. It is a required parameter that enables DMS to write and read objects from an S3 bucket.
$sel:bucketFolder:S3Settings', s3Settings_bucketFolder - An optional parameter to set a folder name in the S3 bucket. If provided, tables are created in the path bucketFolder/schema_name/table_name/. If this parameter isn't specified, then the path used is schema_name/table_name/.
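The two path layouts for BucketFolder differ only in an optional leading segment, which maps naturally onto a Maybe. A small sketch under that reading, with tablePath as a hypothetical helper name:

```haskell
import Data.List (intercalate)
import Data.Maybe (maybeToList)

-- Hypothetical helper: builds the object path prefix for one table.
-- The folder segment is included only when BucketFolder is set.
tablePath :: Maybe String -> String -> String -> String
tablePath folder schema table =
  intercalate "/" (maybeToList folder ++ [schema, table]) ++ "/"

main :: IO ()
main = do
  putStrLn (tablePath (Just "exports") "public" "orders")
  putStrLn (tablePath Nothing "public" "orders")
```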
$sel:dataFormat:S3Settings', s3Settings_dataFormat - The format of the data that you want to use for output. You can choose one of the following:
csv: This is a row-based file format with comma-separated values (.csv).
parquet: Apache Parquet (.parquet) is a columnar storage file format that features efficient compression and provides faster query response.
$sel:datePartitionEnabled:S3Settings', s3Settings_datePartitionEnabled - When set to true, this parameter partitions S3 bucket folders based on transaction commit dates. The default value is false. For more information about date-based folder partitioning, see Using date-based folder partitioning.
$sel:encodingType:S3Settings', s3Settings_encodingType - The type of encoding you are using:
RLE_DICTIONARY uses a combination of bit-packing and run-length encoding to store repeated values more efficiently. This is the default.
PLAIN doesn't use encoding at all. Values are stored as they are.
PLAIN_DICTIONARY builds a dictionary of the values encountered in a given column. The dictionary is stored in a dictionary page for each column chunk.
$sel:cdcMaxBatchInterval:S3Settings', s3Settings_cdcMaxBatchInterval - Maximum length of the interval, defined in seconds, after which to output a file to Amazon S3.
When CdcMaxBatchInterval and CdcMinFileSize are both specified, the file write is triggered by whichever parameter condition is met first within a DMS CloudFormation template.
The default value is 60 seconds.
$sel:ignoreHeaderRows:S3Settings', s3Settings_ignoreHeaderRows - When this value is set to 1, DMS ignores the first row header in a .csv file. A value of 1 turns on the feature; a value of 0 turns off the feature.
The default is 0.
$sel:externalTableDefinition:S3Settings', s3Settings_externalTableDefinition - Specifies how tables are defined in the S3 source files only.
$sel:dictPageSizeLimit:S3Settings', s3Settings_dictPageSizeLimit - The maximum size of an encoded dictionary page of a column. If the dictionary page exceeds this limit, the column is stored using an encoding type of PLAIN. This parameter defaults to 1024 * 1024 bytes (1 MiB), the maximum size of a dictionary page before it reverts to PLAIN encoding. This size is used for .parquet file format only.
$sel:bucketName:S3Settings', s3Settings_bucketName - The name of the S3 bucket.
$sel:encryptionMode:S3Settings', s3Settings_encryptionMode - The type of server-side encryption that you want to use for your data. This encryption type is part of the endpoint settings or the extra connections attributes for Amazon S3. You can choose either SSE_S3 (the default) or SSE_KMS.
For the ModifyEndpoint operation, you can change the existing value of the EncryptionMode parameter from SSE_KMS to SSE_S3. But you can’t change the existing value from SSE_S3 to SSE_KMS.
To use SSE_S3, you need an Identity and Access Management (IAM) role with permission to allow "arn:aws:s3:::dms-*" to use the following actions:
s3:CreateBucket
s3:ListBucket
s3:DeleteBucket
s3:GetBucketLocation
s3:GetObject
s3:PutObject
s3:DeleteObject
s3:GetObjectVersion
s3:GetBucketPolicy
s3:PutBucketPolicy
s3:DeleteBucketPolicy
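The one-way ModifyEndpoint restriction on EncryptionMode described above can be captured as a small predicate. The sketch below uses a local EncryptionMode type and a canModify function, both hypothetical; the real field uses EncryptionModeValue, and the service itself enforces the rule.

```haskell
-- Hypothetical local stand-in for the two encryption modes.
data EncryptionMode = SseS3 | SseKms
  deriving (Eq, Show)

-- canModify current requested: True when ModifyEndpoint may move an
-- endpoint from the current mode to the requested one.
-- Only SSE_S3 to SSE_KMS is documented as disallowed.
canModify :: EncryptionMode -> EncryptionMode -> Bool
canModify SseS3 SseKms = False
canModify _ _ = True

main :: IO ()
main = do
  print (canModify SseKms SseS3)
  print (canModify SseS3 SseKms)
```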
$sel:enableStatistics:S3Settings', s3Settings_enableStatistics - A value that enables statistics for Parquet pages and row groups. Choose true to enable statistics, false to disable. Statistics include NULL, DISTINCT, MAX, and MIN values. This parameter defaults to true. This value is used for .parquet file format only.
$sel:cdcInsertsOnly:S3Settings', s3Settings_cdcInsertsOnly - A value that enables a change data capture (CDC) load to write only INSERT operations to .csv or columnar storage (.parquet) output files. By default (the false setting), the first field in a .csv or .parquet record contains the letter I (INSERT), U (UPDATE), or D (DELETE). These values indicate whether the row was inserted, updated, or deleted at the source database for a CDC load to the target.
If CdcInsertsOnly is set to true or y, only INSERTs from the source database are migrated to the .csv or .parquet file. For .csv format only, how these INSERTs are recorded depends on the value of IncludeOpForFullLoad. If IncludeOpForFullLoad is set to true, the first field of every CDC record is set to I to indicate the INSERT operation at the source. If IncludeOpForFullLoad is set to false, every CDC record is written without a first field to indicate the INSERT operation at the source. For more information about how these settings work together, see Indicating Source DB Operations in Migrated S3 Data in the Database Migration Service User Guide.
DMS supports the interaction described above between the CdcInsertsOnly and IncludeOpForFullLoad parameters in versions 3.1.4 and later.
CdcInsertsOnly and CdcInsertsAndUpdates can't both be set to true for the same endpoint. Set either CdcInsertsOnly or CdcInsertsAndUpdates to true for the same endpoint, but not both.
$sel:timestampColumnName:S3Settings', s3Settings_timestampColumnName - A value that, when nonblank, causes DMS to add a column with timestamp information to the endpoint data for an Amazon S3 target.
DMS supports the TimestampColumnName parameter in versions 3.1.4 and later.
DMS includes an additional STRING column in the .csv or .parquet object files of your migrated data when you set TimestampColumnName to a nonblank value.
For a full load, each row of this timestamp column contains a timestamp for when the data was transferred from the source to the target by DMS.
For a change data capture (CDC) load, each row of the timestamp column contains the timestamp for the commit of that row in the source database.
The string format for this timestamp column value is yyyy-MM-dd HH:mm:ss.SSSSSS. By default, the precision of this value is in microseconds. For a CDC load, the rounding of the precision depends on the commit timestamp supported by DMS for the source database.
When the AddColumnName parameter is set to true, DMS also includes a name for the timestamp column that you set with TimestampColumnName.
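To make the yyyy-MM-dd HH:mm:ss.SSSSSS layout concrete, here is a minimal formatter that zero-pads each component and prints six fractional digits (microseconds). The Stamp record and formatStamp are hypothetical names used only for this sketch; it does not parse or validate dates.

```haskell
import Text.Printf (printf)

-- Hypothetical record holding already-split timestamp components.
data Stamp = Stamp
  { year, month, day, hour, minute, second, micros :: Int
  }

-- Render the components as yyyy-MM-dd HH:mm:ss.SSSSSS.
formatStamp :: Stamp -> String
formatStamp (Stamp y mo d h mi s us) =
  printf "%04d-%02d-%02d %02d:%02d:%02d.%06d" y mo d h mi s us

main :: IO ()
main = putStrLn (formatStamp (Stamp 2021 3 5 14 7 9 42))
```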
$sel:csvRowDelimiter:S3Settings', s3Settings_csvRowDelimiter - The delimiter used to separate rows in the .csv file for both source and target. The default is a newline (\n).
$sel:datePartitionDelimiter:S3Settings', s3Settings_datePartitionDelimiter - Specifies a date separating delimiter to use during folder partitioning. The default value is SLASH. Use this parameter when DatePartitionEnabled is set to true.
$sel:addColumnName:S3Settings', s3Settings_addColumnName - An optional parameter that, when set to true or y, you can use to add column name information to the .csv output file.
The default value is false. Valid values are true, false, y, and n.
$sel:cannedAclForObjects:S3Settings', s3Settings_cannedAclForObjects - A value that enables DMS to specify a predefined (canned) access control list for objects created in an Amazon S3 bucket as .csv or .parquet files. For more information about Amazon S3 canned ACLs, see Canned ACL in the Amazon S3 Developer Guide.
The default value is NONE. Valid values include NONE, PRIVATE, PUBLIC_READ, PUBLIC_READ_WRITE, AUTHENTICATED_READ, AWS_EXEC_READ, BUCKET_OWNER_READ, and BUCKET_OWNER_FULL_CONTROL.
$sel:compressionType:S3Settings', s3Settings_compressionType - An optional parameter that controls GZIP compression of the target files. Set it to GZIP to compress the target files. Either set it to NONE (the default) or don't use it to leave the files uncompressed. This parameter applies to both .csv and .parquet file formats.
$sel:csvNullValue:S3Settings', s3Settings_csvNullValue - An optional parameter that specifies how DMS treats null values. While handling the null value, you can use this parameter to pass a user-defined string as null when writing to the target. For example, when target columns are not nullable, you can use this option to differentiate between the empty string value and the null value. So, if you set this parameter value to the empty string ("" or ''), DMS treats the empty string as the null value instead of NULL.
The default value is NULL. Valid values include any valid string.
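One way to picture the CsvNullValue setting is as the string substituted for a missing cell when a row is written. The sketch below models a cell as Maybe String and is only an interpretation of the description above; renderCell is a hypothetical helper, not DMS behavior verified against the service.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper: the first argument plays the role of CsvNullValue.
-- A missing cell is written as that string; present cells pass through.
renderCell :: String -> Maybe String -> String
renderCell nullValue = fromMaybe nullValue

main :: IO ()
main = do
  -- With the documented default, null and empty string stay distinguishable.
  putStrLn (renderCell "NULL" Nothing)
  putStrLn (renderCell "NULL" (Just ""))
  putStrLn (renderCell "NULL" (Just "widget"))
```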
$sel:serverSideEncryptionKmsKeyId:S3Settings', s3Settings_serverSideEncryptionKmsKeyId - If you are using SSE_KMS for the EncryptionMode, provide the KMS key ID. The key that you use needs an attached policy that enables Identity and Access Management (IAM) user permissions and allows use of the key.
Here is a CLI example:
aws dms create-endpoint --endpoint-identifier value --endpoint-type target --engine-name s3 --s3-settings ServiceAccessRoleArn=value,BucketFolder=value,BucketName=value,EncryptionMode=SSE_KMS,ServerSideEncryptionKmsKeyId=value
$sel:dataPageSize:S3Settings', s3Settings_dataPageSize - The size of one data page in bytes. This parameter defaults to 1024 * 1024 bytes (1 MiB). This number is used for .parquet file format only.
$sel:useCsvNoSupValue:S3Settings', s3Settings_useCsvNoSupValue - This setting applies if the S3 output files during a change data capture (CDC) load are written in .csv format. If set to true, DMS uses the value specified by CsvNoSupValue for columns not included in the supplemental log. If not set or set to false, DMS uses the null value for these columns.
This setting is supported in DMS versions 3.4.1 and later.
$sel:cdcInsertsAndUpdates:S3Settings', s3Settings_cdcInsertsAndUpdates - A value that enables a change data capture (CDC) load to write INSERT and UPDATE operations to .csv or .parquet (columnar storage) output files. The default setting is false, but when CdcInsertsAndUpdates is set to true or y, only INSERTs and UPDATEs from the source database are migrated to the .csv or .parquet file.
For .csv file format only, how these INSERTs and UPDATEs are recorded depends on the value of the IncludeOpForFullLoad parameter. If IncludeOpForFullLoad is set to true, the first field of every CDC record is set to either I or U to indicate INSERT and UPDATE operations at the source. But if IncludeOpForFullLoad is set to false, CDC records are written without an indication of INSERT or UPDATE operations at the source. For more information about how these settings work together, see Indicating Source DB Operations in Migrated S3 Data in the Database Migration Service User Guide.
DMS supports the use of the CdcInsertsAndUpdates parameter in versions 3.3.1 and later.
CdcInsertsOnly and CdcInsertsAndUpdates can't both be set to true for the same endpoint. Set either CdcInsertsOnly or CdcInsertsAndUpdates to true for the same endpoint, but not both.
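The interplay between CdcInsertsOnly, CdcInsertsAndUpdates, and IncludeOpForFullLoad for .csv output can be summarized as a decision table. The sketch below encodes the documented cases as a pure function. All types and names (Op, CdcMode, Outcome, cdcOutcome) are hypothetical; the two Boolean settings are modeled as a single three-way CdcMode so that the disallowed "both true" combination cannot be written down.

```haskell
-- Hypothetical types for illustration; none of these come from amazonka.
data Op = Insert | Update | Delete
  deriving (Eq, Show)

-- AllOperations models both settings left at false.
data CdcMode = AllOperations | InsertsOnly | InsertsAndUpdates
  deriving (Eq, Show)

-- What happens to one CDC record in .csv output.
data Outcome
  = Skipped     -- the record is not migrated
  | WithoutOp   -- written with no operation field
  | WithOp Char -- written with I, U, or D as the first field
  deriving (Eq, Show)

opLetter :: Op -> Char
opLetter Insert = 'I'
opLetter Update = 'U'
opLetter Delete = 'D'

-- The Bool argument models IncludeOpForFullLoad.
cdcOutcome :: CdcMode -> Bool -> Op -> Outcome
cdcOutcome AllOperations _ op = WithOp (opLetter op)
cdcOutcome InsertsOnly _ Update = Skipped
cdcOutcome InsertsOnly _ Delete = Skipped
cdcOutcome InsertsAndUpdates _ Delete = Skipped
cdcOutcome _ True op = WithOp (opLetter op)
cdcOutcome _ False _ = WithoutOp

main :: IO ()
main =
  mapM_
    print
    [ (mode, includeOp, op, cdcOutcome mode includeOp op)
    | mode <- [AllOperations, InsertsOnly, InsertsAndUpdates]
    , includeOp <- [False, True]
    , op <- [Insert, Update, Delete]
    ]
```

Modeling the mode as one sum type rather than two Maybe Bool fields is a design choice for the sketch only; the actual record keeps the two fields separate, so the "not both" rule has to be checked by the caller or the service.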
$sel:datePartitionSequence:S3Settings', s3Settings_datePartitionSequence - Identifies the sequence of the date format to use during folder partitioning. The default value is YYYYMMDD. Use this parameter when DatePartitionEnabled is set to true.
$sel:rowGroupLength:S3Settings', s3Settings_rowGroupLength - The number of rows in a row group. A smaller row group size provides faster reads, but as the number of row groups grows, writes become slower. This parameter defaults to 10,000 rows. This number is used for .parquet file format only.
If you choose a value larger than the maximum, RowGroupLength is set to the max row group length in bytes (64 * 1024 * 1024).
$sel:cdcPath:S3Settings', s3Settings_cdcPath - Specifies the folder path of CDC files. For an S3 source, this setting is required if a task captures change data; otherwise, it's optional. If CdcPath is set, DMS reads CDC files from this path and replicates the data changes to the target endpoint. For an S3 target, if you set PreserveTransactions to true, DMS verifies that you have set this parameter to a folder path on your S3 target where DMS can save the transaction order for the CDC load. DMS creates this CDC folder path in either your S3 target working directory or the S3 target location specified by BucketFolder and BucketName.
For example, if you specify CdcPath as MyChangedData, and you specify BucketName as MyTargetBucket but do not specify BucketFolder, DMS creates the following CDC folder path: MyTargetBucket/MyChangedData.
If you specify the same CdcPath, and you specify BucketName as MyTargetBucket and BucketFolder as MyTargetData, DMS creates the following CDC folder path: MyTargetBucket/MyTargetData/MyChangedData.
For more information on CDC including transaction order on an S3 target, see Capturing data changes (CDC) including transaction order on the S3 target.
This setting is supported in DMS versions 3.4.2 and later.
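The two CdcPath examples above follow one rule: join the bucket name, the optional bucket folder, and the CDC path with slashes. A minimal sketch of that rule, with cdcFolder as a hypothetical helper name:

```haskell
import Data.List (intercalate)
import Data.Maybe (maybeToList)

-- Hypothetical helper: arguments are BucketName, optional BucketFolder,
-- and CdcPath, in that order.
cdcFolder :: String -> Maybe String -> String -> String
cdcFolder bucket folder cdcPath =
  intercalate "/" (bucket : maybeToList folder ++ [cdcPath])

main :: IO ()
main = do
  putStrLn (cdcFolder "MyTargetBucket" Nothing "MyChangedData")
  putStrLn (cdcFolder "MyTargetBucket" (Just "MyTargetData") "MyChangedData")
```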
s3Settings_parquetVersion :: Lens' S3Settings (Maybe ParquetVersionValue) Source #
The version of the Apache Parquet format that you want to use: parquet_1_0 (the default) or parquet_2_0.
s3Settings_preserveTransactions :: Lens' S3Settings (Maybe Bool) Source #
If set to true, DMS saves the transaction order for a change data capture (CDC) load on the Amazon S3 target specified by CdcPath. For more information, see Capturing data changes (CDC) including transaction order on the S3 target.
This setting is supported in DMS versions 3.4.2 and later.
s3Settings_maxFileSize :: Lens' S3Settings (Maybe Int) Source #
A value that specifies the maximum size (in KB) of any .csv file to be created while migrating to an S3 target during full load.
The default value is 1,048,576 KB (1 GB). Valid values include 1 to 1,048,576.
s3Settings_csvNoSupValue :: Lens' S3Settings (Maybe Text) Source #
This setting only applies if your Amazon S3 output files during a change data capture (CDC) load are written in .csv format. If UseCsvNoSupValue is set to true, specify a string value that you want DMS to use for all columns not included in the supplemental log. If you do not specify a string value, DMS uses the null value for these columns regardless of the UseCsvNoSupValue setting.
This setting is supported in DMS versions 3.4.1 and later.
s3Settings_rfc4180 :: Lens' S3Settings (Maybe Bool) Source #
For an S3 source, when this value is set to true or y, each leading double quotation mark has to be followed by an ending double quotation mark. This formatting complies with RFC 4180. When this value is set to false or n, string literals are copied to the target as is. In this case, a delimiter (row or column) signals the end of the field. Thus, you can't use a delimiter as part of the string, because it signals the end of the value.
For an S3 target, an optional parameter used to set behavior to comply with RFC 4180 for data migrated to Amazon S3 using .csv file format only. When this value is set to true or y using Amazon S3 as a target, if the data has quotation marks or newline characters in it, DMS encloses the entire column with an additional pair of double quotation marks ("). Every quotation mark within the data is repeated twice.
The default value is true. Valid values include true, false, y, and n.
s3Settings_parquetTimestampInMillisecond :: Lens' S3Settings (Maybe Bool) Source #
A value that specifies the precision of any TIMESTAMP column values that are written to an Amazon S3 object file in .parquet format.
DMS supports the ParquetTimestampInMillisecond parameter in versions 3.1.4 and later.
When ParquetTimestampInMillisecond is set to true or y, DMS writes all TIMESTAMP columns in a .parquet formatted file with millisecond precision. Otherwise, DMS writes them with microsecond precision.
Currently, Amazon Athena and Glue can handle only millisecond precision for TIMESTAMP values. Set this parameter to true for S3 endpoint object files that are .parquet formatted only if you plan to query or process the data with Athena or Glue.
DMS writes any TIMESTAMP column values written to an S3 file in .csv format with microsecond precision.
Setting ParquetTimestampInMillisecond has no effect on the string format of the timestamp column value that is inserted by setting the TimestampColumnName parameter.
s3Settings_includeOpForFullLoad :: Lens' S3Settings (Maybe Bool) Source #
A value that enables a full load to write INSERT operations to the comma-separated value (.csv) output files only to indicate how the rows were added to the source database.
DMS supports the IncludeOpForFullLoad parameter in versions 3.1.4 and later.
For full load, records can only be inserted. By default (the false setting), no information is recorded in these output files for a full load to indicate that the rows were inserted at the source database. If IncludeOpForFullLoad is set to true or y, the INSERT is recorded as an I annotation in the first field of the .csv file. This allows the format of your target records from a full load to be consistent with the target records from a CDC load.
This setting works together with the CdcInsertsOnly and the CdcInsertsAndUpdates parameters for output to .csv files only. For more information about how these settings work together, see Indicating Source DB Operations in Migrated S3 Data in the Database Migration Service User Guide.
s3Settings_cdcMinFileSize :: Lens' S3Settings (Maybe Int) Source #
Minimum file size, defined in megabytes, to reach for a file output to Amazon S3.
When CdcMinFileSize and CdcMaxBatchInterval are both specified, the file write is triggered by whichever parameter condition is met first within a DMS CloudFormation template.
The default value is 32 MB.
s3Settings_csvDelimiter :: Lens' S3Settings (Maybe Text) Source #
The delimiter used to separate columns in the .csv file for both source and target. The default is a comma.
s3Settings_serviceAccessRoleArn :: Lens' S3Settings (Maybe Text) Source #
The Amazon Resource Name (ARN) of the IAM role that the service uses for access. The role must allow the iam:PassRole action. It is a required parameter that enables DMS to write and read objects from an S3 bucket.
s3Settings_bucketFolder :: Lens' S3Settings (Maybe Text) Source #
An optional parameter to set a folder name in the S3 bucket. If provided, tables are created in the path bucketFolder/schema_name/table_name/. If this parameter isn't specified, then the path used is schema_name/table_name/.
s3Settings_dataFormat :: Lens' S3Settings (Maybe DataFormatValue) Source #
The format of the data that you want to use for output. You can choose one of the following:
csv: This is a row-based file format with comma-separated values (.csv).
parquet: Apache Parquet (.parquet) is a columnar storage file format that features efficient compression and provides faster query response.
s3Settings_datePartitionEnabled :: Lens' S3Settings (Maybe Bool) Source #
When set to true, this parameter partitions S3 bucket folders based on transaction commit dates. The default value is false. For more information about date-based folder partitioning, see Using date-based folder partitioning.
s3Settings_encodingType :: Lens' S3Settings (Maybe EncodingTypeValue) Source #
The type of encoding you are using:
RLE_DICTIONARY uses a combination of bit-packing and run-length encoding to store repeated values more efficiently. This is the default.
PLAIN doesn't use encoding at all. Values are stored as they are.
PLAIN_DICTIONARY builds a dictionary of the values encountered in a given column. The dictionary is stored in a dictionary page for each column chunk.
s3Settings_cdcMaxBatchInterval :: Lens' S3Settings (Maybe Int) Source #
Maximum length of the interval, defined in seconds, after which to output a file to Amazon S3.
When CdcMaxBatchInterval and CdcMinFileSize are both specified, the file write is triggered by whichever parameter condition is met first within a DMS CloudFormation template.
The default value is 60 seconds.
s3Settings_ignoreHeaderRows :: Lens' S3Settings (Maybe Int) Source #
When this value is set to 1, DMS ignores the first row header in a .csv file. A value of 1 turns on the feature; a value of 0 turns off the feature.
The default is 0.
s3Settings_externalTableDefinition :: Lens' S3Settings (Maybe Text) Source #
Specifies how tables are defined in the S3 source files only.
s3Settings_dictPageSizeLimit :: Lens' S3Settings (Maybe Int) Source #
The maximum size of an encoded dictionary page of a column. If the dictionary page exceeds this limit, the column is stored using an encoding type of PLAIN. This parameter defaults to 1024 * 1024 bytes (1 MiB), the maximum size of a dictionary page before it reverts to PLAIN encoding. This size is used for .parquet file format only.
s3Settings_bucketName :: Lens' S3Settings (Maybe Text) Source #
The name of the S3 bucket.
s3Settings_encryptionMode :: Lens' S3Settings (Maybe EncryptionModeValue) Source #
The type of server-side encryption that you want to use for your data. This encryption type is part of the endpoint settings or the extra connections attributes for Amazon S3. You can choose either SSE_S3 (the default) or SSE_KMS.
For the ModifyEndpoint operation, you can change the existing value of the EncryptionMode parameter from SSE_KMS to SSE_S3. But you can’t change the existing value from SSE_S3 to SSE_KMS.
To use SSE_S3, you need an Identity and Access Management (IAM) role with permission to allow "arn:aws:s3:::dms-*" to use the following actions:
s3:CreateBucket
s3:ListBucket
s3:DeleteBucket
s3:GetBucketLocation
s3:GetObject
s3:PutObject
s3:DeleteObject
s3:GetObjectVersion
s3:GetBucketPolicy
s3:PutBucketPolicy
s3:DeleteBucketPolicy
s3Settings_enableStatistics :: Lens' S3Settings (Maybe Bool) Source #
A value that enables statistics for Parquet pages and row groups. Choose true to enable statistics, false to disable. Statistics include NULL, DISTINCT, MAX, and MIN values. This parameter defaults to true. This value is used for .parquet file format only.
s3Settings_cdcInsertsOnly :: Lens' S3Settings (Maybe Bool) Source #
A value that enables a change data capture (CDC) load to write only
INSERT operations to .csv or columnar storage (.parquet) output files.
By default (the false
setting), the first field in a .csv or .parquet
record contains the letter I (INSERT), U (UPDATE), or D (DELETE). These
values indicate whether the row was inserted, updated, or deleted at the
source database for a CDC load to the target.
If CdcInsertsOnly is set to true or y, only INSERTs from the source
database are migrated to the .csv or .parquet file. For .csv format
only, how these INSERTs are recorded depends on the value of
IncludeOpForFullLoad. If IncludeOpForFullLoad is set to true, the first
field of every CDC record is set to I to indicate the INSERT operation
at the source. If IncludeOpForFullLoad is set to false, every CDC record
is written without a first field to indicate the INSERT operation at the
source. For more information about how these settings work together, see
Indicating Source DB Operations in Migrated S3 Data in the Database
Migration Service User Guide.
DMS supports the interaction described above between the CdcInsertsOnly
and IncludeOpForFullLoad parameters in versions 3.1.4 and later.
CdcInsertsOnly and CdcInsertsAndUpdates can't both be set to true for
the same endpoint. Set one or the other to true, but not both.
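How CdcInsertsOnly and IncludeOpForFullLoad combine for .csv output can be laid out as a decision table. This is a minimal sketch using only base; `Op`, `CsvRecord`, and `csvRecordShape` are hypothetical names introduced here, and the sketch assumes CdcInsertsAndUpdates is left at its default of false.

```haskell
-- Hypothetical model of the source operation behind a CDC record.
data Op = Insert | Update | Delete
  deriving (Eq, Show)

-- Hypothetical description of what ends up in the .csv output.
data CsvRecord
  = Skipped     -- the change is not migrated
  | WithOp Char -- written with an operation letter as the first field
  | WithoutOp   -- written with no operation field
  deriving (Eq, Show)

-- | Operation letter used as the first field by default.
opLetter :: Op -> Char
opLetter Insert = 'I'
opLetter Update = 'U'
opLetter Delete = 'D'

-- | Shape of a .csv CDC record.
-- Arguments: CdcInsertsOnly, IncludeOpForFullLoad, the source operation.
csvRecordShape :: Bool -> Bool -> Op -> CsvRecord
csvRecordShape False _     op     = WithOp (opLetter op) -- default behavior
csvRecordShape True  True  Insert = WithOp 'I'
csvRecordShape True  False Insert = WithoutOp
csvRecordShape True  _     _      = Skipped              -- only INSERTs migrate
```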
s3Settings_timestampColumnName :: Lens' S3Settings (Maybe Text)
A value that when nonblank causes DMS to add a column with timestamp information to the endpoint data for an Amazon S3 target.
DMS supports the TimestampColumnName parameter in versions 3.1.4 and
later.
DMS includes an additional STRING column in the .csv or .parquet object
files of your migrated data when you set TimestampColumnName to a
nonblank value.
For a full load, each row of this timestamp column contains a timestamp for when the data was transferred from the source to the target by DMS.
For a change data capture (CDC) load, each row of the timestamp column contains the timestamp for the commit of that row in the source database.
The string format for this timestamp column value is
yyyy-MM-dd HH:mm:ss.SSSSSS. By default, the precision of this value is
in microseconds. For a CDC load, the rounding of the precision depends
on the commit timestamp supported by DMS for the source database.
When the AddColumnName parameter is set to true, DMS also includes a
name for the timestamp column that you set with TimestampColumnName.
s3Settings_csvRowDelimiter :: Lens' S3Settings (Maybe Text)
The delimiter used to separate rows in the .csv file for both source and
target. The default is a newline (\n).
s3Settings_datePartitionDelimiter :: Lens' S3Settings (Maybe DatePartitionDelimiterValue)
Specifies the delimiter that separates date components during folder
partitioning. The default value is SLASH. Use this parameter when
DatePartitionEnabled is set to true.
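As a rough illustration of what a date delimiter does to a partition folder name, here is a minimal sketch using only base. The `Delimiter` type, its constructors other than the SLASH default, and the `datePartition` helper are assumptions made for this example, as is the year-month-day ordering; none of it is the library's API or a statement of the exact paths DMS writes.

```haskell
import Data.List (intercalate)

-- Hypothetical stand-in for DatePartitionDelimiterValue. Only SLASH is
-- named in the text; the other constructors are assumptions.
data Delimiter = Slash | Dash | Underscore | NoDelimiter
  deriving (Eq, Show)

delimiterText :: Delimiter -> String
delimiterText Slash       = "/"
delimiterText Dash        = "-"
delimiterText Underscore  = "_"
delimiterText NoDelimiter = ""

-- | Zero-pad a non-negative number to the given width.
pad :: Int -> Int -> String
pad width n = replicate (width - length s) '0' ++ s
  where
    s = show n

-- | Folder name for a (year, month, day) date in year-month-day order.
datePartition :: Delimiter -> (Int, Int, Int) -> String
datePartition d (y, m, day) =
  intercalate (delimiterText d) [pad 4 y, pad 2 m, pad 2 day]
```

With these definitions, `datePartition Slash (2021, 3, 7)` evaluates to `"2021/03/07"`.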
s3Settings_addColumnName :: Lens' S3Settings (Maybe Bool)
An optional parameter that, when set to true or y, adds column name
information to the .csv output file.
The default value is false. Valid values are true, false, y, and n.
s3Settings_cannedAclForObjects :: Lens' S3Settings (Maybe CannedAclForObjectsValue)
A value that enables DMS to specify a predefined (canned) access control list for objects created in an Amazon S3 bucket as .csv or .parquet files. For more information about Amazon S3 canned ACLs, see Canned ACL in the Amazon S3 Developer Guide.
The default value is NONE. Valid values include NONE, PRIVATE, PUBLIC_READ, PUBLIC_READ_WRITE, AUTHENTICATED_READ, AWS_EXEC_READ, BUCKET_OWNER_READ, and BUCKET_OWNER_FULL_CONTROL.
s3Settings_compressionType :: Lens' S3Settings (Maybe CompressionTypeValue)
An optional parameter that controls compression of the target files. Set it to GZIP to compress the target files. Either set this parameter to NONE (the default) or don't use it to leave the files uncompressed. This parameter applies to both .csv and .parquet file formats.
s3Settings_csvNullValue :: Lens' S3Settings (Maybe Text)
An optional parameter that specifies how DMS treats null values. While
handling the null value, you can use this parameter to pass a
user-defined string as null when writing to the target. For example,
when target columns are not nullable, you can use this option to
differentiate between the empty string value and the null value. So, if
you set this parameter value to the empty string ("" or ''), DMS
treats the empty string as the null value instead of NULL.
The default value is NULL. Valid values include any valid string.
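The effect of CsvNullValue on written output can be sketched as a small rendering function. This is an illustration using only base; `renderCsvField` is a hypothetical helper, and modelling a source null as `Nothing` is an assumption of the sketch.

```haskell
import Data.Maybe (fromMaybe)

-- | Text written for one column value. The first argument plays the role
-- of CsvNullValue (Nothing means unset, so the default "NULL" applies);
-- the second is the column value, with Nothing modelling a null.
renderCsvField :: Maybe String -> Maybe String -> String
renderCsvField csvNullValue = fromMaybe (fromMaybe "NULL" csvNullValue)
```

For instance, `renderCsvField (Just "") Nothing` yields the empty string, while `renderCsvField Nothing Nothing` yields `"NULL"`.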
s3Settings_serverSideEncryptionKmsKeyId :: Lens' S3Settings (Maybe Text)
If you are using SSE_KMS for the EncryptionMode, provide the KMS key
ID. The key that you use needs an attached policy that enables Identity
and Access Management (IAM) user permissions and allows use of the key.
Here is a CLI example:
aws dms create-endpoint --endpoint-identifier value --endpoint-type target --engine-name s3 --s3-settings ServiceAccessRoleArn=value,BucketFolder=value,BucketName=value,EncryptionMode=SSE_KMS,ServerSideEncryptionKmsKeyId=value
s3Settings_dataPageSize :: Lens' S3Settings (Maybe Int)
The size of one data page in bytes. This parameter defaults to 1024 * 1024 bytes (1 MiB). This number is used for .parquet file format only.
s3Settings_useCsvNoSupValue :: Lens' S3Settings (Maybe Bool)
This setting applies if the S3 output files during a change data capture
(CDC) load are written in .csv format. If set to true, DMS uses the
value specified by CsvNoSupValue for columns not included in the
supplemental log. If not set or set to false, DMS uses the null value
for these columns.
This setting is supported in DMS versions 3.4.1 and later.
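The choice between CsvNoSupValue and the null value can be written as a two-case function. This is a minimal sketch using only base; `noSupColumnValue` is a hypothetical helper, and falling back to the null value when UseCsvNoSupValue is true but CsvNoSupValue is unset is an assumption, since the text does not cover that case.

```haskell
-- | Value written for a column that is absent from the supplemental log.
-- Arguments mirror UseCsvNoSupValue, CsvNoSupValue, and the null string
-- in effect; all are passed explicitly for illustration.
noSupColumnValue :: Maybe Bool -> Maybe String -> String -> String
noSupColumnValue (Just True) (Just v) _ = v         -- use CsvNoSupValue
noSupColumnValue _           _        nullValue = nullValue -- unset or false
```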
s3Settings_cdcInsertsAndUpdates :: Lens' S3Settings (Maybe Bool)
A value that enables a change data capture (CDC) load to write INSERT
and UPDATE operations to .csv or .parquet (columnar storage) output
files. The default setting is false, but when CdcInsertsAndUpdates is
set to true or y, only INSERTs and UPDATEs from the source database are
migrated to the .csv or .parquet file.
For .csv file format only, how these INSERTs and UPDATEs are recorded
depends on the value of the IncludeOpForFullLoad parameter. If
IncludeOpForFullLoad is set to true, the first field of every CDC
record is set to either I or U to indicate INSERT and UPDATE operations
at the source. But if IncludeOpForFullLoad is set to false, CDC records
are written without an indication of INSERT or UPDATE operations at the
source. For more information about how these settings work together, see
Indicating Source DB Operations in Migrated S3 Data in the Database
Migration Service User Guide.
DMS supports the use of the CdcInsertsAndUpdates parameter in versions
3.3.1 and later.
CdcInsertsOnly and CdcInsertsAndUpdates can't both be set to true for
the same endpoint. Set one or the other to true, but not both.
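Because the two flags are mutually exclusive, it can help to check them before building an endpoint request. The following is a minimal sketch using only base; `validateCdcFlags` is a hypothetical helper, not something this library provides, and it treats an unset flag as false.

```haskell
-- | Reject the one combination the text rules out. Both arguments are
-- Maybe Bool like the record fields; Nothing is treated as false.
validateCdcFlags :: Maybe Bool -> Maybe Bool -> Either String ()
validateCdcFlags insertsOnly insertsAndUpdates
  | insertsOnly == Just True && insertsAndUpdates == Just True =
      Left "CdcInsertsOnly and CdcInsertsAndUpdates can't both be true"
  | otherwise = Right ()
```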
s3Settings_datePartitionSequence :: Lens' S3Settings (Maybe DatePartitionSequenceValue)
Identifies the sequence of the date format to use during folder
partitioning. The default value is YYYYMMDD. Use this parameter when
DatePartitionEnabled is set to true.
s3Settings_rowGroupLength :: Lens' S3Settings (Maybe Int)
The number of rows in a row group. A smaller row group size provides faster reads, but writes become slower as the number of row groups grows. This parameter defaults to 10,000 rows. This number is used for .parquet file format only.
If you choose a value larger than the maximum, RowGroupLength is set
to the max row group length in bytes (64 * 1024 * 1024).
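The default and the cap on RowGroupLength amount to a simple clamp. This is a minimal sketch using only base; `maxRowGroupLength` and `effectiveRowGroupLength` are hypothetical names, and applying the cap directly to the requested number is an assumption about how the documented limit is enforced.

```haskell
-- | Upper bound quoted in the text (64 * 1024 * 1024).
maxRowGroupLength :: Int
maxRowGroupLength = 64 * 1024 * 1024

-- | Value in effect for RowGroupLength: 10,000 when unset, otherwise the
-- requested value capped at the maximum.
effectiveRowGroupLength :: Maybe Int -> Int
effectiveRowGroupLength = maybe 10000 (min maxRowGroupLength)
```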
s3Settings_cdcPath :: Lens' S3Settings (Maybe Text)
Specifies the folder path of CDC files. For an S3 source, this setting
is required if a task captures change data; otherwise, it's optional.
If CdcPath is set, DMS reads CDC files from this path and replicates
the data changes to the target endpoint. For an S3 target, if you set
PreserveTransactions to true, DMS verifies that you have set this
parameter to a folder path on your S3 target where DMS can save the
transaction order for the CDC load. DMS creates this CDC folder path in
either your S3 target working directory or the S3 target location
specified by BucketFolder and BucketName.
For example, if you specify CdcPath as MyChangedData, and you specify
BucketName as MyTargetBucket but do not specify BucketFolder, DMS
creates the following CDC folder path: MyTargetBucket/MyChangedData.
If you specify the same CdcPath, and you specify BucketName as
MyTargetBucket and BucketFolder as MyTargetData, DMS creates the
following CDC folder path: MyTargetBucket/MyTargetData/MyChangedData.
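The two examples follow one pattern: the bucket name, then the bucket folder if present, then the CDC path. A minimal sketch using only base shows it; `cdcFolderPath` is a hypothetical helper written for illustration and uses `String` rather than the library's `Text`.

```haskell
import Data.List (intercalate)
import Data.Maybe (catMaybes)

-- | CDC folder path built from BucketName, an optional BucketFolder, and
-- CdcPath, joined with '/'.
cdcFolderPath :: String -> Maybe String -> String -> String
cdcFolderPath bucketName bucketFolder cdcPath =
  intercalate "/" (catMaybes [Just bucketName, bucketFolder, Just cdcPath])
```

Here `cdcFolderPath "MyTargetBucket" Nothing "MyChangedData"` gives `"MyTargetBucket/MyChangedData"`, and passing `Just "MyTargetData"` as the folder gives `"MyTargetBucket/MyTargetData/MyChangedData"`.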
For more information on CDC including transaction order on an S3 target, see Capturing data changes (CDC) including transaction order on the S3 target.
This setting is supported in DMS versions 3.4.2 and later.