S3 Tables

Amazon S3 Tables provides tabular data storage on top of S3, allowing applications to treat S3 like a SQL data store. This works by writing Parquet files to S3 and updating the catalog so that S3 Tables knows the schema of those files.

Connection Settings

Authentication Type

Setting: Description
Token: Enter an IAM access key and secret key that have permission to write to S3. See the AWS IAM Best Practices section below.
Assume EC2 IAM Role: When running on an EC2 instance with an IAM role attached, the connection automatically assumes that role. No credentials are required. See the AWS IAM Best Practices section below.

Region

The AWS Region of the S3 table bucket (e.g. us-east-1).

Input Settings

Inputs are not currently supported. To read from S3 Tables, the Athena JDBC driver can be used with the JDBC connection.

Output Settings

Table Bucket ARN

The full ARN of the table bucket to write to. For example: arn:aws:s3tables:us-west-2:000000000000:bucket/test
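
As a minimal sketch, the ARN can be sanity-checked before it is entered in the connection. The pattern below is an assumption inferred from the example above (arn:<partition>:s3tables:<region>:<account-id>:bucket/<bucket-name>), and parse_table_bucket_arn is a hypothetical helper, not part of the Intelligence Hub.

```python
import re
from typing import NamedTuple


class TableBucketArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    bucket_name: str


# Assumed layout, inferred from the documented example:
# arn:<partition>:s3tables:<region>:<12-digit account id>:bucket/<bucket-name>
_ARN_PATTERN = re.compile(
    r"arn:(?P<partition>[^:]+):s3tables:(?P<region>[^:]+):"
    r"(?P<account_id>\d{12}):bucket/(?P<bucket_name>[^/]+)"
)


def parse_table_bucket_arn(arn: str) -> TableBucketArn:
    """Split a table bucket ARN into its parts, or raise ValueError."""
    match = _ARN_PATTERN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not a table bucket ARN: {arn!r}")
    return TableBucketArn(**match.groupdict())


parsed = parse_table_bucket_arn("arn:aws:s3tables:us-west-2:000000000000:bucket/test")
print(parsed.region, parsed.bucket_name)
```

One possible use is comparing the region embedded in the ARN with the Region setting of the connection, since a mismatch between the two is an easy mistake to make.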

Namespace

The namespace to write to in the table bucket. A table bucket can contain multiple namespaces, but each table belongs to a single namespace.

Table

The name of the table to write to. Table names must follow the S3 Tables naming restrictions (e.g. no uppercase letters).
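
A table name can be checked up front with a short script. This is a sketch only: the rules encoded here (1 to 255 characters, lowercase letters, digits, and underscores, beginning and ending with a letter or digit) are an assumption about the AWS naming rules and should be verified against the current AWS documentation. is_valid_table_name is a hypothetical helper.

```python
import re

# Assumed rules (verify against AWS documentation):
# - 1 to 255 characters
# - only lowercase letters, digits, and underscores
# - must begin and end with a letter or digit
_NAME_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_]{0,253}[a-z0-9])?")


def is_valid_table_name(name: str) -> bool:
    """Return True if the name satisfies the assumed naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["sensor_data", "SensorData", "sensor-data", "_sensor"]:
    print(candidate, is_valid_table_name(candidate))
```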

Create

When Off is selected, the table and namespace must already exist. When Create is selected, the table and namespace are created if they don’t exist. When Create & Update is selected, the table and namespace are created if they don’t exist, and the table schema is updated if new attributes appear in the write.

Output Examples

The S3 Tables connection writes out tabular data and matches the data to the table schema by name.

For example, assume the payload is as follows. Assume the table already exists, and has two columns “col1” and “col2”.

{
  "col1": 1.23,
  "col2": "hello world",
  "col3": null
}

The write above matches col1 and col2 by name and inserts a single row into the table: [1.23, “hello world”]. Because col3 is not in the table schema, it is not written.

Continuing the example, assume the following payload is written to the same table, but with Create & Update enabled.

[
  {
    "col1": 1.23,
    "col2": "hello world"
  },
  {
    "col3": false
  }
]

This adds a “col3” column to the table with a data type of boolean, then inserts the first row as [1.23, “hello world”, null] and the second row as [null, null, false].
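
The two examples above can be reproduced with a small simulation. This is an illustrative sketch of name-based matching, not the connection's actual implementation. The function names, the "number" and "string" type labels, and the rule that a null value never creates a column are all assumptions made for the example.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Map a JSON value to a hypothetical column type label."""
    # bool is checked first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def write_rows(schema: dict, payload: Any, update_schema: bool) -> list:
    """Match payload attributes to columns by name and return the rows."""
    records = payload if isinstance(payload, list) else [payload]
    if update_schema:
        # Create & Update: add a column for each new, non-null attribute
        # (assumption: a null value carries no type, so it adds nothing)
        for record in records:
            for name, value in record.items():
                if name not in schema and value is not None:
                    schema[name] = infer_type(value)
    columns = list(schema)
    # Attributes missing from a record become None (null);
    # attributes not in the schema are dropped
    return [[record.get(name) for name in columns] for record in records]


schema = {"col1": "number", "col2": "string"}

first = write_rows(
    schema, {"col1": 1.23, "col2": "hello world", "col3": None}, update_schema=False
)
print(first)

second = write_rows(
    schema,
    [{"col1": 1.23, "col2": "hello world"}, {"col3": False}],
    update_schema=True,
)
print(schema)
print(second)
```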

AWS IAM Best Practices

Please see the AWS documentation on IAM best practices. HighByte strongly recommends following the principle of least privilege when granting permissions to the IAM role used by the connection.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-iam.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html

It is also recommended that users periodically rotate IAM credentials and manually update the Intelligence Hub configuration with the new credentials.

The S3 Tables connection uses the following IAM permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3tables:*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "PassRoleToS3TablesReplication",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": [
                        "replication.s3tables.amazonaws.com"
                    ]
                }
            }
        }
    ]
}
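
In keeping with least privilege, the first statement's "Resource" can potentially be narrowed from "*" to a single table bucket and its tables. The sketch below generates such a policy. It is an assumption, not a tested configuration: the /table/* resource suffix is assumed from the AWS ARN format for tables, some account-level s3tables actions may still require "*", and scoped_policy is a hypothetical helper. Verify the result against the connection before relying on it.

```python
import json


def scoped_policy(table_bucket_arn: str) -> dict:
    """Build the policy above with s3tables access limited to one table bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3tables:*"],
                # Assumption: the bucket ARN plus its tables is sufficient scope
                "Resource": [table_bucket_arn, f"{table_bucket_arn}/table/*"],
            },
            {
                # Kept unchanged from the documented policy
                "Sid": "PassRoleToS3TablesReplication",
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": [
                            "replication.s3tables.amazonaws.com"
                        ]
                    }
                },
            },
        ],
    }


policy = scoped_policy("arn:aws:s3tables:us-west-2:000000000000:bucket/test")
print(json.dumps(policy, indent=4))
```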