S3

Connection Settings

Authentication Type

Setting | Description
Token | Enter an IAM access key and secret key that have permission to write to S3. See the AWS IAM Best Practices section below.
Assume EC2 IAM Role | When running on an EC2 instance with an IAM role attached, the connection automatically assumes that role. No credentials are required. See the AWS IAM Best Practices section below.

Region

The AWS Region of the S3 bucket (e.g., us-east-1).

Endpoint Override

The URL for a custom S3-compatible service. Leave blank to use the default AWS S3 endpoint for the selected region. This can also be set using the AWS_ENDPOINT_URL_S3 environment variable.

Checksum Validation

Controls whether to validate the checksum of objects when interacting with the S3 endpoint. Only configurable when an Endpoint Override is specified.

Option | Description
When Supported | Validates the checksum when the endpoint supports it. This is the default.
When Required | Only validates the checksum when the endpoint requires it.

Path Style

Controls the S3 path style. Only configurable when an Endpoint Override is specified.

Option | Description
Virtual Hosted | Bucket names are used as subdomains (e.g., bucket.s3.amazonaws.com). This is the default.
Path | Bucket names are included as part of the URL path (e.g., s3.amazonaws.com/bucket).
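
The difference between the two styles can be illustrated with a short Python sketch. The function name object_url and its parameters are hypothetical and are not part of Intelligence Hub; the sketch only shows where the bucket name ends up in the URL and does not URL-encode the key.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build an object URL for either addressing style (illustrative only)."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path: the bucket is the first segment of the URL path.
        netloc, path = parts.netloc, f"/{bucket}/{key}"
    else:
        # Virtual Hosted: the bucket becomes a subdomain of the endpoint host.
        netloc, path = f"{bucket}.{parts.netloc}", f"/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


print(object_url("https://s3.amazonaws.com", "bucket", "data/file.json", path_style=False))
print(object_url("https://s3.amazonaws.com", "bucket", "data/file.json", path_style=True))
```

Some S3-compatible services do not resolve bucket subdomains, which is one common reason to select Path when an Endpoint Override is set.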

Proxy Endpoint

The URL for a custom S3 proxy. Leave blank to route traffic directly to AWS S3.

Input Settings

Type

Specifies the type of input operation. List returns the set of directories and files at a specified path. Read retrieves the data associated with a specific file.

List

Bucket Name

The name of the bucket to read data from.

Key

The path in the bucket whose directories and files are listed.

Recursive

Specifies the set of files to list. When false, only immediate children of the directory are returned. When true, all files and directories under this path are included recursively.
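
The effect of the Recursive setting can be sketched in Python over a flat list of object keys. This is an in-memory illustration only: the function list_entries and the sample keys are hypothetical, and the actual shape of the results returned by Intelligence Hub is not shown here.

```python
def list_entries(keys, path, recursive):
    """Return the directories and files under `path` from a flat list of keys."""
    prefix = path.rstrip("/") + "/" if path else ""
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        segments = rest.split("/")
        if recursive:
            # Every intermediate directory, plus the file itself.
            for i in range(1, len(segments)):
                entries.add(prefix + "/".join(segments[:i]) + "/")
            entries.add(key)
        elif len(segments) == 1:
            entries.add(key)  # file directly under the path
        else:
            entries.add(prefix + segments[0] + "/")  # immediate child directory
    return sorted(entries)


keys = ["logs/a.txt", "logs/2024/b.txt", "logs/2024/01/c.txt", "other/d.txt"]
print(list_entries(keys, "logs", recursive=False))
print(list_entries(keys, "logs", recursive=True))
```

With Recursive set to false, only logs/2024/ and logs/a.txt are returned for this sample; with true, the nested directory and files are included as well.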

Time Filter

Enables filtering files based on an ISO-8601 time. When enabled, only files updated after the provided time are returned.
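
As a rough illustration of the time filter, the following Python sketch keeps only entries modified after an ISO-8601 cutoff. The function updated_after, the last_modified field name, and the treatment of a cutoff without an offset as UTC are assumptions made for the example, not documented Intelligence Hub behavior.

```python
from datetime import datetime, timezone


def updated_after(files, cutoff_iso):
    """Keep only files whose last-modified time is strictly after the cutoff."""
    # Note: before Python 3.11, fromisoformat() does not accept a trailing "Z",
    # so the examples use an explicit +00:00 offset.
    cutoff = datetime.fromisoformat(cutoff_iso)
    if cutoff.tzinfo is None:
        # Assumption: a cutoff without an offset is treated as UTC.
        cutoff = cutoff.replace(tzinfo=timezone.utc)
    return [f for f in files if f["last_modified"] > cutoff]


files = [
    {"key": "a.csv", "last_modified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"key": "b.csv", "last_modified": datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)},
]
recent = updated_after(files, "2024-02-01T00:00:00+00:00")
print([f["key"] for f in recent])
```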

Read

Bucket Name

The name of the bucket to read data from.

Key

The name of the file to read.

Note: Retrieving files stored in the S3 Glacier Flexible Retrieval storage class, the S3 Glacier Deep Archive storage class, the S3 Intelligent-Tiering Archive Access tier, or the S3 Intelligent-Tiering Deep Archive Access tier is currently not supported.

Encoding

The character set used to decode the contents of the file. Select Binary to read the file as binary and skip decoding. Select Auto to detect the encoding from the file's byte order mark (BOM). Otherwise, select a specific encoding to decode the file.
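
BOM-based detection, as used by the Auto option, can be sketched in Python as follows. This is a minimal illustration, not the Intelligence Hub implementation: the function decode_auto is hypothetical, and falling back to UTF-8 when no BOM is present is an assumption.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_auto(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes using the encoding indicated by a leading BOM, if any."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM so it does not appear in the decoded text.
            return data[len(bom):].decode(encoding)
    # Assumption: no BOM means the fallback encoding is used.
    return data.decode(fallback)


sample = codecs.BOM_UTF16_LE + "temperature".encode("utf-16-le")
print(decode_auto(sample))
```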

Include Metadata

Includes S3-specific metadata with the file, covering both system-defined and user-defined object metadata.

Output Settings

Bucket Name

The name of the bucket to write output data to. This field supports dynamic output references.

Key

Specifies the default file name in S3. If left empty, the file name is a GUID with a timestamp. This field supports dynamic output references to control the key depending on the output payload.

Source Type

Specifies where the file content comes from.

Option | Description
Payload | File content comes from the pipeline payload. This is the traditional behavior.
Source File Path | File content is streamed directly from a path on the machine running Intelligence Hub, which avoids loading the entire file into memory at once.

Payload Reference

When working with complex payloads, this setting uses dynamic outputs to specify the attribute that contains the file payload (e.g., {{this.filePayload}}). Available when Source Type is set to Payload.

Note: If left blank, the entire payload is written out.

Source File Path

The path to the file on the machine running Intelligence Hub. The file is streamed directly to S3, avoiding loading the entire file into memory at the same time. Available when Source Type is set to Source File Path.
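
The idea of streaming a file rather than loading it whole can be sketched in Python as a generator that reads fixed-size chunks. The function stream_file and the 1 MiB default chunk size are hypothetical choices for illustration; how Intelligence Hub streams the file internally is not described here.

```python
from pathlib import Path
from typing import Iterator


def stream_file(path: str, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield the file in chunks so that only one chunk is held in memory at a time."""
    with Path(path).open("rb") as handle:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := handle.read(chunk_size):
            yield chunk
```

A consumer can iterate over stream_file(path) and pass each chunk to an uploader, so memory use stays near the chunk size regardless of the file size.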

Metadata

Sets system-defined name-value pairs that are included with the object sent to the S3 bucket. For example, users can set the Content-Type of the payload to a type such as application/json or text/csv.

Storage Class

Specifies the storage class of the payloads written to S3.

Prefix Path with UTC Time

When enabled, a UTC time prefix is added to the key to logically separate files in S3, in the form yyyy/MM/dd/HH/key.
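
The resulting key layout can be sketched in Python as follows. The function prefix_with_utc_time is hypothetical and only illustrates the yyyy/MM/dd/HH/key pattern described above.

```python
from datetime import datetime, timezone
from typing import Optional


def prefix_with_utc_time(key: str, now: Optional[datetime] = None) -> str:
    """Prepend a yyyy/MM/dd/HH prefix, computed in UTC, to the key."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert any timezone-aware time to UTC before formatting.
    utc = now.astimezone(timezone.utc)
    return f"{utc:%Y/%m/%d/%H}/{key}"


print(prefix_with_utc_time("data.json", datetime(2024, 3, 5, 7, 30, tzinfo=timezone.utc)))
```

For the sample time above, the key becomes 2024/03/05/07/data.json.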

AWS IAM Best Practices

See the AWS documentation on IAM best practices. HighByte strongly recommends following the principle of least privilege when granting permissions to the IAM role or user used by the connection.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-iam.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html

It is also recommended that users periodically rotate IAM credentials and manually update the Intelligence Hub configuration with the new credentials.

The following IAM permissions are used by the S3 Connection.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*",
                "s3-object-lambda:*"
            ],
            "Resource": "*"
        }
    ]
}
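
The policy above allows every S3 action on every resource. In keeping with the least-privilege recommendation, a narrower policy scoped to a single bucket might look like the output of the Python sketch below. The bucket name example-bucket and the choice of s3:ListBucket, s3:GetObject, and s3:PutObject are assumptions for illustration; the exact set of actions the connection needs depends on the features in use and should be verified before applying such a policy.

```python
import json


def scoped_policy(bucket: str) -> dict:
    """Build an IAM policy limited to list, read, and write on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading and writing apply to the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# "example-bucket" is a placeholder name.
print(json.dumps(scoped_policy("example-bucket"), indent=4))
```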