Snowflake SQL

The Snowflake SQL connection connects to Snowflake using the Snowflake JDBC driver. The connection supports reading data from Snowflake using SQL queries and writing data to Snowflake tables.

Connection Settings

Account Identifier

The identifier for the Snowflake account, which can be found when logging into the Snowflake portal. See the Snowflake documentation for more details. An example format is ZHX29753.

https://docs.snowflake.com/en/user-guide/admin-account-identifier

User

The user account in Snowflake to authenticate with.

Role

The role of this user account. The role must have the ability to create tables if Create is used in the Output Settings.
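For example, a role could be granted the privileges needed for table creation with statements like the following; the role, database, and schema names here are placeholders:

```sql
-- Placeholder names; substitute your own role, database, and schema.
GRANT USAGE ON DATABASE MY_DB TO ROLE HUB_WRITER;
GRANT USAGE ON SCHEMA MY_DB.PUBLIC TO ROLE HUB_WRITER;
GRANT CREATE TABLE ON SCHEMA MY_DB.PUBLIC TO ROLE HUB_WRITER;
```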

Private Key

The private key used to authenticate the user. See the Snowflake documentation for generating a private key for an account. The private key must be imported into the Intelligence Hub certificate store (See Certificates) and referenced here.

https://docs.snowflake.com/en/user-guide/key-pair-auth
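As a sketch of the Snowflake side of key-pair authentication: after generating an RSA key pair per the documentation above, the public key is registered with the user account. The user name and key value below are placeholders:

```sql
-- Register the public key with the user; the key value is a placeholder.
ALTER USER HUB_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqhkiG...';

-- Confirm the registered key's fingerprint.
DESC USER HUB_USER;
```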

Database

The Snowflake database to write to.

Schema

The default schema to use. Defaults to Public.

Warehouse

The warehouse to use for any processing of data.

Input Settings

See JDBC Driver Settings for input settings.
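An input is an ordinary SQL query executed against Snowflake. A minimal sketch, with illustrative table and column names:

```sql
-- Illustrative read query; table and columns are placeholders.
SELECT _id, _name, _timestamp, temperature
FROM MY_DB.PUBLIC.SENSOR_READINGS
WHERE _timestamp > DATEADD(hour, -1, CURRENT_TIMESTAMP())
ORDER BY _timestamp;
```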

Errors with JVM Version

Depending on your JVM, you may see an error similar to the following when reading a Snowflake SQL input.

```
JDBC driver internal error: exception creating result java.lang.NoClassDefFoundError: Could not initialize class net.snowflake.client.jdbc.internal.apache.arrow.memory.RootAllocator...
```

This is caused by an issue with Arrow, an open-source Apache project used by the Snowflake JDBC driver to provide a more efficient binary format for exchanging data with Snowflake. This issue exists on JVM versions 16 and 17. To work around it, there are a few options.

  1. Upgrade your JVM to the latest v21 release.
  2. Add a JVM option via the command line or an environment variable so the dependency can be loaded:
    1. Command line: --add-opens=java.base/java.nio=ALL-UNNAMED
    2. ENV variable: _JAVA_OPTIONS="--add-opens=java.base/java.nio=ALL-UNNAMED"
  3. Once the connection is made in HighByte, run the following SQL query on the input. Note that this needs to be rerun any time the connection reconnects. This command skips the Arrow binary format and exchanges data as JSON.
    1. ALTER SESSION SET JDBC_QUERY_RESULT_FORMAT='JSON'

Output Settings

Table

The name of the table to write to.

Write Type

Writes to the table can be Inserts, Updates, or Upserts. Insert adds the output data as new rows to the table. Update modifies existing rows that match the condition specified in the Where Clause; if there are no matches, Update does nothing. Upsert performs an Update if there are matching rows, and if there are no matching rows it performs an Insert.
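Conceptually, an Upsert behaves like a Snowflake MERGE keyed on the Where Clause condition. A rough sketch with illustrative names (this is not the exact statement the connection issues):

```sql
-- Illustrative only; the connection generates its own statements.
MERGE INTO BATCH_RESULTS t
USING (SELECT 15 AS batchId, 'complete' AS status) s
  ON t.batchId = s.batchId
WHEN MATCHED THEN UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN INSERT (batchId, status) VALUES (s.batchId, s.status);
```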

Where Clause

Specify the body of the WHERE clause used to filter updated records. This setting is used if the Write Type is set to Update or Upsert. As an example, assume the table being written to contains a column ‘batchId’. To only update rows that contain a ‘batchId’ greater than 15, the Where Clause would be set to “batchId > 15”. To specify multiple columns and conditions, use conditional logic such as “AND” and “OR”. This field supports dynamic outputs.
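With the example above, an Update write would resemble the following statement; the table name, set column, and value are illustrative:

```sql
-- The Where Clause body is placed after WHERE; names are illustrative.
UPDATE BATCH_RESULTS
SET status = 'complete'
WHERE batchId > 15;
```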

Log as JSON

Specifies whether modeled values should be serialized as a single-column JSON blob rather than expanded into multiple columns. When enabled, the column name is ‘json’.
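The blob remains queryable with Snowflake's semi-structured data functions. A sketch, assuming a table written with this option enabled and a payload containing ‘temperature’ and ‘line’ attributes:

```sql
-- Hypothetical payload attributes; the 'json' column name is fixed.
SELECT PARSE_JSON(json):temperature::FLOAT AS temperature
FROM SENSOR_READINGS
WHERE PARSE_JSON(json):line::STRING = 'A';
```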

Create Table

Controls how the table definition is managed. Options are Off, Create, and Create & Update.

  • When Off is used, it’s assumed that the table schema is defined and managed externally. The table must exist. See Table Cache Interval for details on how to respond to table schema changes.
  • When Create is used, it’s assumed the connection manages the table and it’s created once. If the table doesn’t exist it’s created using the schema of the payload.
  • When Create & Update is used, it’s assumed the connection manages the table and updates the table on schema change. For example, if a new attribute is added to the data, the column is added to the table.

When creating a table, the data from the write is used to determine the table schema. All tables are created with an auto-incrementing _id column used as the primary key. Snowflake uses a sequence to generate the values for an auto-incremented column. Sequences have limitations; see Sequence Semantics for more details.

Tables will also include _name, _model, and _timestamp columns unless the output filters these out. Below is a mapping of write data types to SQL data types; a sketch of the resulting DDL appears after the table. Note that strings are limited in size to 2048 characters. If larger strings are required, alter the table after creation.

Write Data Type   SQL Data Type
String            VARCHAR(2048)
DateTime          DATETIME
UInt8             NUMBER
Int8              NUMBER
UInt16            NUMBER
Int16             NUMBER
UInt32            NUMBER
Int32             NUMBER
UInt64            NUMBER
Int64             NUMBER
Real32            FLOAT
Real64            FLOAT
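Putting the above together, a created table might resemble the following DDL; the table name and payload columns are illustrative, and the actual statement is generated by the connection from the write data:

```sql
-- Illustrative DDL; generated by the connection from the write payload.
CREATE TABLE SENSOR_READINGS (
    _id NUMBER AUTOINCREMENT PRIMARY KEY,  -- values come from a sequence
    _name VARCHAR(2048),
    _model VARCHAR(2048),
    _timestamp DATETIME,
    temperature FLOAT,
    batchId NUMBER
);
```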

Table Cache Interval

This setting is useful for scenarios where there have been external modifications to the table schema, and those updates should be reflected in the internally cached table schema. This setting is only available when Create Table is set to Off. Disable this refresh by setting the time interval to 0.

Case Sensitive

By default, Snowflake is case-insensitive, and all table and column names are upper case. This is the recommended approach. When writing to a table that is case-sensitive, the table name and columns must be wrapped in quotes. Enabling this option wraps all names in quotes.
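For example, unquoted identifiers fold to upper case in Snowflake, while quoted identifiers preserve case and must be quoted on every reference:

```sql
-- Unquoted identifiers resolve to upper case, so these are equivalent.
SELECT temperature FROM sensor_readings;
SELECT TEMPERATURE FROM SENSOR_READINGS;

-- A mixed-case quoted identifier is a different object and must stay quoted.
SELECT "temperature" FROM "SensorReadings";
```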