Snowflake SQL
The Snowflake SQL connection connects to Snowflake using the Snowflake JDBC driver. The connection supports reading data from Snowflake using SQL queries and writing data to Snowflake tables.
Connection Settings
Account Identifier
The identifier for the Snowflake account, which can be found when logging into the Snowflake portal. See the Snowflake documentation for more details. An example format is ZHX29753.
https://docs.snowflake.com/en/user-guide/admin-account-identifier
User
The user account in Snowflake to authenticate with.
Role
The role of this user account. The role must have the ability to create tables if the Create options are used in the Output Settings.
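If the role lacks the necessary privileges, they can be granted in Snowflake. A minimal sketch, where MYDB, PUBLIC, and HUB_ROLE are hypothetical names:
-- Grant the role access to the schema and the ability to create tables
-- (MYDB, PUBLIC, and HUB_ROLE are illustrative names).
GRANT USAGE ON DATABASE MYDB TO ROLE HUB_ROLE;
GRANT USAGE ON SCHEMA MYDB.PUBLIC TO ROLE HUB_ROLE;
GRANT CREATE TABLE ON SCHEMA MYDB.PUBLIC TO ROLE HUB_ROLE;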
Private Key
The private key used to authenticate the user. See the Snowflake documentation for generating a private key for an account. The private key must be imported into the Intelligence Hub certificate store (see Certificates) and referenced here.
https://docs.snowflake.com/en/user-guide/key-pair-auth
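In addition to importing the private key into the certificate store, the matching public key must be assigned to the Snowflake user, as described in the linked guide. A minimal sketch, where HUB_USER is a hypothetical user and the key value is elided:
-- Assign the RSA public key to the user; paste the key body without
-- the PEM header and footer lines (HUB_USER is a hypothetical name).
ALTER USER HUB_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';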
Database
The Snowflake database to write to.
Schema
The default schema to use. Defaults to Public.
Warehouse
The warehouse to use for any processing of data.
Input Settings
See JDBC Driver Settings for input settings.
Errors with JVM Version
Depending on your JVM, you may see an error similar to the following when reading a Snowflake SQL input.
JDBC driver internal error: exception creating result java.lang.NoClassDefFoundError: Could not initialize class net.snowflake.client.jdbc.internal.apache.arrow.memory.RootAllocator...
This is caused by an issue with Arrow, an open source Apache project used by the Snowflake JDBC driver to provide a more efficient binary format for exchanging data with Snowflake. This issue exists on JVM versions 16 and 17. To work around this issue, there are a few options:
- Upgrade your JVM to the latest v21.
- Add a JVM option via the command line or an environment variable to find/load the dependency:
  - Command line:
  --add-opens=java.base/java.nio=ALL-UNNAMED
  - Environment variable:
  _JAVA_OPTIONS="--add-opens=java.base/java.nio=ALL-UNNAMED"
- Once the connection is made in HighByte, run the following SQL query on the input. Note this needs to be run any time the connection reconnects. This command skips the Arrow binary format and exchanges data in JSON.
ALTER SESSION SET JDBC_QUERY_RESULT_FORMAT='JSON'
Output Settings
Table
The name of the table to write to.
Write Type
Writes to the table can be Inserts, Updates, or Upserts. Insert adds the output data as new rows to the table. Update modifies existing rows that have a matching value in the attribute referenced in the Where Column; if there are no matches, Update does nothing. Upsert performs an Update if there are matching rows, and an Insert if there are not.
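For reference, an Upsert maps naturally onto Snowflake's MERGE statement. The sketch below is illustrative only (the connection generates its own statements); the table BATCH_DATA and its columns are hypothetical:
-- Illustrative Upsert: update the row whose batchId matches, otherwise insert it.
-- BATCH_DATA, batchId, and status are hypothetical names.
MERGE INTO BATCH_DATA t
USING (SELECT 15 AS batchId, 'complete' AS status) s
ON t.batchId = s.batchId
WHEN MATCHED THEN UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN INSERT (batchId, status) VALUES (s.batchId, s.status);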
Where Clause
Specify the body of the where clause used to filter updated records. This setting is used if the Write Type is set to Update or Upsert. As an example, assume the table being written to contains a column ‘batchId’. To only update rows that contain a ‘batchId’ greater than 15, the Where Clause would be set to “batchId > 15”. To specify multiple columns and conditions, use conditional logic such as “AND” and “OR”. This field supports dynamic outputs.
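To make the mechanics concrete, the clause body is placed after WHERE in the statement the connection generates. A sketch of the resulting Update, reusing the hypothetical table and columns from above:
-- The Where Clause body "batchId > 15" becomes the WHERE condition.
UPDATE BATCH_DATA SET status = 'complete' WHERE batchId > 15;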
Log as JSON
Specifies whether modeled values should be serialized into a single JSON column instead of being expanded into multiple columns. When enabled, the column name is ‘json’.
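Data logged this way can still be queried with Snowflake's semi-structured functions. A minimal sketch, assuming the hypothetical BATCH_DATA table and a ‘batchId’ attribute inside the JSON blob:
-- Parse the 'json' column and extract one attribute with a cast.
SELECT PARSE_JSON(json):batchId::NUMBER AS batchId
FROM BATCH_DATA;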
Create Table
Controls how the table definition is managed. Options are Off, Create, and Create & Update.
- When Off is used, it’s assumed that the table schema is defined and managed externally. The table must exist. See Table Cache Interval for details on how to respond to table schema changes.
- When Create is used, it’s assumed the connection manages the table and it’s created once. If the table doesn’t exist it’s created using the schema of the payload.
- When Create & Update is used, it’s assumed the connection manages the table and updates the table on schema change. For example, if a new attribute is added to the data, the column is added to the table.
When creating a table, the data from the write is used to determine the table schema. All tables are created with an auto-incrementing _id column used as the primary key. Snowflake uses a sequence to generate the values for an auto-incremented column. Sequences have limitations; see Sequence Semantics for more details.
Tables will also include _name, _model, and _timestamp columns unless the output filters these out. Below is a mapping of write data types to SQL data types. Note that strings are limited to 2048 characters; if larger strings are required, alter the table after creation.
Write Data Type | SQL Data Type |
---|---|
String | VARCHAR(2048) |
DateTime | DATETIME |
UInt8 | NUMBER |
Int8 | NUMBER |
UInt16 | NUMBER |
Int16 | NUMBER |
UInt32 | NUMBER |
Int32 | NUMBER |
UInt64 | NUMBER |
Int64 | NUMBER |
Real32 | FLOAT |
Real64 | FLOAT |
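Putting the mapping together, a table created by the connection might look like the sketch below; the actual columns come from the write data, and batchId and status are hypothetical. The final statement shows how to widen a string column after creation:
-- Illustrative shape of an auto-created table.
CREATE TABLE BATCH_DATA (
    _id NUMBER AUTOINCREMENT PRIMARY KEY,
    _name VARCHAR(2048),
    _model VARCHAR(2048),
    _timestamp DATETIME,
    batchId NUMBER,
    status VARCHAR(2048)
);
-- Widen a VARCHAR column if 2048 characters is not enough
-- (16777216 is Snowflake's maximum VARCHAR length).
ALTER TABLE BATCH_DATA ALTER COLUMN status SET DATA TYPE VARCHAR(16777216);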
Table Cache Interval
This setting is useful for scenarios where there have been external modifications to the table schema, and those updates should be reflected in the internally cached table schema. This setting is only available when Create Table is set to Off. Disable this refresh by setting the time interval to 0.
Case Sensitive
By default, Snowflake is case insensitive, and all table and column names are upper case. This is the recommended approach. When writing to a table that is case sensitive, the table name and columns must be wrapped in quotes. Enabling this option wraps all names in quotes.
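For example, the two queries below resolve identifiers differently (names are illustrative):
-- Unquoted identifiers fold to upper case, so this reads MYTABLE.MYCOLUMN.
SELECT myColumn FROM myTable;
-- Quoted identifiers are case sensitive, so this requires a table named exactly "myTable".
SELECT "myColumn" FROM "myTable";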