High Availability

Overview

High Availability (HA) mode allows the Intelligence Hub to run across multiple nodes, providing continuous operation through automatic failover. Only one node (the primary) is active at a time; standby nodes are ready to take over if the primary node fails. This is managed through heartbeats and file synchronization.

All HA coordination is managed through a PostgreSQL database. Nodes do not communicate directly with each other; instead, each node interacts independently with the PostgreSQL server for heartbeat management, leadership election, and data synchronization.

Heartbeats: Each node periodically signals its status to the PostgreSQL database. If the primary node stops sending heartbeats, a standby node automatically takes over.
File Synchronization: Configuration files in appData, along with the Connection and Pipeline State Extension caches, are regularly synced through the PostgreSQL database so that a standby node can seamlessly assume the primary role.

Setup

To set up a primary and a secondary hub, use the following steps. More information on the command line arguments is provided in the sections below.

Install or have available a PostgreSQL server
Install the primary hub
Run the create command on the primary hub to initialize the database tables used for synchronization. Replace node1Primary with a unique name for the primary, and replace the JDBC URI with your PostgreSQL connection settings.

Note: To run the examples below on Linux, replace the semicolon (;) in the classpath (-cp) with a colon (:).

java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain create -n node1Primary -j "jdbc:postgresql://localhost:5432/dbName?user=username&password=password"

Install the secondary hub
Copy the intelligencehub-certificatestore.pkcs12 file from the appData folder of the primary hub to the appData folder of the secondary hub. While the certificates and private keys stored in this file are synchronized across HA instances, the mechanisms used to encrypt and decrypt secrets remain on the local instance. Copying the pkcs12 file ensures the secondary hub can decrypt the secrets in the synchronized configuration.
Launch the primary hub with the start command, using the same node ID that was passed to create. This starts the hub using the configuration stored in the database.

java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain start -n node1Primary -j "jdbc:postgresql://localhost:5432/dbName?user=username&password=password"

Launch the secondary hub with the start command, using a new unique node ID for the secondary.

java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain start -n node2Secondary -j "jdbc:postgresql://localhost:5432/dbName?user=username&password=password"

In this setup the primary hub takes precedence: whenever it is online and running, it takes control. The secondary takes control only when the primary is down.
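If you launch nodes from a script rather than by hand, the classpath separator is the only part of the command that differs between Windows and Linux. The sketch below is a minimal illustration, assuming your own Python launcher runs from the runtime directory; `ha_command` is a hypothetical helper, not part of the Intelligence Hub. It uses `os.pathsep`, which is `;` on Windows and `:` on Linux.

```python
import os

# Main class and JAR names as used in the commands above.
MAIN_CLASS = "com.highbyte.intelligencehub.runtime.HAMain"


def ha_command(action: str, node_id: str, jdbc_uri: str, sep: str = os.pathsep) -> list[str]:
    """Build the HAMain argument list for the create or start command.

    `sep` is the classpath separator; os.pathsep already holds ";" on
    Windows and ":" on Linux, so the default suits the current OS.
    """
    if action not in ("create", "start"):
        raise ValueError(f"unsupported action: {action}")
    classpath = sep.join(["intelligencehub-runtime.jar", "lib/*"])
    return ["java", "-cp", classpath, MAIN_CLASS, action, "-n", node_id, "-j", jdbc_uri]


if __name__ == "__main__":
    uri = "jdbc:postgresql://localhost:5432/dbName?user=username&password=password"
    print(ha_command("start", "node2Secondary", uri))
```

Passing the returned list to `subprocess.run(cmd, cwd=runtime_dir)` (where `runtime_dir` is your own path) bypasses the shell, so `lib/*` reaches Java unexpanded and Java's classpath wildcard support picks up the JARs in `lib`.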

Command Line Usage

Below are details on the command line arguments used to configure and start HA. The available commands are as follows:

create: Initializes the failover service, syncs initial configuration and state files, and sets the node ID as the preferred primary node.
- The preferred primary node is always selected as primary when available.
start: Starts a node in HA mode, joining the failover group.
help: Displays usage information.

The basic command syntax looks as follows, with the arguments detailed below. It’s best to run these commands from the runtime directory.

java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain [options] <create|start|help>

Option	Description	Example Value
`-j`, `--jdbcURI`	JDBC URI of the PostgreSQL database (can be read from an environment variable)	`jdbc:postgresql://localhost:5432/dbName?user=username&password=password` `env:URI`
`-n`, `--nodeId`	Unique node ID for the HA instance	`node1Primary`
`-e`, `--heartbeatExpiration`	Heartbeat expiration period in seconds. Defaults to 10; minimum value is 2. Used with `create` only.	`-e 30` `env:HBE`

Note that environment variables can be used for option values, such as the JDBC URI, to avoid passing sensitive information like database credentials on the command line. The example below reads the JDBC URI from the URI environment variable.

java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain start -n node1 -j env:URI
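The `env:NAME` form in the table and example above suggests a simple lookup rule. The Python sketch below models that rule together with the documented default (10) and minimum (2) for `-e`; it is an assumption for illustration only, since the hub's actual parsing is internal. In particular, rejecting a value below 2 (rather than, say, clamping it) is a guess, and `resolve_option` and `heartbeat_expiration` are hypothetical names.

```python
import os
from collections.abc import Mapping
from typing import Optional

DEFAULT_EXPIRATION_S = 10  # documented default for -e
MIN_EXPIRATION_S = 2       # documented minimum for -e


def resolve_option(value: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the literal value, or the named variable's value for env:NAME."""
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in environ:
            raise KeyError(f"environment variable {name!r} is not set")
        return environ[name]
    return value


def heartbeat_expiration(value: Optional[str], environ: Mapping[str, str] = os.environ) -> int:
    """Parse an -e value in seconds, applying the default and the minimum.

    Assumption: values below the minimum are rejected, not clamped.
    """
    if value is None:
        return DEFAULT_EXPIRATION_S
    seconds = int(resolve_option(value, environ))
    if seconds < MIN_EXPIRATION_S:
        raise ValueError(f"heartbeat expiration must be at least {MIN_EXPIRATION_S} seconds")
    return seconds
```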

Docker

When running the Intelligence Hub in Docker, High Availability can be configured using environment variables instead of command line arguments. Set the following variables in your container configuration:

Environment Variable	Description	Example Value
`HA_NODE_ID`	Unique node ID for this HA instance	`node1Primary`
`HA_POSTGRES_URI`	JDBC URI for the PostgreSQL database	`jdbc:postgresql://localhost:5432/dbName?user=username&password=password`
`START_MODE`	Controls how the node joins the HA cluster. See values below.	`HA_INIT`

The START_MODE variable accepts the following values:

HA_INIT — Initializes a new HA cluster and starts the node as the primary. Use this for the first node when standing up a new cluster.
HA_JOIN — Joins an existing HA cluster as a standby node. Use this for all subsequent nodes.
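The rule above (one HA_INIT node, every other node HA_JOIN) can be captured when generating container settings. This is a minimal Python sketch that produces the three documented variables per node; `docker_ha_env` is a hypothetical helper for your own tooling. Keep in mind that inside a container `localhost` refers to the container itself, so the JDBC URI normally has to name the database host on the container network (the hostname `postgres` in the usage line is an assumption).

```python
def docker_ha_env(node_ids: list[str], postgres_uri: str) -> dict[str, dict[str, str]]:
    """Return the HA environment variables for each container, keyed by node ID.

    The first node initializes the cluster (HA_INIT); every later node
    joins it as a standby (HA_JOIN).
    """
    if len(set(node_ids)) != len(node_ids):
        raise ValueError("node IDs must be unique")
    envs = {}
    for index, node_id in enumerate(node_ids):
        envs[node_id] = {
            "HA_NODE_ID": node_id,
            "HA_POSTGRES_URI": postgres_uri,
            "START_MODE": "HA_INIT" if index == 0 else "HA_JOIN",
        }
    return envs


if __name__ == "__main__":
    for node, env in docker_ha_env(["node1Primary", "node2Secondary"],
                                   "jdbc:postgresql://postgres:5432/dbName").items():
        print(node, env)
```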

Failover Behavior

Heartbeat and Lease Management

The HA system uses a heartbeat mechanism to monitor node health and manage leadership:

Heartbeat interval: Every 5 seconds by default. If a custom Heartbeat Expiration Period is set, the interval is half that value, rounded down to the nearest second.
Heartbeat expiration period: 10 seconds by default (set with -e on the create command)
Typical failover window: 5–15 seconds by default. With a custom Heartbeat Expiration Period, the window ranges from the heartbeat interval to the heartbeat interval plus the expiration period.
- Best case: ~heartbeat interval (standby detects failure on its next heartbeat)
- Worst case: ~heartbeat interval + expiration period (standby waits for next heartbeat after primary heartbeat expires)
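The timing rules above reduce to simple arithmetic. A minimal Python sketch, assuming whole-second values as passed to -e:

```python
def heartbeat_interval(expiration_s: int = 10) -> int:
    """Half the Heartbeat Expiration Period, rounded down to whole seconds."""
    return expiration_s // 2


def failover_window(expiration_s: int = 10) -> tuple[int, int]:
    """(best case, worst case) failover time in seconds."""
    interval = heartbeat_interval(expiration_s)
    return interval, interval + expiration_s


for e in (10, 30, 5):
    print(f"-e {e}: interval {heartbeat_interval(e)}s, window {failover_window(e)}")
# -e 10: interval 5s, window (5, 15)
# -e 30: interval 15s, window (15, 45)
# -e 5: interval 2s, window (2, 7)
```

Odd expiration values lose half a second to rounding, which is why -e 5 yields a 2-second interval.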

The primary node renews its heartbeat in the PostgreSQL database at the heartbeat interval (5 seconds by default). If a primary node fails to check in, the heartbeat expires after the configured Heartbeat Expiration Period (10 seconds by default). Standby nodes also send heartbeats at the same interval, and the first standby to send a heartbeat after the primary’s heartbeat expires is elected as the new primary by the database.
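The election described above behaves like a lease on a single leadership record. The Python sketch below models that record in memory to show the decision each heartbeat makes; the `Lease` class and `heartbeat` function are hypothetical and are not the hub's internals. A database-backed version would need the check and the update to happen atomically (for example, in one conditional update or under a row lock) so that two standbys cannot both win.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for the leadership record kept in PostgreSQL."""
    holder: Optional[str] = None
    expires_at: float = float("-inf")


def heartbeat(lease: Lease, node_id: str, now: float, expiration_s: float = 10) -> bool:
    """Process one heartbeat; return True if node_id holds leadership afterwards.

    The current holder renews its lease. Any node may take a lease that has
    expired (or was never held). Otherwise leadership is unchanged.
    """
    if lease.holder == node_id or now >= lease.expires_at:
        lease.holder = node_id
        lease.expires_at = now + expiration_s
        return True
    return False


if __name__ == "__main__":
    lease = Lease()
    heartbeat(lease, "node1Primary", now=0)           # acquires, expires at 10
    heartbeat(lease, "node1Primary", now=5)           # renews, expires at 15
    print(heartbeat(lease, "node2Secondary", now=12))  # still held by the primary
    print(heartbeat(lease, "node2Secondary", now=17))  # expired at 15, standby wins
```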

File Synchronization

The following files are maintained and synchronized across all nodes in the HA cluster:

Configuration Files:

intelligencehub-configuration.json - Project configuration
intelligencehub-settings.json - System settings
intelligencehub-systemvariables.json - System variables
intelligencehub-remoteconfig.json - Remote configuration settings
intelligencehub-users.json - User accounts, roles, and API keys
intelligencehub-secrets.json - Encrypted secrets
intelligencehub-certificatestore.pkcs12 - Only certificates and private keys are maintained. The mechanisms used for encrypting and decrypting secrets remain on the local instance. To decrypt secrets across multiple HA instances, copy the pkcs12 file of the primary to each instance.
intelligencehub-identityproviders.json - Identity provider configurations
intelligencehub.license - License file

Data Stores:

intelligencehub-cache.db - Cache store
intelligencehub-state.db - State store

Configuration and data stores are synchronized between nodes to ensure a standby can seamlessly take over:

Primary mode sync interval: Half the Heartbeat Expiration Period, rounded down to the nearest second (5 seconds by default). Pushes changes to the database.
Secondary mode sync interval: Six times the Heartbeat Expiration Period (60 seconds by default). Pulls changes from the database.
Sync operations:
- Configuration files are compared by hash before transfer and are synced only when they have changed
- Cache and state stores only push the most recent 10,000 values per sync attempt
Data retention:
- Data from the Data Stores are retained on the PostgreSQL database for up to 24 hours
- Each transaction row is tracked relative to its upload time
- Transactions that haven’t been updated and are older than 24 hours are automatically deleted
- Data is missed only if more than 10,000 unique transactions occur in the cache or state stores within a single primary sync interval (5 seconds by default)
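The sync rules above can be sketched in a few small functions. This Python sketch is illustrative only: the documentation says files are compared "by hash" but not which hash, so SHA-256 here is an assumption, and all function names are hypothetical. At the default 5-second push interval, the 10,000-value cap corresponds to about 2,000 changed values per second.

```python
import hashlib

MAX_VALUES_PER_SYNC = 10_000   # documented cap per sync attempt
RETENTION_S = 24 * 60 * 60     # documented 24-hour retention


def sync_intervals(expiration_s: int = 10) -> tuple[int, int]:
    """(primary push interval, secondary pull interval) in seconds."""
    return expiration_s // 2, 6 * expiration_s


def digest(content: bytes) -> str:
    # Assumption: SHA-256; the documentation only says files are compared by hash.
    return hashlib.sha256(content).hexdigest()


def changed_files(current: dict[str, bytes], last_synced: dict[str, str]) -> list[str]:
    """Names of files whose hash differs from the hash recorded at the last sync."""
    return [name for name, content in current.items() if last_synced.get(name) != digest(content)]


def values_to_push(updates: dict[str, float]) -> list[str]:
    """Keys of the most recently updated values, capped per sync attempt."""
    newest_first = sorted(updates, key=updates.get, reverse=True)
    return newest_first[:MAX_VALUES_PER_SYNC]


def prune(rows: dict[str, float], now: float) -> dict[str, float]:
    """Keep only rows uploaded within the retention period."""
    return {key: uploaded for key, uploaded in rows.items() if now - uploaded <= RETENTION_S}
```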

Promotion Process

When a Secondary Becomes Primary

Pre-promotion sync: Secondary performs one final pull to ensure data consistency
Runtime startup: Starts the main runtime with synchronized data
Heartbeat confirmation: The new primary renews its heartbeat at each heartbeat interval

When a Primary Node Fails

Heartbeat expiration: After the Heartbeat Expiration Period without renewal, the heartbeat expires
Leadership acquisition: Next heartbeat from a standby (within one heartbeat interval) acquires leadership
Automatic promotion: Standby node detects acquisition and promotes to primary
Service continuity: Total downtime falls within the typical failover window
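To see where the failover window comes from, the Python sketch below computes the leadership handoff time for one failure scenario. It models only the heartbeat timing (not the pre-promotion sync or runtime startup), assumes a single standby whose heartbeats fall at `standby_phase + k * interval`, and `downtime` is a hypothetical name.

```python
import math


def downtime(fail_at: float, last_renewal: float, standby_phase: float,
             expiration_s: int = 10) -> float:
    """Seconds from primary failure until a standby acquires leadership.

    The primary last renewed at `last_renewal` and fails at `fail_at`,
    before its next renewal. Its heartbeat expires at
    last_renewal + expiration_s; the standby takes over on its first
    heartbeat at or after that moment.
    """
    interval = expiration_s // 2
    expires = last_renewal + expiration_s
    k = math.ceil((expires - standby_phase) / interval)
    acquired = standby_phase + k * interval
    return acquired - fail_at


# Worst case: the primary fails right after renewing and the standby's
# heartbeat lands just before expiry, so it waits almost a full interval.
print(round(downtime(fail_at=0.0, last_renewal=0.0, standby_phase=4.99), 2))
# Best case: the primary fails just before its next renewal and the
# standby's heartbeat coincides with expiry.
print(round(downtime(fail_at=4.99, last_renewal=0.0, standby_phase=0.0), 2))
```

With the defaults, these two scenarios give 14.99 and 5.01 seconds, the edges of the 5 to 15 second window.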

When the Preferred Primary Comes Back Online

The node ID passed to the create command is set as the preferred primary node:

When the preferred primary node comes online, it will take over from the current primary
This happens during the next heartbeat cycle (within one heartbeat interval)
The transition follows the same promotion process (sync → promote → start)
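The takeover rule above can be expressed as one extra condition in the heartbeat decision. This is a minimal Python sketch under the assumption that the preferred primary claims leadership on any heartbeat while other nodes only renew their own lease or take an expired one; the dictionary-based lease and `handle_heartbeat` are hypothetical, not the hub's implementation.

```python
def handle_heartbeat(lease: dict, node_id: str, preferred: str,
                     now: float, expiration_s: float = 10) -> bool:
    """Return True if node_id holds leadership after this heartbeat.

    The preferred primary (the node ID given to create) takes leadership
    on any heartbeat. Other nodes renew their own lease or take an
    expired one. A node that gets False back should run as a standby.
    """
    holder = lease.get("holder")
    expires_at = lease.get("expires_at", float("-inf"))
    if node_id == preferred or node_id == holder or now >= expires_at:
        lease["holder"], lease["expires_at"] = node_id, now + expiration_s
        return True
    return False
```

For example, while node1Primary is down, node2Secondary keeps renewing; the first heartbeat from node1Primary after it restarts claims leadership, and node2Secondary learns on its next heartbeat that it has become a standby.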

Graceful Shutdown

Primary Node Shutdown

Leadership release: Immediately expires the heartbeat (sets expiration to the Heartbeat Expiration Period in the past)
Runtime stop: Cleanly shuts down the active runtime
Connection cleanup: Closes HA database connection
Fast failover: By expiring the heartbeat immediately, standby nodes can acquire leadership on their next heartbeat (within one heartbeat interval)
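Backdating the expiration is what makes the graceful handoff fast. The Python sketch below shows the effect on an in-memory lease; the `Lease` class, `release`, and `standby_heartbeat` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str
    expires_at: float


def release(lease: Lease, node_id: str, now: float, expiration_s: float = 10) -> None:
    """Expire the lease immediately by setting it one expiration period in the past."""
    if lease.holder == node_id:
        lease.expires_at = now - expiration_s


def standby_heartbeat(lease: Lease, node_id: str, now: float, expiration_s: float = 10) -> bool:
    """A standby takes the lease only once it has expired."""
    if now >= lease.expires_at:
        lease.holder, lease.expires_at = node_id, now + expiration_s
        return True
    return False


if __name__ == "__main__":
    lease = Lease("node1Primary", expires_at=110.0)
    release(lease, "node1Primary", now=102.0)                # graceful shutdown
    print(standby_heartbeat(lease, "node2Secondary", 106.0))  # no wait for expiry
```

Without the release, the standby in this example would have had to wait until the lease expired at 110 seconds; with it, the next standby heartbeat acquires leadership.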

Database

Currently, only PostgreSQL is supported for the database.