High Availability
Overview
High Availability (HA) mode allows the Intelligence Hub to run across multiple nodes, ensuring continuous operation and failover. Only one node is active at a time; standby nodes are ready to take over if the primary node fails. This is managed through heartbeats and file synchronization.
All HA coordination is managed through a PostgreSQL database. Nodes do not communicate directly with each other; instead, each node interacts independently with the PostgreSQL server for heartbeat management, leadership election, and data synchronization.
- Heartbeats: Each node periodically signals its status to the PostgreSQL database. If the primary node stops sending heartbeats, a standby node automatically takes over.
- File Synchronization: Configuration files in appData, which includes Connection and Pipeline State Extension caches, are regularly synced through the PostgreSQL database to ensure a standby node can seamlessly assume the primary role.
Setup
To set up a primary and secondary hub, follow the steps below. More information on the command line arguments is provided in the sections below.
- Install or have available a PostgreSQL server
- Install the primary hub
- Run the create command on the primary hub to initialize the database tables used for synchronization. Replace node1Primary with a unique name for the primary, and replace the JDBC URI with your PostgreSQL connection settings.
java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain create -n node1Primary -j "jdbc:postgresql://localhost:5432/dbName?user=username&password=password"
- Install the secondary hub
- Copy the intelligencehub-certificatestore.pkcs12 file from the appData of the primary hub to the appData of the secondary hub. While the certificates and private keys stored in this file are synchronized across HA instances, the encryption mechanisms used for encrypting and decrypting secrets remain on the local instance. Copying the pkcs12 file ensures the secondary hub can properly decrypt secrets needed for configuration.
- Launch the primary hub with the start command, using the same node ID that was used in create. This starts the hub using the database configuration.
java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain start -n node1Primary -j "jdbc:postgresql://localhost:5432/dbName?user=username&password=password"
- Launch the secondary hub with the start command using a new unique nodeid for the secondary
java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain start -n node2Secondary -j "jdbc:postgresql://localhost:5432/dbName?user=username&password=password"
In the above setup the primary hub takes precedence, meaning if it’s online and running it takes control. The secondary only takes control when the primary is down.
Command Line Usage
Below are details on the command line arguments to configure and start HA. The primary commands are as follows:
- create: Initializes the failover service, syncs initial configuration and state files, and sets the node ID as the preferred primary node. The preferred primary node is always selected as primary when available.
- start: Starts a node in HA mode, joining the failover group.
- help: Displays usage information.
The basic command syntax looks as follows, with the arguments detailed below. It’s best to run these commands from the runtime directory.
java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain [options] <create|start|help>
| Option | Description | Example Value |
|---|---|---|
| -j, --jdbcURI | JDBC URI for the node (can use environment variables) | jdbc:postgresql://localhost:5432/dbName?user=username&password=password or env:URI |
| -n, --nodeId | Unique node ID for the HA instance | |
| -e, --heartbeatExpiration | Sets the Heartbeat Expiration Period. Defaults to 10s. Minimum value of 2. Used with create only. | -e 30 or env:HBE |
Note that environment variables can be used for the database configuration commands to avoid passing sensitive information on the command line. Below is an example of using an environment variable.
java -cp "intelligencehub-runtime.jar;lib/*" com.highbyte.intelligencehub.runtime.HAMain start -n node1 -j env:URI
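As a rough illustration of the syntax above, the following Python sketch assembles the argument list for a create or start command. The function name `build_ha_command` and its parameters are hypothetical helpers, not part of the product. It assumes that `env:URI` refers to an environment variable named `URI`, and that the classpath separator should follow the host platform (Java uses `;` on Windows and `:` elsewhere).

```python
import os

MAIN_CLASS = "com.highbyte.intelligencehub.runtime.HAMain"


def build_ha_command(action, node_id, jdbc_uri="env:URI", heartbeat_expiration=None):
    """Return the argv list for an HA command (hypothetical helper).

    The argument order follows the examples in this document:
    HAMain <create|start> -n <nodeId> -j <jdbcURI> [-e <seconds>]
    """
    if action not in ("create", "start"):
        raise ValueError("action must be 'create' or 'start'")

    # os.pathsep is ';' on Windows and ':' on Linux/macOS, which matches
    # the separator the java launcher expects on each platform.
    classpath = os.pathsep.join(["intelligencehub-runtime.jar", "lib/*"])
    argv = ["java", "-cp", classpath, MAIN_CLASS, action, "-n", node_id, "-j", jdbc_uri]

    if heartbeat_expiration is not None:
        # The option table lists -e as a create-only option with a minimum of 2.
        if action != "create":
            raise ValueError("-e is only used with create")
        if heartbeat_expiration < 2:
            raise ValueError("heartbeat expiration must be at least 2")
        argv += ["-e", str(heartbeat_expiration)]
    return argv


# Example: the start command from above, with the JDBC URI kept out of argv.
command = build_ha_command("start", "node1")
```

The resulting list could be passed to `subprocess.run(command, cwd=runtime_dir)` after setting the `URI` variable in the process environment, where `runtime_dir` stands for the runtime directory of your installation.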
Docker
When running the Intelligence Hub in Docker, High Availability can be configured using environment variables instead of command line arguments. Set the following variables in your container configuration:
| Environment Variable | Description | Example Value |
|---|---|---|
| HA_NODE_ID | Unique node ID for this HA instance | node1Primary |
| HA_POSTGRES_URI | JDBC URI for the PostgreSQL database | jdbc:postgresql://localhost:5432/dbName?user=username&password=password |
| START_MODE | Controls how the node joins the HA cluster. See values below. | HA_INIT |
The START_MODE variable accepts the following values:
- HA_INIT: Initializes a new HA cluster and starts the node as the primary. Use this for the first node when standing up a new cluster.
- HA_JOIN: Joins an existing HA cluster as a standby node. Use this for all subsequent nodes.
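A small pre-flight check can catch a missing or mistyped variable before the container starts. The sketch below is only an illustration: `check_ha_environment` is a hypothetical helper, and it assumes that all three variables must be set for HA mode and that the URI starts with `jdbc:postgresql://`, as in the example value above.

```python
import os

REQUIRED_VARIABLES = ("HA_NODE_ID", "HA_POSTGRES_URI", "START_MODE")
HA_START_MODES = {"HA_INIT", "HA_JOIN"}


def check_ha_environment(env):
    """Return a list of human-readable problems found in env (a mapping)."""
    # Assumption: every variable in the table is required for HA mode.
    problems = [f"{name} is not set" for name in REQUIRED_VARIABLES if not env.get(name)]

    mode = env.get("START_MODE")
    if mode and mode not in HA_START_MODES:
        problems.append(f"START_MODE must be HA_INIT or HA_JOIN, got {mode!r}")

    uri = env.get("HA_POSTGRES_URI")
    if uri and not uri.startswith("jdbc:postgresql://"):
        problems.append("HA_POSTGRES_URI should be a jdbc:postgresql:// URI")
    return problems


if __name__ == "__main__":
    # Check the variables of the current process and report any problems.
    for problem in check_ha_environment(os.environ):
        print(problem)
```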
Failover Behavior
Heartbeat and Lease Management
The HA system uses a heartbeat mechanism to monitor node health and manage leadership:
- Heartbeat interval: Every 5 seconds by default. If a manual Heartbeat Expiration Period is set, the interval is half that value, rounded down to the nearest second.
- Heartbeat expiration period: 10 seconds by default
- Typical failover window: 5–15 seconds by default. With a custom Heartbeat Expiration Period, the window ranges from the heartbeat interval to the heartbeat interval plus the expiration period.
  - Best case: ~heartbeat interval (standby detects failure on its next heartbeat)
  - Worst case: ~heartbeat interval + expiration period (standby waits for next heartbeat after primary heartbeat expires)
The primary node renews its heartbeat in the PostgreSQL database at the heartbeat interval (5 seconds by default). If a primary node fails to check in, the heartbeat expires after the configured Heartbeat Expiration Period (10 seconds by default). Standby nodes also send heartbeats at the same interval, and the first standby to send a heartbeat after the primary’s heartbeat expires is elected as the new primary by the database.
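The timing rules above reduce to simple arithmetic. The following sketch computes the heartbeat interval and the typical failover window for a given Heartbeat Expiration Period; the function names are illustrative only, and the minimum of 2 comes from the -e option described earlier.

```python
def heartbeat_interval(expiration_s=10):
    """Heartbeat interval in seconds: half the expiration period, rounded down."""
    if expiration_s < 2:
        raise ValueError("Heartbeat Expiration Period has a minimum value of 2")
    return expiration_s // 2


def failover_window(expiration_s=10):
    """Return (best_case, worst_case) failover time in seconds.

    Best case:  about one heartbeat interval.
    Worst case: about one heartbeat interval plus the expiration period.
    """
    interval = heartbeat_interval(expiration_s)
    return interval, interval + expiration_s


print(failover_window())    # default settings: (5, 15)
print(failover_window(30))  # create run with -e 30: (15, 45)
```

With an odd value such as 5, the interval rounds down to 2 seconds, giving a window of roughly 2 to 7 seconds.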
File Synchronization
The following files are maintained and synchronized across all nodes in the HA cluster:
Configuration Files:
- intelligencehub-configuration.json - Project configuration
- intelligencehub-settings.json - System settings
- intelligencehub-systemvariables.json - System variables
- intelligencehub-remoteconfig.json - Remote configuration settings
- intelligencehub-users.json - User accounts, roles, and API keys
- intelligencehub-secrets.json - Encrypted secrets
- intelligencehub-certificatestore.pkcs12 - Only certificates and private keys are maintained. The mechanisms used for encrypting and decrypting secrets will remain on the local instance. To decrypt keys across multiple HA instances, copy the pkcs12 file of the primary to each instance.
- intelligencehub-identityproviders.json - Identity provider configurations
- intelligencehub.license - License file
Data Stores:
- intelligencehub-cache.db - Cache store
- intelligencehub-state.db - State store
Configuration and data stores are synchronized between nodes to ensure a standby can seamlessly take over:
- Primary mode sync interval: Half the Heartbeat Expiration Period, rounded down to the nearest second (5 seconds by default). Pushes changes to the database.
- Secondary mode sync interval: Six times the Heartbeat Expiration Period (60 seconds by default). Pulls changes from the database.
- Sync operations:
  - Deployment files are compared by hash before transfer and are synced only when they have changed.
  - Cache and state stores push only the most recent 10,000 values per sync attempt.
- Data retention:
  - Data from the data stores is retained in the PostgreSQL database for up to 24 hours.
  - Each transaction row is tracked relative to its upload time.
  - Transactions that haven’t been updated and are older than 24 hours are automatically deleted.
  - Data would be missed only if more than 10,000 unique transactions occurred in the cache or state stores within 5 seconds.
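The synchronization rules above can be sketched in a few lines. This is a simplified model, not the product's implementation: the function names and row layout are hypothetical, and SHA-256 is an assumption, since the hash algorithm is not stated here.

```python
import hashlib
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)  # data store rows are kept for up to 24 hours
BATCH_LIMIT = 10_000             # most recent values pushed per sync attempt


def sync_intervals(expiration_s=10):
    """Return (primary_push_interval, secondary_pull_interval) in seconds."""
    return expiration_s // 2, 6 * expiration_s


def files_to_push(local_files, stored_hashes):
    """Names of files whose content hash differs from the stored hash.

    local_files maps name -> bytes; stored_hashes maps name -> hex digest.
    SHA-256 is an assumed choice of hash.
    """
    return sorted(
        name
        for name, content in local_files.items()
        if hashlib.sha256(content).hexdigest() != stored_hashes.get(name)
    )


def batch_to_push(rows):
    """Keep only the most recent BATCH_LIMIT rows; each row is (updated_at, key, value)."""
    return sorted(rows, key=lambda row: row[0])[-BATCH_LIMIT:]


def prune(rows, now):
    """Drop rows that have not been updated within the retention period."""
    return [row for row in rows if now - row[0] <= RETENTION]


print(sync_intervals())    # (5, 60) with the default 10-second expiration
print(sync_intervals(30))  # (15, 180)
```

Under the default settings, the 10,000-value cap per 5-second push works out to a sustained rate of about 2,000 unique transactions per second before values could be skipped.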
Promotion Process
When a Secondary Becomes Primary
- Pre-promotion sync: Secondary performs one final pull to ensure data consistency
- Runtime startup: Starts the main runtime with synchronized data
- Heartbeat confirmation: New primary renews heartbeat at the heartbeat interval
When a Primary Node Fails
- Heartbeat expiration: After the Heartbeat Expiration Period without renewal, the heartbeat expires
- Leadership acquisition: Next heartbeat from a standby (within one heartbeat interval) acquires leadership
- Automatic promotion: Standby node detects acquisition and promotes to primary
- Service continuity: Total downtime falls within the typical failover window
When the Preferred Primary Comes Back Online
The system supports configuring a preferred primary node:
- When the preferred primary node comes online, it will take over from the current primary
- This happens during the next heartbeat cycle (within one heartbeat interval)
- The transition follows the same promotion process (sync → promote → start)
Graceful Shutdown
Primary Node Shutdown
- Leadership release: Immediately expires the heartbeat (sets expiration to the Heartbeat Expiration Period in the past)
- Runtime stop: Cleanly shuts down the active runtime
- Connection cleanup: Closes HA database connection
- Fast failover: By expiring the heartbeat immediately, standby nodes can acquire leadership on their next heartbeat (within one heartbeat interval)
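The election rules described in this section can be modeled with a small in-memory simulation. The `HeartbeatLease` class below is a hypothetical stand-in for the leadership record that the nodes share through PostgreSQL; the actual table layout and queries are not documented here. It covers heartbeat expiry, takeover by the preferred primary, and graceful release.

```python
class HeartbeatLease:
    """In-memory model of the shared leadership record (illustrative only)."""

    def __init__(self, preferred, expiration_s=10):
        self.preferred = preferred        # node ID set as preferred primary by create
        self.expiration_s = expiration_s  # Heartbeat Expiration Period in seconds
        self.holder = None                # current primary, if any
        self.expires_at = 0               # time at which the current heartbeat expires

    def heartbeat(self, node, now):
        """Record a heartbeat from node at time now and return the current primary."""
        expired = self.holder is None or now >= self.expires_at
        # A node holds or gains leadership when it is already the primary (renewal),
        # when the primary's heartbeat has expired, or when it is the preferred primary.
        if node == self.holder or expired or node == self.preferred:
            self.holder = node
            self.expires_at = now + self.expiration_s
        return self.holder

    def release(self, node, now):
        """Graceful shutdown: move the expiration one full period into the past."""
        if node == self.holder:
            self.expires_at = now - self.expiration_s


lease = HeartbeatLease(preferred="node1Primary")
lease.heartbeat("node1Primary", now=0)
lease.heartbeat("node1Primary", now=5)            # last renewal; expires at t=15
print(lease.heartbeat("node2Secondary", now=10))  # node1Primary (not yet expired)
print(lease.heartbeat("node2Secondary", now=15))  # node2Secondary (failover)
print(lease.heartbeat("node1Primary", now=22))    # node1Primary (preferred returns)
```

In a real deployment the check and the update would need to happen atomically in the database, so that two standbys cannot both acquire leadership on the same expired heartbeat.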
Database
Currently, only PostgreSQL is supported for the database.