The Problem: RPC Downtime
No RPC endpoint has 100% uptime. When your primary node goes down, blocks pass unmonitored. Paychainly solves this with two complementary systems: live failover and startup backfill.
Multi-RPC Failover
The rpcs table holds a pool of RPC endpoints ordered by priority. When a request fails or returns a 429 rate-limit response, RpcFailoverService immediately switches to the next endpoint in priority order. The failed endpoint enters a 30-second cooldown before rejoining the pool.
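The failover rule can be sketched as a small in-memory pool. This is a minimal illustration, not Paychainly's RpcFailoverService: the EndpointPool class, the callWithFailover helper, and the convention that a lower priority number is tried first are all assumptions. The caller-supplied call function is expected to throw on any failure, including an HTTP 429.

```typescript
// Hypothetical model of a priority-ordered endpoint pool with a cooldown.
interface RpcEndpoint {
  url: string;
  priority: number; // assumption: lower number = tried first
}

const COOLDOWN_MS = 30000; // the 30-second cooldown described in the text

class EndpointPool {
  private readonly endpoints: RpcEndpoint[];
  private readonly cooldownUntil = new Map<string, number>();

  constructor(endpoints: RpcEndpoint[]) {
    // Copy before sorting so the caller's array is left untouched.
    this.endpoints = [...endpoints].sort((a, b) => a.priority - b.priority);
  }

  // Highest-priority endpoint that is not cooling down, or undefined if none.
  pick(now: number): RpcEndpoint | undefined {
    return this.endpoints.find(
      (e) => (this.cooldownUntil.get(e.url) ?? 0) <= now,
    );
  }

  // Take an endpoint out of rotation for COOLDOWN_MS.
  markFailed(url: string, now: number): void {
    this.cooldownUntil.set(url, now + COOLDOWN_MS);
  }
}

type RpcCall<T> = (url: string) => Promise<T>;

// Try endpoints in priority order until one succeeds or all are cooling down.
async function callWithFailover<T>(
  pool: EndpointPool,
  call: RpcCall<T>,
  now: () => number = Date.now,
): Promise<T> {
  let lastError: unknown = new Error("no RPC endpoint available");
  for (;;) {
    const endpoint = pool.pick(now());
    if (endpoint === undefined) throw lastError;
    try {
      return await call(endpoint.url);
    } catch (err) {
      // Any thrown error (including a 429 surfaced by `call`) triggers failover.
      lastError = err;
      pool.markFailed(endpoint.url, now());
    }
  }
}
```

Passing the clock in as a parameter keeps the cooldown logic testable without waiting 30 real seconds.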
The listenerLastBlock Checkpoint
Every 10 blocks (configurable via LISTENER_CHECKPOINT_INTERVAL), the current block number is saved to network_configs.listenerLastBlock. On restart, this value determines where to resume.
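A minimal sketch of the checkpoint cadence follows. The CheckpointTracker class and the save callback are hypothetical; in the real system the callback would write network_configs.listenerLastBlock. The sketch also assumes "every 10 blocks" means every 10 processed blocks rather than block numbers divisible by 10.

```typescript
const LISTENER_CHECKPOINT_INTERVAL = 10; // default named in the text

// Hypothetical persistence hook standing in for the database write.
type SaveCheckpoint = (blockNumber: number) => void;

class CheckpointTracker {
  private readonly save: SaveCheckpoint;
  private readonly interval: number;
  private blocksSinceSave = 0;

  constructor(
    save: SaveCheckpoint,
    interval: number = LISTENER_CHECKPOINT_INTERVAL,
  ) {
    this.save = save;
    this.interval = interval;
  }

  // Call once per processed block; persists on every `interval`-th call.
  onBlockProcessed(blockNumber: number): void {
    this.blocksSinceSave += 1;
    if (this.blocksSinceSave >= this.interval) {
      this.save(blockNumber);
      this.blocksSinceSave = 0;
    }
  }
}
```

Under this scheme a crash can lose at most the blocks processed since the last save, so a restart re-reads a few blocks that were already handled.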
Startup Backfill
On boot, BlockPipelineBootstrapService computes the gap:
gap = currentSafeBlock - lastCheckpointBlock
// if gap > LISTENER_MAX_BACKFILL_BLOCKS (50000): alert + partial backfill
// else: full backfill in LISTENER_BACKFILL_CHUNK_BLOCKS (10) chunks
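Backfill jobs run at BullMQ priority 10, while live blocks run at priority 1. BullMQ dequeues lower numbers first, so a queued live block is picked up ahead of any queued backfill chunk and live payments are not held up behind a long backfill.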
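The gap calculation and chunking can be sketched as a pure planning function. The planBackfill name and the BackfillPlan shape are hypothetical, and the text does not say which blocks a "partial backfill" covers; this sketch assumes it covers the most recent LISTENER_MAX_BACKFILL_BLOCKS blocks.

```typescript
const LISTENER_MAX_BACKFILL_BLOCKS = 50000;
const LISTENER_BACKFILL_CHUNK_BLOCKS = 10;
const BACKFILL_PRIORITY = 10; // live blocks use priority 1

interface BackfillChunk {
  fromBlock: number; // inclusive
  toBlock: number; // inclusive
  priority: number;
}

interface BackfillPlan {
  gap: number;
  truncated: boolean; // true when the gap exceeded the cap (alert case)
  chunks: BackfillChunk[];
}

function planBackfill(
  lastCheckpointBlock: number,
  currentSafeBlock: number,
): BackfillPlan {
  const gap = Math.max(0, currentSafeBlock - lastCheckpointBlock);
  const truncated = gap > LISTENER_MAX_BACKFILL_BLOCKS;
  // Assumption: a partial backfill keeps the newest blocks and drops the oldest.
  const span = truncated ? LISTENER_MAX_BACKFILL_BLOCKS : gap;
  const start = currentSafeBlock - span + 1;

  const chunks: BackfillChunk[] = [];
  for (
    let from = start;
    from <= currentSafeBlock;
    from += LISTENER_BACKFILL_CHUNK_BLOCKS
  ) {
    chunks.push({
      fromBlock: from,
      toBlock: Math.min(
        from + LISTENER_BACKFILL_CHUNK_BLOCKS - 1,
        currentSafeBlock,
      ),
      priority: BACKFILL_PRIORITY,
    });
  }
  return { gap, truncated, chunks };
}
```

With BullMQ, each chunk could then be enqueued with something like queue.add("backfill", chunk, { priority: chunk.priority }), where the job name "backfill" is a placeholder.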
Hourly Gap Detector
Even mid-session, the hourly GapDetectorService runs a generate_series SQL query against the block_audit table to find any missing block ranges, then enqueues backfill jobs for them.
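The detection step can be sketched in two parts: a query that lists missing block numbers, and a helper that collapses them into contiguous ranges for enqueueing. The SQL is only a guess at the shape of such a query (PostgreSQL syntax, with a hypothetical block_number column), and findMissingBlocks is an in-memory stand-in so the sketch runs without a database.

```typescript
// Assumed query shape; the column name block_number is hypothetical.
const MISSING_BLOCKS_SQL = `
  SELECT s.n AS missing_block
  FROM generate_series($1::bigint, $2::bigint) AS s(n)
  LEFT JOIN block_audit b ON b.block_number = s.n
  WHERE b.block_number IS NULL
  ORDER BY s.n
`;

interface BlockRange {
  fromBlock: number; // inclusive
  toBlock: number; // inclusive
}

// In-memory equivalent of the query: every block in [from, to] not yet audited.
function findMissingBlocks(
  auditedBlocks: number[],
  from: number,
  to: number,
): number[] {
  const audited = new Set(auditedBlocks);
  const missing: number[] = [];
  for (let n = from; n <= to; n++) {
    if (!audited.has(n)) missing.push(n);
  }
  return missing;
}

// Merge consecutive block numbers into ranges, one backfill job per range.
function collapseToRanges(missingBlocks: number[]): BlockRange[] {
  const sorted = [...missingBlocks].sort((a, b) => a - b);
  const ranges: BlockRange[] = [];
  for (const block of sorted) {
    const last = ranges[ranges.length - 1];
    if (last !== undefined && block <= last.toBlock + 1) {
      last.toBlock = Math.max(last.toBlock, block);
    } else {
      ranges.push({ fromBlock: block, toBlock: block });
    }
  }
  return ranges;
}
```

Collapsing to ranges keeps the number of enqueued jobs proportional to the number of gaps rather than the number of missing blocks.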
Idempotent Processing
Backfill may re-process blocks that were already seen. The unique txHash constraint on the transactions table rejects the duplicate insert, so every payment is credited exactly once and no webhook is sent twice.
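The effect of the unique constraint can be illustrated with an in-memory stand-in. TransactionLedger and processBlockPayments are hypothetical names, and a Set plays the role of the unique index; in a SQL database the same behavior is commonly obtained by treating a unique-violation on insert as "already processed".

```typescript
interface DetectedPayment {
  txHash: string;
  amount: string; // kept as a string to avoid precision loss
}

// Stand-in for the transactions table with a unique index on txHash.
class TransactionLedger {
  private readonly seen = new Set<string>();

  // Returns true if the hash was new, false if it was already recorded.
  record(txHash: string): boolean {
    if (this.seen.has(txHash)) return false;
    this.seen.add(txHash);
    return true;
  }
}

// Credits each payment at most once; returns how many were newly credited.
function processBlockPayments(
  ledger: TransactionLedger,
  payments: DetectedPayment[],
  sendWebhook: (payment: DetectedPayment) => void,
): number {
  let credited = 0;
  for (const payment of payments) {
    // Only the first successful insert triggers the webhook.
    if (ledger.record(payment.txHash)) {
      sendWebhook(payment);
      credited += 1;
    }
  }
  return credited;
}
```

Because the duplicate check and the webhook decision hinge on the same insert, replaying a block during backfill is a no-op for payments that were already credited.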