Field report

Migrating a Solana indexer off hosted RPC: a 10-day field report

A practical, opinionated guide to moving a real production Solana indexer from a hosted RPC provider to SolanaVault, with the timeline and the surprises.

2026-04-19 · SolanaVault Team

migration · indexer · case study

We migrated an indexer this month. It processes about 18 million RPC calls a day against a hosted Solana provider, runs four worker pools, and feeds a Postgres + Redis hot path. The migration to SolanaVault took ten working days end to end, spanning one weekend we did not work. This post is what we would tell the next team to do.

The indexer in question is for a partner who asked not to be named. The numbers are real. The workflow is generalizable.

Day 0: measuring the baseline

Before changing anything, we collected three things:

A 7-day distribution of RPC methods called. The shape was 62% getSignaturesForAddress, 24% getTransaction, 9% getConfirmedBlock, and 5% miscellaneous.
The p50, p95, and p99 latency for each method against the existing provider.
The monthly bill, broken out by request volume and any archive surcharges.

This baseline matters. Without it you cannot honestly evaluate whether the migration was worth it. We strongly recommend any team contemplating a similar change spend at least a day on this step.
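
The first two baseline items can be computed from any request log that records a method name and a latency per call. The following is a minimal sketch, assuming a hypothetical log already parsed into (method, latency_ms) tuples; the function names and the record shape are illustrative, not part of any SolanaVault tooling.

```python
from collections import Counter, defaultdict
from statistics import quantiles


def percentile_cuts(values):
    """Return the 99 percentile cut points (p1..p99) for a list of latencies."""
    if len(values) < 2:
        # quantiles() needs at least two samples on older Python versions.
        return [values[0]] * 99
    return quantiles(values, n=100, method="inclusive")


def baseline(records):
    """records: iterable of (method, latency_ms) tuples from a request log."""
    counts = Counter()
    latencies = defaultdict(list)
    for method, latency_ms in records:
        counts[method] += 1
        latencies[method].append(latency_ms)

    total = sum(counts.values())
    report = {}
    for method, n in counts.items():
        cuts = percentile_cuts(latencies[method])
        report[method] = {
            "share": n / total,   # fraction of all calls
            "p50": cuts[49],      # cut point 50 of 99
            "p95": cuts[94],
            "p99": cuts[98],
        }
    return report
```

Running this over the same seven-day window for both providers gives directly comparable numbers later in the migration.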

Day 1: spinning up a Vault light client

The light client is a single Rust binary. We built it, deposited a small per-query balance, and pointed a non-production worker at the local JSON-RPC endpoint to confirm parity with the four most common methods.

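# Build the light client (a single Rust binary).
cargo build --release
# Start it with a small per-query balance.
./target/release/vault-light-client start --balance 50000
# Point the worker's Solana client at the local JSON-RPC endpoint.
export SOLANA_RPC_URL=http://localhost:8899
# Smoke test. Connection is a class, so it must be constructed with `new`.
node -e 'const { Connection } = require("@solana/web3.js"); new Connection(process.env.SOLANA_RPC_URL).getSlot().then(console.log)'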

Latency on getSlot was 23ms p50 from us-east-1. On the existing hosted provider it was 18ms. Slightly slower at the tip, which was expected — the light client routes through whichever public gateway the DHT selects. We made a note and moved on.

Day 2: shadow-running historical reads

Because the indexer’s read mix is heavily historical, we set up a shadow worker that replayed the previous day’s read traffic against both endpoints and diffed the responses byte-for-byte.

Out of 4.2 million shadow requests over 24 hours, we found:

0 byte-level disagreements on getTransaction.
0 byte-level disagreements on getConfirmedBlock.
3 ordering differences on getSignaturesForAddress when the before cursor pointed exactly at a slot boundary — both providers were within spec, just paginating slightly differently.

The byte-identical getTransaction and getConfirmedBlock reads were the proof we needed: the compression layer is lossless, and the decompressed payload is bit-equal to what the hosted provider returns.
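
The comparison step of a shadow harness like this can be quite small. Below is a rough sketch, assuming the harness has already captured the raw response bodies from both endpoints for the same request; the `classify` function and its three labels are hypothetical names chosen for illustration, not the harness we actually ran.

```python
import json


def classify(primary: bytes, shadow: bytes) -> str:
    """Compare two raw JSON-RPC response bodies for the same request.

    Returns "identical" for byte-equal bodies, "ordering" when both results
    are lists holding the same items in a different order, else "mismatch".
    """
    if primary == shadow:
        return "identical"
    try:
        a = json.loads(primary)["result"]
        b = json.loads(shadow)["result"]
    except (ValueError, KeyError, TypeError):
        # Unparseable body or missing "result" field: treat as a real difference.
        return "mismatch"
    if isinstance(a, list) and isinstance(b, list):
        # Canonical JSON per item so dict key order does not affect the comparison.
        def canon(item):
            return json.dumps(item, sort_keys=True)

        if sorted(map(canon, a)) == sorted(map(canon, b)):
            return "ordering"
    return "mismatch"
```

Separating "ordering" from "mismatch" is what lets pagination differences like the three above be triaged without drowning out genuine payload disagreements.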

Day 3-4: latency tuning

The light client’s selected gateway was 5-15ms slower than the hosted provider on cold reads and 2-3ms slower on warm reads. For an indexer running batched fanout, the cumulative difference would have shown up in our SLA.

Three things fixed it:

We biased the gateway selector toward operators in us-east-2 rather than the round-robin default. -7ms p50.
We enabled HTTP keep-alive on the worker’s Solana client. Most clients do this by default; ours had been forced off years ago for unrelated reasons. -5ms p50.
We enabled the light client’s prefetch hint for sequential getSignaturesForAddress pagination. -3ms p50 on the second page and beyond.

After these tweaks the SolanaVault path was within 1-2ms of the hosted provider on all four hot methods. On getConfirmedBlock it was actually faster, which surprised us until we remembered that the compressed wire payload is a fifteenth the size and decompression runs in 13-85µs per block.

Day 5: cost projection

With the latency story acceptable, we ran the cost math. The hosted provider invoice was about USD 4,200/month for the indexer’s load. The SolanaVault path charges only per-query gateway fees set by operators in SOL, with no archive surcharge and no flat platform tier.

At 18M queries/day, the indexer’s run rate was 540M queries/month. Even at the upper end of typical gateway pricing, the projected monthly cost landed in the low triple digits — a >90% reduction versus the hosted invoice. The compression-driven unit economics are simply different: a gateway serving 15-25x smaller blocks from its hot working set has fundamentally lower marginal cost per query.
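
The arithmetic behind that projection is short enough to write down. This is a sketch under stated assumptions: a 30-day month (which is what yields 540M), and a per-query fee expressed in USD after converting from SOL. The fee itself is a free parameter here because actual gateway prices vary by operator; no specific fee is implied.

```python
QUERIES_PER_DAY = 18_000_000   # from the baseline measurement
DAYS_PER_MONTH = 30            # assumption behind the 540M/month figure
HOSTED_USD_PER_MONTH = 4_200   # hosted provider invoice


def monthly_queries(per_day=QUERIES_PER_DAY, days=DAYS_PER_MONTH):
    return per_day * days


def monthly_cost_usd(fee_usd_per_query):
    """Projected monthly spend for a given (hypothetical) per-query fee in USD."""
    return fee_usd_per_query * monthly_queries()


def reduction(fee_usd_per_query, hosted=HOSTED_USD_PER_MONTH):
    """Fractional saving versus the hosted invoice."""
    return 1 - monthly_cost_usd(fee_usd_per_query) / hosted


def break_even_fee_usd(target_reduction, hosted=HOSTED_USD_PER_MONTH):
    """Highest per-query fee (USD) that still achieves the target reduction."""
    return hosted * (1 - target_reduction) / monthly_queries()


if __name__ == "__main__":
    print(f"{monthly_queries():,} queries/month")
    print(f"fee ceiling for a 90% reduction: {break_even_fee_usd(0.90):.2e} USD/query")
```

Solving for the fee ceiling rather than plugging in a single price makes the projection robust to both operator pricing and SOL/USD movement.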

Day 6: the surprise

The surprise was operational. We turned on the light client’s tracing output (it ships with tracing-subscriber and structured JSON logging out of the box) and immediately noticed that one of our worker pools was making redundant getSignaturesForAddress calls on the same address — a bug in our pagination logic that the hosted provider’s billing structure had been silently masking because the redundant calls fell into a flat-rate bucket.

Fixing the bug cut another 18% off our query volume, so the migration more than paid for the pagination rewrite.
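
A check for this kind of redundancy can be run over any structured request log. The sketch below assumes a hypothetical log schema in which each entry is a dict with "method" and "params" keys; that shape is an illustration and is not the light client's actual tracing format.

```python
import json
from collections import Counter


def redundant_calls(entries, method="getSignaturesForAddress"):
    """Count repeated calls with identical params for one RPC method.

    entries: iterable of dicts with "method" and "params" keys (assumed schema).
    Returns (redundant, total), where redundant counts every call beyond the
    first for each distinct params value.
    """
    seen = Counter()
    for entry in entries:
        if entry.get("method") != method:
            continue
        # Canonical JSON so equal params compare equal regardless of key order.
        seen[json.dumps(entry.get("params"), sort_keys=True)] += 1
    total = sum(seen.values())
    redundant = sum(n - 1 for n in seen.values())
    return redundant, total
```

Repeated calls are not always bugs, since polling the newest page of an address legitimately repeats, so it is worth restricting the input to a short time window or to calls that carry a before cursor.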

Day 7-8: production cutover

We did a percentage-based traffic shift: 5%, 25%, 50%, 100% over four hours. The 5% canary ran for two hours before we promoted. The full cutover took 11 minutes once we committed.

We left the hosted provider as a hot standby for one week with a circuit breaker that would shift back on a 5% error rate spike. The breaker never tripped.
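
The two mechanisms in this step, a percentage-based split and an error-rate breaker, can be sketched as follows. This is an illustrative outline rather than our production router: the hashing scheme, the window size, and the minimum sample count are all assumptions, and only the 5% threshold comes from the setup described above.

```python
import hashlib
from collections import deque


def routes_to_new(request_key: str, percent: int) -> bool:
    """Deterministically send `percent`% of request keys to the new endpoint.

    A key that is routed at 5% stays routed at 25%, 50%, and 100%, so each
    promotion only adds traffic to the new path.
    """
    digest = hashlib.sha256(request_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


class ErrorRateBreaker:
    """Trips when the error rate over the last `window` calls exceeds `threshold`."""

    def __init__(self, threshold=0.05, window=1000, min_calls=100):
        self.threshold = threshold
        self.min_calls = min_calls          # avoid tripping on a tiny sample
        self.results = deque(maxlen=window)
        self.tripped = False

    def record(self, ok: bool) -> None:
        self.results.append(ok)
        if len(self.results) >= self.min_calls:
            errors = self.results.count(False)
            if errors / len(self.results) > self.threshold:
                self.tripped = True         # stays tripped until reset by an operator


def pick_endpoint(request_key, percent, breaker, new_url, old_url):
    """Route to the old provider whenever the breaker has tripped."""
    if breaker.tripped or not routes_to_new(request_key, percent):
        return old_url
    return new_url
```

Keeping the breaker latched after it trips is a deliberate choice in this sketch: shifting back automatically is safe, but shifting forward again is a decision a person should make.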

Day 9: cleanup

We:

Deleted the hosted provider’s API key.
Updated the runbook with the new endpoint URL.
Added a vault_* metric prefix to our Prometheus scrape so the new gateway shows up alongside the old hosted-provider metrics in our dashboards.
Wrote this post.

Day 10: the postmortem

The honest postmortem: the migration was significantly less work than we expected because the RPC interface is genuinely identical. The places that took time were operational — measuring the baseline properly, building the shadow harness, doing a proper canary — not protocol-level.

If you are contemplating a similar move, here is the order we would do it again:

Measure the baseline for a full week before you change anything. Capture the method distribution, the latency percentiles, and the exact line items on the invoice.
Build the SolanaVault workspace and run vault-light-client start --balance <n> against the live network.
Stand up a shadow worker that diffs responses byte-for-byte for at least 24 hours of production traffic.
Tune latency: regional gateway, keep-alive, sequential prefetch hints.
Run the cost projection.
Canary 5% for at least an hour, then promote in stages.
Keep the old provider as a hot standby with an automated breaker for at least a week.

The result for us was a ~92% cost reduction and a slightly better p99 on the historical hot path. The result for our partner was a runbook they actually understood, owned by a smaller vendor that returns Slack messages within an hour. Both of those things matter.

Try SolanaVault on your workload

Clone the workspace and reproduce the numbers in this post with vault-cli compress-demo against your own slot range.
