Migrating a Solana indexer off hosted RPC: a 10-day field report
A practical, opinionated guide to moving a real production Solana indexer from a hosted RPC provider to SolanaVault, with the timeline and the surprises.
We migrated an indexer this month. It processes about 18 million RPC calls a day against a hosted Solana provider, runs four worker pools, and feeds a Postgres + Redis hot path. The migration to SolanaVault took ten working days end to end, with one weekend in the middle that we did not work. This post is what we would tell the next team to do.
The indexer in question is for a partner who asked not to be named. The numbers are real. The workflow is generalizable.
Before changing anything, we collected three things:
getSignaturesForAddress, 24% getTransaction, 9% getConfirmedBlock, and 5% miscellaneous.
This baseline matters. Without it you cannot honestly evaluate whether the migration was worth it. We strongly recommend any team contemplating a similar change spend at least a day on this step.
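A minimal sketch of how a per-method read mix like the one above could be tallied. The input shape (a flat array of JSON-RPC method names, one per logged request) and the function name `methodMix` are assumptions for illustration, not the tooling used in this migration.

```javascript
// Tally the share of each JSON-RPC method in a request log.
// `methods` is assumed to be an array of method-name strings,
// one entry per request observed during the baseline window.
function methodMix(methods) {
  const counts = new Map();
  for (const m of methods) {
    counts.set(m, (counts.get(m) || 0) + 1);
  }
  const total = methods.length;
  // Return rows sorted by descending call count, with a fractional share.
  return [...counts.entries()]
    .map(([method, count]) => ({
      method,
      count,
      share: total ? count / total : 0,
    }))
    .sort((a, b) => b.count - a.count);
}
```

Feeding it a day of logged method names yields the shares to record before touching anything else.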
Builder is free up to 5M queries a month. We pointed a non-production worker at it and called the four most common methods to confirm parity.
export SOLANA_RPC_URL=https://vault-builder.cryptuon.com/<api-key>
node -e 'const { Connection } = require("@solana/web3.js"); new Connection(process.env.SOLANA_RPC_URL).getSlot().then(console.log)'
Latency on getSlot was 23ms p50 from us-east-1. On the existing hosted provider it was 18ms. Slightly slower at the tip, which was expected — Builder routes through shared gateways. We made a note and moved on.
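For comparing p50 figures like these, a small percentile helper is enough. This sketch uses the nearest-rank definition, which is an assumption; the post does not say how its percentiles were computed, and the sample values below are made up.

```javascript
// Nearest-rank percentile over an array of latency samples in milliseconds.
// p is a number in (0, 100]; throws on an empty sample set.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative (made-up) samples, not measurements from this migration.
const exampleSamples = [18, 23, 20, 30, 19];
const p50 = percentile(exampleSamples, 50);
```

Collect samples from the same region and at the same time of day for both endpoints, otherwise the comparison mostly measures network noise.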
Because the indexer’s read mix is heavily historical, we set up a shadow worker that replayed the previous day’s read traffic against both endpoints and diffed the responses byte-for-byte.
Out of 4.2 million shadow requests over 24 hours, we found:
getTransaction.
getConfirmedBlock.
getSignaturesForAddress when the before cursor pointed exactly at a slot boundary; both providers were within spec, just paginating slightly differently.
The byte-identical historical reads were the proof we needed. The compression layer is lossless. The decompressed payload is bit-equal to what the hosted provider returns.
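The core of a shadow harness like the one described above can be sketched in a few lines. `callA` and `callB` are hypothetical async functions that send one recorded request to each endpoint and resolve to the raw response body. The sketch also assumes both replays use the same JSON-RPC request id, since a differing id would break a byte-level comparison.

```javascript
// Compare two raw response bodies (string or Buffer) byte-for-byte.
function sameBytes(a, b) {
  return Buffer.compare(Buffer.from(a), Buffer.from(b)) === 0;
}

// Replay each recorded request against both endpoints and collect mismatches.
// `requests` is assumed to be an array of { method, params } objects.
async function shadowDiff(requests, callA, callB) {
  const mismatches = [];
  for (const req of requests) {
    // Issue both calls concurrently so the two reads are close in time.
    const [a, b] = await Promise.all([callA(req), callB(req)]);
    if (!sameBytes(a, b)) {
      mismatches.push({ method: req.method, params: req.params });
    }
  }
  return { total: requests.length, mismatches };
}
```

Grouping the returned mismatches by method gives the kind of per-method breakdown reported above.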
The Builder gateway was 5-15ms slower than the hosted provider on cold reads and 2-3ms slower on warm reads. For an indexer running batched fanout, the cumulative difference would have shown up in our SLA.
Three things fixed it:
getSignaturesForAddress pagination. -3ms p50 on the second page and beyond.
After these tweaks the Builder gateway was within 1-2ms of the hosted provider on all four hot methods. On getConfirmedBlock it was actually faster, which surprised us until we remembered that the compressed wire payload is a fifteenth the size.
With the latency story acceptable, we ran the cost math. The hosted provider invoice was about USD 4,200/month for the indexer’s load. Builder maxes out at USD 49/month + USD 0.50 per additional million queries above 5M.
At 18M queries/day, the indexer’s run rate was 540M queries/month. On Builder the projected cost was:
A 92% reduction. We double-checked with our account manager at the hosted provider to confirm we were on the appropriate tier — we were. The compression-driven unit economics are simply different.
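The arithmetic can be checked with a short sketch. It assumes a 30-day month, that the USD 49 base fee covers the first 5M queries, and that overage is billed pro rata per million; those billing details are assumptions, not confirmed pricing terms.

```javascript
// Pricing inputs taken from the figures quoted in this post.
const BASE_FEE_USD = 49;
const INCLUDED_QUERIES = 5_000_000;
const OVERAGE_USD_PER_MILLION = 0.5;
const HOSTED_INVOICE_USD = 4200;

// Assumption: overage is prorated per million queries above the included 5M.
function builderMonthlyCost(queriesPerMonth) {
  const overage = Math.max(0, queriesPerMonth - INCLUDED_QUERIES);
  return BASE_FEE_USD + (overage / 1_000_000) * OVERAGE_USD_PER_MILLION;
}

const queriesPerMonth = 18_000_000 * 30; // 540M, assuming a 30-day month
const projectedUsd = builderMonthlyCost(queriesPerMonth); // 49 + 535 * 0.50
const reduction = 1 - projectedUsd / HOSTED_INVOICE_USD;

console.log(projectedUsd, (reduction * 100).toFixed(1) + "%");
```

Under these assumptions the projection comes to USD 316.50 a month, which is consistent with the roughly 92% reduction quoted above.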
The surprise was operational. We turned on Vault Cloud’s structured logging and immediately noticed that one of our worker pools was making redundant getSignaturesForAddress calls on the same address — a bug in our pagination logic that the hosted provider’s billing structure had been silently masking because the redundant calls fell into a flat-rate bucket.
Fixing the bug cut another 18% off our query volume. The migration paid for the rewrite of the pagination code by itself.
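One way to surface this kind of redundancy from structured logs is to group calls by address and pagination cursor. The log entry shape here (`method`, `address`, `before`) is hypothetical; the post does not describe the actual log schema.

```javascript
// Find getSignaturesForAddress calls repeated with the same address and cursor.
// Each entry is assumed to look like:
//   { method: "getSignaturesForAddress", address: "<pubkey>", before: "<sig>" }
// where `before` may be undefined for the first page.
function findRedundantCalls(entries) {
  const seen = new Map();
  for (const e of entries) {
    if (e.method !== "getSignaturesForAddress") continue;
    const key = `${e.address}|${e.before ?? ""}`;
    seen.set(key, (seen.get(key) || 0) + 1);
  }
  // Keep only keys that were requested more than once.
  return [...seen.entries()]
    .filter(([, calls]) => calls > 1)
    .map(([key, calls]) => ({ key, calls, redundant: calls - 1 }));
}
```

Repeated first-page calls can be legitimate polling for new signatures, so in practice the grouping would probably need a time window as well.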
We did a percentage-based traffic shift: 5%, 25%, 50%, 100% over four hours. The 5% canary ran for two hours before we promoted. The full cutover took 11 minutes once we committed.
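A percentage-based shift can be as simple as a weighted coin flip per request. This is a sketch of the idea only: the function name, the `"new"` and `"old"` labels, and per-request random routing are assumptions, and the actual routing mechanism used is not described in this post.

```javascript
// Rollout stages from the post, in percent of traffic sent to the new gateway.
const STAGES = [5, 25, 50, 100];

// Route one request: returns "new" for roughly `percentToNew` percent of calls.
// `rand` is injectable so the routing decision can be tested deterministically.
function pickEndpoint(percentToNew, rand = Math.random) {
  return rand() * 100 < percentToNew ? "new" : "old";
}
```

Random per-request routing spreads load evenly; if a worker must stick to one endpoint mid-pagination, routing on a hash of the worker or address would be the safer choice.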
We left the hosted provider as a hot standby for one week with a circuit breaker that would shift back on a 5% error rate spike. The breaker never tripped.
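A rolling-window breaker of this kind might look like the sketch below. It reads "5% error rate spike" as the error rate reaching 5% over the most recent requests; the window size, the minimum sample count, and the latching behaviour are all assumptions.

```javascript
// Latching circuit breaker driven by the error rate over a rolling window.
class ErrorRateBreaker {
  constructor({ threshold = 0.05, windowSize = 1000, minSamples = 100 } = {}) {
    this.threshold = threshold;   // trip when the error rate reaches this value
    this.windowSize = windowSize; // number of most recent results to keep
    this.minSamples = minSamples; // do not trip on too few observations
    this.results = [];            // true = error, false = success
    this.errors = 0;
    this.tripped = false;
  }

  // Record one request outcome; returns true once the breaker has tripped.
  record(isError) {
    this.results.push(isError);
    if (isError) this.errors++;
    if (this.results.length > this.windowSize && this.results.shift()) {
      this.errors--;
    }
    const n = this.results.length;
    if (n >= this.minSamples && this.errors / n >= this.threshold) {
      this.tripped = true; // stays tripped until an operator resets it
    }
    return this.tripped;
  }

  // Which endpoint traffic should go to right now.
  target() {
    return this.tripped ? "standby" : "primary";
  }
}
```

Latching is deliberate in this sketch: once traffic has moved back to the standby, a human decides when to retry the new gateway.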
We:
vault_* metric prefix to our Prometheus scrape so the new gateway shows up alongside the old hosted-provider metrics in our dashboards.
The honest postmortem: the migration was significantly less work than we expected because the RPC interface is genuinely identical. The places that took time were operational (measuring the baseline properly, building the shadow harness, doing a proper canary), not protocol-level.
If you are contemplating a similar move, here is the order in which we would do it again:
The result for us was a ~92% cost reduction and a slightly better p99 on the historical hot path. The result for our partner was a runbook they actually understood, owned by a smaller vendor that returns Slack messages within an hour. Both of those things matter.
Spin up a managed gateway or clone the repo. The numbers in this post are reproducible — bring your own slot range.