Byzantine consensus on block reads: how SolanaVault catches lying storage nodes
A practical explainer of the consensus and reputation system that protects SolanaVault clients from malicious or buggy storage nodes.
Decentralization is easy until somebody lies. Once you stop trusting a single operator to be honest, every reply needs to be checkable. SolanaVault does this with a layered Byzantine consensus protocol that runs on every read, plus a reputation system that makes lying expensive over time. This post walks through how that machinery actually works and what it costs.
A SolanaVault gateway routes a read request to one or more storage nodes that hold the slot in question. The threat model assumes:
The goal is to ensure that a light client always receives the canonical block bytes, or a hard error — never silently bad data.
For each read, the gateway requests the block from k replicas chosen by reputation and locality. Each replica returns the compressed payload plus a signed attestation containing the SHA-256 of the decompressed bytes. The gateway accepts the response only if at least q of the k replicas agree on the same hash, where q is the configured Byzantine quorum (typically ⌊2k/3⌋ + 1, which gives 4 for k = 5).
Default values: k = 5, q = 4. Configurable per gateway, per slot range.
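The quorum rule above can be sketched in a few lines. This is a minimal illustration, not the gateway's actual code: the names `default_quorum`, `QuorumResult` and `check_quorum` are hypothetical, and signature verification is assumed to have happened before the tally.

```rust
use std::collections::HashMap;

/// Default Byzantine quorum for `k` replicas: floor(2k/3) + 1.
fn default_quorum(k: usize) -> usize {
    2 * k / 3 + 1
}

/// Outcome of tallying replica attestations for one read (hypothetical type).
#[derive(Debug, PartialEq)]
enum QuorumResult {
    /// At least `q` replicas attested to this hash.
    Accepted([u8; 32]),
    /// No hash reached `q` votes; the gateway should return a hard error.
    NoQuorum,
}

/// Tally the decompressed-block hashes reported by replicas and accept a
/// hash only if at least `q` replicas agree on it. Assumes `q` is a strict
/// majority of the replica count, so at most one hash can reach quorum.
fn check_quorum(hashes: &[[u8; 32]], q: usize) -> QuorumResult {
    let mut votes: HashMap<[u8; 32], usize> = HashMap::new();
    for h in hashes {
        *votes.entry(*h).or_insert(0) += 1;
    }
    match votes.into_iter().max_by_key(|&(_, n)| n) {
        Some((hash, n)) if n >= q => QuorumResult::Accepted(hash),
        _ => QuorumResult::NoQuorum,
    }
}
```

With the defaults, four matching hashes out of five are accepted, while a three-to-two split yields `NoQuorum` and the client sees a hard error rather than unverified bytes.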
Solana blocks have an externally derivable canonical hash through the leader’s signed Vote at the next slot. But for historical reads we do not always have cheap access to those votes, and verifying them inline would exceed the latency budget. We rely instead on inter-replica agreement: if four of five independently chosen replicas, none of which can see each other’s answers in advance, all return the same hash, the probability that they are colluding on a bad block is bounded, approximately, by the square of the fraction of malicious nodes in the network.
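To put a rough number on that bound, the sketch below computes the chance that at least q of k replicas are malicious. It makes an idealized assumption that the real selector does not satisfy exactly: each replica is independently malicious with probability f, ignoring reputation and locality weighting. The function names are illustrative only.

```rust
/// Binomial coefficient C(n, r) as f64 (fine for the small k used here).
fn choose(n: u32, r: u32) -> f64 {
    (0..r).fold(1.0, |acc, i| acc * f64::from(n - i) / f64::from(i + 1))
}

/// Probability that at least `q` of `k` replicas are malicious when each
/// replica is independently malicious with probability `f`.
/// ASSUMPTION: independent, uniform selection (an idealization).
fn bad_quorum_probability(k: u32, q: u32, f: f64) -> f64 {
    (q..=k)
        .map(|m| choose(k, m) * f.powi(m as i32) * (1.0 - f).powi((k - m) as i32))
        .sum()
}
```

For k = 5, q = 4 and f = 0.1 this evaluates to about 0.00046, comfortably under the f² = 0.01 figure quoted above.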
We periodically run a slower “deep verification” job that picks random slots, fetches the leader’s signed vote, and audits whether the network’s quorum hash matches the on-chain vote hash. Discrepancies trigger reputation slashing for the implicated replicas and an alert to gateway operators.
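The audit step of deep verification might look like the sketch below. It assumes that the "implicated replicas" are the ones that attested to the mismatching quorum hash; the `Attestation` type and `audit_slot` function are hypothetical, and node IDs are plain integers for brevity where the real system uses `PeerId`.

```rust
/// One replica's attestation recorded for a quorum read (hypothetical type).
/// `node_id` is a plain u64 for illustration only.
struct Attestation {
    node_id: u64,
    decompressed_hash: [u8; 32],
}

/// Compare the hash a quorum accepted with the hash derived from the
/// on-chain vote. Returns the nodes that attested to the mismatching quorum
/// hash; the result is empty when the audit passes.
fn audit_slot(
    quorum_hash: [u8; 32],
    on_chain_hash: [u8; 32],
    attestations: &[Attestation],
) -> Vec<u64> {
    if quorum_hash == on_chain_hash {
        return Vec::new();
    }
    attestations
        .iter()
        .filter(|a| a.decompressed_hash == quorum_hash)
        .map(|a| a.node_id)
        .collect()
}
```

A non-empty result would feed both the slashing path and the operator alert.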
A storage node’s reply is small and structured:
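use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct ReplicaResponse {
    compressed_block: Vec<u8>,   // compressed block payload
    decompressed_hash: [u8; 32], // SHA-256 of the decompressed bytes
    storage_node_id: PeerId,     // identity of the replying storage node
    signature: Ed25519Signature, // over (slot, decompressed_hash)
    timestamp_ns: u64,           // checked against the gateway's local clock
}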
The signature ties the storage node’s identity to its claim about the block. Replays are bounded by the timestamp and the gateway’s local clock.
A single missed quorum is a soft fault — replicas time out, lag, or have transient bugs. Repeated disagreement is a hard fault. The reputation system tracks both.
The state per node looks roughly like this:
struct ReputationRecord {
    node_id: PeerId,
    successful_quorums: u64,
    soft_faults: u64,
    hard_faults: u64,
    stake_locked: u64,
    last_updated_slot: u64,
}
Hard faults reduce the node’s effective reputation by a multiplicative factor. Below a threshold, the node stops being picked by the gateway’s reputation-weighted selector. Far below it, the node’s locked stake is partially redistributed to the replicas that delivered the correct answer in the offending quorum.
The economic point: a storage node earns small per-query revenue. A single confirmed hard fault wipes out roughly a thousand queries’ worth of earnings. Sustained lying is not profitable.
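The hard-fault handling described above could be modeled roughly as follows. All constants except the 5% slashing cap (which matches the `max_slashing_per_fault` default shown in the gateway config) are made-up placeholders, since the post does not publish the real penalty factor or thresholds. `NodeState` is a simplified stand-in for `ReputationRecord`.

```rust
// Hypothetical tuning constants, chosen only to make the sketch concrete.
const HARD_FAULT_FACTOR: f64 = 0.5; // multiplicative penalty per hard fault
const SELECTION_THRESHOLD: f64 = 0.2; // below this, the selector skips the node
const SLASHING_THRESHOLD: f64 = 0.05; // below this, stake is partially slashed
const MAX_SLASHING_PER_FAULT: f64 = 0.05; // fraction of locked stake

/// Simplified per-node state (illustrative stand-in for ReputationRecord).
struct NodeState {
    reputation: f64, // effective reputation in [0, 1]
    stake_locked: u64,
}

/// Apply one confirmed hard fault. Returns the amount of stake slashed,
/// which is 0 unless reputation has fallen below the slashing threshold.
/// The caller would redistribute the returned amount to honest replicas.
fn apply_hard_fault(node: &mut NodeState) -> u64 {
    node.reputation *= HARD_FAULT_FACTOR;
    if node.reputation >= SLASHING_THRESHOLD {
        return 0;
    }
    let slashed = (node.stake_locked as f64 * MAX_SLASHING_PER_FAULT) as u64;
    node.stake_locked -= slashed;
    slashed
}

/// Whether the reputation-weighted selector may still pick this node.
fn is_selectable(node: &NodeState) -> bool {
    node.reputation >= SELECTION_THRESHOLD
}
```

With these placeholder values, a node starting at reputation 1.0 drops out of selection on its third hard fault and starts losing stake on its fifth.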
Running this protocol on every read is not free. A naive implementation would double or triple p99 latency, because the gateway waits for q of k replicas before responding.
We hide most of the cost with three optimizations:
k replicas from the same geographic cluster as the gateway when possible. NNG transport over a fast intra-region link makes the worst-case quorum wait a single-digit millisecond delta over the single-replica path.
In production we observe quorum-protected reads adding 2-7 ms over single-replica reads at the p99, and effectively zero at the p50 because the speculative path almost always succeeds.
The protocol catches bad data. It does not catch unavailability. If k - q + 1 replicas refuse to serve a slot, the gateway returns a hard error — which is the correct behavior — but the slot is unreadable until storage capacity rebalances. The reputation system handles this asymmetrically: replicas that are honest but slow lose less reputation than replicas that are fast but wrong.
It also does not protect against a global eclipse: an adversary that controls the DHT routing for a specific gateway can in principle direct all k replica picks to colluding nodes. We mitigate this with DHT path verification and random replica injection (one of the k replicas is always chosen uniformly at random from the global pool, ignoring locality). The eclipse attack remains the most theoretically interesting weakness, and we are upfront about it.
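Random replica injection can be sketched as below. This is an illustration under assumptions, not the production selector: `select_replicas` is a hypothetical name, replicas are identified by their index in the global pool, `ranked` is assumed to contain no duplicates, and the random source is passed in so the caller can supply a cryptographically secure generator.

```rust
/// Pick `k` replica indices: one drawn uniformly from the global pool
/// (ignoring locality), the rest taken from a reputation/locality-ranked
/// candidate list, best first.
///
/// `rand_below(n)` must return a uniform value in `0..n`.
/// Returns `None` if `k` distinct replicas cannot be assembled.
fn select_replicas(
    ranked: &[usize],
    global_pool_size: usize,
    k: usize,
    mut rand_below: impl FnMut(usize) -> usize,
) -> Option<Vec<usize>> {
    if k == 0 || global_pool_size < k {
        return None;
    }
    // Draw the random replica first so the ranked list cannot influence it.
    let random_pick = rand_below(global_pool_size);
    let mut chosen = vec![random_pick];
    for &idx in ranked {
        if chosen.len() == k {
            break;
        }
        // Skip the node already chosen at random, and any invalid index.
        if idx != random_pick && idx < global_pool_size {
            chosen.push(idx);
        }
    }
    if chosen.len() == k {
        Some(chosen)
    } else {
        None
    }
}
```

Because the random slot is filled before the ranked list is consulted, an adversary who controls the ranking still cannot choose all k replicas.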
If you run a gateway, the consensus parameters are exposed in your config:
[consensus]
k = 5
q = 4
random_replica_count = 1
deep_verification_sample_rate = 0.001
soft_fault_decay_halflife_slots = 432000 # ~2 days
[reputation]
slashing_enabled = true
max_slashing_per_fault = 0.05 # fraction of locked stake
The defaults are reasonable for most operators. Geyser-style real-time workloads may want lower k for latency. Compliance-grade workloads may want higher q and explicit deep verification on every read.
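If you change these parameters, a few sanity checks are worth running before the gateway starts. The sketch below is one possible set of checks, not the gateway's actual validation logic: `ConsensusConfig` and `validate` are hypothetical, with fields that mirror the config keys above.

```rust
/// In-memory mirror of the config keys shown above (hypothetical type).
#[derive(Clone, Copy)]
struct ConsensusConfig {
    k: usize,
    q: usize,
    random_replica_count: usize,
    deep_verification_sample_rate: f64,
    max_slashing_per_fault: f64,
}

/// Reject parameter combinations that cannot work.
fn validate(cfg: &ConsensusConfig) -> Result<(), String> {
    if cfg.k == 0 {
        return Err("k must be at least 1".into());
    }
    if cfg.q > cfg.k {
        return Err("q cannot exceed k".into());
    }
    // With 2q <= k, two conflicting hashes could both reach quorum.
    if 2 * cfg.q <= cfg.k {
        return Err("q must be a strict majority of k".into());
    }
    if cfg.random_replica_count > cfg.k {
        return Err("random_replica_count cannot exceed k".into());
    }
    if !(0.0..=1.0).contains(&cfg.deep_verification_sample_rate) {
        return Err("deep_verification_sample_rate must be in [0, 1]".into());
    }
    if !(0.0..=1.0).contains(&cfg.max_slashing_per_fault) {
        return Err("max_slashing_per_fault must be in [0, 1]".into());
    }
    Ok(())
}
```

The strict-majority check is the one that matters most when lowering q for latency: once q drops to half of k or below, agreement no longer implies a unique answer.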
Decentralization that does not verify is just delegation with extra steps. The Byzantine consensus layer is what lets us claim that SolanaVault is genuinely peer-to-peer without asking customers to trust any single operator. It costs us a few milliseconds at p99. We think that is a fair trade for never having to apologize for serving bad data.
Spin up a managed gateway or clone the repo. The numbers in this post are reproducible — bring your own slot range.