
Inside the SolanaVault compression pipeline: how we hit 15-25:1 on mainnet blocks

A deep technical walkthrough of the multi-stage compression strategy that ships in vault-core, including the v3 production pipeline and the XGBoost-tuned variant.

2026-03-22 · SolanaVault Team

compression · rust · internals

When people see the README’s “15-25:1 compression ratio” claim, the first reaction is usually polite skepticism. Solana blocks are dense binary payloads (transaction signatures, account writes, compute budget metadata), and the data is already structured. At first glance there is no obvious redundancy to vacuum out and no compressible English text. So how do we do it?

This post walks through the four stages of the SolanaVault compression pipeline as it ships in crates/vault-core/src/compression/. We will be concrete: the strategies are named, the trade-offs are explained, and the benchmark numbers are reproducible with vault-cli compress-demo.

The shape of a Solana block

The first move is recognizing what Solana blocks actually look like in the wild. A typical mainnet block around slot 245000000:

Holds 800 to 1,500 transactions.
Repeats the same account public keys across hundreds of those transactions.
Repeats the same program IDs (Token, System, AssociatedTokenAccount) across many of its transactions; over a day, they appear tens of thousands of times.
Carries a recent blockhash in every transaction, and the transactions in a block draw those values from a small set of recent blockhashes, so the same 32 bytes repeat many times.
Encodes compute budget instructions that are nearly identical across most transactions.

In other words, the data has enormous structural redundancy at the high level — the same keys, the same programs, the same instruction headers — even though each transaction’s data field looks random.

That is the lever we pull.

Stage 1: dictionary compaction

The first stage of the pipeline replaces every repeated account, program, and blockhash with a stable 2-byte index into a per-block dictionary. We ship the dictionary at the head of the compressed payload.

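/// Stage 1 output. Every repeated key becomes a u16 index into `dictionary`.
struct Stage1Output {
    dictionary: Vec<Pubkey>,    // 32 bytes per entry, shipped at the head of the payload
    blockhash_index: u16,       // 2 bytes in place of a 32-byte blockhash
    transactions: Vec<TransactionWithIndices>, // transactions with keys rewritten as u16 indices
}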

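This alone delivers roughly 4-6x compression because a typical block uses about 200 unique account keys to represent 30,000 to 60,000 account references. Trading a 32-byte public key for a 2-byte index wins for any key that appears at least twice: n references cost 32n bytes raw and 32 + 2n bytes with the dictionary, so a key used only once actually costs 2 extra bytes. With about 200 keys covering tens of thousands of references, the dictionary pays for itself many times over.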
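A minimal sketch of per-block dictionary compaction, assuming keys are plain `[u8; 32]` arrays (a stand-in for `Pubkey`) and that a block has already been flattened into a list of key references. `Key`, `Compacted`, `compact`, and `expand` are hypothetical names for illustration, not the vault-core API.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte Solana public key (or blockhash).
pub type Key = [u8; 32];

/// Hypothetical stage-1 result: a per-block dictionary plus one u16 per reference.
pub struct Compacted {
    pub dictionary: Vec<Key>, // unique keys, in first-seen order
    pub indices: Vec<u16>,    // 2-byte index replacing each 32-byte reference
}

/// Replace each key reference with its position in a per-block dictionary.
/// Returns `None` if the block has more unique keys than a u16 can address.
pub fn compact(references: &[Key]) -> Option<Compacted> {
    let mut positions: HashMap<Key, u16> = HashMap::new();
    let mut dictionary: Vec<Key> = Vec::new();
    let mut indices = Vec::with_capacity(references.len());
    for key in references {
        let idx = match positions.get(key) {
            Some(&i) => i,
            None => {
                if dictionary.len() > u16::MAX as usize {
                    return None; // a 2-byte index cannot address this entry
                }
                let i = dictionary.len() as u16;
                dictionary.push(*key);
                positions.insert(*key, i);
                i
            }
        };
        indices.push(idx);
    }
    Some(Compacted { dictionary, indices })
}

/// Inverse of `compact`: look each index up in the dictionary.
pub fn expand(c: &Compacted) -> Vec<Key> {
    c.indices.iter().map(|&i| c.dictionary[i as usize]).collect()
}
```

With the post's own figures (about 200 unique keys, 30,000 references), the key bytes go from 30,000 × 32 = 960,000 to 200 × 32 + 30,000 × 2 = 66,400, roughly 14x on the key portion alone. Signatures and instruction data are not touched by this stage, which is one reason the whole-block figure is lower.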

We do not use a global dictionary across blocks. The temptation is real — global dictionaries would push the ratio higher — but the verification story gets ugly. A storage node that loses the dictionary loses every block referencing it. Per-block dictionaries keep each block self-describing and Byzantine-checkable in isolation.

Stage 2: structural delta encoding

The second stage exploits intra-block similarity at the transaction level. Most transactions in a block target the same program with structurally identical instruction layouts. We delta-encode each transaction against an exemplar from its instruction class.

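/// How stage 2 stores each transaction.
enum TransactionEncoding {
    /// Stored in full; other transactions in its instruction class are deltas against it.
    Exemplar(Transaction),
    /// Field-level patches against the exemplar referenced by `exemplar_index`.
    Delta { exemplar_index: u16, patches: Vec<Patch> },
}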

The delta is field-by-field: account index lists, instruction data, compute budget instructions. For high-volume programs like Jupiter aggregator routes or Raydium swaps, the delta patches are tiny — sometimes just the swap amount and the destination account index.
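A minimal sketch of field-by-field delta encoding against an exemplar, assuming a simplified transaction with three fields (account indices, instruction data, and a compute-unit limit). `Tx`, `Patch`, `diff`, and `apply` are hypothetical; the real `Patch` type in vault-core may be structured differently. Account indices are patched per position and instruction data as one contiguous run of changed bytes, which is how a swap that differs only in its amount and destination account can shrink to two small patches.

```rust
/// Hypothetical, simplified stage-1 transaction (keys already replaced by indices).
#[derive(Clone, Debug, PartialEq)]
pub struct Tx {
    pub account_indices: Vec<u16>,
    pub data: Vec<u8>,
    pub compute_unit_limit: u32,
}

/// One field-level patch against an exemplar.
#[derive(Clone, Debug, PartialEq)]
pub enum Patch {
    AccountIndex { pos: usize, value: u16 },   // one changed slot in the account list
    AccountIndices(Vec<u16>),                  // fallback: list length differs
    DataBytes { offset: usize, bytes: Vec<u8> }, // one contiguous run of changed bytes
    Data(Vec<u8>),                             // fallback: data length differs
    ComputeUnitLimit(u32),
}

/// Emit patches only for what differs from the exemplar.
pub fn diff(ex: &Tx, tx: &Tx) -> Vec<Patch> {
    let mut patches = Vec::new();
    if ex.account_indices.len() == tx.account_indices.len() {
        for (pos, (a, b)) in ex.account_indices.iter().zip(&tx.account_indices).enumerate() {
            if a != b {
                patches.push(Patch::AccountIndex { pos, value: *b });
            }
        }
    } else {
        patches.push(Patch::AccountIndices(tx.account_indices.clone()));
    }
    if ex.data.len() == tx.data.len() {
        // Cover everything between the first and last differing byte.
        let first = (0..tx.data.len()).find(|&i| ex.data[i] != tx.data[i]);
        let last = (0..tx.data.len()).rfind(|&i| ex.data[i] != tx.data[i]);
        if let (Some(f), Some(l)) = (first, last) {
            patches.push(Patch::DataBytes { offset: f, bytes: tx.data[f..=l].to_vec() });
        }
    } else {
        patches.push(Patch::Data(tx.data.clone()));
    }
    if ex.compute_unit_limit != tx.compute_unit_limit {
        patches.push(Patch::ComputeUnitLimit(tx.compute_unit_limit));
    }
    patches
}

/// Rebuild a transaction by applying patches to a copy of the exemplar.
pub fn apply(ex: &Tx, patches: &[Patch]) -> Tx {
    let mut tx = ex.clone();
    for p in patches {
        match p {
            Patch::AccountIndex { pos, value } => tx.account_indices[*pos] = *value,
            Patch::AccountIndices(v) => tx.account_indices = v.clone(),
            Patch::DataBytes { offset, bytes } => {
                tx.data[*offset..*offset + bytes.len()].copy_from_slice(bytes)
            }
            Patch::Data(d) => tx.data = d.clone(),
            Patch::ComputeUnitLimit(n) => tx.compute_unit_limit = *n,
        }
    }
    tx
}
```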

Stage 2 adds about 2-3x on top of stage 1’s ratio, so we are now at 8-15x for typical blocks. We stop short of cross-block deltas for the same reason we avoided global dictionaries: each block must remain independently verifiable.

Stage 3: entropy coding with adaptive models

The third stage is the one most readers expect to see. We feed the stage-2 output into an adaptive entropy coder. The implementation uses range coding with context-mixing models tuned to the byte distributions we observe in real Solana blocks.
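To make “adaptive” concrete, here is a sketch of an adaptive order-2 context model, the model family the v3 variant described next is built on. It does not implement the range coder itself; it sums the ideal code length, -log2 p per byte, which a range coder driven by the same probabilities approaches closely. The Laplace (+1) smoothing and the `HashMap` of per-context counts are assumptions for illustration, not vault-core's hand-tuned model.

```rust
use std::collections::HashMap;

/// Hypothetical adaptive order-2 byte model: counts are keyed by the previous
/// two bytes and updated after every symbol, exactly as the decoder would.
pub struct Order2Model {
    counts: HashMap<(u8, u8), [u32; 256]>,
}

impl Order2Model {
    pub fn new() -> Self {
        Self { counts: HashMap::new() }
    }

    /// P(sym | ctx) with +1 smoothing, so no symbol ever has probability 0.
    pub fn prob(&self, ctx: (u8, u8), sym: u8) -> f64 {
        match self.counts.get(&ctx) {
            Some(c) => {
                let total: u64 = c.iter().map(|&n| n as u64).sum();
                (c[sym as usize] as f64 + 1.0) / (total as f64 + 256.0)
            }
            None => 1.0 / 256.0, // unseen context: uniform over all bytes
        }
    }

    pub fn update(&mut self, ctx: (u8, u8), sym: u8) {
        self.counts.entry(ctx).or_insert([0; 256])[sym as usize] += 1;
    }
}

/// Ideal code length in bits: the sum of -log2 p over the input.
pub fn code_length_bits(data: &[u8]) -> f64 {
    let mut model = Order2Model::new();
    let mut ctx = (0u8, 0u8);
    let mut bits = 0.0;
    for &b in data {
        bits -= model.prob(ctx, b).log2(); // cost of coding b before learning from it
        model.update(ctx, b);
        ctx = (ctx.1, b); // slide the two-byte context window
    }
    bits
}
```

Because the model learns as it goes, repeated structure (like identical instruction headers) becomes cheap after its first few occurrences, while bytes that look random still cost about 8 bits each.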

The interesting choice here is the model. We ship two variants:

v3 uses a hand-tuned order-2 context model that performs well on the median block.
production_v3 adds a small ensemble — three context models combined with hard-coded weights derived from offline analysis of one million mainnet blocks.
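The production_v3 ensemble combines three context models with fixed weights. One simple way to do that is a linear mixture, p = Σ w_i · p_i; this sketch assumes that form and uses made-up weights (0.5, 0.3, 0.2), since the post does not publish the real weights or say whether mixing is linear or logistic. `WEIGHTS`, `mix`, and `bits` are hypothetical.

```rust
/// Hypothetical hard-coded weights for three component models (they sum to 1).
pub const WEIGHTS: [f64; 3] = [0.5, 0.3, 0.2];

/// Linear mixture of the three models' probabilities for the next symbol.
pub fn mix(probs: [f64; 3], weights: [f64; 3]) -> f64 {
    probs.iter().zip(weights.iter()).map(|(p, w)| p * w).sum()
}

/// Cost in bits of coding a symbol that was predicted with probability p.
pub fn bits(p: f64) -> f64 {
    -p.log2()
}
```

A useful property of fixed-weight linear mixing: since p ≥ w_i · p_i, the mixture never costs more than -log2 w_i extra bits over component i on any symbol, so a bad model in the ensemble cannot blow up the output.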

production_v3 typically adds another 1.5-2x on top of stage 2, putting us at 12-25x depending on block content. This is the strategy the README quotes.

Stage 3.5: the XGBoost-tuned variant

For workloads with a known traffic profile — for example, an indexer that mostly reads Jupiter swap blocks — we ship optimized_xgboost.rs. This variant uses an XGBoost model trained on labeled blocks to predict which of several entropy-coding modes will perform best for a given payload, then commits to that mode.

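// Predict the best entropy-coding mode from block metadata, then commit to it.
let mode = xgboost_model.predict_mode(&block.metadata);
// Exactly one arm runs, so `block` is moved into a single stage-3 encoder.
let compressed = match mode {
    Mode::HighDelta => stage3_high_delta(block),
    Mode::SparseAccounts => stage3_sparse(block),
    Mode::DenseProgramCalls => stage3_dense(block),
};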
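The post does not say how training blocks are labeled. One plausible scheme, assumed here, is to compress each training block with every mode and label it with the mode that produced the smallest output; the model then learns to predict that label from metadata, so serving does not need to run all three encoders. `label` is a hypothetical helper; the `Mode` variants match the names in the snippet above.

```rust
/// Entropy-coding modes named in the post.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    HighDelta,
    SparseAccounts,
    DenseProgramCalls,
}

/// Hypothetical labeling step: given the compressed size each mode produced
/// for one training block, the label is the mode with the smallest output
/// (the first one wins on ties). Returns `None` for an empty input.
pub fn label(sizes: &[(Mode, usize)]) -> Option<Mode> {
    sizes.iter().min_by_key(|&&(_, len)| len).map(|&(mode, _)| mode)
}
```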

In our benchmarks the XGBoost variant adds another 5-10% over production_v3 on workloads it has seen during training. It loses a couple of percent on workloads it has not. We default to production_v3 and let operators opt into the XGBoost variant when they have a stable traffic profile.
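Whether opting in pays off depends on how much of your traffic resembles the training data. Treating the per-block ratio multiplier as a simple average (a simplification), with gain g on seen workloads and loss l on unseen ones, the break-even fraction f of in-distribution traffic solves f(1 + g) + (1 - f)(1 - l) = 1, giving f = l / (g + l). Taking g = 5% (the low end of the quoted range) and l = 2% (reading “a couple of percent” as 2%) gives f ≈ 29%. These inputs are assumptions; plug in your own benchmark numbers.

```rust
/// Expected ratio multiplier of the XGBoost variant relative to production_v3
/// when a fraction `seen` of blocks resembles its training data.
/// `gain` and `loss` are fractions (0.05 means 5%).
pub fn expected_multiplier(seen: f64, gain: f64, loss: f64) -> f64 {
    seen * (1.0 + gain) + (1.0 - seen) * (1.0 - loss)
}

/// Fraction of in-distribution traffic above which opting in pays off.
pub fn break_even(gain: f64, loss: f64) -> f64 {
    loss / (gain + loss)
}
```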

Decompression: where it matters

Compression is interesting; decompression is what users feel. Our targets were aggressive: 13-85 microseconds per block on commodity hardware. We hit them by keeping the dictionary lookups in stage 1 vectorizable, the delta patches in stage 2 cache-friendly, and the entropy coder’s table sizes small enough to fit in L2.

The biggest single optimization was switching from a generic BTreeMap<u16, Pubkey> to a packed Vec<Pubkey> indexed by position. The compiler then vectorizes the dictionary application loop and decompression latency drops by a factor of three.

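// Slower: `dictionary: BTreeMap<u16, Pubkey>`, a tree probe per access
let pubkey = dictionary.get(&index)?;

// Faster: `dictionary: Vec<Pubkey>`, a single offset into contiguous memory
// (still bounds-checked, but the surrounding loop can be vectorized)
let pubkey = &dictionary[index as usize];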
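A self-contained sketch of the two dictionary layouts, each applying a list of u16 indices to produce keys. Both versions return `None` on an out-of-range index so their outputs can be compared directly; `Key`, `apply_btree`, and `apply_vec` are hypothetical names. This illustrates the layout change only and does not reproduce the 3x measurement, which depends on vault-core's actual loop and hardware.

```rust
use std::collections::BTreeMap;

/// Stand-in for a 32-byte `Pubkey`.
pub type Key = [u8; 32];

/// BTreeMap layout: each index costs a search through the tree's nodes.
pub fn apply_btree(dict: &BTreeMap<u16, Key>, indices: &[u16]) -> Option<Vec<Key>> {
    indices.iter().map(|i| dict.get(i).copied()).collect()
}

/// Packed Vec layout: each index is a bounds-checked offset into contiguous memory.
pub fn apply_vec(dict: &[Key], indices: &[u16]) -> Option<Vec<Key>> {
    indices.iter().map(|&i| dict.get(i as usize).copied()).collect()
}
```

Because stage 1 assigns indices densely from 0, the Vec layout loses nothing over the map: every index in 0..len is occupied.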

Verifying integrity

Compression is useless if the bytes that come out are not bit-identical to what went in. We verify at three points:

After stage 1 we re-expand the dictionary and assert the transaction byte arrays match the input.
After stage 2 we apply each delta against its exemplar and assert the transaction byte arrays match.
After stage 3 we decode the entropy-coded output, hash the result, and assert that it matches the hash of the original block.
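A sketch of a hash-based round-trip check, assuming a stage is a pair of pure encode and decode functions. Run-length encoding stands in for a real stage, and std's `DefaultHasher` stands in for whatever hash vault-core uses; the post does not name it, and `DefaultHasher` is a 64-bit non-cryptographic digest, so a production check would want a cryptographic hash instead. All function names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// 64-bit digest used only to keep the sketch dependency-free.
pub fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Stand-in stage: run-length encoding as (count, byte) pairs, runs capped at 255.
pub fn rle_encode(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        let mut run = 1;
        while i + run < data.len() && data[i + run] == b && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(b);
        i += run;
    }
    out
}

pub fn rle_decode(enc: &[u8]) -> Vec<u8> {
    enc.chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Encode, decode, and compare the digest of the result with the original's.
pub fn verify_roundtrip(
    original: &[u8],
    encode: fn(&[u8]) -> Vec<u8>,
    decode: fn(&[u8]) -> Vec<u8>,
) -> bool {
    let restored = decode(&encode(original));
    digest(&restored) == digest(original)
}
```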

The Byzantine consensus layer adds a fourth check: when a storage node serves a block, the gateway re-hashes the decompressed payload and compares against an attestation from at least one independent replica. Mismatches reduce the offending node’s reputation and may trigger slashing.
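A sketch of the gateway's comparison step. The policy here (accept only when at least one independent attestation exists and every collected attestation matches the gateway's own hash of the decompressed payload) is an assumption; the post only says the gateway compares against at least one replica's attestation. Deciding which node is at fault on a mismatch, and how reputation and slashing respond, is outside this sketch. `Verdict` and `check` are hypothetical.

```rust
/// Hypothetical gateway verdict for one served block.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accept,
    NoAttestation, // cannot verify yet: no independent replica has attested
    Mismatch,      // the served payload disagrees with at least one attestation
}

/// Compare the gateway's hash of the decompressed payload with the
/// 32-byte hashes attested by independent replicas.
pub fn check(served_hash: &[u8; 32], attestations: &[[u8; 32]]) -> Verdict {
    if attestations.is_empty() {
        Verdict::NoAttestation
    } else if attestations.iter().all(|a| a == served_hash) {
        Verdict::Accept
    } else {
        Verdict::Mismatch
    }
}
```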

What we will not promise

Compression is data-dependent. If a malicious or adversarial workload produces blocks designed to defeat our models — high-entropy random transactions with unique account keys per call — we will not hit 15-25:1. We will probably hit 3-5:1, which is still useful but less dramatic. Real mainnet traffic does not look like that, but if your application’s traffic does, run the benchmark before you commit.
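Before running the full benchmark, one quick way to see whether your traffic resembles the adversarial case is to measure key reuse. This sketch computes what stage-1 dictionary compaction alone does to the key bytes: 32n raw versus 32u + 2n encoded, for n references and u unique keys. A result below 1.0 means unique-per-call keys make the dictionary a net cost. It ignores stages 2 and 3 and assumes you can extract the key references from your blocks yourself; `stage1_key_ratio` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Key-byte ratio from dictionary compaction alone: raw bytes / encoded bytes.
pub fn stage1_key_ratio(references: &[[u8; 32]]) -> f64 {
    if references.is_empty() {
        return 1.0; // nothing to compress
    }
    let unique = references.iter().collect::<HashSet<_>>().len();
    let n = references.len();
    (32 * n) as f64 / (32 * unique + 2 * n) as f64
}
```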

Try it

The full pipeline is in crates/vault-core/src/compression/. To reproduce the numbers in this post:

git clone https://github.com/cryptuon/solanavault
cd solanavault
cargo build --release
./target/release/vault-cli compress-demo --blocks 245000000:245001000 --strategy production_v3

You will see per-block ratios in your terminal and an aggregate at the end. You can also pick your own slot range. Because the pipeline is content-addressable, rerunning the same range should reproduce the same ratios.

We will publish the XGBoost training pipeline and the labeled block dataset in a follow-up post next month. Until then, production_v3 is the strategy we recommend for general workloads.
