Fast, Parallel, and Scalable Differential Graph Compression
Graph compression is essential for storing and analyzing large-scale graphs such as web crawls, social networks, and code repositories. In this paper we discuss new compression techniques for the Rust implementation of the WebGraph framework. We take inspiration from Zuckerli's extensions to the classical Boldi--Vigna (BV) differential graph compression scheme, and propose a scalable, parallelizable compressor. We also develop a hybrid Huffman coding implementation with a programmable maximum codeword length, which makes it possible to eliminate context modeling and achieve faster decoding times with similar compression effectiveness. We obtain compression ratios up to 15\% better than the standard BV compressor, while providing random-access times up to 54\% faster than the Zuckerli implementation. We combine these techniques with a new, streamlined version of Apostolico--Drovandi's $\pi$ codes, which further improve compression and increase encoding and decoding speed. We validate our approach on graphs ranging from hundreds of thousands to over a billion nodes, including a Software Heritage archive with $\approx 50$ billion nodes and $\approx 854$ billion arcs. In particular, we show that, unlike Zuckerli, we can simultaneously improve both the compression ratio and the random-access speed with respect to WebGraph.

