When archiving data, choosing the right compression algorithm can make a significant difference in both storage efficiency and processing time. With modern multi-core processors, several high-performance compression tools have emerged beyond the traditional gzip and bzip2. This post examines four popular compression algorithms (xz, zstd, bzip3, and brotli) to determine which offers the best balance of compression ratio and speed.

Disclaimer

This was a fun Saturday morning experiment and is not meant to be comprehensive. The tests are specific to my datasets, which will not be made publicly available. This exercise was primarily for my own learning and understanding of modern compression programs. Memory usage was not a consideration as I was mostly interested in maximum efficiency but was also willing to use the second-best algorithm when the time duration delta made sense. Claude Sonnet 4.5 was used for the analysis and interpretation of the benchmark results.

Methodology

All tests were conducted on a 2025 MacBook Air M4 with 10 CPU cores, 32 GB RAM, running macOS Sequoia 15.6.1. All compression programs were installed via Homebrew with the following versions:

  • xz: XZ Utils 5.8.1 (liblzma 5.8.1)
  • zstd: Zstandard CLI v1.5.7
  • bzip3: 1.5.3
  • brotli: 1.1.0
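
Installing them is a one-liner with Homebrew (assuming the default formula names, which match the tool names):

brew install xz zstd bzip3 brotli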

Each algorithm was configured for near-maximum or maximum compression:

  • xz: xz -T0 -9 (level 9, all cores)
  • zstd: zstd -T0 -19 (level 19, all cores)
  • bzip3: bzip3 -j 10 -b 511 (10 threads, 511 MiB blocks)
  • brotli: brotli -q 9 -w 24 (quality 9, 16 MiB window, i.e. 2^24 bytes)
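
For reference, the matching decompression commands look like this (archive names are illustrative):

xz -d archive.tar.xz
zstd -d archive.tar.zst
bzip3 -d archive.tar.bz3
brotli -d archive.tar.br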

Four different tar archives were tested, each representing different content types commonly found in software projects and web applications.

Test Results

Archive 1: Mostly HTML Content (388.7 MiB)

Algorithm   Compressed Size (bytes)   Ratio    Time
bzip3       232,344,538               57.01%   34.76s
xz          235,449,216               57.77%   62.79s
zstd        236,257,143               57.96%   16.68s
brotli      237,228,821               58.20%   14.88s

For HTML-heavy content, all algorithms performed similarly, with only a 2.1% difference between best and worst compression. bzip3 achieved the best compression but was 2.08x slower than zstd. The modest compression advantage (3.73 MiB) didn’t justify the speed penalty for this dataset.
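
As a quick sanity check on where these figures come from (the Ratio column is simply compressed size divided by original size), the headline numbers for this archive can be reproduced with a short awk snippet:

awk 'BEGIN {
  orig  = 388.7 * 1024 * 1024            # original archive size in bytes
  bzip3 = 232344538; zstd = 236257143    # compressed sizes from the table
  printf "bzip3 ratio:   %.2f%%\n", 100 * bzip3 / orig         # -> 57.01%
  printf "delta vs zstd: %.2f MiB\n", (zstd - bzip3) / 1024^2  # -> 3.73 MiB
  printf "speed factor:  %.2fx\n", 34.76 / 16.68               # -> 2.08x
}'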

Archive 2: Mix of HTML, Markdown, JSON, Python Code (75.16 MiB)

Algorithm   Compressed Size (bytes)   Ratio    Time
bzip3       28,648,442                36.35%   8.23s
xz          30,449,232                38.64%   21.96s
zstd        31,601,031                40.10%   12.48s
brotli      32,556,564                41.31%   4.31s

bzip3 dominated this mixed-content archive, achieving the best compression while being 2.67x faster than xz and 1.52x faster than zstd. The 10.3% compression advantage over zstd (2.82 MiB savings) combined with the speed advantage made bzip3 the clear winner.

Archive 3: Mostly JSON Content (319.72 MiB)

Algorithm   Compressed Size (bytes)   Ratio    Time
bzip3       86,972,137                25.94%   23.04s
xz          94,108,896                28.08%   41.42s
zstd        95,203,073                28.40%   14.35s
brotli      98,351,702                29.34%   26.13s

JSON data compressed exceptionally well with all algorithms. bzip3 excelled here, achieving 9.46% better compression than zstd while being only 1.61x slower. The 7.85 MiB savings made bzip3 the clear winner for this content type.

Archive 4: Mix of JSON, Markdown, Go, Shell Scripts (79.84 MiB)

Algorithm   Compressed Size (bytes)   Ratio    Time
bzip3       27,437,342                32.78%   8.17s
xz          29,189,892                34.87%   22.15s
zstd        30,591,709                36.55%   12.42s
brotli      31,552,524                37.69%   4.49s

bzip3 completely dominated this dataset, achieving both the best compression AND the second-fastest speed. It was 11.5% more efficient than zstd while being 1.52x faster than zstd and 2.71x faster than xz.

Aggregate Analysis

Compression Ranking (Consistent Across All Datasets)

  1. bzip3 - Always best compression
  2. xz - Always second
  3. zstd - Always third
  4. brotli - Always fourth

Speed Ranking

  1. brotli - Fastest (3 out of 4 datasets)
  2. bzip3/zstd - Competitive for 2nd/3rd
  3. xz - Always slowest

Key Findings

bzip3 consistently delivered:

  • Best compression across all four datasets (100% win rate)
  • Compression advantage over zstd from 1.67% to 11.5% (average ~8%)
  • Speed comparison vs zstd: faster in 2 datasets, slower in 2 datasets (1.52x faster to 2.08x slower)
  • Surprisingly fast performance, placing 2nd in speed in three of the four tests

xz never justified its existence:

  • Always achieved second-best compression but was consistently slower than bzip3
  • No scenario where it outperformed bzip3 on either metric

Dataset characteristics mattered:

  • HTML-heavy content (Archive 1) showed minimal compression differences between algorithms
  • JSON-heavy content (Archive 3) showed the largest compression differences, with bzip3 pulling ahead significantly
  • Mixed content showed moderate but meaningful compression advantages for bzip3

Recommendations

Default Choice: bzip3

Use bzip3 as your standard compression tool because it:

  • Consistently achieves maximum compression (won all four tests)
  • Maintains competitive speed (often 2nd place, sometimes even faster than zstd)
  • Scales well with multiple CPU cores

After publishing this analysis, I received feedback from Kamila Szewczyk, the author of bzip3, on optimal configuration. Her recommendations:

For maximum speed with all cores:

# Rule of thumb: divide file size by core count to get block size
# Example: 300 MB file on 10-core CPU
bzip3 -j 10 -b 30 input.tar
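
Applied to Archive 1 above (388.7 MiB on this 10-core machine), that rule of thumb works out to roughly 39 MiB blocks (the file name is illustrative):

bzip3 -j 10 -b 39 archive1.tar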

For maximum compression:

# Use single-threaded with block size equal to file size
# Example: 300 MB file
bzip3 -b 300 input.tar

Important note: The -j 10 -b 511 setting used in these benchmarks works on roughly 5 GB of data at once (10 threads × 511 MiB blocks). For files smaller than ~500 MB, the entire input fits into a single block, so only one thread has anything to do and bzip3 effectively runs single-threaded, which explains why it was sometimes slower than expected in the benchmarks above.
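
A minimal sketch of how those two rules could be automated, assuming macOS (stat -f %z and sysctl -n hw.ncpu are macOS-specific) and an illustrative file name:

FILE="input.tar"
SIZE_MIB=$(( ( $(stat -f %z "${FILE}") + 1048575 ) / 1048576 ))  # file size in MiB, rounded up
CORES=$(sysctl -n hw.ncpu)                                       # 10 on this machine

# Maximum speed: one block per core, clamped to bzip3's 1-511 MiB block range.
BLOCK=$(( SIZE_MIB / CORES ))
(( BLOCK < 1 )) && BLOCK=1
(( BLOCK > 511 )) && BLOCK=511
bzip3 -j "${CORES}" -b "${BLOCK}" "${FILE}"

# Or, for maximum compression: a single block covering the whole file
# (inputs larger than 511 MiB are still split into multiple blocks).
# BLOCK=$(( SIZE_MIB > 511 ? 511 : SIZE_MIB ))
# bzip3 -b "${BLOCK}" "${FILE}"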

Alternative: zstd for Speed-Critical Operations

zstd -T0 -19 input.tar -o output.tar.zst

Consider zstd only when:

  • Your data compresses similarly across algorithms (like Archive 1)
  • You compress frequently and every second counts
  • The ~2-12% larger files are acceptable

Never Use

  • xz: bzip3 is both better and faster in every scenario tested
  • brotli: Worst compression with insufficient speed advantage to compensate (typically 1.8-2.3x faster than bzip3, but sometimes slower)

Conclusion

After testing four diverse datasets, bzip3 emerged as the clear winner for maximum compression without unreasonable speed penalties. Its 100% win rate on compression ratio, combined with competitive speed that frequently matched or exceeded zstd, makes it the ideal default choice for archiving data.

The traditional go-to tools like xz are no longer optimal because bzip3's superior algorithm delivers better compression in less time. While zstd and brotli have their place in scenarios demanding maximum speed, for users seeking the best compression with reasonable performance, bzip3 is the definitive choice in 2025.

Appendix: Benchmark Script

The following bash script was used to conduct the benchmarks. It creates a copy of the source tar file and tests each compression algorithm in sequence, ensuring consistent test conditions:

#!/bin/bash

TAR="$1"
cp -f "${TAR}" source
chmod 400 source

function setup() {
    cp -f source "${TAR}"
    chmod 600 "${TAR}"
}

setup
echo "Compression: xz"
# can also try: xz -T0 -9e
time xz -T0 -9 "${TAR}"

setup
echo "Compression: zstd"
# can also try: -T0 --ultra -22
time zstd -T0 -19 "${TAR}" -o "${TAR}.zst"

setup
echo "Compression: bzip3"
time bzip3 -j 10 -b 511 "${TAR}"

setup
echo "Compression: brotli"
# can also try: brotli -v -q 11
time brotli -q 9 -w 24 "${TAR}" -o "${TAR}.br"
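
Saved as, say, benchmark.sh, the script is run once per archive, after which the compressed sizes can be compared directly (file names are illustrative):

bash benchmark.sh archive1.tar
ls -l archive1.tar.xz archive1.tar.zst archive1.tar.bz3 archive1.tar.br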