When archiving data, choosing the right compression algorithm can make a significant difference in both storage efficiency and processing time. With modern multi-core processors, several high-performance compression tools have emerged beyond the traditional gzip and bzip2. This post examines four popular compression algorithms (xz, zstd, bzip3, and brotli) to determine which offers the best balance of compression ratio and speed.
Disclaimer
This was a fun Saturday morning experiment and is not meant to be comprehensive. The tests are specific to my datasets, which will not be made publicly available. This exercise was primarily for my own learning and understanding of modern compression programs. Memory usage was not a consideration as I was mostly interested in maximum efficiency but was also willing to use the second-best algorithm when the time duration delta made sense. Claude Sonnet 4.5 was used for the analysis and interpretation of the benchmark results.
Methodology
All tests were conducted on a 2025 MacBook Air M4 with 10 CPU cores, 32 GB RAM, running macOS Sequoia 15.6.1. All compression programs were installed via Homebrew with the following versions:
- xz: XZ Utils 5.8.1 (liblzma 5.8.1)
- zstd: Zstandard CLI v1.5.7
- bzip3: 1.5.3
- brotli: 1.1.0
Each algorithm was configured for near-maximum or maximum compression:
- xz: `xz -T0 -9` (level 9, all cores)
- zstd: `zstd -T0 -19` (level 19, all cores)
- bzip3: `bzip3 -j 10 -b 511` (10 threads, 511 MiB blocks)
- brotli: `brotli -q 9 -w 24` (quality 9, 24-bit window)
Four different tar archives were tested, each representing different content types commonly found in software projects and web applications.
Test Results
Archive 1: Mostly HTML Content (388.7 MiB)
| Algorithm | Compressed Size (bytes) | Ratio | Time |
|---|---|---|---|
| bzip3 | 232,344,538 | 57.01% | 34.76s |
| xz | 235,449,216 | 57.77% | 62.79s |
| zstd | 236,257,143 | 57.96% | 16.68s |
| brotli | 237,228,821 | 58.20% | 14.88s |
For HTML-heavy content, all algorithms performed similarly, with only a 2.1% difference between best and worst compression. bzip3 achieved the best compression but was 2.08x slower than zstd. The modest compression advantage (3.73 MiB) didn’t justify the speed penalty for this dataset.
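The Ratio column throughout is simply compressed bytes divided by original bytes (with 388.7 MiB converted at 1 MiB = 1,048,576 bytes). As a sanity check, the bzip3 row above can be recomputed with a one-line awk sketch:

```shell
# Recompute bzip3's ratio for Archive 1: compressed bytes / original bytes.
# 388.7 MiB is the archive size from the table; 232,344,538 is bzip3's output size.
awk 'BEGIN {
  orig = 388.7 * 1048576          # original size in bytes
  comp = 232344538                # bzip3 compressed size in bytes
  printf "%.2f%%\n", 100 * comp / orig
}'
# → 57.01%
```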
Archive 2: Mix of HTML, Markdown, JSON, Python Code (75.16 MiB)
| Algorithm | Compressed Size (bytes) | Ratio | Time |
|---|---|---|---|
| bzip3 | 28,648,442 | 36.35% | 8.23s |
| xz | 30,449,232 | 38.64% | 21.96s |
| zstd | 31,601,031 | 40.10% | 12.48s |
| brotli | 32,556,564 | 41.31% | 4.31s |
bzip3 dominated this mixed-content archive, achieving the best compression while being 2.67x faster than xz and 1.52x faster than zstd. The 10.3% compression advantage over zstd (2.82 MiB savings) combined with the speed advantage made bzip3 the clear winner.
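The speed multipliers quoted in these analyses are just ratios of the wall-clock times in the tables. For Archive 2 they can be reproduced with a small awk sketch (times hard-coded from the table above):

```shell
# Derive the Archive 2 speedup factors from the table's wall-clock times.
awk 'BEGIN {
  bzip3 = 8.23; xz = 21.96; zstd = 12.48   # seconds, from the table
  printf "bzip3 vs xz: %.2fx faster\n", xz / bzip3
  printf "bzip3 vs zstd: %.2fx faster\n", zstd / bzip3
}'
# → bzip3 vs xz: 2.67x faster
# → bzip3 vs zstd: 1.52x faster
```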
Archive 3: Mostly JSON Content (319.72 MiB)
| Algorithm | Compressed Size (bytes) | Ratio | Time |
|---|---|---|---|
| bzip3 | 86,972,137 | 25.94% | 23.04s |
| xz | 94,108,896 | 28.08% | 41.42s |
| zstd | 95,203,073 | 28.40% | 14.35s |
| brotli | 98,351,702 | 29.34% | 26.13s |
JSON data compressed exceptionally well with all algorithms. bzip3 excelled here, achieving 9.46% better compression than zstd while being only 1.61x slower. The 7.85 MiB savings made bzip3 the clear winner for this content type.
Archive 4: Mix of JSON, Markdown, Go, Shell Scripts (79.84 MiB)
| Algorithm | Compressed Size (bytes) | Ratio | Time |
|---|---|---|---|
| bzip3 | 27,437,342 | 32.78% | 8.17s |
| xz | 29,189,892 | 34.87% | 22.15s |
| zstd | 30,591,709 | 36.55% | 12.42s |
| brotli | 31,552,524 | 37.69% | 4.49s |
bzip3 completely dominated this dataset, achieving both the best compression AND the second-fastest speed. It was 11.5% more efficient than zstd while being 1.52x faster than zstd and 2.71x faster than xz.
Aggregate Analysis
Compression Ranking (Consistent Across All Datasets)
- bzip3 - Always best compression
- xz - Always second
- zstd - Always third
- brotli - Always fourth
Speed Ranking
- brotli - Fastest (3 out of 4 datasets)
- bzip3/zstd - Competitive for 2nd/3rd
- xz - Always slowest
Key Findings
bzip3 consistently delivered:
- Best compression across all four datasets (100% win rate)
- Compression advantage over `zstd` ranging from 1.67% to 11.5% (average ~8%)
- Speed vs `zstd`: faster in 2 datasets, slower in 2 (from 1.52x faster to 2.08x slower)
- Surprisingly fast performance, placing 2nd in speed in three of the four tests
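The ~8% average follows directly from the four per-archive advantages quoted earlier (1.67%, 10.3%, 9.46%, 11.5%). A quick awk check:

```shell
# Average bzip3's compression advantage over zstd across the four archives.
awk 'BEGIN {
  n = split("1.67 10.3 9.46 11.5", a, " ")   # per-archive advantages, %
  for (i = 1; i <= n; i++) sum += a[i]
  printf "average: %.1f%%\n", sum / n
}'
# → average: 8.2%
```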
xz never justified its existence:
- Always achieved second-best compression but was consistently slower than `bzip3`
- No scenario where it outperformed `bzip3` on either metric
Dataset characteristics mattered:
- HTML-heavy content (Archive 1) showed minimal compression differences between algorithms
- JSON-heavy content (Archive 3) showed the largest compression differences, with `bzip3` pulling ahead significantly
- Mixed content showed moderate but meaningful compression advantages for `bzip3`
Recommendations
Default Choice: bzip3
Use bzip3 as your standard compression tool because it:
- Consistently achieves maximum compression (won all four tests)
- Maintains competitive speed (often 2nd place, sometimes even faster than `zstd`)
- Scales well with multiple CPU cores
After publishing this analysis, I received feedback from Kamila Szewczyk, the author of bzip3, on optimal configuration. Her recommendations:
For maximum speed with all cores:
```shell
# Rule of thumb: divide file size by core count to get block size
# Example: 300 MB file on 10-core CPU
bzip3 -j 10 -b 30 input.tar
```
For maximum compression:
```shell
# Use single-threaded with block size equal to file size
# Example: 300 MB file
bzip3 -b 300 input.tar
```
Important note: The `-b 511` setting used in these benchmarks, combined with 10 threads, would process up to ~5 GB of data at once. For files smaller than ~500 MB, the entire file fits in a single 511 MiB block, so bzip3 effectively runs single-threaded, which explains why it was sometimes slower than expected in the benchmarks above.
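Kamila's rule of thumb can be wrapped in a few lines of shell. This is my own sketch, not an official recipe: the variable names and the clamping to the 1-511 MiB block range used in these benchmarks are assumptions I've added.

```shell
# Pick a bzip3 block size (MiB) from file size and core count, per the
# rule of thumb above. Example values; clamped to an assumed 1..511 MiB range.
size_mib=300   # file size in MiB (example)
cores=10       # CPU cores to use
block=$(( (size_mib + cores - 1) / cores ))   # ceiling division
[ "$block" -lt 1 ] && block=1
[ "$block" -gt 511 ] && block=511
echo "bzip3 -j $cores -b $block input.tar"
# → bzip3 -j 10 -b 30 input.tar
```

For the 300 MB / 10-core example this reproduces Kamila's suggested `-b 30`.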
Alternative: zstd for Speed-Critical Operations
```shell
zstd -T0 -19 input.tar -o output.tar.zst
```
Consider zstd only when:
- Your data compresses similarly across algorithms (like Archive 1)
- You compress frequently and every second counts
- The ~1-10% larger files are acceptable
Never Use
- xz: `bzip3` is both better and faster in every scenario tested
- brotli: Worst compression, with an insufficient speed advantage to compensate (typically 1.8-2.3x faster than `bzip3`, but sometimes slower)
Conclusion
After testing four diverse datasets, bzip3 emerged as the clear winner for maximum compression without unreasonable speed penalties. Its 100% win rate on compression ratio, combined with competitive speed that frequently matched or exceeded zstd, makes it the ideal default choice for archiving data.
The traditional go-to tools like xz are no longer optimal because bzip3's superior algorithm delivers better compression in less time. While zstd and brotli have their place in scenarios demanding maximum speed, for users seeking the best compression with reasonable performance, bzip3 is the definitive choice in 2025.
Appendix: Benchmark Script
The following bash script was used to conduct the benchmarks. It creates a copy of the source tar file and tests each compression algorithm in sequence, ensuring consistent test conditions:
```shell
#!/bin/bash
# Usage: ./benchmark.sh archive.tar
TAR="$1"

# Keep a read-only pristine copy of the source archive
cp -f "${TAR}" source
chmod 400 source

# Restore a fresh copy of the archive before each test
function setup() {
  cp -f source "${TAR}"
  chmod 600 "${TAR}"
}

setup
echo "Compression: xz"
# can also try: xz -T0 -9e
time xz -T0 -9 "${TAR}"

setup
echo "Compression: zstd"
# can also try: -T0 --ultra -22
time zstd -T0 -19 "${TAR}" -o "${TAR}.zst"

setup
echo "Compression: bzip3"
time bzip3 -j 10 -b 511 "${TAR}"

setup
echo "Compression: brotli"
# can also try: brotli -v -q 11
time brotli -q 9 -w 24 "${TAR}" -o "${TAR}.br"
```