Using “comm” to Compare Files: A Practical Guide
The comm command is a powerful GNU utility for comparing sorted text files line by line. While similar to diff, comm provides output that’s especially useful for scripting and automated processing.
Prerequisites
- On Linux: comm comes pre-installed
- On Windows: Install via Git Bash or Windows Subsystem for Linux (WSL)
- Files must be sorted before comparison (see Working with Unsorted Files)
Basic Usage
comm [OPTION]... LEFT_FILE RIGHT_FILE
Options:
- -1: Suppress lines unique to the left file
- -2: Suppress lines unique to the right file
- -3: Suppress lines that appear in both files
Examples
Let’s compare two sample files:
a.txt
-----
alpha
bravo
charlie
delta
echo
foxtrot
golf
hotel
b.txt
-----
alpha
bravo
delta
echo
echo2
foxtrot
hotel
india
juliett
kilo
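To follow along, the two sample files can be recreated with a short snippet like the one below. It is only a convenience sketch: it writes a.txt and b.txt into the current directory, and the final check relies on sort -c, which exits non-zero when a file is not in sorted order.

```shell
# Recreate the two sample files in the current directory.
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# Optional sanity check: sort -c succeeds only if the file is already sorted.
sort -c a.txt && sort -c b.txt && echo "both files are sorted"
```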
Finding Lines Unique to LEFT File
$ comm -2 -3 a.txt b.txt
charlie
golf
Finding Lines Unique to RIGHT File
$ comm -1 -3 a.txt b.txt
echo2
india
juliett
kilo
Viewing All Differences
$ comm -3 a.txt b.txt
charlie          # Left file only (no leading tab)
        echo2    # Right file only (one leading tab)
golf
        india
        juliett
        kilo
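Note: The output columns are separated by tab characters, not spaces:
- Column 1 (no leading tab): Lines unique to the first file
- Column 2 (one leading tab): Lines unique to the second file
- Column 3 (two leading tabs): Lines common to both files (when not using -3)
Suppressing a column also removes its tab from the columns after it, which is why the output of comm -1 -3 is not indented.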
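Because the column is encoded purely by leading tabs, a script can recover it by counting them. The sketch below assumes the sample files from this guide. The classify function and its left/right/both labels are hypothetical names chosen for illustration, and the approach would mislabel input lines that themselves begin with a tab.

```shell
# Recreate the sample files so the snippet runs on its own.
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# classify LEFT RIGHT: label each line of comm's output by its leading tabs.
#   no tab   -> only in LEFT
#   one tab  -> only in RIGHT
#   two tabs -> in both files
classify() {
    comm "$1" "$2" | awk '
        /^\t\t/ { print "both:  " substr($0, 3); next }
        /^\t/   { print "right: " substr($0, 2); next }
                { print "left:  " $0 }'
}

classify a.txt b.txt
```

GNU comm also offers an --output-delimiter option for replacing the tab separator, which may be simpler when the data can contain tabs.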
Finding Common Lines
$ comm -1 -2 a.txt b.txt
alpha
bravo
delta
echo
foxtrot
hotel
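The suppression flags can also be written in combined form (-12, -13, -23), which is common in scripts. A brief sketch using the sample files from this guide, with a line count added as one example of piping the result onward:

```shell
# Recreate the sample files so the snippet runs on its own.
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

comm -23 a.txt b.txt   # only in a.txt (same as -2 -3)
comm -13 a.txt b.txt   # only in b.txt (same as -1 -3)
comm -12 a.txt b.txt   # in both files (same as -1 -2)

# Count the lines the two files share.
comm -12 a.txt b.txt | wc -l
```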
Working with Unsorted Files
If your files aren’t sorted, you’ll need to sort them before comparison. While you could create temporary sorted files, there’s a more elegant solution using process substitution:
$ comm <(sort file1.txt) <(sort file2.txt)
Let’s break down how this works:
The <(command) syntax is called process substitution, a feature of shells such as bash, zsh, and ksh. It:
- Executes the command inside the parentheses
- Exposes the command’s output through a file-like path (typically a /dev/fd entry or a named pipe)
- Passes that path to the outer command in place of a file name
In this case:
- <(sort file1.txt) sorts the contents of file1.txt and provides it as the LEFT_FILE to comm
- <(sort file2.txt) sorts the contents of file2.txt and provides it as the RIGHT_FILE to comm
- The sorted output is streamed to comm through pipes, so you never have to create or name a temporary file yourself
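To see what the outer command actually receives, the path can be printed directly. This is a small bash sketch. The exact path shown (for example /dev/fd/63 on many Linux systems) is an assumption about the platform and will vary.

```shell
#!/usr/bin/env bash
# Process substitution expands to a path that the outer command opens like a file.
# The printed name is system-dependent, for example /dev/fd/63.
echo <(true)

# Any command that expects a file name can read from that path.
cat <(printf '%s\n' charlie alpha bravo | sort)
```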
This is equivalent to, but more efficient than, the following multiple-step process:
# The long way (not recommended)
$ sort file1.txt > sorted1.txt
$ sort file2.txt > sorted2.txt
$ comm sorted1.txt sorted2.txt
$ rm sorted1.txt sorted2.txt
The process substitution method is:
- More efficient (no temporary files needed)
- Cleaner (no cleanup required)
- Safe to run concurrently (no risk of name collisions in temporary files)
- Perfect for use in scripts and pipelines
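Process substitution is not part of plain POSIX sh, so scripts that must run under /bin/sh need another route. One possible fallback, sketched here under the assumption that mktemp is available (it is widespread but not specified by POSIX), keeps the sorted copies in a private temporary directory so concurrent runs cannot collide. The function name compare_unsorted and the file names u1.txt and u2.txt are hypothetical.

```shell
#!/bin/sh
# compare_unsorted FILE1 FILE2: run comm on sorted copies kept in a private
# temporary directory, then remove that directory again.
compare_unsorted() {
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/comm.XXXXXX") || return 1
    sort "$1" > "$tmpdir/left" &&
        sort "$2" > "$tmpdir/right" &&
        comm "$tmpdir/left" "$tmpdir/right"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Example with two small unsorted files.
printf '%s\n' bravo alpha > u1.txt
printf '%s\n' charlie alpha > u2.txt
compare_unsorted u1.txt u2.txt
```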
You can combine this with other sort options. For example, sort’s -f option folds lowercase characters to their uppercase equivalents when ordering lines. Note that comm itself still compares lines case-sensitively, so lines that differ only in case are still reported as different:
$ comm <(sort -f file1.txt) <(sort -f file2.txt)
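If the goal is for the comparison itself to ignore case, one option is to normalise case before sorting, since comm has no case-insensitive mode. The sketch below is an illustration, not the only approach: the normalized helper and the file contents are hypothetical, and LC_ALL=C is set so that sort and comm agree on a plain byte ordering. Depending on the locale, input ordered with sort -f alone may also lead GNU comm to report that a file is not in sorted order, which this approach avoids.

```shell
#!/usr/bin/env bash
# Two small unsorted files whose lines differ in case.
printf '%s\n' Bravo alpha Charlie > file1.txt
printf '%s\n' ALPHA bravo delta > file2.txt

# normalized FILE: lower-case every line, then sort and drop duplicates.
normalized() {
    tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C sort -u
}

# Lines present in both files, ignoring case.
LC_ALL=C comm -12 <(normalized file1.txt) <(normalized file2.txt)
```

The trade-off is that the output shows the lower-cased text rather than the original spelling from either file.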
Comparison with diff
While diff provides similar functionality, its output format can be harder to parse:
$ diff --side-by-side --color --suppress-common-lines a.txt b.txt
charlie <
> echo2
golf <
> india
> juliett
> kilo
comm is often preferred when:
- You need machine-readable output
- You want to process the differences programmatically
- You want to easily extract specific differences (unique to left, right, or common)
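As an illustration of that kind of programmatic use, the sketch below turns the two "unique" columns into a simple change report. The function name report_changes and the removed/added labels are hypothetical, and both inputs are assumed to be sorted already.

```shell
# Recreate the sample files so the snippet runs on its own.
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# report_changes OLD NEW: print one labelled line per difference.
report_changes() {
    # Lines only in OLD are treated as removed.
    comm -23 "$1" "$2" | while IFS= read -r line; do
        printf 'removed: %s\n' "$line"
    done
    # Lines only in NEW are treated as added.
    comm -13 "$1" "$2" | while IFS= read -r line; do
        printf 'added:   %s\n' "$line"
    done
}

report_changes a.txt b.txt
```

Each loop body is a natural place to take an action per line, such as deleting a stale entry or queuing a new one.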
Conclusion
The comm command is a versatile tool for comparing sorted text files. Its simple, column-based output format makes it especially useful for scripting and automated processing tasks. While it requires pre-sorted input, this limitation is easily addressed using the sort command.