Using “comm” to Compare Files: A Practical Guide

The comm command is a standard Unix utility (shipped with GNU coreutils on Linux) for comparing two sorted text files line by line. While similar to diff, comm provides output that’s especially useful for scripting and automated processing.

Prerequisites

  • On Linux and macOS: comm comes pre-installed (GNU coreutils on Linux, the BSD version on macOS)
  • On Windows: Install via Git Bash or Windows Subsystem for Linux (WSL)
  • Files must be sorted before comparison (see Working with Unsorted Files)

Basic Usage

comm [OPTION]... LEFT_FILE RIGHT_FILE

Options:

  • -1: Suppress column 1 (lines unique to the left file)
  • -2: Suppress column 2 (lines unique to the right file)
  • -3: Suppress column 3 (lines that appear in both files)
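Short options can be grouped, so -2 -3 and -23 select the same output. The bash sketch below recreates the sample files from the Examples section in a scratch directory so that it runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                 # byte-order collation, so sort and comm agree
cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# Separate flags and one grouped flag suppress the same columns.
separate=$(comm -2 -3 a.txt b.txt)
grouped=$(comm -23 a.txt b.txt)

[ "$separate" = "$grouped" ] && echo "identical output"
printf '%s\n' "$grouped"        # prints charlie and golf
```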

Examples

Let’s compare two sample files:

a.txt
-----
alpha
bravo
charlie
delta
echo
foxtrot
golf
hotel

b.txt
-----
alpha
bravo
delta
echo
echo2
foxtrot
hotel
india
juliett
kilo
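To follow along, you can create both files in the current directory with printf. This is a convenience sketch in bash; note that it overwrites any existing a.txt or b.txt.

```shell
#!/usr/bin/env bash
set -euo pipefail
# printf reuses its format for every argument, so each word lands on its own line.
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# comm needs sorted input; sort -c exits non-zero at the first out-of-order line.
LC_ALL=C sort -c a.txt
LC_ALL=C sort -c b.txt
echo "a.txt and b.txt are both sorted"
```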

Finding Lines Unique to LEFT File

$ comm -2 -3 a.txt b.txt
charlie
golf

Finding Lines Unique to RIGHT File

$ comm -1 -3 a.txt b.txt
echo2
india
juliett
kilo

Viewing All Differences

$ comm -3 a.txt b.txt
charlie
	echo2
golf
	india
	juliett
	kilo

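Note: comm marks its columns with leading tab characters, not spaces:

  • Column 1 (no leading tab): Lines unique to the first file
  • Column 2 (one leading tab): Lines unique to the second file
  • Column 3 (two leading tabs): Lines common to both files (when not using -3)

Each column is preceded by one tab for every lower-numbered column that is still displayed, so suppressing a column shifts the columns to its right one tab to the left. That is why, with -3 above, the right-only lines carry a single tab.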
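Because the column markers are tabs, they are easy to miss on screen. Piping comm through cat -et (supported by both GNU and BSD cat) shows each tab as ^I and each line end as $. A bash sketch that recreates the sample files in a scratch directory:

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                 # byte-order collation, so sort and comm agree
cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# cat -e ends each line with $; cat -t prints each tab as ^I.
all=$(comm a.txt b.txt | cat -et)
diffs=$(comm -3 a.txt b.txt | cat -et)

printf '%s\n' "$all"      # starts with ^I^Ialpha$, ^I^Ibravo$, charlie$
printf '%s\n' "$diffs"    # starts with charlie$, ^Iecho2$, golf$
```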

Finding Common Lines

$ comm -1 -2 a.txt b.txt
alpha
bravo
delta
echo
foxtrot
hotel
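Since every output line is one item, piping a single-column selection into wc -l turns it into a count. A bash sketch that summarizes the sample files; the variable names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                 # byte-order collation, so sort and comm agree
cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# Each comm call keeps exactly one column; wc -l counts its lines.
# $(( )) also discards the leading spaces that some wc versions print.
left_only=$(( $(comm -23 a.txt b.txt | wc -l) ))
right_only=$(( $(comm -13 a.txt b.txt | wc -l) ))
common=$(( $(comm -12 a.txt b.txt | wc -l) ))

echo "left only: $left_only, right only: $right_only, common: $common"
```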

Working with Unsorted Files

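If your files aren’t sorted, you’ll need to sort them before comparison: on unsorted input comm can report matching lines as unique, and GNU comm warns when it notices out-of-order input. While you could create temporary sorted files, there’s a more elegant solution using process substitution, a feature of bash, zsh, and ksh: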

$ comm <(sort file1.txt) <(sort file2.txt)

Let’s break down how this works:

  1. The <(command) syntax is called process substitution. It:

    • Starts the command inside the parentheses, running alongside the outer command
    • Connects the command’s output to a pipe and gives that pipe a file name, typically /dev/fd/N (some systems use a named pipe instead)
    • Passes that file name to the outer command as an ordinary argument
  2. In this case:

    • <(sort file1.txt) sorts the contents of file1.txt and provides it as the LEFT_FILE to comm
    • <(sort file2.txt) sorts the contents of file2.txt and provides it as the RIGHT_FILE to comm
    • The sorted output flows through pipes, so you never create or delete intermediate files yourself (sort may still use its own temporary files for very large inputs)
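You can see the file name the shell substitutes by handing <(...) to echo. On Linux it is typically something like /dev/fd/63, but the exact name varies by system and shell, so treat it as illustrative. A bash sketch with a hypothetical three-line file1.txt:

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' charlie alpha bravo > file1.txt

# The shell replaces <(...) with a path; echo just prints that path.
echo <(sort file1.txt)

# Any command that opens a file by name can read the sorted output from it.
sorted=$(cat <(sort file1.txt))
printf '%s\n' "$sorted"         # alpha, bravo, charlie
```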

This produces the same result as the following multi-step process, without writing sorted copies to disk:

# The long way (not recommended)
$ sort file1.txt > sorted1.txt
$ sort file2.txt > sorted2.txt
$ comm sorted1.txt sorted2.txt
$ rm sorted1.txt sorted2.txt
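If a script must run under plain POSIX sh, where process substitution isn’t available, a safer form of the long way uses mktemp for unique file names and trap for cleanup. A minimal sketch, assuming hypothetical input files file1.txt and file2.txt created here for the demo:

```shell
#!/bin/sh
# Portable variant of the long way for shells without process substitution.
set -eu
export LC_ALL=C                      # same collation for sort and comm
cd "$(mktemp -d)"                    # scratch directory for the demo inputs
printf '%s\n' golf alpha charlie > file1.txt
printf '%s\n' charlie india alpha > file2.txt

# mktemp picks unique names, so concurrent runs cannot clobber each other.
sorted1=$(mktemp)
sorted2=$(mktemp)
# Delete the temporary files when the script exits, including an error exit under set -e.
trap 'rm -f "$sorted1" "$sorted2"' EXIT

sort file1.txt > "$sorted1"
sort file2.txt > "$sorted2"
comm "$sorted1" "$sorted2"
```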

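The process substitution method is:

  • More efficient (your script never writes the sorted copies to disk)
  • Cleaner (no cleanup required)
  • Safe to run concurrently (no risk of two runs colliding on fixed file names such as sorted1.txt)
  • Well suited to interactive use and to bash, zsh, or ksh scripts (plain POSIX sh does not support it)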
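In a bash script the one-liner wraps naturally into a small function. A minimal sketch; the function name only_in_first and the demo files old.txt and new.txt are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                 # byte-order collation, so sort and comm agree

# Hypothetical helper: print lines present in file $1 but not in file $2,
# whether or not the inputs are already sorted.
only_in_first() {
  comm -23 <(sort "$1") <(sort "$2")
}

cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' kilo alpha golf > old.txt
printf '%s\n' alpha india kilo > new.txt

removed=$(only_in_first old.txt new.txt)
printf 'removed: %s\n' "$removed"
```

Swapping the arguments gives the opposite set, so only_in_first new.txt old.txt lists the lines that were added.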

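You can combine this with other commands. For example, sort -f folds lowercase characters to uppercase, so the sort order ignores case:

$ comm <(sort -f file1.txt) <(sort -f file2.txt)

Note that this affects only the sorting. comm itself still compares lines case-sensitively, so “Apple” and “apple” are reported as different lines, and comm may warn that its input is not in sorted order. To treat them as equal, convert both files to a single case before sorting, for example with tr '[:upper:]' '[:lower:]'.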
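The effect of folding case before comparing is easy to check. The bash sketch below uses two small hypothetical word lists that differ only in capitalization, comparing them first as-is and then after lowercasing both inputs with tr:

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                 # byte-order collation, so sort and comm agree
cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' Apple banana Cherry > file1.txt
printf '%s\n' apple Banana cherry > file2.txt

# Case-sensitive: no line matches exactly, so nothing is common.
exact=$(comm -12 <(sort file1.txt) <(sort file2.txt))

# Lowercase first, then sort, so comm sees identical lines.
folded=$(comm -12 <(tr '[:upper:]' '[:lower:]' < file1.txt | sort) \
                  <(tr '[:upper:]' '[:lower:]' < file2.txt | sort))

printf 'exact matches: %s\n' "${exact:-(none)}"
printf 'case-insensitive matches:\n%s\n' "$folded"
```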

Comparison with diff

While diff provides similar functionality, its output format can be harder to parse:

$ diff --side-by-side --color --suppress-common-lines a.txt b.txt
charlie                          <
                                 > echo2
golf                             <
                                 > india
                                 > juliett
                                 > kilo
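To see the parsing difference in practice, here is one way to extract the right-only lines from diff’s default (normal) output, next to the comm equivalent. In normal format, lines found only in the second file are prefixed with "> ", so the sketch filters on that marker and strips it. It assumes bash and recreates the sample files in a scratch directory.

```shell
#!/usr/bin/env bash
set -eu   # no pipefail: diff exits with status 1 whenever the files differ
export LC_ALL=C                 # byte-order collation, so sort and comm agree
cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

# diff: keep the "> " lines (only in b.txt) and drop the 2-character marker.
from_diff=$(diff a.txt b.txt | grep '^> ' | cut -c3-)

# comm: the same lines come straight out of column 2.
from_comm=$(comm -13 a.txt b.txt)

[ "$from_diff" = "$from_comm" ] && echo "both methods agree"
printf '%s\n' "$from_comm"
```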

comm is often preferred when:

  • You need machine-readable output
  • You want to process the differences programmatically
  • You want to easily extract specific differences (unique to left, right, or common)
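As an example of processing the differences programmatically, the sketch below reads comm’s full three-column output line by line and sorts each entry into a bash array according to its leading tabs. It assumes bash, input lines that contain no tabs of their own, and the sample files from the Examples section; the array names added and removed are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                 # byte-order collation, so sort and comm agree
cd "$(mktemp -d)"               # scratch directory; existing files are untouched
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > a.txt
printf '%s\n' alpha bravo delta echo echo2 foxtrot hotel india juliett kilo > b.txt

tab=$'\t'
added=()
removed=()
# IFS= keeps the leading tabs intact; -r keeps backslashes literal.
while IFS= read -r line; do
  case $line in
    "$tab$tab"*) ;;                             # column 3: common, ignore
    "$tab"*)     added+=("${line#"$tab"}") ;;   # column 2: only in b.txt
    *)           removed+=("$line") ;;          # column 1: only in a.txt
  esac
done < <(comm a.txt b.txt)

echo "removed (${#removed[@]}): ${removed[*]}"
echo "added (${#added[@]}): ${added[*]}"
```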

Conclusion

The comm command is a versatile tool for comparing sorted text files. Its simple, column-based output format makes it especially useful for scripting and automated processing tasks. While it requires pre-sorted input, this limitation is easily addressed using the sort command.