How to Process Text Files Like a Pro with `cut`, `sort`, and `uniq`

Introduction

You have a large text file—a log, a CSV export, or system output—and you need to pull out specific information. Maybe you need a list of unique IP addresses, a count of error types, or just the third column from every line. Manually sifting through this data is slow and error-prone.

The Linux command line gives you a powerful trio of tools for this job: cut, sort, and uniq. By connecting them with pipes (|), you can build a fast and reusable data-processing pipeline right in your terminal. Let’s walk through how to use them.

Step-by-Step: From Raw Data to Insight

First, let’s create a sample data file to work with. We’ll call it access.log and populate it with some fake server request data in the format IP_ADDRESS,REQUEST_PATH,STATUS_CODE.

Open your terminal and run the following command. This uses cat with a “here document” to write multiple lines to a file at once.

cat <<'EOF' > access.log
192.168.1.10,/home,200
10.0.0.5,/api/v1/users,404
192.168.1.10,/assets/style.css,200
172.16.0.8,/home,200
10.0.0.5,/api/v1/status,500
192.168.1.10,/home,200
EOF

Now that you have access.log, let’s process it.

Step 1: Extract Data with cut

Your goal is to isolate a specific column of data. Let’s extract just the request paths, which are in the second column.

cut -d ',' -f 2 access.log

Here’s the breakdown:

  • cut: The command to slice up each line.
  • -d ',': Sets the delimiter. We’re telling cut that our columns are separated by a comma.
  • -f 2: Specifies the field (or column) number you want to extract.

Your output should look like this:

/home
/api/v1/users
/assets/style.css
/home
/api/v1/status
/home
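
cut isn't limited to a single field: the -f flag accepts lists and ranges, so you can pull several columns at once. For example, to grab both the IP address and the status code (fields 1 and 3) from our sample file:

cut -d ',' -f 1,3 access.log

192.168.1.10,200
10.0.0.5,404
192.168.1.10,200
172.16.0.8,200
10.0.0.5,500
192.168.1.10,200

Notice that cut keeps the comma delimiter between the output fields.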

Step 2: Organize the Output with sort

The list of paths is useful, but it’s unordered. Before we can find unique entries, we need to group all the identical lines together. We do this by piping the output of cut directly into sort.

cut -d ',' -f 2 access.log | sort

The pipe (|) is a fundamental concept in the shell—it sends the standard output of the command on its left to the standard input of the command on its right. Now your output is alphabetically sorted:

/api/v1/status
/api/v1/users
/assets/style.css
/home
/home
/home
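
As an aside, sort can key on columns directly. If you ever want the full log lines ordered by status code rather than an extracted column, sort's -t (delimiter) and -k (key) flags handle it:

sort -t ',' -k 3,3n access.log

Here -k 3,3n means "use field 3, and only field 3, as a numeric sort key." For our pipeline, though, a plain sort on the extracted paths is all we need.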

Step 3: Find Unique Lines with uniq

Now that identical lines are adjacent, we can filter out the duplicates. The uniq command is perfect for this. It reads its input, compares adjacent lines, and by default, prints only one copy of each.

Important: uniq only removes duplicates that appear on consecutive lines, so on unsorted data it will silently leave repeats behind. This is why the sort step is essential.
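
You can see the problem by skipping sort on our sample file:

cut -d ',' -f 2 access.log | uniq

/home
/api/v1/users
/assets/style.css
/home
/api/v1/status
/home

Nothing was removed, because no two identical paths happen to be adjacent. With sort in place, uniq sees the duplicates side by side and collapses them: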

cut -d ',' -f 2 access.log | sort | uniq

You’ve now built a complete pipeline to extract a column and find all unique values within it.

/api/v1/status
/api/v1/users
/assets/style.css
/home
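
One handy shortcut: sort -u drops duplicates during the sort itself, so when you only need the unique values (and not the counts we'll add next), it can replace sort | uniq entirely. This shorter pipeline produces the same four paths:

cut -d ',' -f 2 access.log | sort -u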

Bonus Step: Counting Occurrences

What if you want to know how many times each path was requested? Just add the -c (count) flag to uniq. Then, to see the most frequent requests, add another sort at the end.

cut -d ',' -f 2 access.log | sort | uniq -c | sort -nr

Let’s break down that new ending:

  • uniq -c: Now outputs each unique line prefixed with its count.
  • sort -nr: Sorts the result again.
    • -n: Sorts numerically (based on the count uniq added).
    • -r: Sorts in reverse order (from highest to lowest).

The final output shows you exactly which path is most popular:

      3 /home
      1 /assets/style.css
      1 /api/v1/users
      1 /api/v1/status
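
The same pattern works for any column. Point the pipeline at field 3, for instance, and you get a tally of status codes:

cut -d ',' -f 3 access.log | sort | uniq -c | sort -nr

      4 200
      1 500
      1 404

(The relative order of the two single-count lines may vary between sort implementations, since their counts tie.)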

Conclusion

You’ve just performed a common data analysis task in a single line. By combining cut, sort, and uniq, you can create powerful and elegant solutions to complex text-processing problems.

Remember these three tools and how they work together:

  • cut: Extracts columns based on a delimiter.
  • sort: Orders lines alphabetically or numerically.
  • uniq: Removes or counts duplicate adjacent lines.

The next time you’re faced with a wall of text, don’t open a spreadsheet. Start building a pipeline one command at a time. Begin with cat yourfile | head to see a sample, add a cut, then a sort, and watch as you transform raw data into clear answers right from your terminal.
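
For example, a quick session against our sample file might build up like this, ending with the request count for each unique IP address mentioned in the introduction:

cat access.log | head -n 3                               # peek at the raw data
cut -d ',' -f 1 access.log | head                        # check the column extraction
cut -d ',' -f 1 access.log | sort | uniq -c | sort -nr   # count requests per IP

      3 192.168.1.10
      2 10.0.0.5
      1 172.16.0.8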