Introduction
You have a large text file—a log, a CSV export, or system output—and you need to pull out specific information. Maybe you need a list of unique IP addresses, a count of error types, or just the third column from every line. Manually sifting through this data is slow and error-prone.
The Linux command line gives you a powerful trio of tools for this job: cut, sort, and uniq. By connecting them with pipes (|), you can build a fast and reusable data-processing pipeline right in your terminal. Let’s walk through how to use them.
Step-by-Step: From Raw Data to Insight
First, let’s create a sample data file to work with. We’ll call it access.log and populate it with some fake server request data in the format IP_ADDRESS,REQUEST_PATH,STATUS_CODE.
Open your terminal and run the following command. This uses cat with a “here document” to write multiple lines to a file at once.
cat <<'EOF' > access.log
192.168.1.10,/home,200
10.0.0.5,/api/v1/users,404
192.168.1.10,/assets/style.css,200
172.16.0.8,/home,200
10.0.0.5,/api/v1/status,500
192.168.1.10,/home,200
EOF
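If you want to double-check that the file was written correctly, print it back before moving on:
cat access.log
You should see the same six comma-separated lines shown above.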
Now that you have access.log, let’s process it.
Step 1: Extract Data with cut
Your goal is to isolate a specific column of data. Let’s extract just the request paths, which are in the second column.
cut -d ',' -f 2 access.log
Here’s the breakdown:
cut: The command to slice up each line.
-d ',': Sets the delimiter. We’re telling cut that our columns are separated by a comma.
-f 2: Specifies the field (or column) number you want to extract.
Your output should look like this:
/home
/api/v1/users
/assets/style.css
/home
/api/v1/status
/home
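As a quick aside, cut isn’t limited to a single column. Passing a comma-separated list to -f pulls several fields at once, so a variation like this (not needed for the rest of the walkthrough) keeps the IP address and status code together, still separated by the original comma:
cut -d ',' -f 1,3 access.log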
Step 2: Organize the Output with sort
The list of paths is useful, but it’s unordered. Before we can find unique entries, we need to group all the identical lines together. We do this by piping the output of cut directly into sort.
cut -d ',' -f 2 access.log | sort
The pipe (|) is a fundamental concept in the shell—it sends the standard output of the command on its left to the standard input of the command on its right. Now your output is alphabetically sorted:
/api/v1/status
/api/v1/users
/assets/style.css
/home
/home
/home
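sort has field handling of its own, which is worth knowing even though we don’t need it here. If you ever want the original lines ordered by status code instead, something along these lines should do it, with -t setting the field separator, -k 3 picking the third field, and -n sorting it numerically:
sort -t ',' -k 3 -n access.log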
Step 3: Find Unique Lines with uniq
Now that identical lines are adjacent, we can filter out the duplicates. The uniq command is perfect for this. It reads its input, compares adjacent lines, and by default, prints only one copy of each.
Important: uniq will not work correctly on unsorted data, because it only checks for duplicates between consecutive lines. This is why the sort step is essential.
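To see the difference for yourself, try skipping sort. Because none of the duplicate paths happen to sit next to each other in access.log, uniq lets all six lines through, including /home three times:
cut -d ',' -f 2 access.log | uniq
With sort in place, the full pipeline behaves as expected: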
cut -d ',' -f 2 access.log | sort | uniq
The duplicates are gone, and each distinct path appears exactly once:
/api/v1/status
/api/v1/users
/assets/style.css
/home
You’ve now built a complete pipeline to extract a column and find all unique values within it.
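As a small shortcut, sort -u folds the de-duplication into the sort itself, so this one-liner produces the same list without a separate uniq:
cut -d ',' -f 2 access.log | sort -u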
Bonus Step: Counting Occurrences
What if you want to know how many times each path was requested? Just add the -c (count) flag to uniq. Then, to list the most frequent requests first, add another sort at the end.
cut -d ',' -f 2 access.log | sort | uniq -c | sort -nr
Let’s break down that new ending:
uniq -c: Now outputs each unique line prefixed with its count.
sort -nr: Sorts the result again.
-n: Sorts numerically (based on the count uniq added).
-r: Sorts in reverse order (from highest to lowest).
The final output shows you exactly which path is most popular:
3 /home
1 /assets/style.css
1 /api/v1/users
1 /api/v1/status
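On a sample this small you can see everything at once, but on a real log with hundreds of distinct paths you’ll usually only care about the leaders. Tacking head onto the end keeps just the top few; the 3 here is an arbitrary choice:
cut -d ',' -f 2 access.log | sort | uniq -c | sort -nr | head -n 3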
Conclusion
You’ve just performed a common data analysis task in a single line. By combining cut, sort, and uniq, you can create powerful and elegant solutions to complex text-processing problems.
Remember these three tools and how they work together:
cut: Extracts columns based on a delimiter.
sort: Orders lines alphabetically or numerically.
uniq: Removes or counts duplicate adjacent lines.
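The same pipeline shape adapts with almost no effort. For instance, counting status codes instead of request paths only takes a change of field number; on our sample file this reports four 200s and one each of 404 and 500:
cut -d ',' -f 3 access.log | sort | uniq -c | sort -nr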
The next time you’re faced with a wall of text, don’t open a spreadsheet. Start building a pipeline one command at a time. Begin with cat yourfile | head to see a sample, add a cut, then a sort, and watch as you transform raw data into clear answers right from your terminal.