As a Linux admin, you will probably find yourself running grep quite frequently. It is a quick way to find whatever you need from a text file. At least in my case, I probably use grep daily.
Just finding the results is not always enough. Sometimes you also need to count unique values that grep returns. One example is when I need to find out how often a particular error has appeared in a log file.
How do I count unique matches from grep?
There are actually two small programs that I would use in this scenario: sort and uniq. Both of them are part of the GNU coreutils package and should be installed by default on any Linux distribution. sort orders the lines alphabetically so identical lines end up next to each other, and uniq collapses repeated lines in the output.
Let’s try a simple example. I have made a small mock log file that contains some simple error messages. Here is a snippet from the file:
...
ERROR: Error code 8
ERROR: Error code 16
Some other output
Some other output
ERROR: Error code 19
ERROR: Error code 32
ERROR: Error code 32
...
Now we want to find all the errors in the file using grep. That should be simple enough.
grep ERROR myfile.log
Now we should get only the error messages. But we want to know how often each of them appears.
To do this, we first pass the output through the sort command using the pipe character (|). Sorting groups identical lines together, so every repetition of a message ends up next to its duplicates.
Next, we will do the actual counting with the uniq command. By default, uniq removes adjacent duplicate lines, which is exactly why we sorted first. It also has a helpful argument, -c, that prefixes each line with the number of times it occurred.
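To see why the sort step matters, here is a tiny illustration with made-up input: uniq on its own only collapses repeats that sit next to each other, while sorting first lets it count all of them.

```shell
# uniq alone misses the second "a" because it is not adjacent to the first
printf 'a\nb\na\n' | uniq -c

# sorting first groups the repeats, so uniq -c counts "a" twice
printf 'a\nb\na\n' | sort | uniq -c
```

The first pipeline reports three separate lines; the second reports a count of 2 for "a" and 1 for "b".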
So the full command will be:
grep ERROR myfile.log | sort | uniq -c
Running this will show us each message only once with the count prepended to the line.
6 ERROR: Error code 16
7 ERROR: Error code 19
17 ERROR: Error code 32
5 ERROR: Error code 8
This is very helpful in our situation since we can see that “error code 32” was the most common message. But it would help even more if we could see the results ordered.
We can run sort again to achieve this, but this time with two arguments: -n to sort numerically (by the count) rather than alphabetically, and -r to show the results in descending order.
grep ERROR myfile.log | sort | uniq -c | sort -nr
The results are now ordered by frequency.
17 ERROR: Error code 32
7 ERROR: Error code 19
6 ERROR: Error code 16
5 ERROR: Error code 8
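On a large log file the sorted list can still be long. One common refinement is to tack head onto the end of the pipeline to keep only the most frequent messages; the cutoff of 3 below is just an example.

```shell
# show only the three most common error messages
grep ERROR myfile.log | sort | uniq -c | sort -nr | head -n 3
```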
Count the total number of results from grep
In some cases, you might just want the total number of results, not the number of unique ones. The example we just went through might be a bit too convoluted for that.
This time we could use the wc command. It is also a part of the coreutils package.
The wc command will print the number of lines, words, and bytes in the text you pass to it. In our example, we are only interested in the number of lines, so we will pass the -l argument.
grep "Error code 32" myfile.log | wc -l
This will output the number of times error code 32 appeared:

17
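As a small shortcut, grep itself can count matching lines with the -c option, so the same total is available without piping to wc at all. (Using the sample myfile.log from the examples above.)

```shell
# -c makes grep print the number of matching lines instead of the lines themselves
grep -c "Error code 32" myfile.log
```

Keep in mind that -c counts matching lines, not individual matches, so it only agrees with grep | wc -l when each match is on its own line.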
How do I count matches within a line?
I’ll just add this as a bonus, as I only learned it myself recently. You can also grep for multiple occurrences within a line. With the -o (--only-matching) option, grep outputs only the exact match and keeps searching the rest of the current line. Each match is printed on a separate line.
Let’s try this. In keeping with the error theme of the previous examples, the string is the word “error” printed multiple times.
echo "error error error error" | grep -o error
The output should look like this:
error
error
error
error
We can now pipe the output to wc to count the total number of results.
echo "error error error error" | grep -o error | wc -l
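This is also where the -c shortcut mentioned earlier falls down: grep -c counts matching lines, not matches, so on this input it reports 1 while grep -o piped to wc -l reports 4.

```shell
# one matching line, so -c says 1
echo "error error error error" | grep -c error

# four individual matches, so -o | wc -l says 4
echo "error error error error" | grep -o error | wc -l
```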
Count unique results within a line
We can also count unique values within a line, in a similar way to our first example, by using a regular expression. We pass the pattern to grep with the -e argument. The expression is quite simple: the word "error" followed by a space and the "[[:digit:]]" bracket expression, which matches entries that look like "error n", where n is a single digit:
echo "error 1 error 2 error 1 error 2" | grep -o -e 'error [[:digit:]]'
The output should look like this:
error 1
error 2
error 1
error 2
Using our trusty utilities, sort and uniq, we can now count each unique result.
echo "error 1 error 2 error 1 error 2" | grep -o -e 'error [[:digit:]]' | sort | uniq -c
From this one-liner we can see that there are two of each value.
2 error 1
2 error 2
That’s it for now. You should be able to do your own counting now. Good luck!