[AWK] Explaining NR==FNR
Introduction
Sometimes you might see the expression NR==FNR in an awk command. It is a common awk idiom.
NR and FNR are two built-in variables in awk. NR refers to the total number of input records seen so far, across all input files. FNR refers to the number of input records seen so far in the current input file. Each input record is usually a line, so NR and FNR usually refer to line numbers. We don’t really need to care about NR if we are only processing one file, because NR and FNR are always equal while awk is processing the first file.
Built-in variable | Meaning |
---|---|
NR | The total number of input records seen so far, across all files. |
FNR | The number of input records seen so far in the current input file. |
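To see how the two variables differ across files, the sketch below prints both in an END block, where POSIX awk keeps the values from the last record read. NR ends up as the total record count of all files, while FNR is the record count of the last file only. The script recreates the two sample files used in the next section inside a throwaway directory; using mktemp -d for that directory is an assumption about your environment.

```shell
#!/bin/sh
# Recreate the two sample files in a throwaway directory.
tmp=$(mktemp -d)
printf 'file1-line1\nfile1-line2\nfile1-line3\n' > "$tmp/file1.txt"
printf 'file2-line1\nfile2-line2\nfile2-line3\n' > "$tmp/file2.txt"

# In END, NR is the number of records read from both files (6),
# while FNR is the number of records read from the last file (3).
awk 'END { print NR, FNR }' "$tmp/file1.txt" "$tmp/file2.txt"
# prints: 6 3

rm -rf "$tmp"
```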
Printing out ‘NR’ and ‘FNR’
Let’s visualize NR and FNR with the following example so that we have a better idea of what they are.
- First, we create two files.
noob@learnfromnoobs:~$ cat file1.txt
file1-line1
file1-line2
file1-line3
noob@learnfromnoobs:~$ cat file2.txt
file2-line1
file2-line2
file2-line3
- We can use the following command to print out the filename, NR, FNR, and the line itself for every line of the files we just created.
noob@learnfromnoobs:~$ awk '{print FILENAME, NR, FNR, $0}' file1.txt file2.txt
file1.txt 1 1 file1-line1
file1.txt 2 2 file1-line2
file1.txt 3 3 file1-line3
file2.txt 4 1 file2-line1
file2.txt 5 2 file2-line2
file2.txt 6 3 file2-line3
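As we can see from the above output, both NR and FNR increase by 1 for each line awk reads. The only difference is that FNR resets to 1 when awk starts reading the next file, while NR keeps counting.
Therefore, NR==FNR effectively checks whether awk is reading the first file given on the command line. One caveat: if the first file is empty, no record is read from it, so NR==FNR is also true for the lines of the second file.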
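The empty-file caveat is easy to reproduce. In the sketch below, the file names empty.txt and other.txt are hypothetical. Comparing FILENAME with ARGV[1] is shown as one commonly used workaround; it assumes the first operand is a file name rather than a var=value assignment.

```shell
#!/bin/sh
# An empty first file followed by a non-empty second file.
tmp=$(mktemp -d)
: > "$tmp/empty.txt"
printf 'x1\nx2\n' > "$tmp/other.txt"
cd "$tmp" || exit 1

# No record is read from empty.txt, so NR and FNR both start at 1 on other.txt
# and NR==FNR holds for every line of the second file.
awk '{ print FILENAME, (NR==FNR ? "first" : "not-first"), $0 }' empty.txt other.txt
# prints: other.txt first x1
#         other.txt first x2

# ARGV[1] is the first command-line operand, so this check is not fooled by the empty file.
awk '{ print FILENAME, (FILENAME==ARGV[1] ? "first" : "not-first"), $0 }' empty.txt other.txt
# prints: other.txt not-first x1
#         other.txt not-first x2

cd / && rm -rf "$tmp"
```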
Common usage
NR==FNR can come in handy when we are processing two or more files. Here are two common use cases. (Note that there are other ways to accomplish the same goal. We only discuss how awk can be used in these cases. You should pick the right tool according to your situation.)
Let’s update file1.txt and file2.txt before we discuss further.
noob@learnfromnoobs:~$ cat file1.txt
John
Tom
Tony
Alex
Michael
Kalvin
noob@learnfromnoobs:~$ cat file2.txt
Tony
Alex
Chris
1. Print common lines in two files.
awk can be used to print the common lines in two files.
noob@learnfromnoobs:~$ awk 'NR==FNR { array[$0]; next } $0 in array' file1.txt file2.txt
Tony
Alex
The command saves all lines of the first file in an array. If a line from file2.txt is found in the array, it is printed, because printing the line is awk’s default action when a pattern has no action. If the command is still hard to follow, we can break it into a few parts:
- NR==FNR { array[$0]; next } means that if the current line is from the first file (file1.txt), the line is saved as a key in the array, and next makes awk move on to the next input line without evaluating the remaining rule.
- $0 in array means that a line from file2.txt is printed only if it is found in the array we just created (that is, among the lines of file1.txt).
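The same one-liner can also be written over several lines with a comment on each rule, which some readers may find easier to follow. This is only a reformatted sketch of the command above; it recreates file1.txt and file2.txt in a throwaway directory so it can run anywhere.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'John\nTom\nTony\nAlex\nMichael\nKalvin\n' > "$tmp/file1.txt"
printf 'Tony\nAlex\nChris\n' > "$tmp/file2.txt"

awk '
    # Rule 1: true only while NR==FNR, i.e. while reading file1.txt.
    # Referencing array[$0] creates the key; next skips rule 2 for this line.
    NR==FNR { array[$0]; next }

    # Rule 2: reached only for lines of later files (file2.txt).
    # A pattern with no action prints the line when the pattern is true.
    $0 in array
' "$tmp/file1.txt" "$tmp/file2.txt"
# prints: Tony
#         Alex

rm -rf "$tmp"
```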
2. Print all lines in file1.txt that appear in neither file2.txt nor file3.txt.
In a previous article, we discussed how to print all lines in file1 that do not appear in file2 using awk. Now, let’s extend that approach so that we can filter out lines found in more than one file.
Let’s create file3.txt and run our command.
noob@learnfromnoobs:~$ cat file3.txt
Tony
John
noob@learnfromnoobs:~$ awk 'NR==FNR { array[$0]; next } { delete array[$0] } END { for (key in array) { print key } }' file1.txt file2.txt file3.txt
Tom
Michael
Kalvin
We can break the command into a few parts:
- NR==FNR { array[$0]; next } means that if the current line is from the first file (file1.txt), the line is saved as a key in the array, and next skips the remaining rules for that line.
- { delete array[$0] } is executed for lines from the rest of the files (file2.txt and file3.txt). If the line is found in the array we created, it is removed from the array; deleting a key that is not present does nothing.
- END { for (key in array) { print key } } is executed after all input has been read. It prints all the keys still left in the array, i.e. the lines in file1.txt that appear in neither file2.txt nor file3.txt. Note that for (key in array) visits keys in an unspecified order, so the output is not guaranteed to follow the order of file1.txt.
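Because for (key in array) does not guarantee any particular order, the lines may come out in a different order than in file1.txt, depending on the awk implementation. The sketch below is one way to keep the original order: it remembers the position of each line in a second array, named order here (a hypothetical name, as is the counter n), and walks those positions in END. Unlike the original command, a line repeated in file1.txt is printed once per occurrence.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'John\nTom\nTony\nAlex\nMichael\nKalvin\n' > "$tmp/file1.txt"
printf 'Tony\nAlex\nChris\n' > "$tmp/file2.txt"
printf 'Tony\nJohn\n' > "$tmp/file3.txt"

awk '
    # First file: store each line as a key and remember its position.
    NR==FNR { array[$0]; order[++n] = $0; next }

    # Remaining files: drop any line that also appears in file1.txt.
    { delete array[$0] }

    # Walk positions 1..n so the output follows file1.txt; deleted keys are skipped.
    END { for (i = 1; i <= n; i++) if (order[i] in array) print order[i] }
' "$tmp/file1.txt" "$tmp/file2.txt" "$tmp/file3.txt"
# prints: Tom
#         Michael
#         Kalvin

rm -rf "$tmp"
```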
We can also copy our files to more descriptive names to make the command easier to understand.
noob@learnfromnoobs:~$ cp file1.txt all.txt
noob@learnfromnoobs:~$ cp file2.txt remove1.txt
noob@learnfromnoobs:~$ cp file3.txt remove2.txt
noob@learnfromnoobs:~$ awk 'NR==FNR { array[$0]; next } { delete array[$0] } END{for (key in array) { print key } }' all.txt remove*.txt
Tom
Michael
Kalvin
Now it is clear that the command prints all lines in all.txt that do not appear in any of the remove*.txt files.
Conclusion
In this article, we discussed:
- what NR and FNR are
- what NR==FNR means
- some common use cases for NR==FNR
I hope you enjoyed this article and learned something new.
Keep learning and have fun!