Journey of a noob

Learn From Noobs

[AWK] Explaining NR==FNR

Introduction

You will sometimes see the expression NR==FNR in an awk command. It is a common awk idiom.

NR and FNR are two built-in variables in awk. NR is the total Number of input Records seen so far. FNR is the Number of input Records seen so far in the current input File. Each input record is usually a line, so NR and FNR usually behave like line numbers. If we are only processing one file, we don’t need to tell them apart, because NR and FNR are always equal while awk reads the first file.

Built-in Variable    Meaning
NR                   The total Number of input Records seen so far.
FNR                  The Number of input Records seen so far in the current input File.
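
As a quick check of the single-file case, here is a minimal sketch. The file name only.txt and its contents are made up for illustration. It prints NR, FNR, and whether the two are equal for every line of one input file.

```shell
# Hypothetical sample file, created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/only.txt"

# With a single input file, NR and FNR advance together on every record.
single_out=$(awk '{ print NR, FNR, (NR == FNR ? "same" : "different") }' "$tmpdir/only.txt")
printf '%s\n' "$single_out"

rm -rf "$tmpdir"
```

Every line should report "same", since FNR only starts to differ from NR once awk moves on to a second file.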

Printing out ‘NR’ and ‘FNR’

Let’s visualize NR and FNR in the following example so that we have a better idea of what they are.

  1. First, we have to create 2 files.
noob@learnfromnoobs:~$ cat file1.txt
file1-line1
file1-line2
file1-line3
noob@learnfromnoobs:~$ cat file2.txt
file2-line1
file2-line2
file2-line3
  2. We can use the following command to print out the filename, NR, FNR and the line itself for the files we just created.
noob@learnfromnoobs:~$ awk '{print FILENAME, NR, FNR, $0}' file1.txt file2.txt
file1.txt 1 1 file1-line1
file1.txt 2 2 file1-line2
file1.txt 3 3 file1-line3
file2.txt 4 1 file2-line1
file2.txt 5 2 file2-line2
file2.txt 6 3 file2-line3

As the output shows, both NR and FNR increase by one for each line awk reads. The only difference is that FNR resets to 1 when awk starts reading another file.

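Therefore, NR==FNR is true only while awk is reading the first file in the argument list (provided that file is not empty), so the condition effectively checks whether the current line comes from the first file.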
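
One edge case is worth seeing in action: if the first file is empty, awk reads no records from it, so NR and FNR are still equal throughout the second file. The sketch below uses two made-up files (empty.txt and second.txt) to show this, and compares it with one possible workaround, FILENAME == ARGV[1]. That workaround is an alternative suggested here, not part of the idiom itself, and it assumes the first argument is a plain file name that is not repeated later.

```shell
tmpdir=$(mktemp -d)
: > "$tmpdir/empty.txt"                   # first file, deliberately empty
printf 'x\ny\n' > "$tmpdir/second.txt"    # second file with two lines

# NR==FNR alone: lines of the second file are treated as first-file lines.
naive_out=$(awk 'NR == FNR { print "first:", $0; next } { print "other:", $0 }' \
    "$tmpdir/empty.txt" "$tmpdir/second.txt")

# Comparing FILENAME with ARGV[1] does not depend on the record counters.
safe_out=$(awk 'FILENAME == ARGV[1] { print "first:", $0; next } { print "other:", $0 }' \
    "$tmpdir/empty.txt" "$tmpdir/second.txt")

printf 'NR==FNR:\n%s\n' "$naive_out"
printf 'FILENAME==ARGV[1]:\n%s\n' "$safe_out"

rm -rf "$tmpdir"
```

For non-empty first files the two conditions behave the same, which is why the shorter NR==FNR form is the one usually seen.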


Common usage

NR==FNR can come in handy when we are processing two or more files. Here are two common use cases. (Note that there are other ways to accomplish the same goal. We only discuss how awk can be used in these cases. You should pick the right tool according to your situation.)

Let’s update file1.txt and file2.txt before we discuss further.

noob@learnfromnoobs:~$ cat file1.txt
John
Tom
Tony
Alex
Michael
Kalvin
noob@learnfromnoobs:~$ cat file2.txt
Tony
Alex
Chris

1. Print common lines in two files.

awk can be used to print the common lines in two files.

noob@learnfromnoobs:~$ awk 'NR==FNR { array[$0]; next } $0 in array' file1.txt file2.txt
Tony
Alex

The command saves every line of the first file as a key in an array. Each line of file2.txt that is found in the array is then printed, because printing the line is awk’s default action for a pattern with no action block. If the command is still hard to follow, we can break it into a few parts:

  1. NR==FNR { array[$0]; next } means: if the current line is from the first file (file1.txt), save the line as a key in an array, then use next to skip the remaining rules and move on to the next line.
  2. $0 in array is only reached for lines from file2.txt. It is true when the line is a key in the array we just built (that is, the line also appears in file1.txt), and awk then prints the line.
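
The two parts above can also be written as a multi-line awk program with comments. The sketch below recreates the sample files in a temporary directory and uses the array name seen, which is an arbitrary choice. It also shows the inverse test, !($0 in seen), which prints the lines of file2.txt that do not appear in file1.txt.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' John Tom Tony Alex Michael Kalvin > "$tmpdir/file1.txt"
printf '%s\n' Tony Alex Chris > "$tmpdir/file2.txt"

# Lines of file2.txt that also appear in file1.txt.
common=$(awk '
    NR == FNR { seen[$0]; next }   # first file: remember the line, skip the rule below
    $0 in seen                     # later file: print lines that were remembered
' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

# Inverse test: lines of file2.txt that do not appear in file1.txt.
only_in_file2=$(awk '
    NR == FNR { seen[$0]; next }
    !($0 in seen)
' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf 'common:\n%s\n' "$common"
printf 'only in file2.txt:\n%s\n' "$only_in_file2"

rm -rf "$tmpdir"
```

Merely referencing seen[$0] is enough to create the key, so no value needs to be assigned.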

2. Print all lines in file1.txt that do not appear in file2.txt and file3.txt.

We have already discussed elsewhere how to print all lines in file1 that do not appear in file2 using awk. Now, let’s expand our knowledge so that we can filter out lines that appear in several other files.

Let’s create the file file3.txt and run our command now.

noob@learnfromnoobs:~$ cat file3.txt
Tony
John
noob@learnfromnoobs:~$ awk 'NR==FNR { array[$0]; next } { delete array[$0] } END { for (key in array) { print key } }' file1.txt file2.txt file3.txt
Tom
Michael
Kalvin

We can break the command into a few parts:

  1. NR==FNR { array[$0]; next } means: if the current line is from the first file (file1.txt), save the line as a key in an array, then use next to skip the remaining rules and move on to the next line.
  2. { delete array[$0] } is executed when we are processing lines from the rest of the files (file2.txt and file3.txt). If the line is found in the array we created, it is removed from the array; deleting a key that is not in the array does nothing.
  3. END { for (key in array) { print key } } is executed after all input is read. It prints all the keys that are still in the array, that is, the lines in file1.txt that appear in neither file2.txt nor file3.txt.
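
One caveat about the END block: awk does not specify the order in which for (key in array) visits the keys, so the output order may differ between awk implementations. If the original order of file1.txt matters, one possible variation is to record each line's position as well. This is a sketch rather than the only way to do it, and the names keep, order and count are arbitrary.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' John Tom Tony Alex Michael Kalvin > "$tmpdir/file1.txt"
printf '%s\n' Tony Alex Chris > "$tmpdir/file2.txt"
printf '%s\n' Tony John > "$tmpdir/file3.txt"

kept=$(awk '
    # First file: remember each line both as a key and by its position.
    NR == FNR { keep[$0]; order[FNR] = $0; count = FNR; next }
    # Remaining files: drop any line that also appears here.
    { delete keep[$0] }
    # Walk the saved positions so the output follows the order of file1.txt.
    END {
        for (i = 1; i <= count; i++)
            if (order[i] in keep) print order[i]
    }
' "$tmpdir/file1.txt" "$tmpdir/file2.txt" "$tmpdir/file3.txt")

printf '%s\n' "$kept"
rm -rf "$tmpdir"
```

Unlike the for (key in array) version, this one prints a line once per occurrence if it is repeated in file1.txt.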

We can also copy our files to more descriptive names to make the command easier to understand.

noob@learnfromnoobs:~$ cp file1.txt all.txt
noob@learnfromnoobs:~$ cp file2.txt remove1.txt
noob@learnfromnoobs:~$ cp file3.txt remove2.txt
noob@learnfromnoobs:~$ awk 'NR==FNR { array[$0]; next } { delete array[$0] } END { for (key in array) { print key } }' all.txt remove*.txt
Tom
Michael
Kalvin

Now it is clear that the command will show all lines in all.txt that do not appear in any of the remove*.txt files.


Conclusion

In this article, we discussed:

  1. what NR and FNR are
  2. what NR==FNR means
  3. some common use cases for NR==FNR

I hope you enjoyed this article and learned something new.

Keep learning and have fun!