Awk (pattern scanning and processing language)


An awk command has three parts: a BEGIN block, a main statement block with an optional pattern, and an END block.

awk ' BEGIN{ print "start" } pattern { commands } END{ print "end"}' filename
awk ' BEGIN{ i=0 } { i++ } END{ print i }' filename

BEGIN is commonly used for variable initialization and for printing the header of an output table.
END is commonly used for printing results after all the lines have been processed.

$ echo -e "line1\nline2" | awk 'BEGIN{print "==START=="} { print } END{ print "==END==" }'

==START==
line1
line2
==END==
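
As a small sketch of that header-and-summary pattern, the following sums the second column of some input. The sample data and the NAME/SIZE column labels are made up for illustration.

```shell
# Sample data is invented: a name column and a size column.
report=$(printf 'a.txt 10\nb.txt 25\nc.txt 7\n' |
  awk 'BEGIN { total = 0; print "NAME SIZE" }   # header, printed once before any input
       { total += $2; print $1, $2 }             # runs for every input line
       END { print "TOTAL", total }')
echo "$report"
```

The header appears once at the top, each input line is echoed by the main block, and the END block prints the accumulated total as the last line.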

The print command accepts arguments. Arguments separated by commas are printed with a space between them. Strings written next to each other without a comma are concatenated; the double quotes only delimit string literals. It is common practice to place initial variable assignments such as var=0; in the BEGIN block and the commands that print the results in the END{} block.

$ echo | awk '{one="1"; two="2"; three="3"; print one+two+three}'
6

$ echo | awk '{one="1"; two="2"; three="3"; print one "-" two "-" three}'
1-2-3
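
The difference between a comma and plain juxtaposition in print can be seen side by side in this short sketch. OFS is awk's built-in output field separator; it is not mentioned in the notes above and is included here only to show where the space comes from.

```shell
# Comma: arguments are joined with the output field separator (OFS, a space by default).
with_comma=$(echo | awk '{ print "a", "b" }')
# No comma: adjacent strings are concatenated directly.
no_comma=$(echo | awk '{ print "a" "b" }')
# Changing OFS changes what the comma inserts.
custom_ofs=$(echo | awk 'BEGIN { OFS = "-" } { print "a", "b" }')
echo "$with_comma"   # a b
echo "$no_comma"     # ab
echo "$custom_ofs"   # a-b
```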

Some special variables that can be used with awk are as follows:
- NR : Current record number, which is the line number when awk uses lines as records.
- NF : Number of fields in the current record being processed, so $NF is the last field (last column).

ps auxf | awk 'NR < 5' # print the first four lines
ps auxf | awk 'NR==2,NR==5' # print lines 2 to 5
ps auxf | awk '/darin/' # regex match, like grep darin
ps auxf | awk '!/darin/' # negated regex, like grep -v darin
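
Because ps output differs on every machine, here is a sketch of the same four pattern styles on a fixed, made-up five-line input, so the result of each pattern is predictable.

```shell
# Made-up five-line input so the results are predictable.
input=$(printf 'root 1\ndarin 2\nnginx 3\ndarin 4\nmysql 5\n')
first_two=$(printf '%s\n' "$input" | awk 'NR < 3')       # lines 1 and 2
range=$(printf '%s\n' "$input" | awk 'NR==2,NR==4')      # lines 2 through 4
matching=$(printf '%s\n' "$input" | awk '/darin/')       # lines containing darin
excluded=$(printf '%s\n' "$input" | awk '!/darin/')      # all remaining lines
printf '%s\n---\n' "$first_two" "$range" "$matching" "$excluded"
```

A pattern with no action block prints the matching record, which is why none of these commands needs an explicit { print }.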

$ ps auxf | awk 'BEGIN{print "List process :"}{print $0} END {print "Total process is : " NR}'

List process :
xxxxxxxxxxxxx
Total process is : 14
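
NF has no example above, so here is a minimal sketch of NR, NF and $NF together. The three input lines are made up and deliberately have a different number of fields each.

```shell
# Made-up input with a different number of fields on each line.
fields=$(printf 'a b c\nd e\nf\n' | awk '{ print NR, NF, $NF }')
echo "$fields"
# 1 3 c
# 2 2 e
# 3 1 f
```

Each output line shows the record number, the field count, and the content of the last field.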

Using the -v option, we can pass an external variable to an awk command. Several external variables can also be passed as NAME=value arguments placed after the program text. Below are examples:

$ SENTENCE="Darin"
$ echo | awk -v NAME="$SENTENCE" '{print NAME}'
Darin

$ one="JUST"
$ two="DARIN"
$ echo | awk '{print NAME1 NAME2}' NAME1="$one" NAME2="$two"
JUSTDARIN

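When a file name follows the assignments, awk reads its records from that file instead of standard input, so the action runs once for every line in the file. The contents of the file are not used here; NAME1 and NAME2 still come from the shell variables $one and $two set earlier.

$ printf 'one="JUST"\ntwo="DARIN"\n' > filename

$ awk '{print NAME1 NAME2}' NAME1="$one" NAME2="$two" filename
JUSTDARIN
JUSTDARIN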
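
One practical difference between the two ways of passing variables is when the assignment takes effect. The sketch below uses a hypothetical shell variable called name and brackets around the value to make an empty string visible.

```shell
name="Darin"
# -v assigns before the program starts, so BEGIN can see the value.
with_v=$(awk -v NAME="$name" 'BEGIN { print "[" NAME "]" }')
# A trailing NAME=value operand is applied only when awk reaches it in the
# argument list, which is after BEGIN has run, so BEGIN sees an empty string.
in_begin=$(awk 'BEGIN { print "[" NAME "]" }' NAME="$name" </dev/null)
# The same operand is visible in the main block.
in_main=$(echo | awk '{ print "[" NAME "]" }' NAME="$name")
echo "$with_v"     # [Darin]
echo "$in_begin"   # []
echo "$in_main"    # [Darin]
```

If a value is needed inside BEGIN, -v is the safer choice.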

By default, fields are separated by whitespace (spaces or tabs). The -F option sets a different field delimiter. Consider this example:

$ awk -F: '{print $1}' /etc/passwd | tail -3
psaftp
darin
nginx

$ awk 'BEGIN { FS=":" } { print $1 }' /etc/passwd | tail -3
psaftp
darin
nginx
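
To see how default splitting differs from an explicit delimiter, here is a short sketch on made-up input lines. The passwd-style line is invented and does not come from a real /etc/passwd.

```shell
# Default splitting: runs of spaces and tabs count as one separator,
# and leading whitespace is ignored.
default_split=$(printf '  alpha   beta\tgamma\n' | awk '{ print NF, $2 }')
# With -F, the given character is the separator and empty fields are kept.
comma_split=$(printf 'alpha,,gamma\n' | awk -F, '{ print NF, $3 }')
# Invented passwd-style line: field 1 is the login name, the last field is the shell.
passwd_style=$(printf 'darin:x:1000:1000::/home/darin:/bin/bash\n' | awk -F: '{ print $1, $NF }')
echo "$default_split"   # 3 beta
echo "$comma_split"     # 3 gamma
echo "$passwd_style"    # darin /bin/bash
```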