AWK programming language




AWK is a programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems. When written in all lowercase letters, as awk, it refers to the Unix or Plan 9 program that runs scripts written in the AWK programming language.


SOME AWK COMMANDS AND USES:

to find all the lines with a string 'foo'
awk '/foo/ {print}' file

to print lines 1 to 2
awk 'NR == 1, NR == 2 {print}' file

to print from line 15000 to the end of the file
awk 'NR >= 15000 {print}' test.csv

to print 2nd and last columns
awk '{print $2,$NF;}' employee.txt
In the print statement, ',' inserts the output field separator (a space by default); writing fields next to each other with no comma concatenates them.
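For example, with a hypothetical input line "101 Alice Sales 5000":
echo "101 Alice Sales 5000" | awk '{print $2,$NF}'    # prints: Alice 5000
echo "101 Alice Sales 5000" | awk '{print $2 $NF}'    # prints: Alice5000 (no comma, so concatenated)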

Printf >>
A dash ('-') in the format specifier means left aligned
awk -F ' ' '{printf "%-10s%-10s%-10s\n", $2,$4,$6}'
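For instance, with a hypothetical input line "1 Alice 2 Sales 3 5000", each of the three values is padded to a width of 10 characters:
echo "1 Alice 2 Sales 3 5000" | awk '{printf "%-10s%-10s%-10s\n", $2,$4,$6}'
# prints: Alice     Sales     5000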

Sprintf >>
Replace the first field by its formatted representation and output the line.
awk '{$1 = sprintf("%4d", $1); print}' infile > outfile.txt

To use bash variables inside an awk command >>>
If the bash variable is the pattern to match >
awk -v ref="$pattern" 'match($0, ref) {print $2}'
If the bash variable is to be printed as output >
awk -v var="$printthis" '/patt/ {print var}'
for ff in out*; do sed -n '/^1.*/p' "$ff" | awk -F':' -v v="$ff" '{printf("%s %e\n", v, $2)}'; done

to get the number of lines >>
awk 'END {print NR}' file_name

Syntax >>
awk 'BEGIN { Actions }   # Actions run before reading the file
{ Actions }              # Actions run for every line in the file
END { Actions }          # Actions run after reading the file
' file_name

example >>
awk 'BEGIN {print "Name\tDesignation\tDepartment\tSalary";}
{print $2,"\t",$3,"\t",$4,"\t",$NF;}
END{print "Report Generated\n--------------";
}' file_name

Get average of 1st column
awk '{ sum += $1 } END { if (NR > 0) print sum / NR }'

to print if the column 1 value is greater than 200
awk '$1 > 200' file_name

The ~ operator matches against regular expressions. To print all the lines which have "text" in the 4th column:
awk '$4 ~/text/' file_name

To count the number of rows with "text" in the 4th column
awk 'BEGIN { count=0;}
$4 ~ /text/ { count++; }
END { print "Number of lines =",count;}' file_name

to find the average of entries which satisfy a condition
awk -F, 'BEGIN {s=0; c=0;} {if($1>1952) {s=s+$2; c++;}} END {print "avg =",s/c}' input_file

Conditions >>
Check if line starts with ## and print second column
awk -F ' ' '(/^##/) {print $2}' file_name

If statement>>
awk -F ' ' ' {if(/^##/) print $2}' file_name


Combined conditions >>
awk '($1>100) && ($1<200) {print $1;}' rgb-only

If statement>>
awk '{if($1>100 && $1<200) print $1;}' rgb-only

If-else statement>>
awk -v vv="$myline" '{if(/^region/) print vv; else print $0}' in.comp.32

Nested if statements >>
If the line begins with Totaltime then increment q and print if q%2 is equal to 1
awk 'BEGIN{q=0;}{if(/^Totaltime=.*/){q++; if(q%2==1) print;}}' Log1




to specify the field separator, use -F
awk -F, '$2>=1950 {print $2}' file_name

Using .awk files >>
Content of the grade.awk file ::
{
total=$3+$4+$5;
avg=total/3;
if ( avg >= 90 ) grade="A";
else if ( avg >= 80) grade ="B";
else if (avg >= 70) grade ="C";
else grade="D";

print $0,"=>",grade;
}

Command to run ::
awk -f grade.awk student-marks
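For a hypothetical student-marks file where fields 3, 4, and 5 are the marks, e.g.
Jones 2143 78 84 77
Gondrol 2321 56 58 45
the command prints each line followed by the computed grade:
Jones 2143 78 84 77 => C
Gondrol 2321 56 58 45 => D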


http://www.hcs.harvard.edu/~dholland/computers/awk.html
lab2 - Data viz >>

get data from line 4 to the end of the file.
sed -n '4,$p' nation.1751_2010.csv >> onlydata

get lines where the year in column 2 is 1950 or later. "-F," indicates fields are
separated by commas.
awk -F, '$2>=1950' onlydata >> onlydata.4m1950

to sort according to the numerical value of column 3. "-t," indicates the
field separator.
sort -nt, -k 3,3  onlydata.4m1950 >>sorted.byemission

To get all the lines starting with the string "UNITED KINGDOM".
awk '/^UNITED KINGDOM,/' onlydata.4m1950 >>uk.csv

to replace commas with tabs while extracting columns 2, 3, and 9.
awk -F, '{print $2"\t"$3"\t"$9}' ../japan.csv >>japan.tsv

remove lines with "CHINA" at the beginning, then sort according to the 3rd column
awk  '!/^CHINA/ {print}' trimed_data | sort -nt, -k 3,3

remove lines beginning with any of the given names, then sort.
awk  '!/^CHINA|^UNITED|^INDIA|^JAPAN/ {print}' trimed_data | sort -nt, -k 3,3


to print lines (tab-separated) where the value in column 2 is 1999 or greater
awk -F'\t' '$2 >=1999 {print}'

to use bash variables in awk, use "'${var_name}'"
year=1950
awk -F'\t' '$2 >="'${year}'" {print}' all.others.4m1950
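An equivalent form using -v (shown earlier) does the same selection here and avoids the tricky nested quoting:
year=1950
awk -F'\t' -v y="$year" '$2 >= y {print}' all.others.4m1950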


to sum the values in column 3 for lines where column 2 equals 1999
awk 'BEGIN{x=0;} $2==1999 {x=x+$3} END {print x}' all.others.4m1950


to sum the values in the third column for each matching year (1950 to 2010)
##################
#!/bin/bash
for i in $(seq 1950 2010)
do
    x=$(awk 'BEGIN{x=0;} $2=="'${i}'" {x=x+$3} END {print x}' all.others.4m1950)
    echo "$i,$x"
done
##################


to get the mean of columns 2, 3, 4, and 5 (excluding the 1st row) from several files and print it with the file name
#!/bin/bash
for file in 60 80 120 158 182 200 216
do
    printf "%s\t" "$file"
    awk 'BEGIN{x1=0; x2=0; x3=0; x4=0; count=0;}
    NR>1 {x1+=$2; x2+=$3; x3+=$4; x4+=$5; count++;}
    END{print x1/count"\t"x2/count"\t"x3/count"\t"x4/count}' ./$file
done

to do a similar calculation with sed (this is for one file, and one column, at a time)
sed -n '2,$p' 20 |awk 'BEGIN{x=0} {x+=$2} END{print x/NR}'


Combine two files with matching values >>
file2 contains a unique ID in column 1 ($1), and columns 2 and 3 hold the values that need to be appended to the end of the lines in file1 wherever the unique ID matches file1's 1st column.

awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1

Explanation ::
    FNR==NR > true when the current file's record number (FNR) equals the total number of records read so far (NR). This holds only while the first file is being read (i.e. file2 in this case).
    {a[$1]=$2 FS $3;next} > builds an associative array keyed by $1, with $2 and $3 joined by the field separator FS as the value. This runs only for the first file, since it is the action for the condition above; next makes awk jump to the next line without executing the rest of the program.
    { print $0, a[$1]} > prints the whole line ($0) followed by the value stored in the array a[] under the key $1.
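A hypothetical run with two small sample files illustrates the result. Suppose file1 and file2 contain:
file1:              file2:
id1 apple           id1 10 0.5
id2 orange          id2 20 0.7

Then the command above prints:
id1 apple 10 0.5
id2 orange 20 0.7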


Subtract a value in a column from a value in a column of a different file >>
Here, the values in the 3rd column of file 'bb' are subtracted from the values in the 3rd column of file 'tt'

awk '{c1 = $1; c2=$2; c3=$3; getline<"bb"; print c1,c2,c3,$3, c3-$3;}' <tt
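A hypothetical run, assuming both files have three whitespace-separated columns:
tt:                 bb:
a 1 10              a 1 3
b 2 20              b 2 5

awk '{c1 = $1; c2=$2; c3=$3; getline<"bb"; print c1,c2,c3,$3, c3-$3;}' <tt
a 1 10 3 7
b 2 20 5 15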

