Getting stats with awk, sort and uniq

I find myself sorting through logs all the time, and I have developed a couple of tricks for pulling out the information I need. With a little awk, sort and uniq magic you can get a great deal of info out of your logs.

Say you are looking at the log from a Pix firewall.


Apr 12 00:00:11 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.1/12345 dst DMZ:192.168.2.5/80 \
  by access-group "outside"
Apr 12 00:00:12 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.2/54321 dst DMZ:192.168.2.5/25 \
  by access-group "outside"
Apr 12 00:00:13 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.3/58453 dst DMZ:192.168.2.5/53 \
  by access-group "outside"
Apr 12 00:00:14 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.1/12346 dst DMZ:192.168.2.5/80 \
  by access-group "outside"
Apr 12 00:00:15 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.2/54322 dst DMZ:192.168.2.5/25 \
  by access-group "outside"
Apr 12 00:00:16 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.3/58454 dst DMZ:192.168.2.5/53 \
  by access-group "outside"
Apr 12 00:00:17 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.1/12347 dst DMZ:192.168.2.5/80 \
  by access-group "outside"
Apr 12 00:00:18 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.2/54323 dst DMZ:192.168.2.5/25 \
  by access-group "outside"
Apr 12 00:00:19 192.168.1 %PIX-1: Deny tcp src \
  outside:1.1.1.3/58455 dst DMZ:192.168.2.5/53 \
  by access-group "outside"

You may have thousands of lines like this, but you just want a list of the source and destination addresses. You can use awk’s print statement to output only the source and destination columns. By default awk splits each line into columns on whitespace, so if you count the columns in the above example you will see that the source address is in column 9 and the destination is in column 11. So you can use:


awk '{ print "src " $9  "  dst " $11 }' logfile.log

and this will give you


src outside:1.1.1.1/12345  dst DMZ:192.168.2.5/80
src outside:1.1.1.2/54321  dst DMZ:192.168.2.5/25
src outside:1.1.1.3/58453  dst DMZ:192.168.2.5/53
src outside:1.1.1.1/12346  dst DMZ:192.168.2.5/80
src outside:1.1.1.2/54322  dst DMZ:192.168.2.5/25
src outside:1.1.1.3/58454  dst DMZ:192.168.2.5/53
src outside:1.1.1.1/12347  dst DMZ:192.168.2.5/80
src outside:1.1.1.2/54323  dst DMZ:192.168.2.5/25
src outside:1.1.1.3/58455  dst DMZ:192.168.2.5/53
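
Counting columns by eye gets error-prone on long lines. One way to double-check is to have awk number the fields of the first line for you. This is only a small sketch: the temporary file stands in for your real log and holds a single entry copied from the sample above, written on one line.

```shell
# Hypothetical stand-in for logfile.log: one entry from the sample, on a single line.
log=$(mktemp)
printf '%s\n' 'Apr 12 00:00:11 192.168.1 %PIX-1: Deny tcp src outside:1.1.1.1/12345 dst DMZ:192.168.2.5/80 by access-group "outside"' > "$log"

# Print each field of the first line prefixed with its column number, then stop.
fields=$(awk '{ for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$log")
echo "$fields"

rm -f "$log"
```

The lines numbered 9 and 11 in that listing should be the source and destination, which is what the print command above relies on.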

Now you might want totals showing which source addresses are most prevalent. (The “\” at the end of a line means that it all goes on one line.)


awk '{ sub(/\/.*/,"",$9) } { print "src " $9 }' \
  logfile.log | sort | uniq -c

and this will give you…


3 src outside:1.1.1.1
3 src outside:1.1.1.2
3 src outside:1.1.1.3

What you are doing is removing everything from the “/” onward in column 9 to get rid of the ever-changing port number. Then you pipe that to sort, which puts identical lines next to each other, and on to uniq -c, which collapses each run of identical lines into a single line prefixed with a count.
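
With real traffic the counts will not all be equal, and you usually want the noisiest sources at the top. A second, numeric reverse sort on the output of uniq -c is one way to get that. The sketch below uses a made-up six-line sample (the mix of addresses is invented for illustration, not taken from the log above) so that the ranking is visible.

```shell
# Hypothetical sample data in the same layout as the log above (source in column 9).
log=$(mktemp)
cat > "$log" <<'EOF'
Apr 12 00:00:11 192.168.1 %PIX-1: Deny tcp src outside:1.1.1.1/12345 dst DMZ:192.168.2.5/80 by access-group "outside"
Apr 12 00:00:12 192.168.1 %PIX-1: Deny tcp src outside:1.1.1.2/54321 dst DMZ:192.168.2.5/25 by access-group "outside"
Apr 12 00:00:13 192.168.1 %PIX-1: Deny tcp src outside:1.1.1.1/12346 dst DMZ:192.168.2.5/80 by access-group "outside"
Apr 12 00:00:14 192.168.1 %PIX-1: Deny tcp src outside:1.1.1.3/58453 dst DMZ:192.168.2.5/53 by access-group "outside"
Apr 12 00:00:15 192.168.1 %PIX-1: Deny tcp src outside:1.1.1.1/12347 dst DMZ:192.168.2.5/80 by access-group "outside"
Apr 12 00:00:16 192.168.1 %PIX-1: Deny tcp src outside:1.1.1.2/54322 dst DMZ:192.168.2.5/25 by access-group "outside"
EOF

# Strip the port, count each source, then order by count with the highest first.
top=$(awk '{ sub(/\/.*/, "", $9); print $9 }' "$log" | sort | uniq -c | sort -rn)
echo "$top"

rm -f "$log"
```

Here 1.1.1.1 comes out on top with a count of 3. On a large log you could add | head -10 to the end of the pipeline to keep only the ten busiest sources.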
