2010
08.02

In case anyone else is interested in counting requests per IP without the use of some scary sed/awk, here’s a combination of shell commands that I found very useful:

grep 'text' /path/to/access.log | cut -d' ' -f1 | sort | uniq -c | sort -rn

It breaks down like this:

  1. You grep for whatever string you’re interested in inside your access log. You might want to find a certain path, a certain user-agent, etc. The grep could really be replaced by any command, so long as the result is lines from a standard Apache access log.
  2. The result is piped to cut, which splits each line on spaces (-d' ') and grabs the first field (-f1). In the default log format, that field is the client IP.
  3. The result is piped to sort, which sorts the result, putting identical IPs next to each other. Remember, one line corresponds to one request.
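  4. The result is piped to uniq, which collapses each run of adjacent identical IPs into a single line (this is why the sort has to come first). The -c option causes uniq to also prefix each line with the number of times the IP occurred. This is important since we’re interested in frequency.
  5. Finally, we pipe the result through sort one more time, which sorts the results by their frequency. The -r option puts the most frequent IPs at the top; pair it with -n (sort -rn) so the counts are compared as numbers rather than as text.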
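
To see the whole chain in action without touching a real log, here is a small sketch that runs it against a throwaway file. The log lines, the IPs (taken from the 192.0.2.0/24 documentation range), and the 'index.html' search string are all made up for illustration; only the pipeline itself comes from the steps above.

```shell
# Hypothetical sample data: four requests from two made-up IPs.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [02/Aug/2010:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
192.0.2.20 - - [02/Aug/2010:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
192.0.2.10 - - [02/Aug/2010:10:00:02 +0000] "GET /about.html HTTP/1.1" 200 256
192.0.2.10 - - [02/Aug/2010:10:00:03 +0000] "GET /index.html HTTP/1.1" 304 0
EOF

# Keep only requests mentioning index.html, extract the IP,
# group identical IPs, count them, and list the busiest first.
result=$(grep 'index.html' "$log" | cut -d' ' -f1 | sort | uniq -c | sort -rn)
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$log"
```

Three of the four sample lines match the grep, so 192.0.2.10 should come out on top with a count of 2, followed by 192.0.2.20 with a count of 1. The exact padding in front of each count depends on your version of uniq.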

Kudos to http://blogs.law.harvard.edu/djcp/2009/04/how-to-extract-uniq-ips-from-apache-via-grep-cut-and-uniq/, which pointed me in the right direction to begin with.

Note: As Frank mentions in the comments, this tip applies equally well to other web servers (e.g. lighttpd, Nginx) when they’re using their default log format. In fact, the basic principles can be applied to just about any data: you don’t have to be looking at IPs in an access log. The cut/sort/uniq/sort chaining should work well no matter what kind of textual data you’re looking at!
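
As a rough illustration of that last point, the same chain can count something that has nothing to do with IPs. The sketch below assumes colon-separated, passwd-style records (the account names and shells are invented) and counts how many accounts use each login shell, simply by changing the delimiter and field number given to cut.

```shell
# Hypothetical passwd-style records: name:password:uid:gid:comment:home:shell
data=$(mktemp)
cat > "$data" <<'EOF'
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/zsh
carol:x:1002:1002::/home/carol:/bin/bash
daemon:x:1:1::/usr/sbin:/usr/sbin/nologin
EOF

# Same chain, different delimiter (:) and field (7, the login shell).
shells=$(cut -d: -f7 "$data" | sort | uniq -c | sort -rn)
printf '%s\n' "$shells"

# Clean up the temporary file.
rm -f "$data"
```

With this sample data, /bin/bash should be listed first with a count of 2, and the other two shells follow with a count of 1 each.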