In case anyone else is interested in counting requests per IP without the use of some scary sed/awk, here’s a combination of shell commands that I found very useful:
grep 'text' /path/to/access.log | cut -d' ' -f1 | sort | uniq -c | sort -rn
It breaks down like this:
- You grep for whatever string you’re interested in inside your access log. You might want to find a certain path, a certain user agent, etc. The grep could really be replaced by any command, so long as the result is lines from a standard Apache access log.
- The result is piped to cut, which splits each line on spaces and grabs the first field (the client IP, which comes first in the default log format).
- The result is piped to sort, which sorts the result, putting identical IPs next to each other. Remember, one line corresponds to one request.
- The result is piped to uniq, which collapses runs of identical (sorted) IPs into one line each. The -c option causes uniq to also prefix each line with the number of occurrences. This is important since we’re interested in frequency.
- Finally, we pipe the result through sort one more time, which sorts the lines by their counts. The -r option puts the most frequent IPs at the top, and adding -n (i.e. sort -rn) makes the comparison numeric rather than lexical, which is more robust.
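To see the whole pipeline in action, here’s a minimal sketch using a few fabricated log lines in the default (combined) Apache format — the IPs, paths, and timestamps are made up for illustration:

```shell
#!/bin/sh
# Three fake requests: two from 10.0.0.1, one from 10.0.0.2.
printf '%s\n' \
  '10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512' \
  '10.0.0.2 - - [10/Oct/2024:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 512' \
  '10.0.0.1 - - [10/Oct/2024:13:55:38 +0000] "GET /about.html HTTP/1.1" 200 256' \
  | grep 'GET' \
  | cut -d' ' -f1 \
  | sort \
  | uniq -c \
  | sort -rn
```

The output is two lines, with 10.0.0.1 on top with a count of 2 and 10.0.0.2 below it with a count of 1 (the exact column padding depends on your uniq implementation).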
Kudos to http://blogs.law.harvard.edu/djcp/2009/04/how-to-extract-uniq-ips-from-apache-via-grep-cut-and-uniq/, which pointed me in the right direction to begin with.
Note: As Frank mentions in the comments, this tip applies just as well to other web servers (e.g. lighttpd, Nginx) when they’re using their default log format. In fact, the basic principle can be applied to just about any data: you don’t have to be looking at IPs in an access log. The cut/sort/uniq/sort chain should work well no matter what kind of textual data you’re looking at!
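As a sketch of that generalization, the same chain counts any field you point cut at. Here it tallies requested paths instead of IPs; the field number 7 assumes the default space-delimited Apache/Nginx format (the path is the seventh field), so adjust it if your LogFormat differs. The log lines are again fabricated:

```shell
#!/bin/sh
# Count requests per path rather than per IP:
# field 7 of the default log format is the request path.
printf '%s\n' \
  '10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512' \
  '10.0.0.2 - - [10/Oct/2024:13:55:37 +0000] "GET /about.html HTTP/1.1" 200 256' \
  '10.0.0.1 - - [10/Oct/2024:13:55:38 +0000] "GET /index.html HTTP/1.1" 200 512' \
  | cut -d' ' -f7 \
  | sort \
  | uniq -c \
  | sort -rn
```

Swapping the field number is the only change needed; everything downstream of cut is oblivious to what the strings actually are.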