- not so Frequently Asked Questions - update 2004/11/29
Analyze and count httpd access_log

Because the access log of a Web server grows very large within a short period, it is usually rotated at a fixed interval, such as every week or every month. Here we assume that the log file is rotated automatically every month, which is a very common setup for WWW servers. The default log file of Apache (access_log) contains lines in the following format.

    host.domain - - [01/Jan/2000:01:23:45 +0900] "GET /index.html HTTP/1.1" 200 1548
    host.domain - - [01/Jan/2000:01:23:50 +0900] "GET /icons/mail.png HTTP/1.1" 200 229

Now let's count how many accesses there are in each 24-hour period. What we need is the "date" part (01) in [01/Jan/2000: . We count the number of lines that share the same "date", for every day from the 1st to the 31st of the month.

    #!/usr/bin/perl
    # Count accesses to HTML files for each day of the month.
    use strict;
    use warnings;

    my @count;
    while (<>) {
        if (/\.html/) {                          # keep only lines that mention an HTML file
            my @field = split;                   # split the line on white space
            my $day = substr($field[3], 1, 2);   # "[01/Jan/2000:..." -> "01"
            $count[$day]++;
        }
    }
    # Print one "day count" pair per line; days without accesses print 0.
    for my $i (1 .. $#count) {
        printf("%10d %10d\n", $i, $count[$i] || 0);
    }

The access_log file records every kind of Web access, including requests for image files, so lines other than accesses to HTML files have to be excluded. That is what the if line does; by changing its pattern you can also count accesses to one particular file. Each matching line is first split into fields (the delimiter is white space), then the "date" part is cut out of the fourth field with substr and stored in the variable $day, and the counter for that day is incremented.

We named the Perl script above "webplot.pl". The following is an example of the access statistics of a Web server for January 2000, produced by "webplot.pl".

             1        172
             2        321
             3        208
             4        279
             5        327
          ....       ....
            25        588
            26       1038
            27        848
            28        772
            29        570
            30        495
            31        548

The following shows a graph drawn by gnuplot on the dumb terminal. The < at the beginning of "< webplot.pl access_log" tells gnuplot to run that command and read its output as the data to plot. For this to work, webplot.pl must be executable and found on the command search path.
    gnuplot> set term dumb
    Terminal type set to 'dumb'
    Options are 'feed 79 24'
    gnuplot> plot "< webplot.pl access_log" with step

[ASCII plot: step graph of "< webplot.pl access_log"; x axis from 0 to 35 (day of the month), y axis from 0 to 1600 (number of accesses)]
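The text above notes that accesses to one particular file can be counted by changing the if line. A minimal sketch of that variation follows. It is an illustration under stated assumptions, not part of the original script: the helper name count_by_day and the script name webcount.pl are hypothetical, the log lines are assumed to follow the format shown earlier, and every day from 1 to 31 is printed (with 0 for days without a match) so that the plot has no gaps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): read log lines from the
# file handle $fh and count, per day of month, the requests whose path
# matches the regular expression $pattern.
sub count_by_day {
    my ($fh, $pattern) = @_;
    my @count = (0) x 32;    # indexes 1..31 are used, index 0 is ignored
    while (my $line = <$fh>) {
        # Assumed line format:
        # host - - [01/Jan/2000:01:23:45 +0900] "GET /index.html HTTP/1.1" 200 1548
        next unless $line =~ m{\[(\d\d)/[^\]]*\]\s+"\S+\s+(\S+)};
        my ($day, $path) = ($1, $2);
        $count[$day]++ if $path =~ $pattern;
    }
    return @count;
}

# Run only when a pattern and a log file are given on the command line.
if (@ARGV == 2) {
    my ($pattern, $file) = @ARGV;
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    my @count = count_by_day($fh, qr/$pattern/);
    close($fh);
    # Same output format as webplot.pl: day, then count.
    printf("%10d %10d\n", $_, $count[$_]) for 1 .. 31;
}
```

Saved under the hypothetical name webcount.pl, it could be run as `webcount.pl '^/index\.html$' access_log`. Matching against the request path rather than the whole line avoids counting lines in which the file name appears only in some other field. In months with fewer than 31 days, the trailing days are simply printed with a count of 0.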