yhdev/dat/dump-awshrink
Yorhel b7746b3122 Redesigned and rewrote the website
Moved most pages out of the index.cgi and into a POD file while I was at
it.
2012-01-17 20:23:53 +01:00

102 lines
4 KiB
Text

=pod
People who run AWStats on large log files have most likely noticed: the data
files can grow quite large, resulting in both a waste of disk space and longer
page generation times for the AWStats pages. I wrote a small script that
analyzes these data files and can remove any information you think is
unnecessary.
B<Download:> L<awshrink|http://dev.yorhel.nl/download/code/awshrink> (copy to
/usr/bin to install).
=head2 Important
Do B<NOT> use this script on data files that are not completed yet (i.e. data
files of the month you're living in). This will result in inaccurate sorting of
visits, pages, referers and whatever other list you're shrinking. Also, keep
in mind that this is just a fast written perl hack, it is by no means fast and
may hog some memory while shrinking data files.
=head2 Usage
awshrink [-c -s] [-SECTION LINES] [..] datafile
-s Show statistics
-c Overwrite datafile instead of writing to a backupfile (datafile~)
-SECTION LINES
Shrink the selected SECTION to LINES lines. (See example below)|;
=head2 Typical command-line usage
While awshrink is most useful for monthly cron jobs, here's an example of basic
command line usage to demonstrate what the script can do:
$ wc -c awstats122007.a.txt
29916817 awstats122007.a.txt
$ awshrink -s awstats122007.a.txt
Section Size (Bytes) Lines
SCREENSIZE* 74 0
WORMS 131 0
EMAILRECEIVER 135 0
EMAILSENDER 143 0
CLUSTER* 144 0
LOGIN 155 0
ORIGIN* 178 6
ERRORS* 229 10
SESSION* 236 7
FILETYPES* 340 12
MISC* 341 10
GENERAL* 362 8
OS* 414 29
SEREFERRALS 587 34
TIME* 1270 24
DAY* 1293 31
ROBOT 1644 40
BROWSER 1992 127
DOMAIN 2377 131
UNKNOWNREFERERBROWSER 5439 105
UNKNOWNREFERER 20585 317
SIDER_404 74717 2199
PAGEREFS 130982 2500
KEYWORDS 288189 27036
SIDER 1058723 25470
SEARCHWORDS 5038611 157807
VISITOR 23285662 416084
* = not shrinkable
$ awshrink -s -c -VISITOR 100 -SEARCHWORDS 100 -SIDER 100 awstats122007.a.txt
Section Size (Bytes) Lines
SCREENSIZE* 74 0
WORMS 131 0
EMAILRECEIVER 135 0
EMAILSENDER 143 0
CLUSTER* 144 0
LOGIN 155 0
ORIGIN* 178 6
ERRORS* 229 10
SESSION* 236 7
FILETYPES* 340 12
MISC* 341 10
GENERAL* 362 8
OS* 414 29
SEREFERRALS 587 34
TIME* 1270 24
DAY* 1293 31
ROBOT 1644 40
BROWSER 1992 127
SEARCHWORDS 2289 100
DOMAIN 2377 131
SIDER 3984 100
UNKNOWNREFERERBROWSER 5439 105
VISITOR 5980 100
UNKNOWNREFERER 20585 317
SIDER_404 74717 2199
PAGEREFS 130982 2500
KEYWORDS 288189 27036
* = not shrinkable
$ wc -c awstats122007.a.txt
546074 awstats122007.a.txt