=pod

Anyone who runs AWStats on large log files has most likely noticed that its
data files can grow quite large, which both wastes disk space and slows down
generation of the AWStats pages. I wrote a small script that analyzes these
data files and can remove any information you think is unnecessary.

B<Download:> L<awshrink|http://dev.yorhel.nl/download/code/awshrink> (copy to
/usr/bin to install).


=head2 Important

Do B<NOT> use this script on data files that are not complete yet (i.e. the
data file for the current month). Doing so will result in inaccurate sorting of
visits, pages, referers and whatever other lists you're shrinking. Also, keep
in mind that this is just a quickly written Perl hack: it is by no means fast
and may hog some memory while shrinking data files.


=head2 Usage

  awshrink [-c -s] [-SECTION LINES] [..] datafile
  -s  Show statistics
  -c  Overwrite datafile instead of writing to a backupfile (datafile~)
  -SECTION LINES
    Shrink the selected SECTION to LINES lines. (See example below)


=head2 Typical command-line usage

While awshrink is most useful for monthly cron jobs, here's an example of basic
command line usage to demonstrate what the script can do:

  $ wc -c awstats122007.a.txt
  29916817 awstats122007.a.txt

  $ awshrink -s awstats122007.a.txt
                 Section  Size (Bytes)   Lines
             SCREENSIZE*            74       0
                  WORMS            131       0
          EMAILRECEIVER            135       0
            EMAILSENDER            143       0
                CLUSTER*           144       0
                  LOGIN            155       0
                 ORIGIN*           178       6
                 ERRORS*           229      10
                SESSION*           236       7
              FILETYPES*           340      12
                   MISC*           341      10
                GENERAL*           362       8
                     OS*           414      29
            SEREFERRALS            587      34
                   TIME*          1270      24
                    DAY*          1293      31
                  ROBOT           1644      40
                BROWSER           1992     127
                 DOMAIN           2377     131
  UNKNOWNREFERERBROWSER           5439     105
         UNKNOWNREFERER          20585     317
              SIDER_404          74717    2199
               PAGEREFS         130982    2500
               KEYWORDS         288189   27036
                  SIDER        1058723   25470
            SEARCHWORDS        5038611  157807
                VISITOR       23285662  416084
  * = not shrinkable

  $ awshrink -s -c -VISITOR 100 -SEARCHWORDS 100 -SIDER 100 awstats122007.a.txt
                 Section  Size (Bytes)   Lines
             SCREENSIZE*            74       0
                  WORMS            131       0
          EMAILRECEIVER            135       0
            EMAILSENDER            143       0
                CLUSTER*           144       0
                  LOGIN            155       0
                 ORIGIN*           178       6
                 ERRORS*           229      10
                SESSION*           236       7
              FILETYPES*           340      12
                   MISC*           341      10
                GENERAL*           362       8
                     OS*           414      29
            SEREFERRALS            587      34
                   TIME*          1270      24
                    DAY*          1293      31
                  ROBOT           1644      40
                BROWSER           1992     127
            SEARCHWORDS           2289     100
                 DOMAIN           2377     131
                  SIDER           3984     100
  UNKNOWNREFERERBROWSER           5439     105
                VISITOR           5980     100
         UNKNOWNREFERER          20585     317
              SIDER_404          74717    2199
               PAGEREFS         130982    2500
               KEYWORDS         288189   27036
  * = not shrinkable

  $ wc -c awstats122007.a.txt
  546074 awstats122007.a.txt

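For the monthly cron job mentioned above, a wrapper along the following lines
could work. This is only a sketch under a few assumptions: the data directory
(/var/lib/awstats), the environment variable names, the helper function
C<prev_month_stamp> and the crontab path are all hypothetical, and the config
name "a" and the limits of 100 lines are simply taken from the example above.
It always targets the data file of the previous month, so the file of the
current month is left alone.

```shell
#!/bin/sh
# Sketch of a monthly awshrink wrapper. Paths, variable names and limits
# are assumptions; adjust them to your own AWStats setup.

# prev_month_stamp MONTH YEAR
# Prints the MMYYYY stamp of the month before MONTH/YEAR,
# matching the awstatsMMYYYY.config.txt naming seen above.
prev_month_stamp() {
    m=${1#0}    # strip a leading zero so "08" is not read as octal
    y=$2
    if [ "$m" -eq 1 ]; then
        m=12
        y=$((y - 1))
    else
        m=$((m - 1))
    fi
    printf '%02d%04d\n' "$m" "$y"
}

DATADIR=${AWSTATS_DATADIR:-/var/lib/awstats}   # hypothetical location
CONFIG=${AWSTATS_CONFIG:-a}                    # config name from the example

stamp=$(prev_month_stamp "$(date +%m)" "$(date +%Y)")
file="$DATADIR/awstats$stamp.$CONFIG.txt"

# Only shrink when last month's data file actually exists.
if [ -f "$file" ]; then
    awshrink -c -VISITOR 100 -SEARCHWORDS 100 -SIDER 100 "$file"
fi

# Possible crontab entry (hypothetical script path), run on the 2nd of
# each month at 03:00:
#   0 3 2 * * /usr/local/bin/awshrink-monthly
```

Make sure the job runs only after AWStats has processed the last log entries
of the previous month; otherwise that data file is not complete yet and the
warning under "Important" applies.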