This fixes the selection of the right man page for 'mount', which would
otherwise pick the version from an old distro that happened to ship it
with an explicit 'en' locale.

This change is primarily aimed at reducing the size of the old 'man'
table (now 'files') by using smaller integers to refer to man contents
and text fields, and by storing a shorthash as an integer for quick
lookups. The improved normalization also removes the need to keep a
separate 'man_index' cache for the search function.
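Here is one way to store a shorthash as an integer. This is a minimal sketch under stated assumptions, not the actual schema code. It assumes the contents are hashed with SHA-1 and that the shorthash is the first 4 bytes of the digest, read as a signed 32-bit value so it fits a PostgreSQL 'integer' column. The function names are hypothetical.

```python
import hashlib
import struct

def shorthash(content: bytes) -> int:
    """Hypothetical: take the first 4 bytes of a SHA-1 digest as a signed
    big-endian 32-bit integer, so it fits a 4-byte integer column."""
    digest = hashlib.sha1(content).digest()
    (value,) = struct.unpack(">i", digest[:4])
    return value

def shorthash_to_hex(value: int) -> str:
    """Render the integer back as 8 hex characters, e.g. for use in URLs."""
    return struct.pack(">i", value).hex()
```

An equality lookup on a 4-byte integer column is cheaper than comparing hex strings, and its index is smaller.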
The old schema wasn't necessarily bad, but I was in the mood for some
optimizations. And a little cleanup.
This probably introduces a bunch of new bugs; I haven't tested it very well.
Also greatly simplified basename_from_filename() because apparently I
couldn't write regexes back then.
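Here is a simple regex-based basename_from_filename(). This is a hypothetical sketch, not the function from this commit. It assumes paths like '/usr/share/man/man8/mount.8.gz', an optional compression suffix, and a section suffix made of a digit (or 'n') followed by optional letters, such as '.3p' or '.1ssl'. The suffix list is an assumption.

```python
import re

# Assumed compression extensions; not taken from the real indexer.
_COMPRESSION = re.compile(r"\.(?:gz|bz2|xz|lzma|zst|Z)$")
# Section suffix in the last dot-segment: '.8', '.3p', '.3pm', '.n', ...
_SECTION = re.compile(r"\.[0-9n][^./]*$")

def basename_from_filename(path: str) -> str:
    name = path.rsplit("/", 1)[-1]     # drop directory components
    name = _COMPRESSION.sub("", name)  # drop compression extension, if any
    return _SECTION.sub("", name)      # drop the section suffix
```

Because `[^./]*` cannot cross a dot, only the final segment is treated as the section. Names like 'systemd.unit.5' therefore keep their inner dots.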
(The removed REFERENCES line syncs schema.sql with the actual state of
the DB, which lacks that constraint for some reason. I'll probably fix
that later.)

We've got a lot of packages in the DB that have long since been removed
from the Arch repos. These are still indexed, but no longer clutter the
package listing.
Also fixed an issue with packages.id numbers growing rather large
because the indexer allocated a new ID for every package on every
update.
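One way to stop IDs from growing is to reuse an existing row's ID instead of inserting a new one. This is a minimal sketch that uses sqlite3 for self-containment, not the actual indexer fix. The table layout and get_package_id() are hypothetical. It looks the package up first and inserts only if it's missing, so re-indexing an unchanged package allocates nothing. In PostgreSQL, even `INSERT ... ON CONFLICT DO NOTHING` draws a sequence value before it detects the conflict, so checking first avoids those gaps.

```python
import sqlite3

def get_package_id(conn: sqlite3.Connection, name: str, version: str) -> int:
    """Return the id of an existing (name, version) row, inserting only if
    it is absent, so repeated updates do not allocate fresh ids."""
    row = conn.execute(
        "SELECT id FROM packages WHERE name = ? AND version = ?", (name, version)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO packages (name, version) VALUES (?, ?)", (name, version)
    )
    return cur.lastrowid

# Hypothetical demo table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE packages (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           version TEXT NOT NULL,
           UNIQUE (name, version))"""
)
```

Calling get_package_id() twice with the same name and version returns the same ID. A new version gets a new one.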

The encoding metadata will be very useful for finding badly decoded man
pages. The package 'arch' is needed to properly identify which package
was used, which is no longer obvious now that I'm going to switch more
systems to the (more common) x86_64 architecture.