Performance improvement. The ON CONFLICT DO UPDATE was primarily to make
sure that old man pages would get fixed when the indexer did a better
job at detecting the encoding, but there haven't been any relevant fixes
to the indexer lately so this forced-update won't do much now.
Unfortunately, this can lead to slightly confusing scenarios, because
the exact package of the displayed man page is not very well defined.
It's possible that, when browsing from a package listing to a man page,
you may see an included file that does not come from the package you
browsed from.
E.g. https://manned.org/pwrite/5f2909f6 - that man page simply includes
pread.2, but from the URL it's unclear from which package or system it
should be included.
The only way to fix this is to add the package ID to the link format.
Unlike tar, cpio does not have a separate entry for each directory, so
the link resolution can't assume that directory entries exist for each
path component.
I also mistakenly assumed that cpio handled hardlinks similarly to tar,
but that's clearly not the case. libarchive does help a bit, but these
differences still suck.
- Fix segfault on empty output (bug was in XS code)
- Still better end-of-URL detection
- Recognize a few common multicharacter sections in man references
- Grotty escape sequences are now better interpreted. I feel rather
stupid for not realizing the idea behind how those codes are supposed
to work earlier. It finally hit me when I read the BSD ul(1) source
code.
- URL end detection is slightly better (much better than the old C code)
- Man page references with : are recognized now (common in Perl modules).
- More efficient HTML escaping, no need to escape > and ".
There's still a bunch of improvements to make, but I have much more
confidence in the current implementation already.