Unlike tar, cpio does not have a separate entry for each directory, so
the link resolution can't assume that directory entries exist for each
path component.
I also mistakenly assumed that cpio handled hardlinks similarly to tar,
but that's clearly not the case. libarchive does help a bit, but these
differences still suck.
- Fix segfault on empty output (bug was in XS code)
- Still better end-of-URL detection
- Recognize a few common multicharacter sections in man references
- Grotty escape sequences are now better interpreted. I feel rather
stupid for not realizing the idea behind how those codes are supposed
to work earlier. It finally hit me when I read the BSD ul(1) source
code.
- URL end detection is slightly better (much better than the old C code)
- Man page references with : are recognized now (common in Perl modules).
- More efficient HTML escaping, no need to escape > and ".
There's still a bunch of improvements to make, but I have much more
confidence in the current implementation already.
The previous C code was troublesome.
- Didn't handle long lines
- I couldn't convince myself that it was free of memory safety issues
- Needed improving anyway, there are some formatting bugs. These are
hard to fix in the current code.
I mostly replicated the formatting bugs of the old C implementation in
Rust, and possibly added a few new bugs as well. It's not a significant
improvement right now, more testing and fixing will be needed.
The performance of both implementations is comparable, with the Rust
version being slightly faster in many cases (and slower in some others).
I did spend more time trying to optimize this Rust version than I did
with the old C code. I initially tried a naive-ish conversion of the C
code to Rust, but that turned out to be much slower and I had to resort
to using regexes and different data structures fix that.
HTTPS isn't used, so removing it saves some space.
The std SipHash API has been deprecated, and since hashing performance
isn't exactly critical in this case I've replaced it with SHA1, which
was already being used in man.rs.
Further improvements can be gained by caching the results of
get_contents(), since the same Contents file is often parsed multiple
times in a single cron run. But this is already a significant
achievement.
This is where the old nav menu used to be. This involved shrinking the
width of the locations/versions selector, but that never needed the full
page width anyway. Unfortunately I suck at CSS so the nav menu and
selector thing won't look too great on smaller screen sizes; but that's
just a minor visual uglyness.
The encoding metadata will be very useful in finding badly decoded man
pages. The package 'arch' is necessary to properly identify which
package was used, which is not obvious now that I'm going to switch more
systems to the (more common) x86_64 arch.