168 commits

Author SHA1 Message Date
Yorhel
06694fd131 Style changes 2017-01-20 09:55:43 +01:00
Yorhel
8235fb28b8 indexer: Fix link resolution and hardlink handling for rpm
Unlike tar, cpio does not have a separate entry for each directory, so
the link resolution can't assume that directory entries exist for each
path component.

I also mistakenly assumed that cpio handled hardlinks similarly to tar,
but that's clearly not the case. libarchive does help a bit, but these
differences still suck.
2017-01-18 13:07:42 +01:00
Yorhel
608f79eb93 indexer: Add support for indexing RPM repositories
This code hasn't been thoroughly tested, I'll see how things go when
indexing a live repo.

And XML parsing sucks in every language.
2017-01-17 17:05:03 +01:00
Yorhel
f77db5f541 indexer: Add bare RPM directory indexing
This is for a few special cases, most RPM repos will have proper
metadata and all.
2017-01-17 12:50:25 +01:00
Yorhel
d720441fb4 indexer: Rust crate updates 2017-01-17 11:01:11 +01:00
Yorhel
1923b9901d Support bold+italic in HTML conversion 2017-01-16 09:52:32 +01:00
Yorhel
746889851c A few more HTML conversion improvements
- Fix segfault on empty output (bug was in XS code)
- Still better end-of-URL detection
- Recognize a few common multicharacter sections in man references
2017-01-15 20:27:16 +01:00
Yorhel
1ccc86ce86 Whole bunch of HTML conversion improvements
- Grotty escape sequences are now better interpreted. I feel rather
  stupid for not realizing the idea behind how those codes are supposed
  to work earlier. It finally hit me when I read the BSD ul(1) source
  code.
- URL end detection is slightly better (much better than the old C code)
- Man page references with : are recognized now (common in Perl modules).
- More efficient HTML escaping, no need to escape > and ".

There's still a bunch of improvements to make, but I have much more
confidence in the current implementation already.
2017-01-15 17:07:03 +01:00
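The escaping change described in the commit above can be illustrated with a minimal sketch. It is not the code from the repository: the function name `escape_html` is hypothetical, and it assumes the output is only ever placed in element content, never inside an attribute value, which is why `>` and `"` can be left alone.

```rust
/// Escape text for use as HTML element content.
/// Only `&` and `<` are replaced; `>` and `"` pass through unchanged.
/// Assumption: the result is never placed inside an attribute value.
fn escape_html(input: &str) -> String {
    // Reserve at least the input length; escapes only make the output longer.
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            _ => out.push(c),
        }
    }
    out
}
```

Calling `escape_html("a < b")` yields `a &lt; b`, while a string containing only `>` or `"` comes back unmodified.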
Yorhel
6114b17389 Experimental rewrite of grotty to html conversion in Rust
The previous C code was troublesome.
- Didn't handle long lines
- I couldn't convince myself that it was free of memory safety issues
- Needed improving anyway, there are some formatting bugs. These are
  hard to fix in the current code.

I mostly replicated the formatting bugs of the old C implementation in
Rust, and possibly added a few new bugs as well. It's not a significant
improvement right now, more testing and fixing will be needed.

The performance of both implementations is comparable, with the Rust
version being slightly faster in many cases (and slower in some others).
I did spend more time trying to optimize this Rust version than I did
with the old C code. I initially tried a naive-ish conversion of the C
code to Rust, but that turned out to be much slower and I had to resort
to using regexes and different data structures to fix that.
2017-01-15 12:17:34 +01:00
Yorhel
8a3af4aee2 util/freebsd.sh: Fix copy-paste error in package dates 2016-12-30 18:10:46 +01:00
Yorhel
8d6e7bc2d8 indexer: Prioritize 7bit encodings when decoding man pages
Fixes parsing of https://manned.org/xshisen/ae5d469f
2016-12-29 09:27:19 +01:00
Yorhel
eac4b6ac77 Don't index ELF binaries + remove some non-man-pages 2016-12-18 16:35:25 +01:00
Yorhel
d153004532 indexer: Support FreeBSD 9.3+; remove now obsolete add_index.pl 2016-12-18 15:08:56 +01:00
Yorhel
b9764fce4a indexer: Remove openssl + replace siphash with sha1 in cache filename
HTTPS isn't used, so removing it saves some space.

The std SipHash API has been deprecated, and since hashing performance
isn't exactly critical in this case I've replaced it with SHA1, which
was already being used in man.rs.
2016-12-11 13:41:10 +01:00
Yorhel
defaa032f8 indexer: Support for indexing FreeBSD <9.3 repositories 2016-12-11 10:59:54 +01:00
Yorhel
1ca0cd4325 Indexer: Remove pointless check 2016-11-27 10:59:31 +01:00
Yorhel
b79ecfb284 indexer: Fix bug in Contents file parsing + decrease cron verbosity
Turns out that not all Contents files have a header.
2016-11-27 10:48:35 +01:00
Yorhel
eb15b6e2c7 indexer: Improve Debian Contents file parsing performance by 5.2x
Further improvements can be gained by caching the results of
get_contents(), since the same Contents file is often parsed multiple
times in a single cron run. But this is already a significant
achievement.
2016-11-26 16:57:05 +01:00
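The caching idea mentioned in the commit above could look roughly like the sketch below. Everything here is hypothetical: `ContentsCache` and its `parse` closure stand in for whatever `get_contents()` actually does, and the key is assumed to be a string that identifies a Contents file.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical cache around an expensive parse function, so that the same
/// Contents file is parsed at most once per run.
struct ContentsCache<F: Fn(&str) -> Vec<String>> {
    parse: F,
    entries: HashMap<String, Rc<Vec<String>>>,
}

impl<F: Fn(&str) -> Vec<String>> ContentsCache<F> {
    fn new(parse: F) -> Self {
        ContentsCache { parse, entries: HashMap::new() }
    }

    /// Return the parsed result for `key`, parsing only on the first request.
    fn get(&mut self, key: &str) -> Rc<Vec<String>> {
        if let Some(hit) = self.entries.get(key) {
            // Cache hit: hand out another reference to the shared result.
            return Rc::clone(hit);
        }
        // Cache miss: parse once, store the result, and return it.
        let parsed = Rc::new((self.parse)(key));
        self.entries.insert(key.to_string(), Rc::clone(&parsed));
        parsed
    }
}
```

Sharing the result through `Rc` avoids copying a large parsed file on every lookup; a cache that lives only for one cron run also needs no invalidation logic.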
Yorhel
de28175cd3 Misc. indexing fixes 2016-11-20 16:41:08 +01:00
Yorhel
46a6e2ff7c Use Rust indexer for Ubuntu + script cleanup 2016-11-20 15:01:22 +01:00
Yorhel
2ee2f7495b Reorganize indexing scripts + use Rust for Debian 2016-11-20 12:34:02 +01:00
Yorhel
5d44d0e2ec Indexer: Add --dryrun and workarounds for old deb repos 2016-11-20 11:39:00 +01:00
Yorhel
ecb1a9e25b Indexer: Support reading date from .deb archives 2016-11-20 09:01:33 +01:00
Yorhel
a1e5a2d80d Indexer: Improve logging + cache management 2016-11-20 07:31:55 +01:00
Yorhel
4bdd91f65e Indexer: Initial support for debian repos 2016-11-19 15:27:24 +01:00
Yorhel
50fe17a604 Indexer: Support .deb archives 2016-11-15 21:15:35 +01:00
Yorhel
1f05463c3a About page: Remove TOC feature as planned 2016-11-09 19:01:24 +01:00
Yorhel
aa01365e60 Move nav menu a bit up to create space
This is where the old nav menu used to be. This involved shrinking the
width of the locations/versions selector, but that never needed the full
page width anyway. Unfortunately I suck at CSS so the nav menu and
selector thing won't look too great on smaller screen sizes; but that's
just a minor visual ugliness.
2016-11-09 18:58:34 +01:00
Yorhel
09af881767 Add TOC listing + move section/lang select back into a nav menu 2016-11-09 18:43:10 +01:00
Yorhel
20141aa980 indexer: Improve charset detection + lower file cache time 2016-11-09 18:41:53 +01:00
Yorhel
7d2abfb3a4 indexer: Fix storing locale as NULL when empty
Perhaps it's better to get rid of NULL and make empty the default value.
But for now this'll do.
2016-11-06 16:24:45 +01:00
Yorhel
cb81bedac1 Add arch/encoding metadata to DB + Fetch Arch Linux x86_64
The encoding metadata will be very useful in finding badly decoded man
pages. The package 'arch' is necessary to properly identify which
package was used, which is not obvious now that I'm going to switch more
systems to the (more common) x86_64 arch.
2016-11-06 16:05:16 +01:00
Yorhel
b8a1945d38 Merge branch 'indexer' 2016-11-06 15:26:42 +01:00
Yorhel
5e39af459f Replace old Arch Linux scripts with new indexer 2016-11-06 15:26:20 +01:00
Yorhel
1ca43665a1 indexer: Add file caching + Arch Linux indexing 2016-11-06 13:34:22 +01:00
Yorhel
35fab522d6 Indexer: Support HTTP fetching + misc improvements 2016-11-06 09:21:53 +01:00
Yorhel
aff68205b0 Add postgres package indexing + cli options 2016-11-05 10:22:31 +01:00
Yorhel
0cab758665 Add support for man page reading & decoding 2016-10-30 11:06:14 +01:00
Yorhel
c8bb4da246 Use libarchive3-sys crate directly + improve archread API
This all should offer a more convenient and robust interface to handle
all sorts of archives.
2016-10-29 09:33:39 +02:00
Yorhel
9db73b2709 Fix CSS I accidentally removed 2016-10-26 21:31:47 +02:00
Yorhel
863fae2476 Add link to manpag.es 2016-10-26 19:27:57 +02:00
Yorhel
25a39c6fe4 Improved pagination on package info pages 2016-10-26 19:25:23 +02:00
Yorhel
022e9acc4f WIP: Rewritten man page indexer in Rust
Currently just figuring out how to read archives. Turns out to not be as
simple as I had expected.
2016-10-22 14:54:37 +02:00
Yorhel
965aa9a2f6 Add Ubuntu 16.10 2016-10-19 07:30:49 +02:00
Yorhel
7535218a06 Add FreeBSD 11.0 2016-10-18 07:09:27 +02:00
Yorhel
a7352d27b9 Fix possible wrapping of MANNEDINCLUDE by removing space
This doesn't really guarantee that it won't wrap, but fixes at least one
man page.

- https://manned.org/BlockSelectionDCOPInterface/6dfdf921
2016-10-16 10:28:44 +02:00
Yorhel
5436435c3f Improve handling of man names with special characters
The 'source' link was broken for mans with [ or ] characters.
All links were broken for mans with space characters.

Man page of the week:
https://manned.org/KGenericFactory_%20KTypeList_%20Product,%20ProductListTail%20_,%20KTypeList_%20ParentType,%20ParentTypeListTail%20_%20_/dfc33ca6

There are 5 man pages left with a '%' or '#' character. I've no idea if
it's worth handling those; a fix for these isn't going to be as trivial
as this commit.
2016-10-16 10:19:27 +02:00
Yorhel
8a0fac08b6 DB cleanup: Remove some non-manpages & fix wrongly-detected locales 2016-10-16 10:03:34 +02:00
Yorhel
17fc298217 Fix handling of URLs ending in a ⟩
I've known about this issue before, but didn't realize it was so
widespread. This fixes many links.
2016-10-16 09:11:15 +02:00
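As an illustration of the kind of fix described in the commit above, the sketch below trims a trailing ⟩ (and a few other closing characters) from a URL candidate before it is turned into a link. It is not the code from the repository: `trim_url_end` is a hypothetical helper, and the set of trimmed characters is an assumption.

```rust
/// Trim trailing characters that are unlikely to be part of a URL.
/// The character set below is an assumption made for this sketch.
fn trim_url_end(candidate: &str) -> &str {
    candidate.trim_end_matches(|c: char| matches!(c, '⟩' | '>' | '.' | ','))
}
```

With this helper, `trim_url_end("http://example.com/page⟩")` returns `http://example.com/page`, so the link target no longer includes the closing bracket.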
Yorhel
7d31f41ba8 Add FreeBSD 10.3 2016-10-15 22:37:58 +02:00