Commit graph

29 commits

Author SHA1 Message Date
Yorhel
7aa89145ca indexer: Re-use memory buffer when reading RPM repo data
This avoids reading the entire uncompressed XML into a buffer.
2018-05-04 15:25:35 +02:00
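
The buffer re-use described in this commit can be illustrated roughly as follows. This is only a sketch and not the indexer's actual code: the function name `count_package_tags` and the idea of counting `<package` tags are assumptions made for illustration. It streams the XML through a single `Vec` that is cleared and refilled for each chunk, so the whole uncompressed document never sits in memory at once.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical helper: counts `<package ...>` opening tags in an XML stream
/// while holding only one chunk of the input in memory at a time.
fn count_package_tags<R: Read>(input: R) -> std::io::Result<usize> {
    let mut reader = BufReader::new(input);
    let mut buf = Vec::new(); // allocated once, re-used for every chunk
    let mut count = 0;
    loop {
        buf.clear(); // keeps the capacity, drops the old contents
        // Read up to and including the next '>' (or to end of input).
        if reader.read_until(b'>', &mut buf)? == 0 {
            break; // end of input
        }
        // Look at the text following the last '<' in this chunk.
        if let Some(pos) = buf.iter().rposition(|&b| b == b'<') {
            let tag = &buf[pos + 1..];
            if tag.starts_with(b"package ") || tag.starts_with(b"package>") {
                count += 1;
            }
        }
    }
    Ok(count)
}
```

Any `Read` implementation can be passed in, for example a decompressing reader wrapped around the downloaded file.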
Yorhel
2c7bf1507a indexer: Update crates to latest version
With the exception of Hyper, because the new tokio-based version is...
different.
2018-03-25 10:36:29 +02:00
Yorhel
8235fb28b8 indexer: Fix link resolution and hardlink handling for rpm
Unlike tar, cpio does not have a separate entry for each directory, so
the link resolution can't assume that directory entries exist for each
path component.

I also mistakenly assumed that cpio handled hardlinks similarly to tar,
but that's clearly not the case. libarchive does help a bit, but these
differences still suck.
2017-01-18 13:07:42 +01:00
Yorhel
608f79eb93 indexer: Add support for indexing RPM repositories
This code hasn't been thoroughly tested, I'll see how things go when
indexing a live repo.

And XML parsing sucks in every language.
2017-01-17 17:05:03 +01:00
Yorhel
f77db5f541 indexer: Add bare RPM directory indexing
This is for a few special cases, most RPM repos will have proper
metadata and all.
2017-01-17 12:50:25 +01:00
Yorhel
d720441fb4 indexer: Rust crate updates 2017-01-17 11:01:11 +01:00
Yorhel
8d6e7bc2d8 indexer: Prioritize 7bit encodings when decoding man pages
Fixes parsing of https://manned.org/xshisen/ae5d469f
2016-12-29 09:27:19 +01:00
Yorhel
eac4b6ac77 Don't index ELF binaries + remove some non-man-pages 2016-12-18 16:35:25 +01:00
Yorhel
d153004532 indexer: Support FreeBSD 9.3+; remove now obsolete add_index.pl 2016-12-18 15:08:56 +01:00
Yorhel
b9764fce4a indexer: Remove openssl + replace siphash with sha1 in cache filename
HTTPS isn't used, so removing it saves some space.

The std SipHash API has been deprecated, and since hashing performance
isn't exactly critical in this case I've replaced it with SHA1, which
was already being used in man.rs.
2016-12-11 13:41:10 +01:00
Yorhel
defaa032f8 indexer: Support for indexing FreeBSD <9.3 repositories 2016-12-11 10:59:54 +01:00
Yorhel
1ca0cd4325 Indexer: Remove pointless check 2016-11-27 10:59:31 +01:00
Yorhel
b79ecfb284 indexer: Fix bug in Contents file parsing + decrease cron verbosity
Turns out that not all Contents files have a header.
2016-11-27 10:48:35 +01:00
Yorhel
eb15b6e2c7 indexer: Improve Debian Contents file parsing performance by 5.2x
Further improvements can be gained by caching the results of
get_contents(), since the same Contents file is often parsed multiple
times in a single cron run. But this is already a significant
achievement.
2016-11-26 16:57:05 +01:00
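
The caching idea mentioned in this commit message could look something like the sketch below. It is not taken from the indexer: `ContentsCache`, the `Contents` type alias, and the `misses` counter are hypothetical names, and the real `get_contents()` signature is not shown in the log. The sketch memoizes the parsed result per Contents file so that repeated lookups within one run parse each file only once.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed Contents file:
/// maps a file path to the package that ships it.
type Contents = HashMap<String, String>;

/// Memoizing wrapper around an expensive parse function.
struct ContentsCache<F: Fn(&str) -> Contents> {
    parse: F,
    cache: HashMap<String, Contents>,
    misses: usize, // how many times the parse function actually ran
}

impl<F: Fn(&str) -> Contents> ContentsCache<F> {
    fn new(parse: F) -> Self {
        ContentsCache { parse, cache: HashMap::new(), misses: 0 }
    }

    /// Returns the parsed Contents for `url`, parsing it only on first use.
    fn get(&mut self, url: &str) -> &Contents {
        if !self.cache.contains_key(url) {
            self.misses += 1;
            let parsed = (self.parse)(url);
            self.cache.insert(url.to_string(), parsed);
        }
        &self.cache[url]
    }
}
```

The trade-off is memory: every parsed file stays in the map until the cache is dropped, which is only reasonable if the set of distinct Contents files per run is small.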
Yorhel
de28175cd3 Misc. indexing fixes 2016-11-20 16:41:08 +01:00
Yorhel
5d44d0e2ec Indexer: Add --dryrun and workarounds for old deb repos 2016-11-20 11:39:00 +01:00
Yorhel
ecb1a9e25b Indexer: Support reading date from .deb archives 2016-11-20 09:01:33 +01:00
Yorhel
a1e5a2d80d Indexer: Improve logging + cache management 2016-11-20 07:31:55 +01:00
Yorhel
4bdd91f65e Indexer: Initial support for debian repos 2016-11-19 15:27:24 +01:00
Yorhel
50fe17a604 Indexer: Support .deb archives 2016-11-15 21:15:35 +01:00
Yorhel
20141aa980 indexer: Improve charset detection + lower file cache time 2016-11-09 18:41:53 +01:00
Yorhel
7d2abfb3a4 indexer: Fix storing locale as NULL when empty
Perhaps it's better to get rid of NULL and make empty the default value.
But for now this'll do.
2016-11-06 16:24:45 +01:00
Yorhel
cb81bedac1 Add arch/encoding metadata to DB + Fetch Arch Linux x86_64
The encoding metadata will be very useful in finding badly decoded man
pages. The package 'arch' is necessary to properly identify which
package was used, which is not obvious now that I'm going to switch more
systems to the (more common) x86_64 arch.
2016-11-06 16:05:16 +01:00
Yorhel
1ca43665a1 indexer: Add file caching + Arch Linux indexing 2016-11-06 13:34:22 +01:00
Yorhel
35fab522d6 Indexer: Support HTTP fetching + misc improvements 2016-11-06 09:21:53 +01:00
Yorhel
aff68205b0 Add postgres package indexing + cli options 2016-11-05 10:22:31 +01:00
Yorhel
0cab758665 Add support for man page reading & decoding 2016-10-30 11:06:14 +01:00
Yorhel
c8bb4da246 Use libarchive3-sys crate directly + improve archread API
This all should offer a more convenient and robust interface to handle
all sorts of archives.
2016-10-29 09:33:39 +02:00
Yorhel
022e9acc4f WIP: Rewritten man page indexer in Rust
Currently just figuring out how to read archives. Turns out to not be as
simple as I had expected.
2016-10-22 14:54:37 +02:00