Yorhel
82a626b7d4
indexer+www: Support Alpine Linux repos
2021-12-11 16:18:07 +01:00
Yorhel
c9e81a8922
indexer: More crate updates + warning fixes + 2018 edition
2021-12-11 14:56:22 +01:00
Yorhel
c48feedc85
indexer: Switch to ureq + debloat stuff a bit
...
And stop using the "url" crate directly, its API is too unstable for it
to be worth using.
...that applies to several other crates as well, but meh.
2021-12-11 12:26:57 +01:00
Yorhel
4588e67b64
Make the Rust garbage compile again
2021-12-11 11:53:26 +01:00
Yorhel
ce38ff885f
indexer: Don't overwrite man page contents when hash already exists
...
Performance improvement. The ON CONFLICT DO UPDATE was primarily to make
sure that old man pages would get fixed when the indexer did a better
job at detecting the encoding, but there haven't been any relevant fixes
to the indexer lately so this forced-update won't do much now.
2019-05-25 08:47:21 +02:00
Yorhel
2974ee929e
Rust: hyper -> reqwest for the indexer
...
Since Hyper doesn't provide a synchronous API anymore.
2019-05-25 08:44:45 +02:00
Yorhel
f0df5092c3
Rust dep updates
2019-05-25 08:27:23 +02:00
Yorhel
7aa89145ca
indexer: Re-use memory buffer when reading RPM repo data
...
This avoids reading the entire uncompressed XML into a buffer.
2018-05-04 15:25:35 +02:00
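The buffer re-use mentioned in 7aa89145ca can be illustrated with a minimal sketch. It is not the indexer's actual code: `count_matching_lines` is a hypothetical helper, and scanning line by line is an assumption made only to keep the example small (the real indexer parses XML). The point is that one buffer is allocated once and cleared between reads, so the whole uncompressed stream never has to sit in memory.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not from the indexer): counts the lines that contain
/// `needle`, streaming the input through a single reusable buffer.
pub fn count_matching_lines<R: BufRead>(mut reader: R, needle: &str) -> io::Result<usize> {
    let mut buf = String::new(); // allocated once, reused for every line
    let mut count = 0;
    loop {
        buf.clear(); // drops the contents but keeps the allocated capacity
        if reader.read_line(&mut buf)? == 0 {
            break; // zero bytes read means end of input
        }
        if buf.contains(needle) {
            count += 1;
        }
    }
    Ok(count)
}
```

Any `BufRead` source works here, for example a `BufReader` wrapped around a decompressing reader, so memory use is bounded by the longest line rather than by the size of the file.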
Yorhel
2c7bf1507a
indexer: Update crates to latest version
...
With the exception of Hyper, because the new tokio-based version is...
different.
2018-03-25 10:36:29 +02:00
Yorhel
8235fb28b8
indexer: Fix link resolution and hardlink handling for rpm
...
Unlike tar, cpio does not have a separate entry for each directory, so
the link resolution can't assume that directory entries exist for each
path component.
I also mistakenly assumed that cpio handled hardlinks similarly to tar,
but that's clearly not the case. libarchive does help a bit, but these
differences still suck.
2017-01-18 13:07:42 +01:00
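One way to resolve link targets without relying on directory entries, as 8235fb28b8 requires for cpio, is to work purely on the path strings. The sketch below is an assumption about the general approach, not the indexer's implementation: `resolve_link` is a hypothetical function, it expects the link path itself to be free of `..` components, and it does not follow symlinked directories.

```rust
/// Hypothetical helper (not from the indexer): lexically resolves `target`
/// relative to the directory that contains `link`. No directory entries are
/// consulted. Returns None if the target climbs above the archive root.
pub fn resolve_link(link: &str, target: &str) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    if !target.starts_with('/') {
        // Relative target: start from the link's parent directory.
        parts.extend(link.split('/').filter(|c| !c.is_empty() && *c != "."));
        parts.pop(); // drop the link's own file name
    }
    for comp in target.split('/') {
        match comp {
            "" | "." => {} // empty and current-directory components are no-ops
            ".." => {
                parts.pop()?; // going above the root is an error
            }
            other => parts.push(other),
        }
    }
    Some(parts.join("/"))
}
```

For a link `usr/share/man/man1/a.1` pointing at `../man8/b.8`, this yields `usr/share/man/man8/b.8` even if the archive never listed `usr/share/man/man8` as an entry of its own.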
Yorhel
608f79eb93
indexer: Add support for indexing RPM repositories
...
This code hasn't been thoroughly tested; I'll see how things go when
indexing a live repo.
And XML parsing sucks in every language.
2017-01-17 17:05:03 +01:00
Yorhel
f77db5f541
indexer: Add bare RPM directory indexing
...
This is for a few special cases, most RPM repos will have proper
metadata and all.
2017-01-17 12:50:25 +01:00
Yorhel
d720441fb4
indexer: Rust crate updates
2017-01-17 11:01:11 +01:00
Yorhel
8d6e7bc2d8
indexer: Prioritize 7bit encodings when decoding man pages
...
Fixes parsing of https://manned.org/xshisen/ae5d469f
2016-12-29 09:27:19 +01:00
Yorhel
eac4b6ac77
Don't index ELF binaries + remove some non-man-pages
2016-12-18 16:35:25 +01:00
Yorhel
d153004532
indexer: Support FreeBSD 9.3+; remove now obsolete add_index.pl
2016-12-18 15:08:56 +01:00
Yorhel
b9764fce4a
indexer: Remove openssl + replace siphash with sha1 in cache filename
...
HTTPS isn't used, so removing it saves some space.
The std SipHash API has been deprecated, and since hashing performance
isn't exactly critical in this case I've replaced it with SHA1, which
was already being used in man.rs.
2016-12-11 13:41:10 +01:00
Yorhel
defaa032f8
indexer: Support for indexing FreeBSD <9.3 repositories
2016-12-11 10:59:54 +01:00
Yorhel
1ca0cd4325
Indexer: Remove pointless check
2016-11-27 10:59:31 +01:00
Yorhel
b79ecfb284
indexer: Fix bug in Contents file parsing + decrease cron verbosity
...
Turns out that not all Contents files have a header.
2016-11-27 10:48:35 +01:00
Yorhel
eb15b6e2c7
indexer: Improve Debian Contents file parsing performance by 5.2x
...
Further improvements can be gained by caching the results of
get_contents(), since the same Contents file is often parsed multiple
times in a single cron run. But this is already a significant
achievement.
2016-11-26 16:57:05 +01:00
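The caching idea floated in eb15b6e2c7 could look roughly like the sketch below. Everything here is hypothetical: `ContentsCache`, its `get` method, and the `Vec<String>` result type are stand-ins, since the real signature of get_contents() is not shown in this log. The parser is passed in as a closure and runs at most once per key.

```rust
use std::collections::HashMap;

/// Hypothetical memoizing wrapper (not from the indexer). `parse` stands in
/// for an expensive call such as get_contents(); its real signature is unknown.
pub struct ContentsCache<F: FnMut(&str) -> Vec<String>> {
    parse: F,
    cache: HashMap<String, Vec<String>>,
}

impl<F: FnMut(&str) -> Vec<String>> ContentsCache<F> {
    pub fn new(parse: F) -> Self {
        ContentsCache { parse, cache: HashMap::new() }
    }

    /// Returns the parsed result for `key`, calling `parse` only on a miss.
    pub fn get(&mut self, key: &str) -> &Vec<String> {
        if !self.cache.contains_key(key) {
            let parsed = (self.parse)(key);
            self.cache.insert(key.to_string(), parsed);
        }
        &self.cache[key]
    }
}
```

Keeping one such cache alive for the duration of a cron run would mean a Contents file shared by several repositories is parsed once, at the cost of holding the parsed results in memory.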
Yorhel
de28175cd3
Misc. indexing fixes
2016-11-20 16:41:08 +01:00
Yorhel
5d44d0e2ec
Indexer: Add --dryrun and workarounds for old deb repos
2016-11-20 11:39:00 +01:00
Yorhel
ecb1a9e25b
Indexer: Support reading date from .deb archives
2016-11-20 09:01:33 +01:00
Yorhel
a1e5a2d80d
Indexer: Improve logging + cache management
2016-11-20 07:31:55 +01:00
Yorhel
4bdd91f65e
Indexer: Initial support for debian repos
2016-11-19 15:27:24 +01:00
Yorhel
50fe17a604
Indexer: Support .deb archives
2016-11-15 21:15:35 +01:00
Yorhel
20141aa980
indexer: Improve charset detection + lower file cache time
2016-11-09 18:41:53 +01:00
Yorhel
7d2abfb3a4
indexer: Fix storing locale as NULL when empty
...
Perhaps it's better to get rid of NULL and make empty the default value.
But for now this'll do.
2016-11-06 16:24:45 +01:00
Yorhel
cb81bedac1
Add arch/encoding metadata to DB + Fetch Arch Linux x86_64
...
The encoding metadata will be very useful in finding badly decoded man
pages. The package 'arch' is necessary to properly identify which
package was used, which is not obvious now that I'm going to switch more
systems to the (more common) x86_64 arch.
2016-11-06 16:05:16 +01:00
Yorhel
1ca43665a1
indexer: Add file caching + Arch Linux indexing
2016-11-06 13:34:22 +01:00
Yorhel
35fab522d6
Indexer: Support HTTP fetching + misc improvements
2016-11-06 09:21:53 +01:00
Yorhel
aff68205b0
Add postgres package indexing + cli options
2016-11-05 10:22:31 +01:00
Yorhel
0cab758665
Add support for man page reading & decoding
2016-10-30 11:06:14 +01:00
Yorhel
c8bb4da246
Use libarchive3-sys crate directly + improve archread API
...
This all should offer a more convenient and robust interface to handle
all sorts of archives.
2016-10-29 09:33:39 +02:00
Yorhel
022e9acc4f
WIP: Rewritten man page indexer in Rust
...
Currently just figuring out how to read archives. Turns out to not be as
simple as I had expected.
2016-10-22 14:54:37 +02:00