manned

Author	SHA1	Message	Date
Yorhel	d19c56f285	Correctly handle a few more mis-identified locales	2021-12-16 13:44:39 +01:00
Yorhel	f376f1f137	Large-ish SQL schema revamp/optimizations Primarily aimed at reducing the size of the old 'man' (now: files) table, using smaller integers to refer to man contents and text fields, and storing a shorthash as an integer for quick lookups. This better normalization also removes the need to keep a separate 'man_index' cache for the search function. The old schema wasn't necessarily bad, but I was in the mood for some optimizations. And a little cleanup. Prolly introduces a bunch of new bugs, I haven't tested this too well.	2021-12-14 15:08:54 +01:00
Yorhel	7648603685	Recognize .zst-compressed man pages + fix SQL basename_from_filename() to recognize .xz Also greatly simplified basename_from_filename() because apparently I couldn't write regexes back then. (And the removed REFERENCES line is to sync schema.sql with the actual state of the DB, which doesn't have that constraint for some reason. I'll prolly fix that later)	2021-12-13 18:16:16 +01:00
Yorhel	b27d55215a	Arch: Mark deleted packages as dead and hide them from listings We've got a lot of packages in the DB that have long been removed from the Arch repos. These are still indexed, but won't clutter the package listing anymore. Also fixed an issue with packages.id numbers getting rather large because the indexer allocates a new ID for every package on every update.	2021-12-13 08:18:17 +01:00
Yorhel	82a626b7d4	indexer+www: Support Alpine Linux repos	2021-12-11 16:18:07 +01:00
Yorhel	c9e81a8922	indexer: More crate updates + warning fixes + 2018 edition	2021-12-11 14:56:22 +01:00
Yorhel	c48feedc85	indexer: Switch to ureq + debloat stuff a bit And stop using the "url" crate directly, its API is too unstable for it to be worth using. ...that applies to several other crates as well, but meh.	2021-12-11 12:26:57 +01:00
Yorhel	4588e67b64	Make the Rust garbage compile again	2021-12-11 11:53:26 +01:00
Yorhel	ce38ff885f	indexer: Don't overwrite man page contents when hash already exists Performance improvement. The ON CONFLICT DO UPDATE was primarily to make sure that old man pages would get fixed when the indexer did a better job at detecting the encoding, but there haven't been any relevant fixes to the indexer lately so this forced-update won't do much now.	2019-05-25 08:47:21 +02:00
Yorhel	2974ee929e	Rust: hyper -> reqwest for the indexer Since Hyper doesn't provide a synchronous API anymore.	2019-05-25 08:44:45 +02:00
Yorhel	f0df5092c3	Rust dep updates	2019-05-25 08:27:23 +02:00
Yorhel	7aa89145ca	indexer: Re-use memory buffer when reading RPM repo data This avoids reading the entire uncompressed XML into a buffer.	2018-05-04 15:25:35 +02:00
Yorhel	2c7bf1507a	indexer: Update crates to latest version With the exception of Hyper, because the new tokio-based version is... different.	2018-03-25 10:36:29 +02:00
Yorhel	8235fb28b8	indexer: Fix link resolution and hardlink handling for rpm Unlike tar, cpio does not have a separate entry for each directory, so the link resolution can't assume that directory entries exist for each path component. I also mistakenly assumed that cpio handled hardlinks similarly to tar, but that's clearly not the case. libarchive does help a bit, but these differences still suck.	2017-01-18 13:07:42 +01:00
Yorhel	608f79eb93	indexer: Add support for indexing RPM repositories This code hasn't been thoroughly tested, I'll see how things go when indexing a live repo. And XML parsing sucks in every language.	2017-01-17 17:05:03 +01:00
Yorhel	f77db5f541	indexer: Add bare RPM directory indexing This is for a few special cases, most RPM repos will have proper metadata and all.	2017-01-17 12:50:25 +01:00
Yorhel	d720441fb4	indexer: Rust crate updates	2017-01-17 11:01:11 +01:00
Yorhel	8d6e7bc2d8	indexer: Prioritize 7bit encodings when decoding man pages Fixes parsing of https://manned.org/xshisen/ae5d469f	2016-12-29 09:27:19 +01:00
Yorhel	eac4b6ac77	Don't index ELF binaries + remove some non-man-pages	2016-12-18 16:35:25 +01:00
Yorhel	d153004532	indexer: Support FreeBSD 9.3+; remove now obsolete add_index.pl	2016-12-18 15:08:56 +01:00
Yorhel	b9764fce4a	indexer: Remove openssl + replace siphash with sha1 in cache filename HTTPS isn't used, so removing it saves some space. The std SipHash API has been deprecated, and since hashing performance isn't exactly critical in this case I've replaced it with SHA1, which was already being used in man.rs.	2016-12-11 13:41:10 +01:00
Yorhel	defaa032f8	indexer: Support for indexing FreeBSD <9.3 repositories	2016-12-11 10:59:54 +01:00
Yorhel	1ca0cd4325	Indexer: Remove pointless check	2016-11-27 10:59:31 +01:00
Yorhel	b79ecfb284	indexer: Fix bug in Contents file parsing + decrease cron verbosity Turns out that not all Contents files have a header.	2016-11-27 10:48:35 +01:00
Yorhel	eb15b6e2c7	indexer: Improve Debian Contents file parsing performance by 5.2x Further improvements can be gained by caching the results of get_contents(), since the same Contents file is often parsed multiple times in a single cron run. But this is already a significant achievement.	2016-11-26 16:57:05 +01:00
Yorhel	de28175cd3	Misc. indexing fixes	2016-11-20 16:41:08 +01:00
Yorhel	5d44d0e2ec	Indexer: Add --dryrun and workarounds for old deb repos	2016-11-20 11:39:00 +01:00
Yorhel	ecb1a9e25b	Indexer: Support reading date from .deb archives	2016-11-20 09:01:33 +01:00
Yorhel	a1e5a2d80d	Indexer: Improve logging + cache management	2016-11-20 07:31:55 +01:00
Yorhel	4bdd91f65e	Indexer: Initial support for debian repos	2016-11-19 15:27:24 +01:00
Yorhel	50fe17a604	Indexer: Support .deb archives	2016-11-15 21:15:35 +01:00
Yorhel	20141aa980	indexer: Improve charset detection + lower file cache time	2016-11-09 18:41:53 +01:00
Yorhel	7d2abfb3a4	indexer: Fix storing locale as NULL when empty Perhaps it's better to get rid of NULL and make empty the default value. But for now this'll do.	2016-11-06 16:24:45 +01:00
Yorhel	cb81bedac1	Add arch/encoding metadata to DB + Fetch Arch Linux x86_64 The encoding metadata will be very useful in finding badly decoded man pages. The package 'arch' is necessary to properly identify which package was used, which is not obvious now that I'm going to switch more systems to the (more common) x86_64 arch.	2016-11-06 16:05:16 +01:00
Yorhel	1ca43665a1	indexer: Add file caching + Arch Linux indexing	2016-11-06 13:34:22 +01:00
Yorhel	35fab522d6	Indexer: Support HTTP fetching + misc improvements	2016-11-06 09:21:53 +01:00
Yorhel	aff68205b0	Add postgres package indexing + cli options	2016-11-05 10:22:31 +01:00
Yorhel	0cab758665	Add support for man page reading & decoding	2016-10-30 11:06:14 +01:00
Yorhel	c8bb4da246	Use libarchive3-sys crate directly + improve archread API This all should offer a more convenient and robust interface to handle all sorts of archives.	2016-10-29 09:33:39 +02:00
Yorhel	022e9acc4f	WIP: Rewritten man page indexer in Rust Currently just figuring out how to read archives. Turns out to not be as simple as I had expected.	2016-10-22 14:54:37 +02:00

40 commits