Get rid of package categories

Whether the package name alone or the (category,name) tuple uniquely
identifies a package within a system has been a source of confusion for
a long time. Back in 03d278e4ff I ended up playing it "safe" by going
for (category,name), but in practice this doesn't make a whole lot of
sense. While it's *possible* for the same package name to refer to
completely different packages in different "categories", in reality
distributions can't sanely support this anyway.

For distributions where the category referred to a repository, the only
cases where the same package name was used in different repos were those
where the package had moved from one repo to another. Those should
certainly not be treated as different packages.

For distributions where the category really referred to a category,
there's the Debian approach where the category is purely a tag and
doesn't help identify the package in any way, and then there's FreeBSD
where the category technically ought to be part of the name.  There were
a few cases where FreeBSD used categories to separate out different
versions of the same package (e.g. ipv6 vs non-ipv6), but none were
relevant for man pages so I ended up merging those as well.

Getting rid of the categories simplifies and shortens URLs, unclutters
the UI a little bit and merges the packages in listings that should've
been merged all along.
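
Old category-prefixed URLs should keep working through a redirect. As a
rough standalone sketch of the lookup idea used for that (not the actual
pkg_frompath code): try every suffix of the path as a package name, with
and without a trailing version component, and prefer the longest match.
The %known hash and the resolve_path name are hypothetical stand-ins for
the database query and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the package names known for one system.
my %known = map +($_ => 1), qw(alien rsync coreutils);

# Returns (name, version, should_redirect), or nothing if no package matches.
sub resolve_path {
    my($path) = @_;
    my @comp = split m{/}, $path;
    # Every suffix of the path, once in full and once without its last
    # component (which may be a version). Empty candidates are dropped.
    my @names = grep { length $_ } map join('/', @$_),
        map +([@comp[$_..$#comp]], [@comp[$_..$#comp-1]]), 0..$#comp;
    # Longest known name wins, mirroring ORDER BY length(name) DESC LIMIT 1.
    my($name) = sort { length($b) <=> length($a) } grep { $known{$_} } @names;
    return if !defined $name;
    my $ver = $path =~ m{\Q$name\E/([^/]+)$} ? $1 : '';
    # Anything in front of the name is treated as a former category.
    my $redir = $path !~ m{^\Q$name\E} ? 1 : 0;
    return ($name, $ver, $redir);
}

# An old-style path resolves to the bare name plus a redirect flag.
print join(' | ', resolve_path('contrib/games/alien/10.2')), "\n";
```

A path that already starts with the package name, such as rsync or
rsync/3.1.1-3ubuntu1, comes back with the redirect flag unset.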

Migration script:

  -- Merge packages that are in multiple categories.
  -- All versions are moved to the package with the lowest ID.
  -- If the same version already exists in a lower ID, the higher-ID version is deleted.
  BEGIN;
  WITH migrate(old, new, second) AS (
    SELECT q.id, MIN(p.id), MAX(p.id)
      FROM packages p
      JOIN packages q ON q.id > p.id AND p.system = q.system AND p.name = q.name
     GROUP BY q.id
  ), ded(n) AS (
    UPDATE packages SET dead = false
      FROM migrate m
      JOIN packages q ON q.id = m.old
     WHERE packages.id = m.new AND packages.dead AND NOT q.dead
    RETURNING 1
  ), mov(n) AS (
    UPDATE package_versions SET package = m.new
      FROM migrate m
     WHERE package_versions.package = m.old
       AND NOT EXISTS(
          SELECT 1
            FROM package_versions v
           WHERE v.package IN(m.new, m.second)
             AND v.version = package_versions.version)
    RETURNING 1
  ), del(n) AS (
    DELETE FROM packages WHERE id IN(SELECT old FROM migrate)
    RETURNING 1
  ) SELECT (SELECT count(*) FROM migrate) AS migrate,
           (SELECT count(*) FROM ded) AS ded,
           (SELECT count(*) FROM mov) AS mov,
           (SELECT count(*) FROM del) AS del;

  ALTER TABLE packages DROP CONSTRAINT packages_system_name_category_key;
  CREATE UNIQUE INDEX packages_system_name_key ON packages (system, name);
  ALTER TABLE packages DROP COLUMN category;
  COMMIT;
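
To make the merge rule in the comments above concrete, here is a small
in-memory sketch in Perl. It is an illustration only and not part of the
migration: the ids, versions and the merge_duplicates name are made up,
and the "dead" flag handling is not modelled.

```perl
use strict;
use warnings;

# Hypothetical data: package id => { version => origin tag } for two rows
# that share (system, name) but used to differ in category.
my %versions = (
    10 => { '1.0' => 'a', '1.1' => 'a' },
    42 => { '1.1' => 'b', '2.0' => 'b' },
);

# Move all versions to the lowest id. On a version clash the copy already
# attached to a lower id is kept and the higher-id copy is dropped.
sub merge_duplicates {
    my($versions) = @_;
    my($keep, @old) = sort { $a <=> $b } keys %$versions;
    for my $id (@old) {
        my $moved = delete $versions->{$id};
        $versions->{$keep}{$_} //= $moved->{$_} for keys %$moved;
    }
    return $keep;
}

my $kept = merge_duplicates(\%versions);
print "kept $kept: ", join(', ', sort keys %{ $versions{$kept} }), "\n";
```

In this sketch version 1.1 keeps the copy from id 10, while 2.0 is moved
over from id 42 before that row disappears.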
Yorhel 2024-04-28 10:37:02 +02:00
parent bc26633fc7
commit 83ab6c3671
16 changed files with 152 additions and 182 deletions

@ -91,32 +91,36 @@ sub sql_and { @_ ? sql_join 'AND', map sql('(', $_, ')'), @_ : sql '1=1' }
sub sql_or { @_ ? sql_join 'OR', map sql('(', $_, ')'), @_ : sql '1=0' }
# Returns ($pkg_obj, $ver_str, $should_redir)
sub pkg_frompath {
my($sys_where, $path) = @_;
# $path should be "$category/$name" or "$category/$name/$version", since
# $category may contain a slash, let's try both options.
# $path could either be:
# $name
# $name/$version
# $category/$name (deprecated)
# $category/$name/$version (deprecated)
# $category may contain a slash. We don't have the categories in the
# database anymore, so we'll just provide a redirect for anything that
# looks like it might have been a category.
# $name currently never contains a slash but may do so in the future, so
# let's also handle that.
my sub lookup {
my($cat, $name) = @_;
tuwf->dbRowi('SELECT id, system, name, category FROM', $packages_with_man, 'p WHERE', $sys_where, 'AND category =', \$cat, 'AND name =', \$name);
}
my @comp = split '/', $path;
my @names = map join('/', @$_), map +([@comp[$_..$#comp]], [@comp[$_..$#comp-1]]), 0..$#comp;
# $category/$name
# e.g. contrib/games/alien
if($path =~ m{^(.+)/([^/]+)$}) {
my $pkg = lookup $1, $2;
return ($pkg, '') if $pkg->{id};
}
my $pkg = tuwf->dbRowi('
SELECT id, system, name
FROM', $packages_with_man, 'p
WHERE', $sys_where, 'AND name IN', \@names, '
ORDER BY system DESC, length(name) DESC
LIMIT 1
');
# $category/$name/$version
# e.g. contrib/games/alien/10.2
if($path =~ m{^(.+)/([^/]+)/([^/]+)$}) {
my $pkg = lookup $1, $2;
return ($pkg, $3) if $pkg->{id};
}
return (undef, '', 0) if !$pkg->{id};
(undef, '');
my $ver = $path =~ m{\Q$pkg->{name}\E/([^/]+)$} ? $1 : '';
($pkg, $ver, $path !~ /^\Q$pkg->{name}/);
}
@ -170,7 +174,7 @@ sub man_pref {
), f_pkgdate AS(
SELECT * FROM f_secorder a WHERE NOT EXISTS(SELECT 1 FROM f_secorder b WHERE (a.ver).released < (b.ver).released)
)
SELECT (pkg).system, (pkg).category, (pkg).name AS package, (ver).version, (ver).released, (ver).id AS verid,
SELECT (pkg).system, (pkg).name AS package, (ver).version, (ver).released, (ver).id AS verid,
name, section, filename, locale, shorthash, content
FROM f_pkgdate ORDER BY shorthash LIMIT 1
});
@ -396,12 +400,12 @@ TUWF::get '/info/about' => sub {
Will get the latest version of a man page from the given system, e.g.:<br>
<a href="/man/ubuntu/rsync">/man/ubuntu/rsync</a><br>
<a href="/man/ubuntu-xenial/rsync">/man/ubuntu-xenial/rsync</a></dd>
<dt><code>/man/&lt;system>/&lt;category>/&lt;package>/&lt;name>[.&lt;section>]</code></dt><dd>
<dt><code>/man/&lt;system>/&lt;package>/&lt;name>[.&lt;section>]</code></dt><dd>
Will get the latest version of a man page from the given package, e.g.:<br>
<a href="/man/ubuntu-xenial/net/rsync/rsync">/man/ubuntu-xenial/net/rsync/rsync</a></dd>
<dt><code>/man/&lt;system>/&lt;category>/&lt;package>/&lt;version>/&lt;name>[.&lt;section>]</code></dt><dd>
<a href="/man/ubuntu-xenial/rsync/rsync">/man/ubuntu-xenial/rsync/rsync</a></dd>
<dt><code>/man/&lt;system>/&lt;package>/&lt;version>/&lt;name>[.&lt;section>]</code></dt><dd>
Will get the man page from a specific package version, e.g.:<br>
<a href="/man/ubuntu-xenial/net/rsync/3.1.1-3ubuntu1/rsync">/man/ubuntu-xenial/net/rsync/3.1.1-3ubuntu1/rsync</a></dd>
<a href="/man/ubuntu-xenial/rsync/3.1.1-3ubuntu1/rsync">/man/ubuntu-xenial/rsync/3.1.1-3ubuntu1/rsync</a></dd>
<dt><code>/man.&lt;language>/...</code></dt><dd>
Adding a language code to the <code>/man/</code> component will select
the man page in the requested language. The man page has to be available
@ -427,7 +431,7 @@ TUWF::get '/info/about' => sub {
In all of the above URL formats, you can change <code>/man</code> with
<code>/raw</code> to get the raw UTF-8 encoded man page source, e.g.:<br>
<a href="/raw/socket.7">/raw/socket.7</a><br>
<a href="/raw/ubuntu-xenial/net/rsync/3.1.1-3ubuntu1/rsync">/raw/ubuntu-xenial/net/rsync/3.1.1-3ubuntu1/rsync</a><br>
<a href="/raw/ubuntu-xenial/rsync/3.1.1-3ubuntu1/rsync">/raw/ubuntu-xenial/rsync/3.1.1-3ubuntu1/rsync</a><br>
<a href="/raw.de/faked-tcp">/raw.de/faked-tcp</a><br>
<a href="/raw.910be0ed/fedora/ls">/raw.910be0ed/fedora/ls</a></dd>
<dt><code>/&lt;name>/&lt;8-hex-digits></code></dt><dd>
@ -444,12 +448,12 @@ TUWF::get '/info/about' => sub {
<p>Linking to individual packages is also possible. These pages will show a
listing of all manual pages available in the given package.</p>
<dl>
<dt><code>/pkg/&lt;system>/&lt;category>/&lt;package></code></dt><dd>
<dt><code>/pkg/&lt;system>/&lt;package></code></dt><dd>
For the latest version of a package (e.g. <a
href="/pkg/arch/core/coreutils">/pkg/arch/core/coreutils</a>).</dd>
<dt><code>/pkg/&lt;system>/&lt;category>/&lt;package>/&lt;version></code></dt><dd>
href="/pkg/arch/coreutils">/pkg/arch/coreutils</a>).</dd>
<dt><code>/pkg/&lt;system>/&lt;package>/&lt;version></code></dt><dd>
For a particular version of a package (e.g. <a
href="/pkg/arch/core/coreutils/8.25-2">/pkg/arch/core/coreutils/8.25-2</a>).</dd>
href="/pkg/arch/coreutils/8.25-2">/pkg/arch/coreutils/8.25-2</a>).</dd>
</dl>
<p>This site only indexes packages that actually have manual pages,
linking to a package that doesn't have any will result in a 404 page.</p>
@ -645,17 +649,17 @@ TUWF::get '/xml/search.xml' => sub {
# shorthash => 8-char hex
# lang => language code
# system => system shortname
# category => package category
# package => name of the package
# version => package version
# man => name of the man page
# section => man page section
#
# URL format:
# /$fmt[.$shorthash][.$lang][/$system[/$category/$package[/$version]]]/$man[.$section]
# /$fmt[.$shorthash][.$lang][/$system[[/$category]/$package[/$version]]]/$man[.$section]
#
# Note that the URL format has some ambiguity:
# - $category may contain a slash, so a database lookup is required to
# - $category (deprecated, only used for compatibility with old URLs) and
# $package may contain a slash, so a database lookup is required to
# disambiguate between URLs with [/$version] and those without.
# - $man may contain a dot, so a database lookup is required to disambiguate
# between URLs with [.$section] and those without
@ -671,7 +675,7 @@ package ManUrl {
my($o)=@_;
"/$o->{fmt}".(defined $o->{shorthash} ? ".$o->{shorthash}" : '').(defined $o->{lang} ? ".$o->{lang}" : '')
.(defined $o->{system} ? ("/$o->{system}"
.(defined $o->{category} ? ("/$o->{category}/$o->{package}"
.(defined $o->{package} ? ("/$o->{package}"
.(defined $o->{version} ? "/$o->{version}" : '')) : '')) : '')
.'/'.$o->mansect
};
@ -852,7 +856,7 @@ sub man_page {
'data-hasversions' => $hasversions?1:0,
sub {
li_ sub { a_ href => $url->set(fmt => 'raw'), 'source' };
li_ sub { a_ href => $url->set(system => sysbyid->{$man->{system}}{short}, category => undef, shorthash => shorthash_to_hex $man->{shorthash}), 'permalink' };
li_ sub { a_ href => $url->set(system => sysbyid->{$man->{system}}{short}, package => undef, shorthash => shorthash_to_hex $man->{shorthash}), 'permalink' };
li_ sub { a_ href => "/loc/$content->{hash}", 'locations' };
}
};
@ -907,10 +911,7 @@ TUWF::get qr{/(?<fmt>man|txt|raw)(?:\.(?<shorthash>[a-fA-F0-9]{8}))?(?:\.(?<lang
push @where, sql 'system IN', $sysid;
}
# $path is now either:
# 1. $category/$package
# 2. $cagegory/$package/$version
my($pkg, $ver) = length $path ? pkg_frompath sql_and(@where), $path : (undef,undef);
my($pkg, $ver, $redir) = length $path ? pkg_frompath sql_and(@where), $path : (undef,undef);
return tuwf->resNotFound if length $path && !$pkg;
push @where, sql 'p.id =', \$pkg->{id} if $pkg;
push @where, sql 'v.version =', \$ver if length $ver;
@ -926,12 +927,12 @@ TUWF::get qr{/(?<fmt>man|txt|raw)(?:\.(?<shorthash>[a-fA-F0-9]{8}))?(?:\.(?<lang
shorthash => $shorthash,
lang => $lang,
system => length $system ? $system : undef,
category => $pkg ? $pkg->{category} : undef,
package => $pkg ? $pkg->{name} : undef,
version => length $ver ? $ver : undef,
man => length $section ? $man->{name} : $name,
section => length $section ? $section : undef,
);
return tuwf->resRedirect($url, 'perm') if $redir;
man_page $man, $url;
};
@ -950,7 +951,7 @@ TUWF::get qr{/pkg/([^/]+)} => sub {
my $where = sql 'NOT dead AND system =', \$sys->{id}, $f->{c} ne 'all' ? ('AND match_firstchar(name,', \$f->{c}, ')') : ();
my $count = tuwf->dbVali('SELECT count(*) FROM', $packages_with_man, 'p WHERE', $where);
my $pkg = tuwf->dbPagei({ results => 200, page => $f->{p} },
'SELECT id, system, name, category, dead FROM', $packages_with_man, 'p WHERE', $where, 'ORDER BY name, category'
'SELECT id, system, name FROM', $packages_with_man, 'p WHERE', $where, 'ORDER BY name'
);
framework_ title => $sys->{full}, mainclass => 'pkglist', sub {
@ -970,8 +971,7 @@ TUWF::get qr{/pkg/([^/]+)} => sub {
paginate_ "/pkg/$short?c=$f->{c};p=", $count, 200, $f->{p};
ul_ sub {
li_ sub {
a_ href => "/pkg/$short/$_->{category}/$_->{name}", $_->{name};
small_ ' '.$_->{category};
a_ href => "/pkg/$short/$_->{name}", $_->{name};
} for @$pkg;
};
paginate_ "/pkg/$short?c=$f->{c};p=", $count, 200, $f->{p};
@ -979,15 +979,16 @@ TUWF::get qr{/pkg/([^/]+)} => sub {
};
# Package info: /pkg/$system/$category/$name (/$version); $category may contain a slash, too.
# Package info: /pkg/$system[/$category]/$name[/$version]; $category and $name may contain slashes, too.
TUWF::get qr{/pkg/([^/]+)/(.+)} => sub {
my ($short, $path) = tuwf->captures(1,2);
my $sys = sysbyshort->{$short};
return tuwf->resNotFound if !$sys;
my($pkg, $ver) = pkg_frompath(sql('system =', \$sys->{id}), $path);
my($pkg, $ver, $redir) = pkg_frompath(sql('system =', \$sys->{id}), $path);
return tuwf->resNotFound if !$pkg;
return tuwf->resRedirect("/pkg/$short/$pkg->{name}".($ver?"/$ver":''), 'perm') if $redir;
my $vers = tuwf->dbAlli('
SELECT id, version, released
@ -1023,8 +1024,8 @@ TUWF::get qr{/pkg/([^/]+)/(.+)} => sub {
# Latest version of this package determines last modification date of the page.
tuwf->resLastMod($vers->[0]{released});
my $subtitle = " / $pkg->{category} / $pkg->{name}";
my $pkgpath = "$sys->{short}/$pkg->{category}/$pkg->{name}";
my $subtitle = " / $pkg->{name}";
my $pkgpath = "$sys->{short}/$pkg->{name}";
framework_ title => "$sys->{full}$subtitle $sel->{version}", mainclass => 'pkgpage', sub {
h1_ sub {
a_ href => "/pkg/$sys->{short}", $sys->{full};
@ -1061,17 +1062,8 @@ TUWF::get qr{/pkg/([^/]+)/(.+)} => sub {
}
};
# /browse/<pkg> has been moved to /pkg/ with the package category added to the path
TUWF::get qr{/browse/([^/]+)} => sub { tuwf->resRedirect('/pkg/'.tuwf->capture(1), 'perm') };
TUWF::get qr{/browse/([^/]+)/([^/]+)(?:/([^/]+))?} => sub {
my($sys, $name, $ver) = tuwf->captures(1,2,3);
$sys = sysbyshort->{$sys};
return tuwf->resNotFound if !$sys;
my $pkgs = tuwf->dbRowi('SELECT category FROM packages WHERE system =', \$sys->{id}, 'AND name =', \$name, 'LIMIT 1');
return tuwf->resNotFound if !defined $pkgs->{category};
tuwf->resRedirect("/pkg/$sys->{short}/$pkgs->{category}/$name".($ver ? "/$ver" :''), 'perm');
};
# /browse/<pkg> has been moved to /pkg/.
TUWF::get qr{/browse/(.+)} => sub { tuwf->resRedirect('/pkg/'.tuwf->capture(1), 'perm') };
# Redirect for the system selection box, for visitors who have disabled JS.
TUWF::get qr{/sysredir/([^/]+)} => sub { tuwf->resRedirect('/man/'.(tuwf->reqGet('system')//'arch').'/'.tuwf->capture(1), 'temp') };
@ -1092,7 +1084,7 @@ TUWF::get qr{/loc/([a-fA-F0-9]{40})}, sub {
my $maxpersys = 500;
my $l = tuwf->dbAlli('
SELECT p.system, p.category, p.name AS package, v.version, f.filename, f.shorthash, m.name, m.section
SELECT p.system, p.name AS package, v.version, f.filename, f.shorthash, m.name, m.section
FROM files f
JOIN mans m ON m.id = f.man
JOIN package_versions v ON v.id = f.pkgver
@ -1153,8 +1145,7 @@ TUWF::get qr{/loc/([a-fA-F0-9]{40})}, sub {
txt_ $sys->{release};
} if $sys->{release};
td_ sub {
a_ href => "/pkg/$sys->{short}/$_->{category}/$_->{package}/$_->{version}", $_->{package}.'-'.$_->{version};
small_ ' '.$_->{category};
a_ href => "/pkg/$sys->{short}/$_->{package}/$_->{version}", $_->{package}.'-'.$_->{version};
};
td_ $_->{filename};
} for @{$sys{$sysname}}[0..min $maxpersys, $#{$sys{$sysname}}];
@ -1176,7 +1167,7 @@ TUWF::get '/json/tree.json' => sub {
return tuwf->resNotFound() if !$f->{hash} && !($f->{section} && $f->{name});
my $l = tuwf->dbAlli("
SELECT p.system, p.category, p.name AS package, v.version, v.released, v.id AS verid, m.name, m.section, f.filename, f.shorthash, l.locale
SELECT p.system, p.name AS package, v.version, v.released, v.id AS verid, m.name, m.section, f.filename, f.shorthash, l.locale
FROM files f
JOIN locales l ON l.id = f.locale
JOIN mans m ON m.id = f.man
@ -1212,14 +1203,14 @@ TUWF::get '/json/tree.json' => sub {
}
if(!$pkg || $m->{package} ne $pkg->{name}) {
$pkg = { name => $m->{package}, i => $m->{category}, table => [] };
$pkg = { name => $m->{package}, table => [] };
$pkgver = undef;
push @{$sysver->{childs}}, $pkg;
}
push @{$pkg->{table}}, [
$pkgver && $pkgver eq $m->{version} ? {name=>''} :
{name => $m->{version}, href => "/pkg/".sysbyid->{$m->{system}}{short}."/$m->{category}/$m->{package}/$m->{version}"},
{name => $m->{version}, href => "/pkg/".sysbyid->{$m->{system}}{short}."/$m->{package}/$m->{version}"},
{ name => "$m->{name}($m->{section})",
$f->{hash} || $cur == $m->{shorthash} ? ()
: (href => sprintf('/%s/%s', $m->{name}, shorthash_to_hex $m->{shorthash}))