Add SQLObject article
This commit is contained in:
parent
bc5ca25127
commit
41eb0a6f42
7 changed files with 676 additions and 3 deletions
1
.gitignore
vendored
1
.gitignore
vendored
|
|
@ -48,6 +48,7 @@ pub/doc/easyipc.html
|
||||||
pub/doc/funcweb.html
|
pub/doc/funcweb.html
|
||||||
pub/doc/pwlookup.html
|
pub/doc/pwlookup.html
|
||||||
pub/doc/sqlaccess.html
|
pub/doc/sqlaccess.html
|
||||||
|
pub/doc/sqlobject.html
|
||||||
pub/dump.html
|
pub/dump.html
|
||||||
pub/dump/awshrink.html
|
pub/dump/awshrink.html
|
||||||
pub/dump/btrfssize.html
|
pub/dump/btrfssize.html
|
||||||
|
|
|
||||||
1
Makefile
1
Makefile
|
|
@ -21,6 +21,7 @@ PAGES=\
|
||||||
"doc/funcweb.md"\
|
"doc/funcweb.md"\
|
||||||
"doc/pwlookup.md"\
|
"doc/pwlookup.md"\
|
||||||
"doc/sqlaccess.md"\
|
"doc/sqlaccess.md"\
|
||||||
|
"doc/sqlobject.md"\
|
||||||
"dump.md"\
|
"dump.md"\
|
||||||
"dump/awshrink.md"\
|
"dump/awshrink.md"\
|
||||||
"dump/btrfssize.md"\
|
"dump/btrfssize.md"\
|
||||||
|
|
|
||||||
|
|
@ -6,6 +6,10 @@ rare occasions are published on this page.
|
||||||
|
|
||||||
## Articles That May As Well Be Considered Blog Posts
|
## Articles That May As Well Be Considered Blog Posts
|
||||||
|
|
||||||
|
`2019-08-13` - [From SQL to Nested Data Structures](/doc/sqlobject)
|
||||||
|
: How to easily fetch complex nested data structures from a normalized
|
||||||
|
relational database.
|
||||||
|
|
||||||
`2019-05-14` - [Fast Key Lookup with a Small Read-Only Database](/doc/pwlookup)
|
`2019-05-14` - [Fast Key Lookup with a Small Read-Only Database](/doc/pwlookup)
|
||||||
: How to quickly check if a password is in a large (but nicely compressed)
|
: How to quickly check if a password is in a large (but nicely compressed)
|
||||||
dictionary.
|
dictionary.
|
||||||
|
|
|
||||||
470
dat/doc/sqlobject.md
Normal file
470
dat/doc/sqlobject.md
Normal file
|
|
@ -0,0 +1,470 @@
|
||||||
|
% From SQL to Nested Data Structures
|
||||||
|
|
||||||
|
(Published on **2019-08-13**)
|
||||||
|
|
||||||
|
This is the most typical problem you'll find in every application that talks
|
||||||
|
with an SQL database: You want to fetch a listing of some kind of database
|
||||||
|
entry, but the entries themselves may contain nested structured data.
|
||||||
|
|
||||||
|
To demonstrate what I mean, let's start with a simple example of a game
|
||||||
|
database[^1]: We have a table with rows for each top-level game entry and a
|
||||||
|
table for releases of said games. There a one-to-many relation between games
|
||||||
|
and releases.
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE games (
|
||||||
|
game_id integer PRIMARY KEY,
|
||||||
|
title text
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE releases (
|
||||||
|
release_id integer PRIMARY KEY,
|
||||||
|
game_id integer REFERENCES games,
|
||||||
|
release_date date
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
Now suppose we want to display a listing of games with their release dates. It
|
||||||
|
would be nice if the database could give us a single JSON document with the
|
||||||
|
following structure:
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"game_id": 7,
|
||||||
|
"title": "Tsukihime",
|
||||||
|
"releases": [
|
||||||
|
{ "release_id": 339, "release_date": "2000-12-29" },
|
||||||
|
{ "release_id": 341, "release_date": "2006-12-28" }
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ ... }
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
But relational databases are designed to only deal with rows and columns, so
|
||||||
|
getting that JSON structure with a single query is not going to happen. Some
|
||||||
|
databases do support structured column types, but dynamically constructing
|
||||||
|
those can be unwieldy and/or slow.
|
||||||
|
|
||||||
|
A common and simple approach to fetch nested data is to first fetch the list of
|
||||||
|
games from the database, then iterate over the resulting rows and fetch the
|
||||||
|
relevant release entries at each iteration. I'll be using Perl 5 and
|
||||||
|
[DBIx::Simple](https://metacpan.org/pod/DBIx::Simple) in this article, but the
|
||||||
|
ideas should translate to any other dynamic programming language. Here's an
|
||||||
|
example of such iteration:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
for my $game (@games) {
|
||||||
|
$game->{releases} = [
|
||||||
|
$db->query(
|
||||||
|
'SELECT release_id, release_date FROM releases WHERE game_id = ?',
|
||||||
|
$game->{game_id}
|
||||||
|
)->hashes
|
||||||
|
];
|
||||||
|
}
|
||||||
|
# `@games` now has the requested structure.
|
||||||
|
```
|
||||||
|
|
||||||
|
This approach works, but it's not super convenient and it can be really slow
|
||||||
|
when fetching many rows. Of course, this is not a new problem. Almost every
|
||||||
|
programming language has several libraries and frameworks available to it to
|
||||||
|
help with fetching complex data structures from SQL databases, ranging from a
|
||||||
|
few helper functions to complete
|
||||||
|
[ORMs](https://en.wikipedia.org/wiki/Object-relational_mapping). What I propose
|
||||||
|
here is not a groundbreaking new technique, nor am I going to pitch a
|
||||||
|
particular library. Instead, I describe a simple yet generic API to deal with
|
||||||
|
the problem.
|
||||||
|
|
||||||
|
|
||||||
|
## Constructing SQL queries in Perl
|
||||||
|
|
||||||
|
But before I get to the solution, let me first briefly cover how one can
|
||||||
|
dynamically construct and compose SQL queries. Having such an abstraction is
|
||||||
|
useful in general, but these concepts will also simplify the rest of this
|
||||||
|
article.
|
||||||
|
|
||||||
|
SQL queries are treated as strings in most programming languages and one can
|
||||||
|
dynamically construct and compose queries by simple string concatenation.
|
||||||
|
There are several problems with actually constructing queries that way, but,
|
||||||
|
for our purposes, it almost suffices. Almost. The biggest problem in terms of
|
||||||
|
composability is that, not only do you need to concatenate strings to get the
|
||||||
|
final query, you also need to keep track of bind parameters. And that can get
|
||||||
|
annoying real quick when your queries are dynamic.
|
||||||
|
|
||||||
|
Here's a simple example of using string interpolation to abstract common
|
||||||
|
functionality into a separate function, without losing much on flexibility:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
# Our abstraction to fetch game entries.
|
||||||
|
sub fetch_games {
|
||||||
|
my $game_ids = shift;
|
||||||
|
$db->query("SELECT * FROM games WHERE game_id IN($game_ids)");
|
||||||
|
# We could be doing some transformations here.
|
||||||
|
# Sorting, pagination, fetching extra information, etc.
|
||||||
|
}
|
||||||
|
|
||||||
|
# We can use that function to fetch games that have already been released.
|
||||||
|
my $ids = "SELECT game_id FROM releases WHERE release_date <= CURRENT_DATE";
|
||||||
|
my $released_games = fetch_games($ids);
|
||||||
|
```
|
||||||
|
|
||||||
|
But what if we wanted to replace that `CURRENT_DATE` with a user-provided
|
||||||
|
value? Doing that without fear of arbitrary SQL injection involves using a bind
|
||||||
|
parameter, which our `fetch_games()` function doesn't support. Fortunately,
|
||||||
|
there's a solution. Or rather, there are many solutions, but let's stick with
|
||||||
|
one I'm familiar with: [SQL::Interp](https://metacpan.org/pod/SQL::Interp).
|
||||||
|
That module provides, among other things, a wonderful function called `sql()`
|
||||||
|
that takes a list of SQL strings and variable references (which are converted
|
||||||
|
into bind parameters) and returns a single value that can be interpolated
|
||||||
|
elsewhere. Let's use it to transform the above example:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
use SQL::Inerp 'sql';
|
||||||
|
|
||||||
|
sub fetch_games {
|
||||||
|
my $game_ids = shift;
|
||||||
|
# Note the use of `iquery` instead of `query`,
|
||||||
|
# this is a DBIx::Simple wrapper around `sql_interp()`.
|
||||||
|
$db->iquery("SELECT * FROM games WHERE game_id IN(", $game_ids, ")");
|
||||||
|
}
|
||||||
|
|
||||||
|
my $latest_date = '2004-01-20'; # Our user input
|
||||||
|
|
||||||
|
# This is our "SQL + bind parameters" in a single variable.
|
||||||
|
my $ids = sql("SELECT game_id FROM releases WHERE release_date <=", \$latest_date);
|
||||||
|
|
||||||
|
# ...which can be passed to fetch_games():
|
||||||
|
my $games_with_release_before_latest_date = fetch_games($ids);
|
||||||
|
```
|
||||||
|
|
||||||
|
That gives us safe variable interpolation into our SQL queries and we can
|
||||||
|
compose queries as if they were normal strings[^2]. You can do a lot more
|
||||||
|
powerful query construction tricks with SQL::Interp, but I'll not cover that
|
||||||
|
here. Its documentation has several examples and there are hints at even more
|
||||||
|
fun tricks in the
|
||||||
|
[SQL::Interpolate](https://metacpan.org/pod/SQL::Interpolate::Macro#Builtin-Macros)
|
||||||
|
docs (which is a different and older module, but those examples are easily
|
||||||
|
ported).
|
||||||
|
|
||||||
|
|
||||||
|
## Structured documents
|
||||||
|
|
||||||
|
With that out of the way, we can go back to fetching nested structures from a
|
||||||
|
relational database. Let's build upon the iteration example given earlier and
|
||||||
|
improve its performance. Instead of running a separate query to fetch the
|
||||||
|
releases for each game, let's fetch the releases of all the relevant games in a
|
||||||
|
single query. We select the necessary rows with an `IN(id1,id2,,...)` clause
|
||||||
|
and then merge the returned data back into the `@games` structure, as follows:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
# List of game_ids, which we can use for our IN() clause.
|
||||||
|
my @game_ids = map $_->{game_id}, @games;
|
||||||
|
|
||||||
|
# Fetch all the releases linked to @game_ids.
|
||||||
|
my @releases = $db->iquery(
|
||||||
|
'SELECT game_id, release_id, release_date FROM releases WHERE game_id IN',
|
||||||
|
\@game_ids
|
||||||
|
)->hashes;
|
||||||
|
|
||||||
|
# Create a 'game_id' => [releases] lookup table for quick access
|
||||||
|
my %releases;
|
||||||
|
for my $release (@releases) {
|
||||||
|
# Add this release to %releases and remove the 'game_id' column,
|
||||||
|
# which was only needed for this merging step.
|
||||||
|
push @{$releases{ delete $release->{game_id} }}, $release;
|
||||||
|
}
|
||||||
|
|
||||||
|
# Now merge the release information back into @games
|
||||||
|
for my $game (@games) {
|
||||||
|
$game->{releases} = $releases{ $game->{game_id} } || [];
|
||||||
|
}
|
||||||
|
# `@games` now has the requested structure.
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
It works and it's beautifully fast, but it's also verbose and not something
|
||||||
|
you'd like to type out every time you want to query the database. Okay, I
|
||||||
|
admit, it's more verbose than necessary because I wanted this code to be
|
||||||
|
readable by people who aren't experts in Perl. It can be done with less code
|
||||||
|
but the number of steps won't change much. Rather than playing code golf, let's
|
||||||
|
do what every sensible programmer does when they notice repetition: Let's
|
||||||
|
abstract the repeated bits into a separate function.
|
||||||
|
|
||||||
|
Let's first define the inputs of the function, we'll need:
|
||||||
|
|
||||||
|
- `@games`: The list of objects that we want to extend.
|
||||||
|
- `"releases"`: The name of the field we are adding to the game objects.
|
||||||
|
- `"game_id"`: The column used for querying and merging.
|
||||||
|
- A function that, given a list of game ids, returns a query for fetching the
|
||||||
|
release information.
|
||||||
|
|
||||||
|
The function would return a modified list of objects with the new information
|
||||||
|
embedded inside. Since we'll be passing the objects to the function as a
|
||||||
|
reference and since it's easier to modify data in-place in Perl, that's what
|
||||||
|
we'll be doing. In other (more functional) languages, it may be easier to avoid
|
||||||
|
mutating the given objects and return a list of fresh new objects instead.
|
||||||
|
|
||||||
|
Here's what that function looks like. It does exactly the same as the code
|
||||||
|
listed above[^3], so I figure I can get away with a shorter and slightly less
|
||||||
|
readable version this time:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
sub enrich {
|
||||||
|
my($field_name, $merge_field, $sql, @objects) = @_;
|
||||||
|
my %ids = map +($_->{$merge_field}, []), @objects;
|
||||||
|
return if !keys %ids;
|
||||||
|
my @result = $db->iquery( $sql->([keys %ids]) )->hashes;
|
||||||
|
push @{$ids{ delete $_->{$merge_field} }}, $_ for (@result);
|
||||||
|
$_->{ $field_name } = $ids{ $_->{$merge_field} } for (@objects);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
And here's how we can rewrite the previous example by using that function:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
enrich 'releases', 'game_id', sub {
|
||||||
|
sql 'SELECT game_id, release_id, release_date FROM releases WHERE game_id IN', $_[0]
|
||||||
|
}, @games;
|
||||||
|
|
||||||
|
# `@games` now has the requested structure.
|
||||||
|
```
|
||||||
|
|
||||||
|
It's short, fast and readable. At least, it is once you get used to the
|
||||||
|
`enrich` function - I admit it may not be very intuitive the first time you see
|
||||||
|
it.
|
||||||
|
|
||||||
|
|
||||||
|
## More nesting
|
||||||
|
|
||||||
|
Let's expand our example database with a new table. Each release may be
|
||||||
|
available for one or more platforms:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE releases_platforms (
|
||||||
|
release_id integer REFERENCES releases,
|
||||||
|
platform text
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
We also want that list of platforms to be included in our nested data
|
||||||
|
structure, so that it looks like this:
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"game_id": 7,
|
||||||
|
"title": "Tsukihime",
|
||||||
|
"releases": [
|
||||||
|
{
|
||||||
|
"release_id": 339,
|
||||||
|
"release_date": "2000-12-29",
|
||||||
|
"platforms": [
|
||||||
|
{ "platform": "Linux" },
|
||||||
|
{ "platform": "Windows" }
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ ... }
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ ... }
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
Fortunately, the `enrich` function can easily handle nested objects: all we
|
||||||
|
have to do is pass a list of *release* objects to `enrich` instead of *game*
|
||||||
|
objects, and we can do that with a simple `map`:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
enrich 'platforms', 'release_id', sub {
|
||||||
|
sql 'SELECT release_id, platform FROM releases_platforms WHERE release_id IN', $_[0]
|
||||||
|
}, map @{$_->{releases}}, @games;
|
||||||
|
```
|
||||||
|
|
||||||
|
That's it, we now have the data structure as described above. It is possible to
|
||||||
|
nest arbitrarily deep with this approach.
|
||||||
|
|
||||||
|
This works because `enrich` modifies the data structure in-place. A version
|
||||||
|
that copies the given data structure would be slightly more complex, but that's
|
||||||
|
a surmountable problem.
|
||||||
|
|
||||||
|
|
||||||
|
## Variations
|
||||||
|
|
||||||
|
That array of `platform` objects in the previous example is more verbose than
|
||||||
|
we'd like. It would be nicer if we could flatten that to an array of platform
|
||||||
|
strings, so that a release object looks more like this:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"release_id": 339,
|
||||||
|
"release_date": "2000-12-29",
|
||||||
|
"platforms": [ "Linux", "Windows" ]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Of course, such a variation of `enrich` could be written quite easily:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
sub enrich_flatten {
|
||||||
|
# This is the same as the original enrich()
|
||||||
|
my($field_name, $merge_field, $sql, @objects) = @_;
|
||||||
|
my %ids = map +($_->{$merge_field}, []), @objects;
|
||||||
|
return if !keys %ids;
|
||||||
|
my @result = $db->iquery( $sql->([keys %ids]) )->hashes;
|
||||||
|
|
||||||
|
# This is the actual merge strategy, which we'll modify slightly:
|
||||||
|
push @{$ids{ delete $_->{$merge_field} }}, values %$_ for (@result);
|
||||||
|
$_->{ $field_name } = $ids{ $_->{$merge_field} } for (@objects);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Usage is exactly the same:
|
||||||
|
|
||||||
|
|
||||||
|
```perl
|
||||||
|
enrich_flatten 'platforms', 'release_id', sub {
|
||||||
|
sql 'SELECT release_id, platform FROM releases_platforms WHERE release_id IN', $_[0]
|
||||||
|
}, map @{$_->{releases}}, @games;
|
||||||
|
```
|
||||||
|
|
||||||
|
Other merge strategies could be implemented in the same way. For example, one
|
||||||
|
could imagine a scenario where, instead of a one-to-many relation as in the
|
||||||
|
previous examples, there is a one-to-one mapping between the list of objects
|
||||||
|
and the query results. An alternative merge strategy in that case could be to
|
||||||
|
copy the columns of the new query into the already existing object. You could
|
||||||
|
achieve the exact same effect in the original SQL query by adding a plain old
|
||||||
|
`JOIN`, but dynamic application-level joins could still have their uses in
|
||||||
|
abstracting common functionality or structuring a larger code base.
|
||||||
|
|
||||||
|
This version of `enrich` has a limitation that the `$merge_field` is used to
|
||||||
|
identify both the field in the original objects and the column in the query
|
||||||
|
results. This is fine in the example database - I deliberately used full
|
||||||
|
*game_id* and *release_id* column names rather than calling them just *id* -
|
||||||
|
but there will no doubt come a time when this limitation starts to get annoying
|
||||||
|
and you'll want to support different column names for object and result
|
||||||
|
queries. It's easy to support that, in any case.
|
||||||
|
|
||||||
|
|
||||||
|
## A more complex example
|
||||||
|
|
||||||
|
This `enrich` function is cute and all, but the examples I've given so far can
|
||||||
|
be implemented just as easily with your average ORM. Yet there is an important
|
||||||
|
difference (apart from having one less dependency): `enrich` provides for
|
||||||
|
easier ad-hoc flexibility, meaning: you can use the full power of SQL in the
|
||||||
|
enrichment queries.
|
||||||
|
|
||||||
|
Let's add some joins and filters to our example. Let's say we want the
|
||||||
|
following structure:
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"game_id": 7,
|
||||||
|
"title": "Tsukihime",
|
||||||
|
"platforms": [ "Windows", "Linux" ],
|
||||||
|
"releases": [
|
||||||
|
{
|
||||||
|
"release_id": 339,
|
||||||
|
"release_date": "2000-12-29",
|
||||||
|
"platform_count": 2
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ ... }
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
That is, we want to have a per-game "platforms" array rather than per-release,
|
||||||
|
and instead of the platform list for releases, we just want a single
|
||||||
|
"platform_count". Let's say we want the list of releases ordered by release
|
||||||
|
date and we also want to be able to filter on the release date. Here's a full
|
||||||
|
example to achieve all of that:
|
||||||
|
|
||||||
|
```perl
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
# This is the per-game platforms list.
|
||||||
|
enrich_flatten 'platforms', 'game_id', sub {
|
||||||
|
sql 'SELECT DISTINCT game_id, platform
|
||||||
|
FROM releases
|
||||||
|
JOIN releases_platforms USING (release_id)
|
||||||
|
WHERE game_id IN', $_[0]
|
||||||
|
}, @games;
|
||||||
|
|
||||||
|
my $latest_date = '2004-01-20'; # Our user input
|
||||||
|
|
||||||
|
# This is the releases list.
|
||||||
|
enrich 'releases', 'game_id', sub {
|
||||||
|
sql 'SELECT game_id, release_id, release_date,
|
||||||
|
(SELECT COUNT(*) FROM releases_platforms rp
|
||||||
|
WHERE rp.release_id = r.release_id) AS platform_count
|
||||||
|
FROM releases r
|
||||||
|
WHERE game_id IN', $_[0], '
|
||||||
|
AND release_date <=', \$latest_date, '
|
||||||
|
ORDER BY release_date'
|
||||||
|
}, @games;
|
||||||
|
```
|
||||||
|
|
||||||
|
It's not the most elegant piece of code ever written, but it does the trick.
|
||||||
|
The alternatives aren't likely to be much better.
|
||||||
|
|
||||||
|
|
||||||
|
## Final notes
|
||||||
|
|
||||||
|
I've been using this approach - with a slightly different API - for the
|
||||||
|
experimental rewrite of [VNDB.org](https://vndb.org/) and I'm happy with the
|
||||||
|
result. The main advantage of the `enrich` family of functions is that you can
|
||||||
|
easily and efficiently run complex queries and merge the results to construct
|
||||||
|
complex JSON documents - ad-hoc and without any boilerplate, as there's no need
|
||||||
|
to define, declare or even name your data structure up front. This sometimes
|
||||||
|
makes it harder to follow the code, as you may have to visualize a monstrously
|
||||||
|
complex data structure in your mind, but this approach can be incredibly
|
||||||
|
helpful for quick prototyping. Good documentation and/or a good debugging
|
||||||
|
strategy can make up for lack of boilerplate.
|
||||||
|
|
||||||
|
The exact API for the `enrich()` functions is something I've not fully decided
|
||||||
|
on yet. I've not put that much thought into the order of its arguments, the
|
||||||
|
need for "variations" for different scenarios could be reduced by adding the
|
||||||
|
merge strategy as a separate argument, at which point named arguments may be
|
||||||
|
necessary to keep things readable, etc. The exact API doesn't matter all that
|
||||||
|
much, it's the idea that matters.
|
||||||
|
|
||||||
|
As I mentioned in the introduction: while I've been using Perl for these
|
||||||
|
examples, I'm convinced that this idea can be easily ported to other dynamic
|
||||||
|
languages. I did in fact experiment with porting this idea to Elixir and it was
|
||||||
|
every bit as convenient, in a way that
|
||||||
|
[Ecto](https://hexdocs.pm/ecto/Ecto.html)'s built-in ORM functionality wasn't.
|
||||||
|
|
||||||
|
It's quite possible that this abstraction is already available in some
|
||||||
|
languages and libraries and that I've just missed it. It's also possible that
|
||||||
|
this is, in fact, a terrible idea in the long run. Only way to find out is to
|
||||||
|
give it a try. On that note, if you feel like playing around with the examples
|
||||||
|
in this article, I wrote [a little script](/download/code/sqlobject.pl) to test
|
||||||
|
that the examples actually work, which also makes for a good base for quick
|
||||||
|
experimentation.
|
||||||
|
|
||||||
|
|
||||||
|
[^1]: This is a rather simplified schema inspired by
|
||||||
|
[VNDB.org](https://vndb.org/) - the project that made me experiment with all
|
||||||
|
this.
|
||||||
|
[^2]: I'm sure that at this point you'd call me out as a madman for suggesting
|
||||||
|
that treating SQL as strings is an acceptable strategy for constructing
|
||||||
|
dynamic queries, and I'd actually be sympathetic to that view - I'm well
|
||||||
|
aware that it's rather brittle. So now you're going to try and convince me
|
||||||
|
that your favorite query builder library is the *ultimate* solution to this
|
||||||
|
problem and that everyone should use it. Except it's not available for Perl
|
||||||
|
and/or it has severe limitations and/or it has so much boilerplate that it's
|
||||||
|
an utter waste of time for quick prototyping. I've played around with query
|
||||||
|
builders in various languages, but the only one I actually kinda liked was
|
||||||
|
Elixir's [Ecto](https://hexdocs.pm/ecto/Ecto.html). But alas, I'm stuck with
|
||||||
|
Perl for now, and string interpolation isn't all that bad.
|
||||||
|
[^3]: I lied, this version does two more things: It removes duplicate
|
||||||
|
identifiers from the `IN` clause and does an early return if there are no
|
||||||
|
identifiers to fetch.
|
||||||
|
|
@ -20,6 +20,10 @@ the incidental article on this site. Enjoy your stay!
|
||||||
<!-- These announcements are parsed by mkfeed.pl, see that file for formatting -->
|
<!-- These announcements are parsed by mkfeed.pl, see that file for formatting -->
|
||||||
## Announcements <a href="/feed.atom"><img src="/img/feed_icon.png" alt="Atom feed"></a>
|
## Announcements <a href="/feed.atom"><img src="/img/feed_icon.png" alt="Atom feed"></a>
|
||||||
|
|
||||||
|
`2019-08-13` - New article: From SQL to Nested Data Structures <!-- link: /doc/sqlobject -->
|
||||||
|
: How to easily fetch complex nested data structures from a normalized
|
||||||
|
relational database. [Read more.](/doc/sqlobject)
|
||||||
|
|
||||||
`2019-08-05` - ncdu 1.14.1 released <!-- tags: ncdu, link: /ncdu -->
|
`2019-08-05` - ncdu 1.14.1 released <!-- tags: ncdu, link: /ncdu -->
|
||||||
: A minor bugfix release. This fixes the `--exclude-caches` option,
|
: A minor bugfix release. This fixes the `--exclude-caches` option,
|
||||||
intermittend early exit on OS X and improves handling of out-of-memory
|
intermittend early exit on OS X and improves handling of out-of-memory
|
||||||
|
|
|
||||||
|
|
@ -18,9 +18,9 @@ each OS and distribution to make the installation process a bit easier for
|
||||||
everyone.
|
everyone.
|
||||||
|
|
||||||
If your system is missing from this page or if you're still having trouble,
|
If your system is missing from this page or if you're still having trouble,
|
||||||
don't hesitate to join the support hub at `adc://dc.blicky.net/` or send me a
|
don't hesitate to join the support hub at `adcs://dc.blicky.net:2780/` or send
|
||||||
mail at [projects@yorhel.nl](mailto:projects@yorhel.nl). Contributions to this
|
me a mail at [projects@yorhel.nl](mailto:projects@yorhel.nl). Contributions to
|
||||||
page are of course highly welcomed as well. :-)
|
this page are of course highly welcomed as well. :-)
|
||||||
|
|
||||||
## Statically linked binaries
|
## Statically linked binaries
|
||||||
|
|
||||||
|
|
|
||||||
193
pub/download/code/sqlobject.pl
Normal file
193
pub/download/code/sqlobject.pl
Normal file
|
|
@ -0,0 +1,193 @@
|
||||||
|
#!/usr/bin/perl
|
||||||
|
|
||||||
|
# For info: https://dev.yorhel.nl/doc/sqlobject
|
||||||
|
# Date: 2019-08-13
|
||||||
|
# License: MIT
|
||||||
|
|
||||||
|
use v5.12;
|
||||||
|
use warnings;
|
||||||
|
use DBIx::Simple;
|
||||||
|
use SQL::Interp 'sql';
|
||||||
|
use Data::Dumper 'Dumper';
|
||||||
|
|
||||||
|
my $db = DBIx::Simple->connect('dbi:Pg:dbname=test', undef, undef, { RaiseError => 1 });
|
||||||
|
|
||||||
|
$db->query($_) for (split /;/, <<_);
|
||||||
|
DROP SCHEMA IF EXISTS sqlobject_test CASCADE;
|
||||||
|
CREATE SCHEMA sqlobject_test;
|
||||||
|
SET search_path TO sqlobject_test;
|
||||||
|
|
||||||
|
CREATE TABLE games (
|
||||||
|
game_id integer PRIMARY KEY,
|
||||||
|
title text
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE releases (
|
||||||
|
release_id integer PRIMARY KEY,
|
||||||
|
game_id integer REFERENCES games,
|
||||||
|
release_date date
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE releases_platforms (
|
||||||
|
release_id integer REFERENCES releases,
|
||||||
|
platform text
|
||||||
|
);
|
||||||
|
|
||||||
|
INSERT INTO games VALUES (7, 'Tsukihime');
|
||||||
|
INSERT INTO releases VALUES (339, 7, '2000-12-29');
|
||||||
|
INSERT INTO releases VALUES (341, 7, '2006-12-28');
|
||||||
|
INSERT INTO releases_platforms VALUES (339, 'Linux');
|
||||||
|
INSERT INTO releases_platforms VALUES (339, 'Windows');
|
||||||
|
INSERT INTO releases_platforms VALUES (341, 'Windows')
|
||||||
|
_
|
||||||
|
|
||||||
|
|
||||||
|
{
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
for my $game (@games) {
|
||||||
|
$game->{releases} = [
|
||||||
|
$db->query(
|
||||||
|
'SELECT release_id, release_date FROM releases WHERE game_id = ?',
|
||||||
|
$game->{game_id}
|
||||||
|
)->hashes
|
||||||
|
];
|
||||||
|
}
|
||||||
|
print Dumper \@games;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
{
|
||||||
|
# Our abstraction to fetch game entries.
|
||||||
|
my sub fetch_games {
|
||||||
|
my $game_ids = shift;
|
||||||
|
$db->query("SELECT * FROM games WHERE game_id IN($game_ids)")->hashes;
|
||||||
|
# We could be doing some transformations here.
|
||||||
|
# Sorting, pagination, fetching extra information, etc.
|
||||||
|
}
|
||||||
|
|
||||||
|
# We can use that function to fetch games that have already been released.
|
||||||
|
my $ids = "SELECT game_id FROM releases WHERE release_date <= CURRENT_DATE";
|
||||||
|
my @released_games = fetch_games($ids);
|
||||||
|
print Dumper \@released_games;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
{
|
||||||
|
|
||||||
|
my sub fetch_games {
|
||||||
|
my $game_ids = shift;
|
||||||
|
# Note the use of `iquery` instead of `query`,
|
||||||
|
# this is a DBIx::Simple wrapper around `sql_interp()`.
|
||||||
|
$db->iquery("SELECT * FROM games WHERE game_id IN(", $game_ids, ")")->hashes;
|
||||||
|
}
|
||||||
|
|
||||||
|
my $latest_date = '2019-01-20'; # Our user input
|
||||||
|
|
||||||
|
# This is our "SQL + bind parameters" in a single variable.
|
||||||
|
my $ids = sql("SELECT game_id FROM releases WHERE release_date <=", \$latest_date);
|
||||||
|
|
||||||
|
# ...which can be passed to fetch_games():
|
||||||
|
my @games_with_release_before_latest_date = fetch_games($ids);
|
||||||
|
print Dumper \@games_with_release_before_latest_date;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
{
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
# List of game_ids, which we can use for our IN() clause.
|
||||||
|
my @game_ids = map $_->{game_id}, @games;
|
||||||
|
|
||||||
|
# Fetch all the releases related to @game_ids.
|
||||||
|
my @releases = $db->iquery(
|
||||||
|
'SELECT game_id, release_id, release_date FROM releases WHERE game_id IN',
|
||||||
|
\@game_ids
|
||||||
|
)->hashes;
|
||||||
|
|
||||||
|
# Create a 'game_id' => [releases] lookup table for quick access
|
||||||
|
my %releases;
|
||||||
|
for my $release (@releases) {
|
||||||
|
# Add this release to %releases and remove the 'game_id' column,
|
||||||
|
# which was only needed for this merging step.
|
||||||
|
push @{$releases{ delete $release->{game_id} }}, $release;
|
||||||
|
}
|
||||||
|
|
||||||
|
# Now merge the release information back into @games
|
||||||
|
for my $game (@games) {
|
||||||
|
$game->{releases} = $releases{ $game->{game_id} } || [];
|
||||||
|
}
|
||||||
|
print Dumper \@games;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
sub enrich {
|
||||||
|
my($field_name, $merge_field, $sql, @objects) = @_;
|
||||||
|
my %ids = map +($_->{$merge_field}, []), @objects;
|
||||||
|
return if !keys %ids;
|
||||||
|
my @result = $db->iquery( $sql->([keys %ids]) )->hashes;
|
||||||
|
push @{$ids{ delete $_->{$merge_field} }}, $_ for (@result);
|
||||||
|
$_->{ $field_name } = $ids{ $_->{$merge_field} } for (@objects);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
sub enrich_flatten {
|
||||||
|
# This is the same as the original enrich()
|
||||||
|
my($field_name, $merge_field, $sql, @objects) = @_;
|
||||||
|
my %ids = map +($_->{$merge_field}, []), @objects;
|
||||||
|
return if !keys %ids;
|
||||||
|
my @result = $db->iquery( $sql->([keys %ids]) )->hashes;
|
||||||
|
|
||||||
|
# This is the actual merge strategy, which we'll modify slightly:
|
||||||
|
push @{$ids{ delete $_->{$merge_field} }}, values %$_ for (@result);
|
||||||
|
$_->{ $field_name } = $ids{ $_->{$merge_field} } for (@objects);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
{
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
enrich 'releases', 'game_id', sub {
|
||||||
|
sql 'SELECT game_id, release_id, release_date FROM releases WHERE game_id IN', $_[0]
|
||||||
|
}, @games;
|
||||||
|
|
||||||
|
# More nesting
|
||||||
|
enrich 'platforms', 'release_id', sub {
|
||||||
|
sql 'SELECT release_id, platform FROM releases_platforms WHERE release_id IN', $_[0]
|
||||||
|
}, map @{$_->{releases}}, @games;
|
||||||
|
|
||||||
|
# Flattened platforms
|
||||||
|
enrich_flatten 'platforms_flat', 'release_id', sub {
|
||||||
|
sql 'SELECT release_id, platform FROM releases_platforms WHERE release_id IN', $_[0]
|
||||||
|
}, map @{$_->{releases}}, @games;
|
||||||
|
|
||||||
|
print Dumper \@games;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
{
|
||||||
|
my @games = $db->query('SELECT game_id, title FROM games')->hashes;
|
||||||
|
|
||||||
|
# This is the per-game platforms list.
|
||||||
|
enrich_flatten 'platforms', 'game_id', sub {
|
||||||
|
sql 'SELECT DISTINCT game_id, platform
|
||||||
|
FROM releases
|
||||||
|
JOIN releases_platforms USING (release_id)
|
||||||
|
WHERE game_id IN', $_[0]
|
||||||
|
}, @games;
|
||||||
|
|
||||||
|
my $latest_date = '2004-01-20'; # Our user input
|
||||||
|
|
||||||
|
# This is the releases list.
|
||||||
|
enrich 'releases', 'game_id', sub {
|
||||||
|
sql 'SELECT game_id, release_id, release_date,
|
||||||
|
(SELECT COUNT(*) FROM releases_platforms rp
|
||||||
|
WHERE rp.release_id = r.release_id) AS platform_count
|
||||||
|
FROM releases r
|
||||||
|
WHERE game_id IN', $_[0], '
|
||||||
|
AND release_date <=', \$latest_date, '
|
||||||
|
ORDER BY release_date'
|
||||||
|
}, @games;
|
||||||
|
|
||||||
|
print Dumper \@games;
|
||||||
|
}
|
||||||
Loading…
Add table
Add a link
Reference in a new issue