Pg: Support custom type overrides with callbacks

Yorhel 2025-02-28 11:23:37 +01:00
parent 327fd9ea50
commit 4686097d00
6 changed files with 227 additions and 85 deletions

FU.xs

@@ -21,6 +21,10 @@
#define BOOL_INTERNALS_sv_isbool_true(x) SvPVXtrue(x)
#endif
/* Disable key/value struct packing in khashl, so we can safely take a pointer
* to values inside the hash table. */
#define kh_packed
#include "c/khashl.h"
#include "c/common.c"
#include "c/jsonfmt.c"
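Here is why the packing matters. Assume `kh_packed` defaults to GCC's `__attribute__((__packed__))` applied to khashl's key/value bucket struct. In that case, a value stored right after a small key can sit at an unaligned address. A pointer such as `so = &kh_val(...)` in `fupg_set_type()` would then be unsafe to dereference on strict-alignment platforms. The sketch below uses hypothetical `bucket_*` structs, not khashl's actual types, to show how packing moves a pointer-sized field off its natural alignment. It assumes GCC or Clang.

```c
#include <stddef.h>

/* Hypothetical bucket layouts, not khashl's own types: the same fields with
 * the default (aligned) layout and with GCC/Clang's packed attribute. */
struct bucket_aligned { char key; void *val; };
struct __attribute__((packed)) bucket_packed { char key; void *val; };

/* Byte offset of the value field in each layout. */
static size_t aligned_val_offset(void) { return offsetof(struct bucket_aligned, val); }
static size_t packed_val_offset(void)  { return offsetof(struct bucket_packed, val); }
```

In the packed layout the value starts at offset 1, so its address is misaligned for a pointer-sized load. GCC 9 and later can warn about taking such addresses with `-Waddress-of-packed-member`.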

@@ -44,8 +44,7 @@ my @tabs = (
td_ fu->headers->{$_};
} for sort keys fu->headers->%*;
};
h2_ 'Body';
p_ 'TODO';
# TODO: Body? Certainly useful for JSON
('Request')
},

FU/Pg.pm

@@ -555,9 +555,55 @@ send and receive everything as text!"
Instead, in the (default) C<binary> mode, the responsibility of converting
Postgres data to and from Perl values lies with this module. This allows for a
lot of type-specific conveniences, but has the downside of requiring special
code for each supported PostgreSQL type. Most of the Postgres core types are
supported by this module and convert in an intuitive way, but here's a few
type-specific notes:
code for every PostgreSQL type. Most of the core types are supported by this
module and convert in an intuitive way, but you can also configure each type
manually:
=over
=item $conn->set_type($target_type, $type)
=item $conn->set_type($target_type, send => $type, recv => $type)
Change how C<$target_type> is being converted when used as a bind parameter
(I<send>) or when received from query results (I<recv>). The two-argument
version is equivalent to setting I<send> and I<recv> to the same C<$type>.
Types can be specified either by their numeric I<Oid> or by name. In the latter
case, the name must exactly match the internal type name used by PostgreSQL.
Note that this "internal type name" does not always match the names used in
documentation. For example, I<smallint>, I<integer> and I<bigint> should be
specified as I<int2>, I<int4> and I<int8>, respectively, and the I<char> type
is internally called I<bpchar>. The full list of recognized types in your
database can be queried with:
SELECT oid, typname FROM pg_type;
The C<$target_type> does not have to exist in the database when this method is
called. This method only stores the type in its internal configuration, which
is consulted when executing a query that takes the type as bind parameter or
returns a column of that type.
The following arguments are supported for C<$type>:
=over
=item * I<undef>, to reset the conversion functions to their default.
=item * The numeric I<Oid> or name of a built-in type supported by this module,
to use those conversion functions.
=item * A subroutine reference that is called to perform the conversion. For
I<send>, the subroutine is given a Perl value as argument and expected to
return a binary string to be sent to Postgres. For I<recv>, the subroutine is
given a binary string received from Postgres and expected to return a Perl
value.
=back
=back
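A callback exchanges values in PostgreSQL's binary wire format, which makes the *send*/*recv* contract concrete. For `int4`, that format is four bytes in network (big-endian) byte order. This is why the commit's tests receive `5::int4` as `"\0\0\0\5"` once `recv` is overridden to `bytea`. The C sketch below uses hypothetical helpers `int4_send()` and `int4_recv()`, which are not part of FU::Pg, to show that encoding. A Perl callback could do the same with `pack('l>', ...)` and `unpack('l>', ...)`.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: encode a 32-bit integer as PostgreSQL's binary int4,
 * most significant byte first (network byte order). */
static void int4_send(int32_t v, unsigned char out[4]) {
    uint32_t u = (uint32_t)v;  /* shift an unsigned copy, well-defined for negatives */
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Hypothetical helper: decode a binary int4. Returns 0 on success and -1 if
 * the input is not exactly four bytes long. */
static int int4_recv(const unsigned char *buf, size_t len, int32_t *out) {
    if (len != 4) return -1;
    uint32_t u = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
               | ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    *out = (int32_t)u;  /* implementation-defined before C23; two's complement on GCC/Clang */
    return 0;
}
```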
Some built-in types deserve a few additional notes:
=over
@@ -576,14 +622,18 @@ that along as raw binary strings.
=item timestamp / timestamptz
These are converted to and from seconds since the Unix epoch as a floating
point value, similar to the C<time()> (or better: C<Time::HiRes::time()>)
functions.
point value, for easy comparison against C<time()> and related functions.
The timestamp types in Postgres have microsecond accuracy. Floating point can
represent that without loss for dates that are near enough to the epoch (still
seems to be fine in 2025, at least), but this conversion may be lossy for dates
far beyond or before the epoch.
Postgres internally represents timestamps as microseconds since 2000-01-01
stored in a 64-bit integer. If you prefer that, use:
$conn->set_type(timestamptz => 'int8');
=item date
Converted between strings in C<YYYY-MM-DD> format. Postgres accepts a bunch of
@@ -598,6 +648,13 @@ While C<null> is a valid JSON value, there's currently no way to distinguish
that from SQL C<NULL>. When sending C<undef> as bind parameter, it is sent as
SQL C<NULL>.
If you prefer to work with JSON as raw text values instead, use:
$conn->set_type(json => 'text');
That doesn't I<quite> work for the C<jsonb> type. I mean, it works, but then
there's a single C<"\1"> byte prefixed to the string.
=item arrays
PostgreSQL arrays automatically convert to and from Perl arrays as you'd
@@ -608,7 +665,19 @@ and all arrays sent to Postgres will use their default 1-based indexing.
=item records / row types
These are converted to and from hashrefs.
Typed records are converted to and from hashrefs. Untyped records (i.e. values
of the C<record> pseudo-type) are not supported.
=item domain types
These are recognized and automatically converted to and from their underlying
type. It may be tempting to use C<set_type()> to configure special type
conversions for domain types, but beware that PostgreSQL reports columns in the
C<SELECT> clause of a query as being of the I<underlying> type rather than the
domain type, so the conversions will not apply in that case. They do seem to
apply when the domain type is used as bind parameter, array element or record
field. This is an (intentional) limitation of PostgreSQL, sadly not something I
can work around.
=item geometric types
@@ -624,55 +693,21 @@ These are converted to and from hashrefs.
=item tsvector / tsquery
=item range / multirange
=item Extension types
These are not supported at the moment. Not that they're hard to implement (I
think), I simply haven't looked into them yet. Open a bug report if you need
any of these.
=back
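The timestamp note above says that PostgreSQL stores timestamps as microseconds since 2000-01-01 in a 64-bit integer. This module, by contrast, hands out floating point seconds since the Unix epoch. The two epochs are 946684800 seconds apart, so the conversion is a scale plus an offset. Below is a minimal sketch with hypothetical helper names; it is not the module's actual code.

```c
#include <stdint.h>

/* Seconds between the Unix epoch (1970-01-01) and the PostgreSQL timestamp
 * epoch (2000-01-01), both UTC. */
#define PG_EPOCH_OFFSET 946684800.0

/* Hypothetical helper: binary timestamp (microseconds since 2000-01-01) to
 * floating point seconds since the Unix epoch. */
static double pg_ts_to_unix(int64_t pg_usec) {
    return (double)pg_usec / 1e6 + PG_EPOCH_OFFSET;
}

/* Hypothetical helper: the reverse, rounding half away from zero to whole
 * microseconds (done by hand to avoid a libm dependency). */
static int64_t unix_to_pg_ts(double unix_sec) {
    double usec = (unix_sec - PG_EPOCH_OFFSET) * 1e6;
    return (int64_t)(usec < 0 ? usec - 0.5 : usec + 0.5);
}
```

Around 2025 (about 1.7e9 seconds after the Unix epoch), adjacent doubles are roughly 0.24 microseconds apart, so a microsecond value survives the round trip. That spacing doubles each time the magnitude crosses a power of two, which is where the lossiness for distant dates comes from.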
=head3 Overriding types
The default conversion for each type can be changed:
=over
=item $conn->set_type($affected_type, $type)
=item $conn->set_type($affected_type, send => $type, recv => $type)
Change how C<$affected_type> is being converted when used as a bind parameter
(I<send>) or when received from query results (I<recv>). The two-argument
version is equivalent to setting I<send> and I<recv> to the same C<$type>.
Types can be specified either by their numeric I<Oid> or by name. In the latter
case, the name must exactly match the internal type name used by PostgreSQL.
Note that this "internal type name" does not always match the names used in
documentation. For example, I<smallint>, I<integer> and I<bigint> should be
specified as I<int2>, I<int4> and I<int8>, respectively, and the I<char> type
is internally called I<bpchar>. The full list of recognized types in your
database can be queried with:
SELECT oid, typname FROM pg_type;
The C<$affected_type> does not actually have to exist in the database, this
method only stores the type in its internal configuration, which is consulted
upon executing a query that takes the type as bind parameter or when it returns
a column of that type.
The given C<$type> arguments must refer to a built-in type supported by this
module. Types can also be set to I<undef> to restore the conversion to its
default.
As a workaround, you can always switch back to the text format or use
C<set_type()> to configure appropriate conversions for these types.
=back
I<TODO:> Type override examples and a warning about domain types.
I<TODO:> Some handy special types for overriding common conversions.
I<TODO:> Support for custom types through callbacks.
I<TODO:> Methods to convert between the various formats.
I<TODO:> Methods to query type info.
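The `jsonb` note above mentions a `"\1"` prefix. In binary mode, PostgreSQL sends a `jsonb` value as a one-byte format version (currently 1) followed by the JSON text. That version byte is the prefix that shows up when `jsonb` is overridden to `text`. A recv conversion that wants plain text only needs to check and drop that byte. The sketch below uses a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: unwrap a jsonb value received in binary format.
 * Format version 1 is a single 0x01 byte followed by the JSON text.
 * Returns a pointer to the text inside buf and stores its length in
 * *textlen, or returns NULL for empty input or an unknown version. */
static const char *jsonb_recv_text(const char *buf, size_t len, size_t *textlen) {
    if (len < 1 || buf[0] != 1) return NULL;
    *textlen = len - 1;
    return buf + 1;
}
```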

@@ -27,6 +27,7 @@ static void fupg_prep_destroy(fupg_prep *p) {
typedef struct {
const fupg_type *send, *recv;
SV *sendcb, *recvcb;
} fupg_override;
#define fupg_name_hash(v) kh_hash_str((v).n)
@@ -209,13 +210,26 @@ static void fupg_conn_destroy(pTHX_ fupg_conn *c) {
PQfinish(c->conn);
if (c->buf.sv) SvREFCNT_dec(c->buf.sv);
safefree(c->types);
fupg_oid_overrides_destroy(c->oidtypes);
fupg_name_overrides_destroy(c->nametypes);
khint_t k;
kh_foreach(c->oidtypes, k) {
SvREFCNT_dec(kh_val(c->oidtypes, k).sendcb);
SvREFCNT_dec(kh_val(c->oidtypes, k).recvcb);
}
fupg_oid_overrides_destroy(c->oidtypes);
kh_foreach(c->nametypes, k) {
SvREFCNT_dec(kh_val(c->nametypes, k).sendcb);
SvREFCNT_dec(kh_val(c->nametypes, k).recvcb);
}
fupg_name_overrides_destroy(c->nametypes);
kh_foreach(c->records, k) safefree(kh_val(c->records, k));
fupg_records_destroy(c->records);
kh_foreach(c->prep_map, k) fupg_prep_destroy(kh_key(c->prep_map, k));
fupg_prepared_destroy(c->prep_map);
safefree(c);
}
@@ -367,9 +381,19 @@ static void fupg_prepared_unref(fupg_conn *c, fupg_prep *p) {
/* Type handling */
static const fupg_type *fupg_resolve_builtin(pTHX_ SV *name) {
static const fupg_type *fupg_resolve_builtin(pTHX_ SV *name, SV **cb) {
SvGETMAGIC(name);
*cb = NULL;
if (!SvOK(name)) return NULL;
if (SvROK(name)) {
SV *rv = SvRV(name);
if (SvTYPE(rv) == SVt_PVCV) {
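/* Keep our own reference to the CV instead of the caller's SV, which may be
 * a variable that gets reassigned after set_type() returns. */
*cb = newRV_inc(rv);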
return &fupg_type_perlcb;
}
}
UV uv;
const char *pv = SvPV_nomg_nolen(name);
const fupg_type *t = grok_atoUV(pv, &uv, NULL) && uv <= (UV)UINT_MAX
@@ -381,8 +405,8 @@ static const fupg_type *fupg_resolve_builtin(pTHX_ SV *name) {
static void fupg_set_type(pTHX_ fupg_conn *c, SV *name, SV *sendsv, SV *recvsv) {
fupg_override o;
o.send = fupg_resolve_builtin(sendsv);
o.recv = fupg_resolve_builtin(recvsv);
o.send = fupg_resolve_builtin(sendsv, &o.sendcb);
o.recv = fupg_resolve_builtin(recvsv, &o.recvcb);
if ((o.send && o.send->send == fupg_send_array) || (o.recv && o.recv->recv == fupg_recv_array))
fu_confess("Cannot set a type to array, override the underlying element type instead");
/* Can't currently happen since we have no records in the builtin type
@@ -393,18 +417,24 @@ static void fupg_set_type(pTHX_ fupg_conn *c, SV *name, SV *sendsv, SV *recvsv)
UV uv;
STRLEN len;
const char *pv = SvPV(name, len);
int k, i;
int k, absent;
fupg_override *so = NULL;
if (grok_atoUV(pv, &uv, NULL) && uv <= (UV)UINT_MAX) {
k = fupg_oid_overrides_put(c->oidtypes, (Oid)uv, &i);
kh_val(c->oidtypes, k) = o;
k = fupg_oid_overrides_put(c->oidtypes, (Oid)uv, &absent);
so = &kh_val(c->oidtypes, k);
} else if (len < sizeof(fupg_name)) {
fupg_name n;
strcpy(n.n, pv);
k = fupg_name_overrides_put(c->nametypes, n, &i);
kh_val(c->nametypes, k) = o;
k = fupg_name_overrides_put(c->nametypes, n, &absent);
so = &kh_val(c->nametypes, k);
} else {
fu_confess("Invalid type oid or name '%s'", pv);
}
if (!absent) {
SvREFCNT_dec(so->sendcb);
SvREFCNT_dec(so->recvcb);
}
*so = o;
}
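The `absent` flag from the khashl put call is what keeps repeated overrides of the same type leak-free. A fresh slot holds no references yet. An existing slot still owns the previous callbacks, which must be released before they are overwritten. `fupg_conn_destroy()` applies the same rule when it decrements every stored `sendcb`/`recvcb` before freeing the tables. Below is a plain C sketch of that ownership rule. The `handle` type is hypothetical and stands in for a Perl SV and its reference count; `slot` is a hypothetical stand-in for `fupg_override`.

```c
#include <stdlib.h>

/* Hypothetical refcounted callback handle, standing in for a Perl SV. */
typedef struct { int refcnt; } handle;

static handle *handle_inc(handle *h) { if (h) h->refcnt++; return h; }
static void handle_dec(handle *h) { if (h && --h->refcnt == 0) free(h); }

/* Hypothetical override slot that owns one reference per callback. */
typedef struct { handle *sendcb, *recvcb; } slot;

/* Store new callbacks; the slot takes over the references passed in. When
 * the slot already existed (absent == 0), release its old references first. */
static void slot_store(slot *s, int absent, handle *sendcb, handle *recvcb) {
    if (!absent) {
        handle_dec(s->sendcb);
        handle_dec(s->recvcb);
    }
    s->sendcb = sendcb;
    s->recvcb = recvcb;
}
```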
@@ -517,19 +547,19 @@ static const fupg_record *fupg_lookup_record(fupg_conn *c, Oid oid) {
#define FUPGT_SEND 2
#define FUPGT_RECV 4
static const fupg_type *fupg_override_get(fupg_conn *c, int flags, Oid oid, const fupg_name *name) {
static const fupg_type *fupg_override_get(fupg_conn *c, int flags, Oid oid, const fupg_name *name, SV **cb) {
khint_t k;
#define R(t) if (k != kh_end(c->t)) return flags & FUPGT_SEND ? kh_val(c->t, k).send : kh_val(c->t, k).recv
fupg_override *o;
if (name == NULL) {
k = fupg_oid_overrides_get(c->oidtypes, oid);
R(oidtypes);
o = k == kh_end(c->oidtypes) ? NULL : &kh_val(c->oidtypes, k);
} else {
k = fupg_name_overrides_get(c->nametypes, *name);
R(nametypes);
o = k == kh_end(c->nametypes) ? NULL : &kh_val(c->nametypes, k);
}
#undef R
return NULL;
if (!o) return NULL;
*cb = flags & FUPGT_SEND ? o->sendcb : o->recvcb;
return flags & FUPGT_SEND ? o->send : o->recv;
}
static void fupg_tio_setup(pTHX_ fupg_conn *conn, fupg_tio *tio, int flags, Oid oid, int *refresh_done) {
@@ -547,12 +577,13 @@ static void fupg_tio_setup(pTHX_ fupg_conn *conn, fupg_tio *tio, int flags, Oid
* Some send/recv functions have slightly different behavior based on oid,
* in those cases this behavior is useful. */
SV *cb = NULL;
const fupg_type *e, *t;
e = t = fupg_override_get(conn, flags, oid, NULL);
e = t = fupg_override_get(conn, flags, oid, NULL, &cb);
if (!t) t = fupg_lookup_type(aTHX_ conn, refresh_done, oid);
if (!t) fu_confess("No type found with oid %u", oid);
tio->name = t->name.n;
if (!e && (e = fupg_override_get(conn, flags, 0, &t->name))) t = e;
if (!e && (e = fupg_override_get(conn, flags, 0, &t->name, &cb))) t = e;
if (flags & FUPGT_SEND && !t->send) fu_confess("Unable to send type '%s' (oid %u)", tio->name, oid);
if (flags & FUPGT_RECV && !t->recv) fu_confess("Unable to receive type '%s' (oid %u)", tio->name, oid);
@@ -565,9 +596,14 @@ static void fupg_tio_setup(pTHX_ fupg_conn *conn, fupg_tio *tio, int flags, Oid
tio->send = t->send;
tio->recv = t->recv;
if (flags & FUPGT_SEND ? tio->send == fupg_send_array : tio->recv == fupg_recv_array) {
if (flags & FUPGT_SEND ? tio->send == fupg_send_perlcb : tio->recv == fupg_recv_perlcb) {
tio->cb = cb;
} else if (flags & FUPGT_SEND ? tio->send == fupg_send_array : tio->recv == fupg_recv_array) {
tio->arrayelem = safecalloc(1, sizeof(*tio->arrayelem));
fupg_tio_setup(aTHX_ conn, tio->arrayelem, flags, t->elemoid, refresh_done);
} else if (flags & FUPGT_SEND ? tio->send == fupg_send_record : tio->recv == fupg_recv_record) {
tio->record.info = fupg_lookup_record(conn, t->elemoid);
if (!tio->record.info) fu_confess("Unable to find attributes for record type '%s' (oid %u, relid %u)", tio->name, t->oid, t->elemoid);
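`fupg_tio_setup()` resolves a conversion in a fixed order. An override keyed by oid wins outright. Otherwise, the type is resolved, which yields its name, and an override keyed by that name takes precedence over the type's default conversion. The sketch below models only that precedence. It uses hypothetical linear-search tables in place of the khashl maps and the type cache.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a type oid, its name, and the name of the
 * conversion to use. Override tables are keyed by either oid or name. */
typedef struct {
    unsigned oid;
    const char *name;
    const char *conv;
} entry;

static const entry *by_oid(const entry *tab, size_t n, unsigned oid) {
    for (size_t i = 0; i < n; i++)
        if (tab[i].oid == oid) return &tab[i];
    return NULL;
}

static const entry *by_name(const entry *tab, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (tab[i].name && strcmp(tab[i].name, name) == 0) return &tab[i];
    return NULL;
}

/* Oid override first; otherwise look up the type to learn its name, then
 * prefer a name override over the type's default. NULL for unknown oids. */
static const char *resolve_conv(const entry *oid_ov, size_t n_oid,
                                const entry *name_ov, size_t n_name,
                                const entry *types, size_t n_types,
                                unsigned oid) {
    const entry *e = by_oid(oid_ov, n_oid, oid);
    if (e) return e->conv;
    const entry *t = by_oid(types, n_types, oid);
    if (!t) return NULL;
    e = by_name(name_ov, n_name, t->name);
    return e ? e->conv : t->conv;
}
```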

@@ -32,6 +32,7 @@ struct fupg_tio {
const fupg_record *info;
fupg_tio *tio;
} record;
SV *cb;
};
};
@@ -416,6 +417,52 @@ SENDFN(record) {
}
RECVFN(perlcb) {
dSP;
ENTER;
SAVETMPS;
PUSHMARK(SP);
mXPUSHs(newSVpvn(buf, len));
PUTBACK;
call_sv(ctx->cb, G_SCALAR);
SPAGAIN;
SV *ret = newSV(0);
sv_setsv(ret, POPs);
PUTBACK;
FREETMPS;
LEAVE;
return ret;
}
SENDFN(perlcb) {
dSP;
ENTER;
SAVETMPS;
PUSHMARK(SP);
XPUSHs(val);
PUTBACK;
call_sv(ctx->cb, G_SCALAR);
SPAGAIN;
SV *ret = POPs;
PUTBACK;
STRLEN len;
const char *buf = SvPV(ret, len);
fustr_write(out, buf, len);
FREETMPS;
LEAVE;
}
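Callback overrides reuse the existing per-type dispatch. Every callback resolves to the single sentinel type `fupg_type_perlcb`. Its send/recv functions forward to the callback that `fupg_tio_setup()` stored in `tio->cb`. Below is a C-only sketch of the same pattern for the receive side. The `type_def` and `column_io` types are hypothetical, and a C callback stands in for the test's `sub { length $_[0] }`.

```c
#include <stddef.h>

typedef struct column_io column_io;

/* A receive function turns a raw binary value into a (here: integer) result. */
typedef long (*recv_fn)(const column_io *io, const char *buf, size_t len);

/* Hypothetical per-column state, loosely modelled on fupg_tio: the chosen
 * receive function plus an optional user callback. */
struct column_io {
    recv_fn recv;
    long (*user_cb)(const char *buf, size_t len);
};

/* Hypothetical type descriptor: a name and its receive function. */
typedef struct { const char *name; recv_fn recv; } type_def;

/* Receive function of the sentinel type: forward to the column's callback. */
static long recv_via_cb(const column_io *io, const char *buf, size_t len) {
    return io->user_cb(buf, len);
}

/* One shared sentinel type stands in for every callback override. */
static const type_def type_user_cb = { "$user_cb", recv_via_cb };

/* An ordinary receive function for comparison: first byte as a number. */
static long recv_first_byte(const column_io *io, const char *buf, size_t len) {
    (void)io;
    return len ? (unsigned char)buf[0] : -1;
}
static const type_def type_byte = { "byte", recv_first_byte };

/* Copy the type's receive function; attach the callback only for the sentinel. */
static void column_setup(column_io *io, const type_def *t,
                         long (*cb)(const char *, size_t)) {
    io->recv = t->recv;
    io->user_cb = t->recv == recv_via_cb ? cb : NULL;
}

/* C counterpart of the test callback: the value's byte length. */
static long length_cb(const char *buf, size_t len) { (void)buf; return (long)len; }
```

Keeping one static sentinel type means the rest of the type machinery needs no changes. The per-column state is the only place that has to know which callback to run.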
RECVFN(inet) { /* Also works for cidr */
char tmp[128];
if (len < 8) RERR("input data too short");
@@ -726,6 +773,8 @@ static const fupg_type fupg_builtin[] = {
#define FUPG_BUILTIN (sizeof(fupg_builtin) / sizeof(fupg_type))
static const fupg_type fupg_type_perlcb = { 0, 0, {"$perl_cb"}, fupg_send_perlcb, fupg_recv_perlcb };
static const fupg_type *fupg_type_byoid(const fupg_type *list, int len, Oid oid) {
int i, b = 0, e = len-1;

@@ -9,6 +9,7 @@ plan skip_all => 'Please set FU_TEST_DB to a PostgreSQL connection string to run
my $conn = FU::Pg->connect($ENV{FU_TEST_DB});
$conn->_debug_trace(0);
is_deeply $conn->Q('SELECT', 1, '::int')->param_types, [23];
is_deeply $conn->Q('SELECT 1', IN([1,2,3]))->param_types, [1007];
is $conn->Q('SELECT 1', IN([1,2,3]))->val, 1;
@@ -17,30 +18,48 @@ ok !eval { $conn->q('SELECT $1::aclitem', '')->exec; 1 };
like $@, qr/Unable to send type/;
$conn->set_type(int4 => recv => 'bytea');
is $conn->q('SELECT 5::int4')->val, "\0\0\0\5";
is_deeply $conn->q('SELECT ARRAY[5::int4]')->val, ["\0\0\0\5"];
subtest 'type overrides', sub {
$conn->set_type(int4 => recv => 'bytea');
is $conn->q('SELECT 5::int4')->val, "\0\0\0\5";
is_deeply $conn->q('SELECT ARRAY[5::int4]')->val, ["\0\0\0\5"];
$conn->set_type(int4 => send => 'bytea');
is $conn->q('SELECT $1::int4', "\0\0\0\5")->val, 5;
is_deeply $conn->q('SELECT $1::int4[]', ["\0\0\0\5"])->val, [5];
$conn->set_type(int4 => send => 'bytea');
is $conn->q('SELECT $1::int4', "\0\0\0\5")->val, 5;
is_deeply $conn->q('SELECT $1::int4[]', ["\0\0\0\5"])->val, [5];
$conn->set_type(int4 => 'int2');
ok !eval { $conn->q('SELECT 5::int4')->val };
like $@, qr/Error parsing value/;
ok !eval { $conn->q('SELECT $1::int4', 5)->val };
like $@, qr/insufficient data left in message/;
$conn->set_type(int4 => 'int2');
ok !eval { $conn->q('SELECT 5::int4')->val };
like $@, qr/Error parsing value/;
ok !eval { $conn->q('SELECT $1::int4', 5)->val };
like $@, qr/insufficient data left in message/;
$conn->set_type(int4 => undef);
is $conn->q('SELECT 5::int4')->val, 5;
$conn->set_type(int4 => undef);
is $conn->q('SELECT 5::int4')->val, 5;
ok !eval { $conn->set_type(int4 => 1007); };
like $@, qr/Cannot set a type to array/;
ok !eval { $conn->set_type(int4 => 1007); };
like $@, qr/Cannot set a type to array/;
ok !eval { $conn->set_type(int4 => 1); };
like $@, qr/No builtin type found/;
ok !eval { $conn->set_type(int4 => 1); };
like $@, qr/No builtin type found/;
};
{
subtest 'type override callback', sub {
$conn->set_type(text => recv => sub { length $_[0] });
is $conn->q('SELECT $1', 'a')->val, 1;
is $conn->q('SELECT $1', 'ab')->val, 2;
is $conn->q('SELECT $1', 'abc')->val, 3;
is $conn->q('SELECT $1', 'abcd')->val, 4;
$conn->set_type(text => send => sub { 'l'.length $_[0] });
is $conn->q('SELECT $1', 'a')->val, 'l1';
is $conn->q('SELECT $1', 'ab')->val, 'l2';
is $conn->q('SELECT $1', 'abc')->val, 'l3';
is $conn->q('SELECT $1', 'abcd')->val, 'l4';
};
subtest 'custom types', sub {
my $txn = $conn->txn;
is $txn->Q('SELECT 1', IN([1,2,3]))->val, 1;
@@ -100,6 +119,6 @@ like $@, qr/No builtin type found/;
is $txn->q("SELECT dom FROM fupg_test_table")->val, 'bb';
$conn->set_type(fupg_test_enum => 21);
is $txn->q("SELECT dom FROM fupg_test_table")->val, 0x6262;
}
};
done_testing;