fu/FU/Util.pm
Yorhel 13eaeb1d4a jsonparse: Add max_depth, max_size and offset options
This completes all the functionality that I wanted from the JSON parser.
2025-02-01 11:01:49 +01:00


package FU::Util 0.1;
use v5.36;
use FU::XS;
use Exporter 'import';
our @EXPORT_OK = qw/json_format json_parse/;
1;
__END__
=head1 NAME
FU::Util - Miscellaneous utility functions that really should have been part of
a core Perl installation but aren't for some reason because the Perl community
doesn't believe in the concept of a "batteries included" standard library.
</rant>
=head1 SYNOPSIS
use FU::Util qw/json_format/;
my $data = json_format [1, 2, 3];
=head1 DESCRIPTION
=head2 JSON parsing & formatting
This module comes with a custom C-based JSON parser and formatter. These
functions conform strictly to L<RFC-8259|https://tools.ietf.org/html/rfc8259>;
non-standard extensions are not supported and never will be. They also happen to
be pretty fast; refer to L<FU::Benchmarks> for some numbers.
JSON booleans are parsed into C<builtin::true> and C<builtin::false>. When
formatting, those builtin constants are the I<only> recognized boolean values -
alternative representations such as C<JSON::PP::true> and C<JSON::PP::false>
are not recognized and attempting to format such values will croak.
JSON numbers that are too large to fit into a Perl integer are parsed into a
floating point value instead. This obviously loses precision, but is consistent
with C<JSON.parse()> in JavaScript land - except Perl does support the full
range of a 64-bit integer. JSON numbers with a fraction or exponent are also
converted into floating point, which may lose precision as well.
L<Math::BigInt> and L<Math::BigFloat> are not currently supported. Attempting
to format a floating point C<NaN> or C<Inf> results in an error.
=over
=item json_parse($string, %options)
Parse a JSON string and return a Perl value. With the default options, this
function is roughly similar to:
JSON::PP->new->allow_nonref->core_bools->decode($string);
Croaks on invalid JSON, but the error messages are not super useful. This
function also throws an error on JSON objects with duplicate keys, which is
consistent with the default behavior of L<Cpanel::JSON::XS> but inconsistent
with other modules.
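Because errors are reported by croaking, callers that handle untrusted input
usually want to trap them. The following is a minimal sketch, not part of this
module: C<try_json_parse> is a hypothetical helper name, and the only behavior
it relies on is that C<json_parse> croaks on failure, as described above.

```perl
use v5.36;
use FU::Util qw/json_parse/;

# Hypothetical helper, not part of FU::Util: returns ($value, undef) on
# success and (undef, $error) when json_parse() croaks.
sub try_json_parse($string, %options) {
    my $value;
    # The trailing "1" separates "parsed a JSON null" from "croaked".
    my $ok = eval { $value = json_parse($string, %options); 1 };
    return $ok ? ($value, undef) : (undef, $@ || 'unknown error');
}

# Duplicate keys are rejected, as described above.
my ($value, $error) = try_json_parse('{"a":1,"a":2}');
say defined $error ? "rejected: $error" : 'accepted';
```

Testing the return value of the C<eval> block rather than C<$value> matters
here, since a valid JSON C<null> also parses to C<undef>.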
Supported C<%options>:
=over
=item utf8
Boolean, interpret the input C<$string> as a UTF-8 encoded byte string instead
of a Perl Unicode string.
=item max_depth
Maximum permitted nesting depth of arrays and objects. Defaults to 512.
=item max_size
Throw an error if the JSON data is larger than the given size in bytes.
Defaults to 1 GiB.
=item offset
Takes a reference to a scalar that indicates from which byte offset in
C<$string> to start parsing. On success, the offset is updated to point to the
next non-whitespace character or C<undef> if the string has been fully
consumed.
This option can be used to parse a stream of JSON values:
my $data = '{"obj":1}{"obj":2}';
my $offset = 0;
my $obj1 = json_parse($data, offset => \$offset);
# $obj1 = {obj=>1}; $offset = 9;
my $obj2 = json_parse($data, offset => \$offset);
# $obj2 = {obj=>2}; $offset = undef;
=back
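The two-call C<offset> example above generalizes to a loop. This is a minimal
sketch that assumes the semantics described for C<offset> (it becomes C<undef>
once the input is fully consumed) and that the input holds at least one value;
the sample data and variable names are illustrative only.

```perl
use v5.36;
use FU::Util qw/json_parse/;

# Illustrative input: three concatenated JSON values separated by whitespace.
my $data = '{"obj":1} {"obj":2} [3]';
my $offset = 0;
my @values;

# Keep parsing until json_parse() sets the offset to undef, which the
# documentation above describes as "fully consumed".
while (defined $offset) {
    push @values, json_parse($data, offset => \$offset);
}

say scalar(@values), ' values parsed';
```

A parse error croaks out of the loop, so wrap it in C<eval> if a partially
valid stream should still yield the values read so far.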
=item json_format($scalar, %options)
Format a Perl value as JSON. With the default options, this function behaves
roughly like:
JSON::PP->new->allow_nonref->core_bools->convert_blessed->encode($scalar);
Some modules escape the slash character in encoded strings to prevent a
potential XSS vulnerability when embedding JSON inside C<< <script> ..
</script> >> tags. This function does I<not> do that because it might not even
be sufficient. The following is probably an improvement:
json_format($data) =~ s{</}{<\\/}rg =~ s/<!--/<\\u0021--/rg;
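To see what those two substitutions do, here is a small sketch that applies
them to a hand-written JSON string, so it runs without this module.
C<json_for_script> is a hypothetical helper name. Both replacements are
ordinary JSON string escapes, so a conforming parser decodes the result back to
the original value.

```perl
use v5.36;

# Hypothetical helper: applies the two substitutions shown above to an
# already-formatted JSON string.
sub json_for_script($json) {
    return $json =~ s{</}{<\\/}rg =~ s/<!--/<\\u0021--/rg;
}

# Hand-written JSON standing in for the output of json_format().
my $json = '{"html":"</script><!-- x -->"}';
my $safe = json_for_script($json);

say $safe;   # prints {"html":"<\/script><\u0021-- x -->"}
```

Whether this is sufficient still depends on where and how the result is
embedded in the page.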
The following C<%options> are supported:
=over
=item canonical
Boolean, write hash keys in deterministic (sorted) order. This option currently
has no effect on tied hashes.
=item pretty
Boolean, format JSON with newlines and indentation for easier reading. Beauty
is in the eye of the beholder; this option currently follows the convention
used by L<JSON::XS> and others: 3-space indent and one space around the C<:>
separating object keys and values. The exact format might change in later
versions.
=item utf8
Boolean, returns a UTF-8 encoded byte string instead of a Perl Unicode string.
=item max_size
Maximum permitted size, in bytes, of the generated JSON string. Defaults to 1 GiB.
=item max_depth
Maximum permitted nesting depth of Perl values. Defaults to 512.
=back
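The options above can be combined. Here is a brief sketch, assuming options are
passed as plain key/value pairs in the same way as in the C<offset> example for
C<json_parse>; the data structure is made up for illustration.

```perl
use v5.36;
use FU::Util qw/json_format/;

# Made-up data for illustration.
my $config = { name => 'example', tags => ['a', 'b'], limits => { depth => 3 } };

# canonical: sorted hash keys, so repeated runs produce identical output.
my $compact = json_format($config, canonical => 1);

# pretty + utf8: indented output as UTF-8 bytes, e.g. for writing to a file.
my $readable = json_format($config, canonical => 1, pretty => 1, utf8 => 1);

say $compact;
say $readable;
```

Since the exact C<pretty> layout might change, it is safer to compare
C<canonical> output without C<pretty> when checking for differences.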
=back
(Why the hell yet another JSON codec when CPAN is already full of them!? Well,
L<JSON::XS> is pretty cool but isn't going to be updated to support Perl's new
builtin booleans. L<JSON::PP> is slow and while L<Cpanel::JSON::XS> is
perfectly adequate, its codebase is too large and messy for my taste - too many
unnecessary features and C<#ifdef>s to support ancient perls and esoteric
configurations. Still, if you need anything not provided by these functions,
L<JSON::PP> and L<Cpanel::JSON::XS> are perfectly fine alternatives.
L<JSON::SIMD> and L<Mojo::JSON> also look like good and maintained candidates.)