jsonparse: Add max_depth, max_size and offset options

This completes all the functionality that I wanted from the JSON parser.
Yorhel 2025-02-01 11:01:43 +01:00
parent abfbba3c10
commit 13eaeb1d4a
3 changed files with 105 additions and 9 deletions

@ -18,9 +18,9 @@ doesn't believe in the concept of a "batteries included" standard library.
=head1 SYNOPSIS
    use FU::Util qw/json_format/;
    my $data = json_format [1, 2, 3];
=head1 DESCRIPTION
@ -51,7 +51,12 @@ to format a floating point C<NaN> or C<Inf> results in an error.
Parse a JSON string and return a Perl value. With the default options, this
function is roughly similar to:
    JSON::PP->new->allow_nonref->core_bools->decode($string);
Croaks on invalid JSON, but the error messages are not super useful. This
function also throws an error on JSON objects with duplicate keys, which is
consistent with the default behavior of L<Cpanel::JSON::XS> but inconsistent
with other modules.
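To make the duplicate-key difference concrete, here is a small sketch that uses the core JSON::PP module as one of the "other modules". With its default settings it accepts a repeated key and silently keeps the last value, whereas this function, per the description above, would croak on the same input. The input string is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical input in which the key "a" appears twice.
my $dup = '{"a":1,"a":2}';

# JSON::PP with default settings accepts this; the later value wins.
my $val = JSON::PP->new->allow_nonref->decode($dup);
print "a = $val->{a}\n";    # prints "a = 2"

# json_parse is documented to throw on this input instead, so callers that
# want to recover from it would wrap the call in eval { ... } and check $@.
```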
Supported C<%options>:
@ -62,6 +67,31 @@ Supported C<%options>:
Boolean, interpret the input C<$string> as a UTF-8 encoded byte string instead
of a Perl Unicode string.
=item max_depth
Maximum permitted nesting depth of arrays and objects. Defaults to 512.
=item max_size
Throw an error if the JSON data is larger than the given size in bytes.
Defaults to 1 GiB.
=item offset
Takes a reference to a scalar that indicates from which byte offset in
C<$string> to start parsing. On success, the offset is updated to point to the
next non-whitespace character or C<undef> if the string has been fully
consumed.
This option can be used to parse a stream of JSON values:
    my $data = '{"obj":1}{"obj":2}';
    my $offset = 0;
    my $obj1 = json_parse($data, offset => \$offset);
    # $obj1 = {obj=>1}; $offset = 9;
    my $obj2 = json_parse($data, offset => \$offset);
    # $obj2 = {obj=>2}; $offset = undef;
=back
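The options above can be combined to read a whole stream in a loop. The following is only a sketch: C<parse_stream> is a hypothetical helper, not part of this module, and the parser is passed in as a code reference so that the sketch makes no assumption about how C<json_parse> is imported. It relies solely on the documented behavior that the offset becomes C<undef> once the string is fully consumed. Any error thrown by the parser, including a C<max_depth> or C<max_size> violation, simply propagates to the caller.

```perl
use strict;
use warnings;

# Collect every JSON value in $data by calling $parse repeatedly.
# $parse is expected to behave like json_parse: ($string, %options).
sub parse_stream {
    my ($parse, $data, %opt) = @_;
    my $offset = 0;
    my @values;
    # The parser sets $offset to undef once the whole string has been
    # consumed, which ends the loop.
    while (defined $offset) {
        push @values, scalar $parse->($data, %opt, offset => \$offset);
    }
    return @values;
}

# Intended use (assumes FU::Util is installed and exports json_parse):
#   use FU::Util qw/json_parse/;
#   my @objs = parse_stream(\&json_parse, '{"obj":1}{"obj":2}',
#                           max_depth => 32, max_size => 1024 * 1024);
```

Note that an empty C<$data> is still handed to the parser once, so it is expected to fail the same way a direct call on an empty string would.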
@ -70,14 +100,14 @@ of a Perl Unicode string.
Format a Perl value as JSON. With the default options, this function behaves
roughly like:
    JSON::PP->new->allow_nonref->core_bools->convert_blessed->encode($scalar);
Some modules escape the slash character in encoded strings to prevent a
potential XSS vulnerability when embedding JSON inside C<< <script> ..
</script> >> tags. This function does I<not> do that because it might not even
be sufficient. The following is probably an improvement:
    json_format($data) =~ s{</}{<\\/}rg =~ s/<!--/<\\u0021--/rg;
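As a quick illustration of what that substitution does, the sketch below applies it to a literal JSON string. The literal stands in for the output of C<json_format> so the example has no dependencies, and the payload is invented. Both replacements are valid JSON escapes (C<\/> for C</> and C<\u0021> for C<!>), so the decoded value is unchanged.

```perl
use strict;
use warnings;
use v5.14;    # the /r (non-destructive) substitution flag needs Perl 5.14+

# Stand-in for the output of json_format; hypothetical payload.
my $json = '{"html":"</script><!-- hi"}';

# Same two substitutions as above, chained left to right.
my $safe = $json =~ s{</}{<\\/}rg =~ s/<!--/<\\u0021--/rg;

# The escaped string no longer contains "</" or "<!--", so it cannot
# terminate the surrounding script element early.
print "<script>var data = $safe;</script>\n";
```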
The following C<%options> are supported: