yxml updates
parent
4e7538db99
commit
94469e44e3
1 changed files with 66 additions and 23 deletions
89
dat/yxml
|
|
@ -1,10 +1,11 @@
|
||||||
=pod
|
=pod
|
||||||
|
|
||||||
I<*But see the L<Bugs and Limitations|/Bugs and Limitations> below.>
|
I<*But see the L<Bugs and Limitations|/Bugs and Limitations> and L<Conformance Issues|/Conformance Issues> below.>
|
||||||
|
|
||||||
Yxml is a small (C<6 KiB>) non-validating yet mostly conforming XML parser
|
Yxml is a small (C<6 KiB>) L<non-validating|/Validating vs. non-validating> yet
|
||||||
written in C. Its primary goals are small binary size, simplicity and
|
mostly conforming XML parser written in C. Its primary goals are small binary
|
||||||
correctness. It also happens to be L<pretty fast|/Comparison>.
|
size, simplicity and correctness. It also happens to be L<pretty
|
||||||
|
fast|/Comparison>.
|
||||||
|
|
||||||
The code can be obtained from the L<git repo|http://g.blicky.net/yxml.git> and
|
The code can be obtained from the L<git repo|http://g.blicky.net/yxml.git> and
|
||||||
is available under a permissive MIT license. The only two files you need are
|
is available under a permissive MIT license. The only two files you need are
|
||||||
|
|
@ -60,11 +61,6 @@ But let's not be I<too> optimistic, because there are also...
|
||||||
|
|
||||||
=over
|
=over
|
||||||
|
|
||||||
=item * Element and Attribute names may only consist of ASCII characters.
|
|
||||||
|
|
||||||
=item * Does not verify that non-ASCII characters in attribute values or
|
|
||||||
element contents are within the allowed character ranges.
|
|
||||||
|
|
||||||
=item * A conditional section in a C<< <!DOCTYPE ..> >> declaration will result
|
=item * A conditional section in a C<< <!DOCTYPE ..> >> declaration will result
|
||||||
in a parse error.
|
in a parse error.
|
||||||
|
|
||||||
|
|
@ -77,33 +73,51 @@ not available through the API.
|
||||||
|
|
||||||
I hope to have these issues fixed in the near future.
|
I hope to have these issues fixed in the near future.
|
||||||
|
|
||||||
=head3 Non-features
|
=head3 Conformance Issues
|
||||||
|
|
||||||
And now follows a list of things that are not supported and probably never will
|
|
||||||
be. Most items on this list can be implemented on top of yxml.
|
|
||||||
|
|
||||||
=over
|
=over
|
||||||
|
|
||||||
=item * Does not verify all well-formedness constraints. In particular, does
|
=item * Does not verify that non-ASCII characters in element names, element
|
||||||
not verify that attribute names within the same element are unique, and does
|
content, attribute names and attribute values are within the allowed Unicode
|
||||||
not verify that the contents of a C<< <!DOCTYPE ..> >> declaration follow the
|
character ranges.
|
||||||
XML grammar.
|
|
||||||
|
|
||||||
=item * No helper functions to deal with namespaces. Yxml will parse XML files
|
=item * Does not verify that attribute names within the same element are unique.
|
||||||
with namespaces just fine, but it's up to the application to do the rest.
|
|
||||||
|
=item * Does not verify that the contents of a C<< <!DOCTYPE ..> >> declaration
|
||||||
|
follow the XML grammar.
|
||||||
|
|
||||||
|
=item * Can't parse documents in a non-ASCII-compatible encoding. You'll have
|
||||||
|
to convert it to UTF-8 or something similar first.
|
||||||
|
|
||||||
=item * No support for custom entity references, neither through the API nor
|
=item * No support for custom entity references, neither through the API nor
|
||||||
using C<< <!ENTITY> >>.
|
using C<< <!ENTITY> >>.
|
||||||
|
|
||||||
|
=back
|
||||||
|
|
||||||
|
These conformance issues are the result of the byte-oriented and minimal design
|
||||||
|
of yxml, and I do not intend to fix these directly within the library. All of
|
||||||
|
the above mentioned issues can be fixed on top of yxml (by the application, or
|
||||||
|
by a wrapper) if strict conformance is required. The one exception is custom
|
||||||
|
entity references, but I have a simple idea on how to support those in the
|
||||||
|
future, too.
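
As a rough illustration of what "on top of yxml" could look like, here is a minimal sketch of two of the checks listed above: the C<Char> range check from the XML 1.0 specification and a per-element duplicate attribute check. Nothing here is part of yxml. The names C<xml_is_char>, C<attr_set>, C<attr_set_reset> and C<attr_set_add>, as well as the two size limits, are hypothetical, and the application is assumed to decode UTF-8 into code points itself and to call the helpers at the right moments (for example, resetting the set when an element starts and adding each attribute name as it is reported).

```c
#include <string.h>

/* XML 1.0 "Char" production:
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * Returns 1 if the (already decoded) code point is allowed, 0 otherwise. */
static int xml_is_char(unsigned long cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical limits, chosen only to keep the sketch allocation-free. */
#define ATTR_MAX      32
#define ATTR_NAME_MAX 64

/* Attribute names seen so far within the current element. */
typedef struct {
    char names[ATTR_MAX][ATTR_NAME_MAX];
    int count;
} attr_set;

/* Forget all names; meant to be called when a new element starts. */
static void attr_set_reset(attr_set *s) {
    s->count = 0;
}

/* Record an attribute name.
 * Returns 0 if the name is new, 1 if it is a duplicate,
 * and -1 if it does not fit within the limits above. */
static int attr_set_add(attr_set *s, const char *name) {
    int i;
    for (i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return 1;
    if (s->count >= ATTR_MAX || strlen(name) >= ATTR_NAME_MAX)
        return -1;
    strcpy(s->names[s->count++], name);
    return 0;
}
```

The linear scan is a deliberate choice for this sketch: elements rarely carry more than a handful of attributes, so a hash table would likely cost more in code size than it saves in time.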
|
||||||
|
|
||||||
|
=head3 Non-features
|
||||||
|
|
||||||
|
And now follows a list of things that are not part of the core XML
|
||||||
|
specification and are not directly supported. As with the conformance issues,
|
||||||
|
these features can be implemented on top of yxml.
|
||||||
|
|
||||||
|
=over
|
||||||
|
|
||||||
|
=item * No helper functions to deal with namespaces. Yxml will parse XML files
|
||||||
|
with namespaces just fine, but it's up to the application to do the rest.
|
||||||
|
|
||||||
=item * No DTD or XML Schema validation.
|
=item * No DTD or XML Schema validation.
|
||||||
|
|
||||||
=item * No XSLT.
|
=item * No XSLT.
|
||||||
|
|
||||||
=item * No XPath.
|
=item * No XPath.
|
||||||
|
|
||||||
=item * Can't parse documents in a non-ASCII-compatible encoding. You'll have
|
|
||||||
to convert it to UTF-8 or something similar first.
|
|
||||||
|
|
||||||
=item * Doesn't do your household chores.
|
=item * Doesn't do your household chores.
|
||||||
|
|
||||||
=back
|
=back
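
To give an idea of how little is needed for the namespace item above, here is a minimal sketch of splitting a qualified name such as C<svg:rect> into its prefix and local part. The function name C<qname_local> is hypothetical and not part of yxml. Mapping the prefix to a namespace URI (by tracking C<xmlns> attributes per element) is left out and would still be up to the application.

```c
#include <string.h>

/* Split a qualified name at the first ':'.
 * Returns a pointer to the local part and stores the length of the
 * prefix in *prefix_len (0 if the name has no prefix). The input
 * string is not modified. */
static const char *qname_local(const char *qname, size_t *prefix_len) {
    const char *colon = strchr(qname, ':');
    if (!colon) {
        *prefix_len = 0;
        return qname;
    }
    *prefix_len = (size_t)(colon - qname);
    return colon + 1;
}
```

The prefix is returned as a length rather than a copy, so the caller can compare it against its own prefix table without any allocation.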
|
||||||
|
|
@ -122,7 +136,7 @@ implementation is also included as an indication of the "theoretical" minimum.
|
||||||
expat 2.1.0 MIT 162 139 194 432 1.47 1.09
|
expat 2.1.0 MIT 162 139 194 432 1.47 1.09
|
||||||
libxml2 2.9.1 MIT 464 328 518 816 2.53 1.75
|
libxml2 2.9.1 MIT 464 328 518 816 2.53 1.75
|
||||||
mxml 2.7 LGPL2+static 32 733 75 832 12.38 7.80
|
mxml 2.7 LGPL2+static 32 733 75 832 12.38 7.80
|
||||||
yxml git MIT 6 015 31 448 1.18 0.73
|
yxml git MIT 5 935 31 384 1.14 0.74
|
||||||
|
|
||||||
The code for these benchmarks is available in the
|
The code for these benchmarks is available in the
|
||||||
L<bench/|http://g.blicky.net/yxml.git/tree/bench> directory on git. Some
|
L<bench/|http://g.blicky.net/yxml.git/tree/bench> directory on git. Some
|
||||||
|
|
@ -164,3 +178,32 @@ with C<-Os> than with C<-O2>.
|
||||||
libxml2 2.9.1 MIT 356 948 412 256 3.01 2.08
|
libxml2 2.9.1 MIT 356 948 412 256 3.01 2.08
|
||||||
mxml 2.7 LGPL2+static 27 725 71 704 11.70 7.44
|
mxml 2.7 LGPL2+static 27 725 71 704 11.70 7.44
|
||||||
yxml git MIT 4 835 30 264 1.72 1.05
|
yxml git MIT 4 835 30 264 1.72 1.05
|
||||||
|
|
||||||
|
|
||||||
|
=head2 Validating vs. non-validating
|
||||||
|
|
||||||
|
TL;DR: yxml does I<not> accept garbage XML documents; it will correctly handle
|
||||||
|
and report issues if the input does not strictly follow the XML grammar.
|
||||||
|
|
||||||
|
The terms I<validating> and I<non-validating> have specific meanings within the
|
||||||
|
context of XML. A validating parser is one that reads the doctype declaration
|
||||||
|
(DTD) associated with a document, and validates that the contents of the
|
||||||
|
document follow the rules described in the DTD. A DTD may also include
|
||||||
|
instructions on how to parse the document, including the definition of custom
|
||||||
|
entity references (C<&whatever;>) and instructions on how attribute values or
|
||||||
|
element contents should be normalized before passing its data to the
|
||||||
|
application.
|
||||||
|
|
||||||
|
A non-validating parser is one that ignores the DTD and happily parses
|
||||||
|
documents that do not follow the rules described in that DTD. Such parsers (usually)
|
||||||
|
don't support entity references and will not normalize attribute values or
|
||||||
|
element contents. A non-validating parser still has to verify that the XML
|
||||||
|
document follows the XML syntax rules.
|
||||||
|
|
||||||
|
It should be noted that a lot of XML documents found in the wild are not
|
||||||
|
described with a DTD, but instead use an alternative technology such as XML
|
||||||
|
schema. Wikipedia L<has more
|
||||||
|
information|https://en.wikipedia.org/wiki/XML#Schemas_and_validation> on this.
|
||||||
|
Using a validating parser for such documents would only introduce bloat and may
|
||||||
|
introduce L<potential security
|
||||||
|
vulnerabilities|https://en.wikipedia.org/wiki/Billion_laughs>.
|
||||||
|
|
|
||||||