yxml updates
parent 4e7538db99
commit 94469e44e3
1 changed file with 66 additions and 23 deletions
dat/yxml (89 changed lines)
@@ -1,10 +1,11 @@
=pod

I<*But see the L<Bugs and Limitations|/Bugs and Limitations> below.>
I<*But see the L<Bugs and Limitations|/Bugs and Limitations> and L<Conformance Issues|/Conformance Issues> below.>

Yxml is a small (C<6 KiB>) non-validating yet mostly conforming XML parser
written in C. Its primary goals are small binary size, simplicity and
correctness. It also happens to be L<pretty fast|/Comparison>.
Yxml is a small (C<6 KiB>) L<non-validating|/Validating vs. non-validating> yet
mostly conforming XML parser written in C. Its primary goals are small binary
size, simplicity and correctness. It also happens to be L<pretty
fast|/Comparison>.

The code can be obtained from the L<git repo|http://g.blicky.net/yxml.git> and
is available under a permissive MIT license. The only two files you need are

@@ -60,11 +61,6 @@ But let's not be I<too> optimistic, because there are also...

=over

=item * Element and Attribute names may only consist of ASCII characters.

=item * Does not verify that non-ASCII characters in attribute values or
element contents are within the allowed character ranges.

=item * A conditional section in a C<< <!DOCTYPE ..> >> declaration will result
in a parse error.

@@ -77,33 +73,51 @@ not available through the API.

I hope to have these issues fixed in the near future.

=head3 Non-features

And now follows a list of things that are not supported and probably never will
be. Most items on this list can be implemented on top of yxml.
=head3 Conformance Issues

=over

=item * Does not verify all well-formedness constraints. In particular, does
not verify that attribute names within the same element are unique, and does
not verify that the contents of a C<< <!DOCTYPE ..> >> declaration follow the
XML grammar.
=item * Does not verify that non-ASCII characters in element names, element
content, attribute names and attribute values are within the allowed Unicode
character ranges.

=item * No helper functions to deal with namespaces. Yxml will parse XML files
with namespaces just fine, but it's up to the application to do the rest.
=item * Does not verify that attribute names within the same element are unique.

=item * Does not verify that the contents of a C<< <!DOCTYPE ..> >> declaration
follow the XML grammar.

=item * Can't parse documents in a non-ASCII-compatible encoding. You'll have
to convert them to UTF-8 or something similar first.

=item * No support for custom entity references, either through the API or
using C<< <!ENTITY> >>.

=back

These conformance issues are the result of the byte-oriented and minimal design
of yxml, and I do not intend to fix them directly within the library. All of
the above-mentioned issues can be fixed on top of yxml (by the application or
by a wrapper) if strict conformance is required. The one exception is custom
entity references, but I have a simple idea on how to support those in the
future, too.

=head3 Non-features

And now follows a list of things that are not part of the core XML
specification and are not directly supported. As with the conformance issues,
these features can be implemented on top of yxml.

=over

=item * No helper functions to deal with namespaces. Yxml will parse XML files
with namespaces just fine, but it's up to the application to do the rest.

=item * No DTD or XML Schema validation.

=item * No XSLT.

=item * No XPath.

=item * Can't parse documents in a non-ASCII-compatible encoding. You'll have
to convert them to UTF-8 or something similar first.

=item * Doesn't do your household chores.

=back
@@ -122,7 +136,7 @@ implementation is also included as an indication of the "theoretical" minimum.
  expat    2.1.0  MIT          162 139  194 432   1.47  1.09
  libxml2  2.9.1  MIT          464 328  518 816   2.53  1.75
  mxml     2.7    LGPL2+static  32 733   75 832  12.38  7.80
  yxml     git    MIT            6 015   31 448   1.18  0.73
  yxml     git    MIT            5 935   31 384   1.14  0.74

The code for these benchmarks is available in the
L<bench/|http://g.blicky.net/yxml.git/tree/bench> directory on git. Some

@@ -164,3 +178,32 @@ with C<-Os> than with C<-O2>.
  libxml2  2.9.1  MIT          356 948  412 256   3.01  2.08
  mxml     2.7    LGPL2+static  27 725   71 704  11.70  7.44
  yxml     git    MIT            4 835   30 264   1.72  1.05


=head2 Validating vs. non-validating

TL;DR: yxml does I<not> accept garbage XML documents; it will correctly detect
and report errors if the input does not strictly follow the XML grammar.

The terms I<validating> and I<non-validating> have specific meanings within the
context of XML. A validating parser is one that reads the document type
definition (DTD) associated with a document, and validates that the contents of
the document follow the rules described in the DTD. A DTD may also include
instructions on how to parse the document, including the definition of custom
entity references (C<&whatever;>) and instructions on how attribute values or
element contents should be normalized before their data is passed to the
application.

A non-validating parser is one that ignores the DTD and happily parses
documents that do not follow the rules described in that DTD. Such parsers
(usually) don't support custom entity references and will not normalize
attribute values or element contents. A non-validating parser still has to
verify that the XML document follows the XML syntax rules.

It should be noted that a lot of XML documents found in the wild are not
described with a DTD, but instead use an alternative technology such as XML
Schema. Wikipedia L<has more
information|https://en.wikipedia.org/wiki/XML#Schemas_and_validation> on this.
Using a validating parser for such documents would only add bloat and would
expose the application to L<potential security
vulnerabilities|https://en.wikipedia.org/wiki/Billion_laughs>.