From 94469e44e3e564a1ef8c05ad0f51b0c7db264543 Mon Sep 17 00:00:00 2001
From: Yorhel
Date: Wed, 2 Oct 2013 16:27:05 +0200
Subject: [PATCH] yxml updates

---
 dat/yxml | 89 +++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 66 insertions(+), 23 deletions(-)

diff --git a/dat/yxml b/dat/yxml
index 1359995..a5dfa7b 100644
--- a/dat/yxml
+++ b/dat/yxml
@@ -1,10 +1,11 @@
 =pod
 
-I<*But see the L below.>
+I<*But see the L and L below.>
 
-Yxml is a small (C<6 KiB>) non-validating yet mostly conforming XML parser
-written in C. Its primary goals are small binary size, simplicity and
-correctness. It also happens to be L.
+Yxml is a small (C<6 KiB>) L yet
+mostly conforming XML parser written in C. Its primary goals are small binary
+size, simplicity and correctness. It also happens to be L.
 
 The code can be obtained from the L and is available under a
 permissive MIT license. The only two files you need are
@@ -60,11 +61,6 @@ But let's not be I optimistic, because there are also...
 
 =over
 
-=item * Element and Attribute names may only consist of ASCII characters.
-
-=item * Does not verify that non-ASCII characters in attribute values or
-element contents are within the allowed character ranges.
-
 =item * A conditional section in a C<< >> declaration will result in a
 parse error.
 
@@ -77,33 +73,51 @@ not available through the API.
 
 I hope to have these issues fixed in the near future.
 
-=head3 Non-features
-
-And now follows a list of things that are not supported and probably never will
-be. Most items on this list can be implemented on top of yxml.
+=head3 Conformance Issues
 
 =over
 
-=item * Does not verify all well-formedness constraints. In particular, does
-not verify that attribute names within the same element are unique, and does
-not verify that the contents of a C<< >> declaration follow the
-XML grammar.
+=item * Does not verify that non-ASCII characters in element names, element
+content, attribute names and attribute values are within the allowed Unicode
+character ranges.
 
-=item * No helper functions to deal with namespaces. Yxml will parse XML files
-with namespaces just fine, but it's up to the application to do the rest.
+=item * Does not verify that attribute names within the same element are unique.
+
+=item * Does not verify that the contents of a C<< >> declaration
+follow the XML grammar.
+
+=item * Can't parse documents in a non-ASCII-compatible encoding. You'll have
+to convert it to UTF-8 or something similar first.
 
 =item * No support for custom entity references, neither through the API nor
 using C<< >>.
 
+=back
+
+These conformance issues are the result of the byte-oriented and minimal design
+of yxml, and I do not intend to fix these directly within the library. All of
+the above mentioned issues can be fixed on top of yxml (by the application, or
+by a wrapper) if strict conformance is required. With the exception of custom
+entity references, but I have a simple idea on how to support that in the
+future, too.
+
+=head3 Non-features
+
+And now follows a list of things that are not part of the core XML
+specification and are not directly supported. As with the conformance issues,
+these features can be implemented on top of yxml.
+
+=over
+
+=item * No helper functions to deal with namespaces. Yxml will parse XML files
+with namespaces just fine, but it's up to the application to do the rest.
+
 =item * No DTD or XML Schema validation.
 
 =item * No XSLT.
 
 =item * No XPath.
 
-=item * Can't parse documents in a non-ASCII-compatible encoding. You'll have
-to convert it to UTF-8 or something similar first.
-
 =item * Doesn't do your household chores.
 
 =back
@@ -122,7 +136,7 @@ implementation is also included as an indication of the "theoretical" minimum.
   expat    2.1.0   MIT            162 139   194 432    1.47    1.09
   libxml2  2.9.1   MIT            464 328   518 816    2.53    1.75
   mxml     2.7     LGPL2+static    32 733    75 832   12.38    7.80
-  yxml     git     MIT              6 015    31 448    1.18    0.73
+  yxml     git     MIT              5 935    31 384    1.14    0.74
 
 The code for these benchmarks is available in the L directory on git. Some
@@ -164,3 +178,32 @@ with C<-Os> than with C<-O2>.
   libxml2  2.9.1   MIT            356 948   412 256    3.01    2.08
   mxml     2.7     LGPL2+static    27 725    71 704   11.70    7.44
   yxml     git     MIT              4 835    30 264    1.72    1.05
+
+
+=head2 Validating vs. non-validating
+
+TL;DR: yxml does I<not> accept garbage XML documents, it will correctly handle
+and report issues if the input does not strictly follow the XML grammar.
+
+The terms I<validating> and I<non-validating> have specific meanings within the
+context of XML. A validating parser is one that reads the doctype declaration
+(DTD) associated with a document, and validates that the contents of the
+document follow the rules described in the DTD. A DTD may also include
+instructions on how to parse the document, including the definition of custom
+entity references (C<&whatever;>) and instructions on how attribute values or
+element contents should be normalized before passing its data to the
+application.
+
+A non-validating parser is one that ignores the DTD and happily parses
+documents that do not follow the rules described in that DTD. They (usually)
+don't support entity references and will not normalize attribute values or
+element contents. A non-validating parser still has to verify that the XML
+document follows the XML syntax rules.
+
+It should be noted that a lot of XML documents found in the wild are not
+described with a DTD, but instead use an alternative technology such as XML
+schema. Wikipedia L on this.
+Using a validating parser for such documents would only introduce bloat and may
+introduce L.