I have been getting questions concerning the performance of Tango in the XMLbenchmarks I have been running, with people wondering how something that is not C/C++ could be so fast. “They must be cheating!”
This post intends to explain how D, and subsequently Tango, can perform so well, even against C/C++. To read more about D, please visit the home page for D - D Programming Language. Tango is an alternate ’standard’ library for the D programming language, with a design philosophy of building a great library, with extensive documentation, and providing the greatest functionality in the most efficient manner possible. How do they do that you ask?
Array slicing - Using D’s array slicing feature, you don’t need to create a copy of a string in parsing. Tango maps the DOM nodes directly onto the input buffer, so for any part of the XML, only one text version exists, that being the input. VTD-XML does something similar, and RapidXml does similar, but actually mutates the input buffer while parsing, so creates one copy before starting the parse. Java6 SAX, on the other hand, turns character arrays into strings for element and attribute names, etc, and that means more allocation.
Native code - Yes, yes, I know. C/C++ are native as well. I write this because Java is not, and while todays’ JIT can compete and in cases outrun native code, if you are allocating hundreds of megabytes during processing, you are going to slow down in garbage collection. D is native code, to be fast right out of the gate, but also uses garbage collection. This just shows that the approach to design is MORE important than the choice of language. You will note that RapidXml is competing on speed, but when it comes to competing on efficency (speed/ram used), it loses, because it copies the input buffer to modify it during parse. Both RapidXml and Tango use internally managed freelists for nodes in the tree. One tweak to the RapidXml design could improve its efficiency an order of magnitude.
Avoid heap allocation as much as possible - More allocation means more CPU time. Tango’s internal philosophy is one in which you allocate as little as possible, and allocate on the stack if possible. This makes the code much faster.
Design choices - Tango XML has made specific design decisions which affect the speed, but can impact the ability of the parser compared to the parser that you are used to using today. For example, Tango XML, in its fastest incarnation, does NOT do entity decoding. If you are down at a system level, and you are building some xml proxy, you don’t need to do this. These sorts of design decisions are starting to show in the fastest/newest xml parsers. RapidXml fastest configuration does the same. Java StaX parsers have an option for the same. These sorts of omissions by design can be layered on top of the existing parser, so that YOU can make the tradeoff between speed and a particular feature. It seems as though good library design is a lost art since some point in the late 70’s/early 80’s. Features over efficiency have prevailed in the last 2 decades because ‘CPU is less expensive than programmer time’. I am not saying this is bad, just showing that good design, both in language and standard library, can create some impressive numbers.
Comments are open if other D people would like to add their $.02.
I created a benchmark similar to the one that VTD-XML uses. Basically, since most xml processing is mutation, this benchmark parses an input xml file, executes various xpaths on the file, modifying the document in 2 instances, and then serializes the new document. The steps are listed below:
Parse blog.xml, preparing to query the resulting document
Perform the following xpath queries, or their equivalents, once each:
count(//*) (10390 for this document)
//item (a list of those 10390 items)
/blog/item (similar to the previous, except you know the path)
//text() (all text nodes)
count(//item)
count(/blog/item)
/blog/item[@num=’a781′]
/blog/item/body/p/a
Mutate the document by removing the resulting nodes from the last 2 queries (performed inline with the queries)
serialize the modified document back out
I created this benchmark for 4 products (the ones that have xpath or xpath-like support, if you know of another one, please submit me some code, and I will be happy to run and aggregate the results):
After the run, I take the average cycle time, and turn that into the followin graph showing cycles per second. blog.xml is 1.3MB, so you can multiply these numbers by 1.3 to get the Megabytes per second number for each tool.
Some notes of the implementations:
Tango, while not actually having an actual xpath parser, has the requisite power in its query language to be able to pull this off with aplomb
You will note that the VTD code does NOT delete the /blog/item[@num=’a781′] node, because its XMLModifier is unable to perform deletes inside a delete. If someone knows how to fix this, please let me know
Would also note that these benchmarks were run on an Intel Q6700 quad core machine at 2.66 GHz, with 4GB of RAM, running Ubunu Linux.
I have added the recent RapidXml to the graphs. Note that the RAM usage for RapidXml skyrockets, cost it efficiency. Noted on their homepage, they make a copy of the input buffer, because the input is ‘destroyed’ while parsing. I would assume that this memory usage would fit the machine it is running on, but that is a HUGE amount of allocation.
I have started writing this post as a sidebar in comparing the parsers in my benchmarks. I will post what I know, and add more to it as I am informed by the community. Consider this a living post. Where something is just a fact, I list it as a Pro, such as language developed.
Product
Pros
Cons
Tango PullParser (pull)
Written in the D programming language
Tango devs are very aware of cost of allocation, and try to avoid it as often as possible.
Extremely fast, extremely memory efficient
Beta level code
Interfaces may change, since Tango is not yet 1.0
NOT W3C XML compliant (ignores DOCTYPE, etc)
Tango SaxParser (SAX)
Written in the D programming language, on top of Tango’s PullParser.
Straight port of Java SAX code, with a small amount of D flavor
Useful for porting existing SAX-based code
Beta level code
Interfaces may change, since Tango is not yet 1.0
As shown in the benchmarks, virtual calls (SAX does a lot of them) cost quite dearly
NOT W3C XML compliant
Tango Document (DOM)
Written in the D programming language, and a DOM-style tree of xml to manipulate
Faster than all non-tree code tested so far
Not DOM compliant
Integrated query language, inspired by XPath
Beta level code
Interfaces may change, since Tango is not yet 1.0
Not DOM compliant
NOT W3C XMLcompliant
Phobos std.xml (DOM)
Written in the D programming language
Shipped in D 2.0’s standard library
DOM-style tree object model
Not DOM compliant
Not DOM compliant
Requires previous knowledge of the structure of the xml being parsed. Cannot parse arbitrary XML
NOT W3C compliant
RapidXml (DOM)
Written in C++, with ultimate performance in mind
Highly configurable, use only the featureset you need.
Not DOM compliant
Not DOM compliant
Not W3C XML compliant (ignores DOCTYPE)
libxml2 (SAX)
Written in C
extremely robust - passes all 1800 tests from the OASIS XML Tests Suite
VTD-XML (DOM)
Written in Java, also availabe in C, C#
Indexes the XML for super fast querying
XPath Support
Java SAX (SAX)
Written in Java
Java DOM (DOM)
Written in Java
W3C DOM compliant
W3C XML compliant
XPath support
Java StaX parsers (pull)(includes Aalto, Woodstox, and javolution)
Aaron was kind enough to help me out with the RapidXml test. RapidXml is written in highly-tuned C++, and does give Tango a run for the money. I am really glad we are starting to add some non-Java alternatives, so we can see what native code can do. Without further ado, the code is bench_rapidxml.cpp, which was compiled via:
Push email is what I need the MOST in a phone, and the iPhone wasn’t cutting the mustard, until maybe sometime in the really near future that we aren’t allowed to know at this point, but the teaser seems to be enough.
I was helping someone on IRC in #d.tango try to use tango.text.xml to parse and display data from an xml document. We ended up building a simple example using HttpGet to get the document, Document to parse it, and Document’s xpath-like querying functionality to extract the useful bits.
import tango.io.File;
import tango.io.Stdout;
import tango.text.xml.Document;
import tango.net.http.HttpGet;
void main ()
{
auto doc = new Document!(char);
auto page = new HttpGet (\"http://www.google.com/ig/api?weather=London\");
auto content = cast (char[]) page.read;
doc.parse (content);
foreach( node; doc.query.descendant[\"forecast_conditions\"])
{
Stdout.formatln(\"forecast for {} is {} with a high of {}\",
node.query[\"day_of_week\"].attribute.nodes[0].value,
node.query[\"condition\"].nodes[0].getAttribute(\"data\").value,
node.query[\"high\"].nodes[0].getAttribute(\"data\").value);
}
}
The D programming language coupled with Tango as a standard library allows you to become a productive programmer.
Update: Please ignore the backslashes in the code if you are trying to run this example. For some reason, Wordpress is mucking around with the output.
From my mistaken typing in the aalto benchmark, I accidentally benchmarked the default Java6 StaX parser, so this graph changes the axis to allow more players, and adds the real Aalto numbers. Click to view the graphs in full size.
public class Aalto
{
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();
byte[] bytes = new byte[(int)length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}
if (offset < bytes.length) {
throw new IOException(”Could not completely read file “+file.getName());
}
is.close();
return bytes;
}
public static void main (String args[]) throws Exception
{
int iterations = 2000;
Average for hamlet.xml: 147.22 MB/sec
Average for soap_mid.xml: 43.80 MB/sec
As noted on the website, Aalto does seem to be quite fast on the “fast path”. Impressive for a Java solution at this point.
Update: 2008-03-03 13:15 PST: Thanks to Paul Findlay for catching my misspelling of the aalto.jar in the java run command. These numbers posted are actually for the default Java6 StaX parser, and not Aalto. Re-running, I get:
Average for hamlet.xml: 147.85 MB/sec
Average for soap_mid.xml: 85.95 MB/sec
Much more impressive numbers from the Java camp. Graphs will be updated later today.
public class Javolution
{
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();
byte[] bytes = new byte[(int)length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}
if (offset < bytes.length) {
throw new IOException(”Could not completely read file “+file.getName());
}
is.close();
return bytes;
}
public static void main (String args[]) throws Exception
{
int iterations = 2000;
public class Woodstox
{
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();
byte[] bytes = new byte[(int)length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}
if (offset < bytes.length) {
throw new IOException(”Could not completely read file “+file.getName());
}
is.close();
return bytes;
}
public static void main (String args[]) throws Exception
{
int iterations = 2000;
I added Java DOM to the graphs. Building a tree in memory is not the fastest way to parse a doc, but it is the easiest way to modify the doc after parsing. Java 6 DOM shows off not too terribly bad in the parsing speed, but with all the allocation going on, RAM usage skyrockets, and the efficiency graph shows the pain.
This goes to show you how good library design and the D Programming Language come together to kick serious butt.
Speed master Kris made some changes to Tango’s xml libraries today, and increased the performance of the parser to over 500MB/second! The machine is still the quad core 2.66GHz Intel box running Linux with 4GB of RAM. This run reflects revision 3286 of Tango SVN.
I will only update the images here, I think you should now know how I obtained them…
While SAX is showing slower in speed than DOM in Tango (I hope that is as weird to read as it was for me to write), you can see that the RAM usage graph puts it back into perspective.
I also forgot to note that this quad core box is now capable of parsing XML at over 2GB/sec if all 4 cores are used. Impressive indeed.
I decided to post a graph of speed versus resource usage as an interesting view into the overhead of the various programs. Since all benchmarks maxxed out the CPU at 100%, and all cached the data to be parsed, so disk wasn’t being used, that leaves RAM as a measurement of resource usage. The following is a chart of the parsing speed divided by the memory usage. Of note was xmlpull and xmlsax using 688KB of memory, so their numbers actually increased, showing not only the speed, but the conservation of resources. The RAM numbers were taken from top while the program was running, and represent the “Resident Set” so as not to make Java look horribly bad.
Update: 2008-02-24 15:45 PST - I updated the graph to offset the RAM usage by subtracting the file size from the total RAM, so that as the files get larger, they won’t be penalized. To put it into other words, the closer you can keep RAM usage to the filesize, decreasing overhead, the more resource efficient your parser is. I bet you are thinking Tango was designed that way from the beginning right about now, aren’t you?
Average parsing speed: 79.02 and 39.83 MB/sec, respectively. Note that I did remove the DTD declaration from hamlet.xml for this benchmark, since it was erroring out trying to find play.dtd.
Ouput from java -version:
stonecobra@jeff-home:~/xmlbench$ java -version
java version “1.6.0_03″
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Server VM (build 1.6.0_03-b05, mixed mode)
Many thanks to Nietsnie who was kind enough to write up a libxml2 sax benchmark, and run it on his quad core 2.66GHz box running linux. I have updated other benchmarks to reflect using his machine as well, to keep all on the same playing field. test.c is the benchmark code used, listed here:
Here is the current summary of the benchmarks run so far in a graphical form:
I hope to add more (libxml2, Xerces-C, etc) in the future. If you have C++ chops, I am looking for someone to code up one for MSXML. I will also be adding some Java benchmarks in here as well.
Update 2008-02-23 20:57 PST - Since Nietsnie was kind enough to donate his machine time, I re-ran all the current benchmarks on his box, to be able to include the libxml2 sax numbers as apples to apples. The graph is now updated, and includes the speed (Megabytes per second). Thanks to Robert Fraser for catching that.
The current benchmarking machine is an Ubuntu box with 4GB RAM sporting a quad-core Intel chip at 2.66GHz. In other words, much faster than my machine.
I hesitate to publish these numbers, as they are not direct apples to apples comparison. The reason is that the D Programming Language version 2.0’s std.xml is an xml parser, but one where you must know the schema beforehand, and register handlers for each element by name. I was unwilling/too lazy to write said handlers for the docs I was doing, so I found a method called check(), that according to the source code comments makes sure that a document is well-formed, and contains no bad characters. That’s as close as I am going to get to parsing these docs without code help from the community, so take this with a grain of salt or two. I am using DMD 2.011, using stdxml.d to benchmark, listed here:
Average for hamlet.xml: 6.51 MB/sec.
Average for soap_mid.xml: 4.39 MB/sec.
PS: I also wanted to note for any naysayers, that I left off -O -release and -inline because the phobos example actually runs SLOWER with any and/or all of these flags. I am not trying to slip anything by anyone here.