I have been getting questions concerning the performance of Tango in the XML benchmarks I have been running, with people wondering how something that is not C/C++ could be so fast. “They must be cheating!”
This post intends to explain how D, and subsequently Tango, can perform so well, even against C/C++. To read more about D, please visit the home page for the D Programming Language. Tango is an alternative ‘standard’ library for the D programming language, designed around building a great library with extensive documentation, one that provides the greatest functionality in the most efficient manner possible. How do they do that, you ask?
- Array slicing - Using D’s array slicing feature, you don’t need to create a copy of a string while parsing. Tango maps the DOM nodes directly onto the input buffer, so for any part of the XML only one text version exists: the input itself. VTD-XML does something similar; RapidXml does too, but it mutates the input buffer while parsing, so it makes one copy of the input before the parse begins. Java 6 SAX, on the other hand, turns character arrays into strings for element and attribute names and the like, and that means more allocation.
- Native code - Yes, yes, I know. C/C++ are native as well. I mention this because Java is not, and while today’s JITs can compete with and in some cases outrun native code, if you are allocating hundreds of megabytes during processing, you are going to slow down in garbage collection. D compiles to native code, so it is fast right out of the gate, but it also uses garbage collection. This just shows that the approach to design is MORE important than the choice of language. You will note that RapidXml competes on speed, but when it comes to efficiency (speed per RAM used), it loses, because it copies the input buffer in order to modify it during the parse. Both RapidXml and Tango use internally managed freelists for the nodes in the tree. One tweak to the RapidXml design could improve its efficiency by an order of magnitude.
- Avoid heap allocation as much as possible - More allocation means more CPU time. Tango’s internal philosophy is to allocate as little as possible and, when you must allocate, to prefer the stack. This makes the code much faster.
- Design choices - Tango XML has made specific design decisions which improve its speed, but which can limit its capabilities compared to the parsers you are used to today. For example, Tango XML, in its fastest incarnation, does NOT do entity decoding. If you are down at the system level, building some XML proxy, you don’t need it. These sorts of design decisions are starting to show up in the fastest/newest XML parsers: RapidXml’s fastest configuration does the same, and Java StAX parsers have an option for it. Features omitted by design can be layered on top of the existing parser, so that YOU can make the tradeoff between speed and a particular feature. It seems as though good library design has been a lost art since some point in the late ’70s/early ’80s. Features over efficiency have prevailed in the last two decades because ‘CPU is less expensive than programmer time’. I am not saying this is bad, just showing that good design, in both the language and the standard library, can produce some impressive numbers.
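To make the slicing point concrete, here is a minimal D sketch (the XML string and the hard-coded indices are invented for illustration, not Tango code): a slice is just a pointer/length view into the original buffer, so “extracting” an element name allocates nothing.

```d
void main()
{
    // The whole document lives in one buffer; a slice is a
    // (pointer, length) view into it, so no text is copied.
    string xml = `<book title="D in Action"/>`;

    string name  = xml[1 .. 5];    // "book"
    string title = xml[13 .. 24];  // "D in Action"

    // Proof that no copy was made: the slice points into the input.
    assert(name.ptr is xml.ptr + 1);
}
```

A DOM built this way stores nothing but such views, which is why the input buffer must outlive the tree.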
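The freelist idea that both RapidXml and Tango rely on can be sketched in a few lines of D (the type and method names here are hypothetical, not either library’s actual API): nodes released back to the list are recycled on the next request instead of hitting the allocator.

```d
struct Node
{
    Node* next;   // doubles as the freelist link while the node is idle
    string name;  // payload (element name, etc.)
}

struct FreeList
{
    Node* head;

    Node* acquire()
    {
        if (head is null)
            return new Node;  // cold path: fall back to the heap
        auto n = head;        // hot path: pop a recycled node
        head = n.next;
        n.next = null;
        return n;
    }

    void release(Node* n)
    {
        n.next = head;        // push the node for reuse
        head = n;
    }
}
```

After the first parse has warmed the list, re-parsing a document of similar shape allocates almost nothing.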
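As a small illustration of the stack-over-heap point (using today’s Phobos `std.format.sformat`; the helper function and buffer size are invented for the example): formatting into a caller-supplied stack buffer avoids touching the GC heap entirely.

```d
import std.format : sformat;

// Hypothetical helper: build a small tag string without heap allocation.
const(char)[] openTag(char[] buf, const(char)[] name)
{
    // sformat writes into the supplied buffer and returns a slice
    // of it -- nothing is allocated on the GC heap.
    return sformat(buf, "<%s>", name);
}

void main()
{
    char[32] buf;                  // lives on the stack, freed for free
    auto tag = openTag(buf[], "book");
    assert(tag == "<book>");
}
```

The returned slice is only valid while `buf` is alive, which is exactly the discipline a parser’s hot path can afford.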
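Layering an omitted feature back on top is straightforward. Here is a rough sketch of entity decoding as a post-processing step over the raw text the parser hands back (a simplified illustration, not Tango’s API: only the five predefined entities, with numeric character references and single-pass decoding left out for brevity):

```d
import std.array : replace;

// Decode the predefined XML entities in a string the parser returned raw.
// Note: &amp; must be replaced last, or "&amp;lt;" would decode twice.
string decodeEntities(string s)
{
    return s.replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            .replace("&amp;", "&");
}

void main()
{
    assert(decodeEntities("1 &lt; 2 &amp;&amp; 2 &gt; 1") == "1 < 2 && 2 > 1");
}
```

Callers that never see entities (the XML-proxy case above) simply skip the call and pay nothing.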
Comments are open if other D people would like to add their $.02.