Wednesday
27
Feb 2008

XML Benchmarks - Updated graphs with StAX parsers

(10:22 pm) Tags: [Software, Projects, D Programming Language]

Thanks to Paul Findlay, we finally have a possible contender in the Java camp with Aalto.

This goes to show you how good library design and the D Programming Language come together to kick serious butt.

PS: I am looking for anyone to do comparisons with MSXML, RapidXML, etc. More native code help is needed. Send me email at scott aht dotnot daht org.


XML Benchmarks - Aalto

(10:11 pm) Tags: [Software, Projects]

Next up from Paul Findlay: Aalto. Aalto.java:

// requires jar files from http://www.cowtowncoder.com/hatchery/aalto/index.html
// (and maybe some command line switches as per the same page)

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

import java.io.*;

public class Aalto
{
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();

byte[] bytes = new byte[(int)length];

int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}

if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}

is.close();
return bytes;
}

public static void main (String args[]) throws Exception
{
int iterations = 2000;

XMLInputFactory xmlif = XMLInputFactory.newInstance();
xmlif.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
xmlif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
xmlif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);

byte[] content = Aalto.getBytesFromFile(new File(args[0]));
ByteArrayInputStream bais = new ByteArrayInputStream(content);

for (int i = 0; i < 10; i++) {
long start = System.currentTimeMillis();

for (int j = 0; j < iterations; j++) {
XMLStreamReader xr = xmlif.createXMLStreamReader(bais);
while (xr.hasNext()) {
xr.next();
}
xr.close();
bais.reset();
}

long stop = System.currentTimeMillis();
double timer = (stop - start) / 1000.0;
double total = (content.length * iterations) / (timer * (1024 * 1024));
System.out.print(total);
System.out.println(" MB/s");
}
}
}

How it was run:

echo "aalto"
javac -classpath aalto-0.9.jar Aalto.java
echo "hamlet.xml"
java -cp aalto-0.9.jar:stax2-3.0pr1.jar:. Aalto hamlet.xml
echo "soap_mid.xml"
java -cp alto-0.9.jar:stax2-3.0pr1.jar:. Aalto soap_mid.xml

Results:

stonecobra@jeff-home:~/xmlbench$ ./all
aalto
hamlet.xml
119.02434356083324 MB/s
149.60675553887623 MB/s
149.81687738654318 MB/s
149.4390819546354 MB/s
150.23889675946305 MB/s
150.36596659038446 MB/s
150.4507992936795 MB/s
151.09010863912005 MB/s
151.00455365121567 MB/s
151.13292249818468 MB/s
soap_mid.xml
41.88683525261311 MB/s
43.82856162166171 MB/s
43.896140352961176 MB/s
43.86607965078486 MB/s
43.552910290707864 MB/s
44.16855218759427 MB/s
44.16093954502489 MB/s
44.13811735404555 MB/s
44.267755915728124 MB/s
44.22191426307118 MB/s

Average for hamlet.xml: 147.22 MB/sec
Average for soap_mid.xml: 43.80 MB/sec

As noted on the website, Aalto does seem to be quite fast on the “fast path”. Impressive for a Java solution at this point.

Update: 2008-03-03 13:15 PST: Thanks to Paul Findlay for catching my misspelling of the Aalto jar name in the java run command. The numbers posted above are actually for the default Java 6 StAX parser, not Aalto. Re-running, I get:

stonecobra@jeff-home:~/xmlbench$ ./all
aalto
hamlet.xml
138.74820070137716 MB/s
148.31704212905834 MB/s
148.73064235808528 MB/s
148.73064235808528 MB/s
148.56492576492863 MB/s
148.85517261961868 MB/s
148.97991159108764 MB/s
149.18827510380245 MB/s
149.14655578749824 MB/s
149.23001776611466 MB/s
soap_mid.xml
79.94439040256923 MB/s
85.83643927646042 MB/s
86.3571861274804 MB/s
86.64922936768157 MB/s
85.72156950158394 MB/s
86.5906628050809 MB/s
87.09101673699332 MB/s
87.03185164410135 MB/s
87.06142413871369 MB/s
87.20958857734321 MB/s

Average for hamlet.xml: 147.85 MB/sec
Average for soap_mid.xml: 85.95 MB/sec

Much more impressive numbers from the Java camp. Graphs will be updated later today.
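
The mix-up described in the update above (a misspelled jar name silently falling back to the JDK's own StAX parser) can be caught by printing the class that XMLInputFactory.newInstance() actually returns, the same trick the Sax.java benchmark uses with xr.getClass().getName(). A minimal sketch, where WhichStax and factoryClassName are made-up names for illustration:

```java
import javax.xml.stream.XMLInputFactory;

// Sketch: report which StAX implementation the JVM actually resolved.
final class WhichStax {
    // Returns the fully qualified class name of the factory that
    // XMLInputFactory.newInstance() picked from the current classpath.
    static String factoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Print this once at the start of a benchmark run so the log
        // records which parser the numbers belong to.
        System.out.println(factoryClassName());
    }
}
```

With only the JDK on the classpath this prints the bundled implementation. With the Aalto jar correctly named on the classpath, the printed name should come from an Aalto package instead. The exact class names are not something this sketch assumes.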


XML Benchmarks - Javolution

(9:56 pm) Tags: [Software, Projects]

Another benchmark from Paul Findlay, using Javolution. Here is Javolution.java:

// requires jar files from http://javolution.org/javolution-5.2.6-bin.zip

import javolution.xml.stream.XMLInputFactory;
import javolution.xml.stream.XMLStreamReader;

import java.io.*;

public class Javolution
{
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();

byte[] bytes = new byte[(int)length];

int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}

if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}

is.close();
return bytes;
}

public static void main (String args[]) throws Exception
{
int iterations = 2000;

XMLInputFactory factory = XMLInputFactory.newInstance();

byte[] content = Javolution.getBytesFromFile(new File(args[0]));
ByteArrayInputStream bais = new ByteArrayInputStream(content);

for (int i = 0; i < 10; i++) {
long start = System.currentTimeMillis();

for (int j = 0; j < iterations; j++) {
XMLStreamReader xr = factory.createXMLStreamReader(bais);
while (xr.hasNext()) {
xr.next();
}
xr.close();
bais.reset();
}

long stop = System.currentTimeMillis();
double timer = (stop - start) / 1000.0;
double total = (content.length * iterations) / (timer * (1024 * 1024));
System.out.print(total);
System.out.println(" MB/s");
}
}
}

javac -classpath javolution.jar Javolution.java
echo "hamlet.xml"
java -cp javolution.jar:. Javolution hamlet.xml
echo "soap_mid.xml"
java -cp javolution.jar:. Javolution soap_mid.xml
stonecobra@jeff-home:~/xmlbench$ ./all
javolution
hamlet.xml
50.6551508686574 MB/s
51.165395577138696 MB/s
51.19486307315164 MB/s
51.19486307315164 MB/s
51.18503680384777 MB/s
51.23420590740574 MB/s
51.229284746527114 MB/s
51.23420590740574 MB/s
51.23420590740574 MB/s
51.229284746527114 MB/s
soap_mid.xml
44.98275478234452 MB/s
45.975555578724986 MB/s
46.000317996451415 MB/s
46.00857806432652 MB/s
45.99206089395699 MB/s
46.066481704465005 MB/s
46.08305238133712 MB/s
46.08305238133712 MB/s
46.09134219108371 MB/s
46.066481704465005 MB/s

Average for hamlet.xml: 51.16 MB/sec
Average for soap_mid.xml: 45.93 MB/sec

Most of the Java camp is starting to look the same.


XML Benchmarks - Woodstox

(9:38 pm) Tags: [Software, Projects]

Thanks to Paul Findlay for submitting 3 new Java benchmarks, the first of which is for Woodstox. The file is Woodstox.java, listed here:

// requires jar files from http://woodstox.codehaus.org/Download#Download-Stable(3.2.4)

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import org.codehaus.stax2.XMLInputFactory2;

import java.io.*;

public class Woodstox
{
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();

byte[] bytes = new byte[(int)length];

int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}

if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}

is.close();
return bytes;
}

public static void main (String args[]) throws Exception
{
int iterations = 2000;

XMLInputFactory2 xmlif = (XMLInputFactory2) XMLInputFactory2.newInstance();
xmlif.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
xmlif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
xmlif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
xmlif.configureForSpeed();

byte[] content = Woodstox.getBytesFromFile(new File(args[0]));
ByteArrayInputStream bais = new ByteArrayInputStream(content);

for (int i = 0; i < 10; i++) {
long start = System.currentTimeMillis();

for (int j = 0; j < iterations; j++) {
XMLStreamReader xr = xmlif.createXMLStreamReader(bais);
while (xr.hasNext()) {
xr.next();
}
xr.close();
bais.reset();
}

long stop = System.currentTimeMillis();
double timer = (stop - start) / 1000.0;
double total = (content.length * iterations) / (timer * (1024 * 1024));
System.out.print(total);
System.out.println(" MB/s");
}
}
}

I built it and ran it with the following commands:

echo "Woodstox"
javac -classpath wstx-asl-3.2.4.jar:stax2-2.1.jar Woodstox.java
echo "hamlet.xml"
java -cp wstx-asl-3.2.4.jar:stax2-2.1.jar:. Woodstox hamlet.xml
echo "soap_mid.xml"
java -cp wstx-asl-3.2.4.jar:stax2-2.1.jar:. Woodstox soap_mid.xml

And the results:

stonecobra@jeff-home:~/xmlbench$ ./all
Woodstox
hamlet.xml
77.77020756723444 MB/s
79.63985120144747 MB/s
79.4618717961999 MB/s
79.77087698116867 MB/s
79.75894773382589 MB/s
80.0822948192333 MB/s
79.91430678694842 MB/s
80.26306749376882 MB/s
80.49322117357285 MB/s
80.49322117357285 MB/s
soap_mid.xml
47.38704850013582 MB/s
49.05643715110748 MB/s
49.38738844260493 MB/s
49.492325910804404 MB/s
49.77112883454436 MB/s
49.916573395720704 MB/s
50.121629741829885 MB/s
49.86799751658902 MB/s
50.13143636083631 MB/s
50.1608792561148 MB/s

Average for hamlet.xml: 79.76 MB/sec
Average for soap_mid.xml: 49.53 MB/sec

Tuesday
26
Feb 2008

XML Benchmarks - Allocation hurts

(11:40 am) Tags: [Software, Projects, D Programming Language]

I added Java DOM to the graphs. Building a tree in memory is not the fastest way to parse a doc, but it is the easiest way to modify the doc after parsing. Java 6 DOM does not fare too badly on parsing speed, but with all the allocation going on, RAM usage skyrockets, and the efficiency graph shows the pain.

This goes to show you how good library design and the D Programming Language come together to kick serious butt.


XML Benchmarks - Java 6 DOM

(11:29 am) Tags: [Software, Projects]

I have added a Java 6 DOM run to the benchmark, so I could compare it with the Tango DOM. I used Dom.java, listed here:

import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import java.io.IOException;

public class Dom {

public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();

byte[] bytes = new byte[(int)length];

int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}

if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}

is.close();
return bytes;
}

public static void main(String[] args) {

if (args.length <= 0) {
System.out.println("Usage: java Dom filename");
return;
}
try {
String document = args[0];
int iterations = 2000;
byte[] content = getBytesFromFile(new File(document));
ByteArrayInputStream bais = new ByteArrayInputStream(content);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
for (int i = 0; i < 10; i++) {
long start = System.currentTimeMillis();
for (int j = 0; j < iterations; j++) {
parser.parse(new InputSource(bais));
bais.reset();
}
long stop = System.currentTimeMillis();
double timer = (stop - start) / 1000.0;
double total = (content.length * iterations) / (timer * (1024 * 1024));
System.out.print(total);
System.out.println(" MB/s");
}
}
catch (Exception e) {
e.printStackTrace();
}

}

}

Results on the quad core machine:

stonecobra@jeff-home:~/xmlbench$ java Dom hamlet.xml
47.97158513186668 MB/s
49.42985018499479 MB/s
49.71088484444904 MB/s
49.943635499212824 MB/s
49.86891851295874 MB/s
49.83630008373143 MB/s
49.93895912884773 MB/s
49.845615280008765 MB/s
49.845615280008765 MB/s
49.86891851295874 MB/s
stonecobra@jeff-home:~/xmlbench$ java Dom soap_mid.xml
28.16242814247465 MB/s
29.23571100413446 MB/s
29.16914517762231 MB/s
29.272451872527633 MB/s
29.295880544275597 MB/s
29.19240870915283 MB/s
29.3428505772142 MB/s
29.491456174060122 MB/s
29.54246180563062 MB/s
29.45755015408535 MB/s

Average for hamlet.xml: 49.63 MB/sec
Average for soap_mid.xml: 29.22 MB/sec

Sunday
24
Feb 2008

XML Benchmarks - Tango ups the ante

(8:56 pm) Tags: [Software, Projects, D Programming Language]

Speed master Kris made some changes to Tango’s xml libraries today, and increased the performance of the parser to over 500MB/second! The machine is still the quad core 2.66GHz Intel box running Linux with 4GB of RAM. This run reflects revision 3286 of Tango SVN.

I will only update the images here; by now you should know how I obtained them…

While SAX is showing slower in speed than DOM in Tango (I hope that is as weird to read as it was for me to write), you can see that the RAM usage graph puts it back into perspective.

I also forgot to note that this quad core box is now capable of parsing XML at over 2GB/sec if all 4 cores are used. Impressive indeed.

Tango is an alternate standard library for the D Programming Language.


XML Benchmarks - Speed versus resources

(3:13 pm) Tags: [Software, Projects]

I decided to post a graph of speed versus resource usage as an interesting view into the overhead of the various programs. All benchmarks maxed out the CPU at 100%, and all cached the data to be parsed so the disk wasn’t being used, which leaves RAM as the measurement of resource usage. The following is a chart of the parsing speed divided by the memory usage. Of note, xmlpull and xmlsax used only 688KB of memory, so their numbers actually increased, reflecting not only their speed but also their conservation of resources. The RAM numbers were taken from top while the program was running, and represent the “Resident Set” so as not to make Java look horribly bad.

Update: 2008-02-24 15:45 PST - I updated the graph to offset the RAM usage by subtracting the file size from the total RAM, so that larger files won’t be penalized. In other words, the closer you can keep RAM usage to the file size, decreasing overhead, the more resource efficient your parser is. I bet you are thinking Tango was designed that way from the beginning right about now, aren’t you?
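
The metric described above (speed divided by RAM usage, with the file size subtracted) can be written down as a small Java sketch. The units are an assumption, since the post does not say whether the graph divides by KB or MB, and EfficiencySketch and efficiency are names invented here:

```java
// Sketch of the speed-per-overhead metric; the KB unit is an assumption.
final class EfficiencySketch {
    // Throughput divided by the memory used beyond the document itself.
    static double efficiency(double mbPerSec, double residentKb, double fileKb) {
        double overheadKb = residentKb - fileKb;
        if (overheadKb <= 0) {
            throw new IllegalArgumentException("resident set must exceed file size");
        }
        return mbPerSec / overheadKb;
    }

    public static void main(String[] args) {
        // 688 KB resident and a 274 KB hamlet.xml both appear in the posts;
        // pairing them with one particular speed is only for illustration.
        System.out.println(efficiency(476.77, 688.0, 274.0));
    }
}
```

The guard against a non-positive overhead is a design choice of the sketch: a resident set smaller than the cached file would mean the measurement itself is off.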

Saturday
23
Feb 2008

XML Benchmarks - Java 6 SAX

(10:26 pm) Tags: [General, Software, Projects]

Next is Java 6’s default SAX implementation, Xerces. The code used was Sax.java, listed here:

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.*;

public class Sax extends DefaultHandler
{

public Sax ()
{
super();
}

public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
long length = file.length();

byte[] bytes = new byte[(int)length];

int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}

if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}

is.close();
return bytes;
}

public static void main (String args[]) throws Exception
{
int iterations = 2000;
XMLReader xr = XMLReaderFactory.createXMLReader();
System.out.println(xr.getClass().getName());
Sax handler = new Sax();
xr.setContentHandler(handler);
xr.setErrorHandler(handler);
byte[] content = Sax.getBytesFromFile(new File("soap_mid.xml"));
ByteArrayInputStream bais = new ByteArrayInputStream(content);
for (int i = 0; i < 10; i++) {
long start = System.currentTimeMillis();
for (int j = 0; j < iterations; j++) {
xr.parse(new InputSource(bais));
bais.reset();
}
long stop = System.currentTimeMillis();
double timer = (stop - start) / 1000.0;
double total = (content.length * iterations) / (timer * (1024 * 1024));
System.out.print(total);
System.out.println(" MB/s");
}
}

}

Results for hamlet.xml and soap_mid.xml, respectively:

stonecobra@jeff-home:~/xmlbench$ javac Sax.java
stonecobra@jeff-home:~/xmlbench$ java Sax
com.sun.org.apache.xerces.internal.parsers.SAXParser
76.29067136262248 MB/s
78.73458569472892 MB/s
79.06138207768956 MB/s
79.43820129521801 MB/s
79.35546548074598 MB/s
79.01453088831019 MB/s
79.17875348813743 MB/s
79.23756997416339 MB/s
79.98621528135779 MB/s
79.86643957713294 MB/s
stonecobra@jeff-home:~/xmlbench$ vi Sax.java
stonecobra@jeff-home:~/xmlbench$ javac Sax.java
stonecobra@jeff-home:~/xmlbench$ java Sax
com.sun.org.apache.xerces.internal.parsers.SAXParser
37.13359003481657 MB/s
39.63213785618475 MB/s
39.705837787112095 MB/s
39.75512354386879 MB/s
39.77363726175634 MB/s
40.43266076064926 MB/s
40.59280279471393 MB/s
40.567094876541226 MB/s
40.34988523468258 MB/s
40.38804716901551 MB/s

Average parsing speed: 79.02 and 39.83 MB/sec, respectively. Note that I did remove the DTD declaration from hamlet.xml for this benchmark, since it was erroring out trying to find play.dtd.
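
An alternative to editing the DOCTYPE out of hamlet.xml is to hand the parser an empty stand-in whenever it asks for an external entity such as play.dtd. The following is a minimal sketch of that idea, not the code used for the numbers above. SkipDtdSketch and countElements are made-up names, and it assumes the document does not depend on entities or default attributes declared in the DTD:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SkipDtdSketch {
    // Parses xml and returns the number of start tags seen, resolving
    // every external entity (the DTD included) to empty content.
    static int countElements(String xml) {
        final int[] count = {0};
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Answer each external entity request with an empty document
            // instead of letting the parser look for the file.
            xr.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            xr.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            xr.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE PLAY SYSTEM \"play.dtd\"><PLAY><TITLE>Hamlet</TITLE></PLAY>";
        System.out.println(countElements(xml));
    }
}
```

The resolver adds a small amount of work per parse, so for a benchmark it is arguably cleaner to strip the declaration once, as was done here.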

Output from java -version:

stonecobra@jeff-home:~/xmlbench$ java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Server VM (build 1.6.0_03-b05, mixed mode)


XML Benchmarks - libxml2 sax

(9:24 pm) Tags: [Software, Projects]

Many thanks to Nietsnie, who was kind enough to write up a libxml2 SAX benchmark and run it on his quad core 2.66GHz box running Linux. I have updated the other benchmarks to reflect using his machine as well, to keep all on the same playing field. test.c is the benchmark code used, listed here:

#include
#include
#include
#include
#include

Results for hamlet.xml:

jeff@jeff-home:~/code/tango/example/text$ gcc -I/usr/include/libxml2 test.c -lxml2 -o test -lrt -O2
jeff@jeff-home:~/code/tango/example/text$ ./test
Throughput: 117.783120 MB/s
Throughput: 127.832775 MB/s
Throughput: 127.837450 MB/s
Throughput: 127.837006 MB/s
Throughput: 127.857626 MB/s
Throughput: 127.719954 MB/s
Throughput: 127.850622 MB/s
Throughput: 127.815921 MB/s
Throughput: 127.808884 MB/s
Throughput: 127.489089 MB/s

Average parsing speed: 126.78 MB/sec. Results for soap_mid.xml:

jeff@jeff-home:~/code/tango/example/text$ gcc test.c -o test -I/usr/include/libxml2 -lxml2 -lrt -O2
jeff@jeff-home:~/code/tango/example/text$ ./test
Throughput: 78.227945 MB/s
Throughput: 78.989423 MB/s
Throughput: 79.249715 MB/s
Throughput: 79.010625 MB/s
Throughput: 78.373106 MB/s
Throughput: 78.914673 MB/s
Throughput: 77.875016 MB/s
Throughput: 77.820623 MB/s
Throughput: 78.135982 MB/s
Throughput: 77.797501 MB/s

Average parsing speed: 78.44 MB/sec.


XML Benchmarks - Current Summary

(1:41 am) Tags: [Software, Projects, D Programming Language]

Here is the current summary of the benchmarks run so far in a graphical form:

I hope to add more (libxml2, Xerces-C, etc) in the future. If you have C++ chops, I am looking for someone to code up one for MSXML. I will also be adding some Java benchmarks in here as well.

Update 2008-02-23 20:57 PST - Since Nietsnie was kind enough to donate his machine time, I re-ran all the current benchmarks on his box, to be able to include the libxml2 sax numbers as apples to apples. The graph is now updated, and includes the speed (Megabytes per second). Thanks to Robert Fraser for catching that.

The current benchmarking machine is an Ubuntu box with 4GB RAM sporting a quad-core Intel chip at 2.66GHz. In other words, much faster than my machine.


XML Benchmarks - Phobos std.xml

(1:27 am) Tags: [Software, Projects, D Programming Language]

I hesitate to publish these numbers, as they are not a direct apples-to-apples comparison. The reason is that std.xml in version 2.0 of the D Programming Language is an XML parser, but one where you must know the schema beforehand and register handlers for each element by name. I was unwilling/too lazy to write said handlers for the docs I was testing, so I found a method called check() that, according to the source code comments, makes sure that a document is well-formed and contains no bad characters. That’s as close as I am going to get to parsing these docs without code help from the community, so take this with a grain of salt or two. I am using DMD 2.011, with stdxml.d as the benchmark, listed here:

module stdxml;

import std.stdio;
import std.xml;
import std.perf;

void benchmark (int iterations, invariant char[] content) {
auto elapsed = new HighPerformanceCounter();
elapsed.start;

for (auto i=0; ++i < iterations;) {
check(content);
}

elapsed.stop;
float timer = elapsed.milliseconds / 1000.0;
auto total = (content.length * iterations) / (timer * (1024 * 1024));
writef(total);
writefln(" MB/s");
}

void main()
{
invariant char[] content = import ("hamlet.xml");
for (int i = 11; --i;)
benchmark (10, content);
}

You will note that iterations are way down compared to others, because I really couldn’t wait around all night for the results. Results are:

D:\d2>dmd\bin\dmd -J. stdxml.d
D:\d2\dmd\bin\..\..\dm\bin\link.exe stdxml,,,user32+kernel32/noi;

D:\d2>stdxml
1.65656 MB/s
1.68694 MB/s
1.67213 MB/s
1.68908 MB/s
1.67318 MB/s
1.68481 MB/s
1.68268 MB/s
1.68162 MB/s
1.68588 MB/s
1.67951 MB/s

Average checking speed: 1.68 MB/sec. Can I get some help from Phobos people? Scripting languages are faster than this… Results for soap_mid.xml:

D:\d2>dmd\bin\dmd -J. stdxml.d
D:\d2\dmd\bin\..\..\dm\bin\link.exe stdxml,,,user32+kernel32/noi;

D:\d2>stdxml
1.18841 MB/s
1.22127 MB/s
1.2201 MB/s
1.2201 MB/s
1.23065 MB/s
1.21894 MB/s
1.22829 MB/s
1.22829 MB/s
1.22829 MB/s
1.23065 MB/s

Average checking speed: 1.22 MB/sec. So it is not just Tango that slows down with the attributes…

If someone from the Phobos community wants to update the code run here, just leave a comment or send me mail privately. scott at you can guess where.

Update 2008-02-23 19:57 PST
Running on a quad core 2.66GHz box yielded:

stonecobra@jeff-home:~/xmlbench$ ~/d2/dmd/bin/dmd -J. stdxml.d
gcc stdxml.o -o stdxml -m32 -Xlinker -L/home/stonecobra/d2/dmd/bin/../lib -lphobos2 -lpthread -lm
stonecobra@jeff-home:~/xmlbench$ ./stdxml
6.47343 MB/s
6.50501 MB/s
6.5691 MB/s
6.48918 MB/s
6.50501 MB/s
6.50501 MB/s
6.52092 MB/s
6.48918 MB/s
6.52092 MB/s
6.47343 MB/s
stonecobra@jeff-home:~/xmlbench$ vi stdxml.d
stonecobra@jeff-home:~/xmlbench$ ~/d2/dmd/bin/dmd -J. stdxml.d
gcc stdxml.o -o stdxml -m32 -Xlinker -L/home/stonecobra/d2/dmd/bin/../lib -lphobos2 -lpthread -lm
stonecobra@jeff-home:~/xmlbench$ ./stdxml
4.39338 MB/s
4.3979 MB/s
4.38586 MB/s
4.37986 MB/s
4.40244 MB/s
4.37986 MB/s
4.39338 MB/s
4.40092 MB/s
4.37089 MB/s
4.37089 MB/s

Average for hamlet.xml: 6.51 MB/sec.
Average for soap_mid.xml: 4.39 MB/sec.

PS: I also wanted to note for any naysayers that I left off -O, -release, and -inline because the Phobos example actually runs SLOWER with any or all of these flags. I am not trying to slip anything by anyone here.
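
A side note on the loop form used in stdxml.d (and in the Tango listings): for (auto i=0; ++i < iterations;) increments before it compares, so the body runs iterations - 1 times while the throughput formula still multiplies by iterations. With 2000 iterations that is noise, but with the 10 used here it means 9 calls to check() are counted as 10, so the Phobos figures above are likely overstated by roughly 11%. The sketch below is a Java transliteration of just the loop, assuming D evaluates the for condition the same way C and Java do. LoopCountSketch and its method names are made up for illustration:

```java
// Sketch: how many times does the pre-increment loop body really run?
final class LoopCountSketch {
    // Counts body executions of the loop form used in the D listings.
    static int bodyRuns(int iterations) {
        int runs = 0;
        for (int i = 0; ++i < iterations;) {
            runs++;
        }
        return runs;
    }

    // Factor by which throughput is overstated when the formula uses
    // `iterations` but only bodyRuns(iterations) parses took place.
    static double overstatement(int iterations) {
        return (double) iterations / bodyRuns(iterations);
    }

    public static void main(String[] args) {
        System.out.println(bodyRuns(10));        // 9
        System.out.println(bodyRuns(2000));      // 1999
        System.out.println(overstatement(10));   // roughly 1.11
    }
}
```

None of this changes the overall picture, since an 11% correction is small next to the gap between the parsers.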


XML Benchmarks - Tango SaxParser

(1:09 am) Tags: [Software, Projects, D Programming Language]

Next is Tango’s SaxParser, a SAX API layered on top of PullParser for the D Programming Language. It passes parsing events through to a handler, push-style. I used the current SVN HEAD of Tango, which is current revision 3247, and compiled with DMD v1.024. I count the number of elements, attributes, and text nodes, along with their lengths, to attempt to compare to the benchmarks here. Apparently, Tango is beating them masterfully. soap_mid.xml is the same file (by size, and I suspect, origin) as their “soap2.xml”. And they have an extra 200MHz of CPU in their benchmark. The benchmark code used was xmlsax.d, listed here:

module xmlsax;

import tango.io.Stdout;
import tango.time.StopWatch;

import tango.text.xml.SaxParser;

void benchmark (int iterations, SaxParser!(char) parser, char[] content)
{
StopWatch elapsed;
elapsed.start;

for (auto i=0; ++i < iterations;)
{
parser.parse;
parser.reset;
}

Stdout.formatln ("{} MB/s", (content.length * iterations) / (elapsed.stop * (1024 * 1024)));
}

void main()
{
auto content = import ("hamlet.xml");
auto parser = new SaxParser!(char);
auto handler = new LengthHandler!(char);
parser.setSaxHandler(handler);
parser.setContent(content);

for (int i = 11; --i;)
benchmark (2000, parser, content);
}

private class LengthHandler(Ch = char) : SaxHandler!(Ch) {

public uint elm;
public uint att;
public uint txt;
public uint elmlen;
public uint attlen;
public uint txtlen;

public void startElement(Ch[] uri, Ch[] localName, Ch[] qName, Attribute!(Ch)[] atts) {
elm++;
elmlen += localName.length;
foreach (inout attr; atts) {
att++;
attlen += attr.localName.length;
}
}

public void characters(Ch[] ch) {
txt++;
txtlen += ch.length;
}

}

Results for hamlet.xml:

D:\d\tango\example\text>jake xmlsax.d -O -release -inline -J.
d:\d\dmd\bin\..\..\dm\bin\link.exe xmlsax+Stdout+Print+IBuffer

D:\d\tango\example\text>xmlsax
258.59 MB/s
259.45 MB/s
258.91 MB/s
258.70 MB/s
258.79 MB/s
259.27 MB/s
259.37 MB/s
259.94 MB/s
258.72 MB/s
258.64 MB/s

Average parsing speed: 259.04 MB/sec. Results for soap_mid.xml:

D:\d\tango\example\text>jake xmlsax.d -O -release -inline -J.
d:\d\dmd\bin\..\..\dm\bin\link.exe xmlsax+Stdout+Print+IBuffer

D:\d\tango\example\text>xmlsax
179.96 MB/s
180.15 MB/s
180.97 MB/s
179.67 MB/s
180.76 MB/s
180.46 MB/s
179.90 MB/s
178.61 MB/s
179.20 MB/s
180.47 MB/s

Average parsing speed: 180.02 MB/sec. SAX seems to do a bit better than DOM with the attributes, but still shows significant overhead compared to PullParser.

Update 2008-02-23 19:57 PST
Running on a quad core 2.66GHz box yielded:

stonecobra@jeff-home:~/xmlbench$ rebuild xmlsax.d -J./ -full -O -release
stonecobra@jeff-home:~/xmlbench$ ./xmlsax
348.91 MB/s
347.93 MB/s
344.69 MB/s
348.59 MB/s
346.96 MB/s
348.17 MB/s
347.37 MB/s
348.23 MB/s
347.12 MB/s
348.31 MB/s
stonecobra@jeff-home:~/xmlbench$ vi xmlsax.d
stonecobra@jeff-home:~/xmlbench$ rebuild xmlsax.d -J./ -full -O -release
stonecobra@jeff-home:~/xmlbench$ ./xmlsax
243.25 MB/s
242.80 MB/s
239.49 MB/s
236.53 MB/s
236.85 MB/s
237.12 MB/s
243.53 MB/s
238.09 MB/s
244.43 MB/s
241.99 MB/s

Average for hamlet.xml: 347.63 MB/sec.
Average for soap_mid.xml: 240.41 MB/sec.


XML Benchmarks - Tango Document

(12:56 am) Tags: [Software, Projects, D Programming Language]

Next is Tango’s Document, a DOM-ish parser built on top of PullParser for the D Programming Language. It builds an in-memory tree of the document being parsed, which can then be easily navigated/edited in-memory. I used the current SVN HEAD of Tango, which is current revision 3247, and compiled with DMD v1.024. The benchmark code used was xmldom.d, listed here:

import tango.io.Stdout;
import tango.time.StopWatch;
import tango.text.xml.Document;

/*******************************************************************************

*******************************************************************************/

void bench (int iterations)
{
StopWatch elapsed;

auto doc = new Document!(char);
auto content = import ("hamlet.xml");

elapsed.start;
for (auto i=0; ++i < iterations;)
doc.parse (content);

Stdout.formatln ("{} MB/s", (content.length * iterations) / (elapsed.stop * (1024 * 1024)));
}

/*******************************************************************************

*******************************************************************************/

void main()
{
for (int i=11; --i;)
bench (2000);
}

It was compiled using: jake xmldom.d -O -release -inline -J.
Resulting run was:

D:\d\tango\example\text>jake xmldom.d -O -release -inline -J.
d:\d\dmd\bin\..\..\dm\bin\link.exe xmldom+Stdout+Print+IBuffer…

D:\d\tango\example\text>xmldom
240.39 MB/s
239.77 MB/s
239.34 MB/s
240.70 MB/s
242.35 MB/s
241.36 MB/s
241.66 MB/s
242.81 MB/s
241.96 MB/s
241.92 MB/s

Average of the resulting run: 241.23 MB/sec parsing. That is one speedy little DOM builder. Run for soap_mid.xml brought back:

D:\d\tango\example\text>jake xmldom.d -O -release -inline -J.
d:\d\dmd\bin\..\..\dm\bin\link.exe xmldom+Stdout+Print+IBuffer

D:\d\tango\example\text>xmldom
117.36 MB/s
118.34 MB/s
118.28 MB/s
118.38 MB/s
117.57 MB/s
118.28 MB/s
118.73 MB/s
118.22 MB/s
118.12 MB/s
118.63 MB/s

Average of the runs was 118.19 MB/sec parsing. Looks like a similar result to PullParser. Attributes must have a fairly high cost in this implementation.

Update 2008-02-23 19:57 PST
Running on a quad core 2.66GHz box yielded:

stonecobra@jeff-home:~/xmlbench$ rebuild xmldom.d -J./ -full -O -release -inline
stonecobra@jeff-home:~/xmlbench$ ./xmldom
334.71 MB/s
333.09 MB/s
335.62 MB/s
336.30 MB/s
338.15 MB/s
335.38 MB/s
337.06 MB/s
335.71 MB/s
337.62 MB/s
337.03 MB/s
stonecobra@jeff-home:~/xmlbench$ vi xmldom.d
stonecobra@jeff-home:~/xmlbench$ rebuild xmldom.d -J./ -full -O -release -inline
stonecobra@jeff-home:~/xmlbench$ ./xmldom
164.46 MB/s
166.30 MB/s
166.11 MB/s
166.95 MB/s
166.73 MB/s
167.22 MB/s
167.14 MB/s
165.83 MB/s
166.99 MB/s
166.95 MB/s

Average for hamlet.xml: 336.07 MB/sec.
Average for soap_mid.xml: 166.47 MB/sec.


XML Benchmarks - Tango PullParser

(12:41 am) Tags: [Software, Projects, D Programming Language]

First up, Tango’s tango.text.xml.PullParser. You instantiate the parser, start the parse, and then continue to ask for the next ‘node’. I used the current SVN HEAD of Tango, which at the time of writing was revision 3247, compiled with DMD v1.024. The benchmark code run was xmlpull.d, listed here:

import tango.io.Stdout;
import tango.time.StopWatch;

import tango.text.xml.PullParser;

void benchmark (int iterations)
{
StopWatch elapsed;

auto content = import ("hamlet.xml");
auto parser = new PullParser!(char) (content);

elapsed.start;
for (auto i=0; ++i < iterations;)
{
while (parser.next) {}
parser.reset;
}
Stdout.formatln ("{} MB/s", (content.length * iterations) / (elapsed.stop * (1024 * 1024)));
}

void main()
{
for (int i = 11; --i;)
benchmark (2000);
}

It was compiled with the command: jake xmlpull.d -O -release -inline -J.. Results of the run:

D:\d\tango\example\text>jake xmlpull.d -O -release -inline -J.
d:\d\dmd\bin\..\..\dm\bin\link.exe xmlpull+Stdout+Print+IBuffer+

D:\d\tango\example\text>xmlpull
316.82 MB/s
315.71 MB/s
315.49 MB/s
317.55 MB/s
316.77 MB/s
316.69 MB/s
316.64 MB/s
317.43 MB/s
316.16 MB/s
317.90 MB/s

Average of the resulting run: 316.72 MB/sec parsing. Replacing hamlet.xml with soap_mid.xml in the above code results in:

D:\d\tango\example\text>jake xmlpull.d -O -release -inline -J.
d:\d\dmd\bin\..\..\dm\bin\link.exe xmlpull+Stdout+Print+IBuffer+ICon

D:\d\tango\example\text>xmlpull
229.19 MB/s
227.78 MB/s
229.12 MB/s
229.71 MB/s
228.63 MB/s
229.40 MB/s
228.98 MB/s
229.21 MB/s
230.02 MB/s
228.56 MB/s

Average of the resulting run: 229.06 MB/sec. Lower than hamlet.xml, probably due to the attribute processing required, but also possibly the lack of whitespace.
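
One way to probe the attribute hypothesis is to count elements and attributes in each file and compare the ratios. The sketch below does that with the JDK's StAX pull reader as a stand-in, since it follows the same ask-for-the-next-node style. It is not Tango code, and AttributeDensitySketch and count are names invented here:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class AttributeDensitySketch {
    // Returns {elements, attributes} found while pulling events from xml.
    // Namespace declarations are not included in the attribute count.
    static int[] count(String xml) {
        int elements = 0;
        int attributes = 0;
        try {
            XMLStreamReader xr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (xr.hasNext()) {
                if (xr.next() == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                    attributes += xr.getAttributeCount();
                }
            }
            xr.close();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return new int[] {elements, attributes};
    }

    public static void main(String[] args) {
        int[] c = count("<a x=\"1\" y=\"2\"><b z=\"3\"/><c/></a>");
        System.out.println(c[0] + " elements, " + c[1] + " attributes");
    }
}
```

Running it over hamlet.xml and soap_mid.xml (read into a String first) would show how much more attribute work the SOAP file demands per element. It would not separate that effect from the whitespace difference.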

Update 2008-02-23 19:57 PST
Running on a quad core 2.66GHz box yielded:

stonecobra@jeff-home:~/xmlbench$ rebuild xmlpull.d -J./ -full -O -release
stonecobra@jeff-home:~/xmlbench$ ./xmlpull
469.57 MB/s
477.30 MB/s
478.07 MB/s
477.20 MB/s
477.90 MB/s
477.68 MB/s
476.67 MB/s
477.75 MB/s
478.29 MB/s
477.24 MB/s
stonecobra@jeff-home:~/xmlbench$ vi xmlpull.d
stonecobra@jeff-home:~/xmlbench$ rebuild xmlpull.d -J./ -full -O -release
stonecobra@jeff-home:~/xmlbench$ ./xmlpull
337.11 MB/s
341.61 MB/s
340.24 MB/s
341.51 MB/s
341.09 MB/s
341.21 MB/s
341.96 MB/s
329.75 MB/s
338.79 MB/s
338.23 MB/s

Average for hamlet.xml: 476.77 MB/sec.
Average for soap_mid.xml: 339.15 MB/sec. Now we are talking some speed!!! This D Programming Language has some merit.


XML Benchmarks - Introduction

(12:38 am) Tags: [Software, Projects]

In wanting to see how well the Tango XML parsers fare in the world, I have started this benchmarking post. I will post all of my results, as well as the code and files that achieve these results here, so this post will be living as I expand and update it.

First off, baseline equipment. I have a Thinkpad T60p with a 2.0GHz Intel T2500 CPU, 2GB RAM, and a fairly slow hard drive. All of my tests will cache the document to be parsed in memory to try to eliminate the hard drive as a potential bottleneck.

Next up, the files. I will be starting with hamlet.xml and soap_mid.xml. hamlet.xml weighs in at 274KB, and contains no attributes at all, very element heavy, with a moderate amount of whitespace (enough to make the file readable). soap_mid.xml weighs in at 132KB, uses namespaces, and looks like it was barfed onto the street (not so human readable).

Now, the benchmark. I will be writing and posting the benchmarking code, but the gist is this: load the file into memory to eliminate the hard drive as a bottleneck, then execute 10 rounds, each parsing the document enough times to constitute at least 100MB of data. I intend to use the fastest possible configuration of each parser, not the safest, and will keep the code open to allow suggested improvements from the community.
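
The gist above can be sketched in Java roughly as follows. This is a minimal sketch, not the code behind any of the posted numbers: HarnessSketch, iterationsFor, and the Consumer-based parseOnce hook are names invented here, and the byte-summing stand-in in main only exists so the sketch runs without a real parser.

```java
import java.util.function.Consumer;

final class HarnessSketch {
    // Converts bytes parsed and elapsed seconds into MB/s (1 MB = 1024 * 1024 bytes).
    static double mbPerSec(long bytesPerParse, int iterations, double seconds) {
        return (bytesPerParse * (double) iterations) / (seconds * 1024 * 1024);
    }

    // Smallest iteration count that pushes at least targetBytes through the parser.
    static int iterationsFor(long fileBytes, long targetBytes) {
        return (int) ((targetBytes + fileBytes - 1) / fileBytes);
    }

    // Runs `rounds` timed rounds over in-memory content and returns each round's MB/s.
    static double[] run(byte[] content, int rounds, int iterations, Consumer<byte[]> parseOnce) {
        double[] results = new double[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int j = 0; j < iterations; j++) {
                parseOnce.accept(content);
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            results[r] = mbPerSec(content.length, iterations, seconds);
        }
        return results;
    }

    public static void main(String[] args) {
        byte[] content = new byte[274 * 1024];   // stand-in for hamlet.xml held in memory
        int iterations = iterationsFor(content.length, 100L * 1024 * 1024);
        long[] sink = {0};
        // Stand-in "parser": touches every byte so the loop has real work to do.
        Consumer<byte[]> parseOnce = data -> { for (byte b : data) sink[0] += b; };
        for (double mb : run(content, 10, iterations, parseOnce)) {
            System.out.println(mb + " MB/s");
        }
    }
}
```

The first round usually reads slower than the rest on a JVM while the JIT warms up, which is one reason to report all 10 rounds rather than a single figure.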

Friday
22
Feb 2008

Tango XML has landed

(4:17 pm) Tags: [Software, Projects, D Programming Language]

Tango has landed XML support in the tango.text.xml package. Current highlights include a pull parser, a DOM parser, and a SAX parser, as well as a budding XPath-like package.

What makes these different, you ask? Why another damn XML parser? Glad you asked. These components are intended to be high-speed, non-allocating tools that can be used at a server or appliance level with much less overhead than other solutions. For example, the SAX parser needs just a few KB of memory over and above the size of the content being parsed.

If you need a fast XML parser, check them out. I am still writing up my benchmarking output, so stay tuned for a post on that shortly.

What is Tango? Tango is an alternate standard library for the D Programming Language.
