Wanting to see how well the Tango XML parsers fare in the world, I have started this benchmarking post. I will post all of my results here, along with the code and files that produce them, so this post will be a living document that I expand and update over time.
First off, baseline equipment. I have a Thinkpad T60p with a 2.0GHz Intel T2500 CPU, 2GB RAM, and a fairly slow hard drive. All of my tests will cache the document to be parsed in memory to try to eliminate the hard drive as a potential bottleneck.
Next up, the files. I will be starting with hamlet.xml and soap_mid.xml. hamlet.xml weighs in at 274KB, contains no attributes at all, is very element heavy, and has a moderate amount of whitespace (enough to make the file readable). soap_mid.xml weighs in at 132KB, uses namespaces, and looks like it was barfed onto the street (not so human readable).
Now, the benchmark. I will be writing and posting the benchmarking code, but the gist is this: load the file into memory to eliminate the hard drive as a bottleneck, then execute 10 iterations, each of which parses the document enough times to add up to at least 100MB of data. I intend to use the fastest possible configuration of the parser, not the safest, and will keep the code open so the community can suggest improvements.
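Until the real benchmarking code is posted, here is a rough sketch of the loop described above. It is written in Python and uses the standard library's xml.etree.ElementTree purely as a stand-in parser, so it is not the Tango code and says nothing about Tango's speed. The names parses_needed and benchmark are made up for illustration, and treating 1KB as 1024 bytes and 100MB as 100 * 1024 * 1024 bytes is my assumption.

```python
import time
import xml.etree.ElementTree as ET

TARGET_BYTES = 100 * 1024 * 1024  # "at least 100MB" per iteration (assumes 1MB = 1024 * 1024 bytes)
ITERATIONS = 10                   # number of timed iterations


def parses_needed(doc_size, target=TARGET_BYTES):
    """Smallest number of parses whose combined input is at least `target` bytes."""
    return (target + doc_size - 1) // doc_size  # integer ceiling division


def benchmark(data, parse, iterations=ITERATIONS, target=TARGET_BYTES):
    """Parse the in-memory `data` repeatedly and return MB/s for each iteration."""
    count = parses_needed(len(data), target)
    rates = []
    for _ in range(iterations):
        start = time.perf_counter()
        for _ in range(count):
            parse(data)  # the document is already in memory, so no disk I/O is timed
        elapsed = time.perf_counter() - start
        rates.append(count * len(data) / elapsed / (1024 * 1024))
    return rates


if __name__ == "__main__":
    # Synthetic stand-in document; the real runs would read hamlet.xml or
    # soap_mid.xml into memory once before timing starts.
    doc = b"<play>" + b"<line>To be, or not to be</line>" * 1000 + b"</play>"
    # Smaller target and fewer iterations so the demo finishes quickly.
    for rate in benchmark(doc, ET.fromstring, iterations=3, target=5 * 1024 * 1024):
        print(f"{rate:.1f} MB/s")
```

Under those size assumptions, parses_needed(274 * 1024) comes to 374 parses per iteration for hamlet.xml and parses_needed(132 * 1024) comes to 776 for soap_mid.xml.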