|
|
||||||||||||||||||||||||||||||||||||||||||||||||
I was looking at some parsers using SAX recently, and was struck by the huge chains of if/elseif/else within the startElement and endElement callbacks. For example:
public void startElement(String uri, String name /* ... */) { if (name.equals("foo")) { // process foo element } else if (name.equals("bar")) { // process bar element } else if (name.equals("baz")) { // etc. } }
One way this could be improved is to use the string-interning SAX feature, which instructs the XML parser to intern all strings it hands back. This allows one to rewrite the above method as:
public void startElement(String uri, String name /* ... */) { if (name == "foo") { // process foo element } else if (name == "bar") { // process bar element } else if (name == "baz") { // etc. } }
Unfortunately, Xerces does not support this feature!
For my
personal projects I try to use gnujaxp, which does, but applications
that need to work with an arbitrary SAX parser cannot rely
on interning.
But even with lame parsers we can now use the string-switching capabilities
of CGLIB to gain back most of the speed of interning. I have
checked net.sf.cglib.util.StringSwitcher into CVS. The API is very
simple:
private static final int FOO = 0, BAR = 1, BAZ = 2; private static final StringSwitcher SWITCHER = StringSwitcher.create(new String[]{ "foo", "bar", "baz" }, new int[]{ FOO, BAR, BAZ }, true);
Of course the string and integer arrays can be as long as you want (it would not be worth it for only three). The third parameter is a boolean that indicates whether the switcher will only be called with strings from the argument array. For validated documents this is safe and gives a 3X speedup.
Then the startElement method becomes:
public void startElement(String uri, String name, ...) { switch (SWITCHER.intValue(name)) { case FOO: // process foo element break; case BAR: // process bar element break; case BAZ: // etc. } }
One nice thing about StringSwitcher is that the speed is O(log N) where
N is the number of elements. Both of the other methods are O(N), so at
some point switching will even be faster than interning. Even for a
modest number of elements the overhead of StringSwitcher is respectably
small. This is for 1000 repetitions over 10000 randomly chosen elements,
from a vocabulary of 23 distinct elements:
| Method | Time(ms) |
|---|---|
| equals | 3319 |
| intern | 351 |
| switch | 547 |