Although it’s a nice and very systematic solution, I couldn’t resist blogging about an already existing alternative solution for this problem:
Over the past few days, I spent some time implementing a typed bytes Python module in C. It's probably not quite ready for production use yet, and it still falls back to the pure Python module for floats, but it seems to work fine and already leads to substantial speedups.
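To give an idea of the per-record work such a module does, here is a minimal pure-Python sketch that encodes and decodes a few typed bytes record types with the standard struct module. It assumes the Hadoop typed bytes type codes (3 = int, 4 = long, 7 = string). The helpers write_item and read_items are hypothetical and are not the typedbytes API. The sketch is written for Python 3, and it picks INT or LONG for an integer by range, which may not match the real module's choices.

```python
import io
import struct

# Hadoop typed bytes type codes (only the subset this sketch handles).
TYPE_INT = 3     # code byte, then 4-byte big-endian signed int
TYPE_LONG = 4    # code byte, then 8-byte big-endian signed int
TYPE_STRING = 7  # code byte, 4-byte big-endian length, then UTF-8 bytes


def write_item(out, value):
    """Append one typed bytes record for an int or str to binary stream `out`."""
    if isinstance(value, bool):
        # bool is a subclass of int; the format has a separate BOOL code (2)
        # that this sketch does not implement.
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        # Assumption: use INT when the value fits in 32 bits, LONG otherwise.
        if -2**31 <= value < 2**31:
            out.write(struct.pack(">bi", TYPE_INT, value))
        else:
            # Raises struct.error if the value does not fit in 64 bits.
            out.write(struct.pack(">bq", TYPE_LONG, value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        out.write(struct.pack(">bi", TYPE_STRING, len(data)))
        out.write(data)
    else:
        raise TypeError("unsupported type: %r" % type(value))


def read_items(inp):
    """Yield the values of successive typed bytes records until end of stream."""
    while True:
        code = inp.read(1)
        if not code:
            return  # clean end of stream
        type_code = code[0]
        if type_code == TYPE_INT:
            yield struct.unpack(">i", inp.read(4))[0]
        elif type_code == TYPE_LONG:
            yield struct.unpack(">q", inp.read(8))[0]
        elif type_code == TYPE_STRING:
            (length,) = struct.unpack(">i", inp.read(4))
            yield inp.read(length).decode("utf-8")
        else:
            raise ValueError("unsupported type code: %d" % type_code)


if __name__ == "__main__":
    buf = io.BytesIO()
    for value in [0, -7, 2**40, "hello"]:
        write_item(buf, value)
    buf.seek(0)
    print(list(read_items(buf)))  # [0, -7, 1099511627776, 'hello']
```

Every record costs several Python-level calls plus a struct pack or unpack. That per-item interpreter overhead is presumably where a C implementation gains most of its speed. This sketch says nothing about how ctypedbytes itself is written.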
For example, the Python program
```python
from typedbytes import Output
Output(open("test.tb", "wb")).writes(xrange(10**7))
```
needs 18.8 secs to finish on this laptop, whereas it requires only 0.9 secs after replacing typedbytes with ctypedbytes. Similarly, the running time for
```python
from typedbytes import Input
for item in Input(open("test.tb", "rb")).reads():
    pass
```
can be reduced from 22.9 to merely 1.7 secs by using ctypedbytes instead of typedbytes.
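In the examples above, switching implementations only means changing the import. That suggests a script could prefer ctypedbytes when it is installed and fall back to typedbytes otherwise. Here is a minimal sketch, assuming both modules expose the same Input and Output classes. The import_first helper is hypothetical and belongs to neither module.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module missing (or failing to import): try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# In a script using typed bytes this would be (assuming matching APIs):
#     typedbytes = import_first("ctypedbytes", "typedbytes")
#     Output = typedbytes.Output

if __name__ == "__main__":
    # Same pattern with standard modules: the first name does not exist,
    # so the stdlib json module is returned instead.
    mod = import_first("no_such_module", "json")
    print(mod.__name__)  # json
```

Catching ImportError also hides a ctypedbytes build that is present but fails to load, so the script would silently run on the slower module. Printing the chosen module's `__name__` once at startup makes that visible.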
Obviously, Dumbo programs can benefit from this faster typed bytes module as well, but the gains probably won’t be as spectacular as for the simple test programs above. To give it a go, make sure you’re using the latest version of Dumbo, build an egg for the ctypedbytes module, and add the following option to your start command:
-libegg <path to ctypedbytes egg>
From what I've seen so far, this can speed up Dumbo programs by 30%, which definitely makes it worth the effort if you ask me. In fact, the Dumbo program would now probably beat the Java program in the benchmark discussed here, but, unfortunately, this wouldn't be a very fair comparison. Johan recently pointed out that it's better to avoid Java's split() method for strings when you don't need regular expression support, and using a combination of substring() and indexOf() instead seems to make the Java program about 40% faster. So we're not quite as fast as Java yet, but at least the gap has narrowed a bit more.