For a project at work I need to convert Excel .xls files to a format better suited to data munging on a UNIX platform, CSV for example. To achieve this I set out to hack together a small converter using the POI project from Jakarta. According to the documentation, traversing the fields in a file should be trivial. So it was. Unfortunately, it was equally trivial to exhaust the heap. Reading a 900 KB xls file took more memory than fits in the JVM's normal 64 MB heap allocation. Sigh. Even more painful, this was the smallest of the ten files I needed to convert. Assuming that this roughly 1 MB file only just failed to fit in 64 MB, converting the largest file on my list would need almost 800 MB of heap space, when all I did was read one row at a time and write it to a new file. Since I need to convert the files on the fly a few times per day, this is utterly useless. It's a good thing that this is open source; it will be interesting to find out how they manage to blow up the data 64-fold in memory.
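The back-of-the-envelope estimate above fits in a few lines of Java. This is only a sketch: the class and method names are made up for illustration, the 64-fold factor is my rough guess from the numbers above rather than a measurement, and the 12.5 MB file size is simply what an 800 MB heap would imply at that ratio.

```java
public class HeapEstimate {
    // Rough guess from the text: about 1 MB of xls needed about 64 MB of heap.
    static final double BLOW_UP_FACTOR = 64.0;

    // Heap (in MB) that a file of the given size would need at that ratio.
    static double requiredHeapMb(double fileSizeMb) {
        return fileSizeMb * BLOW_UP_FACTOR;
    }

    // Largest file (in MB) that would fit in a heap of the given size.
    static double largestFileMb(double heapMb) {
        return heapMb / BLOW_UP_FACTOR;
    }

    public static void main(String[] args) {
        // maxMemory() reports the heap limit of the running JVM in bytes.
        double maxHeapMb = Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0);
        System.out.printf("JVM max heap: %.0f MB%n", maxHeapMb);
        System.out.printf("Largest xls that would fit: about %.1f MB%n",
                largestFileMb(maxHeapMb));
        // 12.5 MB is an illustrative size, not one of the real files.
        System.out.printf("Heap needed for a 12.5 MB xls: %.0f MB%n",
                requiredHeapMb(12.5));
    }
}
```

The heap limit can be raised with the JVM's `-Xmx` option, but at this ratio that only postpones the problem.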
Fortunately, the package provides an alternative way to traverse the file that keeps memory consumption within the limits of sanity, and in the end I had a small hack that converts an xls file into several CSV files, one for each sheet found. If you need to do the same thing, you can get it here under a BSD license.
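The writing half of such a converter needs very little state, which is why memory stays flat: each row is formatted and written as soon as it arrives, and nothing is kept afterwards. Below is a minimal sketch of that half using only the standard library. It is not the code from the download; the class name, the method names and the output file naming scheme are all hypothetical, and the reading side with POI is left out entirely.

```java
import java.util.Arrays;
import java.util.List;

public class CsvRows {
    // Quote a field only if it contains a comma, a quote or a line break;
    // embedded quotes are doubled, as is conventional for CSV.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Turn the cells of one row into one CSV line (without the line break).
    static String toCsvLine(List<String> cells) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(cells.get(i)));
        }
        return line.toString();
    }

    // Hypothetical naming scheme: one output file per sheet,
    // e.g. "report.xls" and sheet 2 give "report_sheet2.csv".
    static String sheetFileName(String xlsName, int sheetIndex) {
        int dot = xlsName.lastIndexOf('.');
        String base = (dot > 0) ? xlsName.substring(0, dot) : xlsName;
        return base + "_sheet" + sheetIndex + ".csv";
    }

    public static void main(String[] args) {
        // In a real converter each row would come from the xls reader and
        // go straight to the writer for the current sheet's file.
        System.out.println(sheetFileName("report.xls", 2));
        System.out.println(toCsvLine(Arrays.asList("1", "Smith, John", "said \"hi\"")));
    }
}
```

In a real converter the lines would go to a `BufferedWriter` opened on the file for the current sheet, so only one row is ever held in memory at a time.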