Why not Java for scientific computing?
First off, we will assume that the researcher will have access to the following:
- Reasonably powerful computing systems: Given how many supercomputers sit idle in the research world, most researchers will have access to at least some CPU time on a supercomputer of some sort. If not, the researcher will have to rethink how large a problem he can realistically take on.
- Sufficient resources and time to develop: No decent program can be written hurriedly and automagically assumed to run at the fastest possible speed. Optimization takes time in any language (FORTRAN, C, and Java included), so ample time should be set aside for it.
Now for the meat and potatoes. Java is in many respects very similar to C++. Both are object-oriented and built around classes. In the modeling world, object orientation makes life incredibly easy compared to structured languages like C, since we can essentially create a “human object” or a “mosquito object” and so on. These objects can carry their own attributes and classifications, which makes them more like the real things they represent. It also makes for excellent code reuse, since in Java/C++ you can extend a class to create a new one that shares some properties with its parent and differs in others.
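To make the class-extension idea concrete, here is a minimal Java sketch. The names `Agent`, `Human`, and `Mosquito`, and the speed values, are hypothetical placeholders, not taken from the mosquito model described below. The shared position and `describe()` logic lives once in the base class, and each subclass adds only what differs.

```java
// Hypothetical class names and placeholder speeds, for illustration only.
public class AgentDemo {
    // State and behaviour shared by everything that lives in the room.
    abstract static class Agent {
        protected double x, y;

        Agent(double x, double y) { this.x = x; this.y = y; }

        // Each subclass supplies its own maximum speed (placeholder values, m/s).
        abstract double maxSpeed();

        // Written once here, reused by every subclass.
        String describe() {
            return getClass().getSimpleName() + " at (" + x + ", " + y
                    + "), max speed " + maxSpeed() + " m/s";
        }
    }

    static class Human extends Agent {
        Human(double x, double y) { super(x, y); }
        @Override double maxSpeed() { return 1.4; }
    }

    static class Mosquito extends Agent {
        boolean hasFed = false; // a property only mosquitoes carry

        Mosquito(double x, double y) { super(x, y); }
        @Override double maxSpeed() { return 0.5; }
    }

    public static void main(String[] args) {
        Agent[] room = { new Human(0, 0), new Mosquito(2, 3) };
        for (Agent a : room) {
            System.out.println(a.describe());
        }
    }
}
```

Adding a new organism then means writing one more small subclass; nothing in `Agent` has to change.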
In my case I’ve been working in Java on a model that simulates the bite patterns of mosquitoes in rooms of various sizes and with various ratios of humans to mosquitoes. I actually started this project in C, a favorite language of mine for some things, but for others it’s just downright painful. It wasn’t long before I had to define how humans and mosquitoes move, which is obviously going to differ: a mosquito can’t cover nearly as much ground in a 30-second time period as a human can. In C, the obvious approach is a generic function that takes an argument of some sort (probably a double or an int) and, based on that value, decides how to proceed.
Some might argue that you can just have that function perform an operation on the argument and produce movement that way, but that’s not exactly how it works. Some species’ movement appears “more random” than others’. On a large scale, a human’s movement appears more random than a mosquito’s because the human can cover great distances compared to a mosquito, and even though the human is moving with a purpose, over a long enough time period the path will still look random. Mosquito movement, on the other hand, will appear more uniform. Over a time period of, say, 1000 seconds, the mosquito will appear to have covered a small area, say a 3×3 space, quite uniformly, while the human may appear to have covered a 10×10 space more randomly.
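A rough sketch of the alternative to one generic move function is to let each species override its own `step` rule. The grid sizes mirror the 3×3 and 10×10 example above, but the step rules, treating each of the 1000 steps as one second, and all names (`Mover`, `distinctCellsAfter`) are illustrative assumptions, not the actual model.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Illustrative sketch: per-species movement rules instead of one generic move(double).
public class MovementDemo {
    abstract static class Mover {
        int x, y;
        final Set<String> visited = new HashSet<>(); // distinct grid cells occupied so far

        Mover(int x, int y) { this.x = x; this.y = y; record(); }

        abstract void step(Random rng);

        void record() { visited.add(x + "," + y); }
    }

    // Mosquito: moves at most one cell per step and stays inside a 3x3 patch,
    // so over many steps it covers that small patch fairly evenly.
    static class Mosquito extends Mover {
        Mosquito() { super(1, 1); }

        @Override void step(Random rng) {
            x = clamp(x + rng.nextInt(3) - 1, 0, 2);
            y = clamp(y + rng.nextInt(3) - 1, 0, 2);
            record();
        }
    }

    // Human: takes irregular strides of 1 to 4 cells anywhere in a 10x10 room.
    static class Human extends Mover {
        Human() { super(5, 5); }

        @Override void step(Random rng) {
            int stride = 1 + rng.nextInt(4);
            x = clamp(x + stride * (rng.nextInt(3) - 1), 0, 9);
            y = clamp(y + stride * (rng.nextInt(3) - 1), 0, 9);
            record();
        }
    }

    static int clamp(int v, int lo, int hi) { return Math.max(lo, Math.min(hi, v)); }

    // Runs `steps` time steps (one per assumed second) with a fixed seed and
    // returns how many distinct cells the mover visited.
    static int distinctCellsAfter(Mover m, int steps, long seed) {
        Random rng = new Random(seed);
        for (int t = 0; t < steps; t++) {
            m.step(rng);
        }
        return m.visited.size();
    }

    public static void main(String[] args) {
        System.out.println("mosquito cells: " + distinctCellsAfter(new Mosquito(), 1000, 42));
        System.out.println("human cells:    " + distinctCellsAfter(new Human(), 1000, 42));
    }
}
```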
Consider this, though: what if I’m modeling 20,000 fish, where some species move similarly and some don’t? Imagine the nightmare that generic function would become, trying to work out whether a fish’s movement is more random or more purposeful and how far it travels in one time step. Suffice it to say it’s not as simple as writing one generic movement function. Alternatively, if you wrote all the different move functions into a C library, you’d have an immensely complex piece of code defining how all these fish move. With Java you instead define a class for each type of fish, and when you need to change something, you just edit the .java file for that class. This may seem like a lot of files to handle, but I assure you it’s easier to find a file whose name you roughly know than to find a snippet of code in one big file.
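The per-class approach scales to many species because the simulation loop never has to ask what kind of fish it is holding: dynamic dispatch picks the right `swim` rule. `Tuna`, `Guppy`, and their stride lengths are hypothetical, chosen only to contrast a purposeful swimmer with an erratic one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical species and parameters, for illustration only.
public class SchoolDemo {
    // Shared state: position. Each subclass decides how the fish moves.
    abstract static class Fish {
        double x, y;

        abstract void swim(Random rng);
    }

    // Purposeful swimmer: keeps one heading and a constant stride of 2 units.
    static class Tuna extends Fish {
        final double heading;

        Tuna(Random rng) { heading = rng.nextDouble() * 2 * Math.PI; }

        @Override void swim(Random rng) {
            x += 2.0 * Math.cos(heading);
            y += 2.0 * Math.sin(heading);
        }
    }

    // Erratic swimmer: picks a new random heading every step, stride of 0.3 units.
    static class Guppy extends Fish {
        @Override void swim(Random rng) {
            double h = rng.nextDouble() * 2 * Math.PI;
            x += 0.3 * Math.cos(h);
            y += 0.3 * Math.sin(h);
        }
    }

    // Alternates species; adding a third species only touches this method.
    static List<Fish> populate(int n, Random rng) {
        List<Fish> school = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            school.add(i % 2 == 0 ? new Tuna(rng) : new Guppy());
        }
        return school;
    }

    // One time step: no type switch, each fish runs its own rule.
    static void tick(List<Fish> school, Random rng) {
        for (Fish f : school) {
            f.swim(rng);
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        List<Fish> school = populate(20_000, rng);
        for (int t = 0; t < 100; t++) {
            tick(school, rng);
        }
        System.out.printf("first fish is at (%.1f, %.1f)%n", school.get(0).x, school.get(0).y);
    }
}
```

Changing how one species behaves means editing only its class; `tick` stays untouched.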
Next on our list is the fact that Java will run on any platform that has a JVM available for it (within reason; mixing different JVMs probably isn’t a great idea). That means if your cluster is a hodge-podge of *nix, Windows, and Mac machines, as long as there’s a compatible JVM for each of them, your code will run without a hitch, or at least it should. Java also has a built-in API for distributing work to client nodes on a cluster (Remote Method Invocation; see the java.rmi packages in the javadocs). Of course you can do this in C or C++ using PVM, UPC, MPI, or any number of other solutions, but once you add up all the hoop-jumping it almost isn’t worth it unless the application is just too large to port without a complete rewrite (which sometimes isn’t a bad idea). RMI aside, the real benefit is that you can send your code to someone else and they don’t have to worry whether the binary will run or whether you’ve assumed the wrong size for a data type on their platform. It will run.
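The point about data sizes can be checked directly. In Java the sizes of primitive types are fixed by the language specification, and `DataOutputStream` always writes multi-byte values high byte first (big-endian), so data written on one machine reads back identically on another, whatever that machine’s native byte order. This is a small sketch of that property only; it does not cover RMI, and the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;

public class PortabilityDemo {
    // Serializes an int with DataOutputStream, which writes big-endian
    // regardless of the host CPU's native byte order.
    static byte[] encode(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            // Not expected with an in-memory stream, but the API declares it.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Primitive sizes are fixed by the Java Language Specification.
        System.out.println("byte: " + Byte.SIZE + " bits, int: " + Integer.SIZE
                + " bits, long: " + Long.SIZE + " bits");
        // The native order varies by machine; the serialized order does not.
        System.out.println("native order: " + ByteOrder.nativeOrder());
        byte[] encoded = encode(0x01020304);
        System.out.printf("encoded: %02x %02x %02x %02x%n",
                encoded[0], encoded[1], encoded[2], encoded[3]);
    }
}
```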
As for Java instead of Python, well, that’s more of a design-preference argument. I think that in many scientific applications, like system modeling, it’s good to have static types because they make it clear what each variable will be used for. I know there are many good arguments for why dynamic typing is better, but this is merely a design choice: I like to know exactly what my variables are holding. There’s also the matter of Python’s interpreting/pseudo-compiling vs. Java’s bytecode compilation, which I think is a moot point. It really comes down to design preference.
That pretty much sums up my thoughts on this… and again, I may not be 100% right, but I’m speaking from experience.
About this entry
You’re currently reading “Why not Java for scientific computing?,” an entry on Zac Brown
- Published: 07.02.07 / 11pm
- Category: general