Why not Java for scientific computing?

As I write this, there are hundreds of scientists trudging along with FORTRAN (probably F77, not even F90 or F95) and C/C++ to run their experiments and analyze their data. C/C++ is arguably the standard in the scientific programming world for building experimental models, and Python seems to be gaining its own following as well, so why doesn’t Java get a little more attention? Younger generations of scientists seem willing to embrace Java, but it’s the “old school” scientists who are slowing its uptake. They require that their research partners work with antiquated systems that seem to have been written several millennia before the old researcher himself was born. This makes development slower than rush-hour traffic in Miami. Don’t get me wrong: C is a great language, and FORTRAN, if written properly, looks very nice, but neither is exactly the quickest language to develop in. In the discussion that follows, I will first lay down the premises I work from and then build the rest of the argument on them.

First off, we will assume that the researcher will have access to the following:

  1. Reasonably powerful computing systems: With so many supercomputers sitting idle in the research world, most researchers will have access to some CPU time on a supercomputer of some sort. A researcher who does not will have to rethink how large a problem he can take on.
  2. Sufficient resources and time to develop: No decent program can be written hurriedly and automagically assumed to work at the fastest possible speed. Optimization takes time in any language (including FORTRAN, C, and Java) so ample time should be set aside for it.

Now for the meat and potatoes. Java is in many respects very similar to C++. Both are object oriented and provide a concept of classes. In the modeling world, object orientation makes life incredibly easy compared to procedural languages like C, since we can essentially create a “human object” or a “mosquito object” and so on. These objects can have their own qualities and classifications to make them more like the actual things they represent. This makes for excellent code reuse, since in Java/C++ you can extend an existing class to create a new one that shares some properties and differs in others.
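To make the “human object” and “mosquito object” idea concrete, here is a minimal sketch of what extending a shared base class might look like. The class names, fields, and step sizes are all made up for illustration; they are not taken from my actual model.

```java
// Hypothetical base class: state shared by everything in the room.
abstract class Agent {
    protected double x;
    protected double y;

    Agent(double x, double y) {
        this.x = x;
        this.y = y;
    }

    double getX() { return x; }
    double getY() { return y; }

    // Each kind of agent reports its own largest step per time period.
    // The concrete numbers below are placeholders, not measured values.
    abstract double maxStep();
}

// A human reuses the position handling and adds a bite counter.
class Human extends Agent {
    private int bites = 0;

    Human(double x, double y) { super(x, y); }

    @Override
    double maxStep() { return 10.0; }

    void recordBite() { bites++; }
    int getBites() { return bites; }
}

// A mosquito reuses the same position handling and adds feeding state.
class Mosquito extends Agent {
    private boolean fed = false;

    Mosquito(double x, double y) { super(x, y); }

    @Override
    double maxStep() { return 3.0; }

    void bite(Human target) {
        target.recordBite();
        fed = true;
    }

    boolean hasFed() { return fed; }
}
```

Both subclasses inherit the position fields and getters from `Agent`, so only the parts that actually differ between a human and a mosquito have to be written twice.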

In my case I’ve been working in Java on a model of the bite patterns of mosquitoes in various room sizes and at various ratios of humans to mosquitoes. I started this project in C, a favorite language of mine for some things, but for others it’s just downright painful. It wasn’t long before I got to the point where I had to define how humans and mosquitoes move, which is obviously going to be different. A mosquito can’t move nearly as far in a 30-second time period as a human can. So I would have to write a generic function that takes an argument of some sort (probably a double or an int) and decides how to proceed based on the value passed in. Some might argue that you can just perform an operation on the argument and produce movement that way, but that’s not exactly how it works. Some species’ movement appears “more random” than that of others. On a large scale, a human’s movement appears more random than a mosquito’s, because the human can cover great distances compared to a mosquito. Even though the human is moving with a purpose, over a long enough time period the movement will still appear random. Mosquito movement, on the other hand, will appear more uniform. Given a time period of, say, 1000 seconds, the mosquito will appear to have covered a small area, say a 3×3 space, quite uniformly, while the human may appear to have covered a 10×10 space more randomly.

Consider this though: what if I’m modeling 20,000 fish, where some movement patterns are similar between fish and some aren’t? Imagine the nightmare that generic function would become, trying to hash out whether the movement of a fish is more random or more purposeful and how far it travels in one time step. Suffice it to say that it’s not as simple as writing a generic movement function. And if you wrote all the different move functions into a C library, you’d have an immensely complex piece of code defining how all these fish move. With Java, you instead define a class for each type of fish, and when you need to change something you just edit the .java file for that class. This may seem like a lot of files to handle, but I assure you it’s easier to find a file whose name you can guess than to find a snippet of code in one big file.
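Here is a rough sketch of the class-per-fish idea. The two species, their names, and their movement rules are invented purely for illustration; the point is only that each class owns its own `move` method and the simulation loop never has to ask what kind of fish it is holding.

```java
import java.util.List;
import java.util.Random;

// Hypothetical base class: every fish has a position and knows how to move.
abstract class Fish {
    protected double x = 0.0;
    protected double y = 0.0;

    double getX() { return x; }
    double getY() { return y; }

    // Each species supplies its own movement rule.
    abstract void move(Random rng);
}

// "More random" movement: a small step in an arbitrary direction.
class Drifter extends Fish {
    @Override
    void move(Random rng) {
        x += rng.nextDouble() - 0.5; // step in [-0.5, 0.5)
        y += rng.nextDouble() - 0.5;
    }
}

// "More purposeful" movement: steady progress along x with slight wobble in y.
class Migrator extends Fish {
    @Override
    void move(Random rng) {
        x += 2.0;
        y += 0.2 * rng.nextGaussian();
    }
}

class School {
    // Advance every fish one time step; dynamic dispatch picks the right rule.
    static void step(List<Fish> fish, Random rng) {
        for (Fish f : fish) {
            f.move(rng);
        }
    }
}
```

Adding a third species means adding one more small class in its own file. Neither `School.step` nor the existing species have to change.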

Now this is where the C++ advocates argue that “C++ can do it just like Java.” Well, that may be true, but in C++ you’ve got to handle your own memory management. To a seasoned C++ programmer this may be no big deal, but more often than not scientists know programming as a by-product of necessity, not desire. They didn’t want to learn programming; they only did because they had to. This can lead to problems in C and C++, since they often study memory management just enough to get by with the code they start on, and brush up on it only when they run into a problem they can’t fix the way they wrote it the first time. In Java, memory management is handled for you. You let Java’s garbage collector deal with your memory and save yourself a lot of time. When I started my model code in C, I spent probably 65% of my time hunting down memory errors. Some may argue that this comes from a lack of C experience, and it may very well, but programming is still a lot easier when you don’t have to worry about memory at all.

Next on our list is the fact that Java will run on any platform that has a JVM available for it (within reason; mixing different JVMs probably isn’t a great idea). That means if your cluster is a hodge-podge of *nix, Windows, and Macs, then as long as there’s a compatible JVM for all of them your code will run without a hitch, or at least it should. Java also has a built-in API for farming work out to other nodes on a cluster: Remote Method Invocation, or RMI (see the java.rmi package in the Javadocs). Of course you can do this with C or C++ using PVM/UPC/MPI or any number of other solutions, but once you add up all the hoop-jumping it almost isn’t worth it unless the application is just too large to port without a complete rewrite (which sometimes isn’t a bad idea). RMI aside, there’s the simple fact that you can send your code to someone else and they don’t have to worry about whether the binary will run or whether you’ve assumed the wrong size for a data type on their platform. It will run.
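For a feel of what RMI involves, here is a minimal sketch of a worker that a cluster node could expose. The interface name `BiteWorker`, its method, the registry name, and the toy bite calculation are all hypothetical. Only the `java.rmi` calls themselves are the standard API.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Random;

// Hypothetical remote interface: it must extend Remote, and every
// remotely callable method must declare RemoteException.
interface BiteWorker extends Remote {
    int countBites(int mosquitoes, double biteProbability, long seed) throws RemoteException;
}

// Toy implementation: each mosquito bites with a fixed probability.
class BiteWorkerImpl implements BiteWorker {
    @Override
    public int countBites(int mosquitoes, double biteProbability, long seed) {
        Random rng = new Random(seed); // seeded so a run can be repeated
        int bites = 0;
        for (int i = 0; i < mosquitoes; i++) {
            if (rng.nextDouble() < biteProbability) {
                bites++;
            }
        }
        return bites;
    }
}

class WorkerNode {
    // Keep a strong reference so the exported object is not garbage collected.
    private static BiteWorkerImpl worker;

    // Export the worker and register it under an assumed name on the given port.
    static void start(int port) throws RemoteException {
        worker = new BiteWorkerImpl();
        BiteWorker stub = (BiteWorker) UnicastRemoteObject.exportObject(worker, 0);
        Registry registry = LocateRegistry.createRegistry(port);
        registry.rebind("BiteWorker", stub);
    }
}
```

A client on another machine would fetch the stub with `LocateRegistry.getRegistry(host, port).lookup("BiteWorker")`, cast it to `BiteWorker`, and call `countBites` as if it were a local object. A real setup would also need to think about security and about how results get collected, which this sketch ignores.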

As for Java instead of Python, well, that’s more of a design preference argument. I think that in many scientific applications like system modeling it’s good to have static types, because they make it clear what each variable will be used for. I know there are many good arguments for why dynamic types are better, but this is merely a design preference. I like to know exactly what my variables are holding. There’s also the matter of Python’s interpreting/pseudo-compiling vs. Java’s bytecode compiling, which I think is a moot point. It’s really a design preference when it comes down to it.

That pretty much sums up my thoughts on this… and again, I may not be 100% right, but I’m merely speaking from experience.
