Posted on April 9, 2009, 5:41 pm, by Zac, under mac.
I have been spending a lot of time building packages in MacPorts for various things I need for work. Examples include ImageMagick and friends for using the rmagick gem with Ruby as well as building the ATLAS linear algebra library for use with Shogun (a machine learning toolbox of sorts).
Some of these things take forever to build (*ahem* ATLAS *cough*), so I periodically check Activity Monitor.app to make sure the build hasn’t hung. While doing so, I noticed ReportCrash repeatedly spiking to 100+%. That made me suspect something was awry, so I dug around in Console.app, the tool for sanely checking your logs, and discovered to my dismay that something related to X11 was repeatedly crashing. A little more digging revealed the cause: a mistake I’d made with my DYLD_LIBRARY_PATH variable, which was overriding OS X’s default libjpeg with one built in MacPorts. After fixing that, I was off to the races, or something like that anyway.
The point of the matter is it’s nice to have something like Console.app. The hardcore *nix user in me tells me that’s bollocks and that I should just be using cat and tail on the logs… but I must say reading them this way was far more pleasant.
Try it out some time if you’re smelling suspicious behavior.
Here’s a little gotcha I ran into after following Jan Varwig’s tutorial on installing Cabal & friends on Mac OS X and then trying to install Yi.
When doing “cabal install yi”, cabal won’t find “alex” if you haven’t got $HOME/.cabal/bin in your $PATH. Once you add it to your $PATH, cabal will nicely finish the installation and go on its merry way.
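If you want a quick sanity check before kicking off the install, something like the following Python sketch should do. The helper name `dir_on_path` is made up for illustration, and the suggested `export` line assumes a Bourne-style shell such as bash.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string.

    `path_value` defaults to the current process's PATH; passing it in
    explicitly makes the function easy to try out on made-up values.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    ]
    return target in entries


# Directory named in the post: $HOME/.cabal/bin
cabal_bin = os.path.join(os.path.expanduser("~"), ".cabal", "bin")

if dir_on_path(cabal_bin):
    print("OK: %s is on your PATH" % cabal_bin)
else:
    # Assumption: a Bourne-style shell; adjust the syntax for other shells.
    print('Missing. Try adding this to your shell profile:')
    print('  export PATH="$HOME/.cabal/bin:$PATH"')
```

This only checks the PATH of the process running the script, so run it from the same terminal session you plan to run cabal in.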
A few aberrations are apparent but that is probably easily remedied by increasing the sample size.
Results for classifying specific species, as well as for phylogenetic classification, are as follows:






Kingdom classification was not included, as it seemed no consistent taxonomy exists in the NCBI databases with regard to Kingdom taxa. Superkingdoms exist, but simply identifying “cellular organisms” for every single species does not seem… interesting.
The number of usable genomes (those with sufficient information to compare taxa) ranged from 92 to 98 out of 100, with an average of 96.
Posted on February 9, 2009, 11:04 pm, by Zac, under general.
So, in my infinite wisdom and grace, I managed to slap a cup of water onto my 6-month-old ThinkPad last Thursday. To say this was a bother is, well… an understatement. I did all the necessary precautionary things like shutting it down, pulling out the battery, dismantling it, and generally just trying to take care of it.
However, I did need some files off of it, so I let it dry out for a few hours, booted it up (was kinda surprised it did), then proceeded to copy off the files I needed. In the meantime, I was stuck sharing a laptop with the girlfriend, which was not really a good setup as I tend to customize everything to my liking.
All the while, I had been secretly wishing that I had a Mac of some sort. I’ve started to get into more development that would be made easier by using a Mac, I was tired of fighting with Linux for a half-assed *nix solution on a laptop, and the Windows CLI just isn’t good enough.
Which brings us to Friday night, when I bought a MacBook. Basic white model, the new one though, with 802.11 a/b/g/n and the Nvidia 9400M graphics chipset. I’m loving it. It’s an excellent computer. It’s nice to have a decent command line interface, and now that I’ve got it configured the way I like it, I’m not sure I’ll willingly use any other hardware, for laptops at least.
Hopefully, though, my ThinkPad will recover and then I can give it to my dad to use. That would be nice. Hopefully.
The server that I was running the computations on hard-locked sometime during the winter break. Apparently it ran out of disk space while another user was running simulations on it. I wasn’t able to access the machine until I returned to Miami.
Since I had no access to a machine with large amounts of memory, I spent some time trying to figure out what was wrong with the training software. I still wasn’t able to find the problem; I must be missing something simple.
Upon returning to Miami, I did the following:
- Fixed the server; apparently it ran out of disk space from log files created by another user’s run.
- Researched building a database for taxonomies.
- Built a database using the BioSQL schema after discovering that GenBank files track phylogeny through recursive ranks.
- Wrote a Python script to fetch the GenBank file for each of the 625 FASTA-format genomes and load it into the BioSQL database.
- Began revising the taxonomic classifier, ~80% done.
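To illustrate what "recursive ranks" means in practice, here is a minimal sketch of walking a parent-linked taxon table up to the root to recover a species' lineage. It is not the script mentioned above: the single `taxon` table is a simplification (the real BioSQL schema keeps names in a separate `taxon_name` table), the helper names are made up, and the handful of rows are hand-typed demo data rather than anything loaded from GenBank.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table loosely modelled on BioSQL's taxon table.

    Assumption: one simplified table with the name stored inline; the
    rows below are illustrative demo data only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        " taxon_id INTEGER PRIMARY KEY,"
        " parent_taxon_id INTEGER,"
        " node_rank TEXT,"
        " name TEXT)"
    )
    rows = [
        (1, None, "no rank", "cellular organisms"),
        (2, 1, "superkingdom", "Bacteria"),
        (3, 2, "phylum", "Proteobacteria"),
        (4, 3, "class", "Gammaproteobacteria"),
        (5, 4, "family", "Enterobacteriaceae"),
        (6, 5, "genus", "Escherichia"),
        (7, 6, "species", "Escherichia coli"),
    ]
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", rows)
    return conn


def lineage(conn, taxon_id):
    """Follow parent_taxon_id links upward and return a {rank: name} dict."""
    ranks = {}
    current = taxon_id
    while current is not None:
        row = conn.execute(
            "SELECT parent_taxon_id, node_rank, name"
            " FROM taxon WHERE taxon_id = ?",
            (current,),
        ).fetchone()
        if row is None:
            break  # unknown id: stop with whatever was collected so far
        parent, rank, name = row
        if rank != "no rank":  # unranked nodes are useless for comparison
            ranks[rank] = name
        if parent == current:
            break  # guard against a root node that points at itself
        current = parent
    return ranks


conn = build_demo_db()
for rank, name in lineage(conn, 7).items():
    print("%-13s %s" % (rank, name))
```

With a lineage in this shape, comparing two genomes at a given rank is just a dictionary lookup, and a genome whose lineage lacks that rank can be flagged as not comparable.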
Next things to do:
- Run the taxonomic classifier.
- While waiting for the taxonomic classifier results, tear apart the training classifier and figure out what’s wrong.