On the usefulness of Mac OS X’s Console Application

I have been spending a lot of time building packages in MacPorts for various things I need for work. Examples include ImageMagick and friends for using the rmagick gem with Ruby as well as building the ATLAS linear algebra library for use with Shogun (a machine learning toolbox of sorts).

Some of these things take forever to build *ahem* ATLAS *cough* so I periodically check Activity Monitor.app to make sure the build hasn’t hung. While doing this I noticed ReportCrash repeatedly spiking to 100+% CPU, which made me suspect something was awry. So I dug around in Console.app, the tool for sanely checking your logs, and discovered to my dismay that something related to X11 was repeatedly crashing. A little more digging and I discovered it was a mistake I’d made with my DYLD_LIBRARY_PATH variable, which caused OS X’s default libjpeg to be overridden by one built in MacPorts. After fixing that, I was off to the races, or something like that anyway.

The point of the matter is it’s nice to have something like Console.app. The hardcore *nix user in me tells me that’s bollocks and that I should just be using cat and tail on the logs… but I must say accessing them this way was far more readable.

Try it out sometime if you’re smelling suspicious behavior :)

Normalized Score Graphs

Normalized graphs as follows:

Match

[Graphs: result-match-36-3609, result-match-100-3609, result-match-200-3709, result-match-400-3709, result-match-800-3809]

Genus

[Graphs: result-genus-36-3609, result-genus-100-3609, result-genus-200-3709, result-genus-400-3709, result-genus-800-2009]

Family

[Graphs: result-family-36-3609, result-family-100-3609, result-family-200-3709, result-family-400-3709, result-family-800-2009]

Order

[Graphs: result-order-36-3609, result-order-100-3609, result-order-200-3709, result-order-400-3709, result-order-800-2009]

Class

[Graphs: result-class-36-3609, result-class-100-3609, result-class-200-3709, result-class-400-3709, result-class-800-2009]

Phylum

[Graphs: result-phylum-36-3609, result-phylum-100-3609, result-phylum-200-3709, result-phylum-400-3709, result-phylum-800-2009]

Installing Yi on Mac OS X

A little gotcha I ran into after following Jan Varwig’s tutorial on installing Cabal & friends on Mac OS X and then subsequently trying to install Yi.

When doing “cabal install yi”, cabal won’t find “alex” if you haven’t got $HOME/.cabal/bin in your $PATH. Once you add it to your $PATH, cabal will nicely finish the installation and go on its merry way.
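
A quick way to check whether the fix took, sketched in Python. The helper name `cabal_bin_on_path` is my own invention (it is not part of cabal), and the sketch assumes the default $HOME/.cabal/bin location mentioned above.

```python
import os
import shutil


def cabal_bin_on_path(path_value, home):
    """Return True if <home>/.cabal/bin is one of the entries in a PATH string."""
    target = os.path.normpath(os.path.join(home, ".cabal", "bin"))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    home = os.path.expanduser("~")
    on_path = cabal_bin_on_path(os.environ.get("PATH", ""), home)
    print("~/.cabal/bin on PATH:", on_path)
    # shutil.which returns None when no "alex" executable is found on PATH.
    print("alex found at:", shutil.which("alex"))
```

If the first line prints False, adding something like `export PATH="$HOME/.cabal/bin:$PATH"` to your shell profile and opening a new terminal should sort it out.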

Genome Score Diversity Graphs

These were generated with 10000-mer pieces and 5000 bp overlaps.
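
For anyone wondering what "10000-mer pieces and 5000 bp overlaps" means in practice, here is a minimal sketch of that kind of windowing. The function name is hypothetical, and dropping a short leftover tail is an assumption on my part; it is not necessarily how the graphs here were produced.

```python
def overlapping_pieces(sequence, piece_size=10000, overlap=5000):
    """Yield (start, piece) pairs; consecutive pieces share `overlap` bases."""
    if not 0 <= overlap < piece_size:
        raise ValueError("overlap must be non-negative and smaller than piece_size")
    step = piece_size - overlap
    # Assumption: only full-length pieces are kept; a short tail is dropped.
    for start in range(0, len(sequence) - piece_size + 1, step):
        yield start, sequence[start:start + piece_size]


if __name__ == "__main__":
    genome = "ACGT" * 6250  # 25,000 bp of toy sequence
    for start, piece in overlapping_pieces(genome):
        print(start, len(piece))
```

With the defaults each new piece starts 5000 bp after the previous one, so every base away from the ends is covered by two pieces.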

Ralstonia solanacearum Graphs

[Graphs: result-nc_003295-phylo-3-100001, result-nc_003295-phylo-4-100001, result-nc_003295-phylo-5-100001, result-nc_003295-phylo-6-100001, result-nc_003295-phylo-7-100001, result-nc_003295-phylo-8-100001]

Score Graphs Updated

The score graphs below represent 50 groups ranging from the smallest score value to the largest score values, rather than 50 groups of N scores.
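
To make the difference between the two groupings concrete, here is a rough sketch of both. The function names are made up for illustration and this is not the plotting code behind the graphs: the first splits the score range into equal-width intervals (what the graphs below use), the second splits the sorted scores into runs of equal length (the older approach).

```python
def equal_width_groups(scores, n_groups=50):
    """Count scores in n_groups equal-width intervals spanning min..max."""
    lo, hi = min(scores), max(scores)
    counts = [0] * n_groups
    if hi == lo:
        # All scores identical: everything lands in the first group.
        counts[0] = len(scores)
        return counts
    width = (hi - lo) / n_groups
    for s in scores:
        # The maximum score would land at index n_groups, so clamp it into the last group.
        index = min(int((s - lo) / width), n_groups - 1)
        counts[index] += 1
    return counts


def equal_count_groups(scores, n_groups=50):
    """Split sorted scores into n_groups runs of (nearly) equal length."""
    ordered = sorted(scores)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional score each.
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups
```

Equal-width groups show where the scores actually pile up along the score axis, whereas equal-count groups always hold about the same number of scores and hide that shape.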

Match

[Graphs: result-match-36-36092, result-match-100-36092, result-match-200-37092, result-match-400-37092, result-match-800-38092]

Genus

[Graphs: result-genus-36-36092, result-genus-100-36092, result-genus-200-37092, result-genus-400-37092, result-genus-800-20092]

Family

[Graphs: result-family-36-36092, result-family-100-36092, result-family-200-37092, result-family-400-37092, result-family-800-20092]

Order

[Graphs: result-order-36-36092, result-order-100-36092, result-order-200-37092, result-order-400-37092, result-order-800-20092]

Class

[Graphs: result-class-36-36092, result-class-100-36092, result-class-200-37092, result-class-400-37092, result-class-800-20092]

Phylum

[Graphs: result-phylum-36-36092, result-phylum-100-36092, result-phylum-200-37092, result-phylum-400-37092, result-phylum-800-20092]

Revised Graphs

Matches

[Graphs: result-match-36-36091, result-match-100-36091, result-match-200-37091, result-match-400-37091, result-match-800-38091]

Genus

[Graphs: result-genus-36-36091, result-genus-100-36091, result-genus-200-37091, result-genus-400-37091, result-genus-800-20091]

Family

[Graphs: result-family-36-36091, result-family-100-36091, result-family-200-37091, result-family-400-37091, result-family-800-20091]

Order

[Graphs: result-order-36-36091, result-order-100-36091, result-order-200-37091, result-order-400-37091, result-order-800-20091]

Class

[Graphs: result-class-36-36091, result-class-100-36091, result-class-200-37091, result-class-400-37091, result-class-800-20091]

Phylum

[Graphs: result-phylum-36-36091, result-phylum-100-36091, result-phylum-200-37091, result-phylum-400-37091, result-phylum-800-20091]

Genend update – Scores Comparison

Results are grouped by piece size with each k-mer represented for that graph.

Exact Matches

Genus Matches

Family Matches

Order Matches

Class Matches

Phylum Matches

EDIT: 800-mers done and loaded.

Genend Update

A few aberrations are apparent but that is probably easily remedied by increasing the sample size.

Results for classifying specific species as well as phylogenetic classification as follows:

Kingdom classification was not included as it seemed no consistent taxonomy exists in the NCBI databases with regard to Kingdom taxons. Superkingdoms exist but simply identifying “cellular organisms” for every single species does not seem… interesting.

The number of usable genomes (those which had sufficient information to compare taxons) was between 92 and 98, with 96 being the average number of genomes out of 100 that had taxon information that was usable.

MacBook

So, in my infinite wisdom and grace, I managed to slap a cup of water onto my 6-month-old ThinkPad last Thursday. To say this was a bother is, well… an understatement. I did all the necessary precautionary things like shutting it down, pulling out the battery, dismantling it and generally just trying to take care of it.

However, I did need some files off of it, so I let it dry out for a few hours, booted it up (was kinda surprised it did), then proceeded to copy off the files I needed. In the meantime, I was stuck sharing a laptop with the girlfriend, which was not really a good setup as I tend to customize everything to my liking.

All the while, I had been secretly wishing that I had a Mac of some sort. I’ve started to get into more development that would be made easier by using a Mac, I was tired of fighting with Linux for a half-assed *nix solution on a laptop, and the Windows CLI just isn’t good enough.

Which brings us to Friday night, when I bought a MacBook. Basic White model, the new one though, with 802.11 a/b/g/n and the Nvidia 9400M graphics chipset. I’m loving it. It’s an excellent computer. It’s nice to have a decent command line interface, and now that I’ve got it configured the way I like it, I’m not sure I’ll willingly use any other hardware, for laptops at least.

Hopefully though, my ThinkPad will recover and then I can give it to my dad to use. That would be nice. Hopefully :)

Genend Update

The server that I was running the computations on hard-locked sometime during the winter break. Apparently it ran out of disk space while another user was running simulations on it. I wasn’t able to access the machine until I returned to Miami.

Since I had no access to a machine with large amounts of memory, I spent some time trying to figure out what was wrong with the training software. I still wasn’t able to find the problem; I must be missing something simple.

Upon returning to Miami, I did the following:

  • Fixed the server; apparently it ran out of disk space from log files created by another user’s run.
  • Researched building a database for taxonomies.
  • Built a database using the BioSQL schema after discovering that Genbank files track phylogeny through recursive ranks.
  • Wrote a Python script to fetch the Genbank file for each of the 625 fasta-format genomes and load it into the BioSQL database.
  • Began revising taxonomic classifier, ~80% done.
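
The "recursive ranks" point can be illustrated with a small sketch: each taxon row points at its parent, so a full lineage falls out of walking the parent links upward. The table below is a toy stand-in with invented IDs and illustrative lineage names. It is not the BioSQL schema, not real NCBI taxon IDs, and not the actual classifier code; both function names are hypothetical.

```python
# Toy taxon table: id -> (name, rank, parent_id). IDs are invented for illustration.
TAXA = {
    1: ("Bacteria", "superkingdom", None),
    2: ("Proteobacteria", "phylum", 1),
    3: ("Betaproteobacteria", "class", 2),
    4: ("Burkholderiales", "order", 3),
    5: ("Burkholderiaceae", "family", 4),
    6: ("Ralstonia", "genus", 5),
    7: ("Ralstonia solanacearum", "species", 6),
}


def lineage(taxon_id, taxa=TAXA):
    """Return {rank: name} for a taxon and all of its ancestors."""
    ranks = {}
    seen = set()
    while taxon_id is not None:
        if taxon_id in seen:
            raise ValueError("cycle in parent links")
        seen.add(taxon_id)
        name, rank, parent_id = taxa[taxon_id]
        ranks[rank] = name
        taxon_id = parent_id  # step up one level in the hierarchy
    return ranks


def deepest_shared_rank(a, b, order=("genus", "family", "order", "class", "phylum")):
    """Return the most specific rank at which two lineages agree, or None."""
    for rank in order:
        if rank in a and a.get(rank) == b.get(rank):
            return rank
    return None
```

Once every genome has a lineage dictionary like this, checking whether a prediction agrees at the genus, family, order, class or phylum level is just a dictionary comparison.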

Next things to do:

  • Run the taxonomic classifier.
  • While waiting for taxonomic classifier results, tear apart the training classifier and figure out what’s wrong.