Archive for the ‘bioinformatics’ Category

Archaea Classification Continued

After thoroughly examining the code for a couple of days and trying it with replacement of fragments, I’ve convinced myself that the code is correct. Thinking it over, it occurred to me that the relative k-mer distribution profiles for larger k-mers (7, 8, 9) might be skewed by even very small amounts of sampling without replacement. The [...]
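
As a rough illustration of the sampling concern, here is a minimal Python sketch; it is not the code from these posts. It assumes a relative k-mer profile is each k-mer's count divided by the total number of k-mers counted, and the names `relative_kmer_profile` and `sample_fragments` are hypothetical. For k = 9 there are 4^9 = 262,144 possible k-mers, so a small sample of fragments can only cover a fraction of them.

```python
import random
from collections import Counter


def relative_kmer_profile(seqs, k):
    """Return {k-mer: count / total} over a list of sequences (assumed definition)."""
    counts = Counter()
    for seq in seqs:
        # Count per fragment so no artificial k-mers span fragment junctions.
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def sample_fragments(seq, frag_len, n_frags, rng, replace=False):
    """Cut seq into non-overlapping fragments and sample n_frags of them."""
    frags = [seq[i:i + frag_len]
             for i in range(0, len(seq) - frag_len + 1, frag_len)]
    if replace:
        return [rng.choice(frags) for _ in range(n_frags)]
    return rng.sample(frags, n_frags)  # sampling without replacement


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(20000))  # toy random sequence
full = relative_kmer_profile([genome], 9)
sampled = relative_kmer_profile(sample_fragments(genome, 500, 4, rng), 9)
```

Comparing `len(sampled)` with `len(full)` gives a quick feel for how many 9-mers a small sample misses entirely, which is one way a profile for large k could end up skewed.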

Genend Update 2.33421

Still having problems loading full data sets into memory for Bacteria + Archaea genomes. Need to come up with a good way to do this with the 67/80/90% runs. Right now, I can only do it with Archaea.
The results for the run strike me as being somewhat odd. You’ll see below…

Despite having gone over the [...]

More results, new and improved software

Switched from Python to Java (sigh) to improve speed. Python (besides being dynamic) has worthless threading in comparison to Java. The Java version is faster by a lot: runs on just Archaea took 4 days, and now a run takes something in the range of 12 hours with Archaea + Bacteria (625 genomes).
Ran into problems with threading but learned [...]

Old data bad, New data good, Program too slow

So the last set of data posted is definitely incorrect. Found flaws in the scripts’ function to generate relative distributions. Also modified the original identification script to work with classifying organisms.
The data for correct identification below…

The data for phylogenetic classification below…

Full bacterial and bacterial+archaeal analysis will be harder as the current program is too slow. [...]

More genomics…

After some misunderstanding, I now have a program that does what is needed. It seems slow, and memory constraints make loading the higher-level distributions (k-mer size > 9) difficult.
Started a run last night (~18:00) on 625 genomes (50 Archaea, 525 Bacteria), still running. Got no significant results from 3-5-mers, now running on 6-9-mers.
Have a completed run from [...]

Genend - Update 1

Moved from Perl to Python. Larger Perl files proved hard for me to organize, and I was having trouble keeping straight what I was doing. I also don’t like the Perl object/class system and am more at home with Python’s.
Current progress includes a custom database object for interfacing with a SQLite database [...]
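
A custom database object of this kind might look something like the sketch below. The class name `GenomeDB`, the `kmer_counts` table layout, and the method names are all hypothetical; the excerpt does not show the actual schema or interface. It uses only the standard-library `sqlite3` module.

```python
import sqlite3


class GenomeDB:
    """Hypothetical thin wrapper around a SQLite database of k-mer counts."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kmer_counts ("
            "genome TEXT, kmer TEXT, n INTEGER, "
            "PRIMARY KEY (genome, kmer))"
        )

    def store_counts(self, genome, counts):
        """Insert or overwrite the k-mer counts for one genome."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO kmer_counts VALUES (?, ?, ?)",
                [(genome, kmer, n) for kmer, n in counts.items()],
            )

    def load_counts(self, genome):
        """Return {k-mer: count} for one genome (empty dict if unknown)."""
        rows = self.conn.execute(
            "SELECT kmer, n FROM kmer_counts WHERE genome = ?", (genome,)
        )
        return dict(rows)

    def close(self):
        self.conn.close()
```

Keeping the SQL inside one small class means the analysis scripts only deal with plain dictionaries, which was presumably part of the appeal of wrapping the database in an object.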