Archive for the 'Bioinformatics' Category

Rrrrrrrr

After doing a fresh install of OS X Leopard on my Mac, and needing to re-familiarize myself with some of my Biostatistics/Microarray background, it was time to do a fresh install of R on my machine.

R is a freely available statistical programming language modeled after the S programming language. Matthew Keller, a founder of the Richmond R group, describes some of R’s advantages over other statistical packages as follows:

  • It’s fast and free
  • State of the art: Researchers provide their methods as R packages
  • It’s second only to MATLAB for graphics

Some points that I would like to add to the list are:

  • Its graphs and tables can be used with the ever popular LaTeX
  • Its programming/scripting-like interface makes it rather extensible
  • It’s a great starting point for people interested in Biostatistics and Microarrays
  • The Bioconductor project is primarily based on R
  • Did I mention that it’s free?

R is a rather powerful tool, and I consider it a must-have in any Biostatistician’s or Bioinformatician’s toolset. So check it out, and let me know what you think.

Other Important R Links:

  • CRAN, the Comprehensive R Archive Network (similar to Perl’s CPAN)
  • The R Help Mailing List, a good place for getting answers to R-related questions
  • RSeek, a useful tool, as Google isn’t designed to search for just “R”

BIND and SOAP

I’m in the process of developing an application that uses information from protein-protein interaction databases. One specific database I am working with is BIND, the Biomolecular Interaction Network Database. Because my application will be looking at a large number of genes, I had to figure out how to query the database programmatically rather than by hand.

The solution I found was BIND SOAP, an API designed to help developers interface with BIND using C, Perl, Java, or VB.NET. SOAP, the Simple Object Access Protocol, provides a basic messaging framework that allows applications to communicate across the Internet.
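To give a feel for what a SOAP message looks like on the wire, here is a minimal sketch in Python that only builds a SOAP 1.1 envelope as XML and does not send it anywhere. The service namespace `urn:example:interactions`, the method name `findInteractions`, and the `geneSymbol` parameter are made up for illustration; they are not taken from the BIND SOAP API.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "urn:example:interactions"


def build_envelope(method, params):
    """Wrap a remote method call and its parameters in a SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method call is an element inside the Body, in the service namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical method and parameter names.
envelope_xml = build_envelope("findInteractions", {"geneSymbol": "TP53"})
print(envelope_xml)
```

A SOAP client library does essentially this for you, then posts the envelope over HTTP and parses the envelope that comes back.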

After some research, I decided that Perl was the best way to go, as there was already a Perl module available for use with SOAP, conveniently named SOAP::Lite. If you have the CPAN module installed on your system, the best way to get the SOAP::Lite module on Linux systems is to start the CPAN shell:

[root]# perl -MCPAN -e shell

Once the shell is started, run:

cpan> install SOAP::Lite

If you have problems installing the module, or are installing on another system, you can go here [soaplite.com] for additional instructions.


2007 Systems Biology Summit


Last week, I attended the Systems Biology Summit in Richmond, Virginia. The opening session of the Summit was entitled “The Systems Biology Challenge in 21st Century Biomedical Research”. It consisted of speakers from the Research Institute, the National Institutes of Health, Academia, and the Pharmaceutical industry providing their various viewpoints on Systems Biology.

Dr. Leroy Hood began the session with his keynote lecture on systems approaches in Biology and Medicine. The following are Dr. Hood’s thoughts on where we are in systems biology:

The information we are finding represents the “parts” of the system, when we move into the realm of establishing functionality of the system we are determining the blueprints for these parts.

A later speaker, Dr. Keith Elliston of Genstruct, expanded the discussion with his research on biological causal networks and their use for diagnostic reasoning or predictive inference. The following was his entertaining quote on networks and pathways that was repeated throughout the weekend:

Systems biology is not pathways but networks…stupid. A pathway is a specific path through the network.

Another entertaining quote came from Dr. Burt Adelman, who presented industry’s perspective on the transition from animal research to human treatments:

We treat humans. They’re very complex, not inbred… mostly. We have to find what aspects of human biology animal research is reproducing.

The session ended with a panel discussion on systems biology. The most intriguing topic covered was the current problems in systems biology:

  • The peer review system for grant applications in the United States.
  • Researchers’ fear of failure.
  • Lack of effective collaborations.
  • The lack of tools for non-elite scientists.
  • The need for better leadership in the scientific community.

Overall, I thought the summit was a great experience, and I would go again if another opportunity arose. I got to network with different people and learned some new things that I will discuss on this blog in the next couple of weeks. My biggest gripe with the summit was that it was 90% presentations and 10% workshop. As a programmer coming into biology, I know I should not expect anything like the WWDC, but if we are to build better collaborations and novel tools, I think the summit could have spent more time with people working together rather than gathering in a room and listening to one person talk. It would be interesting to put something like that together one day. What does everyone think?

Genetic Discrimination


As covered by Nature magazine a couple of weeks ago, the full genome of James D. Watson, one of the co-discoverers of the structure of DNA, has been sequenced. The article also describes how Watson’s DNA sequence revealed his predisposition to cancer. This revelation raises several important questions. Will people come forward to see what diseases they are prone to? More importantly, how could future employers, health providers, or insurance companies use this information to genetically discriminate against you?

As covered by Slashdot a month ago, there is a bill addressing genetic discrimination that is currently awaiting the approval of one senator before it can pass. This bill would make it illegal for US citizens to be denied jobs or insurance because of a disease predisposition implied by their genetic code.

I just hope this bill passes soon as it is essential to the use of novel Bioinformatic practices in the medical field.

MIA Once Again

Sorry to be MIA once again. I got busy with wrapping up my course work, fending off hackers from this site, and attending last week’s Systems Biology Summit (more on this in a follow-up post). Besides that, I was privileged to be asked to contribute some thoughts on working in Bioinformatics in Academia at Bioinformatics Zen for their 11th Bio:blogs. You should check out the article; it collects a wide assortment of information from some of the more prominent bloggers in the Bioinformatics community.

Eye Color


I found an interesting post today explaining the genetics of eye color. The article describes how eye color is a polygenic trait (i.e. more than one gene is involved) and how, of the genes involved, one particular gene, OCA2, has more influence than the rest.

It’s a brief article, but I thought it would be useful as it has some jargon that is commonly used in biology and bioinformatics.

Key Terms: single nucleotide polymorphisms (SNPs), gene expression [Wikipedia]

Network Theory

This network made Digg about a month ago. I thought it was interesting because I actually saw it a year ago. You’ll find that network/graph theory is a big topic in Bioinformatics.

I personally find the use of networks in Bioinformatics to be a bit of a double-edged sword. Their importance has emerged as these networks are used to present a systematic overview of various biological processes (e.g. all the gene interactions at a given time in the cell), which is one of the overall goals of Systems Biology, as I briefly touched on in my previous post.

But at the same time, their novelty has also led to their misuse in the biological community. You may find biologists who want to include these networks in their study but have no knowledge of how they are constructed. One of the Ph.D. students in my lab calls this use of networks fancy bioinformatic “hand waving”, which is what it is some of the time. The point is, these theoretical networks should be taken for what they are: a tool that facilitates further interpretation, not a concrete view of how a system works.
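To make the construction of such a network a little less mysterious, here is a minimal sketch in Python. It assumes the interactions arrive as a plain list of gene pairs; the gene names are placeholders, not real data. Each gene becomes a node, each pair becomes an undirected edge, and the number of neighbors (the degree) gives a first rough idea of which genes sit at the center of the network.

```python
from collections import defaultdict

# Placeholder interaction pairs, invented for illustration only.
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
]


def build_network(pairs):
    """Build an undirected network as a dict mapping gene -> set of neighbors."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


network = build_network(interactions)

# Degree = number of interaction partners for each gene.
degrees = {gene: len(neighbors) for gene, neighbors in network.items()}

# The most connected gene in this toy network.
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])
```

Even in this toy case, the picture depends entirely on which pairs were put in, which is exactly why a network should be read as a summary of the input data and not as a finished model of the cell.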

Systems Biology

Here is an interesting quote from my school’s site that a professor recently pointed out in class:

… systems are more than a sum of the parts, and that nonlinear interactions of components and processes result in emergent properties that can not be predicted from knowledge of the individual components and their behavioral processes.

In layman’s terms, studying entire biological systems (e.g. looking at all the genes of a cell at once) provides insight into properties of the system that could not be seen or identified with the old biological dogma of single-gene studies.

This is what Bioinformatics has done to the study of Biology. It has moved the field from a micro exploration of individual gene function to a macro examination of the system as a whole, observing all the parts simultaneously.

Beginner’s Guide to Bioinformatics

As a computer scientist coming into Bioinformatics, I was faced with the heavy task of catching up on my Biology and Chemistry (I was a Physics minor in undergrad, but that wasn’t applicable to my Bioinformatics catch-up). This meant two semesters of General Chemistry, a semester of Organic Chemistry, and a semester of Cell Biology. Though all this course work was very educational and useful for my degree, I don’t think it’s all that necessary for someone who may be interested in fooling around with Bioinformatics problems on the side.

Here is a very general overview of cell biology for Non-Biologists wanting to get involved in Bioinformatics:

  1. Proteins are the essential part of all living organisms. Proteins have a variety of functions and are involved in every process within our cells. [Wikipedia]
  2. DNA is the blueprint for proteins. Segments of DNA (genes) are transcribed into RNA, which is then translated into proteins. For more detail, look into the transcription and translation of DNA to proteins.
  3. Cell function is determined by which proteins are expressed and in what quantity. This means that some kind of gene regulation must take place. One can also argue that if you know which genes are expressed in a cell, and how strongly, you can possibly infer that cell’s function.
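For the programmers in the audience, the DNA-to-protein step can be sketched in a few lines of Python. This is a simplified illustration: it assumes the input is the coding strand of a gene, starts reading at the first base, uses the standard genetic code, and ignores real-world details such as introns and alternative start sites. The example sequence is made up.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translation: read the mRNA three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTGGAAATAA"  # made-up toy gene
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)     # AUGGCCUGGAAAUAA
print(protein)  # MAWK
```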

For a more specific overview, the following are some of the essential key points for biology and bioinformatics:

  1. Genome - all the DNA in a cell.
  2. DNA - a string of nucleotides (e.g. GATCACTT…ATCG).
  3. Gene - a substring of DNA that encodes a protein.
  4. Proteins - a string of amino acids (e.g. ACDEF…RSTY).
  5. Gene expression is regulated by the product of other genes. It is a network of interactions.
  6. Post-translational modifications are an important regulation mechanism for gene expression.

You may notice that the above deals quite a bit with string manipulation, hence the strong emphasis on Perl experience in Bioinformatics job postings. String manipulation is not the only driving force for computer science in Bioinformatics, though. I will try to explain other topics in subsequent posts.
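As a small taste of the kind of string manipulation involved, here is a sketch in Python (the same ideas carry over directly to Perl). The sequence is a made-up toy example, and the three helper functions are my own illustrative names, not part of any library.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_content(dna):
    """Fraction of bases in the sequence that are G or C."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def find_motif(dna, motif):
    """Return every start position (0-based) where motif occurs, overlaps included."""
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions


sequence = "GATCACTTGATC"  # made-up toy sequence
print(reverse_complement(sequence))   # GATCAAGTGATC
print(find_motif(sequence, "GATC"))   # [0, 8]
print(round(gc_content(sequence), 3))
```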

As for Biologists wanting to do Bioinformatics, I cannot provide the best advice since I didn’t come into Bioinformatics from that direction, but I would imagine that you may want to look into the following:

  1. Learn how to program. You want to know how to use a scripting language (preferably Perl) for smaller everyday tasks and a compiled language such as C, C++, or Java for larger projects.
  2. Learn how to use databases. Bioinformatics deals with very large datasets. At some point you are going to have to deal with either retrieving information from databases or building your very own database, so you might as well begin playing with them now.
  3. Install and run a Unix/Linux OS (Optional). This might be my personal bias, but I believe that if you are going to be working in Bioinformatics and its large data sets, eventually you will find yourself either maintaining a server or SSHing into one, so you might as well become familiar with that type of environment. At the very least, XP users should install Cygwin.
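If you want something concrete to start playing with for the database point above, here is a minimal sketch using Python’s built-in sqlite3 module with an in-memory database. The `expression` table, its columns, and the values in it are all invented for illustration; a real project would use its own schema and data.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of expression levels, one row per gene per sample.
conn.execute("CREATE TABLE expression (gene TEXT, sample TEXT, level REAL)")

# Made-up values, just to have something to query.
rows = [
    ("geneA", "sample1", 2.0),
    ("geneA", "sample2", 4.0),
    ("geneB", "sample1", 1.0),
    ("geneB", "sample2", 1.5),
]
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)

# Average expression level per gene across all samples.
query = "SELECT gene, AVG(level) FROM expression GROUP BY gene ORDER BY gene"
averages = dict(conn.execute(query))
conn.close()

print(averages)
```

The SQL itself carries over to larger database servers with little change, so time spent on a toy like this is not wasted.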

Useful Links:

  • Bioinformatics intro offered at my university.
  • Graduate level of the Bioinformatics intro course.
  • Library of videos that cover a wide range of biological topics (theoretical and practical).
  • RT-PCR, a common molecular biology method practiced in the lab.
  • Virtual lab, which lets non-biologists work through basic molecular biology techniques.

Finally, I must say that I am far from an expert, so any constructive suggestions to help clarify or expand the above are welcome and appreciated.

A Little Busy…

Well I’ve been quite busy with the Holidays and work, so I haven’t been able to post in a while. As a brief update, here are a couple of screen captures to show what I’ve been doing:


Fedora Core 4 via Parallels on OS X



GEOSS via Fedora Core 4 via Parallels on OS X

GEOSS is an open-source storage solution for Affymetrix microarrays. I’ll be posting soon to describe what GEOSS does and how it is useful to my research, but if you can’t wait, check out their home page.