Bioinformatics can be described as the field of study that uses computer science, engineering, and mathematics to solve problems in biology.
Some form of bioinformatics has been practiced since the dawn of the computer era. After World War II ended, computers became more accessible to researchers, and the FORTRAN programming language introduced by IBM made it possible for non-experts to write programs for scientific applications. As early as the 1960s, computational biologists used supercomputers and computer networks to draw information out of data sets of amino-acid sequences in order to study proteins.
Before these tools existed, sequencing was a laborious process that required the expertise of chemists. Computational biology turned sequencing into a routine laboratory process, and today computer technology helps analyze a wide variety of molecular biology information.
Here is a look at some of the tools and hardware being used in bioinformatics.
The Original Sequencing Hardware
First published in 1966, the Protein Sequenator was developed by researcher Pehr Edman. The Sequenator used stepwise protein degradation to automate the determination of amino acid sequences in peptides and proteins. This was the first time that sequencing had been automated.
The programming unit of the Sequenator consisted of a 30-channel electronic timer. The channels were arranged in sequence so that when one timer elapsed after a preset amount of time, it triggered the next one via relays. The Sequenator would then move on to the next step in its sequencing process.
Block diagram of the Programming Unit. Image courtesy of St. Vincent's School of Medical Research.
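The chained-timer idea can be sketched in a few lines of Python. This is only an illustration of timers handing off to one another in a fixed order, not a model of the actual Sequenator: the `Stage` class, the `run_program` function, and the stage names and durations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str          # hypothetical label for the step
    duration_s: float  # preset time before this stage's timer elapses


def run_program(stages, start_s=0.0):
    """Return (name, start, end) for each stage, run strictly in order.

    Each stage starts at the moment the previous timer elapses,
    mimicking one timer triggering the next through a relay.
    """
    schedule = []
    clock = start_s
    for stage in stages:
        end = clock + stage.duration_s
        schedule.append((stage.name, clock, end))
        clock = end  # the elapsed timer hands control to the next one
    return schedule


# Illustrative stage names and durations; not taken from the Sequenator.
program = [
    Stage("deliver reagent", 30.0),
    Stage("react", 120.0),
    Stage("dry", 60.0),
]

for name, start, end in run_program(program):
    print(f"{name}: {start:.0f}s -> {end:.0f}s")
```

A real 30-channel program would simply be a longer list; the point is that no stage needs to know anything except when its predecessor finished.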
Counting and timing were controlled by three Philips Electrical decade counters, which received their signals from a pulse generator. The programming unit also provided manual controls to reset the impulse counters and program stages, reset the stage counter, select a stage, interrupt the pulse generator, and interrupt the output commands.
The time pulse generator was driven by a miniature lightbulb that shone onto a photosensitive diode, with the light being interrupted by a rotating disc with diametrically opposed holes. This produced 20 pulses per minute.
Image courtesy of St. Vincent's School of Medical Research.
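To put the timing in perspective, 20 pulses per minute means one pulse every 3 seconds. The sketch below works through that arithmetic and shows how pulses could be tallied by decade counters. It assumes the three counters were cascaded as units, tens, and hundreds, which the description above does not confirm; `DecadeCounter` and `count_pulses` are hypothetical names.

```python
PULSES_PER_MINUTE = 20                       # figure quoted in the article
SECONDS_PER_PULSE = 60 / PULSES_PER_MINUTE   # 3.0 s between pulses


class DecadeCounter:
    """Counts 0-9 and reports a carry when it wraps back to 0."""

    def __init__(self):
        self.digit = 0

    def pulse(self):
        self.digit = (self.digit + 1) % 10
        return self.digit == 0  # True means carry into the next decade


def count_pulses(n_pulses, n_decades=3):
    """Feed n_pulses into a cascade of decade counters (assumed wiring)."""
    decades = [DecadeCounter() for _ in range(n_decades)]
    for _ in range(n_pulses):
        for counter in decades:      # units first, then tens, then hundreds
            if not counter.pulse():  # stop rippling when there is no carry
                break
    # Most significant digit first.
    return [c.digit for c in reversed(decades)]


digits = count_pulses(125)
elapsed_s = 125 * SECONDS_PER_PULSE
print(digits, elapsed_s)  # [1, 2, 5] 375.0
```

Under that cascade assumption, three decades top out at 999 pulses, or 999 × 3 s = 2,997 s, just under 50 minutes before the count wraps around.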
It’s very old school, but it’s striking that such complex problems could be solved with relatively rudimentary technology.
Exploiting Parallelism with Custom Processors
A team in Italy developed a specialized processor to solve the Protein Similarity Problem: given a target peptide, search a protein for substrings that are similar to it. The team wanted to explore whether highly parallel hardware could speed up existing solutions, and the result was the Processor for PROtein Similarity DIScovery (PROSIDIS).
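For readers unfamiliar with the problem, here is a minimal software sketch of what such a search involves. It slides the target along the sequence and counts mismatched residues in each window. Counting mismatches is a stand-in chosen for simplicity; the similarity measure PROSIDIS actually uses is not described here, and `find_similar`, its threshold parameter, and the toy sequences are made up for illustration.

```python
def find_similar(proteome, peptide, max_mismatches):
    """Return (offset, mismatches) for each window within the threshold."""
    m = len(peptide)
    hits = []
    for offset in range(len(proteome) - m + 1):
        window = proteome[offset:offset + m]
        # Count positions where the window and the target differ.
        mismatches = sum(a != b for a, b in zip(window, peptide))
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits


# Toy sequences made up for illustration.
proteome = "MKTAYIAKQRQISFVKSHFSRQ"
peptide = "AKQRQ"
print(find_similar(proteome, peptide, max_mismatches=1))
```

Every window is independent of the others, and every window repeats the same comparisons against the same target. That regularity is what makes the problem a natural fit for parallel hardware.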
This was achieved with a tool called the Parallel Hardware Generator (PHG), which automatically generates parallel hardware designs. The high-level specification of the protein similarity problem is given to the PHG, which works out how to turn it into a parallel hardware solution and outputs VHDL code that can be synthesized onto an FPGA. PHG does this using systolic arrays, which project the computation domain into time-processor space.
Image courtesy of IPITEC.
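The phrase "time-processor space" is easier to picture with a small simulation. In the sketch below, the computation domain is the grid of comparisons (i, j), where window i of the sequence is compared against residue j of the target. One possible projection, assumed here for illustration and not necessarily the one PHG derives, assigns cell (i, j) to processing element j at time step i + j, so partial results flow from one element to the next on each step. The function name is hypothetical, and for simplicity each incoming residue is shown to every element at once, which a strictly systolic design would also pipeline.

```python
def systolic_mismatch_counts(proteome, peptide):
    """Simulate a linear array with one processing element (PE) per target residue.

    Cell (i, j) of the computation domain compares proteome[i + j] with
    peptide[j]. The assumed mapping places it on PE j at time step i + j.
    """
    n, m = len(proteome), len(peptide)
    windows = n - m + 1
    partial = [None] * m      # partial[j]: (window, count) leaving PE j
    results = [0] * windows
    for t in range(n):        # time steps 0 .. n-1
        residue = proteome[t]  # the residue every active PE compares now
        # Update PEs from last to first so each one reads its neighbour's
        # value from the previous time step.
        for j in range(m - 1, -1, -1):
            i = t - j          # window handled by PE j at this step
            if not 0 <= i < windows:
                partial[j] = None  # PE j is idle at this step
                continue
            incoming = 0 if j == 0 else partial[j - 1][1]
            count = incoming + (residue != peptide[j])
            partial[j] = (i, count)
            if j == m - 1:
                results[i] = count  # window i has passed through every PE
    return results


print(systolic_mismatch_counts("ABCABD", "ABD"))  # [1, 3, 3, 0]
```

A sequential program performs (n - m + 1) × m comparisons one after another, while this arrangement finishes in n time steps using m elements working side by side.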
The team tested the VHDL output on a system consisting of a 550 MHz Pentium III processor and a Xilinx XV1000 FPGA, connected through an RC1000-PP prototyping board. The test problem was a proteome of length n = 2,096,000 and a peptide of length m = 24.
The results showed that the PROSIDIS processor developed with PHG achieved speedups ranging from 5.6 to 55.6 over commercial off-the-shelf (COTS) processor solutions.
This paper shows the overlap of high-performance computing and bioinformatics and highlights why high-performance computing is such an important area of research for many different domains.
Training Future Bioinformatics Researchers
The 4273π Raspberry Pi operating system was developed at the University of St Andrews in Scotland to address a problem in biology undergraduate programs: students were usually not prepared to manage the computer systems they would need for bioinformatics research.
There are many ways to give students access to computer systems, such as SSH access to a server or virtual machine images on their own personal computers, but none of these taught them the actual systems administration of these machines, which were mostly Linux based. On top of that, principal investigators in bioinformatics research usually weren’t systems administration experts themselves.
Thus 4273π was born: a customized version of Linux, based on the Raspbian operating system, that runs on an inexpensive Raspberry Pi. Each student in the senior-level bioinformatics course was given a Raspberry Pi and some basic hardware on loan so that they could carry out assignments at school or at home. They were able to access databases, install and manage the required packages, and administer their own systems, learning the concepts of bioinformatics and systems administration simultaneously.
Screenshot of 4273π. Image courtesy of 4273π.
4273π is openly available online, so it can be used by other educational institutions or by hobbyists at home. It’s a forward-thinking solution that recognizes the inherent disciplinary overlap in bioinformatics and the areas in which future researchers would benefit from being knowledgeable.
Feature image courtesy of the David H. Murdock Research Institute.