Saturday, 16 March 2013

Quick Thought: The challenge of bioengineering and bioinformatics

I have drafted a retrospective post on how I implemented rudimentary gene-finding functions in Python, using helper modules provided by my university (you can see the code on my GitHub, link above), but there's still a lot of work to do on the post. The assessment for Workshop 1 is now available and takes priority, so that post will be delayed, but I wanted to quickly share a thought that struck me the other day.

Bioengineering and bioinformatics are challenging because the specification for the programming language has already been determined, but the documentation is incomplete. The standard libraries for this programming language are polymorphic (mutations) and are constantly changing. Worse still, the behaviour of these libraries is context sensitive: their expected input, output and performance change with the environment they're in.

Could you imagine a software engineer using C++ with an incomplete reference manual, using standard libraries that are constantly changing and whose behaviour changes based on the OS you're programming in?

Welcome to bioengineering and bioinformatics.

Can you think of a better analogy, or of additions to mine?

Tuesday, 12 March 2013

Jay Keasling on Synthetic Biology and the National Bioeconomy Blueprint

During the daily trawl of my feeds I came across this article by Jay Keasling. It's a beginners' guide to introduce synthetic biology to the general public and highlights some of the key concepts and applications that excite me the most.

One of the most valuable nuggets of information in this article is a link to the National Bioeconomy Blueprint, which I'd never heard of before. It's a 43-page report released in May 2012 by the Office of Science and Technology Policy for the US government. The PDF report goes into further detail, covering R&D funding, regulation and policy making, training and education, and collaboration between research and industry. The report says that the Bioeconomy will have the biggest effects on health, energy, agriculture, environment, and sharing (i.e. precompetitive collaborations as a necessity to drive innovation). Bioinformatics and even "big data" get a mention, with synthetic biology, proteomics and information technologies flagged as foundational technologies.

As a broad overview, 5 strategic imperatives for the Bioeconomy are:
  1. Support R&D investments that will provide the foundation for the future bioeconomy.
  2. Facilitate the transition of bioinventions from research lab to market, including an increased focus on translational and regulatory sciences.
  3. Develop and reform regulations to reduce barriers, increase the speed and predictability of regulatory processes, and reduce costs while protecting human and environmental health.
  4. Update training programs and align academic institution incentives with student training for national workforce needs.
  5. Identify and support opportunities for the development of public-private partnerships and precompetitive collaborations—where competitors pool resources, knowledge, and expertise to learn from successes and failures.


I particularly enjoyed sections that spoke about synthetic biology. This quote summarises it well.
"Synthetic biology, the design and wholesale construction of new biological parts and systems, and the re-design of existing, natural biological systems for tailored purposes, integrates engineering and computer-assisted design approaches with biological research. Since natural biological systems are so complicated, a primary focus of synthetic biologists is developing technologies that make the engineering of biology easier, faster, and more predictable. This ability to quickly engineer organisms in laboratories holds vast potential for the bioeconomy, as engineered organisms could dramatically transform modern practices in high-impact fields such as agriculture, manufacturing, energy generation, and medicine."
Jay Keasling's article goes on to quote some of his colleagues and leaders in the field. Pam Silver notes,
"The field is poised to explode, both in terms of what scientists can accomplish and what the public realizes is possible." 
The article also highlights some of the dangers that synthetic biology enables. Laurie Zoloth makes the analogy,
"Synthetic biology is like iron: You can make sewing needles and you can make spears. Of course, there is going to be dual use." 
On the topic of the dangers of synthetic biology, Keasling also says in the article,
"In addition to discussing approaches to risk and risk assessment, synthetic biologists are also working hard to minimize potential adverse effects. For example, Silver’s lab is working to create genetic self-destruct traits, termed "auto-delete", as a way to ensure that genetically modified organisms don’t escape into the environment." 
I can't help but think of a time in the near future when the Bioeconomy will also include biosecurity companies, much like Symantec and Kaspersky, which protect our computers from digital viruses today. These biosecurity companies of the future would offer services to help prevent and mitigate damage to ourselves and other biological assets under attack from rogue organisms engineered by biohackers for espionage and sabotage (and perhaps even assassination?).

Sunday, 10 March 2013

First steps - Dictionaries and enumerate()

The second week of semester 1 has just finished and, in the practicals, we're starting to scratch the surface of the fundamentals of using Python for bioinformatics. I have a practical every Friday afternoon, which is well timed for me to write a post here once or twice a week at the weekend, reviewing content from the latest practical or looking ahead to content in the following week.

The first week took students through getting Python running and using IDLE interactively to execute simple commands to understand the basics. Like any course that introduces programming, it covered topics on built-in data types (strings, numbers), variables, flow control (if, else, loops), data structures (lists, tuples, dictionaries, sets), input/output, modules etc. The reference documentation and tutorials on python.org are an excellent source for this introductory content.

We created a mysequence.py module which implements a MySequence class. The sequence can be made up from any text string, but the class requires an alphabet parameter that we can use to specify it as DNA, RNA or a protein sequence of amino acids. The alphabet was provided as a list, but one of the most important data structures that I learnt about for working with alphabets is the dictionary.
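
A minimal sketch of pairing an alphabet list with a dictionary might look like the following. The names DNAAlpha and dnaDict come from this post, but the values stored in the dictionary (the full base names) and the `names` list are assumptions made purely for illustration, not the contents of the original mysequence.py.

```python
# Alphabet as a list: the four DNA characters.
DNAAlpha = ['A', 'C', 'G', 'T']

# Dictionary of key:value pairs; every key must be unique.
# The values here (base names) are an illustrative assumption.
dnaDict = {
    'A': 'adenine',
    'C': 'cytosine',
    'G': 'guanine',
    'T': 'thymine',
}

# Walk the alphabet and use each character as a key into the dictionary.
names = []  # hypothetical helper list, collected only for inspection
for base in DNAAlpha:
    names.append(dnaDict[base])
    print("{} -> {}".format(base, dnaDict[base]))
```

Looking a character up with dnaDict[base] raises a KeyError if the character is not one of the keys, which is one simple way to notice a symbol that does not belong to the alphabet.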


In the example above we create a list called DNAAlpha with characters A, C, G, and T as the values. Then we create a dictionary called dnaDict, which is like an unordered set of key:value pairs where all the keys must be unique. We can then iterate through the characters in the alphabet list, use each character as a key into the dictionary, and reference the value stored under that key.

One thing that tripped me up was iterating through the list with the for loop. My first version looked like this.
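
The loop can be sketched roughly as follows. This is a reconstruction from the explanation in the next paragraph rather than a copy of the original file, and the try/except and the `error_message` variable are additions so that the snippet runs to completion and shows the error instead of stopping.

```python
DNAAlpha = ['A', 'C', 'G', 'T']

error_message = None  # hypothetical variable, kept only to show the failure
try:
    for idx in DNAAlpha:
        # idx holds the character itself ('A' on the first pass),
        # so this is really DNAAlpha['A'] and raises a TypeError.
        print(DNAAlpha[idx])
except TypeError as err:
    error_message = str(err)
    print("TypeError: {}".format(error_message))
```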


The error occurs because I assumed that for idx in DNAAlpha loops over the list indices. (Note that idx is an arbitrary variable name I created for the loop, not a keyword.) In fact, idx is assigned the value at each position, not the position itself. So in the first iteration of the loop, DNAAlpha[idx] is really DNAAlpha['A'], which Python rejects because list indices must be integers. A quick Google search turned up a Stack Overflow thread with the answer: enumerate() yields each index together with its value as you loop through the list.
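
A minimal sketch of the enumerate() approach, assuming the same four-letter DNAAlpha list described in this post. The `pairs` list is a hypothetical helper added only to make the result easy to inspect.

```python
DNAAlpha = ['A', 'C', 'G', 'T']

pairs = []  # hypothetical helper: collects (index, value) tuples
for idx, base in enumerate(DNAAlpha):
    # enumerate() hands back the integer position and the value together,
    # so DNAAlpha[idx] is now a valid lookup and equals base.
    pairs.append((idx, base))
    print("{} {} {}".format(idx, base, DNAAlpha[idx]))
```

If only the values are needed, a plain for base in DNAAlpha loop is enough; enumerate() is worth reaching for when the position matters as well.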

Please look at the full source code for mysequence.py by following the GitHub link in the navigation above.

My next post will look into the functions I wrote to find genes in a DNA sequence.