Searching for a needle in a genetic haystack

In the last year, a brand new technique for genome editing has appeared with the potential to revolutionise the way in which scientists engineer genomes. It provides the ability to make cuts in the genome at precisely controlled locations, resulting in the silencing of that particular region. This technique is known as CRISPR. To make such a precise cut, however, requires the active protein to find a unique 20 base pair region within the vast sequence space of the entire genome.

“Seek and Destroy” – Nature issue 7490 (image from

The CRISPR system originates from the bacterial immune system, in which a segment of the bacterium’s genome stores short elements of viral DNA (spacers; the matching sequence in the invader is called the protospacer), which act as guides to attack invading viruses by disrupting their genetic code [1]. This segment (known as the CRISPR array) sits alongside genes for so-called ‘Cas’ proteins, which are capable of cleaving DNA and are guided into place by the spacers stored in the array. The bacteria are able to add new stretches of foreign DNA to this array to develop immunity in future encounters with the invader.

The genome editing technology [2,3] takes this CRISPR array structure and replaces the viral DNA sections with sequences matching the regions of the target genome that are intended to be silenced. This construct can then be inserted into the target genome by standard methods and, when it is transcribed, the resulting guide RNA leads the Cas protein to the point in the genome with precise sequence complementarity to the guide. However, despite this technique’s success, the mechanism enabling the Cas-RNA complex to find its target was unknown, until now.

In this week’s Nature cover article [4], Sternberg et al. show that a particular Cas protein, Cas9, can only bind to regions containing a three-nucleotide motif, known as a PAM (protospacer adjacent motif), which is found adjacent to protospacers. When searching for a needle in a haystack, you would not pick up every piece of hay, compare it to a picture of a needle, and then replace it when it didn’t match. Similarly, Cas9 does not try every point on the genome and compare the local sequence to its guide. It first limits the number of places to search by only binding to PAMs. This could be likened to the first step in searching for a matching phone number: you only compare the numbers with the same area code.

Of course, there are still a large number of PAM sequences in a given genome (or else the technique would be somewhat limited), but restricting the search to them vastly simplifies the problem. Furthermore, the PAM acts as the start point for the guide/genome sequence comparison, which proceeds one nucleotide at a time in sequence. The process is such that if the first 2-3 nucleotides do not match, then the complex rapidly disengages from that region and can move on to try a different location. These factors combined ensure that the complex spends as little time as possible at incorrect locations.
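The two-step search described above can be sketched as a toy program. This is only an illustration under stated assumptions, not the method used in the paper: the PAM is taken to be the pattern "NGG" (any base followed by two Gs), the genome is a single strand written as a string, the guide sits immediately before the PAM, and the comparison gives up at the very first mismatch. The names `is_pam` and `search` are made up for this example.

```python
def is_pam(site):
    # Assumed PAM pattern "NGG": any base followed by two Gs.
    return len(site) == 3 and site[1:] == "GG"


def search(genome, guide):
    """Return (hits, comparisons): start positions of matches and bases compared."""
    hits = []
    comparisons = 0
    n = len(guide)
    for pam_start in range(n, len(genome) - 2):
        if not is_pam(genome[pam_start:pam_start + 3]):
            continue  # no PAM here, so the neighbouring sequence is never inspected
        candidate = genome[pam_start - n:pam_start]
        matched = True
        # Compare one base at a time, starting with the base next to the PAM.
        for offset in range(1, n + 1):
            comparisons += 1
            if candidate[-offset] != guide[-offset]:
                matched = False
                break  # toy simplification: disengage at the first mismatch
        if matched:
            hits.append(pam_start - n)
    return hits, comparisons


guide = "ACGTACGTACGTACGTACGT"                      # hypothetical 20-base guide
genome = "TTTT" + guide + "TGG" + "CCCC" + "A" * 20 + "AGG" + "TT"
hits, comparisons = search(genome, guide)
print(hits, comparisons)
```

In this made-up genome there are only two PAMs, so only two sites are examined at all, and the wrong one is abandoned after a single comparison.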

This work is an impressive demonstration of how biological systems are able to solve difficult problems. An invading virus needs to be dealt with quickly if a bacterium is to survive. The CRISPR system therefore needed a rapid way of finding a target within a large search space. It achieved this by utilising a regular motif present within the target genome, choosing to only incorporate stretches of DNA adjacent to these motifs. It seems the trick to finding a needle in a haystack is to choose a smaller haystack to lose it in.


  1. Wiedenheft, B., Sternberg, S. H. & Doudna, J. A. RNA-guided genetic silencing systems in bacteria and archaea. Nature 482, 331-338 (2012).
  2. Mali, P. et al. RNA-guided human genome editing via Cas9. Science 339, 823-826 (2013).
  3. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013).
  4. Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62-67 (2014).

By Matthew Evans – a second-year PhD student in the lab of Dr Richard Morris


All with a little help from my friends: the wonders of bird flight formation

Birds flying in a typical V formation

The V formations adopted by migrating birds are well known to anyone looking skyward at the end of summer. It is a formation regularly adopted by military and civilian aircraft for its energy saving benefits. Fully understanding how these formations work in nature has intrigued scientists for a long time. A new study into the biomechanics of how this works has made it onto the cover of this month’s Nature magazine – and it turns out that making use of these benefits is much harder for birds than for aircraft!

The way that wings generate lift is all to do with how they interact with the air they move through. Air moves faster over the top surface of the wing than it does over the bottom surface, creating a net circular flow relative to the stationary air the wing is moving through. These circular vortices are shed at the tips of the wing, forming a tube of spinning air that extends back from each wing tip (see the image below). The air on the inside of this tube (directly behind the wing) is moving down, a so-called downwash. The outside edge of the tube is formed of upwards-moving air, creating an upwash. By carefully positioning a wingtip in this upwash region, a following bird or plane is able to ride this upward-moving air, thereby reducing the amount of energy needed to fly.

Circular tip vortex generated by a plane’s wing visualized using coloured smoke. The smoke on the right hand side is being pulled away from the ground by the upwash while smoke on the left is pushed down towards the ground by the downwash.

A skilled pilot can make use of this to save fuel on long flights. Birds, however, have another obstacle to overcome: the flapping of their wings. As the lead bird flaps, the tip vortex oscillates up and down. To make optimal use of the tip vortex, a following bird would have to move its wings with a precise phase shift so that its wing tip follows the up and down motion of the tip vortex. Several theoretical studies have predicted these requirements but until now it had not been possible to test the plausibility of this on birds in flight.
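To get a feel for what such a phase shift means, here is a minimal sketch under a deliberately simple assumption: the up-and-down pattern left by the leader's wing tip stays where it was made while the birds fly on, so it repeats along the flight path with a wavelength equal to flight speed divided by flap frequency. The function names and all the numbers are hypothetical and are not taken from the study.

```python
import math


def wake_wavelength(speed_m_s, flap_hz):
    # Distance flown during one complete wingbeat (assumed simple model).
    return speed_m_s / flap_hz


def ideal_phase_lag(distance_m, speed_m_s, flap_hz):
    """Phase lag in radians for a follower `distance_m` behind the leader."""
    wavelength = wake_wavelength(speed_m_s, flap_hz)
    return (2 * math.pi * distance_m / wavelength) % (2 * math.pi)


# Hypothetical numbers: 15 m/s flight speed, 5 wingbeats per second.
lag = ideal_phase_lag(distance_m=1.5, speed_m_s=15.0, flap_hz=5.0)
print(math.degrees(lag))
```

Under this model the required lag changes continuously as the follower drifts forwards or backwards, which is part of what makes the task harder for a flapping bird than for a fixed-wing aircraft.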

Portugal et al. [1] used specially designed data loggers that measured both the positions of birds in a flock and when they beat their wings. The important thing is that these data loggers had to be small (23 g) to avoid impacting on the birds’ flight, and that meant they had no means of transmitting the data to researchers. The researchers had to collect them. Obviously it would have been no good to put them on a random set of birds, only to watch them fly off into the sunset, with no idea of where they might end up!

Instead the researchers used a flock of northern bald ibises that had been part of a conservation programme, and had been taught by conservationists to follow certain migration routes. The team therefore knew where the birds would land to rest and could pick up their data loggers there.

Amazingly, the data collected showed that the birds were able to position themselves and to flap their wings with almost precisely the correct phase shift to make best use of the tip vortex of the bird in front. This ability was far beyond the expectations of the researchers. Even more impressively, the birds were able to adapt their flapping when changing position within the V. This often requires them to pass behind another bird, into the downwash generated by the bird in front. In order to minimise the effect of this, the trailing bird flies in direct anti-phase with the leader as it passes behind it, returning to the optimal phase shift once it can make use of the tip vortex again.

Such dynamic changes in flapping to maintain efficient energy usage demonstrate an incredible adaptability and raise many questions. How are the birds able to find these optimal flight patterns? Is it instinctive behaviour or do the birds learn to fly this way because it feels easier? How much benefit do these techniques have on the birds themselves? Hopefully we can look at those beautiful V formations with a whole new sense of wonder this year.

1. Portugal, S. J., Hubel, T. Y., Fritz, J., Heese, S., Trobe, D., Voelkl, B., Hailes, S., Wilson, A. M. & Usherwood, J. R. Upwash exploitation and downwash avoidance by flap phasing in ibis formation flight. Nature 505, 399-402 (2014).

All images from Wikimedia Commons

By Matthew Evans – a second-year PhD student in the lab of Dr Richard Morris

RNA or protein ligands for metal co-factors in the spliceosome? Case closed … sort of

During the course of the meetings which occurred prior to this blog’s inception, it was agreed that one feature we’d like to regularly publish would be ‘digested versions’ of important or interesting research papers. We’ve only had one so far and, given that this is a blog written by doctoral students at a scientific research institute, I’d say it should be quite high on our priorities list. There are also going to be some problems pitching this – we’re largely a plant science institute with a few groups working on slightly different problems, but what about our readership? At this stage, it won’t extend much further than those who follow the JIC on Twitter, so mainly plant scientists and microbiologists. So is it our goal with this blog to preach to the converted or to make a fist of widening the net? I’d say we should do our best to attract readers from all branches of science and the wider public.

So, with that in mind, I’ve decided to summarise a paper I read recently which really couldn’t get more general and, well, important within the field of molecular biology as a whole. It’s a paper which aims to dissect, biochemically, one of the central questions at the heart of what many call our ‘central dogma’ of the flow of genetic information from DNA to RNA to Protein.

Biochemistry: Metal ghosts in the splicing machine

So, for the benefit of our non-molecular biologist readers, present or future, allow me to elaborate on this: your body relies on proteins for everything, from the catalysis of chemical reactions, to responding to external stimuli, right through to forming the structural basis of connective tissues like bone and cartilage, as well as holding together our cells. But how are these proteins manufactured? Well, I have little doubt that you’ve heard of deoxyribonucleic acid (DNA). DNA is the structure within our cells which carries the instructions for making proteins. DNA is transcribed into ribonucleic acid (RNA) and the RNA acts as a messenger molecule for producing the protein.

There are several types of RNA at play here, the most important being mRNA, tRNA, rRNA and snRNA. mRNA stands for ‘messenger’ RNA, which passes information from one place to another. tRNA is ‘transfer’ RNA; this molecule acts as a physical link between the genetic code and the amino acids which are the building blocks of proteins. rRNA is ‘ribosomal’ RNA and its job is to form the molecular machinery (known as a ribosome) responsible for linking the amino acids together into chains called ‘peptides’.

This is pretty easy to conceptualise. We’ll take a factory as an analogy. An order comes in to the factory, which we’ll think of as a letter or e-mail containing a blueprint of the design the customer requires; this is represented by our mRNA. The workers, or tRNAs as we’ll call them, must put the right pieces in the right order to make the product, but simply putting them next to each other won’t do anything. For example, metal parts may need welding together, which requires machinery or, in our case, the ribosome, which is composed of rRNA.

You’ll note that I’ve missed out one of the RNAs I mentioned earlier: snRNA, which stands for small nuclear RNA. The primary role of snRNA is to process the mRNA. The code required to make a protein is not encoded by DNA in a linear fashion; the regions which mean something (exons) are interspersed with regions of code which do not go on to make a protein (introns) and something must be done about this. Well, snRNAs, in complex with a large and highly heterogeneous set of proteins, form the spliceosome, which ‘splices’ the pre-mRNA: it cuts out the introns and ligates (joins) the exons back together to form a mature mRNA. So to go back to our analogy, imagine the initial plans were drawn up by the work experience kid and before they’re sent out, the boss has to go over them and cut bits out. That’s the role of the snRNA-containing spliceosome.
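For readers who think in code, the cutting and joining described above can be sketched in a few lines. This is a toy under stated assumptions: the sequence is invented, and the exon positions are simply handed to the function, whereas the real spliceosome has to recognise the intron boundaries from the sequence itself. The name `splice` is made up for this example.

```python
def splice(pre_mrna, exons):
    """Join the exon regions of a pre-mRNA, discarding everything in between.

    `exons` is a list of (start, end) positions, with `end` not included.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Invented pre-mRNA: exon, intron, exon.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "UUUUAA"
pre_mrna = exon_1 + intron + exon_2

mature_mrna = splice(pre_mrna, [(0, 6), (18, 24)])
print(mature_mrna)
```

The intron never makes it into the mature message; only the two exons, now directly joined, are passed on to the ribosome.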

So, onto the paper I read recently which caught my interest, entitled ‘RNA catalyses nuclear pre-mRNA splicing’ [1]. The researchers from the University of Chicago set out to determine which part of the aforementioned large, complex metalloenzyme known as the spliceosome was responsible for catalysis of the splicing reaction. For those unsure of what a metalloenzyme is, it is an enzyme that requires metal co-factors (magnesium ions, in this case) in order to carry out its function. It was hitherto unknown whether the active site ligands for these magnesium ions were provided by the snRNA or by one of the many proteins which make up the rest of the complex.

However, using a technique known as a metal specificity switch, the researchers could pinpoint a group of phosphate oxygens which appear to be crucial for Mg2+ binding and subsequent catalysis of splicing. Single oxygen atoms in U6 snRNA were replaced, one at a time, with sulphur (magnesium does not bind particularly well to sulphur relative to oxygen) and the resulting spliceosomes were tested for splicing efficiency. Substitution at five of the positions appeared to decrease efficiency. In addition to this, the spliceosomes containing sulphur-substituted snRNA were tested in the presence of metals known to bind sulphur more strongly (e.g. manganese and cadmium). It was found that these sulphur-binding metals were able to re-establish splicing efficiency where oxygen had been replaced with sulphur. These findings tell us that snRNA does indeed provide ligands for the catalytically crucial metal ions involved in the processing of pre-mRNA and that this is dependent on the binding specificity of the metal co-factors. Further support for this comes from the spatial similarity of the metal ion binding sites of U6 snRNA to those found in group II introns, a class of intron which is able to self-splice without any input from the spliceosome but through exactly the same two phosphoryl-transfer reactions normally catalysed by the spliceosome [2].

However, none of this actually excludes the possibility that one or more of the proteins involved in the spliceosome also provides ligands for metal co-factors. The same approach used on U6 snRNA was employed to assess the potential of Prp8 to provide ligands for the co-factors and no impaired splicing was observed. Prp8 had previously been imagined to be catalytically important for the spliceosome. What of other proteins then? Well, as previously stated, the spliceosome is a very heterogeneous and dynamic complex with proteins arriving and departing throughout the splicing reaction so de-convoluting each protein’s role in the reaction appears to be a daunting process. Crystal structures of the spliceosome at each stage of the splicing reaction, enabling observation of proteins close to the active site and their likelihood of binding metal co-factors, would certainly speed up the process. However, expressing and crystallising such a large, dynamic protein complex presents an enormous set of challenges in itself.

For now, however, it seems more likely that most of the proteins are simply regulators of the process or acting as chaperones, holding U6 snRNA in the correct conformation to catalyse the reaction and that the catalytically important part of the complex is, indeed, snRNA.

[1] Fica, S. M. et al. Nature 503, 229-234, (2013).

[2] Marcia, M. & Pyle, A. M. Cell 151, 497-507, (2012).

By Ben Hall – a second year PhD student in Mark Banfield’s group.