Searching for a needle in a genetic haystack

In the last year, a brand new technique for genome editing has appeared with the potential to revolutionise the way in which scientists engineer genomes. It makes cuts in the genome at precisely controlled locations, which silences that particular region. This technique is known as CRISPR. Making such a precise cut, however, requires the active protein to find a unique 20 base pair region within the vast sequence space of the entire genome.

“Seek and Destroy” – Nature issue 7490 (image from Nature.com)

The CRISPR system originates from the bacterial immune system, in which a segment of the bacterium’s genome stores short elements of viral DNA (spacers; the matching sequences in the invader are called protospacers), which act as guides to attack invading viruses by disrupting their genetic code [1]. Alongside this segment (known as the CRISPR array), the genome encodes so-called ‘Cas’ proteins capable of cleaving segments of DNA, which are guided into place by RNA transcribed from the spacers stored in the array. The bacteria are able to add new stretches of foreign DNA to this array to develop immunity in future encounters with the invader.

The genome editing technology [2,3] takes this CRISPR array structure and replaces the viral DNA sections with sequences from the target genome that are intended to be silenced. This can then be inserted into the target genome by standard methods and, when it is transcribed, the resulting guide RNA leads the Cas protein to the point in the genome with precise sequence complementarity to the guide. However, despite this technique’s success, the mechanism enabling the Cas-RNA complex to find its target was unknown until now.

In this week’s Nature cover article [4], Sternberg et al. show that a particular Cas protein, Cas9, can only bind to regions containing a three-nucleotide motif, known as a protospacer adjacent motif (PAM), that is found next to protospacers. When searching for a needle in a haystack, you would not pick up every piece of hay, compare it to a picture of a needle, and then replace it when it didn’t match. Similarly, Cas9 does not try every point on the genome and compare the local sequence to its guide. It first limits the number of places to search by only binding to PAMs. This could be likened to the first step in searching for a matching phone number: you only compare the numbers with the same area code.
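To get a feel for how much the PAM step narrows the search, here is a minimal Python sketch. It assumes the motif is NGG (any nucleotide followed by two guanines), which is the PAM recognised by the commonly used Streptococcus pyogenes Cas9; the article above does not name the motif. The function name and the toy sequence are made up for illustration, and only one strand is scanned.

```python
import re

def pam_sites(genome: str) -> list:
    """Return the start index of every NGG motif in the sequence.

    Assumption: the PAM is NGG and only the given strand is scanned.
    The lookahead lets overlapping motifs (e.g. in "GGG") all be reported.
    """
    return [m.start() for m in re.finditer(r"(?=[ACGT]GG)", genome)]

# Hypothetical toy sequence, far shorter than any real genome
genome = "ATGCGGTACCAGGTTTACGATCCGGA"
sites = pam_sites(genome)
print(f"{len(genome)} positions, {len(sites)} PAM sites: {sites}")
```

On this toy sequence only 3 of the 26 positions carry a PAM, so only those 3 would ever be compared with the guide.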

Of course, there are still a large number of PAM sequences in a given genome (or else the technique would be somewhat limited), but restricting the search to them vastly simplifies the problem. Furthermore, the PAM acts as the start point for the guide/genome sequence comparison, which proceeds one nucleotide at a time. If the first 2-3 nucleotides do not match, the complex rapidly disengages from that region and can move on to try a different location. These factors combined ensure that the complex spends as little time as possible at incorrect locations.
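The two ideas, checking only next to PAMs and giving up at the first mismatch, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of the real biochemistry: the PAM is taken to be NGG lying immediately after the target on the same strand, the guide must match exactly, and only one strand is searched. The function name find_target and the example sequences are hypothetical.

```python
from typing import Optional, Tuple

def find_target(genome: str, guide: str) -> Tuple[Optional[int], int]:
    """Return (start index of the matching site or None, comparisons made).

    Assumptions: the PAM is NGG directly after the target, the match must
    be exact, and a site is abandoned at the first mismatching nucleotide.
    """
    n = len(guide)
    comparisons = 0
    for pam in range(n, len(genome) - 2):
        if genome[pam + 1 : pam + 3] != "GG":
            continue  # no PAM here, so this site is never interrogated
        site = genome[pam - n : pam]
        # Walk away from the PAM one nucleotide at a time
        for offset in range(1, n + 1):
            comparisons += 1
            if site[-offset] != guide[-offset]:
                break  # early mismatch: disengage and move on
        else:
            return pam - n, comparisons
    return None, comparisons

# Hypothetical toy example with a 5-nucleotide guide (real guides are 20)
genome = "TTACGAGGCATTACGTTGGAA"
print(find_target(genome, "TACGT"))  # (11, 6)
```

Here the first PAM site is rejected after a single comparison and the second matches in five, so the target is found after 6 nucleotide comparisons in total.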

This work is an impressive demonstration of how biological systems are able to solve difficult problems. An invading virus needs to be dealt with quickly if a bacterium is to survive. The CRISPR system therefore needed to find a rapid way of finding a target within a large search space. It achieved this by utilising a regular motif present within the target genome, choosing to only incorporate stretches of DNA adjacent to these motifs. It seems the trick to finding a needle in a haystack is to choose a smaller haystack to lose it in.

References

1. Wiedenheft, B., Sternberg, S. H. & Doudna, J. A. RNA-guided genetic silencing systems in bacteria and archaea. Nature 482, 331-338 (2012).
2. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013).
3. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013).
4. Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62-67 (2014).

By Matthew Evans, a second-year PhD student in the lab of Dr Richard Morris