Sunday, July 22, 2012

Palindromes in DNA

DNA strings are often millions of letters long. For example, the copy I have of the DNA of the Amycolatopsis marina, a bacteria discovered in 2009, consists of $6,503,724$ characters.

Bacteria are of course some of the smallest living beings. The DNA in the copy I have of the human chromosome 18 is an astounding $74,657,229$ characters long. Chromosome 18 is one of the 23 chromosomes of human beings, and is estimated to contain in between $300$ and $500$ genes. A gene is responsible for part of our functioning as human beings. An example of a gene contained within chromosome 18 is the NPC1 gene, named after the Niemann-Pick disease, type C1. This gene is named after the disease you get when you have an error on the gene. Experiments with mice show that this gene probably controls the appetite, and people with a mutation on the NPC1 gene often suffer from obesitas. Interestingly, the same mutation on NPC1 might make you immune for the ebola virus, one of the deadliest known viruses. So the mutation is partly a blessing in disguise.

If I search for the keyword palindrome in the electronic publications available at the library of my university, I get more than $500$ hits. The first ten of these hits are all about palindromes in DNA. My guess is that at least $90$% of the list of publications are about palindromes in DNA. So what is the interest in palindromes in DNA? Surely, if your strings only contain `A', `T', `C', and `G' you are likely to get many palindromes?

Let us look at how many palindromes we expect to find in the DNA of the Amycolatopsis marina. Suppose I want to find palindromes of length twelve. I calculate the chance that an arbitrary DNA string of length twelve is a palindrome as follows. The first six characters of the string don't matter. The seventh character needs to match with the sixth character, for which we have a chance of one in four. Remember that in DNA, `A' matches with `T' and vice versa, but both `A' and `T' do not match with themselves, or with `C' and `G'. The eighth character needs to match with the fifth character, for which we also have a chance of one in four. This goes on until the twelfth character, so I get a chance of \[ \frac{1}{4} \times \frac{1}{4} \times \frac{1}{4} \times \frac{1}{4} \times \frac{1}{4} \times \frac{1}{4} = (\frac{1}{4})^6 = \frac{1}{4^6} = \frac{1}{4096} \] Since the Amycolatopsis marina is $6,503,724$ characters long, there are $6,503,713$ substrings of length twelve. Multiplying this number with the chance that it is a palindrome, I expect to get $1,589$ palindromes. Using my algorithm for finding palindromes I get $1,784$ palindromes. This is slightly above $10$% more than expected, but that might be an accident. If I look at the palindromes of length fourteen in the $18$th human chromosome, using a similar calculation I expect to find $4,556$ palindromes. My algorithm finds $25,323$. More than five times as many palindromes as expected! This is not an accident anymore. Palindromes play a role in DNA. But what role?

Palindromes perform several tasks in human DNA. I will discuss one particularly intriguing task in this blog post.

Why do we humans have sex? I can answer this question from several perspectives. The most common answer will probably mention love, pleasure, or children. Looking at the question from a biological perspective, the answer discusses the biological advantages of having sex. Our children get their genes from two parents. They get two complete copies of human DNA, and merge these copies in some way to make them the way they are. In this merging process, damaged DNA from one parent can be replaced by functioning DNA from the other parent. There is a lot of DNA to repair: every day, each cell may be damaged at one million different places. Most of this damage is harmless, since a lot of DNA does not seem to have any function at all, but some kinds of damage may cause a lot of problems. Being able to repair non-functioning DNA when combining DNA from parents is essential for keeping the human race in a good state. The American geneticist Hermann Joseph Muller used this argument to explain why sexual reproduction is favored over asexual reproduction in organisms. When an organism reproduces asexually it passes all its DNA errors on to its offspring, without a possibility to repair them, eventually leading to the extinction of the population. The process has been dubbed Muller's ratchet, after the ratchet device, which can only turn in one direction. This is the theory, practice is slightly more complicated.

Muller's ratchet should already be at work in humans, since there is one human chromosome in which no combination of parental genes takes place: chromosome 23. Chromosome 23 determines whether we become a man or a woman. A man gets a copy of the chromosome called X from his mother and a copy called Y from his father. A woman gets an X from her mother and an X from her father. A woman can merge both X's and repair possible errors, but a man has two different copies of the chromosome, and has no possibility to combine, let alone repair, them. The Y is passed on from father to son, with no involvement of women. Muller's ratchet theory says the genes on the chromosome that make a man should deteriorate quickly, and men should soon become extinct.

There is no sign of men becoming extinct. Apparently there are some other mechanisms at work to repair DNA on the Y chromosome. If I cannot obtain a copy of some piece of DNA from my mother, maybe I can store copies of the DNA in the Y chromosome itself? If I maintain two copies, I can always check if a piece of DNA is correct by comparing it against the copy of this piece of DNA. This is the mechanism used by the Y chromosome, where the copies are stored as palindromes, with some noise in the middle of these palindromes. Such palindromes with gaps in the middle are often called inverted repeats in biology. The Y chromosome contains eight huge palindromes, the longest of which consists of almost three million characters. Around 25% of the Y chromosome consists of palindromes. The DNA in the palindromes carries genes for describing the male testes. So the mechanism by means of which men survive is called palindrome...

No comments:

Post a Comment