Tuesday, May 15, 2012

Palindromes in palindromes

In a previous blog post, I develop a quadratic time algorithm for finding the length of the maximal palindromes around all centers of a string. Although it is a lot faster than the naive algorithm that finds all palindromes in a string, it is still not very useful when trying to find palindromes in DNA, or in a text as long as the Bible.

To design an efficient algorithm for finding palindromes, I reuse as much information as possible when calculating the length of maximal palindromes around centers. In this blog post, I will explain the central idea that allows me to reuse previously calculated maximal palindromes.

In the naive algorithm for finding the maximal palindromes around all centers of a string, I start with calculating the maximal palindrome around the first, leftmost, center, then the palindrome around the next center on the first character, then the palindrome around the center in between the first two characters, and so on. I calculate the palindromes from left to right. In the illustration below I depict a string with the letters blackened out. But you may assume that it represents the string "yabadabadoo". In this string, p is the maximal palindrome around center b. For example, p might be "abadaba", with b equal to $9$. I assume p is the longest palindrome that reaches to its rightmost position, so that no palindrome around a center before b reaches there.

I calculate the maximal palindromes from left to right, so I have already calculated the maximal palindrome around center a, which appears within palindrome p. I call the maximal palindrome around center a q. In the "yabadabadoo" example a might be center $5$, with "aba" as maximal palindrome q around it. In the illustration q is completely within p, but it might also extend to before p. It cannot extend after p, since then there would be a longer palindrome that extends to the rightmost position of p. Center a' is the center I get when mirroring center a with respect to center b: b plus the difference between b and a. This would be $13$ ($=9+(9-5)$) in our example. What is the maximal palindrome around a'?

Since p is a palindrome, the string with the same length as q around a' is reverse q, provided q is completely within p. If q extends to before p, we cut it off at the start of p, and remove an equally long part at the end of q. Since q is a palindrome, reverse q equals q, and is hence also a palindrome. I will call this palindrome q' to distinguish it from q. Is it the maximal palindrome around a'?

To find out whether or not q' is the maximal palindrome around a', I determine whether or not the characters before and after q' are the same. If the maximal palindrome around center a does not extend to the start of p, I know that the characters around q are different. Since the same characters appear around the palindrome q', it is the maximal palindrome around center a'. In this case I don't even have to look at the characters before and after q': the fact that q does not extend to the start of p gives me enough information to determine that q' is maximal. If the maximal palindrome around center q does extend to the start of p, as in our example, where "abadaba" starts with "aba", or even before, I have to compare the character before q' with the character after p. If these are different, q' is the maximal palindrome around a', otherwise, I have to investigate further to find out the maximal palindrome around a'. I will do this in the blog post that discusses an efficient algorithm for finding the length of all maximal palindromes in a string. In our example, "aba" is not the maximal palindrome around center a', since the character 'd' appears both before and after q'

No comments:

Post a Comment