Stochastic 3d Protein Alignment
I've started working on a way to generate synthetic 1d alignments, in an attempt to
find the one that gives the best 3d alignment for a fixed number of atom pairs. In the examples below, I'm
aligning 1LYZ and 2LZM as in the 1993 Holm & Sander 'Dali' paper. In each example, random subsets of N ≥ 3 atoms were paired
between the two proteins, and the subset that generated the best 3d alignment is shown.
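The random search described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the figures: it assumes each protein is given as an (atoms x 3) NumPy coordinate array, that the 3d fit is scored by RMSD after an optimal rigid superposition (computed here with the Kabsch algorithm, which is an assumption about the scoring), and the names `kabsch_rmsd`, `random_pairing` and `best_random_alignment` are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between paired points P and Q (each N x 3) after optimal rigid superposition."""
    P = P - P.mean(axis=0)               # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                          # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])           # forbid mirror images
    R = Vt.T @ D @ U.T                   # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def random_pairing(n_a, n_b, n_pairs, rng):
    """Pick n_pairs distinct atoms in each protein; pair k is (rows[k], cols[k])."""
    rows = rng.choice(n_a, size=n_pairs, replace=False)
    cols = rng.choice(n_b, size=n_pairs, replace=False)
    return rows, cols

def best_random_alignment(A, B, n_pairs, n_trials, seed=0):
    """Try n_trials random pairings and keep the one with the lowest RMSD."""
    rng = np.random.default_rng(seed)
    best_err, best_rows, best_cols = np.inf, None, None
    for _ in range(n_trials):
        rows, cols = random_pairing(len(A), len(B), n_pairs, rng)
        err = kabsch_rmsd(A[rows], B[cols])
        if err < best_err:
            best_err, best_rows, best_cols = err, rows, cols
    return best_err, best_rows, best_cols
```

Drawing without replacement keeps at most one dot per row and column of the dot plot. With very few pairs there are many ways to find atoms that happen to superimpose well, which is consistent with the small RMSD values seen for small N.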
With N = 3 pairs, you can get
the 3d fit down to a very small RMSD value:
Even with 5 pairs, the best RMSD is below 1 Angstrom:
But, with 10 pairs, the 3d fit starts to become poorer:
With 25 pairs, worse still. Nevertheless, notice that there is structure in the
dot plot, and the 'best' synthetic 1d alignment tends to show clustering in nearby
atoms. Some of these clusters even fall near the Dali alignment:
In this case, the synthetic 1d alignment even gives a better 3d fit than Needleman-Wunsch, although it
uses slightly fewer pairs. This work is promising, but there appear to be two problems:
- Entropy. Based on the NW and Dali alignments, one might expect a good 1d/3d alignment
to consist of short line segments of nearby atoms, with gaps in between. These segments could be from bottom left to top right
for subsequences going in the same direction, or from top left to bottom right for antiparallel
subsequences. Purely random alignments are unlikely to converge on this
kind of coherent structure. What I might try is to start with a linear sequential alignment and perturb it away
from linearity, rather than the other way around, to see if it converges.
- Freezing. Since there can only be one dot per row and column, alignments with many pairs tend to get stuck
in a particular configuration with nowhere to go. I need to find a way to 'unfreeze' these configurations
and move them away from local minima so that the error can continue to drop.
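One conventional way to attack both problems is sketched below: start from a linear sequential alignment, perturb it with small moves that keep one dot per row and column, and occasionally accept a worse alignment (a Metropolis-style rule with a falling temperature) so that a frozen configuration can escape a local minimum. This is only an assumption about what could work, not a description of the method actually used here. The names `linear_alignment`, `perturb` and `anneal` are hypothetical, and `cost` stands for any function that scores a pairing, such as the RMSD of the 3d fit.

```python
import math
import random

def linear_alignment(n_a, n_b, n_pairs):
    """Evenly spaced diagonal start; requires n_pairs <= min(n_a, n_b)."""
    rows = [i * n_a // n_pairs for i in range(n_pairs)]
    cols = [i * n_b // n_pairs for i in range(n_pairs)]
    return rows, cols

def perturb(rows, cols, n_a, n_b, rng):
    """Return a copy with one small random change (possibly none),
    still with at most one dot per row and column."""
    rows, cols = list(rows), list(cols)
    k = rng.randrange(len(rows))
    move = rng.choice(("row", "col", "swap"))
    if move == "row":
        free = [r for r in range(n_a) if r not in rows]
        if free:
            rows[k] = rng.choice(free)
    elif move == "col":
        free = [c for c in range(n_b) if c not in cols]
        if free:
            cols[k] = rng.choice(free)
    else:
        # Swapping two partners works even when no row or column is free.
        j = rng.randrange(len(cols))
        cols[k], cols[j] = cols[j], cols[k]
    return rows, cols

def anneal(rows, cols, n_a, n_b, cost, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Metropolis search with a geometric cooling schedule (an assumed choice)."""
    rng = random.Random(seed)
    cur = cost(rows, cols)
    best = (cur, rows, cols)
    for s in range(steps):
        t = t_start * (t_end / t_start) ** (s / max(steps - 1, 1))
        new_rows, new_cols = perturb(rows, cols, n_a, n_b, rng)
        new = cost(new_rows, new_cols)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            rows, cols, cur = new_rows, new_cols, new
            if cur < best[0]:
                best = (cur, rows, cols)
    return best
```

The swap move is the part aimed at freezing: when nearly every row and column is occupied there are few free slots to move into, but two pairs can always exchange partners. The temperature scale would need tuning to the units of the cost, for example fractions of an Angstrom for an RMSD.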
I've solved these problems, and now when aligning on 100 atom pairs, the best I have done is an RMSD of 2.88 Angstroms. The 1d alignment shows
Another 3d alignment at about the same RMSD value shows a similar 1d pattern:
Other, slightly higher 3d values show 1d patterns that differ slightly from these but are similar to one another:
However, a better 3d value shows a very different 1d pattern:
For 77 atom pairs, the best I have done was 2.54 Angstroms. The Dali result in the H&S paper was 4.2 Angstroms, although the 1d alignment was
For 42 atom pairs, the best I have done was 1.86 Angstroms. The Dali result in the H&S paper was 2.2 Angstroms:
Although I am now getting good alignments, the biological significance of the results is questionable:
- Why do 3d alignments with similar RMSD values result from very different 1d alignments? Do these represent equally
likely conformational geometries of actual molecules (e.g. enzymes, prions)?
- Although there is clustering in the 1d alignments, the clusters are generally not linear, contiguous subsequences
of atoms, whereas real proteins appear to have linear homologous segments. Perhaps I should try constraining the random 1d alignments
to allow only contiguous linear subsequences of
length N ≥ 4, as suggested by the H&S paper.
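The constraint proposed in the last point could be sketched as below. This is an illustrative assumption, not a tested method: it draws random non-overlapping diagonal segments, each at least `min_len` pairs long, running either parallel or antiparallel, until exactly `n_pairs` pairs are placed. The function name `random_segment_alignment` and the `max_len` cap are hypothetical choices.

```python
import random

def random_segment_alignment(n_a, n_b, n_pairs, min_len=4, max_len=12,
                             antiparallel=True, max_tries=10000, seed=0):
    """Return a list of segments; each segment is a list of (row, col) pairs
    lying on one diagonal of the dot plot."""
    rng = random.Random(seed)
    used_rows, used_cols = set(), set()
    segments, remaining = [], n_pairs
    for _ in range(max_tries):
        if remaining == 0:
            break
        # Only lengths that leave either nothing or room for another full segment.
        lengths = [n for n in range(min_len, min(max_len, remaining, n_a, n_b) + 1)
                   if n == remaining or remaining - n >= min_len]
        if not lengths:
            raise ValueError("n_pairs cannot be split into segments of the allowed lengths")
        n = rng.choice(lengths)
        i = rng.randrange(n_a - n + 1)       # segment start in the first protein
        j = rng.randrange(n_b - n + 1)       # segment start in the second protein
        rows = list(range(i, i + n))
        cols = list(range(j, j + n))
        if antiparallel and rng.random() < 0.5:
            cols.reverse()                   # top left to bottom right in the dot plot
        # Keep the segment only if it reuses no row and no column.
        if used_rows.isdisjoint(rows) and used_cols.isdisjoint(cols):
            used_rows.update(rows)
            used_cols.update(cols)
            segments.append(list(zip(rows, cols)))
            remaining -= n
    return segments
```

If `max_tries` runs out before all pairs are placed, the function returns the segments found so far, so a caller should check the total. Each such alignment could then be scored by the RMSD of its 3d fit in the same way as the unconstrained random alignments.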
© Sky Coyote 2007