Align3d - Java software for 3d protein alignment


I've written 2 small Java programs related to 3d protein alignment. They are:

Align3d: Align one PDB file to another using the McLachlan conjugate rotation method and an additional 1d sequence alignment file.

Syntax:

java -jar Align3d.jar fixed-PDB-file movable-PDB-file 1d-alignment-file [aligned-PDB-file]

Example:

> java -jar Align3d.jar 1LYZ.pdb 2LZM.pdb dali.alignment.txt
1LYZ.pdb has 129 nodes
2LZM.pdb has 164 nodes
dali.alignment.txt has 199/199 pairs
2LZM.pdb has 94 links to 1LYZ.pdb
2LZM.pdb   center at       42.004128       -6.873670       15.785904
1LYZ.pdb   center at        2.185106       20.626851       21.652223
Initial error =       17.428050
    1        5.619949
    2        5.504115
    3        5.485816
    4        5.485784
    5        5.485784
    6        5.485784
    7        5.485784
    8        5.485784
    9        5.485784
   10        5.485784
   11        5.485784
   12        5.485784
   13        5.485784
   14        5.485784
   15        5.485784
   16        5.485784
   17        5.485784
   18        5.485784
Rotation matrix:
       0.584392        0.763068       -0.276066 
      -0.441732        0.013763       -0.897042 
      -0.680705        0.646171        0.345114 
Writing 2LZM.aligned.pdb

RMSD: Calculate the alignment error between 2 PDB files using an additional 1d sequence alignment file.

Syntax:

java -jar RMSD.jar fixed-PDB-file aligned-PDB-file 1d-alignment-file [center-flag]

Examples:

> java -jar RMSD.jar 1LYZ.pdb 2LZM.pdb dali.alignment.txt 
1LYZ.pdb has 129 nodes
2LZM.pdb has 164 nodes
dali.alignment.txt has 199/199 pairs
2LZM.pdb has 94 links to 1LYZ.pdb
Error =       51.768560

> java -jar RMSD.jar 1LYZ.pdb 2LZM.pdb dali.alignment.txt 1
1LYZ.pdb has 129 nodes
2LZM.pdb has 164 nodes
dali.alignment.txt has 199/199 pairs
2LZM.pdb has 94 links to 1LYZ.pdb
Error =       17.428050

> java -jar RMSD.jar 1LYZ.pdb 2LZM.aligned.pdb dali.alignment.txt
1LYZ.pdb has 129 nodes
2LZM.aligned.pdb has 164 nodes
dali.alignment.txt has 199/199 pairs
2LZM.aligned.pdb has 94 links to 1LYZ.pdb
Error =        5.485789


Discussion

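Both programs read PDB files in version 2.3 format; see the PDB website for more information. A 1d sequence alignment file is simply 2 lines, each containing the aligned string for one sequence. In both examples below, the first line is the fixed structure (1LYZ) and the second is the movable one (2LZM), with gaps shown as '.' (Dali) or '-' (Needleman-Wunsch):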

> cat dali.alignment.txt 
kvfgrcelaaamkrhgldnyrgysLGNWVCAAKFESnfNTQATNRNtDGSTDYGiLQINSRwwcnDGRT.pgsrNLCNipcsallssDITASVNCAKKIVSD........................gngMNAWV.....................................awrNRCKGT........DVQAWirgcrl
........................MNIFEMLRIDEG..LRLKIYKDtEGYYTIG.IGHLLTkspsLNAAkseldKAIG..rncngviTKDEAEKLFNQDVDAavrgilrnaklkpvydsldavrrcaliNMVFQmgetgvagftnslrmlqqkrwdeaavnlaksrwynqtpnrAKRVITtfrtgtwdAYKNL......

> cat nw.alignment.txt 
-KVFGRCELAAAMK??RHGLDNYRGY---SLG??NWVC--------AAKFE--------SNFN??TQATNRNTDG----STD??Y-GILQINSRWWCN??DGRTPGSRNLCNI??PCSALLSSD-----ITAS??VNCAKKIVSDGNG??MNA----W---VAWRNRCKG----??TDVQAWIRGCRL??
MNIFEMLRIDEGL??RLKIYKDTEGYYT??IGIGHLLTKSPSL??NAAKSELDKAIGR??NCNGVITKDEAEK??LFNQDVDAAVRGI??LRNAKLKPVYDSL??DAVRRCALINMVF??QMGETGVAGFTNS??LRMLQQKRWDEAA??VNLAKSRWYNQTP??NRAKRVITTFRTG??TWDAYKNL??

At present, these files must be prepared by converting the output of other alignment programs, which may use different text formats.
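As a rough illustration of how links can be formed from such a file, the Java sketch below reads the first two non-blank lines and pairs residue indices column by column. It is not the Align3d source: the class and method names are hypothetical, and it assumes that any letter (of either case) is a residue, that any other character is a gap, that the first line belongs to the fixed structure, and that a link is made only where both lines hold a residue. Align3d's own rules (for example, for Dali's lower-case residues) may differ, so its link counts need not match this sketch. It requires Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a two-line 1d alignment into residue index links. */
class AlignmentLinks {

    /** Reads the first two non-blank lines of an alignment file (fixed first, movable second). */
    static String[] readAlignment(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            if (!line.isBlank()) {
                lines.add(line.strip());
            }
        }
        if (lines.size() < 2) {
            throw new IOException("expected 2 alignment lines in " + file);
        }
        return new String[] {lines.get(0), lines.get(1)};
    }

    /**
     * Returns {fixedIndex, movableIndex} pairs (0-based residue counts, gaps skipped)
     * for every column where both strings hold a letter.
     * Assumption: letters are residues and every other character is a gap.
     * Columns beyond the shorter string are ignored.
     */
    static int[][] links(String fixed, String movable) {
        int columns = Math.min(fixed.length(), movable.length());
        List<int[]> result = new ArrayList<>();
        int i = 0; // residues seen so far in the fixed sequence
        int j = 0; // residues seen so far in the movable sequence
        for (int col = 0; col < columns; col++) {
            boolean a = Character.isLetter(fixed.charAt(col));
            boolean b = Character.isLetter(movable.charAt(col));
            if (a && b) {
                result.add(new int[] {i, j});
            }
            if (a) i++;
            if (b) j++;
        }
        return result.toArray(new int[0][]);
    }
}
```

For example, `links("AB-CD", "A.EC-")` yields the pairs {0, 0} and {2, 2}; the B, E and D columns face gaps and stay unlinked.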

The Align3d program reads the 2 PDB files and the 1d sequence file, then uses the 1d sequences to create links between the two 3d structures. Both the 3d alignment and the alignment error are calculated using only the linked (corresponding) residues. The coordinates used are those of the alpha-carbon (CA) atoms in the backbone of the first chain in each file. After the links have been formed, the mean of each linked coordinate set is subtracted (i.e. each linked set is centered at {0, 0, 0}) before the rotational alignment is performed.

See the paper 'McLachlan, A.D. (1982) Rapid Comparison of Protein Structures, Acta Cryst. A38, 871-873' for details of the alignment process. The Align3d program performs 18 iterations of the alignment (6 iterations about each of 3 conjugate axes), which is more than sufficient to stabilize the resulting RMSD error to 6 decimal places; in the example above, the error no longer changes from the 4th iteration onward. The code can easily be modified to accommodate more or fewer iterations.
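The centering and rotation steps can be sketched in Java as below. This is a simplified stand-in, not McLachlan's conjugate rotation method and not the Align3d code: it rotates about the fixed x, y and z axes in turn instead of conjugate axes. For each axis n, the squared error as a function of the angle θ reduces (via Rodrigues' rotation formula) to a constant minus 2(A cos θ + B sin θ), with A = Σ[x·y − (n·x)(n·y)] and B = Σ x·(n × y), so the best angle about that axis is θ = atan2(B, A). Because every step applies the exact optimum for its axis, the RMSD can never increase, although convergence may take more sweeps than the conjugate-axis method. Class and method names are hypothetical.

```java
/**
 * Hypothetical sketch: centers linked CA sets and fits them by rotating about
 * the fixed x, y, z axes in turn. NOT McLachlan's conjugate-axis method.
 */
class AxisSweepFit {

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
    }

    /** Matrix-vector product r * v. */
    static double[] apply(double[][] r, double[] v) {
        return new double[] {dot(r[0], v), dot(r[1], v), dot(r[2], v)};
    }

    /** Returns a copy of p translated so that its mean (centroid) is at {0, 0, 0}. */
    static double[][] center(double[][] p) {
        double[] c = new double[3];
        for (double[] q : p) {
            for (int k = 0; k < 3; k++) c[k] += q[k] / p.length;
        }
        double[][] out = new double[p.length][];
        for (int i = 0; i < p.length; i++) {
            out[i] = new double[] {p[i][0] - c[0], p[i][1] - c[1], p[i][2] - c[2]};
        }
        return out;
    }

    /** Rotation by theta about the unit axis n (Rodrigues' rotation formula). */
    static double[][] axisRotation(double[] n, double theta) {
        double c = Math.cos(theta), s = Math.sin(theta), t = 1 - c;
        return new double[][] {
            {c + t * n[0] * n[0], t * n[0] * n[1] - s * n[2], t * n[0] * n[2] + s * n[1]},
            {t * n[1] * n[0] + s * n[2], c + t * n[1] * n[1], t * n[1] * n[2] - s * n[0]},
            {t * n[2] * n[0] - s * n[1], t * n[2] * n[1] + s * n[0], c + t * n[2] * n[2]}
        };
    }

    /**
     * Least-squares angle about axis n for rotating y onto x:
     * sum of x.(R y) = A cos(theta) + B sin(theta) + const, maximized at atan2(B, A).
     */
    static double bestAngle(double[][] x, double[][] y, double[] n) {
        double a = 0, b = 0;
        for (int i = 0; i < x.length; i++) {
            a += dot(x[i], y[i]) - dot(n, x[i]) * dot(n, y[i]);
            b += dot(x[i], cross(n, y[i]));
        }
        return Math.atan2(b, a);
    }

    /** Matrix product a * b for 3x3 matrices. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] m = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++) m[i][j] += a[i][k] * b[k][j];
        return m;
    }

    /**
     * Rotates the centered set y (in place) toward the centered set x using
     * `sweeps` passes over the x, y and z axes; returns the accumulated rotation.
     */
    static double[][] fit(double[][] x, double[][] y, int sweeps) {
        double[][] axes = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double[][] total = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        for (int s = 0; s < sweeps; s++) {
            for (double[] n : axes) {
                double[][] r = axisRotation(n, bestAngle(x, y, n));
                for (int i = 0; i < y.length; i++) y[i] = apply(r, y[i]);
                total = multiply(r, total);
            }
        }
        return total;
    }

    /** Root-mean-square distance between paired points. */
    static double rmsd(double[][] x, double[][] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            for (int k = 0; k < 3; k++) sum += (x[i][k] - y[i][k]) * (x[i][k] - y[i][k]);
        }
        return Math.sqrt(sum / x.length);
    }
}
```

With `sweeps = 6`, `fit` makes 18 single-axis steps, the same count Align3d reports, but because the axes differ the intermediate errors will not match its output. The movable set is rotated in place, and the returned matrix is the accumulated rotation that was applied to it.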

Aligned coordinates are written to another PDB file in the same format; when no aligned-PDB-file argument is given, the name is derived from the movable file (2LZM.aligned.pdb in the example above). All ATOM, HETATM, and SIGATM records are transformed by the alignment, including those in additional chains. ATOM and HETATM coordinates are rotated and translated to the new alignment center, while SIGATM values (the standard deviations of the coordinates) are rotated only.
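Reading the first chain's CA atoms and rewriting the coordinate records can be sketched as follows. This assumes the PDB v2.3 fixed-column layout: record name in columns 1-6, atom name in 13-16, alternate location in 17, chain ID in 22, and X, Y, Z (or, for SIGATM, their standard deviations) in columns 31-54 as three 8.3 fields. The class and method names are hypothetical, and so are two choices the text does not state: only alternate location ' ' or 'A' is kept, and the "new alignment center" is taken to be the fixed structure's linked centroid, i.e. v' = R(v − c_movable) + c_fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of PDB v2.3 fixed-column coordinate I/O for alignment. */
class PdbCoords {

    /** Record name from columns 1-6, without padding. */
    static String recordName(String line) {
        return (line.length() >= 6 ? line.substring(0, 6) : line).trim();
    }

    /** Reads the three 8-column fields in columns 31-54 (X, Y, Z or their sigmas). */
    static double[] readXyz(String line) {
        double[] v = new double[3];
        for (int k = 0; k < 3; k++) {
            v[k] = Double.parseDouble(line.substring(30 + 8 * k, 38 + 8 * k).trim());
        }
        return v;
    }

    /** CA coordinates of the first chain: ATOM records named "CA" up to the first TER/ENDMDL. */
    static List<double[]> readFirstChainCA(List<String> lines) {
        List<double[]> ca = new ArrayList<>();
        Character chain = null;
        for (String line : lines) {
            String record = recordName(line);
            if ((record.equals("TER") || record.equals("ENDMDL")) && !ca.isEmpty()) break;
            if (!record.equals("ATOM") || line.length() < 54) continue;
            if (!line.substring(12, 16).trim().equals("CA")) continue;
            char altLoc = line.charAt(16);
            if (altLoc != ' ' && altLoc != 'A') continue; // assumption: keep first alternate only
            if (chain == null) chain = line.charAt(21);
            if (line.charAt(21) != chain) break;          // a new chain started without TER
            ca.add(readXyz(line));
        }
        return ca;
    }

    /**
     * Transforms one record. ATOM/HETATM: v' = R (v - movableCenter) + fixedCenter.
     * SIGATM: v' = R v (rotation only). Other records are returned unchanged.
     * Values too wide for the 8-column %8.3f field are not handled.
     */
    static String transformLine(String line, double[][] r, double[] movableCenter, double[] fixedCenter) {
        String record = recordName(line);
        boolean position = record.equals("ATOM") || record.equals("HETATM");
        if ((!position && !record.equals("SIGATM")) || line.length() < 54) return line;
        double[] v = readXyz(line);
        if (position) {
            for (int k = 0; k < 3; k++) v[k] -= movableCenter[k];
        }
        StringBuilder out = new StringBuilder(line.substring(0, 30));
        for (int row = 0; row < 3; row++) {
            double w = r[row][0] * v[0] + r[row][1] * v[1] + r[row][2] * v[2];
            out.append(String.format(Locale.ROOT, "%8.3f", position ? w + fixedCenter[row] : w));
        }
        return out.append(line.substring(54)).toString();
    }
}
```

`Locale.ROOT` keeps the decimal point a '.' regardless of the platform locale, which the fixed-column format requires.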

The RMSD program reads the 2 PDB files and the 1d sequence file, and calculates the RMSD error using only the linked atoms. This error may be calculated using the coordinates as-is, or both linked structures may first be translated to {0, 0, 0} by supplying a centering flag (any 4th command-line argument).
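A minimal Java sketch of that calculation is shown below. It assumes the linked coordinates have already been paired up (fixed[i] with moved[i]) and that the RMSD is the square root of the mean squared distance over the N linked atoms; the class name is hypothetical, and the `center` parameter plays the role of the optional flag.

```java
/** Hypothetical sketch: RMSD over linked atom pairs, with optional centering. */
class LinkedRmsd {

    /**
     * fixed[i] and moved[i] hold the coordinates of the i-th linked pair (n > 0 assumed).
     * If center is true, each set is first translated so its mean lies at {0, 0, 0}.
     */
    static double rmsd(double[][] fixed, double[][] moved, boolean center) {
        int n = fixed.length;
        double[] cf = new double[3];
        double[] cm = new double[3];
        if (center) {
            for (int i = 0; i < n; i++) {
                for (int k = 0; k < 3; k++) {
                    cf[k] += fixed[i][k] / n;
                    cm[k] += moved[i][k] / n;
                }
            }
        }
        double sum = 0;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 3; k++) {
                double d = (fixed[i][k] - cf[k]) - (moved[i][k] - cm[k]);
                sum += d * d;
            }
        }
        return Math.sqrt(sum / n); // root of the mean squared distance per linked atom
    }
}
```

In the examples above, the centered error for the unaligned files (17.428050) matches Align3d's "Initial error", consistent with Align3d centering both linked sets before it starts rotating.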


Issues

1. Displaying aligned structures together.

Once a 3d alignment has been performed, the fixed and aligned PDB files can be viewed together in Jmol or in my Tertris program.

2. 1d alignment sequences.

Currently the Align3d program requires an external 1d alignment file to form links between the two structures. This file could be generated by the Dali or Needleman-Wunsch methods, or by some other means, including 1d <--> 3d iterative methods. If a method such as Dali is used, it could be packaged as another Java program/plugin, separate from the 3d alignment. However, if a 1d <--> 3d iterative method is desired, it should either be part of the 3d alignment code, or separate 1d and 3d alignment programs/plugins must communicate in a tightly integrated manner to accomplish the composite alignment.


© 2007 Sky Coyote