Dali1dPlugin - Dali1d sequence alignment plugin for Geneious


I've created a Dali 1d sequence alignment plugin in Java for Geneious 3.0.6. The following example shows the alignment of 1LYZ to 2LZM, as described in Holm L, Sander C. 'Protein structure comparison by alignment of distance matrices.' J Mol Biol. 1993 Sep 5;233(1):123-38 (referred to below as the H&S paper). In Geneious:

  1. Select the structures 1LYZ and 2LZM:



  2. Click the 'Alignment' tool button, and select 'Pre-align 3d structures'. You should see the following default settings:



    These settings are:

    • Pattern length: The number of contiguous residues in each protein which are inter-compared to build initial alignments. The H&S paper uses hexapeptides.

    • Distance threshold (A): The maximum intra-pattern distance (between two fragments on the same protein) to consider when creating the pattern similarity list. The H&S paper suggests 25 Å, but a shorter distance produces smaller lists (and shorter runtimes) without significantly degrading the initial alignments.

    • Initial pattern list size: The maximum number of positively scoring pattern x pattern comparisons between the two proteins to save. 80000 in the H&S paper.

    • Final pattern list size: The truncated number of pattern x pattern comparisons to retain after comparisons have been performed. 40000 in the H&S paper. The final pattern list is used to create seed alignments for the cycle of iterations.

    • Population size: The number of candidate alignments to try during each iteration. A number from 1000 - 5000 is usually sufficient. Larger populations allow for more variation in the alignment search, but take longer to run.

    • Number of parents to keep: The number of high-scoring alignments to keep across iterations. All other candidate alignments in the population are created from these. The H&S paper suggests 10. Using a population size of 1010 and 10 parents makes 100 new alignments from each pair of parents each iteration.

    • Stability threshold (iters): The number of consecutive iterations for which the top alignment score may stay unchanged (frozen at a non-optimal solution) before all parent alignments are subjected to bit-clearing 'knockout' (see below). The H&S paper performs 'knockout' after a fixed number of iterations, but I have found that letting iterations continue until the leader has stopped improving for a while seems to work better. 5-10 iterations is a good choice. Set this to a larger value to let all trajectories continue to evolve without periodic bit-clearing.

    • Maximum number of iterations: The maximum number of iterations to perform before returning the best alignment.

    • Minimum alignment length: The number of aligned residues the best alignment must reach before it is returned.

    • Minimum similarity score: The similarity score the best alignment must reach before it is returned.

      Iterations proceed until one of these criteria is met.

    • 'Clear bits' probability: The probability that mutation of a new alignment will involve clearing bits in the alignment mask. The H&S paper performs a 'trimming cycle' on all alignments every 5 iterations. However, I have found that performing trimming, expansion, and bit swapping on a continuous basis with fixed probabilities seems to work better.

    • 'Swap bits' probability: The probability that mutation of a new alignment will involve moving bits in the alignment mask from one place to another without changing the length of the alignment. There is no analog of this in the H&S paper.

    • 'Set bits' probability: The probability that mutation of a new alignment will involve setting bits in the alignment mask. The H&S paper performs an 'expansion cycle' on all alignments 4 out of every 5 iterations.

      The H&S paper confines all trimming and expansion operations to units of 4 peptides. However, I have found that randomly choosing a size of 1, 2, 4, or 6 contiguous bits for each operation seems to work better.

    • Knockout length (fraction): The fraction of bits to randomly clear in each parent alignment after the leading alignment has reached stability according to the setting above. The H&S paper uses 20% of the total length.

    • Print informational output to console: Print ongoing run information to the console, including intra-pattern lengths, pattern similarity size and scores, seed construction, and best alignment during each iteration. See below for an example of such output.
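
The 'clear bits', 'swap bits', 'set bits', and knockout settings above can be pictured with a short Java sketch. This is not the plugin's source. It assumes the alignment mask is a java.util.BitSet, that each mutation picks at most one of the three operations, and that knockout clears a fraction of the currently set bits. The class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Random;

// Hypothetical sketch only: the real plugin's data structures are not shown in this document.
class MaskMutationSketch {
    // Run lengths chosen at random for each operation (1, 2, 4, or 6 contiguous bits).
    static final int[] RUN_LENGTHS = {1, 2, 4, 6};

    /** Applies at most one of clear/swap/set, chosen with fixed probabilities (assumes length >= 6). */
    static void mutate(BitSet mask, int length, double pClear, double pSwap, double pSet, Random rng) {
        int run = RUN_LENGTHS[rng.nextInt(RUN_LENGTHS.length)];
        double r = rng.nextDouble();
        if (r < pClear) {
            int start = rng.nextInt(length - run + 1);
            mask.clear(start, start + run);          // trimming: alignment gets shorter
        } else if (r < pClear + pSwap) {
            for (int i = 0; i < run; i++) {
                swapOne(mask, length, rng);          // moving bits: alignment length is unchanged
            }
        } else if (r < pClear + pSwap + pSet) {
            int start = rng.nextInt(length - run + 1);
            mask.set(start, start + run);            // expansion: alignment gets longer
        }
    }

    /** Moves one randomly chosen set bit to a randomly chosen clear position. */
    static void swapOne(BitSet mask, int length, Random rng) {
        int set = mask.cardinality();
        if (set == 0 || set == length) {
            return;                                  // nothing to move, or nowhere to move it
        }
        int from = nthBit(mask, rng.nextInt(set), true);
        int to = nthBit(mask, rng.nextInt(length - set), false);
        mask.clear(from);
        mask.set(to);
    }

    /** Clears round(fraction * setBits) randomly chosen set bits; fraction must be in [0, 1]. */
    static void knockout(BitSet mask, double fraction, Random rng) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        int toClear = (int) Math.round(fraction * mask.cardinality());
        for (int i = 0; i < toClear; i++) {
            mask.clear(nthBit(mask, rng.nextInt(mask.cardinality()), true));
        }
    }

    /** Index of the n-th (0-based) bit that has the given value. */
    static int nthBit(BitSet mask, int n, boolean value) {
        int i = value ? mask.nextSetBit(0) : mask.nextClearBit(0);
        for (; n > 0; n--) {
            i = value ? mask.nextSetBit(i + 1) : mask.nextClearBit(i + 1);
        }
        return i;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        BitSet mask = new BitSet(100);
        mask.set(20, 70);                            // a made-up starting alignment of 50 positions
        for (int i = 0; i < 10; i++) {
            mutate(mask, 100, 0.2, 0.3, 0.5, rng);   // illustrative probabilities, not the plugin defaults
        }
        knockout(mask, 0.2, rng);
        System.out.println("aligned positions after mutation and knockout: " + mask.cardinality());
    }
}
```

Drawing one random number per mutation keeps the three probabilities directly comparable to the dialog settings. If the plugin instead applies each operation independently, the three branches would become three separate tests.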

  3. Click OK to start the alignment. A progress dialog will appear, showing an estimate of the percentage of the alignment calculation that has been performed. Click 'Cancel' to abort the alignment in progress. If you have chosen to print to the console, information about the progress of the alignment will appear there.




    Here is an annotated example of output displayed during the calculation of an alignment.

  4. When the alignment is complete (about 6 minutes for the default settings on a 2 GHz MacBook) it will be displayed in Geneious:



    This particular alignment has 75 corresponding residues and a similarity score of 243.6, which is exceptionally good.

To perform a 3d alignment using the Align3d plugin:

  1. Select an existing 1d alignment made from 2 PDB files (such as the alignment just completed above), click the 'Alignment' tool, and choose 'Align 3d structures'. You should then see the following dialog:



    These settings are:

    • Number of iterations: Number of steps of the McLachlan conjugate axes minimization algorithm to perform. At each step, 3 different axes will be used to rotate the movable protein into alignment with the fixed protein. Usually, just a few iterations will be required to minimize the RMSD error.

    • Index of fixed (reference) sequence/structure: 1-based index of the fixed protein's sequence in the selected alignment.

    • Index of movable (aligned) sequence/structure: 1-based index of the movable protein's sequence in the selected alignment.

    • Print informational output to console: Print information about the alignment to the console.
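
The console output in step 2 reports a number of 'links' between the two structures. One plausible reading, offered here as an assumption rather than as the plugin's documented behaviour, is that a link is an alignment column in which both the fixed and the movable sequence hold a residue rather than a gap. The Java sketch below walks two aligned rows and collects those residue index pairs. The class name, method name, and toy sequences are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derives residue pairings from a pairwise alignment with '-' gaps.
class AlignmentLinksSketch {
    /** Returns {indexInA, indexInB} (0-based residue indices) for each column with no gap in either row. */
    static List<int[]> links(String rowA, String rowB) {
        if (rowA.length() != rowB.length()) {
            throw new IllegalArgumentException("aligned rows must have the same length");
        }
        List<int[]> pairs = new ArrayList<>();
        int a = 0; // next residue index in the ungapped sequence A
        int b = 0; // next residue index in the ungapped sequence B
        for (int col = 0; col < rowA.length(); col++) {
            boolean hasA = rowA.charAt(col) != '-';
            boolean hasB = rowB.charAt(col) != '-';
            if (hasA && hasB) {
                pairs.add(new int[] {a, b});
            }
            if (hasA) a++;
            if (hasB) b++;
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; not the real 1LYZ/2LZM alignment.
        for (int[] pair : links("KVFG-RC", "--MNIFE")) {
            System.out.println(pair[0] + " <-> " + pair[1]);
        }
    }
}
```

The residue pairs found this way are what a superposition step would try to bring together in 3d.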


  2. Click 'OK' to perform the alignment. If output to the console is selected, it will be displayed there, e.g.:

    ===== Wednesday, August 15, 2007 10:11:21 AM US/Pacific =====
    
    *** Align3dPlugin ***
    iterations = 6
    fixedIndex = 1
    movableIndex = 2
    outputFlag = 1
    1 arguments:
    "Protein alignment 33" SequenceAlignmentDocument with 2 sequences:
    	  1 "1LYZ" PdbDocument 1LYZ KVFGRCELAAAMKRHGLDNYRGYSL...
    	  2 "2LZM" PdbDocument 2LZM -----------------------MN...
    Fixed PDB = 1LYZ, fixed alignment = KVFGR...
    Movable PDB = 2LZM, movable alignment = -----...
    Calling Align3d(1LYZ, 2LZM, KVFGR..., -----...)
    1LYZ has 129 nodes
    2LZM has 164 nodes
    2LZM has 75 links to 1LYZ
    2LZM       center at       40.880933       -8.357147       15.104293
    1LYZ       center at        0.871867       21.397107       21.101840
    Initial error =       16.322712
        1        3.956975
        2        3.773326
        3        3.747428
        4        3.747418
        5        3.747418
        6        3.747418
        7        3.747418
        8        3.747418
        9        3.747418
       10        3.747418
       11        3.747418
       12        3.747418
       13        3.747418
       14        3.747418
       15        3.747418
       16        3.747418
       17        3.747418
       18        3.747418
    Rotation matrix:
           0.574672        0.768853       -0.280385 
          -0.552787        0.112039       -0.825756 
          -0.603472        0.629533        0.489398 
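
To make the numbers in this output concrete, here is a small Java sketch that applies a rotation about one center and moves the result to another, and that computes an RMSD over paired points. The matrix and the two centers are copied from the output above. The convention p' = R (p - movableCenter) + fixedCenter, with R read row by row, is my assumption; the plugin may store or apply its matrix differently. The class and method names are hypothetical.

```java
// Hypothetical sketch: rigid-body transform and RMSD, using values printed in the console output.
class RigidTransformSketch {
    static final double[][] ROTATION = {
        { 0.574672, 0.768853, -0.280385},
        {-0.552787, 0.112039, -0.825756},
        {-0.603472, 0.629533,  0.489398}
    };
    static final double[] MOVABLE_CENTER = {40.880933, -8.357147, 15.104293}; // 2LZM
    static final double[] FIXED_CENTER = {0.871867, 21.397107, 21.101840};    // 1LYZ

    /** Returns r * (p - fromCenter) + toCenter (assumed convention, see the text above). */
    static double[] transform(double[][] r, double[] fromCenter, double[] toCenter, double[] p) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            double sum = 0.0;
            for (int j = 0; j < 3; j++) {
                sum += r[i][j] * (p[j] - fromCenter[j]);
            }
            out[i] = sum + toCenter[i];
        }
        return out;
    }

    /** Root-mean-square distance between paired points a[k] and b[k]. */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need two non-empty point lists of equal size");
        }
        double sumSq = 0.0;
        for (int k = 0; k < a.length; k++) {
            for (int i = 0; i < 3; i++) {
                double d = a[k][i] - b[k][i];
                sumSq += d * d;
            }
        }
        return Math.sqrt(sumSq / a.length);
    }

    public static void main(String[] args) {
        // By construction, the movable center lands exactly on the fixed center.
        double[] moved = transform(ROTATION, MOVABLE_CENTER, FIXED_CENTER, MOVABLE_CENTER);
        System.out.println(java.util.Arrays.toString(moved));
    }
}
```

Under this convention, the RMSD column in the output would be rmsd(...) evaluated over the linked residue pairs after each rotation step.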
    

  3. Once the alignment has been performed (just a second or two), the movable protein's coordinates will be transformed and displayed in Geneious. A text view of the transformed protein will also include information about the alignment:



    Both the transformed and reference proteins are included in the output PDB file, so they can be displayed in one window. Here is 1LYZ+2LZM(aligned) in Geneious Basic:

    The 1d alignment was length=78, score=245.805954, rmsd=3.757917. You may need Geneious Pro to color the two chains differently. However, you can do it externally in the free version of Jmol:

Tips for use


© 2007 Sky Coyote