File formats



The following file formats are currently under development and may change at any time. All files are plain text in Unix format: lines end with a newline, with no carriage return. Except in the PGML format, < and > below mark placeholder text to be replaced and do not appear literally in the file.

Abstracts file formats

There are two abstracts file formats that can be read by MacAlias. The program distinguishes between these formats by the contents of the file.
  1. Plaintext abstracts files. Files of this type contain abstracts in the following format:
    PMID - <PMID number>
    TI  -  <Title text on one line>
    AB  -  <Abstract text on one line>
    RN  -  <Residue text on one line>
    PL  - <PL field data on multiple lines, continuation lines indented with spaces>
    PN  - <PN field data on multiple lines, continuation lines indented with spaces>
    PF  - <PF field data on multiple lines, continuation lines indented with spaces>
    <Additional PL, PN, PF, or other fields>
    //
    <Additional abstracts>
    Plaintext abstracts files must begin with a PMID number on the first line.

  2. PGML (Protein/Gene Markup Language) embedded markup abstracts files. Files of this type contain abstracts in the following tagged format. In the following, < and > are literal characters which appear in the file:
    <!doctype pgml>
    <pmid PMID-number>
    <title>Title text</title>
    <abstract>
    Abstract text
    </abstract>
    <residue>
    Residue text
    </residue>
    <other>
    PL, PF, and PN fields, one physical line per field.  Each field may contain
    multiple logical lines terminated with <br> tags. The end of a field should
    not be terminated with <br>.  All but the first line of a field must be
    indented with a <tab> tag.
    </other>
    Additional abstracts beginning with <pmid> tags.
    PGML abstracts files must begin with the <!doctype> tag on the first line. Text within the title, abstract, residue, and other fields can contain strings which are marked as belonging to a 'category'. Text which is marked in this way is displayed in color in the Abstracts window, using red, blue, and green to alternate through the categories. The format of the embedded markup is indicated by the following example:
    Unmarked text and then <cat "Category name">some marked text</cat> belonging to a category.
    Any number of categories can be indicated in this way. Category names should be in quotes in the <cat> tag.
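
The two abstracts formats above can be told apart and read along the following lines. This is a minimal Python sketch, not MacAlias's actual code: the function names are hypothetical, the case-insensitive doctype check is an assumption, and the plaintext parser assumes that field tags start in the first column, that continuation lines start with a space, and that <cat> tags are not nested.

```python
import re

def detect_abstracts_format(text):
    """Guess the abstracts format from the first line of the file contents."""
    first_line = text.split("\n", 1)[0].strip()
    if first_line.lower().startswith("<!doctype pgml"):
        return "pgml"
    if first_line.startswith("PMID"):
        return "plaintext"
    return "unknown"

# Matches a field line such as "TI  -  Title text" (tag, dash, value).
FIELD_RE = re.compile(r"^([A-Z]+)\s*-\s*(.*)$")

def parse_plaintext_abstracts(text):
    """Split plaintext abstracts into records, each a list of (tag, value) pairs."""
    records, current = [], []
    for line in text.split("\n"):
        if line.strip() == "//":            # end-of-abstract marker
            if current:
                records.append(current)
            current = []
        elif line.startswith(" ") and current:
            tag, value = current[-1]        # continuation of a multi-line field
            current[-1] = (tag, value + "\n" + line.strip())
        else:
            match = FIELD_RE.match(line)
            if match:
                current.append((match.group(1), match.group(2)))
    if current:                             # assumption: tolerate a missing final "//"
        records.append(current)
    return records

# Matches PGML category markup: <cat "Category name">marked text</cat>
CAT_RE = re.compile(r'<cat "([^"]*)">(.*?)</cat>', re.DOTALL)

def find_categories(text):
    """Return (category name, marked text) pairs in order of appearance."""
    return CAT_RE.findall(text)
```

Each record is kept as a list of pairs rather than a dictionary because the PL, PN, and PF fields may occur more than once in a single abstract.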

Notes file format

Abstracts notes are saved in a file with the following format, one line per abstract:

<PMID number><tab><Notes text>

To support multiple logical lines per note, newline characters in the note text are converted to '$' characters before saving. When a notes file is read by the program, '$' characters are converted back to newlines.
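
The newline and '$' conversion can be sketched in Python as follows. The function names are hypothetical, and representing the notes as a dictionary keyed by PMID is an assumption made for illustration.

```python
def encode_note(note):
    """Fold a multi-line note onto one physical line."""
    return note.replace("\n", "$")

def decode_note(field):
    """Restore the logical lines of a note read from the file."""
    return field.replace("$", "\n")

def format_notes(notes):
    """Serialize {PMID: note text} as one tab-separated line per abstract."""
    return "".join(f"{pmid}\t{encode_note(note)}\n" for pmid, note in notes.items())

def parse_notes(text):
    """Read notes file contents back into {PMID: note text}."""
    notes = {}
    for line in text.split("\n"):
        if not line:
            continue                      # skip the empty string after the last newline
        pmid, _, field = line.partition("\t")
        notes[pmid] = decode_note(field)
    return notes
```

Under this scheme a literal '$' typed into a note would come back as a newline after a save and reload; the format description does not mention any escaping for that case.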

Proteins file format

The proteins database file is maintained in the following format:

<primary protein symbol>
<tab><primary protein name>
<tab><secondary protein name, if any>
<tab><additional secondary protein names, if any>
<Additional protein entries>
Protein entries follow one another directly, with no blank line or other delimiter between them.
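
Because a symbol line is not indented and every name line begins with a tab, entries can be separated without any explicit delimiter. The following Python sketch shows one way to read such a file; the function name and the dictionary layout are hypothetical, and the sample symbols in the usage note are placeholders rather than real data.

```python
def parse_entries(text):
    """Parse symbol/name entries: an unindented symbol line followed by
    tab-indented name lines (primary name first, then secondary names)."""
    entries = []
    for line in text.split("\n"):
        if not line.strip():
            continue                               # ignore empty lines, e.g. at end of file
        if line.startswith("\t"):
            if not entries:
                raise ValueError("name line found before any symbol line")
            entries[-1]["names"].append(line[1:])  # strip the leading tab
        else:
            entries.append({"symbol": line, "names": []})
    return entries
```

For example, parse_entries("ABC1\texample") would treat the whole line as a symbol, since the tab is not at the start of the line, while parse_entries("ABC1\n\texample protein one\n") yields one entry with symbol "ABC1" and a single name. The genes file described in the next section uses the same layout, so the same function should apply to it.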

Genes file format

The genes database file is maintained in the following format:

<primary gene symbol>
<tab><primary gene name>
<tab><secondary gene name, if any>
<tab><additional secondary gene names, if any>
<Additional gene entries>
Gene entries follow one another directly, with no blank line or other delimiter between them.

Protein/gene links file format

The links database file is maintained in the following format:

<primary protein symbol><tab><primary gene symbol>
<Additional link entries>
Link entries follow one another directly, with no blank line or other delimiter between them. A links file may contain more than one primary gene symbol per protein; in that case, all link entries for a single protein are assumed to be grouped contiguously in the file. A links file may also contain more than one primary protein symbol per gene; in that case, no assumption is made about how the link entries for a single gene are grouped.
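
The grouping rules can be illustrated with a short Python sketch that builds lookups in both directions. The function name is hypothetical, and raising an error when a protein's entries are not contiguous is an assumption: the format description states the expectation but not how the program reacts when it is violated.

```python
from collections import defaultdict

def parse_links(text):
    """Return (genes_by_protein, proteins_by_gene) from links file contents."""
    genes_by_protein = {}
    proteins_by_gene = defaultdict(list)
    previous = None
    for line in text.split("\n"):
        if not line:
            continue
        protein, gene = line.split("\t")   # exactly one tab per entry
        # Entries for one protein must be adjacent; a protein reappearing
        # after a different protein breaks that expectation.
        if protein != previous and protein in genes_by_protein:
            raise ValueError(f"entries for {protein} are not contiguous")
        genes_by_protein.setdefault(protein, []).append(gene)
        proteins_by_gene[gene].append(protein)  # genes may appear anywhere
        previous = protein
    return genes_by_protein, dict(proteins_by_gene)
```

Entries for a gene are collected wherever they occur, which matches the lack of any grouping requirement on the gene side.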

Session file format

A session file is saved in the following (keyword, value) format. In the file, keywords appear as literal strings with exact case and spelling:

abstracts:<tab><Full path name of abstracts file>
abstract:<tab><Index number of current abstract in file>
proteins:<tab><Full path name of proteins file>
protein:<tab><Index number of current protein in file>
notes:<tab><Full path name of notes file>
links:<tab><Full path name of links file>
genes:<tab><Full path name of genes file>
A line is saved only for keywords that have data, i.e. whose corresponding file has been read. The file may be edited by hand if, for example, one of the referenced files is moved to another location.
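
A session file of this kind can be read and written with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, values are kept as strings (including the index numbers), and rejecting unknown keywords is an assumption rather than documented behavior.

```python
# Keywords in the order listed in the format description.
SESSION_KEYS = ("abstracts", "abstract", "proteins", "protein",
                "notes", "links", "genes")

def parse_session(text):
    """Parse 'keyword:<tab>value' lines into a dictionary of strings."""
    session = {}
    for line in text.split("\n"):
        if not line:
            continue
        keyword, separator, value = line.partition(":\t")
        if not separator or keyword not in SESSION_KEYS:
            raise ValueError(f"unrecognized session line: {line!r}")
        session[keyword] = value
    return session

def format_session(session):
    """Write only the keywords that have data, one per line."""
    return "".join(f"{key}:\t{session[key]}\n"
                   for key in SESSION_KEYS if key in session)
```

Updating the path of a moved file then amounts to parsing the session, changing one value, and writing the result back with format_session.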



© Sky Coyote 2002