An illustrated example of the use of Lutefisk and CIDentify:

1. Here is some rather poor raw CID data from the doubly charged precursor (m/z 739.8) of a 1478.5 Da tryptic peptide, acquired on a QTof:

[Figure: example MS/MS spectrum]

2. The raw data file is saved as an ASCII tab-delimited file that looks like this (a list of peaks can also be used as input):


1478.523825 2
73.048897 1
86.099892 1
101.078896 1
102.061897 5
104.062897 3
119.063896 1
120.095894 2
.
.
.

To follow along at home, download the example .dta file.
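
The layout of the listing above can be read with a few lines of Python. This is only a sketch, not part of Lutefisk: it assumes the usual .dta convention that the first line holds the singly protonated precursor mass (MH+) and the charge state, and that every following line is an m/z and intensity pair. The function names parse_dta and precursor_mz and the proton mass constant are illustrative choices.

```python
PROTON = 1.00728  # approximate proton mass in Da (assumed constant)


def parse_dta(text):
    """Split .dta-style text into (MH+, charge, list of (m/z, intensity))."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    mh_plus, charge = float(rows[0][0]), int(rows[0][1])
    # Keep only rows that look like an m/z and intensity pair.
    peaks = [(float(r[0]), float(r[1])) for r in rows[1:] if len(r) == 2]
    return mh_plus, charge, peaks


def precursor_mz(mh_plus, charge):
    """Convert MH+ to the m/z expected at the given charge state."""
    return (mh_plus + (charge - 1) * PROTON) / charge


if __name__ == "__main__":
    sample = "1478.523825\t2\n73.048897\t1\n86.099892\t1\n102.061897\t5\n"
    mh_plus, charge, peaks = parse_dta(sample)
    print(charge, len(peaks), round(precursor_mz(mh_plus, charge), 1))
```

Under these assumptions the header line of the example file corresponds to a doubly charged precursor near m/z 739.8, which agrees with the value quoted in step 1.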

3. The Lutefisk.params file is then set up with the file information:


//                           Lutefisk parameters file
//
//  If this file is present in the directory from which Lutefisk is invoked,
//  then the value of the parameters listed in the 'VALUE' column below
//  will override the program defaults.
//

// TITLE                 VALUE	       			 DEFAULT

CID Filename:   QTof_ETYGDMADCCEK.dta   | CID Filename.
CID Quality:             Y              | Check for CID data quality. (Y/N)
Peptide MW:              0              | Peptide molecular weight.  Zero will take information from input file.
Charge-state:            2              | Number of charges on the precursor ion.
MaxEnt3:                 N              | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u):      0.45            | Peptide molecular weight tolerance.
Fragment Error (u):     0.25            | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0.02            | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences:   20000           | Number of final sequences stored.
Max. Subsequences:       5000           | Number of subsequence allowed.
Mass Scrambles for Statistics:   0      | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type:           D              | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid:        C              | Is this CID data in profile or centroid form?  P=Profile, C=Centroid, A=Autodetect.
Peak Width (u):         0.75            | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold:          0.1             | Ion threshold.  (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u):        0.0             | Mass offset.
Ions Per Window:         6              | Ions per input window (windows are 60 Da wide).
Ions Per Residue:       2.7             | Number of ions per average residue.
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u):   5000             | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern:   Q              | Fragmentation pattern (T=triple quad tryptic,L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps:              -1              | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold:   0.15             | Extension threshold.
Max. Extensions:         6              | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass:      160.03065           | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis:             T              | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus:     N              | Modified N-terminus?  (N=none, A=acetylated, C=carbamylated, P=pyroglutamic acid)
Modified C-terminus:     N              | Modified C-terminus?  (N=none, A=amidated)
Present Amino Acids:     *              | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids:      *              | Amino acids known to be absent from the peptide. * means none.
Auto Tag:                N              | Auto-tag (Y/N).
Tag Low Mass y Ion:      0              | Sequence tag - low mass y ion
Sequence Tag:            *              | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion:     0              | Sequence tag - high mass y ion
Edman Data File:                        | File with Edman data
DB Sequence File:                       | File with sequences to score with the final results.
Shoe Size (US):                         | US shoe size.  Default of 15.
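
For scripting around Lutefisk (for example, to check settings before a batch run), the parameters file shown above can be loaded into a dictionary. This sketch is not Lutefisk's own parser. It assumes only what the listing shows: lines beginning with // are comments, and each setting has the form "TITLE: VALUE | description". The name parse_params is hypothetical.

```python
def parse_params(text):
    """Return {title: value} for each 'TITLE: VALUE | description' line."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and // comments
        setting, _, _description = line.partition("|")
        title, colon, value = setting.partition(":")
        if colon:  # ignore anything that is not a TITLE: VALUE pair
            params[title.strip()] = value.strip()
    return params


if __name__ == "__main__":
    example = """\
// Mass Tolerances ------------------------
Peptide Error (u):      0.45            | Peptide molecular weight tolerance.
Fragment Error (u):     0.25            | Fragment ion tolerance.
Edman Data File:                        | File with Edman data
"""
    for title, value in parse_params(example).items():
        print(repr(title), repr(value))
```

Values are kept as strings; an empty string means the VALUE column was left blank, as it is for the Edman and database file entries.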

4. The tab-delimited data file, the Lutefisk.params file, the Lutefisk.edman file, and the Lutefisk.details file must all be in the same location as the Lutefisk application. The Lutefisk application is then started, and it runs without further user intervention. Its output appears in the application window:


Verbose mode OFF
Lutefisk1900 v1.3.1
Copyright 1996-1900 Richard S. Johnson

Run Date: Thu Dec 20 11:10:03 2001
Processing CID datafile 'QTof_ETYGDMADCCEK.dta'
Number of ions: 36 

Quality assessment:
This spectrum stinks.
Graph is finished. 
Subsequencing is finished.
Max subsequences: 1248 
Scoring 2841 completed sequences.
Scoring  200 sequences following the b and y filter.
The cutoff for the high m/z ions is 83.3 percent.
Scoring   120 sequences after the high m/z filter.
Scoring  114 remaining after removing redundant sequences.
Sequences expanded to   609 for qtof score.
Cross-dressing.

 Rank  X-corr  IntScr  IntOnlyScr Quality  StDevErr CS   CalFact  Sequence
   1   0.588   0.576   0.636      0.667    0.0151   8   0.999916 [SS]EFDMADYPD[202.06]
   2   0.588   0.576   0.636      0.667    0.0151   8   0.999916 [SS]EmDMADYPD[202.06]
   3   0.323   0.575   0.616      0.636    0.0160   8   0.999922 [393.12]GDMADCCEK
   4   0.611   0.574   0.648      0.667    0.0122   8   0.999916 [SS]EFDMAYDPD[202.06]
   5   0.611   0.574   0.648      0.667    0.0122   8   0.999916 [SS]EmDMAYDPD[202.06]
   6   0.588   0.565   0.620      0.667    0.0201   8   0.999940 [SS]EFDMADYPD[202.08]
   7   0.588   0.565   0.620      0.667    0.0201   8   0.999940 [SS]EmDMADYPD[202.08]
   8   0.516   0.565   0.544      0.750    0.0174   9   0.999943 [393.12]GDMADYPD[202.08]
   9   0.516   0.565   0.524      0.750    0.0158   9   0.999941 [393.12]GDMADYPD[202.06]
  10   0.611   0.564   0.632      0.667    0.0199   8   0.999940 [SS]EFDMAYDPD[202.08]
  11   0.611   0.564   0.632      0.667    0.0199   8   0.999940 [SS]EmDMAYDPD[202.08]
  12   0.539   0.564   0.556      0.750    0.0172   9   0.999943 [393.12]GDMAYDPD[202.08]
  13   0.574   0.563   0.656      0.583    0.0190   7   0.999916 [SS]D[276.11]MADYPD[202.06]
  14   0.588   0.561   0.660      0.667    0.0236   8   0.999986 [SS]EmDMADYPD[202.10]
  15   0.588   0.561   0.660      0.667    0.0236   8   0.999986 [SS]EFDMADYPD[202.10]
  16   0.437   0.559   0.504      0.600    0.0177   7   0.999945 [393.12]GDMWCCEK
  17   0.531   0.557   0.612      0.636    0.0139   7   0.999949 [SS]EmDMWYPD[202.06]
  18   0.531   0.557   0.612      0.636    0.0139   7   0.999949 [SS]EFDMWYPD[202.06]
  19   0.574   0.556   0.656      0.583    0.0190   7   0.999916 [SS]D[276.08]MADYPD[202.06]
  20   0.574   0.555   0.656      0.583    0.0188   7   0.999986 [SS]D[276.14]MADYPD[202.10]
0 sequences excluded based on poor quality.

Maximum Spectral Quality = 0.600000

Longest contiguous series of sequence ions defining a sequence of single amino acids  6

 Sequence                                              Rank CombScr X-corr IntOnlyScr  Quality
[393.12]GDMWCCEK                                         1   0.174   0.437   0.504    0.600

The residue 'm' signifies oxidized Met.

Search time:  0:00:06
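
The ranked candidate table in the output above is whitespace-separated, so its rows can be pulled apart for further processing. The sketch below is an illustration only; the names Candidate, parse_row, and tokenize are hypothetical, and the column meanings are taken from the header line as printed. Bracketed items are kept as opaque tokens rather than interpreted.

```python
import re
from typing import NamedTuple


class Candidate(NamedTuple):
    rank: int
    xcorr: float
    int_scr: float
    int_only_scr: float
    quality: float
    stdev_err: float
    cs: int
    cal_fact: float
    sequence: str


# Either a bracketed item such as [393.12] or [SS], or a single residue letter.
TOKEN = re.compile(r"\[([^\]]+)\]|([A-Za-z])")


def parse_row(line):
    """Parse one row of the ranked table into a Candidate."""
    f = line.split()
    return Candidate(int(f[0]), float(f[1]), float(f[2]), float(f[3]),
                     float(f[4]), float(f[5]), int(f[6]), float(f[7]), f[8])


def tokenize(sequence):
    """Split a candidate sequence into ('bracket', text) and ('residue', letter)."""
    tokens = []
    for match in TOKEN.finditer(sequence):
        if match.group(1) is not None:
            tokens.append(("bracket", match.group(1)))
        else:
            tokens.append(("residue", match.group(2)))
    return tokens


if __name__ == "__main__":
    row = "   3   0.323   0.575   0.616      0.636    0.0160   8   0.999922 [393.12]GDMADCCEK"
    candidate = parse_row(row)
    print(candidate.rank, candidate.int_scr, tokenize(candidate.sequence))
```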

5. A file named QTof_ETYGDMADCCEK.lut is automatically created with summary information and the final result list of peptides:


Lutefisk1900 v1.3.1
Copyright 1996-2001 Richard S. Johnson

Run Date: Thu Dec 20 11:10:11 2001
 Filename: QTof_ETYGDMADCCEK.dta
 Molecular Weight: 1477.52  Molecular Weight Tolerance:  0.45  Fragment Ion Tolerance:  0.25
 Ion Offset:  0.00  Charge State:  2   Centroided or Pre-processed Data 
 Tryptic Digest       Tryptic QTOF Fragmentation Pattern 
 Cysteine residue mass:  160.03  Switch from monoisotopic to average mass at 5000 
 Ions per window: 6.0  Extension Threshold: 0.15  Extension Number:  6
 Gaps:  2 Peak Width:  0.8 Data Threshold:  0.10 (1) Ions per residue: 2.7
 Amino acids known to be present: *
 Amino acids known to be absent: *
 Unmodified N-terminus. Unmodified C-terminus.
 N-terminal Tag Mass:    0.00 C-terminal Tag Mass:    0.00 Sequence Tag: *0
 Edman data is not available.    AutoTag OFF
 Spectral Quality = 0.600000
Contiguous series of sequence ions defines a sequence of length  6
 
 
 
 
 Sequence                                              Rank CombScr X-corr IntOnlyScr  Quality
[393.12]GDMWCCEK                                         1   0.174   0.437   0.504      0.600

The residue 'm' signifies oxidized Met.

Search time:  0:00:06

Numbers in brackets represent the combined mass of unsequenced amino acids, where fragmentation was insufficient to determine their order and/or identity. Trp (W) has nearly the same residue mass as Asp + Ala (186.079 vs. 186.064 Da), and the program erroneously inserted it in place of these two amino acids. In this case, the actual sequence of the peptide was ETYGDMADCCEK, derived from a tryptic digest of BSA.
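
The mass arithmetic behind this result can be checked directly. The sketch below uses standard monoisotopic residue masses, which are not taken from the Lutefisk output, together with the cysteine residue mass from the parameters file (160.03065). It compares Trp with Asp + Ala and computes the neutral mass of ETYGDMADCCEK. The names RESIDUE, WATER, and peptide_mass are illustrative.

```python
# Monoisotopic residue masses in Da (standard values; only the residues
# needed for this example are listed).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "T": 101.04768, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "Y": 163.06333,
    "W": 186.07931,
    "C": 160.03065,  # cysteine residue mass as set in Lutefisk.params
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of a peptide given in one-letter code."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    # Trp versus Asp + Ala: the difference is roughly 0.015 Da.
    diff = RESIDUE["W"] - (RESIDUE["D"] + RESIDUE["A"])
    print(f"W - (D + A) = {diff:.4f} Da")
    # The true sequence; the result should be close to the molecular
    # weight of 1477.52 reported in the .lut summary.
    print(f"ETYGDMADCCEK = {peptide_mass('ETYGDMADCCEK'):.2f} Da")
```

The Trp versus Asp + Ala difference is smaller than both fragment tolerances used in this run (0.25 and 0.02), which is consistent with the substitution surviving the final scoring.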

6. Optionally, the output file from Lutefisk can subsequently be used as an input file for CIDentify, a version of William Pearson's FASTA algorithm modified by Alex Taylor and made available at the FASTA FTP site at the University of Virginia. Searching a non-redundant protein sequence database with the QTof_ETYGDMADCCEK.lut file from the above example produced the following results:


 1 queries used to search 262996175 residues in 834683 sequences
 statistics extrapolated from 20000 to 831917 sequences
 results sorted and z-values calculated from initn score
 17351 scores better than 26 saved, ktup: 1, variable pamfact
 /apps/ms/CIDentify/matrices/Blsm90MS_Qtof.mat matrix, joining threshold: 59, opt. width: 32  scan time:  0:03:08
 Cysteine nominal mass set to 160
 N-terminal bonus residues: RK

 CIDentify version 1.0.7  Search Date: Thu Dec 20 11:14:09 2001
   1 Lutefisk queries vs /apps/blast/db/nr_aa library


The best scores are:                                            initn  initn sum  std.dev.
nr_aa//gi|6687188|emb|CAB64867.1| (AJ133489) albumin [Canis fami   61      61       8.2 
nr_aa//gi|13124699|sp|P49822|ALBU_CANFA SERUM ALBUMIN PRECURSOR    61      61       8.2 
nr_aa//gi|418694|pir||ABBOS serum albumin precursor [validated]    61      61       8.2 
nr_aa//gi|113582|sp|P14639|ALBU_SHEEP SERUM ALBUMIN PRECURSOR;gi   61      61       8.2 
nr_aa//gi|229552|prf||754920A albumin [Bos taurus]                 61      61       8.2 
nr_aa//gi|1351907|sp|P02769|ALBU_BOVIN SERUM ALBUMIN PRECURSOR (   61      61       8.2 
nr_aa//gi|3319897|emb|CAA76841.1| (Y17737) albumin [Canis famili   61      61       8.2 
nr_aa//gi|2190337|emb|CAA41735.1| (X58989) serum albumin [Bos ta   61      61       8.2 
nr_aa//gi|164318|gb|AAA30988.1| (M36787) albumin [Sus scrofa]      56      56       7.3 
nr_aa//gi|113578|sp|P08835|ALBU_PIG SERUM ALBUMIN PRECURSOR;gi|4   56      56       7.3 
nr_aa//gi|1351908|sp|P49064|ALBU_FELCA SERUM ALBUMIN PRECURSOR (   55      55       7.1 
nr_aa//gi|1351909|sp|P49065|ALBU_RABIT SERUM ALBUMIN PRECURSOR;g   54      54       6.9 
nr_aa//gi|17446772|ref|XP_067552.1| (XM_067552) hypothetical pro   50      50       6.2 
nr_aa//gi|543794|sp|P35747|ALBU_HORSE SERUM ALBUMIN PRECURSOR&gi   50      50       6.2 
nr_aa//gi|17456620|ref|XP_074425.1| (XM_074425) hypothetical pro   46      46       5.4 
nr_aa//gi|11528335|gb|AAG37228.1|AF298884_1 (AF298884) intraflag   46      46       5.4 
nr_aa//gi|3336927|emb|CAB06801.1| (Z86111) signal peptidase I [S   46      46       5.4 
nr_aa//gi|7481411|pir||T34784 probable signal peptidase I - Stre   46      46       5.4 
nr_aa//gi|829430|gb|AAA79564.1| (U24449) vpu gene product [Human   45      45       5.3 
nr_aa//gi|7578776|gb|AAF64136.1|AF222792_3 (AF222792) MerT [Stre   44      44       5.1 

>>nr_aa//gi|6687188|emb|CAB64867.1| (AJ133489) albumin [Canis familiaris] (608 aa)
initn:   61  initn sum:   61   Std. dev.:  8.2 
Smith-Waterman score: 34;    77.8% identity in 9 aa overlap

                                                   10                             
                                          XXGDMW-CCEK                             
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^v :::  :::X                             
nr_aa/ EFAKACAAEESGANCDKSLHTLFGDKLCTVASLRDKYGDMADCCEKQEPDRNECFLAHKDDNPGFPPLVAPEPDA
              80        90       100       110       120       130       140      

nr_aa/ LCAAFQDNEQLFLGKYLYEIARRHPYFYAPELLYYAQQYKGVFAECCQAADKAACLGPKIEALREKVLLSSAKER
        150       160       170       180       190       200       210       220 

>>nr_aa//gi|13124699|sp|P49822|ALBU_CANFA SERUM ALBUMIN PRECURSOR (ALLERGEN
CAN F 3) (608 aa)
initn:   61  initn sum:   61   Std. dev.:  8.2 
Smith-Waterman score: 34;    77.8% identity in 9 aa overlap

                                                   10                             
                                          XXGDMW-CCEK                             
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^v :::  :::X                             
nr_aa/ EFAKACAAEESGANCDKSLHTLFGDKLCTVASLRDKYGDMADCCEKQEPDRNECFLAHKDDNPGFPPLVAPEPDA
              80        90       100       110       120       130       140      

nr_aa/ LCAAFQDNEQLFLGKYLYEIARRHPYFYAPELLYYAQQYKGVFAECCQAADKAACLGPKIEALREKVLLSSAKER
        150       160       170       180       190       200       210       220 

>>nr_aa//gi|418694|pir||ABBOS serum albumin precursor [validated] - bovine (607 aa)
initn:   61  initn sum:   61   Std. dev.:  8.2 
Smith-Waterman score: 34;    77.8% identity in 9 aa overlap

                                                   10                             
                                          XXGDMW-CCEK                             
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^v :::  :::X                             
nr_aa/ EFAKTCVADESHAGCEKSLHTLFGDELCKVASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPKLKPDPNTL
              80        90       100       110       120       130       140      

nr_aa/ CDEFKADEKKFWGKYLYEIARRHPYFYAPELLYYANKYNGVFQDCCQAEDKGACLLPKIETMREKVLASSARQRL
        150       160       170       180       190       200       210       220 

Note that this search finds not only the identical match in the database but also non-identical serum albumin homologs from other species.
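
To tabulate the hits, the lines under "The best scores are:" can be split from the right, since the last three columns (initn, initn sum, std.dev.) are always numeric. This is a rough sketch based only on the listing shown here, not on any documented CIDentify format; the name parse_hit is hypothetical.

```python
def parse_hit(line):
    """Split one 'best scores' line into (identifier, description, initn, initn_sum, std_dev)."""
    # The three numeric columns sit at the end of the line.
    left, initn, initn_sum, std_dev = line.rsplit(None, 3)
    # The identifier is the first whitespace-free field; the rest is the
    # (possibly truncated) description.
    parts = left.split(None, 1)
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, int(initn), int(initn_sum), float(std_dev)


if __name__ == "__main__":
    lines = [
        "nr_aa//gi|1351907|sp|P02769|ALBU_BOVIN SERUM ALBUMIN PRECURSOR (   61      61       8.2 ",
        "nr_aa//gi|164318|gb|AAA30988.1| (M36787) albumin [Sus scrofa]      56      56       7.3 ",
    ]
    for line in lines:
        print(parse_hit(line))
```

Descriptions are truncated in the program output itself, so the parsed description text should be treated as a label rather than a complete annotation.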