1. Here is rather poor raw CID data from the doubly charged precursor (m/z 739.8) of a tryptic peptide (MH+ 1478.5), acquired on a QTof:
2. The raw data file is saved as an ASCII tab-delimited file that looks like this (a list of peaks can also be used as input):
1478.523825 2
73.048897 1
86.099892 1
101.078896 1
102.061897 5
104.062897 3
119.063896 1
120.095894 2
.
.
.
To follow along at home, download the example .dta file.
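As a rough illustration of the file layout shown above, the following Python sketch reads such a file and recovers the precursor m/z. It assumes the usual .dta convention that the first line holds the singly protonated mass (MH+) and the charge state, and that every following line is an m/z and intensity pair; the function names `parse_dta` and `precursor_mz` are hypothetical and are not part of Lutefisk.

```python
from typing import List, Tuple

PROTON = 1.007276  # mass of a proton in u


def parse_dta(text: str) -> Tuple[float, int, List[Tuple[float, float]]]:
    """Return (MH+, charge, [(m/z, intensity), ...]) from .dta-style text.

    Assumption: line 1 is "MH+ <whitespace> charge"; all other non-blank
    lines are "m/z <whitespace> intensity".
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    mh_plus, charge = float(rows[0][0]), int(rows[0][1])
    peaks = [(float(mz), float(inten)) for mz, inten in rows[1:]]
    return mh_plus, charge, peaks


def precursor_mz(mh_plus: float, charge: int) -> float:
    """Convert MH+ to the m/z observed at the given charge state."""
    return (mh_plus + (charge - 1) * PROTON) / charge


# First lines of the example file, typed in by hand for illustration.
sample = (
    "1478.523825\t2\n"
    "73.048897\t1\n"
    "86.099892\t1\n"
    "101.078896\t1\n"
    "102.061897\t5\n"
)
mh_plus, charge, peaks = parse_dta(sample)
print(round(precursor_mz(mh_plus, charge), 2))  # 739.77
```

Under these assumptions the first line of the file reproduces the doubly charged precursor quoted in step 1.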
3. The Lutefisk.params file is then set up with the file information:
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE VALUE DEFAULT
CID Filename: QTof_ETYGDMADCCEK.dta | CID Filename.
CID Quality: Y | Check for CID data quality. (Y/N)
Peptide MW: 0 | Peptide molecular weight. Zero will take information from input file.
Charge-state: 2 | Number of charges on the precursor ion.
MaxEnt3: N | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u): 0.45 | Peptide molecular weight tolerance.
Fragment Error (u): 0.25 | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0.02 | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences: 20000 | Number of final sequences stored.
Max. Subsequences: 5000 | Number of subsequence allowed.
Mass Scrambles for Statistics: 0 | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type: D | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid: C | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u): 0.75 | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold: 0.1 | Ion threshold. (Ions average intensity x Ion threshold are utilized.)
Mass Offset (u): 0.0 | Mass offset.
Ions Per Window: 6 | Ions per input window (windows are 60 Da wide).
Ions Per Residue: 2.7 | Number of ions per average residue.
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u): 5000 | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern: Q | Fragmentation pattern (T=triple quad tryptic,L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps: -1 | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold: 0.15 | Extension threshold.
Max. Extensions: 6 | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass: 160.03065 | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis: T | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus: N | Modified N-terminus? (N=none, A=acetylated, C=carbamylated, P=pyroglutamic acid)
Modified C-terminus: N | Modified C-terminus? (N=none, A=amidated)
Present Amino Acids: * | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids: * | Amino acids known to be absent from the peptide. * means none.
Auto Tag: N | Auto-tag (Y/N).
Tag Low Mass y Ion: 0 | Sequence tag - low mass y ion
Sequence Tag: * | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion: 0 | Sequence tag - high mass y ion
Edman Data File: | File with Edman data
DB Sequence File: | File with sequences to score with the final results.
Shoe Size (US): | US shoe size. Default of 15.
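For readers who want to inspect or check a parameters file before a run, here is a minimal Python sketch of a reader for the layout shown above. It assumes each setting line has the form "TITLE: VALUE | description" and that lines starting with // are comments; `parse_params` and `qtof_scoring_enabled` are hypothetical helpers, not part of Lutefisk. The Qtof check only restates the two conditions given in the parameter descriptions.

```python
def parse_params(text: str) -> dict:
    """Map each parameter title to its value (as a string)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        setting, _, _description = line.partition("|")
        name, _, value = setting.partition(":")
        params[name.strip()] = value.strip()
    return params


def qtof_scoring_enabled(params: dict) -> bool:
    """Per the descriptions: fragment error must be 0.25 or less,
    and a final fragment error of zero skips Qtof scoring."""
    fragment_err = float(params["Fragment Error (u)"])
    final_err = float(params["Final Fragment Err (u)"])
    return fragment_err <= 0.25 and final_err > 0


# A few lines copied from the example file above.
sample = """\
// Lutefisk parameters file
CID Filename: QTof_ETYGDMADCCEK.dta | CID Filename.
Charge-state: 2 | Number of charges on the precursor ion.
Fragment Error (u): 0.25 | Fragment ion tolerance.
Final Fragment Err (u): 0.02 | Fragment ion tolerance for final scoring.
Edman Data File: | File with Edman data
"""
params = parse_params(sample)
print(params["CID Filename"])        # QTof_ETYGDMADCCEK.dta
print(qtof_scoring_enabled(params))  # True
```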
4. The tab-delimited data file, the Lutefisk.params file, the Lutefisk.edman file, and the Lutefisk.details file must all be in the same location as the Lutefisk application. The Lutefisk application is then started, and it runs without further user intervention. Its output appears in the application window:
Verbose mode OFF
Lutefisk1900 v1.3.1
Copyright 1996-1900 Richard S. Johnson
Run Date: Thu Dec 20 11:10:03 2001
Processing CID datafile 'QTof_ETYGDMADCCEK.dta'
Number of ions: 36
Quality assessment:
This spectrum stinks.
Graph is finished.
Subsequencing is finished.
Max subsequences: 1248
Scoring 2841 completed sequences.
Scoring 200 sequences following the b and y filter.
The cutoff for the high m/z ions is 83.3 percent.
Scoring 120 sequences after the high m/z filter.
Scoring 114 remaining after removing redundant sequences.
Sequences expanded to 609 for qtof score.
Cross-dressing.
Rank X-corr IntScr IntOnlyScr Quality StDevErr CS CalFact Sequence
1 0.588 0.576 0.636 0.667 0.0151 8 0.999916 [SS]EFDMADYPD[202.06]
2 0.588 0.576 0.636 0.667 0.0151 8 0.999916 [SS]EmDMADYPD[202.06]
3 0.323 0.575 0.616 0.636 0.0160 8 0.999922 [393.12]GDMADCCEK
4 0.611 0.574 0.648 0.667 0.0122 8 0.999916 [SS]EFDMAYDPD[202.06]
5 0.611 0.574 0.648 0.667 0.0122 8 0.999916 [SS]EmDMAYDPD[202.06]
6 0.588 0.565 0.620 0.667 0.0201 8 0.999940 [SS]EFDMADYPD[202.08]
7 0.588 0.565 0.620 0.667 0.0201 8 0.999940 [SS]EmDMADYPD[202.08]
8 0.516 0.565 0.544 0.750 0.0174 9 0.999943 [393.12]GDMADYPD[202.08]
9 0.516 0.565 0.524 0.750 0.0158 9 0.999941 [393.12]GDMADYPD[202.06]
10 0.611 0.564 0.632 0.667 0.0199 8 0.999940 [SS]EFDMAYDPD[202.08]
11 0.611 0.564 0.632 0.667 0.0199 8 0.999940 [SS]EmDMAYDPD[202.08]
12 0.539 0.564 0.556 0.750 0.0172 9 0.999943 [393.12]GDMAYDPD[202.08]
13 0.574 0.563 0.656 0.583 0.0190 7 0.999916 [SS]D[276.11]MADYPD[202.06]
14 0.588 0.561 0.660 0.667 0.0236 8 0.999986 [SS]EmDMADYPD[202.10]
15 0.588 0.561 0.660 0.667 0.0236 8 0.999986 [SS]EFDMADYPD[202.10]
16 0.437 0.559 0.504 0.600 0.0177 7 0.999945 [393.12]GDMWCCEK
17 0.531 0.557 0.612 0.636 0.0139 7 0.999949 [SS]EmDMWYPD[202.06]
18 0.531 0.557 0.612 0.636 0.0139 7 0.999949 [SS]EFDMWYPD[202.06]
19 0.574 0.556 0.656 0.583 0.0190 7 0.999916 [SS]D[276.08]MADYPD[202.06]
20 0.574 0.555 0.656 0.583 0.0188 7 0.999986 [SS]D[276.14]MADYPD[202.10]
0 sequences excluded based on poor quality.
Maximum Spectral Quality = 0.600000
Longest contiguous series of sequence ions defining a sequence of single amino acids 6
Sequence Rank CombScr X-corr IntOnlyScr Quality
[393.12]GDMWCCEK 1 0.174 0.437 0.504 0.600
The residue 'm' signifies oxidized Met.
Search time: 0:00:06
5. A file named QTof_ETYGDMADCCEK.lut is automatically created with summary information and the final result list of peptides:
Lutefisk1900 v1.3.1
Copyright 1996-2001 Richard S. Johnson
Run Date: Thu Dec 20 11:10:11 2001
Filename: QTof_ETYGDMADCCEK.dta
Molecular Weight: 1477.52 Molecular Weight Tolerance: 0.45 Fragment Ion Tolerance: 0.25
Ion Offset: 0.00 Charge State: 2 Centroided or Pre-processed Data
Tryptic Digest Tryptic QTOF Fragmentation Pattern
Cysteine residue mass: 160.03 Switch from monoisotopic to average mass at 5000
Ions per window: 6.0 Extension Threshold: 0.15 Extension Number: 6
Gaps: 2 Peak Width: 0.8 Data Threshold: 0.10 (1) Ions per residue: 2.7
Amino acids known to be present: *
Amino acids known to be absent: *
Unmodified N-terminus. Unmodified C-terminus.
N-terminal Tag Mass: 0.00 C-terminal Tag Mass: 0.00 Sequence Tag: *0
Edman data is not available. AutoTag OFF
Spectral Quality = 0.600000
Contiguous series of sequence ions defines a sequence of length 6
Sequence Rank CombScr X-corr IntOnlyScr Quality
[393.12]GDMWCCEK 1 0.174 0.437 0.504 0.600
The residue 'm' signifies oxidized Met.
Search time: 0:00:06
Numbers in brackets give the combined mass of unsequenced residues, where fragmentation was insufficient to determine their order and/or identity. Trp (W) has nearly the same mass as Asp + Ala (186.079 vs. 186.064 u), and the program erroneously inserted it in place of these two residues. In this case, the actual sequence of the peptide was ETYGDMADCCEK, derived from a tryptic digest of BSA.
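The mass relationships in this result can be checked with a short Python sketch. The residue masses are standard monoisotopic values, except for Cys, which uses the carbamidomethyl value from the params file; the table `MONO` and the helper `residue_sum` are names introduced here for illustration only.

```python
# Monoisotopic residue masses in u (Cys = 160.03065 as set in Lutefisk.params).
MONO = {
    "G": 57.02146, "A": 71.03711, "T": 101.04768, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "C": 160.03065,
    "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def residue_sum(seq: str) -> float:
    """Sum of residue masses, without the terminal water."""
    return sum(MONO[aa] for aa in seq)


# The bracketed [393.12] corresponds to the unsequenced N-terminal ETY.
print(round(residue_sum("ETY"), 2))                       # 393.15

# Trp differs from Asp + Ala by less than the 0.02 u final fragment tolerance.
print(round(residue_sum("W") - residue_sum("DA"), 4))     # 0.0153

# Neutral mass of the true peptide, matching the reported molecular weight.
print(round(residue_sum("ETYGDMADCCEK") + WATER, 2))      # 1477.52
```

The Trp versus Asp + Ala difference is smaller than the final fragment tolerance of 0.02 u set in the params file, which is consistent with the program being unable to tell the two apart from mass alone.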
6. Optionally, the output file from Lutefisk can then be used as an input file for CIDentify, a version of William Pearson's FASTA algorithm modified by Alex Taylor and made available at the FASTA FTP site at the University of Virginia. Searching a non-redundant protein sequence database with the QTof_ETYGDMADCCEK.lut file from the above example produced the following results:
1 queries used to search 262996175 residues in 834683 sequences
statistics extrapolated from 20000 to 831917 sequences
results sorted and z-values calculated from initn score
17351 scores better than 26 saved, ktup: 1, variable pamfact
/apps/ms/CIDentify/matrices/Blsm90MS_Qtof.mat matrix, joining threshold: 59, opt. width: 32 scan time: 0:03:08
Cysteine nominal mass set to 160
N-terminal bonus residues: RK
CIDentify version 1.0.7 Search Date: Thu Dec 20 11:14:09 2001
1 Lutefisk queries vs /apps/blast/db/nr_aa library
The best scores are: initn initn sum std.dev.
nr_aa//gi|6687188|emb|CAB64867.1| (AJ133489) albumin [Canis fami 61 61 8.2
nr_aa//gi|13124699|sp|P49822|ALBU_CANFA SERUM ALBUMIN PRECURSOR 61 61 8.2
nr_aa//gi|418694|pir||ABBOS serum albumin precursor [validated] 61 61 8.2
nr_aa//gi|113582|sp|P14639|ALBU_SHEEP SERUM ALBUMIN PRECURSOR;gi 61 61 8.2
nr_aa//gi|229552|prf||754920A albumin [Bos taurus] 61 61 8.2
nr_aa//gi|1351907|sp|P02769|ALBU_BOVIN SERUM ALBUMIN PRECURSOR ( 61 61 8.2
nr_aa//gi|3319897|emb|CAA76841.1| (Y17737) albumin [Canis famili 61 61 8.2
nr_aa//gi|2190337|emb|CAA41735.1| (X58989) serum albumin [Bos ta 61 61 8.2
nr_aa//gi|164318|gb|AAA30988.1| (M36787) albumin [Sus scrofa] 56 56 7.3
nr_aa//gi|113578|sp|P08835|ALBU_PIG SERUM ALBUMIN PRECURSOR;gi|4 56 56 7.3
nr_aa//gi|1351908|sp|P49064|ALBU_FELCA SERUM ALBUMIN PRECURSOR ( 55 55 7.1
nr_aa//gi|1351909|sp|P49065|ALBU_RABIT SERUM ALBUMIN PRECURSOR;g 54 54 6.9
nr_aa//gi|17446772|ref|XP_067552.1| (XM_067552) hypothetical pro 50 50 6.2
nr_aa//gi|543794|sp|P35747|ALBU_HORSE SERUM ALBUMIN PRECURSOR&gi 50 50 6.2
nr_aa//gi|17456620|ref|XP_074425.1| (XM_074425) hypothetical pro 46 46 5.4
nr_aa//gi|11528335|gb|AAG37228.1|AF298884_1 (AF298884) intraflag 46 46 5.4
nr_aa//gi|3336927|emb|CAB06801.1| (Z86111) signal peptidase I [S 46 46 5.4
nr_aa//gi|7481411|pir||T34784 probable signal peptidase I - Stre 46 46 5.4
nr_aa//gi|829430|gb|AAA79564.1| (U24449) vpu gene product [Human 45 45 5.3
nr_aa//gi|7578776|gb|AAF64136.1|AF222792_3 (AF222792) MerT [Stre 44 44 5.1
>>nr_aa//gi|6687188|emb|CAB64867.1| (AJ133489) albumin [Canis familiaris] (608 aa)
initn: 61 initn sum: 61 Std. dev.: 8.2
Smith-Waterman score: 34; 77.8% identity in 9 aa overlap
10
XXGDMW-CCEK
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^v ::: :::X
nr_aa/ EFAKACAAEESGANCDKSLHTLFGDKLCTVASLRDKYGDMADCCEKQEPDRNECFLAHKDDNPGFPPLVAPEPDA
80 90 100 110 120 130 140
nr_aa/ LCAAFQDNEQLFLGKYLYEIARRHPYFYAPELLYYAQQYKGVFAECCQAADKAACLGPKIEALREKVLLSSAKER
150 160 170 180 190 200 210 220
>>nr_aa//gi|13124699|sp|P49822|ALBU_CANFA SERUM ALBUMIN PRECURSOR (ALLERGEN
CAN F 3) (608 aa)
initn: 61 initn sum: 61 Std. dev.: 8.2
Smith-Waterman score: 34; 77.8% identity in 9 aa overlap
10
XXGDMW-CCEK
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^v ::: :::X
nr_aa/ EFAKACAAEESGANCDKSLHTLFGDKLCTVASLRDKYGDMADCCEKQEPDRNECFLAHKDDNPGFPPLVAPEPDA
80 90 100 110 120 130 140
nr_aa/ LCAAFQDNEQLFLGKYLYEIARRHPYFYAPELLYYAQQYKGVFAECCQAADKAACLGPKIEALREKVLLSSAKER
150 160 170 180 190 200 210 220
>>nr_aa//gi|418694|pir||ABBOS serum albumin precursor [validated] - bovine (607 aa)
initn: 61 initn sum: 61 Std. dev.: 8.2
Smith-Waterman score: 34; 77.8% identity in 9 aa overlap
10
XXGDMW-CCEK
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^v ::: :::X
nr_aa/ EFAKTCVADESHAGCEKSLHTLFGDELCKVASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPKLKPDPNTL
80 90 100 110 120 130 140
nr_aa/ CDEFKADEKKFWGKYLYEIARRHPYFYAPELLYYANKYNGVFQDCCQAEDKGACLLPKIETMREKVLASSARQRL
150 160 170 180 190 200 210 220
Note that this search identifies not only the exact match in the database but also non-identical homologs of serum albumin from other species.
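The "77.8% identity in 9 aa overlap" figure reported for the top hits can be reproduced by hand. The sketch below assumes the overlap is the nine aligned columns running from G to K, with the gap column counted in the length but not as a match; the aligned strings are read off the alignment above, and `percent_identity` is a hypothetical helper rather than a CIDentify function.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of aligned columns where both sequences carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s and q != "-" for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# Lutefisk candidate (with one gap) against the albumin database sequence.
print(round(percent_identity("GDMW-CCEK", "GDMADCCEK"), 1))  # 77.8
```

Seven of the nine columns match; the two that do not are the W/A mismatch and the gap opposite D, which together reflect the Trp for Asp + Ala substitution made by Lutefisk.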