Lutefisk.params file parameters

 

CID Filename: Name of the CID data file. A full or partial pathname can be specified.

CID Quality: If you would like the program to give you its opinion on the quality of the CID data, enter "Y"; otherwise, enter "N".  I gave up on this and no longer use it, so the default is “N”.

Peptide MW: Give the peptide molecular weight (NOT MH+!!) including any number of decimal places, depending on the mass accuracy of the instrument. For Sequest ".dta" files, a zero can be entered here, in which case the peptide molecular weight is obtained from the file header.

Charge-state: This is the charge state of the precursor ion. Any integer number can be used, although the program works best on CID spectra obtained from singly or doubly charged ions. Triply-charged ion precursors in a triple quad do not often yield complete sets of fragmentation ions sufficient to delineate a full-length sequence. For Sequest ".dta" files, a zero can be entered here, in which case the precursor charge is obtained from the file header.
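
Because Peptide MW must be the neutral molecular weight rather than MH+ or the observed precursor m/z, a conversion is sometimes needed before filling in the two parameters above. The following is a minimal Python sketch, assuming a protonated precursor; the function name neutral_mass is hypothetical and is not part of Lutefisk.

```python
PROTON_MASS = 1.00728  # mass of a proton in u


def neutral_mass(precursor_mz, charge):
    """Convert an observed precursor m/z and charge state to the neutral peptide MW."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Each charge adds one proton, so remove one proton mass per charge.
    return charge * (precursor_mz - PROTON_MASS)


# Example: a doubly charged precursor observed at m/z 609.83
print(round(neutral_mass(609.83, 2), 2))
```

For a singly charged ion this simply subtracts one proton mass from MH+.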

MaxEnt3: Were the data subjected to MaxEnt3-type processing; i.e., were the multiply charged fragment ions converted to their singly charged counterparts and were the C13 isotope peaks removed? Answer "Y" or "N".

 

Mass Tolerances:

Peptide Error (u): This is the error in the peptide mass measurement in Daltons or fractions of a Dalton. This tolerance can be set as tight as you think your data warrant: 1 or 2 Daltons is suitable for low mass accuracy, or you can use a few hundredths of a Dalton for very accurate mass measurements. It is up to you.  For LCQ data, the software will try to “re-adjust” the peptide MW based on y/b ion pairs, so I generally choose 0.65 u as the peptide MW error for ion traps.  I use 0.45 for Qtofs.

Fragment Error (u): This is the error in measurement of the m/z values of the fragment ions. For high quality triple quad data with unit resolution in Q3, I use a value of 0.5; for low resolution triple quad data I go with 0.75 or 1.0. For ion trap data, I typically use a value of 0.65. For poorly calibrated Qtof data, I use a value of 0.15 to 0.25, but for very well-calibrated data this tolerance can be reduced to 0.02 to 0.05 u.

Final Fragment Err (u): This value only applies to Qtof data. The idea is that temperature dependent expansion and contraction of the flight tube will change the calibration; however, the errors that result are linear. Lutefisk operates by finding a list of candidate sequences, and then it scores these candidates based on how well the predicted fragments match up with the observed fragments. In the final evaluation of sequence candidates derived from Qtof data, the calculated b- and y-type ions of each sequence are used to adjust the calibration of the data. Once the data has been recalibrated, then this Final Fragment Err is applied. Typically, I use a value of 0.02. If a value of zero is entered, then this recalibration feature is disabled and not applied.
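
The recalibration idea can be illustrated with a small Python sketch. It is not Lutefisk's actual code: the function names, the nearest-ion pairing, and the ordinary least-squares fit are all assumptions chosen for illustration. Calculated b- and y-type ions are first matched loosely (Fragment Error), a linear correction is fitted to the matched pairs, and the tight Final Fragment Err is applied to the corrected values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to a pure offset correction.
        return 1.0, mean_y - mean_x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recalibrate_and_match(observed, calculated, fragment_err, final_err):
    """Return the calculated ions that still match after a linear recalibration."""
    pairs = []
    for calc in calculated:
        nearest = min(observed, key=lambda mz: abs(mz - calc))
        if abs(nearest - calc) <= fragment_err:
            pairs.append((nearest, calc))
    if final_err == 0 or len(pairs) < 2:
        # A value of zero disables recalibration: keep the loose matches.
        return [calc for _, calc in pairs]
    slope, intercept = fit_line([o for o, _ in pairs], [c for _, c in pairs])
    # Apply the tight tolerance to the recalibrated observed values.
    return [c for o, c in pairs if abs(slope * o + intercept - c) <= final_err]
```

With a purely linear calibration drift, ions that are off by more than 0.02 u before the fit fall well inside that tolerance afterwards, which is why the final tolerance can be much tighter than the initial one.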

Lately, for Qtof data, I use a Peptide Error of 0.45, a Fragment Error of 0.25, and a Final Fragment Err of 0.02. For nanospray ion trap data (collected in profile mode so that monoisotopic peaks can be identified), I use a Peptide Error of 0.45, a Fragment Error of 0.45, and a Final Fragment Err of zero (no effect). Larger peptide and fragment tolerances of 0.65 u are used for LC/MS/MS data from ion traps -- the centroided ions are not monoisotopic, hence the greater error.

Memory and Speed:

Max. Final Sequences: This is the maximum number of completed sequences (sequences that equal the specified peptide mass plus/minus the peptide mass tolerance) that can be stored before discarding low scoring sequences. This value is dependent on the RAM available to the program (see below); I generally use a value of 20000.

Max Subsequences: This is the maximum number of subsequences (partial sequences that get extended amino acid by amino acid) that can be stored before discarding low scoring subsequences. I usually allow 5000 subsequences to be processed, but this is also dependent on the amount of RAM that is available for Lutefisk. In one test case (Mac G3), I found that 12288 K was sufficient to allow for 20000 final sequences (above) and 5000 subsequences; 4096 K was sufficient to allow for 10000 final sequences and 2500 subsequences. I would recommend giving Lutefisk a bit more than the bare minimum, since I can't guarantee that your computer won't crash when short on RAM. In addition, the number of subsequences allowed is also dependent on the processor speed; I find that 5000 subsequences can take my G3 a few seconds to a minute to process data from a 1500 u peptide.

Mass Scrambles for Statistics: To help determine if the output is correct or nearly correct, Lutefisk compares the output to other sequences that are close matches, but known to be wrong. Typically, a value of six is used for this parameter, in which case, it derives the six best candidate sequences assuming six different incorrect peptide molecular weights. The incorrect molecular weights are 14 u, 28 u, and 42 u less than and greater than the correct peptide mass. The results are known to be wrong, and the scores for these wrong sequences are compared to the results derived by using the correct peptide mass. If you don't want to make a comparison to wrong sequences, then enter a zero for this parameter.  Lately, I have decided that this feature is not all that useful, so I use a value of zero. 
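
The set of deliberately wrong masses described above can be written out in a few lines of Python. This is a sketch only: the function name scrambled_masses is hypothetical, and how Lutefisk orders the masses or handles values other than six is an assumption.

```python
def scrambled_masses(peptide_mw, n_scrambles=6, step=14.0):
    """Return deliberately wrong peptide masses, alternating below and above the true mass."""
    masses = []
    for i in range(n_scrambles):
        multiple = i // 2 + 1           # 1, 1, 2, 2, 3, 3
        sign = -1 if i % 2 == 0 else 1  # below first, then above
        masses.append(peptide_mw + sign * multiple * step)
    return masses


# Six wrong masses around a 1500 u peptide: 14, 28, and 42 u below and above
print(sorted(scrambled_masses(1500.0)))
```

A value of zero yields an empty list, which corresponds to skipping the comparison.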

Spectral Processing:

CID File Type: Enter "F" if the CID data file is derived from the Finnigan "List" program, "T" if it is a tab-delimited ASCII file, "L" if it is a text file from the LCQ file converter program, or "D" if it is a ".dta" file.

Profile/Centroid: Profile data is subjected to a 5-point digital smooth; this is the only difference in processing. By entering a 'D' here, the program automatically differentiates between profile and centroid data. When using this default feature, I found that for some Sequest ".dta" files the program would mistakenly decide that it was profile data, so data files ending with ".dta" are automatically assumed to be centroided.

Peak Width (u): This value is used in the peak detection part of the program, and is dependent on the resolution of the mass analyzer. For unit resolved peaks, the program tries to identify and discard adjacent C13 peaks. For unit resolved spectra I usually use a value of 1.5; for lower resolution MS/MS data on a triple quad I use a value of 3. The auto-peakwidth seems to work quite well for triple quad data. Put a zero here ("0") to use auto-peakwidth when using profile data obtained from triple quads. For ion trap data, use 1 and for Qtof data use a value of 0.75.

Ion Threshold: Data with an intensity greater than the average intensity times this threshold is used for identifying peaks. I use a fairly low value of 0.1.
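
As a sketch of the rule above (not Lutefisk's own code; the function name and the (m/z, intensity) tuple layout are assumptions), the cutoff is the average intensity multiplied by the threshold:

```python
def above_threshold(peaks, ion_threshold=0.1):
    """Keep (mz, intensity) points whose intensity exceeds average intensity * threshold."""
    if not peaks:
        return []
    average = sum(intensity for _, intensity in peaks) / len(peaks)
    cutoff = average * ion_threshold
    return [(mz, intensity) for mz, intensity in peaks if intensity > cutoff]


# Average intensity is 32, so with a threshold of 0.1 the cutoff is 3.2
data = [(200.1, 100.0), (250.2, 5.0), (300.3, 1.0), (350.4, 50.0), (400.5, 4.0)]
print(above_threshold(data))
```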

Mass Offset (u): For data where the CID fragment ion m/z values are consistently off by a known value, this value can be entered here. For example, if the data is always low by 0.2 Da then 0.2 can be entered here. If it is always high by 0.2, then the value of -0.2 is entered. This situation arises if you acquire data at a different resolution setting than what the third quadrupole was calibrated for.

Ions Per Window: The program steps from ion to ion and counts the number of ions between it and a mass 120 Da higher (120 Da is close to the weight averaged amino acid residue mass). If there are too many ions within this moving window, then only those with the greatest intensity are retained. For regions of a CID spectrum that could contain multiply charged fragment ions, this window is narrowed accordingly (e.g., 60 Daltons for regions that could possibly contain doubly-charged fragment ions). I usually use a value of 6 ions per window for unprocessed profile data. If your CID data contains centroided or peak top data that you have already processed by hand, i.e., you've eliminated superfluous ions and you wish to use all of the ions in the interpretation, try using a larger number here (like 20).
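
The moving-window idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: the function name is hypothetical, ions are (m/z, intensity) tuples, and the narrowing of the window for multiply charged regions is not modeled.

```python
def filter_ions_per_window(peaks, ions_per_window=6, window=120.0):
    """Keep at most ions_per_window ions in any window that starts at an ion
    and extends `window` u higher, dropping the weakest ions first."""
    kept = sorted(peaks)  # ascending m/z
    i = 0
    while i < len(kept):
        upper = kept[i][0] + window
        # Indices of the ions inside the window that starts at ion i
        inside = [j for j in range(i, len(kept)) if kept[j][0] <= upper]
        if len(inside) > ions_per_window:
            weakest = min(inside, key=lambda j: kept[j][1])
            del kept[weakest]  # drop the weakest ion, then re-check this window
        else:
            i += 1
    return kept
```

Setting ions_per_window to a large number such as 20 effectively turns the filter off for hand-processed data, since few windows will ever contain that many ions.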

Ions Per Residue: This sets an overall limit to the number of ions to be considered. Since an average residue is of mass 120 Da, then a peptide of mass 1218 would be expected to have around 10 residues. I usually use a value of 2.7 here, so in this example, the number of ions used for sequencing would be limited to 27.
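
The arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical, and the rounding of the residue count and of the final limit is an assumption; the text only says "around 10 residues".

```python
def max_ions(peptide_mw, ions_per_residue=2.7, avg_residue_mass=120.0):
    """Overall ion limit: expected residue count times ions per residue."""
    expected_residues = round(peptide_mw / avg_residue_mass)
    return round(expected_residues * ions_per_residue)


def keep_most_intense(peaks, limit):
    """Keep the `limit` most intense (mz, intensity) ions, returned in m/z order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:limit]
    return sorted(strongest)


# 1218 / 120 is about 10 residues; 10 * 2.7 gives the limit of 27 ions
print(max_ions(1218.0))
```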

Subsequencing:

Transition Mass (u): This is the mass where the fragment mass values are in transition from monoisotopic to average mass values. Below this cutoff, peptide molecular weights and fragment ion m/z values are assumed to be monoisotopic masses. Above this cutoff, average masses are assumed. For triple quadrupole data, I usually use a value of 1800. The cutoff is not abrupt; rather, the switch occurs linearly over a 400 Da range below the cutoff mass (i.e., if 1800 is selected, the masses below 1400 are assumed to be monoisotopic, and those between 1400 and 1800 are in between). Since LCQ and Qtof data routinely give at least unit resolved MSMS data, I tend to set the cutoff very high (5000) for the trap data. This ensures that the program never has to deal with average mass calculations.
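
One way to picture the linear switch is as a weight that runs from 0 (purely monoisotopic) to 1 (purely average) over the 400 Da ramp. The Python sketch below is an assumption about the form of the interpolation, not a transcription of the program; both function names are hypothetical.

```python
def average_mass_weight(mass, transition_mass=1800.0, ramp=400.0):
    """Return 0.0 for purely monoisotopic, 1.0 for purely average, linear in between."""
    start = transition_mass - ramp
    if mass <= start:
        return 0.0
    if mass >= transition_mass:
        return 1.0
    return (mass - start) / ramp


def blended_mass(mono, avg, mass, transition_mass=1800.0):
    """Interpolate between a monoisotopic and an average value at a given mass."""
    w = average_mass_weight(mass, transition_mass)
    return (1.0 - w) * mono + w * avg


# Halfway through the ramp (1600 with a transition mass of 1800) the weight is 0.5
print(average_mass_weight(1600.0))
```

With a transition mass of 5000, every mass below 4600 gets a weight of zero, so only monoisotopic values are used.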

Fragmentation Pattern: The idea here was to allow for different types of fragmentation patterns to be recognized by the algorithm, thereby increasing the probability that the correct sequence will be amongst the candidate sequence list. Currently, there are only three types available - low energy CID of tryptic peptides on a triple quad, low energy CID of tryptic peptides on a Qtof, and low energy CID of tryptic peptides on an ion trap. So this means that for now you must enter 'T' (for triple quad), 'Q' (for Qtof), or 'L' (for ion trap).

Max. Gaps: A gap is a dipeptide of unknown sequence, but of known mass. Usually I allow the presence of only one gap per sequence. However, since the two N-terminal amino acids are so frequently unsequenceable, this "gap" is not counted in this limit. A value of "-1" is typically used, which is a signal to use a default number of gaps per sequence that depends on the peptide mass -- larger peptides are allowed more dipeptide gaps than smaller ones.

Extension Threshold: For a given subsequence there may be several possible amino acid extensions. The extension with the best score determines a threshold that the other extensions must exceed - highest score times this threshold equals the limit. I've been using a value of 0.15.

Max. Extensions: In addition to the threshold described above, it is possible to set a limit on the number of extensions allowed for each subsequence. Only those extensions with the highest scores are used, and the low-scoring extensions are ignored. I use a value of 6 here.
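
The extension threshold and the extension limit work together, which the following Python sketch tries to show. It is an illustration only: the function name and the (residue, score) layout are hypothetical, and scores are assumed to be positive.

```python
def select_extensions(scored, extension_threshold=0.15, max_extensions=6):
    """scored: list of (residue, score). Apply the threshold, then the count limit."""
    if not scored:
        return []
    best = max(score for _, score in scored)
    limit = best * extension_threshold  # highest score times the threshold
    passing = [e for e in scored if e[1] > limit]
    passing.sort(key=lambda e: e[1], reverse=True)
    return passing[:max_extensions]


candidates = [("A", 100), ("G", 20), ("S", 15), ("L", 10), ("V", 90),
              ("T", 80), ("P", 70), ("E", 60), ("D", 50)]
# The limit is 15, so S and L are rejected by the threshold;
# G passes the threshold but is cut by the limit of six extensions.
print(select_extensions(candidates))
```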

Extras:

Cysteine Mass: This variable is necessary to account for the various ways of alkylating cysteine residues. The easiest way to deal with the many possibilities is to have the user enter the residue mass of cysteine - 160.03 for carbamidomethylated cysteine, 161.01 for carboxymethylated cysteine, 208.07 for pyridylethylated cysteine, or any other value you want.

Proteolysis: This is different from the "fragmentation pattern" described above. If tryptic proteolysis ("T") is selected, then both Arg and Lys are forced into the C-terminal position regardless of whether there is any fragmentation data to support their presence. This does not eliminate other possible C-terminal amino acids; it only ensures that Lys and Arg are included as possibilities. Likewise, selecting Lys-C ("K") ensures that Lys is considered at the C-terminus, and selecting Glu-C or V8 ("E") ensures that Asp and Glu are considered as C-terminal amino acids. By selecting Asp-N ("D"), the program makes sure that D is considered for the N-terminal amino acid even if there is no data supporting its presence.

Modified N-terminus: You must specify the mass of the N-terminus.  For example, use 1.0078 for an unmodified peptide, 43.0184 if the peptide has been acetylated, or 44.0136 for N-carbamylated peptides.

Modified C-terminus: You must specify the mass of the C-terminus.  This is typically 17.0027 for unmodified peptides (-OH).
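
To see how the cysteine and terminal masses combine, the peptide molecular weight is the sum of the residue masses plus the N-terminal and C-terminal masses. The Python sketch below uses standard monoisotopic residue masses; the table and the function name peptide_mw are supplied here for illustration and are not taken from the Lutefisk source.

```python
# Standard monoisotopic residue masses (u); cysteine is supplied by the user.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mw(sequence, cysteine_mass=160.03, n_term=1.0078, c_term=17.0027):
    """Sum residue masses plus the N-terminal and C-terminal masses."""
    total = n_term + c_term
    for aa in sequence:
        total += cysteine_mass if aa == "C" else RESIDUE_MASS[aa]
    return total


# Unmodified Gly-Gly, and the same peptide with an acetylated N-terminus
print(round(peptide_mw("GG"), 4))
print(round(peptide_mw("GG", n_term=43.0184), 4))
```

With the unmodified defaults (1.0078 and 17.0027), the two terminal masses together add one water to the residue sum.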

Present Amino Acids: If a complete sequence lacks one of these amino acids then it is discarded. Use single letter code without spaces. Use '*' to denote none.

Absent Amino Acids: These amino acids are not even considered when generating sequences. Use single letter code without spaces. Use '*' to denote none.

Auto Tag: Auto-tag looks at the most intense ions at m/z values greater than the precursor. It then tries to find short stretches of sequences called "sequence-tags", which are used to limit the number of sequences that are generated. I recommend using it for triple quad and Qtof data obtained for tryptic peptides with doubly-charged precursor ions. Specific sequence tags can still be entered as described below. Since ion trap data can have both b and y ions in the m/z region greater than the precursor ion, I find that it is best to not use the Auto-tag when sequencing with trap data.

Tag Low Mass y Ion: A sequence tag is a short stretch of sequence, usually interpreted by hand, that is surrounded by regions of unknown sequence but of known mass. Typically, these sequence tags are determined from y-type ions at m/z values greater than the precursor ion. If you have a sequence tag, then for this parameter, enter the m/z value of the lowest mass y ion in the series of y ions that delineates the sequence tag. If you do not wish to enter a sequence tag, then this value should be zero.

Sequence Tag: If you have a sequence tag, use the single letter code without spaces ordered from the low mass y ion to the high mass y ion. If you do not have a sequence tag, enter an asterisk ("*").

Tag High Mass y Ion: If you have a sequence tag, then the m/z value of the highest mass y ion in the y ion series that delineates the sequence tag is entered here. If you do not wish to enter a sequence tag, then this value should be zero.
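
Before entering a hand-derived tag, it can help to check that the three tag parameters agree with one another. The Python sketch below assumes singly charged y ions, so that the difference between the high and low y ion m/z values should equal the summed residue masses of the tag. The function name, the tolerance default, and the residue mass table are illustrative assumptions and are not part of Lutefisk.

```python
# Standard monoisotopic residue masses (u); cysteine is supplied by the user.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tag_is_consistent(low_y, tag, high_y, fragment_err=0.5, cysteine_mass=160.03):
    """Check that high_y - low_y matches the summed residue masses of the tag."""
    span = sum(cysteine_mass if aa == "C" else RESIDUE_MASS[aa] for aa in tag)
    return abs((high_y - low_y) - span) <= fragment_err


# A two-residue tag "GA" spanning y ions at m/z 400.20 and 528.26
print(tag_is_consistent(400.20, "GA", 528.26))
```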

Edman Data File: The program used to use Edman sequencing data, but this is no longer supported.

DB Sequence File: If you have one or more sequences that you think might be correct (derived from, say, a database search), put them in this file. Give the path and filename; if this is left blank, then no database-derived sequences are checked.

Shoe size (US): Enter your shoe size here. If no entry, then a default value of 15 will be assumed.

Output:

Number of sequences: Number of sequences to list in the .lut output file.  This number is the upper limit, and in many cases there will be fewer.

Score threshold:  The lower Pr score (probability of having half of the sequence correct) limit.  Typically, a lower threshold score of 0.2 is fine.  To maximize the number of sequences in the output, make this value 0.01 and give a high number to “Number of sequences” (e.g., 50).