Sherpa Documentation

User Guide for
Sherpa
"Your Guide to the Peaks"

Version 3.3.1
Documentation updated February 14, 2000

A Macintosh-based expert system for LC/MS and
MS/MS analysis of protein digests

Sherpa Homepage: http://www.hairyfatguy.com/Sherpa/

Copyright © 1994-2000 J. Alex Taylor and the University of Washington
All Rights Reserved Worldwide.

1.0 Welcome to Sherpa: 1.1 Statement of Program Philosophy; 1.2 System Requirements; 1.3 Quick Start
2.0 Handling Protein Sequences: 2.1 Opening a Protein Sequence; 2.2 Saving a Sequence; 2.3 The Sequence Window; 2.4 Defining Custom Amino Acids and Termini; 2.5 The Digest Window; 2.6 Defining Crosslinks
3.0 LC/MS Interpretation: 3.1 Opening an LC/MS File; 3.2 The LC/MS Import Dialog; 3.3 Running a Primary Search; 3.4 Running a Secondary Search; 3.5 Open-ended Search
4.0 MS/MS Interpretation: 4.1 Opening an MS/MS File; 4.2 The MS/MS Import Dialog; 4.3 Using PepID to Interpret an MS/MS Spectrum
5.0 Miscellaneous: 5.1 Customizing Settings; 5.2 Window Behavior; 5.3 References; 5.4 Legalities
6.0 Version History

1.0 Welcome to Sherpa

1.1 Statement of Program Philosophy

Sherpa is designed to be a robust, easy to use aide-de-camp in LC/MS and MS/MS interpretation. By automating simple but tediously repetitive calculations it allows the user to quickly determine the obvious and spend time exploring interpretations for the not so obvious. It is not, however, a black box that takes in raw data at one end and returns a full interpretation at the other end. At least a basic understanding of the principles involved in mass spectral interpretation as well as data acquisition is necessary to obtain useful information from its searches.

In an ideal world users would not need to use manuals in order to understand how to run programs. To that end I have tried to keep simplicity and user-friendliness foremost in the design of Sherpa. Interpretation with Sherpa is set up to be a dynamic process in which the user can play with a search's settings, run the search, change the settings, run the search again, and so on, as opposed to a linear, one-shot interpretation. This means that the user should feel free to experiment with optimizing the various settings without fear of being unable to get back easily to where they started.

This user guide primarily describes what the various user interface items of the program do. See Taylor et al. (1996) for a more applied discussion of the program.

1.2 System Requirements

To use Sherpa you must have:

A Macintosh 68020 or better

System 7.0 or higher

2.5 MB of RAM to run

LC/MS files must be in Sciex API III, Sciex API 100/300, or Finnigan SSQ/TSQ format.

MS/MS files in any of the above formats can be read natively. MS/MS files in a format that is not supported natively can be imported by saving the data in a text-only format in which each line of the file is an m/z value followed by a tab and then its corresponding intensity.
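
As an illustration of the text import format, the following Python sketch reads and writes a peak list with one "m/z, tab, intensity" pair per line. It is not part of Sherpa; the function names parse_peak_list and format_peak_list are hypothetical, and the peak values are invented for the example.

```python
def parse_peak_list(text):
    """Parse 'm/z<TAB>intensity' lines into a list of (mz, intensity) tuples."""
    peaks = []
    # splitlines() accepts CR, LF, and CRLF line endings
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        mz, intensity = line.split("\t")
        peaks.append((float(mz), float(intensity)))
    return peaks


def format_peak_list(peaks):
    """Write (mz, intensity) pairs as one tab-separated pair per line."""
    return "\n".join(f"{mz}\t{intensity}" for mz, intensity in peaks)


if __name__ == "__main__":
    example = "175.1\t1200\n304.2\t850\n"  # made-up peaks
    print(parse_peak_list(example))
```

Because splitlines() handles carriage returns as well as line feeds, the same parser should work for text files saved on a Macintosh or on other platforms.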

1.3 Quick Start

1.3.1 Opening a Protein Sequence

Sequences can be opened in one of three ways:

Selecting New Sequence from the File menu and typing or pasting the sequence into the text editor at the bottom of the new sequence window.
Selecting Load Sequence... from the File menu to open an existing sequence file previously created by Sherpa, a MacBioSpec file, or any file saved as text only.
Double clicking on a previously created Sherpa sequence file or opening a previously created Sherpa sequence file from the Finder by using the Open command in the Finder's File menu.

The only limit to the number of sequences that can be open at one time is the available memory.

1.3.2 The Sequence Window

Once a sequence is loaded, it will appear in a sequence window with the sequence name as its title. Each sequence loaded will be displayed in its own sequence window. The amino acid sequence is shown in a text editor and can be edited directly by the user.

The N-terminal and C-terminal groups of the protein can be set using the N-terminal Group... and C-terminal Group... buttons in the Sequence Window or by using the same commands in the Options menu. The amino acid set can be defined by using the Define Residues... button or its corresponding command in the Options menu. To quickly replace one residue with another in the sequence use the Search For... command in the Edit menu. Sherpa continuously updates the residue position and mass of the region selected in the sequence.

The Digest button performs a theoretical digest on the sequence and displays the fragments in a digest window. See the Digest Window topic below for more information.

To the right of the sequence display is a box with the sequence's (or selection's) amino acid composition. There are also additional buttons along the right side of the window. The CID button displays the predicted MS/MS fragments for the highlighted section of sequence in a new window. The Copy to Report button prints the sequence and information about the sequence in the Search Results window. The Set Seq Name button allows the user to change the name of the sequence. The Seq Notes button will display any notes which have been associated with the sequence.

1.3.3 The Digest Window

When a sequence in a sequence window is digested, the theoretical fragments are displayed in a digest window. The type of enzymatic cleavage can be set by selecting the Digest Method... button. The cleavage agent(s) currently in use are displayed to the right of this button. The display format of the fragments can be either "MW", which gives the monoisotopic and average masses for each fragment, or "Charge-state" which displays the m/z for 1 to 6 charges for each peptide. The number of partial sites allowed, which is the maximum number of consensus cleavage sites tolerated inside a peptide, is set using a popup menu. A popup menu is also used to select whether the fragments are sorted by sequence position, mass, or HPLC index.

1.3.4 Opening an LC/MS File

An LC/MS file in Sciex API III, Sciex API 100/300, or Finnigan SSQ/TSQ format can be opened by selecting Load LC/MS File 1... from the File menu. If an LC/MS file is already loaded, an additional LC/MS file can be loaded by selecting Load LC/MS File 2... from the File menu, or the new LC/MS file can replace the first LC/MS file by selecting Load LC/MS File 1... again. Reloading an LC/MS file into either of the file 1 or file 2 slots will simply replace the LC/MS file currently loaded in that slot.

1.3.5 Running a Primary Search

Once an LC/MS file has been loaded, the Primary Search option in the Searches menu becomes active. The Primary Search consists of four search options: a MW grouping search which locates potential charge-state groups, a peptide matching search which compares one or more theoretically digested sequences with the LC/MS data, an unidentified ions option to report prominent peaks that were not MW grouped or matched to a peptide, and a data file comparison option which correlates search results by whether they appear in one or both of the loaded LC/MS files (if two LC/MS files are loaded). The results of a Primary Search are displayed in the Search Results Window.

The Primary Search Settings Dialog can be accessed by the Primary Search Settings... option in the Searches menu. The search settings can also be accessed individually in the Options menu.

1.3.6 Using PepID to Interpret an MS/MS Spectrum

PepID, an algorithm developed by Richard Johnson at the University of Washington, can be used to correlate an MS/MS spectrum with a protein sequence. After searching the sequence to find each peptide whose mass is within tolerance of the MW of the parent, the candidate sequences are scored to determine what percentage of the total ion current from the MS/MS peak list they can account for by conventional low energy fragmentation. In order to run a PepID search, an MS/MS file must be loaded.

2.0 Handling Protein Sequences

2.1 Opening a Protein Sequence

Sequences can be opened in one of three ways:

Selecting New Sequence from the File menu and typing or pasting the sequence into the text editor at the bottom of the new sequence window.

Selecting Load Sequence... from the File menu to open an existing sequence file previously created by Sherpa, a MacBioSpec file, or any file saved as text only.

Double clicking on a previously created Sherpa sequence file or opening a previously created Sherpa sequence file from the Finder by using the Open command in the Finder's File menu.

Any typed or opened sequence is stripped of any non-alphabetic characters and then capitalized before being displayed in the sequence window. The only limit to the number of sequences that can be open at one time is the available memory.
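
The clean-up step described above can be pictured with a short Python sketch. This is only an approximation of the behavior stated in this guide (strip non-alphabetic characters, then capitalize); the function name clean_sequence is hypothetical, and restricting the test to ASCII letters is an assumption.

```python
def clean_sequence(raw):
    """Drop everything that is not an ASCII letter, then convert to upper case."""
    return "".join(ch for ch in raw if ch.isascii() and ch.isalpha()).upper()


if __name__ == "__main__":
    # Position numbers, spaces, and line breaks from a pasted listing are removed.
    print(clean_sequence("  1 mkwvtf isl\n 11 LLF"))
```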

2.2 Saving a Sequence

Selecting Save As... from the File menu will display a standard save dialog with the sequence name as the default file name. Selecting Save (command-S) from the File menu will save the sequence and its information back to the sequence file if one already exists and can be located. If Save is selected and a sequence file does not already exist or cannot be located then Sherpa will display a standard save dialog as though Save As... had been selected.

When Sherpa saves a sequence file it places the sequence in the data fork of the file and information about its termini and any custom amino acids used in the resource fork of the file. Hence, to any other program the sequence file will appear to be a normal text file containing just the sequence. But when a Sherpa-created sequence file is opened again in Sherpa, it will retrieve and reset the custom information saved with the sequence.

When saving a sequence that has been crosslinked to another sequence, Sherpa gives the user the option to save the sequence as a regular sequence or as part of a linked sequence file which contains all the involved sequences and crosslinks. The "Insulin [human]" file in the sample sequences folder is an example of a linked sequence file.

2.3 The Sequence Window

Once a sequence is loaded, it appears in a sequence window with the sequence file's name as its title. Each sequence loaded is displayed in its own sequence window.

The amino acid sequence is shown in a text editor and can be edited directly by the user. When changes are made to the sequence, Sherpa instantly updates both the values displayed in the sequence window and the fragments displayed in the digest window, if it has been opened. The location of the cursor selection in the sequence is shown in the small box directly above the sequence editor along with the mass of the region currently selected.

Note: The mass displayed in the selection box is not a residue mass but a peptide mass that includes an N-terminal and a C-terminal group (H and OH respectively, unless the selection includes the protein's N- or C-terminus, in which case the user-specified terminal group is included in the mass).
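
The following Python sketch shows one way to compute such a peptide mass for a selection. It is an illustration and not Sherpa's code: the name selection_mass is hypothetical, only the 20 common residues are covered, and monoisotopic masses are used throughout.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
H_MASS = 1.00783    # default N-terminal group (H)
OH_MASS = 17.00274  # default C-terminal group (OH)


def selection_mass(selection, n_term=H_MASS, c_term=OH_MASS):
    """Peptide mass = N-terminal group + sum of residue masses + C-terminal group."""
    return n_term + sum(RESIDUE_MASS[aa] for aa in selection) + c_term


if __name__ == "__main__":
    print(round(selection_mass("GAS"), 4))
    # A selection that starts at an acetylated protein N-terminus (acetyl = 43.01839 Da)
    print(round(selection_mass("GAS", n_term=43.01839), 4))
```

Passing a different n_term or c_term value mimics the case where the selection includes a protein terminus carrying a user-specified group.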

The N-terminal and C-terminal groups of the protein can be set using the N-terminal Group... and C-terminal Group... buttons in the sequence window or by using the same commands in the Options menu. The amino acid set can be defined by using the Define Residues... button or its corresponding command in the Options menu. (See the Defining Custom Amino Acids and Termini topic for more information.) Any non-standard amino acid assignments are displayed to the right of the Define Residues... button. To quickly replace one residue with another in the sequence use the Search For... command in the Edit menu.

An estimated pI value and molar extinction coefficient at 280 nm can be calculated and displayed in the sequence window by selecting More Options... from the Options menu. The isoelectric point is estimated by using pKa's from Bull (1971) in the Henderson-Hasselbalch equation to find the pH where the magnitude of the net charge on the protein is less than 0.01. The extinction coefficient is estimated using the coefficients derived by Mach et al. (1992).
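
A minimal Python sketch of this kind of pI estimate is given below. It is not Sherpa's implementation. The pKa values are generic illustrative numbers and are NOT the Bull (1971) values, the bisection search is an assumed search strategy, and the names net_charge and estimate_pi are hypothetical.

```python
# Illustrative pKa values only; these are NOT the Bull (1971) table.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM = 8.0
PKA_C_TERM = 3.1


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # deprotonated C-terminus
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def estimate_pi(seq, tol=0.01):
    """Bisect between pH 0 and 14 until the magnitude of the net charge is below tol."""
    lo, hi = 0.0, 14.0
    while True:
        mid = (lo + hi) / 2.0
        q = net_charge(seq, mid)
        if abs(q) < tol or hi - lo < 1e-6:
            return mid
        if q > 0:
            lo = mid  # still positively charged: the pI lies at higher pH
        else:
            hi = mid


if __name__ == "__main__":
    print(round(estimate_pi("ACDEFGHIKLMNPQRSTVWY"), 2))
```

The bisection works because the net charge decreases steadily as the pH rises, so there is a single crossing point between pH 0 and pH 14.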

The Digest button performs a theoretical digest on the sequence and displays the fragments in a digest window. See the Digest Window topic below for more information.

To the right of the sequence display is a box with the sequence's (or selection's) amino acid composition. The display of the amino acid composition can be turned on and off by selecting More Options... from the Options menu. There are also additional buttons along the right side of the window. The CID button displays the predicted MS/MS fragments for the highlighted section of sequence in a new window. The Copy to Report button prints the sequence and information about the sequence in the Search Results window. The Set Seq Name button allows the user to change the name of the sequence. The Seq Notes button will display any notes which have been associated with the sequence. The Text to Speech button allows the user to play back the displayed sequence (or selection). The pause length after every third residue can be adjusted via More Options... from the Options menu. This can be useful in proofreading sequences entered by hand. Note that this button is only visible on systems which have the speech manager installed.

The elemental composition of the sequence (or selection) can be displayed below the sequence by selecting More Options... from the Options menu. If any crosslinks have been created, a toggle switch also appears below the sequence which allows the user to include crosslinks in the calculation of mass and elemental composition. See the Defining Crosslinks topic below for more information.

To dispose of a sequence window (and any associated digest or notes windows) use the Close Sequence submenu in the File menu. Clicking on the Go Away Box makes the sequence window disappear but does NOT dispose of it. (This is convenient for keeping down window clutter.) See the Window Behavior topic for more information.

2.4 Defining Custom Amino Acids and Termini

To define a custom amino acid for a sequence, bring its sequence or digest window to the front and select Define Residues... from the Options menu. Alternatively, use the Define Residues... button in the sequence window. A scrolling list will appear with the 20 common amino acids and 6 undefined amino acids (B, J, O, U, X, and Z).

To change an amino acid assignment, double click on the line in the list. A dialog containing a list of custom amino acids to choose from is then displayed. Buttons in the custom amino acid dialog allow you to create, edit, or delete custom amino acids from the list.

A custom amino acid is selected by double clicking on it or by highlighting it and choosing OK. Termini are selected, created, or deleted in a similar way.

The libraries of custom amino acids and termini are stored in the Sherpa Preferences file located in the Preferences folder in the System folder. Each time that Sherpa is started, it opens the preferences file and loads in this custom information along with the user settings. If the preferences file cannot be found, the program creates a new file with default settings and modification libraries.

Termini and custom residue information for a sequence are stored with the sequence files. If the program opens a Sherpa-created sequence file and finds a custom residue or terminus that is not already in its custom library, it will automatically add the new modifications to the library in the preferences file.

2.5 The Digest Window

When a sequence in a sequence window is digested, the theoretical fragments are displayed in a digest window. Any changes made in a sequence window are automatically reflected in the corresponding digest window.

The type of enzymatic cleavage can be set by selecting the Digest Method... button. The cleavage agent(s) currently in use are displayed to the right of this button. The display format of the fragments can be either "MW", which gives the monoisotopic and average masses for each fragment, or "Charge-state" which displays the m/z for 1 to 6 charges for each peptide. The number of partial sites allowed, which is the maximum number of consensus cleavage sites tolerated inside a peptide, is set using a popup menu. A popup menu is also used to select whether the fragments are sorted by sequence position, by mass, or by HPLC index (see below).
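
To make the idea of partial sites concrete, here is a small Python sketch of a theoretical digest. It is an illustration under stated assumptions and not Sherpa's digest code: the cleavage rule (cut after K or R unless the next residue is P) is a commonly used trypsin rule chosen for the example, and the names cleavage_sites and digest are hypothetical.

```python
def cleavage_sites(seq):
    """Positions after which the assumed trypsin rule cuts (after K/R, not before P)."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def digest(seq, max_partials=0):
    """Return (start, end, peptide) tuples, 1-based and inclusive, sorted by position.

    max_partials is the largest number of uncut cleavage sites allowed
    inside a single peptide.
    """
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    last = len(bounds) - 1
    fragments = []
    for i in range(last):
        # j - i - 1 is the number of cleavage sites left inside the peptide
        for j in range(i + 1, min(i + 1 + max_partials, last) + 1):
            start, end = bounds[i], bounds[j]
            fragments.append((start + 1, end, seq[start:end]))
    return fragments


if __name__ == "__main__":
    for fragment in digest("AKGRPLKM", max_partials=1):
        print(fragment)
```

With max_partials set to 0 only the complete-digest peptides are listed; each increment adds the peptides that span one more uncut site.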

At the far right is a Display Options button which displays the digest display options dialog. In the digest display options dialog you can toggle between the first column displaying the fragment number or the HPLC index for the fragment. There is an option that allows for the display of the full sequence of the predicted fragments instead of abbreviating fragments that are too long. The user can also choose the number of charge-states and decimal places to display and whether they are calculated as positive or negative ions.
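
The charge-state display can be reproduced with a one-line calculation, sketched here in Python. The sketch assumes that each charge comes from gaining a proton (positive ions) or losing a proton (negative ions); the name charge_state_mz is hypothetical.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_state_mz(mw, max_charge=6, negative=False):
    """m/z values for charges 1..max_charge of a species with molecular weight mw."""
    sign = -1 if negative else 1
    return [(mw + sign * z * PROTON) / z for z in range(1, max_charge + 1)]


if __name__ == "__main__":
    for z, mz in enumerate(charge_state_mz(2000.0), start=1):
        print(z, round(mz, 2))
```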

The HPLC index uses the amino acid retention coefficients derived by Browne et al. (1982) to predict the percent acetonitrile at which the peptide should elute from a C₁₈ reversed-phase column with 0.1% TFA as the ion-pairing agent. Some peptides, especially longer ones, may have values outside the range of 0-100%, but these values may still be informative as a relative measure of a peptide's hydrophobic character.

User Tip: Printing out the charge-state "scorecard" of the peptides sorted by mass is often useful for keeping track of observed and identified ions.

2.6 Defining Crosslinks

Crosslinks are created within the crosslink dialog, which is accessed by selecting Crosslinks... from the Options menu. Crosslink types can be created or edited to suit the user's needs. Note that editing sequences into which crosslinks have been introduced can cause crosslinks to become invalid, in which case they are deleted. Crosslinks are not considered by the secondary searches or by the open-ended search.

3.0 LC/MS Interpretation

3.1 Opening an LC/MS File

An LC/MS file in Sciex API III, Sciex API 100/300, or Finnigan SSQ/TSQ format can be opened by selecting Load LC/MS File 1... from the File menu. Finnigan (".dat") files must be FTP'd to the Macintosh in binary mode and given a file type of '????' or 'BINA'. (The default binary file type can be set in the preferences of either Telnet or Fetch.) If an LC/MS file is already loaded, an additional LC/MS file can be loaded by selecting Load LC/MS File 2... from the File menu, or the new LC/MS file can replace the first LC/MS file by selecting Load LC/MS File 1... again. Reloading an LC/MS file into either of the file 1 or file 2 slots will simply replace the LC/MS file currently loaded in that slot.

3.2 The LC/MS Import Dialog

Once an LC/MS file has been selected, a dialog will appear in which the import parameters are set.

The top portion of the LC/MS import dialog contains information about how the LC/MS file was acquired: the number of scans acquired, whether it was positive or negative ion data, whether the data is centroided or profile, the exact step size used in acquisition (including any mass defect), and the m/z range that was scanned.

Note: There is a user override mode to allow the toggling of positive ion/negative ion and profile/centroid. Hold down the option key and click on the line of text to toggle it. This capability is provided in the event that the file type is somehow misinterpreted in the initial analysis.

The bottom portion of the dialog contains the import settings which are initially set to the default values. (For information on changing the default import settings see the Customizing Settings topic.) The first settings box determines which portion of the LC/MS file will be imported for analysis. It will always default to importing the entire scan range and mass range of the file.

The second settings box contains the settings for processing each scan of the LC/MS file to find its peaks: the intensity threshold, the peaktop minimum, and the minimum m/z peak width. These are the most critical import parameters and also the parameters that vary the most from LC/MS file to LC/MS file. The relationship of these parameters is illustrated in the figure below. If the data is centroided, the minimum m/z peak width parameter is ignored. Each scan can also optionally be smoothed before processing.

The third settings box contains chromatographic parameters. The minimum number of scans for a "chromatographic peak" determines how many consecutive scans an ion must appear in to be considered a chromatographic peak. The chromatographic hole tolerance is the number of consecutive scans without signal that are tolerated within a chromatographic peak.

Note: In the case of Sciex API 100/300 LC/MS files which contain multiple experiments in a single period, when importing one of these experiments for processing, the chromatographic hole tolerance must be set to at least the number of scans per cycle minus 1 or no peaks will be found.

Once the OK button in the LC/MS Import Dialog is pressed, a small progress dialog will appear while the LC/MS data is being imported and processed. The peaks found in each scan are compared to a list of ongoing chromatographic peaks. If an m/z matches that of an ongoing peak it is added to it. If an m/z does not match that of an ongoing peak it becomes a new peak. If a chromatographic peak does not extend for the minimum number of scans, as set in the import dialog, it is removed from the list. The importing process can be aborted by pressing command-period; any data processed up to that point will be saved.
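
The bookkeeping described above can be sketched in Python as follows. This is an interpretation of the description and not Sherpa's code. The m/z matching tolerance (mz_tol), the decision to count scans with signal toward the minimum, and the name build_chromatographic_peaks are all assumptions made for the example.

```python
def build_chromatographic_peaks(scans, mz_tol=0.5, min_scans=3, hole_tolerance=1):
    """scans: one list of (mz, intensity) peaks per scan, in acquisition order."""
    ongoing = []   # chromatographic peaks that can still be extended
    finished = []  # peaks whose gap exceeded the hole tolerance
    for scan_no, scan in enumerate(scans, start=1):
        for mz, intensity in scan:
            for peak in ongoing:
                if abs(peak["mz"] - mz) <= mz_tol:
                    peak["scans"].append((scan_no, intensity))  # extend ongoing peak
                    break
            else:
                ongoing.append({"mz": mz, "scans": [(scan_no, intensity)]})  # new peak
        still_open = []
        for peak in ongoing:
            missed = scan_no - peak["scans"][-1][0]  # consecutive scans without signal
            (finished if missed > hole_tolerance else still_open).append(peak)
        ongoing = still_open
    finished.extend(ongoing)

    result = []
    for peak in finished:
        if len(peak["scans"]) < min_scans:
            continue  # too short to count as a chromatographic peak
        apex_scan, apex_intensity = max(peak["scans"], key=lambda s: s[1])
        result.append({"mz": peak["mz"], "apex_scan": apex_scan,
                       "intensity": apex_intensity, "n_scans": len(peak["scans"])})
    return sorted(result, key=lambda p: p["mz"])  # sorted by mass, as in the peaks window


if __name__ == "__main__":
    demo = [[(500.0, 10)], [(500.1, 50)], [], [(500.0, 30)], [(700.0, 5)]]
    print(build_chromatographic_peaks(demo))
```

In the demo data the ion near m/z 500 is missing from the third scan; with a hole tolerance of 1 it is still assembled into a single peak, while the ion at m/z 700 appears in only one scan and is dropped.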

When Sherpa is finished processing the LC/MS data it will display the list of chromatographic peaks, sorted by mass, in an LC/MS Peaks window and give the total number of peaks at the bottom. This window is for display purposes only; the data cannot be edited and fed back into the program. Reloading an LC/MS file into either of the file 1 or file 2 slots will simply replace the LC/MS file currently loaded in that slot and replace the data in its LC/MS Peaks window.

It cannot be stressed enough that using proper thresholds and import ranges when importing LC/MS data is critical to all subsequent analyses. If the processing appears to be going very slowly or you get an exorbitant number of peaks (greater than a thousand or so), try reimporting the data with higher thresholds or with the import range cropped to exclude as much salt, junk, noise, etc. as possible. It is often useful to examine the data imported under several different conditions.

3.3 Running a Primary Search

Once an LC/MS file has been loaded, the Primary Search option in the Searches menu becomes active. The Primary Search consists of four search options: a MW grouping search which locates potential charge state families, a peptide matching search which compares one or more theoretically digested sequences with the LC/MS data, an unidentified ions option to report prominent peaks that were not MW grouped or matched to a peptide, and a data file comparison option which correlates search results by whether they appear in one or both of the loaded LC/MS files (if two LC/MS files are loaded). The results of a Primary Search are displayed in the Search Results Window.

3.3.1 MW Grouper

The MW grouper combs the peak list data from the LC/MS file to find potential ion charge-state families derived from the same molecular species (assuming a proton as the charge agent). To be considered a group, the ions must 1) be consecutive charge-states of the same MW and 2) elute at the same time (have the same apex scan). In the MW Grouper Settings dialog the following parameters can be set: the minimum and maximum charge states to consider, the error tolerance method used, the apex scan tolerance, and the minimum number of items for a group. The MW Grouper Settings dialog can be accessed either through the Options menu or through the Primary Search Settings dialog. The results of the MW grouper are often used by other searches. The default Sherpa method for the MW error tolerances is as follows: the error for each peak is equal to the charge-state times the step size. The minimum MW error tolerance for each peak is 0.5 Da.
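
A simplified Python sketch of this kind of charge-state grouping follows. It is not the Sherpa algorithm itself. It uses the default tolerance rule stated above (charge-state times step size, with a 0.5 Da minimum) and assumes protonated positive ions; the function names and the way duplicate families are suppressed are assumptions.

```python
PROTON = 1.00728  # mass of a proton in Da


def mw_tolerance(z, step_size):
    """Default rule from the text: charge-state times step size, at least 0.5 Da."""
    return max(0.5, z * step_size)


def find_member(peaks, mw, z, apex, step_size, scan_tol):
    """Return a peak that gives molecular weight mw at charge z near the apex scan."""
    for mz, scan in peaks:
        if (abs(z * (mz - PROTON) - mw) <= mw_tolerance(z, step_size)
                and abs(scan - apex) <= scan_tol):
            return mz, scan
    return None


def group_charge_states(peaks, min_z=1, max_z=6, step_size=0.1,
                        scan_tol=2, min_items=2):
    """peaks: list of (mz, apex_scan). Returns families of consecutive charge states."""
    groups = []
    for mz, apex in peaks:
        for z in range(min_z, max_z + 1):
            mw = z * (mz - PROTON)
            # Skip seeds that only continue a family starting at a lower charge.
            if z > min_z and find_member(peaks, mw, z - 1, apex, step_size, scan_tol):
                continue
            members = [(z, mz)]
            for next_z in range(z + 1, max_z + 1):
                hit = find_member(peaks, mw, next_z, apex, step_size, scan_tol)
                if hit is None:
                    break  # charge states must be consecutive
                members.append((next_z, hit[0]))
            if len(members) >= min_items:
                groups.append({"mw": mw, "apex_scan": apex, "members": members})
    return groups


if __name__ == "__main__":
    # Two ions of a 2000 Da species (2+ and 3+) plus an unrelated ion.
    demo = [(1001.00728, 40), (667.67395, 41), (450.0, 10)]
    print(group_charge_states(demo))
```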

3.3.2 Peptide Match

The peptide match search compares the ions from the LC/MS peak list and any MW groups to the predicted digest fragments for the sequence or sequences selected. MW groups that match a peptide are termed "probable peptide matches". Single ions that match a peptide are termed "possible peptide matches".

Parameters for peptide matching are set in the Peptide Match Settings dialog which can be accessed either through Seq Match Settings... in the Options menu or through the Primary Search Settings dialog. The peptides can be matched either using their average masses or by monoisotopic masses below, and average masses above, a specified switch mass. [This is the only location to set this parameter, which is used by any other search that matches to peptides.] The charge state search range for ungrouped ions and the error tolerance used in matching are other user-definable parameters. The error tolerance also has an option to boost the matching error by 0.25 Da for every 1000 Da of peptide MW.
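
The two mass-related settings can be illustrated with the Python sketch below. It rests on two assumptions that this guide does not spell out: that the 0.25 Da boost scales proportionally with peptide MW rather than in whole 1000 Da steps, and that the average mass is the value compared with the switch mass. The function names are hypothetical.

```python
def matching_tolerance(base_tol, peptide_mw, boost=False):
    """Error tolerance in Da, optionally boosted by 0.25 Da per 1000 Da of peptide MW."""
    if boost:
        return base_tol + 0.25 * (peptide_mw / 1000.0)  # assumed proportional scaling
    return base_tol


def reference_mass(mono_mass, avg_mass, switch_mass=None):
    """Pick the monoisotopic mass below the switch mass and the average mass above it.

    With switch_mass=None the average mass is always used.
    """
    if switch_mass is not None and avg_mass < switch_mass:
        return mono_mass
    return avg_mass


if __name__ == "__main__":
    print(matching_tolerance(0.5, 2000.0, boost=True))
    print(reference_mass(999.5, 1000.1, switch_mass=1500.0))
```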

3.3.3 Unidentified Ions

Unidentified ions are those ions which were not considered part of a group by the MW grouper or matched to peptides. Parameters for the reporting of unidentified ions are set in the Unidentified Ion Reporting dialog which can be accessed either through the Options menu or through the Primary Search Settings dialog. All unidentified ions above a specified intensity threshold can be reported; or, alternatively, the top 10, 25, 50, or 100 most intense unidentified ions can be reported. The list of unidentified ions that is displayed can be sorted by m/z, intensity, or scan.

3.3.4 Comparing Data Files

To compare the data in two LC/MS files, load them into the two LC/MS data file slots and select "Compare Datafiles" in the Primary Search settings. LC/MS data acquired with different acquisition settings (e.g. a different step size or mass range) can be compared. It is strongly recommended, however, that the same gradient/HPLC conditions be used for LC/MS data being compared. Currently only like-ion data can be compared; e.g. two positive ion LC/MS files or two negative ion LC/MS files can be compared but not a positive ion file and a negative ion file.

The comparison is based on the use of a linear fit of the apexscans of overlapping chromatographic ions to derive a relationship for what the corresponding scan number would be in the first LC/MS file given a scan number in the second LC/MS file. The fitting can be done automatically or a manual fit can be entered. For the first step in the automatic fitting, the lists of chromatographic ions (characterized by three parameters - m/z, apexscan, and intensity) from each LC/MS file are compared and any ions whose m/z values are within a small user specified tolerance of one another add a point to an x,y plot where x is the apexscan of the ion from the first LC/MS file and y is the apexscan of the ion from the second LC/MS file. If more than 3 points have the same value of x or y, they are all removed to prevent them from skewing the fit. A recursive linear least-squares fit is then performed on the remaining points. If the most outlying point's deviation from the fit line is greater than 1.5 times the standard deviation computed without that point then the point is removed. This is repeated until the most outlying point is within tolerance or until 40% of the points have been thrown out. The final fit line is considered valid if there were more than three points before fitting, the correlation coefficient for the line is greater than 0.7, and the slope of the line is greater than 0.1.
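
The automatic fitting procedure can be sketched in Python as shown below. This follows the steps as described in this guide but is not the program's actual code. In particular, the "standard deviation computed without that point" is interpreted here as the root-mean-square residual of the remaining points about the current fit line, and the "more than three points" check is applied after duplicate removal; both are assumptions, as are the function names.

```python
from collections import Counter
from math import sqrt


def least_squares(points):
    """Return (slope, intercept, r) for y = slope * x + intercept, or None if degenerate."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        return None
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)


def fit_apex_scans(points, sd_factor=1.5, max_removed=0.4):
    """points: (apexscan in file 1, apexscan in file 2) pairs. Returns (slope, intercept) or None."""
    x_counts = Counter(x for x, _ in points)
    y_counts = Counter(y for _, y in points)
    # Drop points when more than 3 share the same x or the same y value.
    kept = [(x, y) for x, y in points if x_counts[x] <= 3 and y_counts[y] <= 3]
    n_start = len(kept)
    if n_start <= 3:
        return None
    removed = 0
    while True:
        fit = least_squares(kept)
        if fit is None:
            return None
        slope, intercept, r = fit
        residuals = [abs(y - (slope * x + intercept)) for x, y in kept]
        worst = max(range(len(kept)), key=residuals.__getitem__)
        others = [res for i, res in enumerate(residuals) if i != worst]
        sd = sqrt(sum(res ** 2 for res in others) / len(others))
        within_tolerance = residuals[worst] <= sd_factor * sd
        limit_reached = (removed + 1) / n_start > max_removed or len(kept) <= 3
        if within_tolerance or limit_reached:
            break
        del kept[worst]  # discard the most outlying point and refit
        removed += 1
    if r > 0.7 and slope > 0.1:
        return slope, intercept
    return None


if __name__ == "__main__":
    demo = [(x, 2 * x + 5) for x in range(10, 101, 10)] + [(55, 400)]
    print(fit_apex_scans(demo))
```

In the demo, ten points lie on the line y = 2x + 5 and one point is far off the line; the outlier is discarded in the first pass and the remaining points define the fit.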

The settings for LC/MS data file comparison can be set by selecting LC/MS File Comparison Settings... from the Options menu or via the Datafile Comparison Settings... button in the Primary Search Settings dialog. Automatic fitting can be selected, or the user can manually enter the linear equation used to predict the scan in the other LC/MS file at which a peak observed at a given scan in one data file would appear. The user can also set the error tolerance (in m/z) and the apexscan tolerance around the predicted scan, which are used as criteria for a match.

Note: Currently only Primary Search results can be compared.

3.4 Running a Secondary Search

The secondary searches are searches to help identify modified peptides in the digest. At present the secondary searches available are glycosylation and phosphorylation. Once an LC/MS file has been loaded, the Secondary Search option in the Searches menu becomes active. The results of a Secondary Search are displayed in the Search Results Window.

3.4.1 Glycosylation

The glycosylation search works by looking for staircases due to carbohydrate microheterogeneity on a given peptide. It is based on the observation that nearly every glycosylation site is heterogeneous, with the differences being small increments of several carbohydrates. The isoform with more glycosylation is always more hydrophilic, so the staircases appear to step down and to the left in conventional, increasing organic gradient, reversed-phase setups. And, at least in positive-ion data, the glycopeptide ions are usually quite high in m/z due to addition of substantial carbohydrate mass without a corresponding increase in possible charge sites. Glycosylation indicator ions of HexNAc⁺, HexHexNAc⁺, and NeuNAc⁺ are also indicative of glycopeptides when scanned in a stepped orifice voltage mode to produce collision-induced fragmentation while scanning at low m/z (Bean et al. 1995, Carr et al. 1993).

The glycosylation search algorithm scans through the MW groups and ungrouped LC/MS peak data looking for mass differences that correspond to the carbohydrates selected for the search. The apex scan difference of a match must be within the user-specified range, and if indicator ion verification is turned on, an indicator peak must be within a specified number of scans of each match component. The indicator ions used are m/z 204 (HexNAc⁺), m/z 366 (HexHexNAc⁺), and, if NeuNAc is included in the staircase search, m/z 292 (NeuNAc⁺) and m/z 274 (NeuNAc⁺ - H₂O).
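
The mass-difference part of the staircase search can be illustrated with the Python sketch below. It is a simplified picture and not Sherpa's search: indicator ion verification is left out, average residue masses are used for the sugars, a single scan range is applied to all sugars, and the name find_sugar_steps and the default tolerances are assumptions.

```python
# Average residue masses (Da) of the carbohydrates.
SUGAR_MASSES = {"Hex": 162.14, "HexNAc": 203.19, "Fuc": 146.14, "NeuNAc": 291.26}


def find_sugar_steps(species, sugars=("Hex", "HexNAc"), mass_tol=0.5,
                     scan_range=(0, 10)):
    """species: list of (mw, apex_scan). Returns (lighter_mw, heavier_mw, sugar) steps."""
    steps = []
    for light_mw, light_scan in species:
        for heavy_mw, heavy_scan in species:
            # The more glycosylated form is expected to elute earlier.
            scan_diff = light_scan - heavy_scan
            if not scan_range[0] <= scan_diff <= scan_range[1]:
                continue
            for sugar in sugars:
                if abs((heavy_mw - light_mw) - SUGAR_MASSES[sugar]) <= mass_tol:
                    steps.append((light_mw, heavy_mw, sugar))
    return steps


if __name__ == "__main__":
    demo = [(3000.0, 50), (3162.1, 48), (3365.3, 47), (5000.0, 10)]
    for step in find_sugar_steps(demo):
        print(step)
```

Chaining the steps that share a molecular weight (3000.0 to 3162.1 to 3365.3 in the demo) gives one staircase.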

Once staircases are found in the data, if a sequence or sequences have been loaded and selected in the Secondary Search Settings dialog, Sherpa attempts to jump from the lowest step of each staircase to the predicted peptides by carbohydrate increments. There is an N-link search which only tries to match to peptides containing an N-link consensus site, Asn-X-Ser/Thr-X where X can be any amino acid but proline. There is also an N&O-link search which tries to match the steps to peptides which contain Ser or Thr (as there is no real consensus O-link attachment site).
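
Locating the consensus sites in a peptide is a simple pattern match, sketched here in Python with a regular expression. The pattern follows the Asn-X-Ser/Thr-X rule exactly as written above, so a site whose Ser/Thr is the last residue of the peptide is not reported; whether Sherpa treats that case the same way is not stated in this guide. The function names are hypothetical.

```python
import re

# Asn, any residue but Pro, Ser or Thr, any residue but Pro.
# The lookahead lets overlapping sites be found.
N_LINK_SITE = re.compile(r"(?=N[^P][ST][^P])")


def n_link_sites(seq):
    """1-based positions of Asn residues that start an N-link consensus site."""
    return [match.start() + 1 for match in N_LINK_SITE.finditer(seq)]


def may_be_o_linked(seq):
    """The O-link criterion described above: the peptide contains Ser or Thr."""
    return "S" in seq or "T" in seq


if __name__ == "__main__":
    print(n_link_sites("AANGSAA"))
    print(n_link_sites("ANPSA"))
```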

The settings for the glycosylation search are accessed via the Secondary Search Settings submenu in the Options menu or from the Secondary Search Settings dialog from the Searches menu. The origin of the protein can be specified as either yeast or mammalian. Selecting yeast as the origin will exclude NeuNAc and Fucose from being searched for and will also slightly alter the carbohydrate ranges used when attempting to match the staircases to the peptides. The apex scan difference range for each carbohydrate can be set by a pair of sliders. The upper (yellow) slider sets the minimum apex scan difference for ions differing by that sugar and the lower (red) slider sets the maximum apex scan difference.

3.4.2 Phosphorylation

Phosphorylation can be searched for in two ways, which can be used separately or together. The data template search looks for the mass difference of a phosphate between MW groups and peaks. The sequence template search looks for matches between the peaks and the predicted peptides by adding the mass of a phosphate to potential phosphorylation sites. The settings for the phosphorylation search are accessed via the Secondary Search Settings submenu in the Options menu or from the Secondary Search Settings dialog from the Searches menu. For the data template search, an apex scan difference range can be set. The sequence template search can be set to search for Ser/Thr phosphorylation and/or Tyr phosphorylation, and a maximum number of sites to consider per peptide can be specified. The charge state range to consider for ungrouped ions, the minimum rating to display, and the error tolerance for matching can also be set in this dialog.
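
The sequence template idea can be illustrated with a short Python sketch. It is not Sherpa's implementation: the function names are hypothetical, the peptide MW in the example is a made-up round number, and the value 79.96633 Da is the standard monoisotopic mass added by one phosphate group (HPO3).

```python
PHOSPHATE = 79.96633  # monoisotopic mass added by one HPO3 group (Da)


def phospho_candidates(peptide, base_mw, ser_thr=True, tyr=True, max_sites=3):
    """Return (n_phosphates, predicted MW) pairs for a peptide of known MW."""
    allowed = ("ST" if ser_thr else "") + ("Y" if tyr else "")
    n_sites = sum(1 for aa in peptide if aa in allowed)
    return [(n, base_mw + n * PHOSPHATE)
            for n in range(1, min(n_sites, max_sites) + 1)]


def match_peaks(candidates, observed_mws, tol=0.5):
    """Pair each predicted MW with every observed MW within the tolerance."""
    return [(n, mw, obs)
            for n, mw in candidates
            for obs in observed_mws
            if abs(mw - obs) <= tol]


# Hypothetical peptide with three potential sites (S, Y, T) and a made-up MW.
cands = phospho_candidates("ASYTK", 1000.0, max_sites=2)
print(match_peaks(cands, [1080.0, 1500.0]))
```

The data template search is the same comparison run between observed masses, looking for pairs of MW groups or peaks that differ by PHOSPHATE.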

Note:: The indicator ion verification option for the phosphorylation search is currently only usable for Sciex API 100/300 files.

3.5 Open-ended Search

The purpose of the open-ended search is to search a sequence or sequences for a given MW or m/z regardless of consensus proteolytic sites. Hence it is useful for detecting spurious proteolytic cleavages and identifying MW groups that were not matched to the theoretical digest fragments by peptide matching. The open-ended search window can be accessed from the Searches menu. FASTA-formatted protein databases (such as the OWL non-redundant protein database, downloadable from the NCBI at ftp://ncbi.nlm.nih.gov/repository/OWL/) can be used to perform a mass fingerprint search on a list of molecular weights entered in the small text box on the right of the window. The database search algorithm is still being optimized and hence is rather slow unless the two termini option is utilized.

The user can specify whether the mass to be searched for is a MW or an m/z. Alternatively, if a MW grouper search has been run, the groups that did not match the digest can be searched automatically. Either one sequence or all sequences can be searched. Termini constraints, that is, the number (0, 1, or 2) of the termini that must match predicted proteolytic sites, and the error tolerance to be used in matching can also be specified. The Mod. Options... button gives a dialog that allows for the consideration of post-translational and user-defined modifications (see the Potential Modifications topic, which follows).
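
To make the termini constraint concrete, here is a minimal Python sketch of an open-ended search over a single sequence. It is an illustration under stated assumptions, not Sherpa's algorithm: the function name is hypothetical, the residue masses are standard monoisotopic values, modifications are ignored, cleavage is assumed to occur after Lys or Arg, and the ends of the protein are assumed to count as matching termini.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def open_ended_search(seq, target_mw, tol=0.5, termini=0, cleave_after="KR"):
    """Return (start, end, subsequence, MW) hits; positions are 1-based.

    termini is the number (0, 1, or 2) of peptide ends that must coincide
    with a predicted cleavage site or with an end of the protein.
    """
    hits = []
    n = len(seq)
    for i in range(n):
        mass = WATER
        for j in range(i, n):
            mass += RESIDUE[seq[j]]
            if mass > target_mw + tol:
                break  # extending the peptide further only adds mass
            if abs(mass - target_mw) <= tol:
                n_ok = i == 0 or seq[i - 1] in cleave_after
                c_ok = j == n - 1 or seq[j] in cleave_after
                if n_ok + c_ok >= termini:
                    hits.append((i + 1, j + 1, seq[i:j + 1], round(mass, 4)))
    return hits


# Toy sequence: VTLR is bounded by cleavage sites, ASPK only at its C-terminus.
print(open_ended_search("GASPKVTLRAG", 487.31, termini=2))
print(open_ended_search("GASPKVTLRAG", 401.23, termini=2))
print(open_ended_search("GASPKVTLRAG", 401.23, termini=1))
```

Requiring two matching termini prunes most start and end positions before any mass is compared, which is consistent with the two termini option being the fast one for database searches.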

3.5.1 Potential Modifications

The Potential Modifications Dialog displays an array of common post-translational modifications which are considered during the open-ended or PepID search if they are selected. There is also a single AA substitution option (which should be used sparingly as it can generate large numbers of possible matches).

If a modification you wish to consider is not among the preset possibilities, you can define your own modifications in the user-defined modifications section at the bottom of the dialog. Select custom modifications you have created by clicking once on the modification in the scrolling list. A check appears to the left of the line to indicate that the modification will be included in the search. Clicking a second time on the line removes the custom modification from the search and removes the check to the left. The Create..., Edit..., and Delete buttons are used to manage the custom modifications.

The Create... and Edit... buttons display a dialog for you to input or check information about the custom modification.

Each modification must have: 1) a name of up to 29 characters, 2) a modification type (AA modification, N-terminal protein modification, N-terminal peptide modification, C-terminal protein modification, or C-terminal peptide modification), 3) a maximum number of modifications allowed per peptide, 4) a motif of a specified length, in which each position has a specified list of possible residues and one position is designated as the modified residue, 5) a net compositional change associated with the modification, which in turn is used to calculate the modification's delta mass, and 6) if the modification is an N-terminal or C-terminal modification, a unique short label of up to 6 characters to identify the modification in the output.
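
The six items above map naturally onto a small record. The Python sketch below shows one way to represent it and to derive the delta mass from the net compositional change. The class, field names, and type strings are hypothetical and are not Sherpa's internal format; the element masses are standard monoisotopic values.

```python
from dataclasses import dataclass

# Monoisotopic masses (Da) of the elements most often needed.
ELEMENT_MASS = {
    "C": 12.0, "H": 1.007825, "N": 14.003074,
    "O": 15.994915, "P": 30.973762, "S": 31.972071,
}

MOD_TYPES = {"AA", "N-term protein", "N-term peptide",
             "C-term protein", "C-term peptide"}


@dataclass
class CustomModification:
    name: str             # up to 29 characters
    mod_type: str         # one of MOD_TYPES
    max_per_peptide: int
    motif: list           # one string of allowed residues per motif position
    modified_index: int   # 0-based position of the modified residue
    composition: dict     # net elemental change, e.g. {"C": 2, "H": 2, "O": 1}
    label: str = ""       # up to 6 characters; needed for terminal types

    def __post_init__(self):
        if not 1 <= len(self.name) <= 29:
            raise ValueError("name must be 1 to 29 characters")
        if self.mod_type not in MOD_TYPES:
            raise ValueError("unknown modification type")
        if not 0 <= self.modified_index < len(self.motif):
            raise ValueError("modified residue must lie inside the motif")
        if self.mod_type != "AA" and not 1 <= len(self.label) <= 6:
            raise ValueError("terminal modifications need a 1 to 6 character label")

    @property
    def delta_mass(self):
        """Mass change implied by the net compositional change."""
        return sum(ELEMENT_MASS[el] * n for el, n in self.composition.items())


# Example: N-terminal acetylation adds C2H2O.
acetyl = CustomModification(
    name="Acetylation", mod_type="N-term protein", max_per_peptide=1,
    motif=["ACDEFGHIKLMNPQRSTVWY"], modified_index=0,
    composition={"C": 2, "H": 2, "O": 1}, label="Ac-",
)
print(round(acetyl.delta_mass, 4))  # 42.0106
```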

4.0 MS/MS Interpretation

4.1 Opening an MS/MS File

MS/MS files in Sciex API III, Sciex API 100/300, or Finnigan SSQ/TSQ formats can be read natively. Finnigan (".dat") files must be FTP'd to the Macintosh in binary mode and given a file type of '????' or 'BINA'. (The default binary file type can be set in the preferences of either Telnet or Fetch.) MS/MS files in any format not currently supported natively can be imported by saving the raw data in a text-only format in which each line of the file is an m/z value followed by a tab and then its corresponding intensity. An MS/MS file is imported by selecting Load MS/MS File... from the File menu.
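
For data that must go through the text-only route, the layout is simple enough to generate with a few lines of script. The Python sketch below writes and reads the one-pair-per-line, tab-separated layout described above. The function names are hypothetical, and details the text does not specify, such as the number of decimal places, are left at Python's defaults.

```python
def write_msms_text(peaks):
    """Format (m/z, intensity) pairs as one 'm/z<TAB>intensity' pair per line."""
    return "".join(f"{mz}\t{intensity}\n" for mz, intensity in peaks)


def parse_msms_text(text):
    """Parse 'm/z<TAB>intensity' lines into a list of (m/z, intensity) tuples."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        mz, intensity = line.split("\t")
        peaks.append((float(mz), float(intensity)))
    return peaks


text = write_msms_text([(175.12, 1200.0), (262.15, 850.0)])
print(parse_msms_text(text))
```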

4.2 The MS/MS Import Dialog

Once an MS/MS file has been selected, a dialog will appear in which the import parameters are set. The MS/MS Import Dialog is very similar to the LC/MS Import Dialog. The top portion of the MS/MS import dialog contains information about how the MS/MS file was acquired: the m/z of the parent ion, whether it was positive or negative ion data, whether the data is centroided or profile, the exact step size used in acquisition (including any mass defect), and the m/z range that was scanned.

Note:: There is a user override mode to allow the toggling of positive ion/negative ion and profile/centroid. Hold down the option key and click on the line of text to toggle it. The m/z of the parent ion can similarly be modified. This capability is provided in the event that the file type is somehow misinterpreted in the initial analysis.

The bottom portion of the dialog contains the import settings which are initially set to the default values. The first settings box determines which portion of the MS/MS file will be imported for analysis. It will always default to importing the entire mass range of the file.

The second settings box contains the settings for processing the scan to find its peaks - the intensity threshold, peaktop minimum and minimum m/z peak width. (See LC/MS Import Dialog for a discussion of these parameters.)

The third settings box contains a popup menu to set the number of times the scan should be smoothed before processing. The smoothing algorithm is identical to that used in MacSpec. This box also contains an option to limit the final number of imported ions to the X most intense ions, where X can be defined by the user.
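
The effect of the ion limit can be pictured with a short Python sketch. This only illustrates keeping the X most intense ions and is not the code Sherpa uses; the function name is hypothetical, and how Sherpa breaks ties between equally intense ions is not specified here.

```python
def keep_most_intense(peaks, max_ions):
    """Keep the max_ions most intense (m/z, intensity) pairs, in m/z order."""
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_ions]
    return sorted(top)  # restore ascending m/z order


peaks = [(101.1, 50.0), (175.1, 900.0), (201.1, 300.0), (288.2, 20.0)]
print(keep_most_intense(peaks, 2))  # [(175.1, 900.0), (201.1, 300.0)]
```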

Once the OK button in the MS/MS Import Dialog is pressed, the scan is processed and the MS/MS PepID Window will appear with the processed peaks listed in the small text editor on the left side of the window. If an MS/MS file is loaded after a previous MS/MS file has already been loaded, its peak data will replace the previous peak data, but the results of any PepID searches displayed in the main text editor at the bottom of the MS/MS PepID Window will remain.

4.3 Using PepID to Interpret an MS/MS Spectrum

PepID, an algorithm developed by Richard Johnson at the University of Washington, can be used to correlate an MS/MS spectrum with a protein sequence. The selected sequence(s) or a selected FASTA-formatted protein database (such as the OWL non-redundant protein database, downloadable from the NCBI at ftp://ncbi.nlm.nih.gov/repository/OWL/) are searched to find all possible peptides with a mass within tolerance of the MW of the parent ion. If a database is chosen to be searched, the database's amino acid set and the digest method can be set via the Database Settings... button. Selected posttranslational modifications or single amino acid substitutions can also be considered when identifying peptides with the correct mass. (This initial portion is identical to an open-ended search, which was described in the preceding section.) The database search algorithm is still being optimized and hence is rather slow unless the two termini option is utilized.

The candidate sequences are then scored in two ways. The first score is the percentage of the total ion current from the MS/MS ion list that the candidate sequence can account for by conventional low-energy fragmentation (listed in the results in the "%" column). The second score, ISCOR, is a more sophisticated intensity-based scoring system which sums a fraction of each matched ion's intensity based on the likelihood of the ion type and charge state. Bonuses and penalties are then added to this based on the continuity of b and y ion series, the percentage of unmatched b and y ions, the presence or absence of particular immonium ions which should be present if their corresponding amino acids are present in the candidate peptide, etc. The ISCOR values are normalized to 1.0 and the candidate peptides of each charge state are sorted based on this value.
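
The first score is simple enough to sketch in Python. The version below is a simplification and not PepID itself: it considers only singly charged b and y ions, whereas the fragmentation PepID predicts covers more ion types and charge states, and the function names are hypothetical. ISCOR is not sketched because its weights, bonuses, and penalties are not given here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(peptide):
    """Return m/z values of the singly charged b and y ions of a peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion, first i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def percent_ion_current(peptide, spectrum, tol=0.5):
    """Percentage of total intensity explained by predicted b and y ions."""
    predicted = b_y_ions(peptide)
    total = sum(intensity for _, intensity in spectrum)
    matched = sum(intensity for mz, intensity in spectrum
                  if any(abs(mz - p) <= tol for p in predicted))
    return 100.0 * matched / total if total else 0.0


# Made-up spectrum: y1 and b2 of VTLR plus one unexplained peak.
spectrum = [(175.12, 500.0), (201.12, 300.0), (250.0, 200.0)]
print(percent_ion_current("VTLR", spectrum))  # 80.0
```

Here tol corresponds to the Daughter Ion Tolerance setting.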

In order to run a PepID search an MS/MS file must be loaded and one or more sequences must also be loaded. The following is a brief description of the PepID search parameters in the MS/MS PepID Window.

Possible charges of parent:: The charge state range to consider in calculating the parent ion's MW.
Sequences to Search:: By selecting the radio button to the left of the menu of sequences, only the sequence selected in that menu will be searched. Alternatively, all loaded sequences can be searched by selecting the All Sequences radio button.
Mod. Options:: This button leads to a dialog box in which specified potential posttranslational modifications can be taken into account when calculating peptide masses to compare with the parent ion's MW. If a modification you wish to consider is not among the preset possibilities, you can define your own modifications. (See the Potential Modifications topic for more information.)
Give Details:: This button gives specific information about a particular PepID match whose results line has been selected by triple-clicking.
Termini Constraints:: The number of a peptide's termini that must match consensus cleavage sites for the method in which it was digested.
Error Tolerance:: The tolerance in MW used when comparing charge states of the parent ion to potential peptides from the sequence.
Daughter Ion Tolerance:: The tolerance in m/z used when matching predicted fragmentation of candidate peptides to observed MS/MS peaks.
Mass Offset:: The offset to add to the observed MS/MS peak's m/z when matching it with the predicted fragmentation of candidate peptides.
Max Daughter Ion Charge:: The maximum charge state to consider for daughter ions.
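
How the first and sixth parameters interact can be shown with a small Python sketch: each charge state in the allowed range gives one candidate MW for the parent ion, and a peptide is kept if its mass lies within the error tolerance of any of them. The function names are hypothetical, the charge is assumed to be carried by protons, and the arithmetic is the standard conversion between m/z and MW rather than anything specific to Sherpa.

```python
PROTON = 1.00728  # mass of a proton (Da)


def parent_mw_candidates(parent_mz, min_charge, max_charge, negative=False):
    """Return {charge: MW} for every charge state in the allowed range."""
    sign = -1 if negative else 1  # negative ions have lost protons
    return {z: (parent_mz - sign * PROTON) * z
            for z in range(min_charge, max_charge + 1)}


def charges_matching(peptide_mw, candidates, tol=1.0):
    """Return the charge states whose MW agrees with the peptide's MW."""
    return [z for z, mw in candidates.items() if abs(mw - peptide_mw) <= tol]


cands = parent_mw_candidates(244.66, 1, 3)
print({z: round(mw, 2) for z, mw in cands.items()})
print(charges_matching(487.31, cands))  # [2]
```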

Note:: When getting details for an old result the CURRENT analysis parameters are used.

User Tip:: When doing PepID analysis, it is often helpful to view the MS/MS PepID Window zoomed to full screen so that more of the search results text is visible without scrolling.

5.0 Miscellaneous

5.1 Customizing Settings

To save the current settings globally as Sherpa's defaults for the next time you run the program, select Save Preferences... from the Options menu; the current program-wide settings will be saved as the default values. Preferences can also be set or reset to the program defaults on a more local basis. Windows and dialogs that have settings that can be saved to the preference file have two buttons: a Standard Settings button, which resets the values associated with that window (and ONLY the values associated with that window) to the program's default values, and a Save Settings button, which saves the current settings for the window as the default settings. Note that selecting only Standard Settings does not save the standard settings as the default settings.

The custom amino acids, termini, and modifications that you create in Sherpa are stored in its preferences file. If you install Sherpa on a new computer and wish to retain the custom entries that you made using Sherpa on another computer, simply copy the Sherpa preferences file from the Preferences folder in the System folder of the old computer and place it in the Preferences folder of the new computer.

5.2 Window Behavior

Clicking on the Go Away Box in the upper left-hand corner of a window will hide the window but will not destroy the window or its contents. Any existing window can be brought to the front by selecting its name from the list of windows at the bottom of the Window menu. The active (frontmost) window is denoted with a check in this list, and any hidden windows are denoted with a dash.

Some windows contain a shredder button above their text editor. These are temporary windows. They can be hidden like other windows by using the Go Away Box or they can be permanently disposed of by clicking the shredder button. Predicted CID fragment windows and sequence notes windows are examples of temporary windows.

5.3 References

Bean, M. F., Annan, R. S., Hemling, M. E., Mentzer, M., Huddleston, M. J., Carr, S. A. (1995), "LC-MS methods for selective detection of posttranslational modifications in proteins: glycosylation, phosphorylation, sulfation, and acylation", in "Techniques in Protein Chemistry VI", 107-116.

Browne, C. A., Bennett, H. P., Solomon, S. (1982), "The isolation of peptides by high-performance liquid chromatography using predicted elution positions", Anal. Biochem. 124, 201-208.

Bull, H. B. (1971), "An introduction to physical biochemistry", Ed. 2, F. A. Davis Co., Philadelphia.

Carr, S. A., Huddleston, M. J., Bean, M. F. (1993), "Selective identification and differentiation of N- and O-linked oligosaccharides in glycoproteins by liquid chromatography-mass spectrometry", Protein Science 2, 183-196.

Mach, H. K., Middaugh, C. R., Lewis, R. V. (1992), "Statistical determination of the average values of the extinction coefficients of tryptophan and tyrosine in native proteins", Anal. Biochem. 200, 74-80.

Taylor, J. A., Walsh, K. A., Johnson, R. S. (1996), "Sherpa: A Macintosh-based expert system for the interpretation of ESI LC/MS and MS/MS of protein digests", Rapid Commun. Mass Spectrom. 10, 679-687.

5.4 Legalities

This guide and the software described in it are copyrighted with all rights reserved. Neither the guide nor the software may be copied in whole or part without the written consent of the author, except as described in the license agreement provided with this product.

5.4.1 Contact Information

Alex Taylor
10244 SE 16th
Bellevue, WA 98004 USA

Internet: jataylor@nwlink.com
WWW: http://www.hairyfatguy.com/Sherpa/

Macintosh and Mac are trademarks or registered trademarks of Apple Computer, Inc. MacSpec and MacBioSpec are trademarks of Perkin-Elmer Sciex Instruments. All other trade names referenced are the service mark, trademark, or registered trademark of the respective manufacturer.

5.4.2 Software License Agreement

CAREFULLY READ THE TERMS AND CONDITIONS OF THIS LICENSE AGREEMENT PRIOR TO USING THIS PACKAGE. USE OF ANY PORTION OF THIS PACKAGE INDICATES YOUR AGREEMENT TO THE FOLLOWING TERMS AND CONDITIONS. IF YOU DO NOT AGREE WITH SUCH TERMS AND CONDITIONS, YOU SHOULD PROMPTLY DESTROY ALL COPIES OF THIS SOFTWARE PACKAGE FROM YOUR COMPUTER AND REMOVABLE MEDIA.

DEFINITIONS

The following definitions apply to the terms as they appear in this agreement:

Package means the software, manual(s), and other items accompanying this agreement.

Software means the computer programs contained in the Package, together with all codes, techniques, software tools, formats, designs, concepts, methods, and ideas associated with these computer programs.

Site means a department within a company and a laboratory group elsewhere.

You and Your refer to any person or entity that acquires or uses the Package.

GRANT

Alex Taylor and the University of Washington hereby grant to Licensee, and Licensee accepts, a non-exclusive license to use the Software in this Package at the Site on the licensed number of computers, to modify the Software for use at the Site, and to make such copies of the Software as are necessary for back-up or archive purposes. Licensee shall retain in the Software the copyright, trademark, or other notices pertaining to the Software as provided by the University of Washington. Licensee shall not distribute, publish, or otherwise transfer or allow to be transferred the Software or any modified or unmodified copies thereof, in whole or in part, without prior written permission of the University of Washington and Alex Taylor.

Alex Taylor and the University of Washington shall endeavor to correct program bugs, and to provide to Licensee advice and answers to inquiries made in the form of electronic mail to Alex Taylor at the following address: jataylor@u.washington.edu. Any such efforts, however, shall be on an "as available" basis.

PROHIBITED USES

You may not:

Distribute the Software to others outside of your Site.
Reverse-engineer, disassemble, decompile, or make any attempt to discover the source code to the Software.
Remove, obscure, or alter any copyright notice or other proprietary rights related to the Software or Package including manuals.
Sub-license, sell, lend, rent, or lease any portions of the Software.
Copy any portion of the Software, except as described above under the Grant.
Transfer the Package or any portion of the Package to any person or entity in violation of the United States Export Administration Act.

The Software involves valuable proprietary rights of Alex Taylor, the University of Washington and others. Alex Taylor and the University of Washington retain title to and ownership of the Software and all copyright, trade secret, trade name, trademark, and other property rights related to the Software, regardless of the form that the original or other copies exist in. You may not violate these rights, and you must take appropriate steps to protect these rights. Alex Taylor and the University of Washington may at any time replace, modify, alter, improve, enhance, or change the Software.

Both the license and your right to use the software terminate automatically if you violate any part of this agreement. In the event of termination, you must immediately destroy all copies of the Software.

REPRESENTATIONS, WARRANTIES, and RISK

The University of Washington represents and warrants that it has the right to grant the licenses as set forth in this agreement.

The software is experimental in nature and is supplied "AS IS", without obligation by the University of Washington to provide accompanying services or support except as specified in this agreement. The entire risk as to the quality and performance of the Software is with Licensee. THE UNIVERSITY OF WASHINGTON EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES REGARDING THE SOFTWARE, WHETHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES PERTAINING TO MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Licensee shall indemnify, hold harmless and defend the University of Washington, its officers, Software developers, employees, students, and agents, against any and all claims, suits, losses, damages, costs, fees, and expenses resulting from or arising out of exercise of rights under this agreement, including but not limited to any damage, losses, or liabilities whatsoever with respect to death or injury to any person and damage to any property arising from the possession, use, or operation of the Software by Licensee or any customers, users, or others affected by the Software through the actions of Licensee. This indemnification clause shall survive the termination of this agreement.