User Guide for
Sherpa
"Your Guide to the Peaks"

Version 3.3.1
Documentation updated February 14, 2000

A Macintosh-based expert system for LC/MS and
MS/MS analysis of protein digests




Sherpa Homepage: http://www.hairyfatguy.com/Sherpa/


Copyright © 1994-2000 J. Alex Taylor and the University of Washington
All Rights Reserved Worldwide.


Table of Contents

1.0 Welcome to Sherpa
1.1 Statement of Program Philosophy
1.2 System Requirements
1.3 Quick Start
1.3.1 Opening a Protein Sequence
1.3.2 The Sequence Window
1.3.3 The Digest Window
1.3.4 Opening an LC/MS File
1.3.5 Running a Primary Search
1.3.6 Using PepID to Interpret an MS/MS Spectrum

2.0 Handling Protein Sequences
2.1 Opening a Protein Sequence
2.2 Saving a Sequence
2.3 The Sequence Window
2.4 Defining Custom Amino Acids and Termini
2.5 The Digest Window
2.6 Defining Crosslinks

3.0 LC/MS Interpretation
3.1 Opening an LC/MS File
3.2 The LC/MS Import Dialog
3.3 Running a Primary Search
3.3.1 MW Grouper
3.3.2 Peptide Match
3.3.3 Unidentified Ions
3.3.4 Comparing Data Files
3.4 Running a Secondary Search
3.4.1 Glycosylation
3.4.2 Phosphorylation
3.5 Open-ended Search
3.5.1 Potential Modifications

4.0 MS/MS Interpretation
4.1 Opening an MS/MS File
4.2 The MS/MS Import Dialog
4.3 Using PepID to Interpret an MS/MS Spectrum

5.0 Miscellaneous
5.1 Customizing Settings
5.2 Window Behavior
5.3 References
5.4 Legalities
5.4.1 Contact Information
5.4.2 Software License Agreement

6.0 Version History



1.0 Welcome to Sherpa

1.1 Statement of Program Philosophy

Sherpa is designed to be a robust, easy-to-use aide-de-camp in LC/MS and MS/MS interpretation. By automating simple but tediously repetitive calculations, it allows the user to quickly determine the obvious and spend time exploring interpretations for the not-so-obvious. It is not, however, a black box that takes in raw data at one end and returns a full interpretation at the other. At least a basic understanding of the principles of mass spectral interpretation and of data acquisition is necessary to obtain useful information from its searches.

In an ideal world, users would not need manuals to understand how to run programs. To that end, I have tried to keep simplicity and user-friendliness foremost in the design of Sherpa. Interpretation with Sherpa is set up as a dynamic process in which the user can adjust a search's settings, run the search, change the settings, run the search again, and so on, as opposed to a linear, one-shot interpretation. This means that users should feel free to experiment with optimizing the various settings without fear of being unable to get back to where they started.

This user guide primarily contains information about what the various user interface items of the program do. See Taylor et al. (1996) for a more applied discussion of the program.


1.2 System Requirements

To use Sherpa you must have:

LC/MS files must be in Sciex API III, Sciex API 100/300, or Finnigan SSQ/TSQ format.

MS/MS files in any of the above formats can be read natively. MS/MS files in formats not currently supported natively can be imported by saving the data as text only, with each line of the file containing an m/z value followed by a tab and then its corresponding intensity.
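As an illustration of that text-only layout, the following is a minimal Python sketch of a reader for it. The function name parse_msms_text is hypothetical, and skipping blank lines is an assumption rather than documented Sherpa behavior. Python's splitlines() also accepts the carriage-return line endings used by classic Macintosh text files.

```python
def parse_msms_text(text):
    """Parse text-only MS/MS data: one 'm/z<TAB>intensity' pair per line.

    Blank lines are skipped (an assumption); a malformed line raises ValueError.
    splitlines() handles \\r (classic Mac), \\n, and \\r\\n line endings.
    """
    peaks = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate empty lines, e.g. a trailing newline
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 'm/z<TAB>intensity'")
        peaks.append((float(fields[0]), float(fields[1])))
    return peaks
```

For example, parse_msms_text("175.12\t1200\r262.15\t850\r") returns [(175.12, 1200.0), (262.15, 850.0)].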


1.3 Quick Start







2.0 Handling Protein Sequences

2.1 Opening a Protein Sequence

Sequences can be opened in one of three ways:

  1. Selecting New Sequence from the File menu and typing or pasting the sequence into the text editor at the bottom of the new sequence window.

  2. Selecting Load Sequence... from the File menu to open an existing sequence file previously created by Sherpa, a MacBioSpec file, or any file saved as text only.

  3. Double clicking on a previously created Sherpa sequence file or opening a previously created Sherpa sequence file from the Finder by using the Open command in the Finder's File menu.

Any typed or opened sequence is stripped of any non-alphabetic characters and then capitalized before being displayed in the sequence window. The only limit to the number of sequences that can be open at one time is the available memory.
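The clean-up rule above can be sketched in a few lines of Python. The function name clean_sequence is hypothetical, and restricting "alphabetic" to ASCII letters is an assumption. The rule is what makes it safe to paste a numbered, spaced sequence listing directly into the editor.

```python
import string

def clean_sequence(raw):
    """Drop every character that is not an ASCII letter, then upper-case the rest."""
    return "".join(ch for ch in raw if ch in string.ascii_letters).upper()
```

For example, clean_sequence("1 mkwvt fisll\n61 flfssaysrg") returns "MKWVTFISLLFLFSSAYSRG".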



2.2 Saving a Sequence

Selecting Save As... from the File menu will display a standard save dialog with the sequence name as the default file name. Selecting Save (command-S) from the File menu will save the sequence and its information back to the sequence file if one already exists and can be located. If Save is selected and a sequence file does not already exist or cannot be located then Sherpa will display a standard save dialog as though Save As... had been selected.

When Sherpa saves a sequence file, it places the sequence in the data fork of the file and information about its termini and any custom amino acids used in the resource fork. Hence, to any other program the sequence file will appear to be a normal text file containing just the sequence, but when a Sherpa-created sequence file is opened again in Sherpa, Sherpa retrieves and restores the custom information saved with the sequence.

When saving a sequence that has been crosslinked to another sequence, Sherpa gives the user the option to save the sequence as a regular sequence or as part of a linked sequence file, which contains all the involved sequences and crosslinks. The "Insulin [human]" file in the sample sequences folder is an example of a linked sequence file.



2.3 The Sequence Window

Once a sequence is loaded, it appears in a sequence window with the sequence file's name as its title. Each sequence loaded is displayed in its own sequence window.

The amino acid sequence is shown in a text editor and can be edited directly by the user. When changes are made to the sequence, Sherpa instantly updates both the values displayed in the sequence window and the fragments displayed in the digest window, if it has been opened. The location of the cursor selection in the sequence is shown in the small box directly above the sequence editor along with the mass of the region currently selected.


Note:
The mass displayed in the selection box is not a residue mass but a peptide mass that includes N- and C-terminal groups (H and OH, respectively, unless the selection includes the protein's N- or C-terminus, in which case the user-specified terminal group is included in the mass).
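The difference between a residue mass and a peptide mass can be made concrete with a short Python sketch. It sums standard monoisotopic residue masses and adds the two terminal groups, which default to H and OH. The names RESIDUE_MONO and peptide_mass are hypothetical, and only monoisotopic values are shown, although Sherpa also reports average masses. A user-specified terminal group (for example, an acetylated N-terminus) would be passed in as a different n_term mass.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
H_MONO = 1.007825    # hydrogen atom: default N-terminal group
OH_MONO = 17.002740  # hydroxyl: default C-terminal group

def peptide_mass(seq, n_term=H_MONO, c_term=OH_MONO):
    """Monoisotopic peptide mass: residue masses plus both terminal groups."""
    return sum(RESIDUE_MONO[aa] for aa in seq) + n_term + c_term
```

For example, peptide_mass("GG") is about 132.0535 Da: two glycine residues plus H and OH.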

The N-terminal and C-terminal groups of the protein can be set using the N-terminal Group... and C-terminal Group... buttons in the sequence window or by using the same commands in the Options menu. The amino acid set can be defined by using the Define Residues... button or its corresponding command in the Options menu. (See the Defining Custom Amino Acids and Termini topic for more information.) Any non-standard amino acid assignments are displayed to the right of the Define Residues... button. To quickly replace one residue with another in the sequence use the Search For... command in the Edit menu.

An estimated pI and molar extinction coefficient at 280 nm can be calculated and displayed in the sequence window by selecting More Options... from the Options menu. The isoelectric point is estimated by using pKa values from Bull (1971) in the Henderson-Hasselbalch equation to find the pH at which the absolute net charge on the protein is less than 0.01. The extinction coefficient is estimated using the coefficients derived by Mach et al. (1992).
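The pI estimate described above can be sketched in Python as a bisection on pH, with the net charge computed from the Henderson-Hasselbalch equation. The pKa table below is illustrative only. It is an assumption and is NOT the Bull (1971) set Sherpa uses, so the numbers will differ from Sherpa's. The names PKA, net_charge, and estimate_pi are hypothetical. Bisection works here because the net charge decreases steadily as the pH rises.

```python
# Illustrative pKa values (assumed; NOT the Bull (1971) values used by Sherpa).
PKA = {"Nterm": 8.6, "Cterm": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide or protein at a given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA["Nterm"]))
    positive += sum(seq.count(aa) / (1.0 + 10 ** (ph - PKA[aa])) for aa in "KRH")
    negative = 1.0 / (1.0 + 10 ** (PKA["Cterm"] - ph))
    negative += sum(seq.count(aa) / (1.0 + 10 ** (PKA[aa] - ph)) for aa in "DECY")
    return positive - negative

def estimate_pi(seq, tol=0.01):
    """Bisect on pH between 0 and 14 until |net charge| < tol."""
    low, high = 0.0, 14.0
    for _ in range(100):
        mid = (low + high) / 2
        charge = net_charge(seq, mid)
        if abs(charge) < tol:
            return mid
        if charge > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # negative: the pI lies at a lower pH
    return (low + high) / 2
```

With these assumed pKa values, an acidic peptide such as "DDDD" comes out below pH 4.5, and a basic one such as "KKKK" comes out above pH 9.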

The Digest button performs a theoretical digest on the sequence and displays the fragments in a digest window. See the Digest Window topic below for more information.

To the right of the sequence display is a box with the sequence's (or selection's) amino acid composition. The display of the amino acid composition can be turned on and off by selecting More Options... from the Options menu. There are also additional buttons along the right side of the window. The CID button displays the predicted MS/MS fragments for the highlighted section of sequence in a new window. The Copy to Report button prints the sequence and information about the sequence in the Search Results window. The Set Seq Name button allows the user to change the name of the sequence. The Seq Notes button displays any notes that have been associated with the sequence. The Text to Speech button plays back the displayed sequence (or selection) as speech, which can be useful in proofreading sequences entered by hand; the length of the pause after every third residue can be adjusted via More Options... in the Options menu. Note that this button is only visible on systems that have the Speech Manager installed.

The elemental composition of the sequence (or selection) can be displayed below the sequence by selecting More Options... from the Options menu. If any crosslinks have been created, a toggle switch also appears below the sequence which allows the user to include crosslinks in the calculation of mass and elemental composition. See the Defining Crosslinks topic below for more information.

To dispose of a sequence window (and any associated digest or notes windows) use the Close Sequence submenu in the File menu. Clicking on the Go Away Box makes the sequence window disappear but does NOT dispose of it. (This is convenient for keeping down window clutter.) See the Window Behavior topic for more information.



2.4 Defining Custom Amino Acids and Termini

To define a custom amino acid for a sequence, bring its sequence or digest window to the front and select Define Residues... from the Options menu. Alternatively, use the Define Residues... button in the sequence window. A scrolling list will appear with the 20 common amino acids and 6 undefined amino acids (B, J, O, U, X, and Z).

To change an amino acid assignment, double click on the line in the list. A dialog containing a list of custom amino acids to choose from is then displayed. Buttons in the custom amino acid dialog allow you to create, edit, or delete custom amino acids from the list.

A custom amino acid is selected by double clicking on it or by highlighting it and choosing OK. Termini are selected, created, or deleted in a similar way.

The libraries of custom amino acids and termini are stored in the Sherpa Preferences file located in the Preferences folder in the System folder. Each time that Sherpa is started, it opens the preferences file and loads in this custom information along with the user settings. If the preferences file cannot be found, the program creates a new file with default settings and modification libraries.

Termini and custom residue information for a sequence are stored with the sequence file. If the program opens a Sherpa-created sequence file and finds a custom residue or terminus that is not already in its custom library, it will automatically add the new modification to the library in the preferences file.



2.5 The Digest Window

When a sequence in a sequence window is digested, the theoretical fragments are displayed in a digest window. Any changes made in a sequence window are automatically reflected in the corresponding digest window.

The type of enzymatic cleavage can be set by selecting the Digest Method... button. The cleavage agent(s) currently in use are displayed to the right of this button. The display format of the fragments can be either "MW", which gives the monoisotopic and average masses for each fragment, or "Charge-state" which displays the m/z for 1 to 6 charges for each peptide. The number of partial sites allowed, which is the maximum number of consensus cleavage sites tolerated inside a peptide, is set using a popup menu. A popup menu is also used to select whether the fragments are sorted by sequence position, by mass, or by HPLC index (see below).
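To show how the partial-sites setting affects the list of fragments, here is a Python sketch of a theoretical digest that allows up to a given number of internal consensus sites per peptide. The cleavage rule used is the common trypsin rule (cut after K or R, but not before P). That rule is an assumption chosen for illustration, not Sherpa's enzyme table, and the names tryptic_sites and digest are hypothetical.

```python
def tryptic_sites(seq):
    """Indices i such that the chain is cut between seq[i] and seq[i + 1].

    Assumed trypsin rule: cleave after K or R unless the next residue is P.
    """
    return [i for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]

def digest(seq, max_partial=0):
    """Return (start, end, peptide) tuples (end exclusive) containing at most
    max_partial uncleaved consensus sites inside each peptide."""
    bounds = [0] + [i + 1 for i in tryptic_sites(seq)] + [len(seq)]
    fragments = []
    for i in range(len(bounds) - 1):
        # A fragment from bounds[i] to bounds[j] spans j - i - 1 internal sites.
        for j in range(i + 1, min(i + 2 + max_partial, len(bounds))):
            fragments.append((bounds[i], bounds[j], seq[bounds[i]:bounds[j]]))
    return fragments
```

For "AKPGRCDKE" with one partial site allowed, the peptides are AKPGR, AKPGRCDK, CDK, CDKE, and E. The K in "KP" is not a site under the assumed rule.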

At the far right is a Display Options button, which displays the digest display options dialog. In this dialog you can choose whether the first column shows the fragment number or the HPLC index of the fragment. An option allows the full sequence of each predicted fragment to be displayed instead of abbreviating fragments that are too long. The user can also choose the number of charge states and decimal places to display and whether the m/z values are calculated for positive or negative ions.
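The charge-state display can be sketched in Python by adding or removing z protons from the neutral mass and dividing by z. The sketch assumes simple protonated ions [M+zH]z+ for positive mode and deprotonated ions [M-zH]z- for negative mode. The name charge_state_table is hypothetical.

```python
PROTON = 1.007276  # proton mass, Da

def charge_state_table(neutral_mass, max_charge=6, positive=True):
    """m/z of [M+zH]z+ (positive) or [M-zH]z- (negative) for z = 1..max_charge."""
    sign = 1 if positive else -1
    return {z: (neutral_mass + sign * z * PROTON) / z
            for z in range(1, max_charge + 1)}
```

For a 1000.0 Da peptide, the 1+ and 2+ entries are 1001.007276 and 501.007276. In negative mode, the 1- entry is 998.992724.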

The HPLC index uses the amino acid retention coefficients derived by Browne et al. (1982) to predict the percent acetonitrile at which the peptide should elute from a C18 reversed-phase column with 0.1% TFA as the ion-pairing agent. Some peptides, especially longer ones, may have values outside the range of 0-100%, but these values may still be informative as a relative measure of a peptide's hydrophobic character.


User Tip:
Printing out the charge-state "scorecard" of the peptides sorted by mass is often useful for keeping track of observed and identified ions.



2.6 Defining Crosslinks

Crosslinks are created within the crosslink dialog, which is accessed by selecting Crosslinks... from the Options menu. Crosslink types can be created or edited to suit the user's needs. Note that editing a sequence into which crosslinks have been introduced can invalidate those crosslinks, in which case they are deleted. Crosslinks are not considered by the secondary searches or by the open-ended search.








3.0 LC/MS Interpretation

3.1 Opening an LC/MS File

An LC/MS file in Sciex API III, Sciex API 100/300, or Finnigan SSQ/TSQ format can be opened by selecting Load LC/MS File 1... from the File menu. Finnigan (".dat") files must be FTP'd to the Macintosh in binary mode and given a file type of '????' or 'BINA'. (The default binary file type can be set in the preferences of either Telnet or Fetch.) Once an LC/MS file is loaded, a second LC/MS file can be loaded by selecting Load LC/MS File 2... from the File menu. Loading a file into either the file 1 or the file 2 slot simply replaces the LC/MS file currently loaded in that slot.



3.2 The LC/MS Import Dialog

Once an LC/MS file has been selected, a dialog will appear in which the import parameters are set.

The top portion of the LC/MS import dialog contains information about how the LC/MS file was acquired: the number of scans acquired, whether it was positive or negative ion data, whether the data is centroided or profile, the exact step size used in acquisition (including any mass defect), and the m/z range that was scanned.


Note:
There is a user override mode that allows toggling between positive ion/negative ion and profile/centroid. Hold down the Option key and click on the line of text to toggle it. This capability is provided in case the file type is misinterpreted in the initial analysis.

The bottom portion of the dialog contains the import settings which are initially set to the default values. (For information on changing the default import settings see the Customizing Settings topic.) The first settings box determines which portion of the LC/MS file will be imported for analysis. It will always default to importing the entire scan range and mass range of the file.

The second settings box contains the settings for processing each scan of the LC/MS file to find its peaks: the intensity threshold, peaktop minimum, and minimum m/z peak width. These are the most critical import parameters and also the parameters that vary the most from one LC/MS file to the next. The relationship of these parameters is illustrated in the figure below. If the data is centroided, the minimum m/z peak width parameter is ignored. Each scan can also, optionally, be smoothed before processing.
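The figure defining these three parameters is not reproduced here, so the following Python sketch relies on an assumed interpretation. Under that assumption, a profile peak is a contiguous run of points at or above the intensity threshold whose apex reaches the peaktop minimum and whose m/z span is at least the minimum peak width. Each peak is reported at its intensity-weighted centroid. This is not Sherpa's actual routine, and the name pick_peaks is hypothetical.

```python
def pick_peaks(points, threshold, peaktop_min, min_width, centroided=False):
    """Return (m/z, apex intensity) for each peak in one scan.

    points: (mz, intensity) pairs sorted by m/z. Parameter meanings are an
    assumed interpretation of the import dialog, not Sherpa's definitions.
    """
    if centroided:
        # Each centroid is already a peak; the width test is skipped.
        return [(mz, i) for mz, i in points if i >= threshold and i >= peaktop_min]
    peaks, run = [], []
    for mz, inten in list(points) + [(None, float("-inf"))]:  # sentinel closes the last run
        if inten >= threshold:
            run.append((mz, inten))
            continue
        if run:
            apex = max(i for _, i in run)
            width = run[-1][0] - run[0][0]
            if apex >= peaktop_min and width >= min_width:
                total = sum(i for _, i in run)
                # Intensity-weighted centroid; midpoint if every intensity is zero.
                if total > 0:
                    mz_c = sum(m * i for m, i in run) / total
                else:
                    mz_c = (run[0][0] + run[-1][0]) / 2
                peaks.append((mz_c, apex))
            run = []
    return peaks
```

Raising the threshold shrinks or splits runs, and raising the peaktop minimum discards weak runs outright. Under this interpretation, either change reduces the number of peaks passed on to the chromatographic step.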


The third settings box contains chromatographic parameters. The minimum number of scans for a "chromatographic peak" determines how many consecutive scans an ion must appear in to be considered a chromatographic peak. The chromatographic hole tolerance is the number of consecutive scans without signal that are tolerated within a chromatographic peak.


Note:
In the case of Sciex API 100/300 LC/MS files that contain multiple experiments in a single period, when importing one of these experiments for processing, the chromatographic hole tolerance must be set to at least the number of scans per cycle minus 1, or no peaks will be found.

Once the OK button in the LC/MS Import Dialog is pressed, a small progress dialog will appear while the LC/MS data is being imported and processed. The peaks found in each scan are compared to a list of ongoing chromatographic peaks. If an m/z matches that of an ongoing peak it is added to it. If an m/z does not match that of an ongoing peak it becomes a new peak. If a chromatographic peak does not extend for the minimum number of scans, as set in the import dialog, it is removed from the list. The importing process can be aborted by pressing command-period; any data processed up to that point will be saved.
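The matching of per-scan peaks to ongoing chromatographic peaks can be sketched in Python. Several details are assumptions because the guide does not specify them. Matching uses a hypothetical m/z tolerance, mz_tol. A chromatographic peak keeps the m/z of its first scan. The minimum-scans test counts the scans in which the ion was actually detected. Gaps of up to hole_tol empty scans are bridged. The function name build_chromatographic_peaks is hypothetical.

```python
def build_chromatographic_peaks(scans, mz_tol, min_scans, hole_tol):
    """scans: per-scan peak lists [(mz, intensity), ...] in acquisition order.

    Returns chromatographic peaks (dicts) detected in at least min_scans scans,
    allowing gaps of up to hole_tol consecutive empty scans.
    """
    ongoing, finished = [], []

    def close(peak):
        if len(peak["scans"]) >= min_scans:  # too-short peaks are dropped
            finished.append(peak)

    for scan_no, peaks in enumerate(scans):
        # Retire peaks whose gap would now exceed the hole tolerance.
        still_open = []
        for cp in ongoing:
            if scan_no - cp["last_scan"] - 1 > hole_tol:
                close(cp)
            else:
                still_open.append(cp)
        ongoing = still_open
        for mz, inten in peaks:
            match = next((cp for cp in ongoing
                          if abs(cp["mz"] - mz) <= mz_tol and cp["last_scan"] != scan_no),
                         None)
            if match:  # extend an ongoing peak
                match["scans"].append(scan_no)
                match["last_scan"] = scan_no
                match["apex"] = max(match["apex"], inten)
            else:      # start a new chromatographic peak
                ongoing.append({"mz": mz, "scans": [scan_no],
                                "last_scan": scan_no, "apex": inten})
    for cp in ongoing:
        close(cp)
    return finished
```

Under these assumptions, this also explains the API 100/300 note above. When only every n-th scan belongs to the imported experiment, each real peak has n - 1 empty scans between detections, so a smaller hole tolerance retires every peak before it can reach the minimum length.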

When Sherpa is finished processing the LC/MS data it will display the list of chromatographic peaks, sorted by mass, in an LC/MS Peaks window and give the total number of peaks at the bottom. This window is for display purposes only; the data cannot be edited and fed back into the program. Reloading an LC/MS file into either of the file 1 or file 2 slots will simply replace the LC/MS file currently loaded in that slot and replace the data in its LC/MS Peaks window.

It cannot be stressed enough that using proper thresholds and import ranges when importing LC/MS data is critical to all subsequent analyses. If processing appears to be going very slowly or you get an exorbitant number of peaks (more than a thousand or so), try reimporting the data with higher thresholds or with the import range cropped to exclude as much salt, junk, noise, etc. as possible. It is often useful to examine the data imported under several different conditions.



3.3 Running a Primary Search

Once an LC/MS file has been loaded, the Primary Search option in the Searches menu becomes active. The Primary Search consists of four search options: a MW grouping search which locates potential charge state families, a peptide matching search which compares one or more theoretically digested sequences with the LC/MS data, an unidentified ions option to report prominent peaks that were not MW grouped or matched to a peptide, and a data file comparison option which correlates search results by whether they appear in one or both of the loaded LC/MS files (if two LC/MS files are loaded). The results of a Primary Search are displayed in the Search Results Window.

The Primary Search Settings Dialog can be accessed by the Primary Search Settings... option in the Searches menu. The search settings can also be accessed individually in the Options menu.




3.4 Running a Secondary Search

The secondary searches are searches to help identify modified peptides in the digest. At present the secondary searches available are glycosylation and phosphorylation. Once an LC/MS file has been loaded, the Secondary Search option in the Searches menu becomes active. The results of a Secondary Search are displayed in the Search Results Window.





3.5 Open-ended Search

The purpose of the open-ended search is to search a sequence or sequences for a given MW or m/z regardless of consensus proteolytic sites. Hence it is useful for detecting spurious proteolytic cleavages and for identifying MW groups that were not matched to the theoretical digest fragments by peptide matching. The open-ended search window can be accessed from the Searches menu. FASTA-formatted protein databases (such as the OWL non-redundant protein database, downloadable from the NCBI at ftp://ncbi.nlm.nih.gov/repository/OWL/) can be used to perform a mass fingerprint search on a list of molecular weights entered into the small text box on the right of the window. The database search algorithm is still being optimized and hence is rather slow unless the two termini option is used.

The user can specify whether the mass to be searched for is a MW or an m/z. Alternatively, if a MW Grouper search has been run, the groups that did not match the digest can be searched automatically. One or all sequences can be chosen to be searched. The termini constraints (the number of termini, 0, 1, or 2, that must match predicted proteolytic sites) and the error tolerance used in matching can also be specified. The Mod. Options... button opens a dialog that allows for the consideration of post-translational and user-defined modifications (see the Potential Modifications topic).
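A Python sketch of the core of an open-ended search: every subsequence whose monoisotopic MW falls within tolerance of the target is reported, and the termini constraint then filters the hits. It assumes that the protein's own N- and C-termini count as matching termini, that cleavage sites are supplied as cut positions, and that modifications are ignored. The names open_ended_search and RESIDUE_MONO are hypothetical. The inner loop can stop early because adding residues only makes a peptide heavier.

```python
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.010565  # H + OH terminal groups

def open_ended_search(seq, target_mw, tol, required_termini=0, sites=()):
    """Find (start, end, peptide, mass) hits within tol of target_mw.

    sites: cut positions k (a cut between seq[k-1] and seq[k]); the protein
    ends are assumed to count as matching termini.
    required_termini: how many of a hit's two ends (0, 1, or 2) must be sites.
    """
    cut_positions = set(sites) | {0, len(seq)}
    hits = []
    for start in range(len(seq)):
        mass = WATER_MONO
        for end in range(start + 1, len(seq) + 1):
            mass += RESIDUE_MONO[seq[end - 1]]
            if mass > target_mw + tol:
                break  # longer peptides from this start are heavier still
            matched = (start in cut_positions) + (end in cut_positions)
            if abs(mass - target_mw) <= tol and matched >= required_termini:
                hits.append((start, end, seq[start:end], mass))
    return hits
```

Setting required_termini to 2 keeps only fully specific fragments, while 0 reports every subsequence of the right mass. The two-termini setting also prunes the most candidates, which is consistent with the guide's remark that database searches run fastest with the two termini option.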








4.0 MS/MS Interpretation

4.1 Opening an MS/MS File

MS/MS files in Sciex API III, Sciex API 100/300, or Finnigan SSQ/TSQ format can be read natively. Finnigan (".dat") files must be FTP'd to the Macintosh in binary mode and given a file type of '????' or 'BINA'. (The default binary file type can be set in the preferences of either Telnet or Fetch.) MS/MS files in formats not currently supported natively can be imported by saving the raw data as text only, with each line of the file containing an m/z value followed by a tab and then its corresponding intensity. An MS/MS file is imported by selecting Load MS/MS File... from the File menu.



4.2 The MS/MS Import Dialog

Once an MS/MS file has been selected, a dialog will appear in which the import parameters are set. The MS/MS Import Dialog is very similar to the LC/MS Import Dialog. The top portion of the MS/MS import dialog contains information about how the MS/MS file was acquired: the m/z of the parent ion, whether it was positive or negative ion data, whether the data is centroided or profile, the exact step size used in acquisition (including any mass defect), and the m/z range that was scanned.


Note:
There is a user override mode that allows toggling between positive ion/negative ion and profile/centroid. Hold down the Option key and click on the line of text to toggle it. The m/z of the parent ion can be modified in the same way. This capability is provided in case the file type is misinterpreted in the initial analysis.

The bottom portion of the dialog contains the import settings which are initially set to the default values. The first settings box determines which portion of the MS/MS file will be imported for analysis. It will always default to importing the entire mass range of the file.

The second settings box contains the settings for processing the scan to find its peaks - the intensity threshold, peaktop minimum and minimum m/z peak width. (See LC/MS Import Dialog for a discussion of these parameters.)

The third settings box contains a popup menu to set the number of times the scan should be smoothed before processing. The smoothing algorithm is identical to that used in MacSpec. This box also contains an option to limit the final number of imported ions to the X most intense ions, where X can be defined by the user.
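The option to keep only the X most intense ions can be sketched in Python with the standard heapq module. The MacSpec smoothing step is not shown, because its algorithm is not described here. The name keep_most_intense is hypothetical, and re-sorting the survivors by m/z is an assumption about the display order.

```python
import heapq

def keep_most_intense(peaks, x):
    """Return the x most intense (mz, intensity) pairs, re-sorted by m/z."""
    top = heapq.nlargest(x, peaks, key=lambda p: p[1])
    return sorted(top)
```

For example, keep_most_intense([(100.0, 5), (200.0, 50), (300.0, 20), (400.0, 1)], 2) returns [(200.0, 50), (300.0, 20)].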

Once the OK button in the MS/MS Import Dialog is pressed, the scan is processed and the MS/MS PepID Window appears with the processed peaks listed in the small text editor on the left side of the window. If an MS/MS file is loaded after a previous MS/MS file has already been loaded, its peak data will replace the previous peak data, but the results of any PepID searches displayed in the main text editor at the bottom of the MS/MS PepID Window will remain.



4.3 Using PepID to Interpret an MS/MS Spectrum

PepID, an algorithm developed by Richard Johnson at the University of Washington, can be used to correlate an MS/MS spectrum with a protein sequence. The selected sequence(s) or a selected FASTA-formatted protein database (such as the OWL non-redundant protein database, downloadable from the NCBI at ftp://ncbi.nlm.nih.gov/repository/OWL/) are searched to find all possible peptides with a mass within tolerance of the MW of the parent ion. If a database is searched, the database's amino acid set and digest method can be set via the Database Settings... button. Selected posttranslational modifications or single amino acid substitutions can also be considered when identifying peptides with the correct mass. (This initial portion is identical to an open-ended search, described in the Open-ended Search topic.) The database search algorithm is still being optimized and hence is rather slow unless the two termini option is used.

The candidate sequences are then scored in two ways. The first score is the percentage of the total ion current in the MS/MS ion list that the candidate sequence can account for by conventional low-energy fragmentation (listed in the results in the "%" column). The second score, ISCOR, is a more sophisticated intensity-based scoring system that sums a fraction of each matched ion's intensity based on the likelihood of the ion type and charge state. Bonuses and penalties are then added based on the continuity of the b and y ion series, the percentage of unmatched b and y ions, the presence or absence of particular immonium ions that should be present if their corresponding amino acids are in the candidate peptide, etc. The ISCOR values are normalized to 1.0, and the candidate peptides of each charge state are sorted by this value.
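The first ("%") score can be sketched in Python as the share of total ion current in peaks that fall within the daughter ion tolerance of a predicted fragment. As a simplifying assumption, "conventional low-energy fragmentation" is reduced here to singly charged b and y ions. The tolerance default is illustrative, and the mass offset is added to each observed m/z, as described for the Mass Offset parameter below. ISCOR's weighting, bonuses, and penalties are not attempted. The names by_ions and percent_tic are hypothetical.

```python
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def by_ions(peptide):
    """m/z of singly charged b and y ions (simplified fragmentation model)."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MONO[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MONO[aa] for aa in peptide[i:]) + WATER + PROTON
        ions += [b, y]
    return ions

def percent_tic(peptide, peaks, tol=0.5, offset=0.0):
    """Percent of total ion current in peaks matched to a predicted b or y ion."""
    predicted = by_ions(peptide)
    total = sum(i for _, i in peaks)
    matched = sum(i for mz, i in peaks
                  if any(abs(mz + offset - p) <= tol for p in predicted))
    return 100.0 * matched / total if total else 0.0
```

For the candidate "GAK", peaks at m/z 147.11 (y1) and 218.15 (y2) are matched, but an unexplained peak at m/z 300.0 is not. If these peaks carry 60, 30, and 10 intensity units, the score is 90%.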

In order to run a PepID search an MS/MS file must be loaded and one or more sequences must also be loaded. The following is a brief description of the PepID search parameters in the MS/MS PepID Window.

Possible charges of parent:
The charge state range to consider in calculating the parent ion's MW.
Sequences to Search:
By selecting the radio button to the left of the menu of sequences, only the sequence selected in that menu will be searched. Alternatively, all loaded sequences can be searched by selecting the All Sequences radio button.
Mod. Options:
This button leads to a dialog box in which selected potential posttranslational modifications can be taken into account when calculating peptide masses to compare with the parent ion's MW. If a modification you wish to consider is not among the preset possibilities, you can define your own modifications. (See the Potential Modifications topic for more information.)
Give Details:
This button gives specific information about the PepID match whose results line has been selected by triple-clicking.
Termini Constraints:
The number of a peptide's termini that must match consensus cleavage sites for the method in which it was digested.
Error Tolerance:
The tolerance in MW used when comparing charge states of the parent ion to potential peptides from the sequence.
Daughter Ion Tolerance:
The tolerance in m/z used when matching predicted fragmentation of candidate peptides to observed MS/MS peaks.
Mass Offset:
The offset to add to the observed MS/MS peak's m/z when matching it with the predicted fragmentation of candidate peptides.
Max Daughter Ion Charge:
The maximum charge state to consider for daughter ions.


Note:
When getting details for an old result the CURRENT analysis parameters are used.


User Tip:
When doing PepID analysis, it is often helpful to view the MS/MS PepID Window zoomed to full screen so that more of the search results text is visible without scrolling.








5.0 Miscellaneous

5.1 Customizing Settings

To globally save the current settings as Sherpa's default settings for the next time you run the program simply select Save Preferences... from the Options menu and the current settings (program wide) will be saved as the default values. Preferences can also be set or reset to the program defaults on a more local basis. Windows and dialogs that have settings that can be saved to the preference file have two buttons: a Standard Settings button which will reset the values associated with that window (and ONLY the values associated with that window) back to the program's default values; and a Save Settings button which saves the current settings for the window as the default settings. Note that selecting only Standard Settings does not save the standard settings as the default settings.

The custom amino acids, termini, and modifications that you create in Sherpa are stored in its preferences file. If you install Sherpa on a new computer and wish to retain the custom amino acids and termini that you entered using Sherpa on another computer, simply copy the Sherpa Preferences file from the Preferences folder in the System folder of the old computer to the Preferences folder of the new computer.



5.2 Window Behavior

Clicking on the Go Away Box in the upper left-hand corner of a window will hide the window but will not destroy the window or its contents. Any existing window can be brought to the front by selecting its name from the list of windows at the bottom of the Window menu. The active (frontmost) window is denoted with a check in this list, and any hidden windows are denoted with a dash.

Some windows contain a shredder button above their text editor. These are temporary windows. They can be hidden like other windows by using the Go Away Box or they can be permanently disposed of by clicking the shredder button. Predicted CID fragment windows and sequence notes windows are examples of temporary windows.



5.3 References



5.4 Legalities

This guide and the software described in it are copyrighted with all rights reserved. Neither the guide nor the software may be copied in whole or part without the written consent of the author, except as described in the license agreement provided with this product.