identification
This module is the “real” entry point of the program. It starts the spectra loading process, creates the database, starts any worker processes, and starts the alignment process for every spectrum. While there are only a few functions in this module, each is large and very important.
- src.identification.id_spectrum(spectrum: src.objects.Spectrum, db: src.objects.Database, b_hits: dict, y_hits: dict, ppm_tolerance: int, precursor_tolerance: int, n: int, digest_type: str = '', truth: Optional[dict] = None, fall_off: Optional[dict] = None, is_last: bool = False) → src.objects.Alignments
Given the spectrum and initial hits, start the alignment process for the input spectrum
- Parameters
spectrum (Spectrum) – observed spectrum in question
db (Database) – Holds all the source sequences
b_hits (list) – all k-mers found from the b-ion search
y_hits (list) – all k-mers found from the y-ion search
ppm_tolerance (int) – the parts per million error allowed when trying to match masses
precursor_tolerance (int) – the parts per million error allowed when trying to match precursor masses
n (int) – the number of alignments to save
digest_type (str) – the digest performed on the sample (default is ‘’)
truth (dict) – a dictionary, keyed by spectrum id, of the desired alignments. The format is described in more detail in the param.py file. If left as None, the program continues normally (default is None)
fall_off (dict) – only used if the truth param is set to a dictionary. This is a dictionary (which must be process safe if using multiprocessing) to which a key-value pair of spectrum id and DevFallOffEntry object is added whenever a spectrum loses the desired sequence. (default is None)
is_last (bool) – Only works if DEV is set to true in params. If set to true, timing evaluations are done. (default is False)
- Returns
Alignments for the spectrum. If no alignment can be created, an empty Alignments object is returned
- Return type
Alignments
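The ppm_tolerance and precursor_tolerance parameters are both expressed in parts per million, so the absolute mass window they allow scales with the mass being matched. The following is a minimal sketch of that arithmetic only. The helper names ppm_window and within_ppm are hypothetical and not part of this module, and centering the window on the theoretical mass is an assumption of the sketch.

```python
from typing import Tuple


def ppm_window(mass: float, ppm_tolerance: float) -> Tuple[float, float]:
    """Return the (lower, upper) mass bounds allowed by a ppm tolerance.

    Hypothetical helper: a tolerance of 20 ppm on a mass of 1000.0
    allows an absolute error of 1000.0 * 20 / 1e6 = 0.02.
    """
    delta = mass * ppm_tolerance / 1_000_000
    return mass - delta, mass + delta


def within_ppm(observed: float, theoretical: float, ppm_tolerance: float) -> bool:
    """Check whether an observed mass falls inside the ppm window.

    Assumption: the window is centered on the theoretical mass.
    """
    lower, upper = ppm_window(theoretical, ppm_tolerance)
    return lower <= observed <= upper
```

Under these assumptions, a 20 ppm tolerance accepts an observed mass of 1000.01 against a theoretical mass of 1000.0 but rejects 1000.05.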
- src.identification.id_spectra(spectra_files: list, database_file: str, verbose: bool = True, min_peptide_len: int = 5, max_peptide_len: int = 20, peak_filter: int = 0, relative_abundance_filter: float = 0.0, ppm_tolerance: int = 20, precursor_tolerance: int = 10, digest: str = '', cores: int = 1, n: int = 5, DEBUG: bool = False, truth_set: str = '', output_dir: str = '') → dict
Load in all the spectra and try to create an alignment for every spectrum
- Parameters
spectra_files (list) – file names of input spectra
database_file (str) – file name of the fasta database
verbose (bool) – print progress to the console. (default is True)
min_peptide_len (int) – the minimum length alignment to create (default is 5)
max_peptide_len (int) – the maximum length alignment to create (default is 20)
peak_filter (int) – the number of most abundant peaks to use in the alignment. If set to a number, this filter is used instead of the relative abundance filter. (default is 0)
relative_abundance_filter (float) – the minimum share of the total intensity (a percentage expressed as a decimal) that a peak must account for to be used in the alignment. If peak_filter is set, this parameter is ignored. (default is 0.0)
ppm_tolerance (int) – the parts per million error allowed when trying to match masses (default is 20)
precursor_tolerance (int) – the parts per million error allowed when trying to match a calculated precursor mass to the observed precursor mass (default is 10)
digest (str) – the type of digest used in the sample preparation. If left blank, a digest-free search is performed. (default is ‘’)
cores (int) – the number of cores the program is allowed to use. If the number provided is greater than the number of cores available, the maximum number of available cores is used. (default is 1)
n (int) – the number of alignments to keep per spectrum. (default is 5)
DEBUG (bool) – DEVELOPMENT USE ONLY. Used only for timing of modules. (default is False)
truth_set (str) – the path to a json file of the desired alignments to make for each spectrum. The format of the file is {spectrum_id: {‘sequence’: str, ‘hybrid’: bool, ‘parent’: str}}. If left as an empty string, the program proceeds as normal. Otherwise, the results of the analysis are saved to the file ‘fall_off.json’ in the specified output directory. (default is ‘’)
output_dir (str) – the full path to the output directory to save all output files. (default is ‘’)
- Returns
alignments for all spectra, saved in the form {spectrum.id: Alignments}
- Return type
dict
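The truth_set file format given above, {spectrum_id: {‘sequence’: str, ‘hybrid’: bool, ‘parent’: str}}, can be illustrated with a short sketch that writes and reads such a file. The spectrum ids, sequences, and parent names are placeholder values, and the helper functions are hypothetical and not part of this module. Only the three field names and their types come from the description above.

```python
import json

# Field names and types taken from the documented truth_set format.
REQUIRED_FIELDS = {"sequence": str, "hybrid": bool, "parent": str}

# Placeholder entries: ids, sequences, and parent names are invented.
EXAMPLE_TRUTH = {
    "spectrum_1": {"sequence": "MALWAR", "hybrid": False, "parent": "example_parent_1"},
    "spectrum_2": {"sequence": "DLQTLEVE", "hybrid": True, "parent": "example_parent_2"},
}


def validate_truth_set(truth: dict) -> None:
    """Raise ValueError if any entry lacks a field or has the wrong type."""
    for spectrum_id, entry in truth.items():
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(entry.get(field), expected_type):
                raise ValueError(
                    f"{spectrum_id!r}: field {field!r} must be {expected_type.__name__}"
                )


def write_truth_set(path: str, truth: dict) -> None:
    """Validate the dictionary and save it as a json file."""
    validate_truth_set(truth)
    with open(path, "w") as handle:
        json.dump(truth, handle, indent=2)


def load_truth_set(path: str) -> dict:
    """Load a truth set json file and check its structure."""
    with open(path) as handle:
        truth = json.load(handle)
    validate_truth_set(truth)
    return truth
```

The path of a file written this way is what would be passed as the truth_set argument.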
- src.identification.mp_id_spectrum(input_q: multiprocessing.context.BaseContext.Queue, db_copy: src.objects.Database, results: dict, fall_off: Optional[dict] = None, truth: Optional[dict] = None) → None
Multiprocessing function to identify a spectrum. Each entry in the input_q must be a MPSpectrumID object
- Parameters
input_q (mp.Queue) – a queue to pull MPSpectrumID objects from for analysis
db_copy (Database) – a copy of the original database for alignments
results (dict) – a multiprocess-safe dictionary to save the alignments in
truth (dict) – dictionary containing all the desired alignments to make. The format of the dictionary is {spectrum_id: {‘sequence’: str, ‘hybrid’: bool, ‘parent’: str}}. If left as None, the program will continue as normal (default is None)
fall_off (dict) – only used if the truth param is set to a dictionary. Must be a multiprocess-safe dictionary to store the fall off information in (default is None)
- Returns
None
- Return type
None
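The queue-and-shared-dictionary pattern described for mp_id_spectrum can be sketched in a generic form. This is not the module's implementation: WorkItem is a hypothetical stand-in for MPSpectrumID (whose fields are not documented here), the None sentinel used to stop workers is an assumption, and the summing step is a placeholder for the real alignment work.

```python
import multiprocessing as mp
from dataclasses import dataclass
from typing import List


@dataclass
class WorkItem:
    """Hypothetical stand-in for an MPSpectrumID object."""
    spectrum_id: str
    payload: List[float]


STOP = None  # sentinel telling a worker to exit (an assumption of this sketch)


def worker(input_q, results) -> None:
    """Pull items from the queue until the sentinel arrives.

    One result is stored per spectrum id in the shared dictionary.
    """
    while True:
        item = input_q.get()
        if item is STOP:
            break
        # Placeholder for the real per-spectrum alignment work.
        results[item.spectrum_id] = sum(item.payload)


def run_workers(items: List[WorkItem], n_workers: int = 2) -> dict:
    """Process all items with n_workers processes and return the results."""
    with mp.Manager() as manager:
        results = manager.dict()  # multiprocess-safe dictionary
        input_q = mp.Queue()
        for item in items:
            input_q.put(item)
        for _ in range(n_workers):
            input_q.put(STOP)  # one sentinel per worker
        procs = [
            mp.Process(target=worker, args=(input_q, results))
            for _ in range(n_workers)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        return dict(results)  # copy out before the manager shuts down
```

When run as a script, run_workers should be called from under an `if __name__ == "__main__":` guard so that platforms which start processes by re-importing the main module do not launch workers recursively.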