identification

This module is the “real” entry point of the program. It starts the spectra loading process, creates the database, starts any worker processes, and starts the alignment process for every spectrum. While there are only a few functions in this module, each is large and very important.

src.identification.id_spectrum(spectrum: src.objects.Spectrum, db: src.objects.Database, b_hits: dict, y_hits: dict, ppm_tolerance: int, precursor_tolerance: int, n: int, digest_type: str = '', truth: Optional[dict] = None, fall_off: Optional[dict] = None, is_last: bool = False) → src.objects.Alignments

Given the spectrum and initial hits, start the alignment process for the input spectrum

Parameters
  • spectrum (Spectrum) – observed spectrum in question

  • db (Database) – Holds all the source sequences

  • b_hits (dict) – all k-mers found from the b-ion search

  • y_hits (dict) – all k-mers found from the y-ion search

  • ppm_tolerance (int) – the parts per million error allowed when trying to match masses

  • precursor_tolerance (int) – the parts per million error allowed when trying to match precursor masses

  • n (int) – the number of alignments to save

  • digest_type (str) – the digest performed on the sample (default is ‘’)

  • truth (dict) – the desired alignments, keyed by spectrum id. The param.py file describes the expected structure in more detail. If left as None, the program continues normally (default is None)

  • fall_off (dict) – only used if the truth param is set to a dictionary. This is a dictionary (which must be process safe when multiprocessing is used) to which a key-value pair of spectrum id and DevFallOffEntry object is added whenever the desired sequence is lost during alignment. (default is None)

  • is_last (bool) – Only works if DEV is set to true in params. If set to true, timing evaluations are done. (default is False)

Returns

Alignments for the spectrum. If no alignment can be created, an empty Alignments object is returned

Return type

Alignments
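
The ppm_tolerance and precursor_tolerance parameters above are relative errors, so the absolute mass window they allow grows with the mass being matched. The following is a minimal sketch of that conversion, not the module's own code: the helper names ppm_to_da and within_ppm are hypothetical, and measuring the window around the theoretical mass (rather than the observed one) is an assumption.

```python
def ppm_to_da(mass: float, ppm_tolerance: float) -> float:
    """Convert a parts-per-million tolerance into an absolute window (Da) at `mass`."""
    return mass * ppm_tolerance / 1_000_000


def within_ppm(observed: float, theoretical: float, ppm_tolerance: float) -> bool:
    """Return True if `observed` lies inside the ppm window around `theoretical`.

    Assumption: the window is computed relative to the theoretical mass.
    """
    return abs(observed - theoretical) <= ppm_to_da(theoretical, ppm_tolerance)
```

With a tolerance of 20 ppm, a mass of 1000 Da is allowed to differ by at most 0.02 Da, while a mass of 500 Da is allowed only 0.01 Da.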


src.identification.id_spectra(spectra_files: list, database_file: str, verbose: bool = True, min_peptide_len: int = 5, max_peptide_len: int = 20, peak_filter: int = 0, relative_abundance_filter: float = 0.0, ppm_tolerance: int = 20, precursor_tolerance: int = 10, digest: str = '', cores: int = 1, n: int = 5, DEBUG: bool = False, truth_set: str = '', output_dir: str = '') → dict

Load in all the spectra and try to create an alignment for every spectrum

Parameters
  • spectra_files (list) – file names of input spectra

  • database_file (str) – file name of the fasta database

  • verbose (bool) – print progress to the console. (default is True)

  • min_peptide_len (int) – the minimum length alignment to create (default is 5)

  • max_peptide_len (int) – the maximum length alignment to create (default is 20)

  • peak_filter (int) – the number of most abundant peaks to use in the alignment. If set to a nonzero number, this filter is used instead of the relative abundance filter. (default is 0)

  • relative_abundance_filter (float) – the minimum share of the total intensity (a percentage expressed as a decimal) that a peak must have to be used in the alignment. If peak_filter is set, this parameter is ignored. (default is 0.0)

  • ppm_tolerance (int) – the parts per million error allowed when trying to match masses (default is 20)

  • precursor_tolerance (int) – the parts per million error allowed when trying to match a calculated precursor mass to the observed precursor mass (default is 10)

  • digest (str) – the type of digest used in the sample preparation. If left blank, a digest-free search is performed. (default is ‘’)

  • cores (int) – the number of cores the program is allowed to use. If the number provided is greater than the number of cores available, the maximum number of cores is used. (default is 1)

  • n (int) – the number of alignments to keep per spectrum. (default is 5)

  • DEBUG (bool) – DEVELOPMENT USE ONLY. Used only for timing of modules. (default is False)

  • truth_set (str) – the path to a json file of the desired alignments to make for each spectrum. The format of the file is {spectrum_id: {‘sequence’: str, ‘hybrid’: bool, ‘parent’: str}}. If left as an empty string, the program proceeds as normal. Otherwise, the results of the analysis are saved to the file ‘fall_off.json’ in the specified output directory. (default is ‘’)

  • output_dir (str) – the full path to the output directory to save all output files. (default is ‘’)
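
The precedence between peak_filter and relative_abundance_filter described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the function name filter_peaks, the (m/z, abundance) tuple representation of a peak, and the treatment of 0 as "filter not set" are all hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, abundance); hypothetical representation


def filter_peaks(
    peaks: List[Peak],
    peak_filter: int = 0,
    relative_abundance_filter: float = 0.0,
) -> List[Peak]:
    """Keep the peaks selected by whichever filter is active."""
    if peak_filter > 0:
        # Top-N filter takes precedence: keep the N most abundant peaks,
        # then restore ascending m/z order.
        top = sorted(peaks, key=lambda p: p[1], reverse=True)[:peak_filter]
        return sorted(top)
    if relative_abundance_filter > 0:
        # Keep peaks whose abundance is at least the given share of the total.
        total = sum(abundance for _, abundance in peaks)
        min_abundance = total * relative_abundance_filter
        return [p for p in peaks if p[1] >= min_abundance]
    # Neither filter set: keep every peak.
    return list(peaks)
```

For example, with a total intensity of 100 and relative_abundance_filter=0.1, only peaks with an abundance of at least 10 are kept; passing peak_filter=2 as well would ignore that threshold and keep the two most abundant peaks.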

Returns

alignments for all spectra, saved in the form {spectrum.id: Alignments}

Return type

dict
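
As an illustration of the truth_set file format given in the parameter list, the sketch below writes and reads back a small json file of the form {spectrum_id: {‘sequence’: str, ‘hybrid’: bool, ‘parent’: str}}. The spectrum ids, sequences, and parent names are made-up placeholder values, and the helper names write_truth_set and load_truth_set are hypothetical; only the key layout comes from the documentation above.

```python
import json
import os
import tempfile


def write_truth_set(path: str, truth: dict) -> None:
    """Write a truth set to `path` as json."""
    with open(path, "w") as handle:
        json.dump(truth, handle, indent=2)


def load_truth_set(path: str) -> dict:
    """Read a truth set back into a dictionary keyed by spectrum id."""
    with open(path) as handle:
        return json.load(handle)


# Placeholder entries; every value here is invented for illustration.
example_truth = {
    "spectrum_1": {"sequence": "PEPTIDEK", "hybrid": False, "parent": "protein_A"},
    "spectrum_2": {"sequence": "SAMPLER", "hybrid": True, "parent": "protein_B"},
}

example_path = os.path.join(tempfile.mkdtemp(), "truth_set.json")
write_truth_set(example_path, example_truth)
loaded_truth = load_truth_set(example_path)
```

The resulting path is what would be passed as truth_set, together with an output_dir in which ‘fall_off.json’ is saved.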


src.identification.mp_id_spectrum(input_q: multiprocessing.context.BaseContext.Queue, db_copy: src.objects.Database, results: dict, fall_off: Optional[dict] = None, truth: Optional[dict] = None) → None

Multiprocessing function used to identify a spectrum. Each entry in the input_q must be an MPSpectrumID object

Parameters
  • input_q (mp.Queue) – a queue to pull MPSpectrumID objects from for analysis

  • db_copy (Database) – a copy of the original database for alignments

  • results (dict) – a multiprocess-safe dictionary to save the alignments in

  • truth (dict) – dictionary containing all the desired alignments to make, in the form {spectrum_id: {‘sequence’: str, ‘hybrid’: bool, ‘parent’: str}}. If left as None, the program will continue as normal (default is None)

  • fall_off (dict) – only used if the truth param is set. Must be a multiprocess-safe dictionary in which to store the fall-off information (default is None)

Returns

None

Return type

None
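
The queue-and-shared-dictionary pattern described for mp_id_spectrum can be sketched with the standard multiprocessing module. This is a generic illustration, not the module's code: the names worker, align, and identify_all are hypothetical, queue items are simplified to (spectrum id, peaks) tuples instead of MPSpectrumID objects, and stopping workers with a None sentinel is an assumption since the documentation does not say how workers terminate.

```python
import multiprocessing as mp

STOP = None  # sentinel telling a worker to exit (assumption, not documented)


def align(spectrum_id, peaks):
    """Hypothetical stand-in for the real alignment step."""
    return f"{len(peaks)} peaks aligned"


def worker(input_q, results):
    """Pull items from the queue until the sentinel arrives, saving each result."""
    while True:
        item = input_q.get()
        if item is STOP:
            break
        spectrum_id, peaks = item
        results[spectrum_id] = align(spectrum_id, peaks)


def identify_all(spectra, cores=1):
    """Run `worker` in up to `cores` processes and collect {spectrum_id: result}."""
    # Never start more workers than there are cores available.
    cores = max(1, min(cores, mp.cpu_count()))
    with mp.Manager() as manager:
        results = manager.dict()  # process-safe dictionary shared by all workers
        input_q = mp.Queue()
        procs = [mp.Process(target=worker, args=(input_q, results)) for _ in range(cores)]
        for proc in procs:
            proc.start()
        for item in spectra.items():
            input_q.put(item)
        for _ in procs:
            input_q.put(STOP)  # one sentinel per worker
        for proc in procs:
            proc.join()
        return dict(results)  # copy out before the manager shuts down
```

When run as a script, identify_all should be called under an `if __name__ == "__main__":` guard so that start methods which re-import the main module do not launch workers recursively.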