objects

hypedsearch uses Python namedtuples as objects throughout the project. Their low memory usage and ease of use make them a good fit for keeping things organized.
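As a quick reminder of how namedtuples behave, here is a minimal sketch. The class below is a local stand-in that reuses the DatabaseEntry field names documented further down; it is not imported from src.objects, and the sequence and description values are made up.

```python
from collections import namedtuple

# Local stand-in with the same field names as src.objects.DatabaseEntry
DatabaseEntry = namedtuple('DatabaseEntry', ['sequence', 'description'])

entry = DatabaseEntry(sequence='MALWMRLLPL', description='insulin')

# Fields can be read by name or by position
by_name = entry.sequence
by_index = entry[1]

# namedtuples are immutable: _replace returns a modified copy
renamed = entry._replace(description='INS_HUMAN')

# _asdict is handy when writing results out (e.g. to JSON)
as_dict = entry._asdict()
```

Because instances are immutable, any "update" produces a new object and the original is left untouched.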

src.objects.Database(fasta_file, proteins, kmers)

Holds the fasta file name, the proteins, and the kmer-to-protein index

Variables
  • fasta_file – The name of the input fasta file

  • proteins – A dictionary of proteins where each key is an entry name and each value is a list of DatabaseEntry objects

  • kmers – A dictionary mapping kmers to a list of source protein names


src.objects.DatabaseEntry(sequence, description)

Contains protein information

Variables
  • sequence – the full protein sequence

  • description – the name of the protein
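The sketch below shows how a Database and its DatabaseEntry objects could fit together, based only on the field descriptions above. The namedtuples are local stand-ins, index_kmers is a hypothetical helper (not part of hypedsearch), and the protein names, sequences, and file name are invented.

```python
from collections import defaultdict, namedtuple

# Local stand-ins mirroring the documented field names
Database = namedtuple('Database', ['fasta_file', 'proteins', 'kmers'])
DatabaseEntry = namedtuple('DatabaseEntry', ['sequence', 'description'])


def index_kmers(proteins, k):
    """Hypothetical helper: map every length-k substring to the names
    of the proteins that contain it."""
    kmers = defaultdict(list)
    for name, entries in proteins.items():
        for entry in entries:
            seq = entry.sequence
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                # Record each source protein only once per kmer
                if name not in kmers[kmer]:
                    kmers[kmer].append(name)
    return dict(kmers)


# proteins: entry name -> list of DatabaseEntry objects
proteins = {
    'protA': [DatabaseEntry(sequence='MALWM', description='protA')],
    'protB': [DatabaseEntry(sequence='LWMRL', description='protB')],
}

db = Database(
    fasta_file='example.fasta',
    proteins=proteins,
    kmers=index_kmers(proteins, k=3),
)

shared = db.kmers['LWM']  # kmer present in both proteins
```

How hypedsearch actually builds the kmers dictionary may differ; the point here is only the shape of the three fields.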


src.objects.Spectrum(spectrum, abundance, total_intensity, ms_level, scan_number, precursor_mass, precursor_charge, file_name, id, other_metadata)

Holds information regarding an MS or MS/MS spectrum

Variables
  • spectrum – m/z float values of an MS run

  • abundance – floats describing the abundance of each peak. The value at index i is the abundance of the m/z value at index i of spectrum

  • ms_level – MS experiment level

  • scan_number – scan number of spectrum in the MS run

  • precursor_mass – precursor mass of the MS run (i.e. the mass of the whole sequence)

  • precursor_charge – charge of the precursor mass

  • file_name – name of the source file of the spectrum

  • other_metadata – other metadata associated with the spectrum not in the above
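To illustrate the index-aligned spectrum and abundance fields, here is a small sketch using a local stand-in for Spectrum. All values are invented. Computing total_intensity as the sum of the abundances is an assumption, since that field is not described above, and the id value is likewise a placeholder.

```python
from collections import namedtuple

# Local stand-in mirroring the documented signature
Spectrum = namedtuple('Spectrum', [
    'spectrum', 'abundance', 'total_intensity', 'ms_level', 'scan_number',
    'precursor_mass', 'precursor_charge', 'file_name', 'id', 'other_metadata',
])

mz_values = [147.1, 276.2, 389.3]
abundances = [1200.0, 3400.0, 800.0]

spec = Spectrum(
    spectrum=mz_values,
    abundance=abundances,
    total_intensity=sum(abundances),  # assumption: sum of peak abundances
    ms_level=2,
    scan_number=1,
    precursor_mass=535.3,
    precursor_charge=2,
    file_name='example.mzML',
    id='scan_1',                      # placeholder identifier
    other_metadata={},
)

# abundance[i] belongs to spectrum[i], so the two lists zip into peaks
peaks = list(zip(spec.spectrum, spec.abundance))

# The most abundant peak as an (m/z, abundance) pair
base_peak = max(peaks, key=lambda peak: peak[1])
```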


src.objects.SequenceAlignment(proteins, sequence, b_score, y_score, total_score, precursor_distance, total_mass_error)

Alignment information for a non-hybrid sequence alignment

Variables
  • proteins – proteins where the aligned sequence is found

  • sequence – the string of amino acids that were found as the alignment

  • b_score – b ion score of the sequence

  • y_score – y ion score of the sequence

  • total_score – the score given to the sequence

  • precursor_distance – the absolute value of the difference between the observed precursor mass and the calculated precursor mass of the aligned sequence

  • total_mass_error – the sum of the absolute values of the error between an aligned amino acid mass and the matched observed mass
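The two error fields follow directly from their definitions above and can be sketched as below. The namedtuple is a local stand-in, the masses are invented, and the score values are placeholders (how hypedsearch derives total_score from the ion scores is not stated here).

```python
from collections import namedtuple

# Local stand-in mirroring the documented signature
SequenceAlignment = namedtuple('SequenceAlignment', [
    'proteins', 'sequence', 'b_score', 'y_score', 'total_score',
    'precursor_distance', 'total_mass_error',
])

observed_precursor = 500.25
calculated_precursor = 500.0

# Matched (observed, calculated) fragment masses, illustrative values only
observed_masses = [100.5, 200.0, 299.75]
calculated_masses = [100.0, 200.0, 300.0]

# |observed precursor mass - calculated precursor mass|
precursor_distance = abs(observed_precursor - calculated_precursor)

# Sum of |observed - calculated| over all matched masses
total_mass_error = sum(
    abs(obs - calc) for obs, calc in zip(observed_masses, calculated_masses)
)

alignment = SequenceAlignment(
    proteins=['protA'],
    sequence='MALWMR',
    b_score=3,
    y_score=2,
    total_score=5,  # placeholder value
    precursor_distance=precursor_distance,
    total_mass_error=total_mass_error,
)
```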


src.objects.HybridSequenceAlignment(left_proteins, right_proteins, sequence, hybrid_sequence, b_score, y_score, total_score, precursor_distance, total_mass_error)

Alignment information for a hybrid sequence alignment

Variables
  • left_proteins – proteins that contain the sequence of amino acids that contribute to the left side of the hybrid peptide

  • right_proteins – proteins that contain the sequence of amino acids that contribute to the right side of the hybrid peptide

  • sequence – the string of amino acids that were found as the alignment

  • hybrid_sequence – the string of amino acids that were found as the alignment with special characters [(), -] where - denotes a hybrid sequence with no overlap (left-right) and () denotes a hybrid with an overlap (left(overlap)right)

  • b_score – b ion score of the sequence

  • y_score – y ion score of the sequence

  • total_score – the score given to the sequence

  • precursor_distance – the absolute value of the difference between the observed precursor mass and the calculated precursor mass of the aligned sequence

  • total_mass_error – the sum of the absolute values of the error between an aligned amino acid mass and the matched observed mass
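The hybrid_sequence notation described above can be unpacked with a few lines of string handling. This is a sketch: plain_sequence and split_hybrid are hypothetical helpers, not hypedsearch functions, and they assume sequences made of uppercase letters with at most one junction marker.

```python
import re


def plain_sequence(hybrid_sequence):
    """Strip the special characters to recover the plain amino acid string."""
    return re.sub(r'[()\-]', '', hybrid_sequence)


def split_hybrid(hybrid_sequence):
    """Return (left, overlap, right) for either notation.

    'left-right'          -> overlap is the empty string
    'left(overlap)right'  -> overlap is the parenthesised part
    """
    match = re.fullmatch(r'([A-Z]*)\(([A-Z]+)\)([A-Z]*)', hybrid_sequence)
    if match:
        return match.group(1), match.group(2), match.group(3)
    left, _, right = hybrid_sequence.partition('-')
    return left, '', right


no_overlap = split_hybrid('MAL-WMR')
with_overlap = split_hybrid('MA(LW)MR')
flattened = plain_sequence('MA(LW)MR')
```

Under this reading, the sequence field corresponds to the hybrid_sequence field with the special characters removed.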


src.objects.Alignments(spectrum, alignments)

Contains the spectrum with SequenceAlignments and HybridSequenceAlignments

Variables
  • spectrum – the observed spectrum

  • alignments – SequenceAlignment and HybridSequenceAlignment objects
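Since alignments can hold both alignment types, a consumer usually needs to rank them and tell them apart. The sketch below uses local stand-ins and invented values. The ranking rule (highest total_score first, ties broken by smallest precursor_distance) is an assumption, not necessarily the ordering hypedsearch uses.

```python
from collections import namedtuple

# Local stand-ins mirroring the documented signatures
SequenceAlignment = namedtuple('SequenceAlignment', [
    'proteins', 'sequence', 'b_score', 'y_score', 'total_score',
    'precursor_distance', 'total_mass_error',
])
HybridSequenceAlignment = namedtuple('HybridSequenceAlignment', [
    'left_proteins', 'right_proteins', 'sequence', 'hybrid_sequence',
    'b_score', 'y_score', 'total_score', 'precursor_distance',
    'total_mass_error',
])
Alignments = namedtuple('Alignments', ['spectrum', 'alignments'])

results = Alignments(
    spectrum=None,  # placeholder; normally the observed Spectrum object
    alignments=[
        SequenceAlignment(['protA'], 'MALWMR', 3, 2, 5, 0.25, 0.75),
        HybridSequenceAlignment(
            ['protA'], ['protB'], 'MALWMR', 'MAL-WMR', 4, 3, 7, 0.5, 1.0
        ),
    ],
)

# Assumed ranking: best score first, then closest precursor mass
ranked = sorted(
    results.alignments,
    key=lambda a: (-a.total_score, a.precursor_distance),
)

# The two types are distinct classes, so isinstance separates them
hybrids = [a for a in ranked if isinstance(a, HybridSequenceAlignment)]
```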


src.objects.MPSpectrumID(b_hits, y_hits, spectrum, ppm_tolerance, precursor_tolerance, n, digest_type)

Holds information to pass to processes during multiprocessing (MP)

Variables
  • b_hits – k-mers found from the b ion search

  • y_hits – k-mers found from the y ion search

  • spectrum – observed spectrum

  • ppm_tolerance – parts per million error allowed when matching masses

  • precursor_tolerance – parts per million error allowed when matching precursor mass

  • n – the number of alignments to keep

  • digest_type – the digest performed on the sample
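Both tolerance fields are expressed in parts per million, so they scale with the mass being matched. The sketch below packs a work item into a local stand-in for MPSpectrumID and converts a ppm tolerance into an absolute mass window. ppm_window is a hypothetical helper, and the hits, tolerances, and digest name are invented.

```python
from collections import namedtuple

# Local stand-in mirroring the documented signature
MPSpectrumID = namedtuple('MPSpectrumID', [
    'b_hits', 'y_hits', 'spectrum', 'ppm_tolerance',
    'precursor_tolerance', 'n', 'digest_type',
])


def ppm_window(mass, ppm):
    """Hypothetical helper: (lower, upper) bounds for a mass at a ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


work_item = MPSpectrumID(
    b_hits=['MAL', 'ALW'],
    y_hits=['WMR'],
    spectrum=None,           # placeholder; normally the observed Spectrum
    ppm_tolerance=20,
    precursor_tolerance=10,
    n=5,
    digest_type='trypsin',   # illustrative value
)

# A 1000 Da mass at 20 ppm may deviate by 0.02 Da in either direction
lower, upper = ppm_window(1000.0, work_item.ppm_tolerance)
```

Bundling everything a worker needs into one tuple keeps the call to each process down to a single argument.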


src.objects.DEVFallOffEntry(hybrid, truth_sequence, fall_off_operation, meta_data)

DEVELOPMENT USE ONLY

Holds data about the point at which the components that make up the desired overlapping sequence fall off and can no longer produce the correct alignment

Variables
  • hybrid – whether or not the desired alignment is a hybrid

  • truth_sequence – the desired string alignment

  • fall_off_operation – the operation at which the sequence was no longer attainable

  • meta_data – any extra information pertaining to the operation