scoring

The two modules were originally supposed to be different, but have since become nearly one and the same. Compare masses is the real worker for scoring spectra, and scoring is an interface to it as well as to other scoring metrics. Mass comparisons is still kept separate because of all the experimentation done to make the scoring as fast as possible. In both files, older attempts have been commented out and left for reference so that we don't try the same things again.

mass_comparisons

src.scoring.mass_comparisons.optimized_compare_masses(observed: list, reference: list, ppm_tolerance: int = 20, needs_sorted: bool = False) → float

Score two spectra against each other using simple additive scoring of the ions found.

Parameters
  • observed (list) – observed set of m/z values

  • reference (list) – reference set of m/z values

  • ppm_tolerance (int) – parts per million mass error allowed when matching masses. (default is 20)

  • needs_sorted (bool) – Set to true if either the observed or reference need to be sorted. (default is False)

Returns

the number of matched ions

Return type

int

Example

>>> optimized_compare_masses([1, 2, 4], [1, 3, 4], 1, False)
2
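
The matching idea above can be sketched as a two-pointer walk over two sorted lists. This is only an illustration, not the code in mass_comparisons: the name count_matches_sketch is hypothetical, and it assumes the tolerance window is computed relative to the reference mass and that each mass can be matched at most once.

```python
def count_matches_sketch(observed, reference, ppm_tolerance=20, needs_sorted=False):
    """Count reference masses that have an observed mass within ppm_tolerance.

    Hypothetical helper for illustration; not the library implementation.
    """
    if needs_sorted:
        observed = sorted(observed)
        reference = sorted(reference)

    matches = 0
    i = j = 0
    while i < len(observed) and j < len(reference):
        # Assumption: the window is measured relative to the reference mass.
        tol = reference[j] * ppm_tolerance / 1_000_000
        if observed[i] < reference[j] - tol:
            i += 1  # observed mass is too small, advance observed
        elif observed[i] > reference[j] + tol:
            j += 1  # observed mass is too large, advance reference
        else:
            matches += 1  # within tolerance: count it and consume both masses
            i += 1
            j += 1
    return matches
```

With the inputs from the example above, count_matches_sketch([1, 2, 4], [1, 3, 4], 1, False) also gives 2, since only the masses 1 and 4 line up.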

scoring

src.scoring.scoring.score_sequence(observed: list, theoretical: list, ppm_tolerance: int = 20, needs_sorted: bool = False) → float

Score a mass spectrum against a substring of tagged amino acids

Parameters
  • observed (list) – observed set of m/z values

  • theoretical (list) – theoretical (reference) set of m/z values

  • ppm_tolerance (int) – parts per million mass error allowed when matching masses. (default is 20)

  • needs_sorted (bool) – Set to true if either the observed or theoretical need to be sorted. (default is False)

Returns

the number of matched ions

Return type

int

Example

>>> score_sequence([1, 2, 4], [1, 3, 4], 1, False)
2

src.scoring.scoring.hybrid_score(observed: src.objects.Spectrum, hybrid_seq: str, ppm_tolerance: int, lesser_point: float = 0.5, greater_point: float = 1.0) → float

A score for hybrid sequences. b ions found to the left of the hybrid junction and y ions found to the right of the hybrid junction are awarded a point of value lesser_point. b ions found to the right of the hybrid junction and y ions found to the left of the hybrid junction are awarded a point of value greater_point.

Parameters
  • observed (Spectrum) – observed spectrum

  • hybrid_seq (str) – hybrid string sequence

  • ppm_tolerance (int) – mass error allowed in parts per million when matching masses

  • lesser_point (float) – point awarded to ions found on their own side of the hybrid junction (b ions on the left, y ions on the right). (default is 0.5)

  • greater_point (float) – point awarded to ions found on the opposite side of the hybrid junction (b ions on the right, y ions on the left). (default is 1.0)

Returns

the score

Return type

float

Example

>>> hybrid_seq = 'ABC-DEF'
>>> lesser_point = .5
>>> greater_point = 1.0
>>> # say our b ions found are A, C, E
>>> # and y ions found are D, A
>>> # our scoring then works like
>>> # .5(bA) + .5(bC) + 1(bE) + .5(yD) + 1(yA)
>>> hybrid_score(spectrum, hybrid_seq, 20, lesser_point, greater_point)
3.5
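
The point arithmetic in the example can be reproduced with a small sketch. It is not the hybrid_score implementation: the real function takes a Spectrum and matches masses, whereas this hypothetical helper (hybrid_points_sketch) assumes the matched b and y ions are already known and are identified by residue letter, as in the example, and that '-' marks the hybrid junction.

```python
def hybrid_points_sketch(hybrid_seq, b_found, y_found,
                         lesser_point=0.5, greater_point=1.0):
    """Sum the points for already-matched b and y ions of a hybrid sequence.

    Hypothetical helper for illustration; ions are named by residue letter.
    """
    # Assumption: a single '-' separates the left and right parents.
    left, right = hybrid_seq.split("-")

    score = 0.0
    for residue in b_found:
        if residue in left:
            score += lesser_point   # b ion on its own (left) side
        elif residue in right:
            score += greater_point  # b ion across the junction
    for residue in y_found:
        if residue in right:
            score += lesser_point   # y ion on its own (right) side
        elif residue in left:
            score += greater_point  # y ion across the junction
    return score
```

Calling hybrid_points_sketch('ABC-DEF', ['A', 'C', 'E'], ['D', 'A']) reproduces the sum .5 + .5 + 1 + .5 + 1 from the example.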

src.scoring.scoring.precursor_distance(observed_precursor: float, reference_precursor: float) → float

The absolute distance between the observed precursor and reference precursor

Parameters
  • observed_precursor (float) – the observed precursor mass

  • reference_precursor (float) – the precursor mass of the reference sequence

Returns

the absolute value of the difference between the two

Return type

float
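
Example

As described, the distance is just the absolute difference of the two masses. A minimal sketch follows, using a hypothetical name so it is not mistaken for the library function:

```python
def precursor_distance_sketch(observed_precursor: float,
                              reference_precursor: float) -> float:
    """Absolute difference between two precursor masses (illustration only)."""
    return abs(observed_precursor - reference_precursor)
```

The result is symmetric in its arguments, so swapping the observed and reference masses gives the same distance.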


src.scoring.scoring.total_mass_error(observed: src.objects.Spectrum, alignment: str, tolerance: int) → float

The sum of all of the mass errors for every matched mass between the observed and the alignment.

Parameters
  • observed (Spectrum) – observed spectrum

  • alignment (str) – the string alignment

  • tolerance (int) – parts per million tolerance allowed when matching masses

Returns

sum of the absolute values of all mass errors

Return type

float
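
Example

A rough sketch of the summation, under several assumptions: the theoretical masses of the alignment have already been computed into a plain list, each reference mass is matched to its nearest observed mass, and the ppm window is taken relative to the reference mass. The helper name total_mass_error_sketch is hypothetical; the real function works on a Spectrum and an alignment string.

```python
from bisect import bisect_left


def total_mass_error_sketch(observed, reference, ppm_tolerance=20):
    """Sum |observed - reference| over reference masses matched within tolerance.

    Hypothetical helper for illustration; not the library implementation.
    """
    observed = sorted(observed)
    total = 0.0
    for ref in reference:
        # Assumption: the window is measured relative to the reference mass.
        tol = ref * ppm_tolerance / 1_000_000
        idx = bisect_left(observed, ref)
        # The nearest observed mass is one of the two neighbours of idx.
        candidates = observed[max(idx - 1, 0):idx + 1]
        if not candidates:
            continue
        error = min(abs(mass - ref) for mass in candidates)
        if error <= tol:
            total += error  # only matched masses contribute
    return total
```

Reference masses with no observed mass inside the window contribute nothing, so a perfect match and a complete miss both add zero to the total; the value is best read alongside the number of matched ions.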


src.scoring.scoring.digest_score(sequence: str, db: src.objects.Database, digest_type: str) → int

The additional points a sequence gets if it follows the digest rules of the specified digest type

Parameters
  • sequence (str) – hybrid or non-hybrid sequence to analyze

  • db (Database) – source proteins

  • digest_type (str) – what kind of digest was performed

Returns

additional points the sequence gets by following the digest

Return type

int