scoring
The two modules were initially supposed to be different, but have since become nearly one and the same. mass_comparisons is the real worker for scoring spectra, and scoring is an interface to it as well as to other scoring metrics. mass_comparisons remains separate because of all the experimentation done to make the scoring as fast as possible. In both files, older attempts have been commented out and kept for reference so that we don’t try the same things over again.
mass_comparisons
- src.scoring.mass_comparisons.optimized_compare_masses(observed: list, reference: list, ppm_tolerance: int = 20, needs_sorted: bool = False) → float
Score two spectra against each other. Simple additive scoring of ions found
- Parameters
observed (list) – observed set of m/z values
reference (list) – reference set of m/z values
ppm_tolerance (int) – parts per million mass error allowed when matching masses. (default is 20)
needs_sorted (bool) – Set to true if either the observed or reference need to be sorted. (default is False)
- Returns
the number of matched ions
- Return type
int
- Example
>>> optimized_compare_masses([1, 2, 4], [1, 3, 4], 1, False)
2
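As an illustration of what additive ppm-tolerance matching can look like, here is a minimal sketch. It is not the implementation in mass_comparisons: the function name `count_matches_ppm` and the two-pointer walk over sorted lists are assumptions made for this example, and the tolerance window is assumed to be computed relative to each reference mass.

```python
def count_matches_ppm(observed, reference, ppm_tolerance=20, needs_sorted=False):
    """Hypothetical sketch: count reference masses with an observed mass in the ppm window."""
    if needs_sorted:
        observed = sorted(observed)
        reference = sorted(reference)

    matched = 0
    i = 0  # index into observed; only ever moves forward
    for ref in reference:
        # Assumption: the window is ppm_tolerance parts per million of the reference mass.
        delta = ref * ppm_tolerance / 1_000_000
        lower, upper = ref - delta, ref + delta
        # Skip observed masses that fall below the window.
        while i < len(observed) and observed[i] < lower:
            i += 1
        # One point if the next observed mass lies inside the window.
        # i is not advanced here, so overlapping windows may reuse an observed mass.
        if i < len(observed) and observed[i] <= upper:
            matched += 1
    return matched


print(count_matches_ppm([1, 2, 4], [1, 3, 4], 1, False))  # 2
```

Because both lists are walked once, the sketch runs in linear time on sorted input, which is one plausible reason for the needs_sorted flag.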
scoring
- src.scoring.scoring.score_sequence(observed: list, theoretical: list, ppm_tolerance: int = 20, needs_sorted: bool = False) → float
Score a mass spectrum against a substring of tagged amino acids
- Parameters
observed (list) – observed set of m/z values
theoretical (list) – reference set of m/z values
ppm_tolerance (int) – parts per million mass error allowed when matching masses. (default is 20)
needs_sorted (bool) – Set to true if either the observed or reference need to be sorted. (default is False)
- Returns
the number of matched ions
- Return type
int
- Example
>>> score_sequence([1, 2, 4], [1, 3, 4], 1, False)
2
- src.scoring.scoring.hybrid_score(observed: src.objects.Spectrum, hybrid_seq: str, ppm_tolerance: int, lesser_point: float = 0.5, greater_point: float = 1.0) → float
A score for hybrid sequences. b ions found to the left of the hybrid junction and y ions found to the right of the hybrid junction are awarded a point of value lesser_point. b ions found to the right of the hybrid junction and y ions found to the left of the hybrid junction are awarded a point of value greater_point.
- Parameters
observed (Spectrum) – observed spectrum
hybrid_seq (str) – hybrid string sequence
ppm_tolerance (int) – mass error allowed in parts per million when matching masses
lesser_point (float) – point awarded to ions found on their own side of the hybrid junction (b ions on the left, y ions on the right). (default is 0.5)
greater_point (float) – point awarded to ions found on the opposite side of the hybrid junction (b ions on the right, y ions on the left). (default is 1.0)
- Returns
the score
- Return type
float
- Example
>>> hybrid_seq = 'ABC-DEF'
>>> lesser_point = .5
>>> greater_point = 1.0
>>> # say our b ions found are A, C, E
>>> # and y ions found are D, A
>>> # our scoring then works like
>>> # .5(bA) + .5(bC) + 1(bE) + .5(yD) + 1(yA)
>>> hybrid_score(spectrum, hybrid_seq, 20, lesser_point, greater_point)
3.5
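The point assignment described above can be sketched in isolation from the spectrum matching. This is only an illustration: `hybrid_score_sketch`, `b_found`, and `y_found` are hypothetical names, and the sketch assumes the ions have already been matched and are given as 0-based residue positions in the sequence with the junction marker removed. The real hybrid_score takes a Spectrum and performs its own matching.

```python
def hybrid_score_sketch(hybrid_seq, b_found, y_found, lesser_point=0.5, greater_point=1.0):
    """Hypothetical sketch of the junction-aware point assignment.

    Assumes hybrid_seq contains exactly one '-' marking the hybrid junction.
    b_found / y_found hold 0-based residue positions (junction removed)
    at which a b ion or y ion was matched.
    """
    # Number of residues to the left of the junction.
    junction = hybrid_seq.index("-")

    score = 0.0
    for pos in b_found:
        # b ions left of the junction are on their own side.
        score += lesser_point if pos < junction else greater_point
    for pos in y_found:
        # y ions right of the junction are on their own side.
        score += lesser_point if pos >= junction else greater_point
    return score


# 'ABC-DEF' without the junction is ABCDEF: A=0, B=1, C=2, D=3, E=4, F=5
# b ions at A, C, E and y ions at D, A, as in the example above
print(hybrid_score_sketch("ABC-DEF", [0, 2, 4], [3, 0]))  # 3.5
```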
- src.scoring.scoring.precursor_distance(observed_precursor: float, reference_precursor: float) → float
The absolute distance between the observed precursor and reference precursor
- Parameters
observed_precursor (float) – the observed precursor mass
reference_precursor (float) – the precursor mass of the reference sequence
- Returns
the absolute value of the difference between the two
- Return type
float
- src.scoring.scoring.total_mass_error(observed: src.objects.Spectrum, alignment: str, tolerance: int) → float
The sum of the mass errors over every mass matched between the observed spectrum and the alignment.
- Parameters
observed (Spectrum) – observed spectrum
alignment (str) – the string alignment
tolerance (int) – parts per million tolerance allowed when matching masses
- Returns
sum of the absolute values of all mass errors
- Return type
float
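A rough sketch of summing mass errors over matched masses follows. It rests on several assumptions that the description above does not confirm: the name `total_mass_error_sketch` is hypothetical, the theoretical masses for the alignment are assumed to be computed elsewhere and passed in as a list, each theoretical mass is matched to its closest observed mass, and the error is taken as the absolute difference in m/z units rather than in ppm.

```python
from bisect import bisect_left


def total_mass_error_sketch(observed, theoretical, ppm_tolerance):
    """Hypothetical sketch: sum |observed - theoretical| over masses matched within ppm_tolerance."""
    observed = sorted(observed)
    total = 0.0
    for ref in theoretical:
        # Assumption: tolerance window is relative to the theoretical mass.
        delta = ref * ppm_tolerance / 1_000_000
        # The closest observed mass is one of the two neighbours of the insertion point.
        idx = bisect_left(observed, ref)
        candidates = observed[max(idx - 1, 0): idx + 1]
        if not candidates:
            continue  # no observed masses at all
        closest = min(candidates, key=lambda m: abs(m - ref))
        error = abs(closest - ref)
        # Only matched masses contribute to the total.
        if error <= delta:
            total += error
    return total


# 100.001 and 200.0 each match an observed mass about 0.001 away; 300.0 has no match
print(total_mass_error_sketch([100.0, 200.001], [100.001, 200.0, 300.0], 20))
```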
- src.scoring.scoring.digest_score(sequence: str, db: src.objects.Database, digest_type: str) → int
The additional points a sequence gets if it follows the digest rules of the specified digest type
- Parameters
sequence (str) – hybrid or non hybrid sequence to analyze
db (Database) – source proteins
digest_type (str) – what kind of digest was performed
- Returns
additional points the sequence gets by following the digest
- Return type
int