utils

The utils module collects general-purpose helper functions used throughout the project. If you don’t know where to put something, it probably belongs here.

src.utils.file_exists(file_name: str) -> bool

Determine if a file exists

Parameters

file_name (str) – Path to the file in question

Returns

True if the file exists

Return type

bool


src.utils.make_valid_dir_string(dir_path: str) -> str

Add the OS separator character to the end of a directory string to make a valid directory path

Parameters

dir_path (str) – Name of directory to check

Returns

Corrected directory path

Return type

str
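
A minimal sketch of the behaviour described above, not the project's actual implementation. The name make_valid_dir_string_sketch is hypothetical, and skipping the append when the separator is already present is an assumption.

```python
import os


def make_valid_dir_string_sketch(dir_path: str) -> str:
    # Assumption: a path that already ends with the separator is returned unchanged.
    if dir_path.endswith(os.sep):
        return dir_path
    # Otherwise append the platform separator ('/' on POSIX, '\\' on Windows).
    return dir_path + os.sep


print(make_valid_dir_string_sketch("output"))
```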


src.utils.make_dir(dir_path: str) -> bool

Check whether the directory already exists, and create it if it does not. NOTE: this is not recursive; only one level of directory will be created

Parameters

dir_path (str) – Full path of directory to create

Returns

True if successful

Return type

bool
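
The non-recursive note above can be illustrated with the standard library. This is a sketch under assumptions: make_dir_sketch is a hypothetical name, and returning False on failure (rather than raising) is assumed, since the documentation only states that True means success.

```python
import os
import tempfile


def make_dir_sketch(dir_path: str) -> bool:
    # An existing directory counts as success.
    if os.path.isdir(dir_path):
        return True
    try:
        # os.mkdir creates only the final path component; it raises
        # FileNotFoundError when a parent directory is missing.
        os.mkdir(dir_path)
    except OSError:
        # Assumption: failures are reported as False.
        return False
    return True


with tempfile.TemporaryDirectory() as root:
    one_level = os.path.join(root, "a")        # parent exists
    two_levels = os.path.join(root, "b", "c")  # parent "b" is missing
    created_one = make_dir_sketch(one_level)
    created_two = make_dir_sketch(two_levels)
    print(created_one, created_two)
```

When nested directories are needed, os.makedirs is the recursive standard-library counterpart.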


src.utils.make_valid_text_file(file_name: str) -> str

Ensure a path string ends with the .txt extension

Parameters

file_name (str) – File name to validate for txt file type

Returns

Name with the .txt extension

Return type

str


src.utils.make_valid_json_file(file_name: str) -> str

Ensure a path string ends with the .json extension

Parameters

file_name (str) – File name to validate for json file type

Returns

Name with the .json extension

Return type

str


src.utils.make_valid_csv_file(file_name: str) -> str

Ensure a path string ends with the .csv extension

Parameters

file_name (str) – File name to validate for csv file type

Returns

Name with the .csv extension

Return type

str


src.utils.make_valid_fasta_file(file_name: str) -> str

Ensure a path string ends with the .fasta extension

Parameters

file_name (str) – File name to validate for fasta file type

Returns

Name with the .fasta extension

Return type

str
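
The four make_valid_*_file helpers above follow the same pattern, which might be sketched with one shared function. The helper name ensure_extension is hypothetical and not part of src.utils, and leaving an already-correct name untouched is an assumption.

```python
def ensure_extension(file_name: str, extension: str) -> str:
    # Assumption: a name that already carries the extension is returned as-is.
    if file_name.endswith(extension):
        return file_name
    return file_name + extension


print(ensure_extension("results", ".txt"))
print(ensure_extension("database.fasta", ".fasta"))
```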


src.utils.is_json(file: str) -> bool

Determine if a file is a json file

Parameters

file (str) – File name of file in question

Returns

True if it is a json file

Return type

bool


src.utils.is_fasta(file: str) -> bool

Determine if a file is a fasta file

Parameters

file (str) – File name of the file in question

Returns

True if it is a fasta file

Return type

bool


src.utils.is_dir(dir_path: str) -> bool

Determine if a path is a valid path to a directory

Parameters

dir_path (str) – Full path to the directory

Returns

True if the directory exists

Return type

bool


src.utils.is_file(file: str) -> bool

Determine if a file exists

Parameters

file (str) – Full path to the file

Returns

True if the file exists

Return type

bool


src.utils.all_perms_of_s(s: str, keyletters: str) -> list

Find all permutations of a string obtained by substituting the characters in ‘keyletters’ for one another

Parameters
  • s (str) – The string to permute

  • keyletters (str) – The characters to make permutations with

Returns

All permutations of s

Return type

list

Example

>>> all_perms_of_s('LMNOP', 'LI')
['LMNOP', 'IMNOP']
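
One way to reproduce the documented example is sketched below. It is inferred from that single example only: every character of s that appears in keyletters is swapped for each character of keyletters in turn, and all other characters are kept. The name all_perms_of_s_sketch is hypothetical, and the project's implementation may differ.

```python
from itertools import product


def all_perms_of_s_sketch(s: str, keyletters: str) -> list:
    # For each position, the candidates are all key letters when the character
    # is itself a key letter, otherwise just the original character.
    choices = [keyletters if ch in keyletters else ch for ch in s]
    # The Cartesian product over the positions yields every substituted variant.
    return ["".join(combo) for combo in product(*choices)]


print(all_perms_of_s_sketch("LMNOP", "LI"))
```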

src.utils.ppm_to_da(mass: float, ppm_tolerance: float) -> float

Calculate the mass tolerance in Daltons for a particular mass and a parts per million value

Parameters
  • mass (float) – The mass to calculate the Dalton tolerance for

  • ppm_tolerance (float) – The tolerance in parts per million

Returns

Dalton value to add/subtract for upper/lower bounds respectively

Return type

float
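
A sketch of the conversion, assuming the usual definition of parts per million (tolerance in Da = mass × ppm / 1,000,000). The name ppm_to_da_sketch is hypothetical, and the project's function may differ in detail.

```python
def ppm_to_da_sketch(mass: float, ppm_tolerance: float) -> float:
    # One part per million of the mass, scaled by the ppm tolerance.
    return mass * ppm_tolerance / 1_000_000


mass = 1000.0
tolerance = ppm_to_da_sketch(mass, 20.0)
# The returned value is added and subtracted to obtain the bounds.
lower_bound, upper_bound = mass - tolerance, mass + tolerance
print(tolerance, lower_bound, upper_bound)
```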


src.utils.make_sparse_array(spectrum: list, width: float, value=50) -> numpy.ndarray

Make a spectrum (a list of floats) into a sparsely populated array for xcorr calculation. Indices are calculated by

idx = int(m/w), m is mass, w is bin width

width is the tolerance in Da to allow when calculating scores. Every bin that contains a peak is set to value (50 by default).

Parameters
  • spectrum (list) – Floating point mass values of peaks

  • width (float) – Mass tolerance for bin width

  • value (number) – Value to put in a bin where a mass is found (default is 50)

Returns

Sparsely populated value-hot array

Return type

numpy.ndarray
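
The binning rule idx = int(m/w) can be sketched in plain Python. This stand-in returns a list rather than the numpy.ndarray the documented function returns, the name make_sparse_list_sketch is hypothetical, and sizing the result to the largest mass is an assumption (the real array length is not documented here).

```python
def make_sparse_list_sketch(spectrum: list, width: float, value=50) -> list:
    if not spectrum:
        return []
    # Assumption: the array is just long enough to hold the largest mass.
    size = int(max(spectrum) / width) + 1
    bins = [0] * size
    for mass in spectrum:
        # idx = int(m / w): every mass falling in a bin marks it with `value`.
        bins[int(mass / width)] = value
    return bins


print(make_sparse_list_sketch([1.0, 2.5], 0.5))
```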


src.utils.overlap_intervals(intervals: list) -> list

Reduce a list of intervals by merging any overlapping intervals into a single larger interval

Parameters

intervals (list) – Intervals (in the form of lists [lower_bound, upper_bound]). Both ends are inclusive

Returns

Overlapped intervals of [lower_bound, upper_bound]

Return type

list
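
Merging overlapping intervals is commonly done with a sort followed by a single pass, sketched below. The name overlap_intervals_sketch is hypothetical and the project's implementation may differ. Because both ends are inclusive, intervals that share only an endpoint are treated as overlapping.

```python
def overlap_intervals_sketch(intervals: list) -> list:
    merged = []
    # Sorting by lower bound means each interval only needs to be compared
    # with the most recently merged one.
    for lower, upper in sorted(intervals):
        if merged and lower <= merged[-1][1]:
            # Overlap (or shared endpoint): extend the previous interval.
            merged[-1][1] = max(merged[-1][1], upper)
        else:
            merged.append([lower, upper])
    return merged


print(overlap_intervals_sketch([[5, 8], [1, 3], [2, 4], [8, 10]]))
```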


src.utils.predicted_len(precursor_mass: float, precursor_charge: int) -> int

The predicted length of a spectrum based on its maximum mass

Parameters
  • precursor_mass (float) – The maximum mass of the sequence

  • precursor_charge (int) – The charge of the observed precursor mass

Returns

Predicted sequence length

Return type

int


src.utils.predicted_len_precursor(spectrum: src.objects.Spectrum, sequence: str) -> int

Make a prediction of the peptide length given a spectrum and the current sequence.

Parameters
  • spectrum (Spectrum) – The observed spectrum

  • sequence (str) – The current alignment made

Returns

Predicted length of a full alignment

Return type

int


src.utils.hashable_boundaries(boundaries: list) -> str

Turn a lower and upper bound into a string in order to hash

Parameters

boundaries (list) – A list of lists where each internal list is [lower_bound, upper_bound]

Returns

A string of the lower and upper bounds connected that looks like <lower_bound>-<upper_bound>

Return type

str


src.utils.cosine_similarity(a: list, b: list) -> float

Calculate the cosine similarity of two vectors

Parameters
  • a (list) – First vector

  • b (list) – Second vector

Returns

The cosine similarity of the two vectors

Return type

float
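
Cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The sketch below uses only the standard library. The name cosine_similarity_sketch is hypothetical, and returning 0.0 for a zero-length vector is an assumption, since the documentation does not say how that case is handled.

```python
import math


def cosine_similarity_sketch(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: avoid division by zero by treating a zero vector as dissimilar.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity_sketch([1, 2, 3], [2, 4, 6]))
```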


src.utils.__split_hybrid(sequence: str) -> (str, str)

Split a hybrid sequence into its left and right components

Parameters

sequence (str) – hybrid sequence with special characters [() -]

Returns

left subsequence, right subsequence

Return type

(str, str)


src.utils.DEV_contains_truth_parts(truth_seq: str, hybrid: bool, b_seqs: list, y_seqs: list) -> bool

DEVELOPMENT FUNCTION ONLY

Determines if a set of b and y sequences can potentially create the truth. If so, True is returned, otherwise False

Parameters
  • truth_seq (str) – The “truth” sequence or the sequence we want to try and find for this spectrum

  • hybrid (bool) – Is the alignment supposed to be a hybrid sequence

  • b_seqs (list) – k-mers identified from the b-ion score

  • y_seqs (list) – k-mers identified from the y-ion score

Returns

Whether or not the “true” or desired sequence could be found from the b and y sequences

Return type

bool


src.utils.DEV_contains_truth_exact(truth_seq: str, hybrid: bool, seqs: list) -> bool

DEVELOPMENT FUNCTION ONLY

Determines if the truth sequence is held in the list of sequences. If so, True is returned, otherwise False

Parameters
  • truth_seq (str) – The “truth” sequence or the sequence we want to try and find for this spectrum

  • hybrid (bool) – Is the alignment supposed to be a hybrid sequence

  • seqs (list) – The sequences to look through

Returns

True if the “truth” sequence is found in the list

Return type

bool