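utils
The utils module collects general-purpose helper functions used throughout the project: file and directory checks, file-name validation, and small numeric and sequence utilities. If you don’t know where to put a helper, it probably belongs here.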
- src.utils.file_exists(file_name: str) → bool
Determine if a file exists
- Parameters
file_name (str) – Path to the file in question
- Returns
True if the file exists
- Return type
bool
- src.utils.make_valid_dir_string(dir_path: str) → str
Append the OS separator character to the end of a directory string so that it forms a valid directory path
- Parameters
dir_path (str) – Name of directory to check
- Returns
Corrected directory path
- Return type
str
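The behaviour described above can be sketched as follows. This is an illustration, not the project's source: the name make_valid_dir_string_sketch is hypothetical, and skipping the append when the separator is already present is an assumption.

```python
import os


def make_valid_dir_string_sketch(dir_path: str) -> str:
    """Hypothetical sketch: make sure a directory string ends with os.sep."""
    # Assumption: a path that already ends with the separator is returned as-is.
    if dir_path.endswith(os.sep):
        return dir_path
    return dir_path + os.sep


print(make_valid_dir_string_sketch("results"))
```

Applying the function twice gives the same result as applying it once, so callers do not need to check whether a path was already normalized.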
- src.utils.make_dir(dir_path: str) → bool
Check whether the directory already exists and, if it does not, create it with the name given. NOTE: this is not recursive; only one level of directory will be created
- Parameters
dir_path (str) – Full path of directory to create
- Returns
True if successful
- Return type
bool
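A minimal sketch of a non-recursive "check or create" helper, built on os.mkdir, which creates a single directory level only. The name make_dir_sketch and the choice to return False on any OSError are assumptions, not confirmed details of src.utils.make_dir.

```python
import os
import tempfile


def make_dir_sketch(dir_path: str) -> bool:
    """Hypothetical sketch: return True if dir_path exists or could be created."""
    if os.path.isdir(dir_path):
        return True
    try:
        # os.mkdir creates one level only; a missing parent raises an OSError.
        os.mkdir(dir_path)
    except OSError:
        # Assumption: failure is reported as False rather than raised.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(make_dir_sketch(os.path.join(tmp, "out")))     # one missing level
    print(make_dir_sketch(os.path.join(tmp, "a", "b")))  # two missing levels
```

The second call shows the non-recursive limitation: the parent directory "a" does not exist, so nothing is created.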
- src.utils.make_valid_text_file(file_name: str) → str
Ensure that a path string ends with the .txt extension
- Parameters
file_name (str) – File name to validate for txt file type
- Returns
Name with the .txt extension
- Return type
str
- src.utils.make_valid_json_file(file_name: str) → str
Ensure that a path string ends with the .json extension
- Parameters
file_name (str) – File name to validate for json file type
- Returns
Name with the .json extension
- Return type
str
- src.utils.make_valid_csv_file(file_name: str) → str
Ensure that a path string ends with the .csv extension
- Parameters
file_name (str) – File name to validate for csv file type
- Returns
Name with the .csv extension
- Return type
str
- src.utils.make_valid_fasta_file(file_name: str) → str
Ensure that a path string ends with the .fasta extension
- Parameters
file_name (str) – File name to validate for fasta file type
- Returns
Name with the .fasta extension
- Return type
str
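The four make_valid_*_file helpers share one pattern, which can be sketched with a single generic function. The name ensure_extension is hypothetical, and the case-sensitive comparison is an assumption; the project's functions may treat names such as "OUT.CSV" differently.

```python
def ensure_extension(file_name: str, ext: str) -> str:
    """Hypothetical sketch: return file_name with ext appended if missing."""
    # Assumption: the check is case-sensitive and ext includes the leading dot.
    if file_name.endswith(ext):
        return file_name
    return file_name + ext


print(ensure_extension("proteins", ".fasta"))
print(ensure_extension("scores.csv", ".csv"))
```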
- src.utils.is_json(file: str) → bool
Determine if a file is a json file
- Parameters
file (str) – File name of file in question
- Returns
True if it is a json file
- Return type
bool
- src.utils.is_fasta(file: str) → bool
Determine if a file is a fasta file
- Parameters
file (str) – File name of the file in question
- Returns
True if it is a fasta file
- Return type
bool
- src.utils.is_dir(dir_path: str) → bool
Determine if a path is a valid path to a directory
- Parameters
dir_path (str) – Full path to the directory
- Returns
True if the directory exists
- Return type
bool
- src.utils.is_file(file: str) → bool
Determine if a file exists
- Parameters
file (str) – Full path to the file
- Returns
True if the file exists
- Return type
bool
- src.utils.all_perms_of_s(s: str, keyletters: str) → list
Find all permutations of a string in which each occurrence of a character from ‘keyletters’ is replaced by every character in ‘keyletters’
- Parameters
s (str) – The string to permute
keyletters (str) – The characters to make permutations with
- Returns
All permutations of s
- Return type
list
- Example
>>> all_perms_of_s('LMNOP', 'LI')
['LMNOP', 'IMNOP']
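One way to reproduce the documented example is to give every position holding a key letter the full set of key letters as options and take the Cartesian product. This is a sketch under that assumption; the name all_perms_of_s_sketch is hypothetical and the project's implementation may order or deduplicate results differently.

```python
from itertools import product


def all_perms_of_s_sketch(s: str, keyletters: str) -> list:
    """Hypothetical sketch: expand each key letter in s to every key letter."""
    # Positions holding a key letter may take any key letter; others stay fixed.
    options = [keyletters if ch in keyletters else ch for ch in s]
    return ["".join(combo) for combo in product(*options)]


print(all_perms_of_s_sketch("LMNOP", "LI"))  # ['LMNOP', 'IMNOP']
```

The number of results grows exponentially with the number of key-letter positions: a string with n such positions and k key letters yields k**n strings.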
- src.utils.ppm_to_da(mass: float, ppm_tolerance: float) → float
Calculate the mass tolerance in Daltons for a particular mass and a parts per million value
- Parameters
mass (float) – The mass to calculate the Dalton tolerance for
ppm_tolerance (float) – The tolerance in parts per million
- Returns
Dalton value to add/subtract for upper/lower bounds respectively
- Return type
float
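A parts-per-million tolerance converts to Daltons as mass × ppm / 10^6. The sketch below applies that standard conversion; the names ppm_to_da_sketch and mass_bounds are hypothetical and are not part of src.utils.

```python
def ppm_to_da_sketch(mass: float, ppm_tolerance: float) -> float:
    """Hypothetical sketch: Dalton tolerance for a mass at a given ppm."""
    return mass * ppm_tolerance / 1_000_000


def mass_bounds(mass: float, ppm_tolerance: float) -> tuple:
    """Lower and upper mass bounds implied by the ppm tolerance."""
    da = ppm_to_da_sketch(mass, ppm_tolerance)
    return mass - da, mass + da


# A 1000 Da mass at 20 ppm has a tolerance of about 0.02 Da either side.
print(ppm_to_da_sketch(1000.0, 20.0))
print(mass_bounds(1000.0, 20.0))
```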
- src.utils.make_sparse_array(spectrum: list, width: float, value=50) → numpy.ndarray
Make a spectrum (a list of floats) into a sparsely populated array for xcorr calculation. Indices are calculated by
idx = int(m/w), where m is the mass and w is the bin width
width is the tolerance in Da to allow when calculating scores. Every bin that contains a peak is set to value (50 by default).
- Parameters
spectrum (list) – Floating point mass values of peaks
width (float) – Mass tolerance for bin width
value (number) – Value to put in a bin where a mass is found (default is 50)
- Returns
Sparsely populated value-hot array
- Return type
numpy.ndarray
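The binning rule idx = int(m/w) can be sketched as below. The name make_sparse_array_sketch is hypothetical, and the array length (just long enough to hold the largest mass) is an assumption; the project's function may size the array differently.

```python
import numpy as np


def make_sparse_array_sketch(spectrum: list, width: float, value=50) -> np.ndarray:
    """Hypothetical sketch: bin masses into a value-hot array."""
    if not spectrum:
        return np.zeros(0)
    # Assumption: the array is just long enough to hold the largest mass.
    size = int(max(spectrum) / width) + 1
    sparse = np.zeros(size)
    for mass in spectrum:
        sparse[int(mass / width)] = value  # idx = int(m / w)
    return sparse


arr = make_sparse_array_sketch([1.0, 2.5], width=0.5)
print(np.nonzero(arr)[0])  # indices of the occupied bins
```

Two peaks that fall into the same bin set it once, so the array records presence rather than peak count.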
- src.utils.overlap_intervals(intervals: list) → list
Take a list of intervals and reduce it to a smaller list by merging any overlapping intervals into a single larger interval
- Parameters
intervals (list) – Intervals (in the form of lists [lower_bound, upper_bound]). Both ends are inclusive
- Returns
Overlapped intervals of [lower_bound, upper_bound]
- Return type
list
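Interval merging of this kind is commonly done by sorting on the lower bound and sweeping once. The sketch below takes that approach; the name overlap_intervals_sketch is hypothetical, and treating intervals that share an endpoint as overlapping follows from the inclusive bounds described above.

```python
def overlap_intervals_sketch(intervals: list) -> list:
    """Hypothetical sketch: merge overlapping [lower, upper] intervals."""
    merged = []
    for lower, upper in sorted(intervals):
        # Both ends are inclusive, so a shared endpoint counts as overlap.
        if merged and lower <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], upper)
        else:
            merged.append([lower, upper])
    return merged


print(overlap_intervals_sketch([[7, 9], [0, 3], [2, 5]]))  # [[0, 5], [7, 9]]
```

The input list is not modified, because each merged interval is a fresh list.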
- src.utils.predicted_len(precursor_mass: float, precursor_charge: int) → int
Predict the sequence length of a spectrum based on its maximum mass
- Parameters
precursor_mass (float) – The maximum mass of the sequence
precursor_charge (int) – The charge of the observed precursor mass
- Returns
Predicted sequence length
- Return type
int
- src.utils.predicted_len_precursor(spectrum: src.objects.Spectrum, sequence: str) → int
Make a prediction of the peptide length given a spectrum and the current sequence.
- Parameters
spectrum (Spectrum) – The observed spectrum
sequence (str) – The current alignment made
- Returns
Predicted length of a full alignment
- Return type
int
- src.utils.hashable_boundaries(boundaries: list) → str
Turn a lower and upper bound into a string in order to hash
- Parameters
boundaries (list) – A list of lists where each internal list is [lower_bound, upper_bound]
- Returns
A string of the lower and upper bounds connected that looks like <lower_bound>-<upper_bound>
- Return type
str
- src.utils.cosine_similarity(a: list, b: list) → float
Calculate the cosine similarity of two vectors
- Parameters
a (list) – First vector
b (list) – Second vector
- Returns
The cosine similarity of the two vectors
- Return type
float
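Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. A minimal pure-Python sketch follows; the name cosine_similarity_sketch is hypothetical, and returning 0.0 for a zero-length vector is an assumption, since the source does not say how that case is handled.

```python
import math


def cosine_similarity_sketch(a: list, b: list) -> float:
    """Hypothetical sketch: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector has no direction, so report 0.0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity_sketch([1, 0], [0, 1]))        # orthogonal vectors
print(cosine_similarity_sketch([1, 2, 3], [2, 4, 6]))  # parallel vectors
```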
- src.utils.__split_hybrid(sequence: str) → (str, str)
Split a hybrid sequence into its left and right components
- Parameters
sequence (str) – hybrid sequence with special characters [() -]
- Returns
left subsequence, right subsequence
- Return type
(str, str)
- src.utils.DEV_contains_truth_parts(truth_seq: str, hybrid: bool, b_seqs: list, y_seqs: list) → bool
DEVELOPMENT FUNCTION ONLY
Determines if a set of b and y sequences can potentially create the truth. If so, True is returned, otherwise False
- Parameters
truth_seq (str) – The “truth” sequence or the sequence we want to try and find for this spectrum
hybrid (bool) – Is the alignment supposed to be a hybrid sequence
b_seqs (list) – k-mers identified from the b-ion score
y_seqs (list) – k-mers identified from the y-ion score
- Returns
Whether or not the “true” or desired sequence could be found from the b and y sequences
- Return type
bool
- src.utils.DEV_contains_truth_exact(truth_seq: str, hybrid: bool, seqs: list) → bool
DEVELOPMENT FUNCTION ONLY
Determines if the truth sequence is held in the list of sequences. If so, True is returned, otherwise False
- Parameters
truth_seq (str) – The “truth” sequence or the sequence we want to try and find for this spectrum
hybrid (bool) – Is the alignment supposed to be a hybrid sequence
seqs (list) – The sequences to look through
- Returns
True if the “truth” sequence is found in the list
- Return type
bool