database

This module operates on the Database namedtuple object. The functions are written in a C-like style (free functions acting on a plain data structure) rather than as methods of a Python class, for a slight gain in memory efficiency and speed.
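The memory side of that trade-off can be illustrated with a small comparison. This is only a sketch: the field names fasta_file and proteins are assumed from the description of build, DatabaseClass is a hypothetical class-based alternative, and the sizes are shallow (they ignore the objects the fields point to) and vary by Python version.

```python
import sys
from collections import namedtuple

# Assumed field names; the real Database is defined in src.objects.
DatabaseTuple = namedtuple("DatabaseTuple", ["fasta_file", "proteins"])


class DatabaseClass:
    """Hypothetical class-based alternative, shown for comparison only."""

    def __init__(self, fasta_file, proteins):
        self.fasta_file = fasta_file
        self.proteins = proteins


def shallow_sizes(fasta_file, proteins):
    """Return (namedtuple size, class instance size) in bytes, shallow only."""
    as_tuple = DatabaseTuple(fasta_file, proteins)
    as_object = DatabaseClass(fasta_file, proteins)
    tuple_size = sys.getsizeof(as_tuple)
    # A plain class instance also carries a per-instance attribute dict.
    object_size = sys.getsizeof(as_object) + sys.getsizeof(as_object.__dict__)
    return tuple_size, object_size


tuple_size, object_size = shallow_sizes("proteins.fasta", {})
print(tuple_size, object_size)
```

The difference comes from the per-instance attribute dictionary, which a namedtuple instance does not have.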

src.database.extract_protein_name(prot_entry: collections.namedtuple) → str

Extract the protein name from a protein entry namedtuple, as produced by the pyteomics FASTA reader.

Parameters

prot_entry (namedtuple) – a namedtuple with a ‘description’ field

Returns

the name of the protein

Return type

str
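How the name is derived from the description is not stated here. The sketch below assumes a UniProt-style header (db|accession|ENTRY Protein name OS=...) and falls back to the first token otherwise. Entry and extract_name_sketch are hypothetical stand-ins, not the module's own code.

```python
from collections import namedtuple

# Stand-in for the entry type yielded by a FASTA reader (assumption).
Entry = namedtuple("Entry", ["description", "sequence"])


def extract_name_sketch(prot_entry):
    """Hypothetical parsing rule, not the module's confirmed behavior."""
    description = prot_entry.description
    if "|" in description:
        tail = description.split("|")[-1]   # 'ENTRY Protein name OS=...'
        tail = tail.split(" OS=")[0]        # drop the organism annotations
        parts = tail.split(" ", 1)          # separate ENTRY from the name
        return parts[1] if len(parts) > 1 else parts[0]
    tokens = description.split()
    return tokens[0] if tokens else ""


entry = Entry("sp|P01308|INS_HUMAN Insulin OS=Homo sapiens OX=9606", "MALWMRLLPL")
name = extract_name_sketch(entry)
```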


src.database.build(fasta_file: str) → src.objects.Database

Create a Database namedtuple from a FASTA file.

Parameters

fasta_file (str) – the full path to a fasta database file

Returns

a Database object with the fasta file and protein fields filled in

Return type

Database
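A rough sketch of what building such an object might involve. To stay self-contained it uses a small standard-library FASTA parser in place of pyteomics. The field names fasta_file and proteins, and the choice to key proteins by the first token of the description, are assumptions made for illustration.

```python
import os
import tempfile
from collections import namedtuple

# Stand-in types; field names are assumptions.
Entry = namedtuple("Entry", ["description", "sequence"])
Database = namedtuple("Database", ["fasta_file", "proteins"])


def read_fasta_sketch(path):
    """Yield Entry tuples from a FASTA file (minimal stand-in reader)."""
    description, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if description is not None:
                    yield Entry(description, "".join(chunks))
                description, chunks = line[1:], []
            else:
                chunks.append(line)
    if description is not None:
        yield Entry(description, "".join(chunks))


def build_sketch(fasta_file):
    """Fill in the file path and a name -> Entry mapping."""
    proteins = {}
    for entry in read_fasta_sketch(fasta_file):
        tokens = entry.description.split()
        proteins[tokens[0] if tokens else ""] = entry
    return Database(fasta_file=fasta_file, proteins=proteins)


# Usage: write a two-record FASTA file to a temporary path and build from it.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">protA first example\nMKTAY\nIAK\n>protB second example\nGGGTAYPP\n")
db = build_sketch(tmp.name)
os.remove(tmp.name)
```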


src.database.get_proteins_with_subsequence(db: src.objects.Database, sequence: str) → list

Find the names of all proteins that contain the provided subsequence. A list of these names is returned.

Parameters
  • db (Database) – source of the proteins

  • sequence (str) – the subsequence to look for

Returns

names of all proteins that contain the subsequence

Return type

list
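The lookup can be sketched as a linear scan over the stored proteins. This is an illustration under assumptions: the stand-in Database holds a name-to-entry dict in a field called proteins, which may differ from the real layout, and the real function may use a faster index.

```python
from collections import namedtuple

# Stand-in types; field names are assumptions.
Entry = namedtuple("Entry", ["description", "sequence"])
Database = namedtuple("Database", ["fasta_file", "proteins"])


def proteins_with_subsequence_sketch(db, sequence):
    """Return the names of all proteins whose sequence contains `sequence`."""
    return [
        name
        for name, entry in db.proteins.items()
        if sequence in entry.sequence
    ]


db = Database(
    fasta_file="example.fasta",
    proteins={
        "protA": Entry("protA first example", "MKTAYIAK"),
        "protB": Entry("protB second example", "GGGTAYPP"),
    },
)
shared = proteins_with_subsequence_sketch(db, "TAY")
only_a = proteins_with_subsequence_sketch(db, "MKT")
```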


src.database.get_proteins_with_subsequence_ion(db: src.objects.Database, sequence: str, ion: str) → list

Find all protein names that have the subsequence. Recursively search if the full sequence is not found immediately.

Parameters
  • db (Database) – source of the proteins

  • sequence (str) – subsequence to look for

  • ion (str) – the ion type. Either ‘b’ or ‘y’

Returns

names of the source protein(s)

Return type

list
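The description does not say how the sequence is shortened on each recursive step. One plausible reading, sketched below, is that a b ion (an N-terminal fragment) drops residues from the right and a y ion (a C-terminal fragment) drops them from the left. That trimming rule and the stand-in types are assumptions, not confirmed behavior.

```python
from collections import namedtuple

# Stand-in types; field names are assumptions.
Entry = namedtuple("Entry", ["description", "sequence"])
Database = namedtuple("Database", ["fasta_file", "proteins"])


def proteins_with_subsequence_ion_sketch(db, sequence, ion):
    """Search for `sequence`; if absent, shorten it and search again."""
    if ion not in ("b", "y"):
        raise ValueError("ion must be 'b' or 'y'")
    if not sequence:
        return []  # nothing left to search for
    hits = [
        name
        for name, entry in db.proteins.items()
        if sequence in entry.sequence
    ]
    if hits:
        return hits
    # Assumed rule: keep the N-terminal side for b ions,
    # the C-terminal side for y ions.
    shorter = sequence[:-1] if ion == "b" else sequence[1:]
    return proteins_with_subsequence_ion_sketch(db, shorter, ion)


db = Database(
    fasta_file="example.fasta",
    proteins={
        "protA": Entry("protA first example", "MKTAYIAK"),
        "protB": Entry("protB second example", "GGGTAYPP"),
    },
)
b_hits = proteins_with_subsequence_ion_sketch(db, "TAYIX", "b")
y_hits = proteins_with_subsequence_ion_sketch(db, "XTAYP", "y")
```

In this sketch "TAYIX" is not found, so the b-ion search retries with "TAYI"; the y-ion search for "XTAYP" retries with "TAYP".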


src.database.get_entry_by_name(db: src.objects.Database, name: str) → collections.namedtuple

Get a namedtuple of the protein entry from the database.

Parameters
  • db (Database) – source of proteins

  • name (str) – the name of the protein to look for

Returns

namedtuple with fields ‘description’ and ‘sequence’

Return type

namedtuple
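If proteins are stored in a name-keyed mapping, the lookup reduces to a dictionary access. The sketch below assumes that layout and returns None for an unknown name. Both the layout and the None fallback are illustrative choices, since the behavior for missing names is not documented here.

```python
from collections import namedtuple

# Stand-in types; field names are assumptions.
Entry = namedtuple("Entry", ["description", "sequence"])
Database = namedtuple("Database", ["fasta_file", "proteins"])


def entry_by_name_sketch(db, name):
    """Return the entry stored under `name`, or None if it is absent."""
    return db.proteins.get(name)


db = Database(
    fasta_file="example.fasta",
    proteins={"protA": Entry("protA first example", "MKTAYIAK")},
)
found = entry_by_name_sketch(db, "protA")
missing = entry_by_name_sketch(db, "protZ")
```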