Classes

CreateVariants class

class mutagenesis_visualization.CreateVariants

Class to create variants for DNA synthesis.

__call__(dna: str, codon_list: Union[list, str]) → pandas.core.frame.DataFrame

Generate a list of all point mutants given a dna sequence and a list of codons.

Parameters:
  • dna (str) – Contains the DNA sequence of the allele of reference (usually wild-type).
  • codon_list (list or str) – Input a list of the codons that were used to create point mutations. Example: [“GCC”, “GCG”, “TGC”]. The order of codon_list determines the output order.
Returns:

df_output – Dataframe containing the generated sequences.

Return type:

pandas dataframe
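
As a rough illustration of what "all point mutants" means here, the sketch below replaces each codon of a reference sequence with every codon in codon_list, one position at a time. The helper name point_mutants is hypothetical and is not part of mutagenesis_visualization; the real class returns a pandas dataframe and may differ in details, such as whether substitutions identical to the wild-type codon are skipped.

```python
from typing import List


def point_mutants(dna: str, codon_list: List[str]) -> List[str]:
    """Hypothetical helper: return every single-codon substitution of dna."""
    if len(dna) % 3 != 0:
        raise ValueError("dna length must be a multiple of 3")
    variants = []
    # Walk the sequence codon by codon. The order of codon_list decides
    # the order of the variants generated at each position.
    for start in range(0, len(dna), 3):
        for codon in codon_list:
            # Assumption: wild-type codons are not filtered out here.
            variants.append(dna[:start] + codon + dna[start + 3:])
    return variants


variants = point_mutants("ATGGCC", ["GCC", "TGC"])
```

Reordering codon_list in this sketch reorders the variants within each position, which mirrors the note on output order above.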

Counts class

class mutagenesis_visualization.Counts(dataframes: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]], start_position: Optional[int] = None, aminoacids: Optional[List[str]] = None)

Counts represents the output of reading a fastq file.

Parameters:
  • dataframes (dataframe, list dataframes) – 2D matrix containing the counts per codon. Columns will contain the amino acid substitutions, rows will contain the counts for each residue in the protein sequence. If multiple replicates, pass items in a list.
  • start_position (int, default None) – First position in the protein sequence that will be used for the first column of the array. If a protein has been mutated only from residue 100-150, then if start_position = 100, the algorithm will trim the first 99 amino acids in the input sequence. The last residue will be calculated based on the length of the input array. A value of 2 is common because normally the methionine in position 1 is not mutated.
  • aminoacids (list, default None) – List of amino acids (in order). Stop codon needs to be ‘*’. If None, it will use the index of the dataframe.
mean_counts()
library_representation()
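
The trimming described for start_position can be sketched as follows. The function trim_sequence is a hypothetical name used only for illustration; it assumes residue numbers are 1-based and that the number of mutated positions is known from the input array.

```python
def trim_sequence(sequence: str, start_position: int, n_positions: int) -> str:
    """Hypothetical helper: keep only the residues covered by the data."""
    first = start_position - 1  # residue numbers are 1-based
    return sequence[first:first + n_positions]


# Toy 150-residue sequence in which only residues 100-150 were mutated.
sequence = "M" + "A" * 98 + "K" + "G" * 50
trimmed = trim_sequence(sequence, start_position=100, n_positions=51)
```

With start_position = 100 the first 99 residues are dropped, and the last residue follows from the number of positions in the data.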

LibraryRepresentation pyplot class

class mutagenesis_visualization.main.bar_graphs.library_representation.LibraryRepresentation(dataframes_raw: List[pandas.core.frame.DataFrame], positions: List[int], aminoacids: List[str])

Class to generate a library representation bar plot.

__call__(replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generates a cumulative stacked bar plot. Each bar represents an amino acid position, and each color indicates the observed variant frequency.

Parameters:
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –

MeanCounts pyplot class

class mutagenesis_visualization.main.bar_graphs.mean_counts.MeanCounts(dataframes_raw: List[pandas.core.frame.DataFrame], positions: List[int], aminoacids: List[str])

Class to generate a mean counts bar plot.

__call__(normalize: bool = False, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Plot in a bargraph the mean counts for each residue of the protein.

Parameters:
  • normalize (bool, default False) – If set to true, the mean counts will be normalized so the highest value is 100.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    text_labels : list of lists, default empty
    If you want to add a label to the graph, add the coordinates and the text. Example: text_labels = [[x0, y0, text0], [x1, y1, text1]].
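
The effect of normalize = True can be illustrated with a small sketch. The helper normalize_to_100 is hypothetical and only shows the scaling described above (highest value becomes 100); it is not taken from the library.

```python
from typing import List


def normalize_to_100(mean_counts: List[float]) -> List[float]:
    """Hypothetical helper: rescale so the largest value equals 100."""
    highest = max(mean_counts)
    if highest == 0:
        # Assumption: an all-zero input stays all zero.
        return [0.0 for _ in mean_counts]
    return [100.0 * value / highest for value in mean_counts]


normalized = normalize_to_100([50.0, 200.0, 100.0])
```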

GeneratePrimers class

class mutagenesis_visualization.GeneratePrimers(dna: str, start: str, end: str)

Class that will generate primers for saturation mutagenesis.

__call__(codon: str = 'NNS', length_primer: int = 15, melting_temp: Optional[float] = None) → pandas.core.frame.DataFrame

Generate primers for saturation mutagenesis.

Parameters:
  • codon (str, default 'NNS') – Degenerate codon that will be used to create the primers. See IDT’s website for a list of all mixed bases and their letter codes (https://www.idtdna.com/pages/products/custom-dna-rna/mixed-bases). This parameter should contain 3 letters, although it can contain more.
  • length_primer (int, default 15) – Number of bases that the primers will have to each side of the mutated codon. Total primer length will be 2*length_primer+3.
  • melting_temp (float, default None) – Melting temperature in Celsius of the primers. Will override length_primer. If None, primers will have a total length of 2*length_primer+3.
Returns:

df – Dataframe containing the primers.

Return type:

pandas dataframe
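
The relation between length_primer and the total primer length can be sketched as below. The helper forward_primer and its codon_index argument are hypothetical; the sketch only places a degenerate codon between two flanks of length_primer bases and does not reproduce the library's handling of start, end, or melting_temp.

```python
def forward_primer(dna: str, codon_index: int, codon: str = "NNS",
                   length_primer: int = 15) -> str:
    """Hypothetical helper: flank a degenerate codon on both sides."""
    start = 3 * codon_index  # 0-based index of the codon's first base
    if start < length_primer or start + 3 + length_primer > len(dna):
        raise ValueError("not enough flanking sequence for this codon")
    left = dna[start - length_primer:start]
    right = dna[start + 3:start + 3 + length_primer]
    return left + codon + right


dna = "ACGT" * 12  # 48-base toy template
primer = forward_primer(dna, codon_index=6, length_primer=15)
```

For a 3-letter codon the result has 2*length_primer+3 bases, matching the total length stated above.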

Screen class

class mutagenesis_visualization.Screen(datasets: Union[numpy.ndarray[Any, numpy.dtype[ScalarType]], pandas.core.frame.DataFrame, List[Union[numpy.ndarray[Any, numpy.dtype[ScalarType]], pandas.core.frame.DataFrame]]], sequence: str, aminoacids: List[str], start_position: int = 2, delete_position: Union[int, List[int], None] = None, fillna: float = 0, secondary: Optional[List[List[str]]] = None)

Screen represents a mutagenesis experiment. If you are doing deep scan mutagenesis, then every amino acid in the protein has been mutated to every possible amino acid. For example, if there was a leucine at position 2, then this leucine would be mutated to the other 19 naturally occurring amino acids. However, you can also use the package if you only have a handful of amino acid substitutions.

Parameters:
  • datasets (array, list arrays, dataframe, list dataframes) – 2D matrix containing the enrichment scores of the point mutants. Columns will contain the amino acid substitutions, rows will contain the enrichment for each residue in the protein sequence. If multiple replicates, pass items in a list.
  • sequence (str) – Protein sequence in 1 letter code format.
  • aminoacids (list, default list('ACDEFGHIKLMNPQRSTVWY*')) – Amino acid substitutions (rows). Submit in the same order that is used for the array.
  • start_position (int, default 2) – First position in the protein sequence that will be used for the first column of the array. If a protein has been mutated only from residue 100-150, then if start_position = 100, the algorithm will trim the first 99 amino acids in the input sequence. The last residue will be calculated based on the length of the input array. We have set the default value to 2 because normally the Methionine in position 1 is not mutated.
  • delete_position (int, List[int], default None) – Can delete positions (columns) in the dataset. For example, if you set start_position = 2 and delete_position = 122, you will be deleting the column 120 of the input dataset. The sequence parameter won’t delete anything, so if you plan on deleting a few columns in your dataset, adjust the input sequence and secondary list.
  • fillna (float, default 0) – How to replace NaN values.
  • secondary (list, optional) – This parameter is used to group the data by secondary structure. The format is the name of the secondary structure multiplied by the residue length of that motif. Example : [[‘β1’]*(8),[‘L1’]*(7),[‘α1’]*(9),…,].
  • replicates (list[np.array, Dataframe], optional) – If you have multiple replicates for that experiment, pass them in the same format as dataset.
dataframe

Contains the enrichment scores, position, sequence.

Type: pandas dataframe

Other attributes are the same as the input parameters: dataset, aminoacids, start_position, roc_df, secondary.
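
Two of the conventions above, the format of secondary and the column removed by delete_position, can be illustrated with a short sketch. Both helper names are hypothetical, and the column arithmetic assumes 0-based column indices, which is consistent with the start_position = 2, delete_position = 122 example but is not confirmed by the source.

```python
from typing import List


def flatten_secondary(secondary: List[List[str]]) -> List[str]:
    """Hypothetical helper: one secondary-structure label per residue."""
    return [label for motif in secondary for label in motif]


def column_index(position: int, start_position: int) -> int:
    """Hypothetical helper: dataset column that holds a residue position."""
    return position - start_position  # assumption: columns are 0-based


secondary = [["β1"] * 8, ["L1"] * 7, ["α1"] * 9]
labels = flatten_secondary(secondary)
deleted_column = column_index(122, start_position=2)
```

Here labels has one entry per residue (24 in total), and deleted_column is 120.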

The following classes are integrated into Screen, so you only have to use the __call__ method.

EnrichmentBar pyplot class

class mutagenesis_visualization.main.bar_graphs.enrichment_bar.EnrichmentBar(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate an enrichment bar plot per position.

__call__(mode: str = 'mean', show_cartoon: bool = False, min_score: Optional[float] = None, max_score: Optional[float] = None, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Plot in a bargraph the enrichment for each residue of the protein. Red for gain of function, blue for loss of function.

Parameters:
  • mode (str, default 'mean') – Specify what enrichment scores to show. If mode = ‘mean’, it will show the mean of each position. If mode = ‘A’, it will show the alanine substitution profile. Can be used for each amino acid. Use the one-letter code and upper case.
  • min_score (float, default None) – Change values below a minimum score to be that score. For example, setting min_score = -1 will change any value smaller than -1 to -1.
  • max_score (float, default None) – Change values above a maximum score to be that score. For example, setting max_score = 1 will change any value greater than 1 to 1.
  • show_cartoon (boolean, default False) – If true, the plot will display a cartoon with the secondary structure. The user must have added the secondary structure to the object.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    color_gof : str, default ‘red’
    Choose color to color positions with an enrichment score > 0.
    color_lof : str, default ‘blue’
    Choose color to color positions with an enrichment score < 0.
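
The behavior of min_score and max_score amounts to clipping. A minimal sketch, with the hypothetical helper clip_scores standing in for whatever the library does internally:

```python
from typing import List, Optional


def clip_scores(scores: List[float], min_score: Optional[float] = None,
                max_score: Optional[float] = None) -> List[float]:
    """Hypothetical helper: clamp each score to [min_score, max_score]."""
    clipped = []
    for score in scores:
        if min_score is not None and score < min_score:
            score = min_score
        if max_score is not None and score > max_score:
            score = max_score
        clipped.append(score)
    return clipped


clipped = clip_scores([-2.5, -0.3, 0.8, 1.7], min_score=-1, max_score=1)
```

Leaving either bound as None keeps the values on that side unchanged.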

Differential pyplot class

class mutagenesis_visualization.main.bar_graphs.differential.Differential(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate the difference between two experiments.

__call__(screen_object: Screen, metric: Literal[rmse, mean, squared, hard_cutoff] = 'rmse', plot_type: str = 'bar', show_cartoon: bool = False, min_score: Optional[float] = None, max_score: Optional[float] = None, hard_cutoff: Optional[float] = None, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Plot the mean positional difference between two experiments.

Parameters:
  • screen_object (Screen) – Another Screen object to compare with.
  • metric (str, default 'rmse') – The way to compare the two objects. Options are ‘rmse’ ((x-y)**2/N)**0.5, ‘squared’ ((x**2-y**2)/N), ‘mean’ ((x-y)/N), and ‘hard_cutoff’. For ‘hard_cutoff’, select a threshold and the algorithm will count how many mutations are above/below in a pairwise comparison.
  • plot_type (str, default 'bar') – Options are ‘bar’ and ‘line’.
  • show_cartoon (boolean, default False) – If true, the plot will display a cartoon with the secondary structure. The user must have added the secondary structure to the object.
  • min_score (float, default None) – Change values below a minimum score to be that score. For example, setting min_score = -1 will change any value smaller than -1 to -1.
  • max_score (float, default None) – Change values above a maximum score to be that score. For example, setting max_score = 1 will change any value greater than 1 to 1.
  • hard_cutoff (float, default None) – Threshold used when metric = ‘hard_cutoff’. Has no effect with the other metrics.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
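
The ‘rmse’, ‘squared’ and ‘mean’ metrics can be sketched for a single position as follows. The helper positional_difference is hypothetical, and the sketch assumes that x and y are the scores of the N substitutions at that position in the two experiments and that the formulas are summed over those substitutions. The ‘hard_cutoff’ metric is left out because the source does not define it precisely.

```python
import math
from typing import Sequence


def positional_difference(x: Sequence[float], y: Sequence[float],
                          metric: str = "rmse") -> float:
    """Hypothetical helper: compare two score vectors at one position."""
    n = len(x)
    if metric == "rmse":
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    if metric == "squared":
        return sum(a ** 2 - b ** 2 for a, b in zip(x, y)) / n
    if metric == "mean":
        return sum(a - b for a, b in zip(x, y)) / n
    raise ValueError(f"unsupported metric: {metric}")


x = [1.0, 3.0]  # scores in the first experiment
y = [0.0, 0.0]  # scores in the second experiment
rmse = positional_difference(x, y, "rmse")
squared = positional_difference(x, y, "squared")
mean_diff = positional_difference(x, y, "mean")
```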

PositionBar pyplot class

class mutagenesis_visualization.main.bar_graphs.position_bar.PositionBar(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate a bar plot of the enrichment scores at a single position.

__call__(position: int, mask_selfsubstitutions: bool = False, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Choose a position and plot in a bargraph the enrichment score for each substitution. Red for gain of function, blue for loss of function.

Parameters:
  • position (int) – Residue number of the protein to display.
  • mask_selfsubstitutions (bool, default False) – If set to true, will assign a score of 0 to each self-substitution, i.e., A2A = 0.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    neworder_aminoacids: list, default list(‘DEKHRGNQASTPCVYMILFW*’)
    Set the order (left to right) of the amino acids.
    color_gof : str, default ‘red’
    Color to color mutations > 0.
    color_lof : str, default ‘blue’
    Color to color mutations < 0.
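
What mask_selfsubstitutions does at one position can be sketched as below. The helper mask_self and the dictionary layout are assumptions made for illustration only.

```python
from typing import Dict


def mask_self(scores: Dict[str, float], wildtype: str) -> Dict[str, float]:
    """Hypothetical helper: set the self-substitution score to 0."""
    masked = dict(scores)  # copy so the input is left untouched
    if wildtype in masked:
        masked[wildtype] = 0.0
    return masked


# Toy scores at position 2, where the wild-type residue is alanine (A2).
position_scores = {"A": 0.12, "D": -0.8, "W": 0.4}
masked = mask_self(position_scores, wildtype="A")
```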

Secondary pyplot class

class mutagenesis_visualization.main.bar_graphs.secondary.Secondary(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate bar plot of data sorted by secondary elements.

__call__(min_score: Optional[float] = None, max_score: Optional[float] = None, replicate: int = -1, show_error_bars: Optional[bool] = True, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generates a bar plot of data sorted by secondary elements (alpha helices and beta sheets).

Parameters:
  • min_score (float, default None) – Change values below a minimum score to be that score. For example, setting min_score = -1 will change any value smaller than -1 to -1.
  • max_score (float, default None) – Change values above a maximum score to be that score. For example, setting max_score = 1 will change any value greater than 1 to 1.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • show_error_bars (bool, default True) – If set to true, show error bars measured as the standard deviation of all replicates.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
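
Sorting data by secondary elements can be pictured as a group-by over per-residue labels. The sketch below uses a hypothetical helper, mean_by_secondary, and assumes one label and one mean score per residue; error bars across replicates are not modeled.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_by_secondary(labels: List[str], scores: List[float]) -> Dict[str, float]:
    """Hypothetical helper: average the scores of each secondary element."""
    grouped = defaultdict(list)
    for label, score in zip(labels, scores):
        grouped[label].append(score)
    return {label: mean(values) for label, values in grouped.items()}


labels = ["β1"] * 2 + ["L1"] * 1 + ["α1"] * 2
scores = [1.0, 3.0, -2.0, 0.5, 1.5]
by_element = mean_by_secondary(labels, scores)
```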

Kernel pyplot class

class mutagenesis_visualization.main.kernel.kernel.Kernel(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate a kernel density plot.

__call__(show_replicates: bool = False, wt_counts_only: bool = False, show_mean: bool = False, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Plot univariate or bivariate distributions using kernel density estimation.

Parameters:
  • show_replicates (bool, default False) – If set to true, will plot the kernel of each replicate.
  • wt_counts_only (bool, default False) – If set to true, it will plot the kernel distribution of the wild type alleles only.
  • show_mean (bool, default False) – If set to true, it will plot the kernel distribution mean of replicates when wt_counts_only is True. Otherwise, it will show the mean by default so this parameter won’t work if show_replicates is set to False.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    color: str, default “k”
    Set the color of the mean plot.
    kernel_color_replicates : list of colors, default None
    Add a list of color codes to tune the colors of the plots.
    return_plot_object : boolean, default False
    If true, will return plotting objects (i.e., fig, ax_object).
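
For readers unfamiliar with kernel density estimation, the sketch below evaluates a univariate Gaussian KDE by hand. The function gaussian_kde and the fixed bandwidth are assumptions for illustration; the plotting class may rely on a different estimator or bandwidth rule.

```python
import math
from typing import List


def gaussian_kde(samples: List[float], x: float, bandwidth: float = 0.5) -> float:
    """Density at x: the average of Gaussian bumps centered on each sample."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return norm * total / len(samples)


scores = [-1.2, -0.4, 0.0, 0.1, 0.9]      # toy enrichment scores
grid = [i / 10 for i in range(-30, 31)]   # evaluation points from -3 to 3
density = [gaussian_kde(scores, x) for x in grid]
```

Plotting density against grid gives the smooth curve that a kernel density plot displays.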

Histogram pyplot class

class mutagenesis_visualization.main.kernel.histogram.Histogram(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate a histogram plot.

__call__(population: str = 'All', show_parameters: bool = False, loc: str = 'best', replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a histogram plot. Can plot single nucleotide variants (SNVs) or non-SNVs only.

Parameters:
  • population (str, default 'All') – Other options are ‘SNV’ and ‘nonSNV’.
  • show_parameters (bool, default False) – If set to true, will display the mean and the median of the data.
  • loc (str, default "best") – Set the location of the mean and median. Check the matplotlib plt.legend method to see how the parameter loc works. https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    return_plot_object : boolean, default False
    If true, will return plotting objects (i.e., fig, ax).
    bins : int or str, default ‘auto’.
    Number of bins for the histogram. By default it will automatically decide the number of bins.
    color: str, default ‘k’
    Change to a different color if desired.

SequenceDifferences pyplot class

class mutagenesis_visualization.main.kernel.sequence_differences.SequenceDifferences(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate the sequence differences plot.

__call__(screen_object: Screen, map_sequence_changes: List[Tuple[int, int]], legend_labels: Optional[Tuple[str, str]] = None, replicate: int = -1, replicate_second_object: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate two histogram plots. The first plot will have the impact on fitness to go from protein A -> B, and the second plot will contain the B -> A effect.

Parameters:
  • screen_object (Screen object or list containing Screen objects) –
  • map_sequence_changes (list of tuples) – Set the residues that differ between protein A and protein B. Example: [(1, 1), (12, 12), (15, 16)]. In the example, the algorithm will compare the residue 1 and 12 of each protein, and the residue 15 of protein A vs the residue 16 of protein B.
  • legend_labels (tuple of str) – Set the labels of the legend.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • replicate_second_object (int, default -1) – Set the replicate of the second object to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    bins : int or str, default ‘auto’.
    Number of bins for the histogram. By default it will automatically decide the number of bins.

MultipleKernel pyplot class

class mutagenesis_visualization.main.kernel.multiple_kernels.MultipleKernel(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate plots of multiple kernels.

Heatmap pyplot class

class mutagenesis_visualization.main.heatmaps.heatmap.Heatmap(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class plots a heatmap with the enrichment scores.

__call__(nancolor: str = 'lime', mask_selfsubstitutions: bool = False, color_selfsubstitutions: Optional[str] = 'k', show_cartoon: bool = False, show_snv: bool = False, hierarchical: bool = False, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a heatmap plot of the enrichment scores.

Parameters:
  • nancolor (str, default 'lime') – Will color np.nan values with the specified color.
  • mask_selfsubstitutions (bool, default False) – If set to true, will assign a score of 0 to each self-substitution, i.e., A2A = 0.
  • color_selfsubstitutions (str, default black) – If set to a color, it will color the self-substitution borders. Set to None to not color the self substitutions.
  • show_cartoon (boolean, default False) – If true, the plot will display a cartoon with the secondary structure. The user must have added the secondary structure to the object.
  • show_snv (boolean, default False) – If true, it will only display mutants that are a single nucleotide variant (SNV) of the wild-type protein sequence. The algorithm does not take into account the wild-type DNA allele, so it will include any possible mutant that is one base away.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    neworder_aminoacids: list, default list(‘DEKHRGNQASTPCVYMILFW*’)
    Order of amino acids (y-axis) to display in the heatmap.
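
The show_snv criterion (any mutant that is one base away from some codon of the wild-type amino acid, regardless of the actual DNA allele) can be sketched with the standard genetic code. The helper is_snv is hypothetical and is only meant to make the criterion concrete.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_snv(wt_aa: str, mutant_aa: str) -> bool:
    """Hypothetical helper: True if one base change in any codon of wt_aa
    yields a codon for mutant_aa."""
    for codon, aa in CODON_TABLE.items():
        if aa != wt_aa:
            continue
        for i in range(3):
            for base in BASES:
                if base == codon[i]:
                    continue
                neighbor = codon[:i] + base + codon[i + 1:]
                if CODON_TABLE[neighbor] == mutant_aa:
                    return True
    return False


met_to_ile = is_snv("M", "I")  # ATG -> ATA/ATC/ATT is one base away
met_to_trp = is_snv("M", "W")  # ATG -> TGG needs two base changes
```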

HeatmapColumns pyplot class

class mutagenesis_visualization.main.heatmaps.heatmap_columns.HeatmapColumns(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class plots a heatmap with the enrichment scores where you can show selected columns.

__call__(segment: Tuple[int, int], ylabel: bool = True, nancolor: str = 'lime', mask_selfsubstitutions: bool = False, color_selfsubstitutions: Optional[str] = 'k', replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a heatmap plot enrichment scores but only plots a selected segment.

Parameters:
  • segment (Tuple[int, int]) – Segment is typed as (20, 40) and includes both residues 20 and 40.
  • ylabel (bool, default True) – Choose False to hide.
  • nancolor (str, default 'lime') – Will color np.nan values with the specified color.
  • mask_selfsubstitutions (bool, default False) – If set to true, will assign a score of 0 to each self-substitution, i.e., A2A = 0.
  • color_selfsubstitutions (str, default black) – If set to a color, it will color the self-substitution borders. Set to None to not color the self substitutions.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
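
Because segment includes both end residues, the number of plotted columns is last - first + 1. The sketch below maps a segment to dataset columns; segment_columns is a hypothetical helper and assumes 0-based columns offset by start_position.

```python
from typing import Tuple


def segment_columns(segment: Tuple[int, int], start_position: int) -> range:
    """Hypothetical helper: dataset columns for an inclusive residue segment."""
    first, last = segment
    # Both end residues are included, hence the + 1 on the upper bound.
    return range(first - start_position, last - start_position + 1)


columns = segment_columns((20, 40), start_position=2)
```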

HeatmapRows pyplot class

class mutagenesis_visualization.main.heatmaps.heatmap_rows.HeatmapRows(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class plots a heatmap with the enrichment scores.

Miniheatmap pyplot class

class mutagenesis_visualization.main.heatmaps.miniheatmap.Miniheatmap(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate a miniheatmap plot.

__call__(mask_selfsubstitutions: bool = False, position_offset: int = 0, background_correction: bool = False, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a miniheatmap plot enrichment scores of mutagenesis selection assays.

Parameters:
  • mask_selfsubstitutions (bool, default False) – If set to true, will assign a score of 0 to each self-substitution, i.e., A2A = 0.
  • position_offset (int, default 0) – Will group columns by residues. If the offset is not 0, it will use the values of the n+offset to group by. For example, you may want to see what happens when you have a Proline in front of the mutated residue. The algorithm can report the difference between the calculated value and the mean score for that particular substitution. Offset of 1 means that you evaluate the effect of following residue n+1 on n. Offset of -1 means that you look at the previous residue (n-1 on n).
  • background_correction (boolean, default False) – If offset is nonzero, whether to subtract the average effect of a substitution or not.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    colorbar_scale: tuple, default (-1, 1)
    Scale min and max used in heatmaps and correlation heatmaps.
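
The grouping controlled by position_offset can be sketched for a single substitution. The helper group_by_neighbor is hypothetical: it takes the score of that substitution at each position n and averages the scores according to the wild-type residue found at n + offset. Background correction is not included.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def group_by_neighbor(sequence: str, scores: List[float],
                      offset: int = 0) -> Dict[str, float]:
    """Hypothetical helper: average scores[n] by the residue at n + offset."""
    groups = defaultdict(list)
    for n, score in enumerate(scores):
        neighbor = n + offset
        if 0 <= neighbor < len(sequence):  # skip positions without a neighbor
            groups[sequence[neighbor]].append(score)
    return {residue: mean(values) for residue, values in groups.items()}


sequence = "APGPA"
scores = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy scores of one substitution
by_following = group_by_neighbor(sequence, scores, offset=1)
```

With offset = 1, positions followed by a proline are pooled together, which is the "Proline in front of the mutated residue" case mentioned above.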

Rank pyplot class

class mutagenesis_visualization.main.other_stats.rank.Rank(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate a rank plot.

__call__(mode: str = 'pointmutant', output_file: Union[None, str, pathlib.Path] = None, replicate: int = -1, **kwargs) → None

Generate a rank plot so every mutation/residue is sorted based on enrichment score.

Parameters:
  • mode (str, default 'pointmutant') – Alternatively, set to ‘mean’ for the mean of each position.
  • outdf (boolean, default False) – If set to true, will return the df with the rank of mutations.
  • replicate (int, default -1) – Set the replicate to plot. By default, the mean is plotted. The first replicate starts at index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
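
Ranking mutations by enrichment score is a plain sort. In this sketch, rank_mutations is a hypothetical helper and ascending order is an assumption; the source does not state the direction used in the plot.

```python
from typing import Dict, List, Tuple


def rank_mutations(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Hypothetical helper: sort mutations from lowest to highest score."""
    return sorted(scores.items(), key=lambda item: item[1])


ranked = rank_mutations({"A2D": -0.8, "A2W": 0.4, "L3P": -1.5})
```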

Cumulative pyplot class

class mutagenesis_visualization.main.other_stats.cumulative.Cumulative(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class will plot a cumulative function on the enrichment scores from first to last amino acid.

__call__(mode: str = 'all', replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generates a cumulative plot of the enrichment scores by position.

Parameters:
  • mode (str, default 'all') – Options are ‘mean’, ‘all’, ‘SNV’ and ‘nonSNV’.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
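
To make the idea of a cumulative function over positions concrete, here is a minimal sketch that accumulates per-position scores from the first to the last residue. The scores are invented, and treating the cumulative function as a plain running sum is an assumption rather than a statement about how the class computes it.

```python
from itertools import accumulate

# Hypothetical mean enrichment score per position, first to last residue.
position_means = [0.2, -0.5, 0.1, -0.3]

# Running sum along the sequence (assumed definition of "cumulative").
cumulative = list(accumulate(position_means))

for position, total in enumerate(cumulative, start=1):
    print(position, round(total, 3))
```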

ROC pyplot class

class mutagenesis_visualization.main.other_stats.roc_analysis.ROC(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate a ROC analysis.

Correlation pyplot class

class mutagenesis_visualization.main.pca_analysis.correlation.Correlation(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class will conduct a correlation from the enrichment scores.

__call__(replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a correlation of each amino acid.

Parameters:
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
    colorbar_scale: tuple, default (-1, 1)
    Scale min and max used in heatmaps and correlation heatmaps.

IndividualCorrelation pyplot class

class mutagenesis_visualization.main.pca_analysis.individual_correlation.IndividualCorrelation(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class will conduct an individual correlation from the enrichment scores.

__call__(replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generates a bar plot of the correlation of each amino acid mutational profile (row of the heatmap) with the rest of the amino acids (rows).

Parameters:
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –

PCA pyplot class

class mutagenesis_visualization.main.pca_analysis.pca.PCA(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class will conduct a PCA from the enrichment scores.

__call__(mode: Literal[aminoacid, secondary, residue] = 'aminoacid', dimensions: Tuple[int, int] = (0, 1), adjust_labels: bool = False, replicate: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generates a plot of two PCA dimensions.

Parameters:
  • mode (str, default 'aminoacid') – Can also do PCA by secondary structure element if set to “secondary” or by individual residue if set to “residue”.
  • dimensions (tuple, default (0,1)) – Specify which two PCA dimensions to plot. By default PCA1 vs PCA2. Max dimension is 5.
  • adjust_labels (boolean, default False) – If set to True, it will adjust the text labels so there is no overlap. It helps to increase the size of the figure; otherwise the algorithm will not find a solution. Requires the adjustText package to be installed.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) – random_state : int, default 554

DifferentialP plotly class

class mutagenesis_visualization.main.plotly.differential.DifferentialP(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a differential plot.

__call__(screen_object: Any, metric: Literal[rmse, squared, mean] = 'rmse', plot_type: str = 'bar', mode: str = 'mean', replicate: int = -1, output_html: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a plotly differential plot.

Parameters:
  • screen_object (another Screen object to compare with.) –
  • metric (str, default 'rmse') – The way to compare the two objects. Options are ‘rmse’ ((x-y)**2/N)**0.5, ‘squared’ (x**2-y**2)/N and ‘mean’ (x-y)/N.
  • plot_type (str, default 'bar') – Options are ‘bar’ and ‘line’.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_html (str, default None) – If you want to export the generated graph into html, add the path and name of the file. Example: ‘path/filename.html’.
  • **kwargs (other keyword arguments) –
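
The three metric options can be written out as a short sketch. It assumes that x and y are the scores of the two objects at one position and that each formula sums over the N values before dividing; the function name differential and the sample data are hypothetical, and the library's own computation may differ.

```python
import math


def differential(x, y, metric="rmse"):
    """Compare two equal-length lists of scores (assumed: one position)."""
    n = len(x)
    if metric == "rmse":
        # ((x-y)**2/N)**0.5, summed over the N values (assumption)
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    if metric == "squared":
        # (x**2-y**2)/N
        return sum(a ** 2 - b ** 2 for a, b in zip(x, y)) / n
    if metric == "mean":
        # (x-y)/N
        return sum(a - b for a, b in zip(x, y)) / n
    raise ValueError(f"unknown metric: {metric}")


x = [1.0, 2.0, 3.0]  # invented scores from the first object
y = [0.0, 2.0, 1.0]  # invented scores from the second object

for name in ("rmse", "squared", "mean"):
    print(name, differential(x, y, name))
```

Note that ‘squared’ and ‘mean’ keep the sign of the difference, whereas ‘rmse’ is always non-negative.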

EnrichmentBarP plotly class

class mutagenesis_visualization.main.plotly.enrichment_bar.EnrichmentBarP(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a mean enrichment plot.

HeatmapP plotly class

class mutagenesis_visualization.main.plotly.heatmap.HeatmapP(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a heatmap.

__call__(mask_selfsubstitutions: bool = False, replicate: int = -1, output_html: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a plotly heatmap.

Parameters:
  • mask_selfsubstitutions (bool, default False) – If set to True, will assign a score of 0 to each self-substitution, e.g., A2A = 0.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_html (str, default None) – If you want to export the generated graph into html, add the path and name of the file. Example: ‘path/filename.html’.
  • **kwargs (other keyword arguments) –
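
The effect of mask_selfsubstitutions can be pictured with a small sketch: wherever the substituted amino acid equals the wild-type residue at that position, the score is set to 0. The sequence, the score table and the helper name are all hypothetical, and the data layout (one row per amino acid, one column per position) is an assumption.

```python
sequence = "MAK"  # hypothetical wild-type sequence

# scores[aminoacid][i] is the score of mutating position i to that amino acid.
scores = {
    "A": [0.4, 0.7, -0.2],
    "K": [-1.0, 0.1, 0.5],
    "M": [0.3, -0.6, 0.2],
}


def mask_self_substitutions(scores, sequence):
    """Return a copy of scores with self-substitutions set to 0."""
    masked = {aa: list(row) for aa, row in scores.items()}
    for index, wild_type in enumerate(sequence):
        if wild_type in masked:
            masked[wild_type][index] = 0.0
    return masked


masked = mask_self_substitutions(scores, sequence)
print(masked)
```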

HistogramP plotly class

class mutagenesis_visualization.main.plotly.histogram.HistogramP(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a histogram plot.

__call__(mode: str = 'pointmutant', replicate: int = -1, output_html: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a plotly histogram plot.

Parameters:
  • mode (str, default 'pointmutant') – Set to “mean” to use the mean of each position instead.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_html (str, default None) – If you want to export the generated graph into html, add the path and name of the file. Example: ‘path/filename.html’.
  • **kwargs (other keyword arguments) –

RankP plotly class

class mutagenesis_visualization.main.plotly.rank.RankP(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a rank plot.

__call__(mode: str = 'pointmutant', replicate: int = -1, output_html: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a plotly rank plot so every mutation/residue is sorted based on enrichment score.

Parameters:
  • mode (str, default 'pointmutant') – Set to “mean” to use the mean of each position instead.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_html (str, default None) – If you want to export the generated graph into html, add the path and name of the file. Example: ‘path/filename.html’.
  • **kwargs (other keyword arguments) –

Scatter3DPDB plotly class

class mutagenesis_visualization.main.plotly.scatter_3d_pdb.Scatter3DPDB(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a 3D scatter plot of the protein and the enrichment scores where you can add PDB properties.

__call__(pdb_path: str = None, plot: Optional[List[str]] = None, mode: str = 'mean', custom: Any = None, position_correction: int = 0, chain: str = 'A', replicate: int = -1, output_html: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generates a 3-D scatter plot of different properties obtained from the PDB. PDBs may have atoms missing, you should fix the PDB before using this method. We recommend you use matplotlib for interactive plot.

Parameters:
  • pdb_path (str, default None) – Path to the PDB file.
  • plot (list, default ['Distance', 'SASA', 'log B-factor']) – List of 3 elements to plot. Other options are ‘Score’ and ‘Custom’. If custom, add the label as the third element of the list, e.g., [‘Distance’, ‘SASA’, ‘Conservation’].
  • mode (str, default 'mean') – Specify what enrichment scores to use. If mode = ‘mean’, it will use the mean of each position to classify the residues. If mode = ‘A’, it will use the Alanine substitution profile. Can be used for each amino acid. Use the one-letter code and upper case.
  • custom (list or dataframe or np.array, default None) – If you want to add a custom dataset to plot, use custom. On the parameter plot, the 3rd item of the list will be the label for your custom dataset.
  • df_color (pandas dataframe, default None) – The color of each residue can also be included. You must label that label column.
  • color_by_score (boolean, default True) – If set to False, the points in the scatter will not be colored based on the enrichment score.
  • position_correction (int, default 0) – If the PDB structure has a different numbering of positions than your dataset, you can correct for that. If your start_position = 2, but in the PDB that same residue is at position 20, position_correction needs to be set to 18.
  • chain (str, default 'A') – Chain of the PDB file to get the coordinates and SASA from.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_html (str, default None) – If you want to export the generated graph into html, add the path and name of the file. Example: ‘path/filename.html’.
  • **kwargs (other keyword arguments) –
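
The position_correction arithmetic from the parameter description can be checked with a tiny sketch. The helper name to_pdb_position is hypothetical; only the numbers (start_position 2, PDB position 20, correction 18) come from the description above.

```python
def to_pdb_position(dataset_position, position_correction=0):
    """Map a dataset position onto the PDB numbering (illustrative helper)."""
    return dataset_position + position_correction


start_position = 2         # first mutated residue in the dataset
position_correction = 18   # offset between dataset and PDB numbering

print(to_pdb_position(start_position, position_correction))
```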

Scatter3D plotly class

class mutagenesis_visualization.main.plotly.scatter_3d.Scatter3D(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a 3D scatter plot of the protein and the enrichment scores.

__call__(pdb_path: str, mode: str = 'mean', df_coordinates: bool = None, position_correction: int = 0, chain: str = 'A', squared: bool = False, replicate: int = -1, output_html: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generates a 3-D scatter plot of the x,y,z coordinates of the C-alpha atoms of the residues, color coded by the enrichment scores. PDBs may have atoms missing, you should fix the PDB before using this method. Use matplotlib for interactive plot.

Parameters:
  • pdb_path (str) – Path to the PDB file.
  • mode (str, default 'mean') – Specify what enrichment scores to use. If mode = ‘mean’, it will use the mean of each position to classify the residues. If mode = ‘A’, it will use the Alanine substitution profile. Can be used for each amino acid. Use the one-letter code and upper case.
  • df_coordinates (pandas dataframe, default None) – If no pdb is included, the user must pass the 3-D coordinates of the residues to plot. In here you have more flexibility and you can select other atoms besides the C-alpha.
  • position_correction (int, default 0) – If the PDB structure has a different numbering of positions than your dataset, you can correct for that. If your start_position = 2, but in the PDB that same residue is at position 20, position_correction needs to be set to 18.
  • chain (str, default 'A') – Chain of the PDB file to get the coordinates and SASA from.
  • squared (boolean, default False) – If this parameter is True, the algorithm will center the data, and plot the square value of the distance.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_html (str, default None) – If you want to export the generated graph into html, add the path and name of the file. Example: ‘path/filename.html’.
  • **kwargs (other keyword arguments) –

ScatterP plotly class

class mutagenesis_visualization.main.plotly.scatter.ScatterP(dataframes: mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: Optional[str] = None, start_position: Optional[int] = None, end_position: Optional[int] = None)

This class uses plotly to generate a scatter plot.

__call__(screen_object: Any, mode: str = 'pointmutant', show_results: bool = False, replicate: int = -1, output_html: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a scatter plot between object and a second object of the same class.

Parameters:
  • screen_object (object from class Screen to do the scatter with) –
  • mode (str, default 'pointmutant') – Set to “mean” to use the mean of each position instead.
  • show_results (boolean, default False) – If set to True, will export the details of the linear fit.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_html (str, default None) – If you want to export the generated graph into html, add the path and name of the file. Example: ‘path/filename.html’.
  • **kwargs (other keyword arguments) –

Scatter pyplot class

class mutagenesis_visualization.main.scatter.scatter.Scatter(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate a scatter plot.

__call__(screen_object: Union[Screen, Any], mode: Literal[mean, pointmutant] = 'pointmutant', min_score: Optional[float] = None, max_score: Optional[float] = None, replicate: int = -1, replicate_second_object: int = -1, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a scatter plot between object and a second object of the same class.

Parameters:
  • screen_object (object from class Screen to do the scatter with) –
  • mode (str, default 'pointmutant') – Set to “mean” to use the mean of each position instead.
  • min_score (float, default None) – Change values below a minimum score to be that score, i.e., setting min_score = -1 will change any value smaller than -1 to -1.
  • max_score (float, default None) – Change values above a maximum score to be that score, i.e., setting max_score = 1 will change any value greater than 1 to 1.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • replicate_second_object (int, default -1) – Replicate of the second object to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –
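
The min_score and max_score parameters describe a simple clipping step. The sketch below follows the examples given in the parameter descriptions (values smaller than -1 become -1, values greater than 1 become 1); the function name clip_scores and the sample data are hypothetical.

```python
def clip_scores(scores, min_score=None, max_score=None):
    """Clamp each score into [min_score, max_score]; None disables a bound."""
    clipped = []
    for value in scores:
        if min_score is not None and value < min_score:
            value = min_score
        if max_score is not None and value > max_score:
            value = max_score
        clipped.append(value)
    return clipped


print(clip_scores([-2.5, -0.4, 0.0, 0.9, 3.1], min_score=-1, max_score=1))
```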

ScatterReplicates pyplot class

class mutagenesis_visualization.main.scatter.scatter_replicates.ScatterReplicates(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

Class to generate scatter plots of each pairwise replicate combination.

__call__(wt_counts_only: bool = False, mode: Literal[mean, pointmutant] = 'pointmutant', min_score: Optional[float] = None, max_score: Optional[float] = None, output_file: Union[None, str, pathlib.Path] = None, **kwargs) → None

Generate a series of scatter plots between replicates.

Parameters:
  • wt_counts_only (bool, default False) – If set to True, it will plot the kernel distribution of the wild-type alleles only. mode will be pointmutant by default.
  • mode (str, default 'pointmutant') – Set to “mean” to use the mean of each position instead.
  • min_score (float, default None) – Change values below a minimum score to be that score, i.e., setting min_score = -1 will change any value smaller than -1 to -1.
  • max_score (float, default None) – Change values above a maximum score to be that score, i.e., setting max_score = 1 will change any value greater than 1 to 1.
  • output_file (str, default None) – If you want to export the generated graph, add the path and name of the file. Example: ‘path/filename.png’ or ‘path/filename.svg’.
  • **kwargs (other keyword arguments) –

Pymol pyplot class

class mutagenesis_visualization.main.pymol.pymol.Pymol(dataframes: Optional[mutagenesis_visualization.main.utils.replicates_screen_input.DataframesHolder] = None, dataframes_raw: Optional[List[pandas.core.frame.DataFrame]] = None, aminoacids: Union[str, List[str]] = '', datasets: Optional[List[numpy.ndarray[Any, numpy.dtype[ScalarType]]]] = None, sequence: str = '', sequence_raw: str = '', start_position: int = 0, secondary: Optional[List[T]] = None, secondary_dup: Optional[List[T]] = None)

This class acts as a wrapper around the ipymol package (hosted on GitHub).

__call__(pdb: Union[str, pathlib.Path], mode: str = 'mean', residues: List[str] = None, position_correction: int = 0, esthetic_parameters: bool = True, min_score: Optional[float] = None, max_score: Optional[float] = None, replicate: int = -1, **kwargs) → None

Color the residues of a PyMOL structure. The user can specify the residues to color or use the mutagenesis data. Activating mutations are colored red, loss-of-function mutations blue, and neutral mutations green. This only works if PyMOL is in your $PATH as pymol or if you can start PyMOL in server mode. It uses the ipymol package, which needs to be installed from GitHub (pip install git+https://github.com/cxhernandez/ipymol) rather than from PyPI, where it is not up to date.

Please ensure that PyMOL is in your $PATH as pymol.

Parameters:
  • pdb (str) – User should specify the PDB chain in the following format: 4G0N_A. If you have an internet connection, PyMOL will download the PDB. Otherwise, include the path where your PDB is stored locally.
  • mode (str, default 'mean') – Specify what enrichment scores to use. Other options are ‘snv’, ‘nonsnv’ and ‘aminoacid’. If mode = ‘mean’, it will use the mean of each position to classify the residues. If mode = ‘A’, it will use the Alanine substitution profile. Can be used for each amino acid. Use the one-letter code and upper case.
  • residues (list, optional) – If the user decides to pass custom arguments, use the following format: residues = [‘1,2,3,4-10’, ‘12-15,23,24,35’, ‘48,49,50,52-60’], which are [blue, red, green].
  • position_correction (int, default 0) – If the PDB structure has a different numbering of positions than your dataset, you can correct for that. If your start_position = 2, but in the PDB that same residue is at position 20, position_correction needs to be set to 18.
  • esthetic_parameters (bool, default True) – If set to True, PyMOL will apply the mutagenesis_visualization custom parameters instead of the default PyMOL ones.
  • min_score (float, default None) – Change values below a minimum score to be that score, i.e., setting min_score = -1 will change any value smaller than -1 to -1.
  • max_score (float, default None) – Change values above a maximum score to be that score, i.e., setting max_score = 1 will change any value greater than 1 to 1.
  • replicate (int, default -1) – Replicate to plot. By default, the mean is plotted. The first replicate has index 0. If there is only one replicate, leave this parameter untouched.
  • **kwargs (other keyword arguments) –
    gof : int, default is 1
    cutoff for determining gain of function mutations based on mutagenesis data.
    lof : int, default is -1
    cutoff for determining loss of function mutations based on mutagenesis data.
    color : str, default ‘chlorine’
    Choose color to color neutral.
    color_gof : str, default ‘red’
    Choose color to color positions with an enrichment score > gof.
    color_lof : str, default ‘neptunium’
    Choose color to color positions with an enrichment score < lof.
Returns:

Open pymol session with a fetched pdb structure where the residues are colored according to the enrichment scores.
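
To show how the gof and lof cutoffs relate to the residues format, the following sketch classifies positions by score and builds the three comma-separated strings in the documented [blue, red, green] order (loss of function, gain of function, neutral). The helper name classify_positions and the scores are hypothetical, and the sketch does not compress consecutive positions into ranges such as ‘4-10’.

```python
def classify_positions(position_scores, gof=1, lof=-1):
    """Split positions into loss-of-function, gain-of-function and neutral."""
    loss, gain, neutral = [], [], []
    for position, score in sorted(position_scores.items()):
        if score > gof:
            gain.append(position)
        elif score < lof:
            loss.append(position)
        else:
            neutral.append(position)
    # Order follows the documented [blue, red, green] convention.
    return [",".join(str(p) for p in group) for group in (loss, gain, neutral)]


# Invented mean enrichment score per position.
position_scores = {2: -1.4, 3: 0.2, 4: 1.8, 5: -0.1, 6: -2.0}
residues = classify_positions(position_scores)
print(residues)
```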

Functions

mutagenesis_visualization.calculate_enrichment(aminoacids: Union[List[str], str], pre_lib: Union[str, pandas.core.frame.DataFrame, numpy.ndarray[Any, numpy.dtype[ScalarType]]], post_lib: Union[str, pandas.core.frame.DataFrame, numpy.ndarray[Any, numpy.dtype[ScalarType]]], pre_wt: Union[str, None, numpy.ndarray[Any, numpy.dtype[ScalarType]]] = None, post_wt: Union[str, None, numpy.ndarray[Any, numpy.dtype[ScalarType]]] = None, zeroing_method: Literal[none, zscore, counts, wt, wt synonymous, kernel, population] = 'population', zeroing_metric: Literal[mean, mode, median] = 'median', stopcodon: bool = True, min_counts: int = 25, min_countswt: int = 100, std_scale: Optional[float] = 0.2, mad_filtering: int = 2, mwt: float = 2, infinite: float = 3, output_file: Union[None, str, pathlib.Path] = None) → numpy.ndarray[Any, numpy.dtype[ScalarType]]

Determine the enrichment scores of a selection experiment, where there is a preselected population (input) and a selected population (output).

Parameters:
  • aminoacids (list, str) – Index of aminoacids (in order). Stop codon needs to be ‘*’.
  • pre_lib (str, pandas dataframe or np.array) – Can be filepath and name of the exported txt file, dataframe or np.array.
  • post_lib (str, pandas dataframe or np.array) – Can be filepath and name of the exported txt file, dataframe or np.array.
  • pre_wt (str, or np.array, optional) – Str with filepath and name of the exported txt file or np.array.
  • post_wt (str, or np.array, optional) – Str with filepath and name of the exported txt file or np.array.
  • zeroing_method (str, default 'population') – Method to normalize the data. Can also use ‘none’, ‘zscore’, ‘counts’, ‘wt’, ‘wt synonymous’ or ‘kernel’. If ‘wt’ is used, ‘pre_wt’ must not be set to None.
  • zeroing_metric (str, default 'median') – Metric to zero the data. Only works if zeroing_method = ‘population’ or ‘wt’. Can also be set to ‘mean’ or ‘mode’.
  • stopcodon (boolean, default True) – Use the enrichment scores of stop codons as a metric to determine the minimum enrichment score.
  • min_counts (int, default 25) – If a mutant has fewer counts than min_counts, it will be replaced by np.nan.
  • min_countswt (int, default 100) – If a synonymous wild-type mutant has fewer counts than min_countswt, it will be replaced by np.nan.
  • std_scale (float, default 0.2) – Factor by which the population is scaled. Set to None if you don’t want to scale the data.
  • mad_filtering (int, default 2) – Will apply MAD (median absolute deviation) filtering to data.
  • mwt (float, default 2) – When MAD filtering is applied, mad_filtering is the number of medians away a data point must be to be discarded. mwt is only used when the population of wild-type alleles is the reference for data zeroing.
  • infinite (float, default 3) – It will replace +infinite values with +3 and -infinite with -3.
  • output_file (str, default None) – If you want to export the generated files, add the path and name. Example: ‘path/filename.txt’. The file will be saved as a txt, csv or xlsx file.
Returns:

zeroed – A np.array containing the enrichment scores.

Return type:

ndarray
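
The sketch below strings together three of the steps described by the parameters: filtering on min_counts, zeroing on the population median, and replacing infinite values. It is a toy under several assumptions that are not stated in this reference: the score is taken to be a log10 ratio of post to pre counts, min_counts is applied to the pre-selection counts only, and infinities are replaced after zeroing. The function name sketch_enrichment and the counts are hypothetical; use calculate_enrichment for real data.

```python
import math
import statistics


def sketch_enrichment(pre, post, min_counts=25, infinite=3.0):
    """Toy enrichment: log10 ratio (assumed), median zeroing, inf clamping."""
    raw = []
    for pre_count, post_count in zip(pre, post):
        if pre_count < min_counts or pre_count == 0:
            raw.append(math.nan)       # too few input counts to trust
        elif post_count == 0:
            raw.append(-math.inf)      # variant vanished after selection
        else:
            raw.append(math.log10(post_count / pre_count))

    # Zero the data on the median of the finite scores (zeroing_metric='median').
    center = statistics.median(s for s in raw if math.isfinite(s))

    zeroed = []
    for score in raw:
        if math.isnan(score):
            zeroed.append(score)
        elif math.isinf(score):
            zeroed.append(math.copysign(infinite, score))
        else:
            zeroed.append(score - center)
    return zeroed


pre = [100, 100, 100, 10, 100]   # invented pre-selection counts
post = [1000, 100, 10, 50, 0]    # invented post-selection counts
scores = sketch_enrichment(pre, post)
print(scores)
```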

mutagenesis_visualization.count_reads(dna_sequence: str, input_file: Union[str, pathlib.Path], codon_list: Union[List[str], str] = 'NNS', counts_wt: bool = True, start_position: int = 2, output_file: Union[None, str, pathlib.Path] = None, full: bool = False) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Process a trimmed fastq file containing DNA reads and return the counts of each DNA sequence specified by the user.

Parameters:
  • dna_sequence (str,) – Contains the DNA sequence of the allele of reference (usually wild-type).
  • input_file (str, default None) – Path and name of the fastq file (full name including suffix “.fastq”).
  • codon_list (list or str, default 'NNS') – Input a list of the codons that were used to create point mutations. Example: [“GCC”, “GCG”, “TGC”]. If the library was built using NNS and NNK codons, it is enough to input ‘NNS’ or ‘NNK’ as a string. It is important to know that the order of the codon_list will determine the output order.
  • counts_wt (boolean, default True) – If true it will add the counts to the wt allele. If false, it will set it up to np.nan.
  • start_position (int, default 2) – First position in the protein sequence that will be used for the first column of the array. If a protein has been mutated only from residue 100-150, then if start_position = 100, the algorithm will trim the first 99 amino acids in the input sequence. The last residue will be calculated based on the length of the input array. We have set the default value to 2 because normally the Methionine in position 1 is not mutated.
  • output_file (str, default None) – If you want to export the generated files, add the path and name of the file without suffix. Example: ‘path/filename.xlsx’.
  • full (bool, optional) – Switch determining nature of return value. When it is False (the default) just the reads are returned, when True diagnostic information from the fastq analysis is also returned.
Returns:

  • df_counts (dataframe) – Dataframe with the counts for each point mutant.
  • wt_counts (list) – List of the counts for each DNA sequence that codes for the wild-type protein.
  • useful_reads (str) – Present only if full = True. Contains the useful reads.
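
The codon_list parameter accepts ‘NNS’ or ‘NNK’ as shorthand for a full list of codons. As an illustration of what that shorthand stands for, the sketch below expands a degenerate codon using the standard IUPAC nucleotide codes (N = A/C/G/T, S = C/G, K = G/T). The helper name expand_codon is hypothetical, and the order in which the codons are produced here is not necessarily the order the library uses.

```python
from itertools import product

# IUPAC degenerate nucleotide codes needed for NNS and NNK libraries.
IUPAC = {
    "N": "ACGT", "S": "CG", "K": "GT",
    "A": "A", "C": "C", "G": "G", "T": "T",
}


def expand_codon(degenerate_codon):
    """Return every concrete codon encoded by a degenerate codon."""
    pools = [IUPAC[base] for base in degenerate_codon.upper()]
    return ["".join(bases) for bases in product(*pools)]


nns = expand_codon("NNS")
print(len(nns), nns[:4])
```

Both NNS and NNK expand to 4 × 4 × 2 = 32 codons.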

mutagenesis_visualization.count_fastq(variants: List[str], input_file: Union[str, pathlib.Path]) → Tuple[dict, int, int]

Count the frequency of variants in the input fastq file.

Parameters:
  • variants (list) –
  • input_file (str, default None) – Path and name of the fastq file (full name including suffix “.fastq”).
Returns:

  • variants (ordered dict) – Same as the input dictionary, but now the values are updated with the counts.
  • totalreads (int) – Total number of DNA chains that appear in the fastq file.
  • usefulreads (int) – Total number of identified DNA chains. Calculated as the sum of all the key values.
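
The three return values can be pictured with a minimal counting sketch. It assumes a standard four-line fastq record (the read is the second line of each record) and exact matching between a read and a variant; the function name count_fastq_sketch and the in-memory fastq text are hypothetical, and the library function may handle reads differently.

```python
import io
from collections import OrderedDict


def count_fastq_sketch(variants, handle):
    """Count exact matches of each variant among the reads in a fastq stream."""
    counts = OrderedDict((variant, 0) for variant in variants)
    totalreads = 0
    for line_number, line in enumerate(handle):
        if line_number % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        totalreads += 1
        read = line.strip()
        if read in counts:
            counts[read] += 1
    usefulreads = sum(counts.values())
    return counts, totalreads, usefulreads


fastq_text = (
    "@read1\nATGGCC\n+\nIIIIII\n"
    "@read2\nATGGCG\n+\nIIIIII\n"
    "@read3\nATGGCC\n+\nIIIIII\n"
    "@read4\nTTTTTT\n+\nIIIIII\n"
)
counts, totalreads, usefulreads = count_fastq_sketch(
    ["ATGGCC", "ATGGCG"], io.StringIO(fastq_text)
)
print(dict(counts), totalreads, usefulreads)
```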

mutagenesis_visualization.load_demo_datasets() → Dict[str, pandas.core.frame.DataFrame]

Loads example datasets so the user can experiment with them.

Returns:

data_dict – Dictionary that contains the datasets used to create the plots on the documentation.

Return type:

Dict[str, DataFrame]

mutagenesis_visualization.run_demo(figure: Literal[heatmap, miniheatmap, mean, kernel, pca, position, secondary_mean, correlation, individual_correlation, pymol] = 'heatmap', show: bool = True) → None

Performs a demonstration of the mutagenesis_visualization software.

Parameters:
  • figure (str, default 'heatmap') – There are a few example plots that can be displayed to test that the package is working on your machine. The options are ‘heatmap’, ‘miniheatmap’, ‘mean’, ‘kernel’, ‘pca’, ‘position’, ‘secondary_mean’, ‘correlation’, ‘individual_correlation’ and ‘pymol’. Check the documentation for more information.
  • show (boolean, default True) – If True, will execute plt.show() for each figure.

The following function, generate_default_kwargs, is not meant to be called directly by the user. It contains the kwargs that are parameters of the Screen methods.

mutagenesis_visualization.main.utils.kwargs.generate_default_kwargs() → Dict[str, Any]

Kwargs used in the methods and some other functions. Not all kwargs work on each method, so read the individual descriptions. Don’t call this function on its own; use the parameters within the plotting methods.

Example: mut.heatmap(colormap=colormap of interest)

Parameters:
  • colormap (cmap, default custom bluewhitered) – Used for heatmaps. You can use your own colormap or the ones provided by matplotlib. Example: colormap = copy.copy(plt.cm.get_cmap('Blues_r'))
  • colorbar_scale (tuple, default [-1, 1]) – Minimum and maximum of the color scale used in heatmaps and correlation heatmaps.
  • color (str, default 'k') – Color used for the kernel plot line, the histogram and the bar plots.
  • title (str, default 'Title') – Title of plot.
  • x_label (str, default 'x_label') – Label of x axis.
  • y_label (str, default 'y_label') – Label of y axis.
  • xscale (tuple, default (None, None)) – Minimum and maximum of the x axis.
  • yscale (tuple, default (None, None)) – Minimum and maximum of the y axis.
  • tick_spacing (int, default 1) – Spacing of the axis ticks. Used for scatter and cumulative plots.
  • outputfilepath (str, default '') – Path where file will be exported to.
  • outputfilename (str, default '') – Name of the exported file.
  • dpi (int, default 600) – Dots Per Inch in the created image.
  • neworder_aminoacids (list, default list('DEKHRGNQASTPCVYMILFW*')) – Order in which amino acids are displayed in heatmaps.
  • gof (int, default 1) – Cutoff of the enrichment score to classify a mutation as gain of function. Used on pymol and 3D methods.
  • lof (int, default -1) – Cutoff of the enrichment score to classify a mutation as loss of function. Used on pymol and 3D methods.
  • color_gof (str, default 'red') – Color used for mutations above the gof cutoff. Used in pymol, 3D and mean methods.
  • color_lof (str, default 'blue') – Color used for mutations below the lof cutoff. Used in pymol, 3D and mean methods.
  • cartoon_colors (list, default ['lightgreen', 'lavender', 'k']) – Colors used for secondary structure cartoon. Used for heatmap, mean and mean_count plots.
  • text_labels (str, default 'None') – Text labels that you can add to mean and mean_count plots. You will need to specify the coordinates.
  • show (boolean, default True) – Whether to execute plt.show() or not on a matplotlib object.
  • close (boolean, default False) – Whether to execute plt.close() or not on a matplotlib object.
  • random_state (int, default 554) – Random state used for PCA function.
  • bins (int or str, default 'auto') – Number of bins for the histogram. By default it will automatically decide the number of bins.
  • return_plot_object (boolean, default False) – If True, will return the plotting object.
  • figsize_x (int) –
  • figsize_y (int) –
  • legend_fontsize (int, default 10) –
Returns:

default_kwargs – Dictionary with the default kwargs.

Return type:

dict
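
The way defaults and user-supplied kwargs typically combine can be sketched as follows. This is a hypothetical illustration, not the library's code: default_kwargs_sketch and resolve_kwargs are made-up names, and only a subset of the defaults documented above is copied in. The sketch assumes that values passed to a plotting method simply override the defaults.

```python
from typing import Any, Dict


def default_kwargs_sketch() -> Dict[str, Any]:
    """Hypothetical subset of the documented defaults."""
    return {
        "colorbar_scale": (-1, 1),
        "color": "k",
        "title": "Title",
        "dpi": 600,
        "gof": 1,
        "lof": -1,
        "show": True,
        "close": False,
        "random_state": 554,
    }


def resolve_kwargs(**user_kwargs: Any) -> Dict[str, Any]:
    """Start from a fresh copy of the defaults, then apply user overrides."""
    resolved = default_kwargs_sketch()
    resolved.update(user_kwargs)
    return resolved


# Only the keys passed by the caller change; the rest keep their defaults.
resolved = resolve_kwargs(title="My heatmap", dpi=300)
```

Building a fresh dictionary on every call keeps one method's overrides from leaking into the next call.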