Other datasets¶
So far, we have only shown how the package performs with our own
dataset. The real test is running the software on other people's
datasets. In this section we have compiled saturation mutagenesis
datasets from the literature and reproduce the analysis. Not only does
the package work with other datasets, it also lets you customize a wide
range of parameters such as color maps and scales. Besides testing the
robustness of mutagenesis_visualization, this section provides extra
examples of how to use the API.
%matplotlib inline
from typing import Dict, Union, List
from pandas.core.frame import DataFrame
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import copy
from mutagenesis_visualization import Screen
from mutagenesis_visualization import load_demo_datasets
from mutagenesis_visualization import DemoObjects
from mutagenesis_visualization.main.utils.data_paths import PDB_1ERM, PDB_1A5R, PDB_1ND4
DEMO_DATASETS: Dict[str, Union[np.ndarray, DataFrame]] = load_demo_datasets()
Load objects¶
For simplicity, we have also added the option of loading those datasets into objects automatically. There are 11 objects to load from other papers (hras_rbd, hras_gapgef, bla_obj, sumo_obj, mapk1_obj, ube2i_obj, tat_obj, rev_obj, asynuclein_obj, aph_obj, b11l5f_obj) and all the heatmaps from the Hidalgo et al. eLife 2022 paper (hras_166_gap, hras_166_rbd, hras_188_baf3, hras_180_gap, hras_180_rbd, kras_165_gap, kras_165_gapgef, kras_173_gapgef, kras_165_gef, kras_173_gef, kras_165_rbd, kras_173_gap, kras_173_rbd, hras_166_gapgef, krasq61l_173_gap, and krasq61l_173_rbd).
Hidalgo et al. eLife 2022 paper¶
The figures of the Hidalgo et al. eLife 2022 paper were created using
mutagenesis-visualization. To retrieve the objects, just instantiate the
class DemoObjects and then access each of the datasets.
# instantiate
demo_objects = DemoObjects()
# access the datasets
demo_objects.hras_188_baf3.heatmap()
Beta Lactamase¶
Create object¶
# https://www.uniprot.org/uniprot/P62593#sequences
# Order of amino acid substitutions in the df_bla dataset
aminoacids: List[str] = list(DEMO_DATASETS['df_bla'].index)
neworder_aminoacids: List[str] = list('DEKHRGNQASTPCVYMILFW')
# First residue position of the df_bla dataset, read from its first column label
start_position = DEMO_DATASETS['df_bla'].columns[0]
# Define sequence. If you don't know the start of the sequence, just add X's
sequence_bla_x = 'MSIQHFRVALIPFFAAFCLPVFAHPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRP' + 'EERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVREL' + 'CSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTM' + 'PAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGS' + 'RGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW'
# Define secondary structure
secondary_bla = [
    ['L0'] * 23, ['α1'] * (38 - 23), ['L1'] * 2, ['β1'] * (48 - 40),
    ['L2'] * 5, ['β2'] * (57 - 53), ['L3'] * (68 - 57), ['α2'] * (84 - 68),
    ['L4'] * (95 - 84), ['α3'] * (100 - 95), ['L5'] * (103 - 100),
    ['α4'] * (110 - 103), ['L6'] * (116 - 110), ['α5'] * (140 - 116),
    ['L7'] * (1), ['α6'] * (153 - 141), ['L8'] * (164 - 153),
    ['α7'] * (169 - 164), ['L9'] * (179 - 169), ['α8'] * (194 - 179),
    ['L10'] * 3, ['α9'] * (210 - 197), ['L11'] * (227 - 210),
    ['β3'] * (235 - 227), ['L12'] * (240 - 235), ['β4'] * (249 - 240),
    ['L13'] * (254 - 249), ['β5'] * (262 - 254), ['L14'] * (266 - 262),
    ['α10'] * (286 - 266),
]
bla_obj: Screen = Screen(
DEMO_DATASETS['df_bla'], sequence_bla_x, aminoacids, start_position, 0, secondary_bla
)
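The secondary structure above is written as a nested list with one sub-list per motif, repeated once per residue. A quick sanity check before building an object is to flatten that list and confirm it yields one label per residue. The snippet below is a minimal sketch on toy data, assuming the labels are meant to cover the sequence residue by residue; the names toy_sequence, toy_secondary and flat_labels are hypothetical and not part of the package.

```python
from itertools import chain
from typing import Dict, List

# Toy inputs (hypothetical, much shorter than a real protein)
toy_sequence: str = "MSIQHFRVAL"
toy_secondary: List[List[str]] = [["L0"] * 3, ["α1"] * 5, ["L1"] * 2]

# Flatten the nested list into one motif label per residue
flat_labels: List[str] = list(chain.from_iterable(toy_secondary))

# Assumption: every residue should receive exactly one label
if len(flat_labels) != len(toy_sequence):
    raise ValueError(
        f"{len(flat_labels)} labels for {len(toy_sequence)} residues"
    )

# Map 1-based residue positions to their motif label
residue_to_motif: Dict[int, str] = dict(
    zip(range(1, len(toy_sequence) + 1), flat_labels)
)
print(residue_to_motif[4])
```

The same check can be run on secondary_bla and sequence_bla_x before passing them to Screen.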
2D Plots¶
# Create full heatmap
bla_obj.heatmap(
colorbar_scale=(-3, 3),
neworder_aminoacids=neworder_aminoacids,
title='Beta Lactamase',
show_cartoon=True,
)
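The neworder_aminoacids argument changes the order in which the amino acid rows are drawn. As an illustration of what such a reordering means for the underlying table, here is a minimal pandas sketch on toy data. It is not the package's internal code; toy_df and new_order are hypothetical names.

```python
import pandas as pd

# Toy enrichment table: rows are amino acid substitutions, columns are positions
original_order = list("ACDE")
new_order = list("DEAC")
toy_df = pd.DataFrame(
    [[0.1, -1.0], [0.3, -0.2], [-2.0, 0.0], [0.5, 0.4]],
    index=original_order,
    columns=[2, 3],
)

# Reorder the rows; the values travel with their row labels
reordered = toy_df.reindex(new_order)
print(reordered)
```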
# Miniheatmap
bla_obj.miniheatmap(
title='Wt residue Beta Lactamase',
neworder_aminoacids=neworder_aminoacids,
)
# Positional mean
bla_obj.enrichment_bar(
figsize=[10, 2.5],
mode='mean',
show_cartoon=True,
yscale=[-3, 0.25],
title='',
)
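With mode='mean', the bar plot summarizes each position by the average over all substitutions. A minimal sketch of that summary on a toy table follows, assuming missing measurements are simply skipped; the package may handle them differently, and toy_df is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Toy enrichment table: rows are substitutions, columns are positions
toy_df = pd.DataFrame(
    [[0.0, -1.0, np.nan], [0.5, -3.0, -0.5], [-0.5, -2.0, -1.5]],
    index=list("ACD"),
    columns=[2, 3, 4],
)

# Average over substitutions for each position (NaN values are skipped)
positional_mean = toy_df.mean(axis=0)
print(positional_mean)
```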
# Kernel
bla_obj.kernel(
histogram=True, title='Beta Lactamase', xscale=[-4, 1]
)
# Bar plot of the mean of each secondary structure motif
bla_obj.secondary_mean(
yscale=[-1.5, 0],
figsize=[5, 2],
title='Mean of secondary motifs',
)
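The secondary motif plot groups positions by their secondary structure label and averages within each group. The following is a rough sketch of that grouping on toy data, assuming one label per position; positional_mean and labels are hypothetical values, and the package may aggregate differently.

```python
import pandas as pd

# Toy per-position mean scores, indexed by residue position
positional_mean = pd.Series([-0.2, -0.4, -1.0, -2.0, -0.1], index=[2, 3, 4, 5, 6])

# One secondary structure label per position, aligned on the same index
labels = pd.Series(["L0", "L0", "α1", "α1", "L1"], index=positional_mean.index)

# Average the positional means within each motif, keeping the motif order
motif_mean = positional_mean.groupby(labels, sort=False).mean()
print(motif_mean)
```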
# Correlation between amino acids
bla_obj.correlation(
colorbar_scale=[0.5, 1],
title='Correlation',
neworder_aminoacids=neworder_aminoacids,
)
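The correlation plot compares amino acid substitutions by how similarly they score across positions. A minimal sketch of such a comparison on toy data is shown here, assuming a Pearson correlation between the row profiles; this is an illustration and may differ from the package's exact computation.

```python
import pandas as pd

# Toy enrichment table: rows are substitutions, columns are positions
toy_df = pd.DataFrame(
    [
        [0.0, -1.0, -2.0, 0.5],
        [0.1, -0.9, -1.8, 0.4],
        [-1.0, 0.5, 0.2, -2.0],
    ],
    index=list("ACD"),
    columns=[2, 3, 4, 5],
)

# DataFrame.corr correlates columns, so transpose to correlate amino acids
aa_corr = toy_df.T.corr(method="pearson")
print(aa_corr.round(2))
```

In this toy table, A and C have nearly identical profiles and therefore correlate strongly, while D follows a different pattern.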
# Explained variability by amino acid
bla_obj.individual_correlation(
yscale=[0, 0.6],
title='Explained variability by amino acid',
)
# PCA by amino acid substitution
bla_obj.pca(
title='',
dimensions=[0, 1],
figsize=(2, 2),
adjustlabels=True,
)
# PCA by secondary structure motif
bla_obj.pca(
title='',
mode='secondary',
dimensions=[0, 1],
figsize=(2, 2),
adjustlabels=True,
)
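For readers unfamiliar with PCA, the sketch below shows a generic version of the technique on random toy data, using a singular value decomposition of the centered matrix. It assumes that dimensions=[0, 1] refers to the first two principal components; the package may preprocess the data differently (for example, by working on a correlation matrix), so treat this as an illustration only.

```python
import numpy as np

# Toy data: 20 substitution profiles measured at 50 positions
rng = np.random.default_rng(0)
profiles = rng.normal(size=(20, 50))

# Center each column, then decompose
centered = profiles - profiles.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Coordinates of each profile on the first two principal components
coords = u[:, :2] * s[:2]

# Fraction of the total variance captured by each component
explained = s**2 / np.sum(s**2)
print(coords.shape)
```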
3D Plots¶
# 3-D scatter plot
bla_obj.plotly_scatter_3d(
mode='mean',
pdb_path=PDB_1ERM,
position_correction=2,
title='Scatter 3D',
squared=False,
x_label='x',
y_label='y',
z_label='z',
)
# Plot 3-D of distance to center of protein, SASA and B-factor
bla_obj.plotly_scatter_3d_pdbprop(
plot=['Distance', 'SASA', 'log B-factor'],
position_correction=2,
pdb_path=PDB_1ERM,
title='Scatter 3D - PDB properties',
)
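One of the properties plotted above is the distance of each residue to the center of the protein. As a rough sketch of how such a distance can be derived from atomic coordinates, the snippet below uses made-up coordinates and assumes the center is the plain average of the residue coordinates; how the package defines the center is not stated here, and ca_coords is a hypothetical name.

```python
import numpy as np

# Toy residue coordinates (x, y, z), one row per residue
ca_coords = np.array(
    [
        [1.0, 0.0, 0.0],
        [-1.0, 0.0, 0.0],
        [0.0, 2.0, 0.0],
        [0.0, -2.0, 0.0],
    ]
)

# Assumption: the protein center is the mean of the residue coordinates
center = ca_coords.mean(axis=0)

# Euclidean distance from each residue to the center
distance_to_center = np.linalg.norm(ca_coords - center, axis=1)
print(distance_to_center)
```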