Creating heatmaps

This section shows how to use the mutagenesis_visualization package. The plotting functions can be used regardless of how you process your data. For the examples, we are using two datasets that are derived from Pradeep’s legacy. [1]

Import modules

# running locally, if you pip install then you just have to import the module
%matplotlib inline
from typing import List
import numpy as np
import matplotlib as plt
import copy
from mutagenesis_visualization import Screen
from mutagenesis_visualization.main.utils.data_paths import HRAS_RBD_COUNTS_CSV, HRAS_GAPGEF_COUNTS_CSV

Create object of class Screen

Class reviewed in this section:
  • mutagenesis_visualization.main.classes.screen.Screen

In order to create plots, the first step is to create a Screen.object. The enrichment scores will be passed using the parameter dataset . The protein sequence sequence and the amino acid substitutions order aminoacids need to be defined for the object to be created. Adding the secondary structure secondary is optional, but without it some plots will not work. In this example, we are importing two datasets and creating two objects named hras_GAPGEF and hras_RBD.

# Load enrichment scores. This is how you would load them from a local file.
hras_enrichment_GAPGEF = np.genfromtxt(HRAS_GAPGEF_COUNTS_CSV, delimiter=',')
hras_enrichment_RBD = np.genfromtxt(HRAS_RBD_COUNTS_CSV, delimiter=',')

# Define protein sequence

# Order of amino acid substitutions in the hras_enrichment dataset
aminoacids: List[str] = list('ACDEFGHIKLMNPQRSTVWY*')

# First residue of the hras_enrichment dataset. Because 1-Met was not mutated, the dataset starts at residue 2
start_position: int = 2

# Define secondary structure
secondary = [['L0'], ['β1'] * (9 - 1), ['L1'] * (15 - 9), ['α1'] * (25 - 15),
             ['L2'] * (36 - 25), ['β2'] * (46 - 36), ['L3'] * (48 - 46),
             ['β3'] * (58 - 48), ['L4'] * (64 - 58), ['α2'] * (74 - 64),
             ['L5'] * (76 - 74), ['β4'] * (83 - 76), ['L6'] * (86 - 83),
             ['α3'] * (103 - 86), ['L7'] * (110 - 103), ['β5'] * (116 - 110),
             ['L8'] * (126 - 116), ['α4'] * (137 - 126), ['L9'] * (140 - 137),
             ['β6'] * (143 - 140), ['L10'] * (151 - 143), ['α5'] * (172 - 151),
             ['L11'] * (190 - 172)]

# Substitute Nan values with 0
fillna: int = 0

# Create objects
hras_GAPGEF: Screen = Screen(
    hras_enrichment_GAPGEF, hras_sequence, aminoacids, start_position, fillna,
hras_RBD: Screen = Screen(
    hras_enrichment_RBD, hras_sequence, aminoacids, start_position, fillna,


Methods reviewed in this section:

Once the object hras_RBD is created, we will plot a heatmap of the enrichment scores using the method object.heatmap.

# Create full heatmap
hras_RBD.heatmap(title='H-Ras 2-166', show_cartoon=True)

If you set the parameter hierarchical=True, it will sort the columns using hierarchical clustering

hras_RBD.heatmap(title='H-Ras 2-166', hierarchical=True, output_file=None)

You can change the scale and the color map using the parameters colorbar_scale and colormap. You can also mask self-substitutions (ie T2T) by setting mask_selfsubstitutions=True. The noise in the assay may cause self-substitutions to have a score different than 0, which may confuse the reader. If you use this masking, please make sure that there is no systematic error related to the centering of the data.

# Load a color map from matplotlib
colormap = copy.copy(('PuOr')))

# Change scale and colormap
    title='H-Ras 2-166',
    colorbar_scale=(-2, 2),

If you set the parameter show_snv=True, the algorithm will color green every mutation that is not a single nucleotide variant (SNV) of the wild-type protein. You will notice how many mutations are not accessible through a nucleotide change. This option may be useful to you so you can quickly evaluate which mutations are accessible through random DNA mutations. In the example of Ras, the frequency of non-SNV substitutions at residues 12 and 13 is dramatically lower.

# Create full heatmap showing only SNV mutants
    title='H-Ras 2-166', show_cartoon=True, show_snv=True)

We can slice the full heatmap by either showing only some columns or some rows. To show only a few amino acid mutational profiles (rows), we will use the method object.heatmap_rows. Note that we need to specify which amino acids to show with selection.

Heatmap slices

# Create heatmap of selected aminoacid substitutions
    title='H-Ras 2-166',
    selection=['E', 'Q', 'A', 'P', 'V', 'Y'],

If we want to display only a few positions in the protein (columns), we will use the method object.heatmap_columns. The parameter segment will indicate which are the contigous columns to show.

# Create a heatmap of a subset region in the protein
hras_RBD.heatmap_columns(segment=[20, 40])


A summarized heatmap can also be generated. It is useful to evaluate global trends in the data. The command to use is object.miniheatmap.

# Condensed heatmap
hras_RBD.miniheatmap(title='Wt residue H-Ras')

Now lets look at the effect of having a certain residue in front the mutated residue. For instance, the column of prolines is the average of all the columns that had a proline in the n-1 position. To accomplish this, set offset=-1.

# Condensed heatmap offset no background correction
    title='Wt residue H-Ras',

Now lets do a background correction by setting background_correction=True. To the calculated values, it will subtract the mean enrichment score for every substitution type. In the example, proline is the only residues than wen situated before the mutation, it seems to have a detrimental effect.

# Condensed heatmap offset with background correction
    title='Wt residue H-Ras',


[1]Bandaru, P., Shah, N. H., Bhattacharyya, M., Barton, J. P., Kondo, Y., Cofsky, J. C., … Kuriyan, J. (2017). Deconstruction of the Ras switching cycle through saturation mutagenesis. ELife, 6. DOI: 10.7554/eLife.27810