Tutorial: Protein Graph Mining
This tutorial can help you start working with the pyemap.graph_mining
module
of PyeMap.
Parsing
The first step of graph mining with PyeMap is to create a PDBGroup
object,
and populate it with emap
objects for the PDBs of interest.
Here, we’ll fetch and parse a small set of flavoprotein PDBs.
import pyemap
from pyemap.graph_mining import PDBGroup
pdb_ids = ['1X0P', '1DNP', '1EFP', '1G28', '1IQR', '1IQU',
'1NP7', '1O96', '1O97', '1QNF', '1U3C', '1U3D', '2IYG', '2J4D',
'2WB2', '2Z6C', '3FY4', '3ZXS', '4EER', '4GU5', '4I6G', '4U63',
'6FN2', '6KII', '6LZ3', '6PU0', '6RKF']
pg = PDBGroup('My Group')
for pdb in pdb_ids:
pg.add_emap(pyemap.fetch_and_parse(pdb))
Generating Protein Graphs
The next step is to generate the graphs for each PDB.
One can specify the chains, ET active moieties, residues,
and any other kwargs from process()
. If no arguments
are specified, the first chain from each PDB will be chosen, and all ET active moieties on those chains will be included. See process_emaps()
for more details.
pg.process_emaps()
Generate graph database
The next step is to classify the nodes and edges. One can define “substitutions” for nodes, and thresholds for edges.
See generate_graph_database()
for more details.
# W and Y would be interchangeable and given the label 'X'
substitutions = ['W','Y']
# edges with weights > 12 are distinguished from those with weights < 12
edge_thresholds = [12]
pg.generate_graph_database(sub=[],edge_thresh=edge_thresholds)
Mine for subgraphs
There are two types of searches available in PyeMap.
mine for all possible subgraphs:
pg.run_gspan(19)
search for a specific pattern:
pg.find_subgraph('WWW#')
Analysis: Subgraph patterns
The identified subgraphs are stored in the subgraph_patterns dictionary.
>>> pg.subgraph_patterns
{'1_WWW#_18': <pyemap.graph_mining.frequent_subgraph.SubgraphPattern at 0x10d652430>,
'2_WWW#_14': <pyemap.graph_mining.frequent_subgraph.SubgraphPattern at 0x183c3a1f0>,
'3_WWW#_4': <pyemap.graph_mining.frequent_subgraph.SubgraphPattern at 0x183c3aca0>,
'4_WWW#_2': <pyemap.graph_mining.frequent_subgraph.SubgraphPattern at 0x183c2de50>,
'5_WWW#_2': <pyemap.graph_mining.frequent_subgraph.SubgraphPattern at 0x183c24a30>,
'6_WWW#_2': <pyemap.graph_mining.frequent_subgraph.SubgraphPattern at 0x183c24730>,
'7_WWW#_1': <pyemap.graph_mining.frequent_subgraph.SubgraphPattern at 0x12c680f10>}
The subgraph pattern can be visualized using pyemap.graph_mining.SubgraphPattern.subgraph_to_Image()
or
pyemap.graph_mining.SubgraphPattern.subgraph_to_file()
.
sg = pg.subgraph_patterns['1_WWW#_18']
sg.subgraph_to_Image()
Analysis: Protein subgraphs
To identify the specific residues in each PDB involved in the identified patterns,
one should first call pyemap.graph_mining.SubgraphPattern.find_protein_subgraphs()
. The
identified protein subgraphs are stored in the protein_subgraphs dictionary. Each protein subgraph
has a unique ID.
sg.find_protein_subgraphs()
sg.protein_subgraphs
{'4U63_1': <networkx.classes.graph.Graph at 0x183d58fa0>,
'4U63_2': <networkx.classes.graph.Graph at 0x183b66be0>,
'4U63_3': <networkx.classes.graph.Graph at 0x183cf7040>,
'4U63_4': <networkx.classes.graph.Graph at 0x183d53eb0>,
'4U63_5': <networkx.classes.graph.Graph at 0x183d51220> ...
To visualize a protein subgraph, use pyemap.graph_mining.SubgraphPattern.subgraph_to_Image()
or
pyemap.graph_mining.SubgraphPattern.subgraph_to_file()
and pass the ID as a keyword argument.
sg.subgraph_to_Image('1U3D_51')
Clustering
Protein subgraphs are clustered into groups based on sequence or structural similarity.
By default, structural clustering is used. To switch to sequence clustering, call pyemap.graph_mining.SubgraphPattern.set_clustering()
.
print(sg.groups)
{1: ['4U63_33', '1DNP_34', '2J4D_40', '1IQR_43', '1IQU_47'], ...
44: ['3ZXS_60'] ... }
Visualize in NGLView
Protein subgraphs can be visualized in the crystal structure using the NGLView Jupyter Widget.
Pass the pathway ID of interest along with a nglview.widget.NGLWidget
object to
the visualize_subgraph_in_nglview()
function.
import nglview as nv
view = nv.show_file(sg.support['1U3D'].file_path)
view.clear_representations()
view.add_cartoon(color="lightgray")
sg.visualize_subgraph_in_nglview('1U3D_50',view)
view