Usage
Graph mining with PyeMap proceeds in four steps:
1. Generate protein graphs
2. Classify nodes and edges
3. Find subgraph patterns
4. Find and cluster protein subgraphs
Step 1: Generate protein graphs
The first step is to fetch and/or parse a list of PDBs using fetch_and_parse() or parse(), and add the resulting eMap objects
to the PDBGroup object. Once all of the eMap objects have been collected,
call process_emaps(), which uses
the same infrastructure as Single Protein Analysis to generate the protein graphs.
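import pyemap.graph_mining
pg = pyemap.graph_mining.PDBGroup("My Group")
#pdb_ids = ['1u3d','1iqr'...]
for pdb in pdb_ids:
    pg.add_emap(pyemap.fetch_and_parse(pdb))
# generate the protein graphs for all collected eMap objects
pg.process_emaps()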
Step 2: Classify nodes and edges
The next step is to classify the nodes and edges using generate_graph_database(). This
function can often be called with no arguments, but it can also be useful to allow for node substitutions or to specify
edge thresholds. See the classification section for more details.
pg.generate_graph_database()
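To make the idea of an edge threshold concrete, the sketch below bins numeric edge weights into discrete labels using a sorted list of cutoffs. It is an illustration only and not PyeMap's implementation: the helper name classify_edge, the example cutoffs, and the label numbering are all assumptions. The classification section describes the rules PyeMap actually applies.

```python
from bisect import bisect_right


def classify_edge(weight, thresholds):
    """Return the index of the bin that `weight` falls into.

    `thresholds` must be sorted in ascending order. A weight equal to a
    cutoff is placed in the higher bin. Hypothetical helper, not part of PyeMap.
    """
    return bisect_right(thresholds, weight)


# Hypothetical cutoffs: below 10 -> 0, from 10 up to (not including) 15 -> 1,
# 15 and above -> 2.
thresholds = [10.0, 15.0]
labels = [classify_edge(w, thresholds) for w in (8.2, 12.5, 19.0)]
```

With two cutoffs there are three possible labels, so edges that differ only slightly in weight can still be treated as the same kind of edge during mining.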
Step 3: Find subgraph patterns
The next step is to find subgraph patterns that are shared among the protein graphs. One can either mine for all patterns that meet a given support threshold:
pg.run_gspan(10)
Or search for a particular pattern or set of patterns:
pg.find_subgraph('WWW#')
See the mining section for more details.
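As a rough illustration of what a support threshold means in frequent subgraph mining, the sketch below counts how many graphs contain each pattern and keeps those that reach the threshold. It does not use PyeMap and makes no claim about how run_gspan() is implemented. The function name frequent_patterns, the pattern labels, and the graph IDs are made up for the example.

```python
def frequent_patterns(occurrences, min_support):
    """Keep patterns that occur in at least `min_support` distinct graphs.

    `occurrences` maps a pattern label to the set of graph IDs containing it.
    Returns a dict of pattern label -> support (number of graphs).
    Hypothetical helper, not part of PyeMap.
    """
    return {
        pattern: len(graph_ids)
        for pattern, graph_ids in occurrences.items()
        if len(graph_ids) >= min_support
    }


# Made-up example data: three patterns spread over four graphs.
occurrences = {
    "pattern_a": {"g1", "g2", "g3", "g4"},
    "pattern_b": {"g1", "g3"},
    "pattern_c": {"g2", "g3", "g4"},
}
kept = frequent_patterns(occurrences, min_support=3)
```

Raising the threshold returns fewer, more widely shared patterns, while lowering it returns more patterns at a higher computational cost.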
Step 4: Find and cluster protein subgraphs
The results of the mining calculation are stored in the subgraph_patterns dictionary as
SubgraphPattern objects. To find protein subgraphs, call
the find_protein_subgraphs() function
on the subgraph pattern of interest. The identified protein subgraphs are stored in
the protein_subgraphs attribute of the SubgraphPattern object,
and their clustering is described by the groups attribute. One can switch between different types of
clustering using the set_clustering() function.
sg = pg.subgraph_patterns['1_WWW#_18']
sg.find_protein_subgraphs()
sg.set_clustering("sequence")
# print results, including clustering
print(sg.full_report())
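The first three steps can be bundled into a small helper, sketched below. The function run_graph_mining is hypothetical and not part of PyeMap. It only calls the methods named on this page (add_emap(), process_emaps(), generate_graph_database(), run_gspan()) and reads the subgraph_patterns attribute, and it assumes the group object passed in is a PDBGroup or behaves like one.

```python
def run_graph_mining(group, emaps, support):
    """Run steps 1-3 on `group` and return its subgraph patterns.

    `group` is assumed to be a PDBGroup (or an object with the same methods),
    `emaps` an iterable of already parsed eMap objects, and `support` the
    support threshold passed to run_gspan(). Hypothetical helper.
    """
    # Step 1: collect the eMap objects and generate the protein graphs.
    for emap in emaps:
        group.add_emap(emap)
    group.process_emaps()
    # Step 2: classify nodes and edges with default settings.
    group.generate_graph_database()
    # Step 3: mine for subgraph patterns at the given support threshold.
    group.run_gspan(support)
    return group.subgraph_patterns
```

Assuming pdb_ids is defined as in Step 1, it could be called as run_graph_mining(pyemap.graph_mining.PDBGroup("My Group"), [pyemap.fetch_and_parse(p) for p in pdb_ids], 10). Step 4 is left to the caller because it depends on which pattern is of interest.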