Frequent Subgraph
The pyemap.graph_mining.SubgraphPattern object stores all of the data related to a subgraph
pattern identified by the mining algorithm. This class is responsible for finding protein subgraphs in each PDB,
and clustering them based on similarity.
- class pyemap.graph_mining.SubgraphPattern(G, graph_number, support, res_to_num_label, edge_thresholds, rmsd_thresh)
Stores all information regarding an identified subgraph pattern.
- id
Unique identifier for subgraph pattern.
- Type:
str
- G
Graph representation of this subgraph pattern.
- Type:
networkx.Graph
- support
List of PDB IDs which contain this subgraph
- Type:
list of str
- protein_subgraphs
Dict of protein subgraphs which match this pattern. Each entry maps a unique identifier to a
networkx.Graph derived from the graphs generated by the emap class.
- Type:
dict of str: networkx.Graph
- groups
Protein subgraph (IDs) clustered into groups based on similarity
- Type:
dict of str: list of str
- support_number
Number of PDBs this subgraph pattern was identified in
- Type:
int
- __init__(G, graph_number, support, res_to_num_label, edge_thresholds, rmsd_thresh)
Initializes SubgraphPattern object.
- Parameters:
G (networkx.Graph) – Graph representation of this subgraph pattern
graph_number (int) – Unique numerical ID for this subgraph pattern
support (list of str) – List of PDB IDs which contain this subgraph
res_to_num_label (dict of str: int) – Mapping of residue types to numerical node labels
edge_thresholds (list of float) – Edge thresholds which define edge labels
rmsd_thresh (float) – threshold for determining structural similarity between identified subgraphs
- find_protein_subgraphs(clustering_option='structural', rmsd_thresh=0.5, *args, **kwargs)
Finds protein subgraphs which match this pattern.
This function must be executed to analyze protein subgraphs.
- Parameters:
clustering_option (str, optional) – Either ‘structural’ or ‘sequence’
rmsd_thresh (float, optional) – Threshold for determining structural similarity between identified subgraphs
Notes
Graphs are clustered by both sequence and structural similarity, and the results are stored in self._structural_groups and self._sequence_groups. The clustering_option argument determines which of these groupings is used for self.groups. This can be changed at any time by calling set_clustering() and specifying the other clustering option.
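The call order described above can be sketched as follows. This is a minimal sketch, assuming `pattern` is a SubgraphPattern produced by an earlier mining run (not shown here). The helper name `group_sizes` is hypothetical; only find_protein_subgraphs() and the groups attribute come from this page.

```python
def group_sizes(pattern, clustering_option="structural", rmsd_thresh=0.5):
    """Return {group ID: number of protein subgraphs in that group}.

    `pattern` is assumed to be a pyemap.graph_mining.SubgraphPattern;
    this helper itself is illustrative and not part of PyeMap.
    """
    # The search must run first: it finds the matching protein subgraphs
    # and fills in the groups for the chosen clustering option.
    pattern.find_protein_subgraphs(
        clustering_option=clustering_option,
        rmsd_thresh=rmsd_thresh,
    )
    # groups is documented as a dict of str: list of str
    # (group ID -> protein subgraph IDs).
    return {group_id: len(ids) for group_id, ids in pattern.groups.items()}
```

Calling `group_sizes(pattern, "sequence")` would report the sizes of the sequence-based groups instead of the structural ones.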
- full_report()
Returns a full report of all protein subgraphs which match this pattern.
- Returns:
full_str – Report of protein subgraphs which match this pattern
- Return type:
str
- general_report()
Generates general report which describes this subgraph pattern.
- Returns:
full_str – General report which describes this subgraph pattern
- Return type:
str
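Since both report methods return plain strings, they can be combined and saved with the standard library. The sketch below assumes find_protein_subgraphs() has already been called on `pattern`; the helper name `write_reports` and the blank-line separator are illustrative choices, not part of PyeMap.

```python
from pathlib import Path


def write_reports(pattern, path):
    """Write the general and full reports of `pattern` to one text file.

    `pattern` is assumed to be a pyemap.graph_mining.SubgraphPattern on
    which find_protein_subgraphs() has already been run.
    """
    # Both methods are documented to return str.
    text = "\n\n".join([pattern.general_report(), pattern.full_report()])
    Path(path).write_text(text, encoding="utf-8")
    return text
```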
- set_clustering(clustering_option)
Sets clustering option.
- Parameters:
clustering_option (str) – Either ‘structural’ or ‘sequence’.
Notes
Since both types of clustering are always computed by pyemap.graph_mining.SubgraphPattern.find_protein_subgraphs(), all this function does is swap some private variables. Its purpose is to determine which kind of clustering is shown in the report.
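The "compute both, expose one" behaviour described in the notes can be illustrated with a toy class. This is not PyeMap's implementation: the class name `ClusteringSelector`, the constructor, and the ValueError on an unknown option are assumptions made for this sketch. Only the attribute names _structural_groups, _sequence_groups, and groups are taken from this page.

```python
class ClusteringSelector:
    """Toy model: both groupings are kept, one is exposed as `groups`."""

    def __init__(self, structural_groups, sequence_groups,
                 clustering_option="structural"):
        # Both groupings are stored up front, so switching is cheap.
        self._structural_groups = structural_groups
        self._sequence_groups = sequence_groups
        self.set_clustering(clustering_option)

    def set_clustering(self, clustering_option):
        # No clustering is recomputed here; `groups` is simply re-pointed.
        if clustering_option == "structural":
            self.groups = self._structural_groups
        elif clustering_option == "sequence":
            self.groups = self._sequence_groups
        else:
            # Error handling is a choice of this sketch, not documented behaviour.
            raise ValueError("clustering_option must be 'structural' or 'sequence'")
```

Because switching only re-points `groups`, it can be done as often as needed before generating a report.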
- subgraph_to_Image(id=None)
Returns PIL image of subgraph pattern or protein subgraph
- Parameters:
id (str, optional) – Protein subgraph ID. If not specified, generic subgraph pattern will be drawn
- Returns:
img
- Return type:
PIL.Image.Image
- subgraph_to_file(id=None, dest='')
Saves image of subgraph pattern or protein subgraph to file
- Parameters:
id (str, optional) – Protein subgraph ID. If not specified, generic subgraph pattern will be drawn
dest (str, optional) – Destination to save the graph
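A batch export of the generic pattern and every matching protein subgraph might look like the sketch below. It assumes that `dest` is a full file path including the extension, that ".png" is an accepted image format, and that protein subgraph IDs are safe to use as file names; none of these is confirmed by this page. The helper name `export_subgraph_images` is hypothetical.

```python
from pathlib import Path


def export_subgraph_images(pattern, out_dir, ext=".png"):
    """Save the generic pattern and each protein subgraph as image files.

    `pattern` is assumed to be a pyemap.graph_mining.SubgraphPattern on
    which find_protein_subgraphs() has already been run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []

    # With no id given, the generic subgraph pattern is drawn.
    generic = out / f"pattern{ext}"
    pattern.subgraph_to_file(dest=str(generic))
    paths.append(generic)

    # protein_subgraphs is documented as a dict keyed by subgraph ID.
    for subgraph_id in pattern.protein_subgraphs:
        target = out / f"{subgraph_id}{ext}"
        pattern.subgraph_to_file(id=subgraph_id, dest=str(target))
        paths.append(target)
    return paths
```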
- visualize_subgraph_in_nglview(id, view)
Visualize protein subgraph in nglview widget
- Parameters:
id (str) – Subgraph id to be visualized
view (nglview.widget.NGLWidget) – NGL Viewer widget