Frequent Subgraph

The pyemap.graph_mining.SubgraphPattern object stores all of the data related to a subgraph pattern identified by the mining algorithm. This class is responsible for finding the protein subgraphs that match the pattern in each PDB structure and clustering them by similarity.

class pyemap.graph_mining.SubgraphPattern(G, graph_number, support, res_to_num_label, edge_thresholds, rmsd_thresh)

Stores all information regarding an identified subgraph pattern.

id

Unique identifier for subgraph pattern.

Type:: str

G

Graph representation of this subgraph pattern.

Type:: networkx.Graph

support

List of PDB IDs which contain this subgraph

Type:: list of str

protein_subgraphs

Dict of the protein subgraphs that match this pattern. Each key is a unique protein subgraph identifier, and each value is a networkx.Graph derived from the graphs generated by the emap class.

Type:: dict of str: networkx.Graph

groups

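IDs of the protein subgraphs, clustered into groups based on similarity. Whether structural or sequence similarity is used depends on the current clustering option (see set_clustering()).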

Type:: dict of str: list of str

support_number

Number of PDBs this subgraph pattern was identified in

Type:: int
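The attributes above can be read directly once a pattern has been analyzed. The helper below is a minimal sketch, not part of pyemap: `describe_pattern` is a hypothetical name, it relies only on the documented attribute names, and the stand-in object uses placeholder values rather than real mining output. Note that `groups` and `protein_subgraphs` are presumably only meaningful after find_protein_subgraphs() has been called.

```python
from types import SimpleNamespace


def describe_pattern(pattern):
    """Summarize a SubgraphPattern using only its documented attributes.

    Any object exposing `id`, `support`, `support_number`,
    `protein_subgraphs` and `groups` works (hypothetical helper).
    """
    return {
        "id": pattern.id,
        "support_number": pattern.support_number,
        "pdb_ids": sorted(pattern.support),
        "n_protein_subgraphs": len(pattern.protein_subgraphs),
        # Number of protein subgraph IDs in each cluster, keyed by group ID.
        "group_sizes": {gid: len(members) for gid, members in pattern.groups.items()},
    }


# Stand-in object with placeholder values (not real pyemap output).
demo = SimpleNamespace(
    id="pattern_1",
    support=["2XYZ", "1ABC"],
    support_number=2,
    protein_subgraphs={"1ABC-1": None, "2XYZ-1": None},
    groups={"1": ["1ABC-1", "2XYZ-1"]},
)
print(describe_pattern(demo))
```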

__init__(G, graph_number, support, res_to_num_label, edge_thresholds, rmsd_thresh)

Initializes SubgraphPattern object.

Parameters:

G (networkx.Graph) – Graph representation of this subgraph pattern
graph_number (int) – Unique numerical ID for this subgraph pattern
support (list of str) – List of PDB IDs which contain this subgraph
res_to_num_label (dict of str: int) – Mapping of residue types to numerical node labels
edge_thresholds (list of float) – Edge thresholds which define edge labels
rmsd_thresh (float) – Threshold for determining structural similarity between identified subgraphs
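The docs do not spell out how res_to_num_label and edge_thresholds become graph labels. One plausible reading, sketched below as an assumption, is that each node takes the integer label of its residue type and each edge is binned by distance against the sorted thresholds. The helper names and the threshold values are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left


def node_label(residue_type, res_to_num_label):
    """Map a residue type (e.g. 'TYR') to its numerical node label."""
    return res_to_num_label[residue_type]


def edge_label(distance, edge_thresholds):
    """Bin an edge distance into an integer label (assumed scheme).

    Label 0 covers distances <= thresholds[0]; label k covers
    thresholds[k-1] < distance <= thresholds[k]; anything beyond the
    last threshold gets len(thresholds).
    """
    thresholds = sorted(edge_thresholds)
    return bisect_left(thresholds, distance)


# Hypothetical values chosen for illustration only.
res_to_num_label = {"TRP": 0, "TYR": 1, "HIS": 2}
edge_thresholds = [8.0, 12.0, 15.0]
print(node_label("TYR", res_to_num_label))  # 1
print(edge_label(10.5, edge_thresholds))    # 1
```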

find_protein_subgraphs(clustering_option='structural', rmsd_thresh=0.5, *args, **kwargs)

Finds protein subgraphs which match this pattern.

This method must be called before any analysis of protein subgraphs.

Parameters:

clustering_option (str, optional) – Either ‘structural’ or ‘sequence’. Default is ‘structural’.
rmsd_thresh (float, optional) – Threshold for determining structural similarity between identified subgraphs. Default is 0.5.

Notes

Protein subgraphs are clustered by both sequence and structural similarity, and the results are stored in self._structural_groups and self._sequence_groups. The clustering_option argument determines which of these groupings is used for self.groups. This can be changed at any time by calling set_clustering() with the other clustering option.
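The Notes above do not describe the structural clustering algorithm itself. As an illustration of how an RMSD threshold such as rmsd_thresh can turn pairwise comparisons into groups, here is a greedy single-pass clustering sketch. It is not pyemap's implementation: it assumes each subgraph is reduced to a list of already-paired, already-superimposed (x, y, z) coordinates, skips the optimal superposition step (e.g. Kabsch alignment) that a real structural comparison would normally perform, and uses hypothetical helper names and group IDs.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes point i of one list corresponds to point i of the other and
    that both sets are already superimposed (no alignment is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def cluster_by_rmsd(subgraph_coords, rmsd_thresh=0.5):
    """Greedily cluster subgraph IDs against each group's first member.

    subgraph_coords: dict of protein subgraph ID -> coordinate list.
    Returns a dict of group ID (str) -> list of subgraph IDs, the same
    shape as the documented `groups` attribute.
    """
    groups = {}
    for sg_id, coords in subgraph_coords.items():
        for members in groups.values():
            # Join the first group whose representative is close enough.
            if rmsd(coords, subgraph_coords[members[0]]) <= rmsd_thresh:
                members.append(sg_id)
                break
        else:
            # No group matched: start a new one (hypothetical numbering).
            groups[str(len(groups) + 1)] = [sg_id]
    return groups


coords = {
    "A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "B": [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],  # RMSD 0.1 from A
    "C": [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],  # RMSD 5.0 from A
}
print(cluster_by_rmsd(coords, rmsd_thresh=0.5))
# {'1': ['A', 'B'], '2': ['C']}
```

A greedy scheme like this depends on input order; the docs do not say how pyemap handles that.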

full_report()

Returns a full report of all protein subgraphs which match this pattern.

Returns:: full_str – Report of protein subgraphs which match this pattern
Return type:: str

general_report()

Generates general report which describes this subgraph pattern.

Returns:: full_str – General report which describes this subgraph pattern
Return type:: str
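Because find_protein_subgraphs() must be called before protein subgraphs can be analyzed, a typical reporting workflow calls it first and then requests both reports. The sketch below assumes `pattern` is a SubgraphPattern produced by the mining step; `build_report` is a hypothetical helper, and these docs do not say whether calling find_protein_subgraphs() more than once is cheap or safe.

```python
def build_report(pattern, clustering_option="structural", rmsd_thresh=0.5):
    """Analyze a SubgraphPattern and return its general and full reports.

    Hypothetical helper; uses only the documented SubgraphPattern methods.
    """
    # Protein subgraphs must be found before they can be reported on.
    pattern.find_protein_subgraphs(
        clustering_option=clustering_option, rmsd_thresh=rmsd_thresh
    )
    # General description of the pattern first, then every matching subgraph.
    return pattern.general_report() + "\n" + pattern.full_report()
```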

set_clustering(clustering_option)

Sets the clustering option, which determines the grouping used for self.groups.

Parameters:: clustering_option (str) – Either ‘structural’ or ‘sequence’.

Notes

Since both types of clustering are always computed by pyemap.graph_mining.SubgraphPattern.find_protein_subgraphs(), this function only swaps some private variables. Its purpose is to determine which kind of clustering is shown in the report.
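Since both groupings are precomputed, switching between them with set_clustering() is a cheap way to compare them. The hypothetical helper below collects both groupings from an analyzed pattern and then restores a caller-chosen option, because the currently active option is stored privately and is not part of the documented interface.

```python
def both_groupings(pattern, restore="structural"):
    """Return {'structural': groups, 'sequence': groups} for an analyzed pattern.

    Hypothetical helper: calls set_clustering() for each option, copies
    `groups`, then switches to `restore` before returning.
    """
    result = {}
    for option in ("structural", "sequence"):
        pattern.set_clustering(option)
        # Copy the lists so later swaps cannot alias the returned data.
        result[option] = {gid: list(ids) for gid, ids in pattern.groups.items()}
    pattern.set_clustering(restore)
    return result
```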

subgraph_to_Image(id=None)

Returns a PIL image of the subgraph pattern or of a protein subgraph.

Parameters:: id (str, optional) – Protein subgraph ID. If not specified, generic subgraph pattern will be drawn
Returns:: img
Return type:: PIL.Image.Image

subgraph_to_file(id=None, dest='')

Saves an image of the subgraph pattern or of a protein subgraph to a file.

Parameters:

id (str, optional) – Protein subgraph ID. If not specified, generic subgraph pattern will be drawn
dest (str, optional) – Destination to save the graph
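To export images of the pattern and of all its matches, subgraph_to_file() can be called once with id=None and once per protein subgraph ID. This sketch assumes that `dest` is a full file path whose extension selects the image format and that `out_dir` already exists; `export_all_images` and the file naming scheme are hypothetical.

```python
import os


def export_all_images(pattern, out_dir, ext="png"):
    """Save the generic pattern and every protein subgraph via subgraph_to_file().

    Hypothetical helper; assumes `out_dir` exists and that `dest` is a
    complete file path. Returns the list of paths passed as `dest`.
    """
    paths = []
    # id=None draws the generic subgraph pattern; then each matching subgraph.
    for sg_id in [None] + list(pattern.protein_subgraphs):
        name = f"{pattern.id}_pattern" if sg_id is None else sg_id
        dest = os.path.join(out_dir, f"{name}.{ext}")
        pattern.subgraph_to_file(id=sg_id, dest=dest)
        paths.append(dest)
    return paths
```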

visualize_subgraph_in_nglview(id, view)

Visualizes a protein subgraph in an nglview widget.

Parameters:

id (str) – ID of the protein subgraph to be visualized
view (nglview.widget.NGLWidget) – NGL Viewer widget
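A common use of this widget is to inspect one representative match. The sketch below picks the first member of the largest group and passes its ID to visualize_subgraph_in_nglview(). How `view` should be prepared (for example, which structure it must already display) is not documented here, so the helper takes an existing NGLWidget as given; `show_largest_group_member` is a hypothetical name, and "first member" is an arbitrary choice of representative.

```python
def show_largest_group_member(pattern, view):
    """Visualize the first subgraph of the largest group in `view`.

    Hypothetical helper; requires find_protein_subgraphs() to have been
    called so that `groups` is populated. Returns the chosen subgraph ID.
    """
    if not pattern.groups:
        raise ValueError("no groups found; call find_protein_subgraphs() first")
    # max() returns the first of several equally large groups.
    largest = max(pattern.groups.values(), key=len)
    sg_id = largest[0]
    pattern.visualize_subgraph_in_nglview(sg_id, view)
    return sg_id
```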