Frequent Subgraph

The pyemap.graph_mining.SubgraphPattern object stores all data related to a subgraph pattern identified by the mining algorithm. The class is also responsible for finding the protein subgraphs in each PDB that match the pattern and for clustering them by similarity.

class pyemap.graph_mining.SubgraphPattern(G, graph_number, support, res_to_num_label, edge_thresholds, rmsd_thresh)

Stores all information regarding an identified subgraph pattern.

id

Unique identifier for subgraph pattern.

Type:

str

G

Graph representation of this subgraph pattern.

Type:

networkx.Graph

support

List of PDB IDs which contain this subgraph

Type:

list of str

protein_subgraphs

Dict of protein subgraphs that match this pattern. Each key is a unique protein subgraph ID, and each value is a networkx.Graph, derived from the graphs generated by the emap class, that matches this pattern.

Type:

dict of str: networkx.Graph

groups

Protein subgraph IDs clustered into groups based on similarity (structural or sequence, depending on the selected clustering option)

Type:

dict of str: list of str

support_number

Number of PDBs this subgraph pattern was identified in

Type:

int
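To make the attribute types above concrete, here is a minimal sketch using plain Python containers. Every ID is a hypothetical placeholder, strings stand in for the networkx.Graph values of protein_subgraphs, and the helper summarize_groups() is not part of pyemap. It assumes, as the attribute descriptions suggest, that every ID listed in groups is a key of protein_subgraphs.

```python
from typing import Dict, List, Mapping

# Hypothetical placeholder data mirroring the documented attribute types.
# In pyemap the values of protein_subgraphs are networkx.Graph objects;
# strings stand in for them here.
protein_subgraphs: Dict[str, object] = {
    "sub_1": "<graph 1>",
    "sub_2": "<graph 2>",
    "sub_3": "<graph 3>",
}
groups: Dict[str, List[str]] = {
    "group_1": ["sub_1", "sub_3"],
    "group_2": ["sub_2"],
}


def summarize_groups(groups: Mapping[str, List[str]],
                     protein_subgraphs: Mapping[str, object]) -> Dict[str, int]:
    """Return the size of each group (hypothetical helper, not part of pyemap).

    Raises KeyError if a group lists an ID that is not a protein subgraph key.
    """
    sizes: Dict[str, int] = {}
    for group_id, members in groups.items():
        for member in members:
            if member not in protein_subgraphs:
                raise KeyError(f"{member!r} is not a protein subgraph ID")
        sizes[group_id] = len(members)
    return sizes


print(summarize_groups(groups, protein_subgraphs))  # {'group_1': 2, 'group_2': 1}
```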

__init__(G, graph_number, support, res_to_num_label, edge_thresholds, rmsd_thresh)

Initializes SubgraphPattern object.

Parameters:
  • G (networkx.Graph) – Graph representation of this subgraph pattern

  • graph_number (int) – Unique numerical ID for this subgraph pattern

  • support (list of str) – List of PDB IDs which contain this subgraph

  • res_to_num_label (dict of str: int) – Mapping of residue types to numerical node labels

  • edge_thresholds (list of float) – Edge thresholds which define edge labels

  • rmsd_thresh (float) – RMSD threshold for determining structural similarity between identified subgraphs
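The parameters res_to_num_label and edge_thresholds turn residues and inter-residue distances into the discrete labels a graph-mining algorithm works with. The sketch below shows one plausible encoding and is an assumption, not pyemap's code: node labels come straight from the mapping, and an edge's label is the number of thresholds strictly below its distance. The helpers node_label() and edge_label() and the example values are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List


def node_label(residue_type: str, res_to_num_label: Dict[str, int]) -> int:
    """Look up the numerical node label for a residue type."""
    return res_to_num_label[residue_type]


def edge_label(distance: float, edge_thresholds: List[float]) -> int:
    """Hypothetical edge labelling: count the thresholds strictly below `distance`.

    bisect_left on the sorted thresholds returns exactly that count, so a
    distance equal to a threshold falls into the lower bin.
    """
    return bisect_left(sorted(edge_thresholds), distance)


# Hypothetical example values; pyemap's actual mapping and thresholds may differ.
res_to_num_label = {"W": 0, "Y": 1, "F": 2}
edge_thresholds = [12.0, 15.0]

print(node_label("Y", res_to_num_label))  # 1
print([edge_label(d, edge_thresholds) for d in (9.5, 12.0, 13.2, 20.0)])  # [0, 0, 1, 2]
```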

find_protein_subgraphs(clustering_option='structural', rmsd_thresh=0.5, *args, **kwargs)

Finds protein subgraphs which match this pattern.

This method must be called before protein subgraphs can be analyzed.

Parameters:
  • clustering_option (str, optional) – Either ‘structural’ or ‘sequence’

  • rmsd_thresh (float, optional) – RMSD threshold for determining structural similarity between identified subgraphs. Defaults to 0.5.

Notes

Graphs are clustered by both sequence and structural similarity, and the results are stored in self._structural_groups and self._sequence_groups. The clustering_option argument determines which of these groupings is used for self.groups. This can be changed at any time by calling set_clustering() with the other clustering option.
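Structural clustering with rmsd_thresh can be pictured as grouping subgraphs whose coordinates lie within the RMSD threshold of one another. The following is a minimal sketch under stated assumptions: coordinates are already superimposed and listed in corresponding order, and a simple greedy leader clustering (each subgraph joins the first group whose first member is within rmsd_thresh) stands in for whatever procedure pyemap actually uses. The functions rmsd() and cluster_by_rmsd() are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered coordinate lists.

    Assumes the structures are already superimposed; no alignment is performed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                  for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(squared / len(a))


def cluster_by_rmsd(coords: Dict[str, List[Coord]],
                    rmsd_thresh: float = 0.5) -> Dict[str, List[str]]:
    """Greedy leader clustering (hypothetical, not pyemap's algorithm).

    Each subgraph joins the first group whose first member (the "leader") is
    within rmsd_thresh of it; otherwise it starts a new group.
    """
    groups: Dict[str, List[str]] = {}
    for sub_id, xyz in coords.items():
        for members in groups.values():
            if rmsd(coords[members[0]], xyz) <= rmsd_thresh:
                members.append(sub_id)
                break
        else:  # no existing group was close enough
            groups[f"group_{len(groups) + 1}"] = [sub_id]
    return groups


# Hypothetical two-atom subgraphs; sub_2 is sub_1 shifted by 0.1 along z.
coords = {
    "sub_1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "sub_2": [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    "sub_3": [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
}
print(cluster_by_rmsd(coords, rmsd_thresh=0.5))
# {'group_1': ['sub_1', 'sub_2'], 'group_2': ['sub_3']}
```

A real structural comparison would normally superimpose the two coordinate sets (for example with the Kabsch algorithm) before computing the RMSD.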

full_report()

Returns a full report of all protein subgraphs which match this pattern.

Returns:

full_str – Report of protein subgraphs which match this pattern

Return type:

str

general_report()

Generates a general report describing this subgraph pattern.

Returns:

full_str – General report which describes this subgraph pattern

Return type:

str

set_clustering(clustering_option)

Sets clustering option.

Parameters:

clustering_option (str) – Either ‘structural’ or ‘sequence’.

Notes

Since pyemap.graph_mining.SubgraphPattern.find_protein_subgraphs() always computes both types of clustering, all this function actually does is swap some private variables. Its purpose is to determine which clustering is used for self.groups and shown in the report.
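The behavior described in these notes (both groupings computed up front, with set_clustering() only changing which one is exposed) can be sketched with a small stand-in class. ClusteringView is hypothetical: it rebinds self.groups rather than reproducing pyemap's exact private-variable swap, and raising ValueError for an unknown option is an assumption.

```python
from typing import Dict, List

Groups = Dict[str, List[str]]


class ClusteringView:
    """Hypothetical stand-in for the documented set_clustering() behavior.

    Both groupings are stored up front; set_clustering() only changes which
    one the public `groups` attribute refers to, so nothing is recomputed.
    """

    def __init__(self, structural_groups: Groups, sequence_groups: Groups,
                 clustering_option: str = "structural") -> None:
        self._structural_groups = structural_groups
        self._sequence_groups = sequence_groups
        self.groups: Groups = {}
        self.set_clustering(clustering_option)

    def set_clustering(self, clustering_option: str) -> None:
        if clustering_option == "structural":
            self.groups = self._structural_groups
        elif clustering_option == "sequence":
            self.groups = self._sequence_groups
        else:
            # Assumption: how pyemap handles invalid options is not documented here.
            raise ValueError("clustering_option must be 'structural' or 'sequence'")


view = ClusteringView(
    structural_groups={"group_1": ["sub_1", "sub_2"]},
    sequence_groups={"group_1": ["sub_1"], "group_2": ["sub_2"]},
)
print(len(view.groups))  # 1 (structural grouping)
view.set_clustering("sequence")
print(len(view.groups))  # 2 (sequence grouping)
```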

subgraph_to_Image(id=None)

Returns a PIL image of the subgraph pattern or of a protein subgraph.

Parameters:

id (str, optional) – Protein subgraph ID. If not specified, the generic subgraph pattern is drawn.

Returns:

img – Image of the subgraph pattern or protein subgraph

Return type:

PIL.Image.Image

subgraph_to_file(id=None, dest='')

Saves an image of the subgraph pattern or a protein subgraph to a file.

Parameters:
  • id (str, optional) – Protein subgraph ID. If not specified, the generic subgraph pattern is drawn.

  • dest (str, optional) – Destination to save the graph

visualize_subgraph_in_nglview(id, view)

Visualizes the subgraph in an nglview widget.

Parameters: