Frequent Subgraph

The pyemap.graph_mining.SubgraphPattern object stores all of the data related to a subgraph pattern identified by the mining algorithm. This class is also responsible for finding the matching protein subgraphs in each PDB and clustering them based on similarity.

class pyemap.graph_mining.SubgraphPattern(G, graph_number, support, res_to_num_label, edge_thresholds)[source]

Stores all information regarding an identified subgraph pattern.

id

Unique identifier for subgraph pattern.

Type

str

G

Graph representation of this subgraph pattern.

Type

networkx.Graph

support

List of PDB IDs which contain this subgraph

Type

list of str

protein_subgraphs

Dict of protein subgraphs which match this pattern. Each entry maps a unique identifier to a networkx.Graph which is derived from the graphs generated by the emap class and matches this pattern.

Type

dict of str: networkx.Graph

groups

Protein subgraph (IDs) clustered into groups based on similarity

Type

dict of str: list of str

support_number

Number of PDBs this subgraph pattern was identified in

Type

int

__init__(G, graph_number, support, res_to_num_label, edge_thresholds)[source]

Initializes SubgraphPattern object.

Parameters
  • G (networkx.Graph) – Graph representation of this subgraph pattern

  • graph_number (int) – Unique numerical ID for this subgraph pattern

  • support (list of str) – List of PDB IDs which contain this subgraph

  • res_to_num_label (dict of str: int) – Mapping of residue types to numerical node labels

  • edge_thresholds (list of float) – Edge thresholds which define edge labels
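To make the last two parameters more concrete, the sketch below shows one way inputs of these shapes could encode a residue graph: node labels looked up by residue type, and edge labels assigned by binning a distance against sorted thresholds. The dictionary contents, the threshold values, and the helper `edge_label` are illustrative assumptions, not PyeMap's actual encoding.

```python
from bisect import bisect_right

# Hypothetical inputs in the documented shapes (values are illustrative only).
res_to_num_label = {"W": 2, "Y": 3}   # dict of str: int
edge_thresholds = [8.0, 12.0]         # list of float, assumed ascending


def edge_label(distance, thresholds):
    """Return a 1-based bin index for `distance` (assumed scheme, not PyeMap's)."""
    return bisect_right(thresholds, distance) + 1


# Encode three residues and three edge distances with the assumed scheme.
node_labels = [res_to_num_label[res] for res in ("W", "Y", "W")]
edge_labels = [edge_label(d, edge_thresholds) for d in (5.5, 9.1, 14.0)]
print(node_labels, edge_labels)
```

Under this assumed scheme, two thresholds produce three possible edge labels, so adding a threshold makes the edge labelling finer.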

find_protein_subgraphs(clustering_option='structural')[source]

Finds protein subgraphs which match this pattern.

This function must be executed to analyze protein subgraphs.

Parameters

clustering_option (str, optional) – Either ‘structural’ or ‘sequence’

Notes

Graphs are clustered by both sequence and structural similarity, and the results are stored in self._structural_groups and self._sequence_groups. The clustering_option argument used here determines which one of these groupings is used for self.groups. This can be changed at any time by calling set_clustering() and specifying the other clustering option.

full_report()[source]

Returns a full report of all protein subgraphs which match this pattern.

Returns

full_str – Report of protein subgraphs which match this pattern

Return type

str

general_report()[source]

Generates general report which describes this subgraph pattern.

Returns

full_str – General report which describes this subgraph pattern

Return type

str
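Since both report methods return plain strings, saving them is a matter of writing text files. The helper below is a minimal sketch: `write_reports` and the output file names are hypothetical, and it assumes find_protein_subgraphs() has already been called on the pattern.

```python
from pathlib import Path


def write_reports(pattern, out_dir):
    """Write both reports of a SubgraphPattern-like object to text files.

    `pattern` only needs general_report() and full_report(), both of which
    are documented to return str. File names are arbitrary choices.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    general_path = out_dir / "general_report.txt"
    full_path = out_dir / "full_report.txt"
    general_path.write_text(pattern.general_report())
    full_path.write_text(pattern.full_report())
    return general_path, full_path
```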

set_clustering(clustering_option)[source]

Sets clustering option.

Parameters

clustering_option (str) – Either ‘structural’ or ‘sequence’.

Notes

Since both types of clustering are always computed by pyemap.graph_mining.SubgraphPattern.find_protein_subgraphs(), all this function actually does is swap some private variables. Its purpose is to determine which kind of clustering is shown in the report.
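The swap described in these notes can be pictured with a small stand-in class. This is only a sketch of the idea, not the SubgraphPattern implementation: `ClusteringToggle`, the sample groupings, and the ValueError raised for an unknown option are assumptions, while the attribute names `_structural_groups`, `_sequence_groups`, and `groups` follow the notes for find_protein_subgraphs().

```python
class ClusteringToggle:
    """Stand-in for illustration only; not the SubgraphPattern implementation."""

    def __init__(self, structural_groups, sequence_groups):
        # Both groupings are computed up front and kept privately.
        self._structural_groups = structural_groups
        self._sequence_groups = sequence_groups
        self.groups = structural_groups

    def set_clustering(self, clustering_option):
        # No re-clustering happens here: only the public view changes.
        if clustering_option == "structural":
            self.groups = self._structural_groups
        elif clustering_option == "sequence":
            self.groups = self._sequence_groups
        else:
            # Assumed behavior for this sketch; the real class may differ.
            raise ValueError("clustering_option must be 'structural' or 'sequence'")


# Made-up group IDs and protein subgraph IDs (dict of str: list of str).
toggle = ClusteringToggle(
    structural_groups={"1": ["a", "b"], "2": ["c"]},
    sequence_groups={"1": ["a"], "2": ["b", "c"]},
)
toggle.set_clustering("sequence")
print(toggle.groups)
```

Because both groupings already exist, switching between them is cheap and can be done repeatedly before generating a report.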

subgraph_to_Image(id=None)[source]

Returns PIL image of subgraph pattern or protein subgraph

Parameters

id (str, optional) – Protein subgraph ID. If not specified, generic subgraph pattern will be drawn

Returns

img

Return type

PIL.Image.Image
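As a usage sketch, the helper below gathers the image of the generic pattern together with one image per protein subgraph, using the keys of protein_subgraphs as IDs. The name `collect_images` and the use of `None` as the key for the generic pattern are assumptions made for illustration; it also assumes find_protein_subgraphs() has already populated protein_subgraphs.

```python
def collect_images(pattern):
    """Return {None: generic pattern image, subgraph_id: subgraph image, ...}.

    `pattern` is expected to behave like a SubgraphPattern: it exposes a
    protein_subgraphs dict and a subgraph_to_Image(id=None) method.
    """
    # Calling without an ID draws the generic subgraph pattern.
    images = {None: pattern.subgraph_to_Image()}
    for subgraph_id in pattern.protein_subgraphs:
        images[subgraph_id] = pattern.subgraph_to_Image(id=subgraph_id)
    return images
```

Each value is a PIL.Image.Image, so it can be saved with its `save()` method or displayed directly in a notebook.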

subgraph_to_file(id=None, dest='')[source]

Saves image of subgraph pattern or protein subgraph to file

Parameters
  • id (str, optional) – Protein subgraph ID. If not specified, generic subgraph pattern will be drawn

  • dest (str, optional) – Destination to save the graph

visualize_subgraph_in_nglview(id, view)[source]

Visualize protein subgraph in nglview widget

Parameters