Classification
Introduction
The efficiency and descriptive power of graph mining is enhanced when the algorithms are able to distinguish between different types of nodes and edges. Graph mining in PyeMap relies on each node and edge in the graph database being assigned a numerical label which corresponds to its category. PyeMap offers some customization of these labels in order to broaden or narrow the search space.
Nodes
By default, each standard amino acid residue receives its own category, and all non-standard residues
included in the analysis are labeled as ‘NP’ for non-protein (processed internally as ‘#’). One can specify a group
of standard amino-acid residue types to be given the label ‘X’ (which is standard notation for unknown residue type), which enables
these residues to be substituted for another in subgraph patterns. This is done by passing a list of 1-letter amino acid characters as
the sub keyword argument to generate_graph_database().
Example
Set isoleucine and leucine to be given the label ‘X’ in subgraph patterns.
pg.generate_graph_database(sub=['I','L'])
Edges
By default, all edges are assigned the same numerical label of 1. One can classify edges based on their weights by passing the edge_thresholds
argument to generate_graph_database(). edge_thresholds should be formatted
as a list of floats in ascending order, where each value indicates a cutoff threshold for an edge category.
Example
pg.generate_graph_database(edge_thresholds=[8,12])