Graph
-
class comptox_ai.graph.Graph(data: comptox_ai.graph.io.GraphDataMixin)
A graph representation of ComptoxAI data.
The internal data storage can be in several different formats, each of which has advantages in different scenarios.
Read more in the User Guide.
- Parameters
- data : comptox_ai.graph.io.GraphDataMixin
A graph data structure in one of the formats compliant with ComptoxAI’s standardized graph API.
- Attributes
- format : {“graphsage”, “networkx”, “neo4j”}
Internal format of the graph data. The format determines many aspects of how you interact with the graph, including the set of methods that can be called on it and the types of models that you can construct without first converting to another format.
Methods
- add_edges(self, edges): Add one or more edges to the graph.
- add_nodes(self, nodes): Add one or more nodes to the graph.
- classes(self): Get a list of ontology classes present in the graph.
- convert(self, to_fmt): Convert the graph data structure into the specified format.
- edges(self): Get all edges in the graph and return as an iterable of tuples.
- from_dgl(): Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).
- from_graphsage(prefix, directory): Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.
- from_neo4j(config_file, verbose): Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.
- from_networkx(): Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.
- nodes(self): Get all nodes in the graph and return as an iterable of tuples.
- is_heterogeneous
- node_id_map
-
add_edges(self, edges: Union[List[tuple], tuple])
Add one or more edges to the graph.
-
add_nodes(self, nodes: Union[List[tuple], tuple])
Add one or more nodes to the graph.
-
classes(self)
Get a list of ontology classes present in the graph.
-
convert(self, to_fmt: str)
Convert the graph data structure into the specified format.
The actual graph contained in a comptox_ai.Graph can be in a variety of different formats. When the user loads a graph
-
edges(self)
Get all edges in the graph and return as an iterable of tuples.
- Returns
- iterable
Iterable over tuples containing graph edge triples.
-
classmethod from_dgl()
Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).
- Raises
- NotImplementedError
-
classmethod from_graphsage(prefix: str, directory: str = None)
Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.
- Parameters
- prefix : str
The prefix used at the beginning of each file name (see Notes below for the format specification).
- directory : str, default=None
The directory (absolute or relative path) containing the data files to load.
Notes
The parameters should point to files with the following structure:
- {prefix}-G.json
JSON file containing a NetworkX ‘node link’ instance of the input graph. GraphSAGE usually expects there to be ‘val’ and ‘test’ attributes on each node indicating if they are part of the validation and test sets, but this isn’t enforced by ComptoxAI (at least not currently).
- {prefix}-id_map.json
A JSON object that maps graph node ids (integers) to consecutive integers (0-indexed).
- {prefix}-class_map.json (OPTIONAL)
A JSON object that maps graph node ids (integers) to a binary (multi-hot) list of class memberships (e.g., {2: [0, 0, 1, 0, 1]} means that node 2 is a member of classes 3 and 5). NOTE: While this is shown as a mandatory component of a dataset in GraphSAGE’s documentation, we don’t enforce that. NOTE: The notion of a class in GraphSAGE is different from the notion of a class in heterogeneous network theory. Here, a ‘class’ is a label to be used in a supervised learning setting (such as classifying chemicals as likely carcinogens versus likely non-carcinogens).
- {prefix}-feats.npy (OPTIONAL)
A NumPy ndarray containing numerical node features. NOTE: This serialization is currently not compatible with heterogeneous graphs, as GraphSAGE was originally implemented for homogeneous graphs only.
- {prefix}-walks.txt (OPTIONAL)
A text file containing precomputed random walks along the graph. Each line is a pair of node integers (i.e., the consecutive integers from the id_map file) indicating an edge included in random walks. The lines should be sorted in ascending order by the first item in each pair.
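The JSON part of the file layout above can be exercised with the standard library alone. What follows is a minimal sketch, not ComptoxAI's loader. write_graphsage_files, read_graphsage_files and member_classes are hypothetical helpers. Only the three JSON files are covered (feats.npy and walks.txt are skipped), and class numbers are reported 1-based to match the class_map example. JSON object keys are always strings, so integer node ids have to be converted back after loading.

```python
import json
import os
import tempfile


def write_graphsage_files(directory, prefix, node_link, id_map, class_map=None):
    """Write a toy dataset as {prefix}-G.json, {prefix}-id_map.json and (optionally) {prefix}-class_map.json."""
    payloads = {"G": node_link, "id_map": id_map}
    if class_map is not None:
        payloads["class_map"] = class_map
    for suffix, payload in payloads.items():
        with open(os.path.join(directory, f"{prefix}-{suffix}.json"), "w") as fh:
            json.dump(payload, fh)


def read_graphsage_files(directory, prefix):
    """Load the required G and id_map files plus the optional class_map file."""

    def path(suffix):
        return os.path.join(directory, f"{prefix}-{suffix}.json")

    def load(suffix):
        with open(path(suffix)) as fh:
            return json.load(fh)

    graph = load("G")
    # JSON object keys are always strings, so convert node ids back to int.
    id_map = {int(k): v for k, v in load("id_map").items()}
    class_map = None
    if os.path.exists(path("class_map")):  # optional file
        class_map = {int(k): v for k, v in load("class_map").items()}
    return graph, id_map, class_map


def member_classes(one_hot):
    """Return the 1-based class numbers whose entry in the binary list is 1."""
    return [i + 1 for i, bit in enumerate(one_hot) if bit == 1]


# Toy data, made up for illustration.
node_link = {
    "directed": True, "multigraph": False, "graph": {},
    "nodes": [{"id": 10}, {"id": 2}],
    "links": [{"source": 10, "target": 2}],
}
id_map = {10: 0, 2: 1}  # node id -> consecutive 0-based index
class_map = {2: [0, 0, 1, 0, 1], 10: [1, 0, 0, 0, 0]}

with tempfile.TemporaryDirectory() as tmp:
    write_graphsage_files(tmp, "toy", node_link, id_map, class_map)
    graph, ids, classes = read_graphsage_files(tmp, "toy")

print(member_classes(classes[2]))  # [3, 5]
```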
-
classmethod from_neo4j(config_file: str = None, verbose: bool = False)
Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.
NOTE: All we do here is create a driver for the graph database; the Neo4j constructor handles building the node index and other important attributes. This differs from most of the other formats, where the attributes are passed in as constructor arguments.
- Parameters
- config_file : str, default=None
Path to a ComptoxAI configuration file. If None, ComptoxAI will search for a configuration file in the default location. For more information, refer to http://comptox.ai/docs/guide/building.html.
- Raises
- RuntimeError
If the data in the configuration file does not point to a valid Neo4j graph database.
-
classmethod from_networkx()
Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.
-
nodes(self)
Get all nodes in the graph and return as an iterable of tuples.
- Returns
- iterable
Iterable over 2-tuples containing graph nodes. The first element is the node’s integer ID and the second is the URI of that node (if available).
-
class comptox_ai.graph.Neo4jData(database: py2neo.database.Database, verbose: bool = False)
Internal representation of a connection to a Neo4j graph database containing ComptoxAI data.
Importantly, this data structure does not load the complete contents of the database into Python’s memory space, so it places significantly less demand on system resources unless you are executing large queries or performing complex data manipulations. This representation can also offload much of the logic for standardized operations to Neo4j’s standard library.
The recommended way to instantiate this class is by calling comptox_ai.Graph.from_neo4j(), which handles establishing a database driver connection.
- Parameters
- driver : neo4j.Driver
A driver connected to a Neo4j graph database containing ComptoxAI data.
- Attributes
- edges
- is_heterogeneous
- nodes
Get a list of all nodes corresponding to a named individual in the ComptoxAI ontology.
Methods
- add_edge(self, edge): Add an edge to the graph and synchronize it to the remote database.
- add_edges(self, edges): Add a list of edges to the graph and synchronize them to the remote database.
- add_node(self, node): Add a node to the graph and synchronize it to the remote database.
- add_nodes(self, nodes): Add a list of nodes to the graph and synchronize them to the remote database.
- node_labels(self): Get all node labels from ns0.
- run_query_in_session(self, query): Submit a Cypher query transaction to the connected graph database driver and return the response to the calling function.
- save_graph
- standardize_edge
- standardize_node
-
add_edge(self, edge: tuple)
Add an edge to the graph and synchronize it to the remote database.
-
add_edges(self, edges: List[tuple])
Add a list of edges to the graph and synchronize them to the remote database.
-
add_node(self, node: tuple)
Add a node to the graph and synchronize it to the remote database.
- Parameters
- node : tuple of (int, label, **props)
Node to add to the graph.
-
add_nodes(self, nodes: List[tuple])
Add a list of nodes to the graph and synchronize them to the remote database.
-
node_labels(self)
Get all node labels from ns0.
- Returns
- set of str
Set of ontology labels (as strings) present in the graph schema.
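"Labels from ns0" can be made concrete with the node-link export documented under NetworkXData.save_graph. There, ontology labels carry an ns0__ prefix (e.g., ns0__Chemical), alongside labels such as Resource and owl__NamedIndividual. The sketch below filters those labels out of in-memory node dicts. ns0_labels is a hypothetical helper that does not talk to Neo4j. It keeps the prefix because this documentation does not say whether the real method strips it.

```python
def ns0_labels(nodes, prefix="ns0__"):
    """Collect the distinct labels in the ns0 namespace from node-link style node dicts."""
    return {
        label
        for node in nodes
        for label in node.get("LABELS", [])
        if label.startswith(prefix)
    }


# Node dicts shaped like the save_graph example (illustrative data).
nodes = [
    {"id": 0, "LABELS": ["ns0__Chemical", "Resource", "owl__NamedIndividual"]},
    {"id": 1, "LABELS": ["ns0__Chemical", "Resource"]},
    {"id": 2, "LABELS": ["Resource"]},
]
print(ns0_labels(nodes))  # {'ns0__Chemical'}
```

Returning a set mirrors the documented return type and removes duplicates across nodes.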
-
property nodes
Get a list of all nodes corresponding to a named individual in the ComptoxAI ontology.
- Returns
- list of py2neo.Node
List of all Neo4j nodes corresponding to a named individual.
-
run_query_in_session(self, query: str)
Submit a Cypher query transaction to the connected graph database driver and return the response to the calling function.
- Parameters
- query : str
String representation of the cypher query to be executed.
- Returns
- list of neo4j.Record
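Below is a minimal sketch of the session-per-query pattern this method describes. It is written against the official neo4j Python driver's interface: driver.session() is a context manager, and session.run() returns an iterable of records. run_cypher is a hypothetical standalone function, not the ComptoxAI method, and nothing here is taken from ComptoxAI's source. The usage lines at the bottom are commented out because they need a running database; the URI and credentials are placeholders.

```python
def run_cypher(driver, query, **params):
    """Run one Cypher query in a fresh session and return every record as a list.

    `driver` is assumed to follow the neo4j Python driver's interface:
    driver.session() is a context manager, and session.run(query, **params)
    returns an iterable result.
    """
    with driver.session() as session:
        result = session.run(query, **params)
        # Materialize the records before the session closes; a driver result
        # is tied to the session that produced it.
        return list(result)


# Example against a live database (URI and credentials are placeholders):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# records = run_cypher(driver, "MATCH (n) RETURN count(n) AS n_nodes")
# driver.close()
```

Passing values as keyword parameters (referenced as $name in the Cypher text) rather than formatting them into the query string avoids Cypher injection and lets the server reuse query plans.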
-
class comptox_ai.graph.NetworkXData(graph: networkx.classes.digraph.DiGraph = None)
- Attributes
- edges
- is_heterogeneous
- nodes
Methods
- NetworkxJsonEncoder(*[, skipkeys, …]): When encoding JSON, sets are converted to lists.
- add_edge(self, edge): Add one edge to the graph from a tuple.
- add_edges(self, edges): Add one or more edges to the graph from a list of tuples.
- save_graph(self[, format]): Save NetworkX representation of ComptoxAI’s knowledge graph to disk in JSON “node-link” format.
- add_node
- add_nodes
-
class NetworkxJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
When encoding JSON, sets are converted to lists.
Methods
- default(self, o): Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
- encode(self, o): Return a JSON string representation of a Python data structure.
- iterencode(self, o[, _one_shot]): Encode the given object and yield each string representation as available.
-
default(self, o)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    from json import JSONEncoder

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
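The same hook can produce the behaviour NetworkxJsonEncoder documents (sets become lists). The sketch below is a hypothetical subclass, SetToListEncoder, not ComptoxAI's class. Sorting the set is a choice made here for deterministic output; the real encoder may simply call list().

```python
import json


class SetToListEncoder(json.JSONEncoder):
    """JSONEncoder subclass that serializes Python sets as JSON lists."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            # Sorting makes the output deterministic; it assumes the set's
            # elements can be compared with each other.
            return sorted(o)
        # Anything else unsupported: defer to the base class, which raises TypeError.
        return super().default(o)


print(json.dumps({"LABELS": {"ns0__Chemical", "Resource"}}, cls=SetToListEncoder))
# {"LABELS": ["Resource", "ns0__Chemical"]}
```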
-
add_edge(self, edge: tuple)
Add one edge to the graph from a tuple.
The tuple should be formatted as follows:
( {ID of u}, {relationship type}, {ID of v}, {dict of edge properties (leave empty if none)} )
- Parameters
- edge : tuple
Tuple containing edge data (see above for format specification).
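To illustrate the tuple format, the sketch below converts such a tuple into a dict shaped like the 'links' entries in the save_graph example (TYPE, source, target). edge_to_link is a hypothetical helper. Merging the property dict into the same level is an assumption. In NetworkX itself the natural equivalent would be DiGraph.add_edge(u, v, TYPE=rel_type, **props), but how ComptoxAI stores the properties is not documented here.

```python
from typing import Any, Dict, Tuple

# (u, relationship type, v, properties) as described above.
EdgeTuple = Tuple[Any, str, Any, Dict[str, Any]]


def edge_to_link(edge: EdgeTuple) -> Dict[str, Any]:
    """Turn an edge tuple into a dict shaped like a 'links' entry in node-link JSON."""
    if len(edge) != 4:
        raise ValueError("expected (u, rel_type, v, props); pass {} when there are no properties")
    u, rel_type, v, props = edge
    link = {"TYPE": rel_type, "source": u, "target": v}
    link.update(props)  # assumption: edge properties sit beside TYPE/source/target
    return link


print(edge_to_link((46954, "ns0__keyEventTriggeredBy", 47667, {})))
# {'TYPE': 'ns0__keyEventTriggeredBy', 'source': 46954, 'target': 47667}
```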
-
add_edges(self, edges: List[tuple])
Add one or more edges to the graph from a list of tuples.
See also
add_edge
Add a single edge from a tuple.
-
save_graph(self, format='')
Save the NetworkX representation of ComptoxAI’s knowledge graph to disk in JSON “node-link” format.
Notes
Users should not need to interact with these JSON files directly, but for reference they should be formatted similarly to the following example:
    {
        'directed': True,
        'multigraph': False,
        'graph': {},
        'nodes': [
            {
                'ns0__xrefPubchemCID': 71392231,
                'ns0__xrefPubChemSID': 316343675,
                'ns0__inchi': 'InChI=1S/C8H12Cl2N4S2/c1-5(9)3-11-7(15)13-14-8(16)12-4-6(2)10/h1-4H2,(H2,11,13,15)(H2,12,14,16)',
                'ns0__xrefCasRN': '61784-89-2',
                'uri': 'http://jdr.bio/ontologies/comptox.owl#chem_n1n2bis2chloroprop2en1ylhydrazine12dicarbothioamide',
                'ns0__xrefDtxsid': 'DTXSID70814050',
                'ns0__inchiKey': 'UAZDGQNKXGUQPD-UHFFFAOYSA-N',
                'LABELS': ['ns0__Chemical', 'Resource', 'owl__NamedIndividual'],
                'id': 0
            },
            ...
        ],
        'links': [
            {
                'TYPE': 'ns0__keyEventTriggeredBy',
                'source': 46954,
                'target': 47667
            },
            ...
        ]
    }
Notice that 'graph' is empty: the contents of the graph are entirely specified in the 'nodes' and 'links' lists.
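The top-level layout can be reproduced with the standard json module. The sketch assembles a node-link dict with an empty 'graph' entry and round-trips it through JSON text. to_node_link is a hypothetical helper, and the node and link values are illustrative. When the example above is written to an actual JSON file, its Python literals appear as true/false and double-quoted strings.

```python
import json


def to_node_link(nodes, links, directed=True, multigraph=False):
    """Assemble the top-level node-link dict; 'graph' stays empty."""
    return {
        "directed": directed,
        "multigraph": multigraph,
        "graph": {},  # no graph-level attributes: everything lives in nodes/links
        "nodes": nodes,
        "links": links,
    }


doc = to_node_link(
    nodes=[{"id": 46954, "LABELS": ["Resource"]}, {"id": 47667, "LABELS": ["Resource"]}],
    links=[{"TYPE": "ns0__keyEventTriggeredBy", "source": 46954, "target": 47667}],
)
text = json.dumps(doc, indent=2)  # what would be written to disk
restored = json.loads(text)
print(restored == doc)  # True
```

With NetworkX installed, networkx.readwrite.json_graph.node_link_graph can rebuild a graph object from a dict of this shape.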
-
class comptox_ai.graph.GraphSAGEData(graph: networkx.classes.digraph.DiGraph, node_map: Iterable = None, edge_map: Iterable = None, node_classes: List[str] = None, edge_classes: List[str] = None, node_features: Union[numpy.ndarray, pandas.core.frame.DataFrame] = None, edge_features: Union[numpy.ndarray, pandas.core.frame.DataFrame] = None)
Internal representation of a GraphSAGE-formatted graph/dataset.
This is essentially a NetworkX graph with a few extra data components needed to run the GraphSAGE algorithms. It also provides a more flexible way to work with node features, which are stored in a separate NumPy array (which, unfortunately, isn’t natively compatible with heterogeneous graphs).
- Parameters
- graph : nx.DiGraph
A NetworkX directed graph containing the nodes and edges that define the topology of the ComptoxAI graph database. Nodes are identified by the ID assigned to them by Neo4j.
- node_map : Iterable, default=None
An iterable where each element maps a Neo4j node ID (int) to a consecutively numbered index, used to map nodes to columns of the (optional) matrix of node features. If None, a node map is generated from scratch.
- edge_map : Iterable, default=None
Currently not implemented.
- node_classes : list of str, default=None
Class memberships to be used in supervised learning tasks. NOTE: there is a semantic difference between the notion of ‘node classes’ in an ontology / graph database (which specifies the semantic type(s) of entities) and in supervised learning (a target variable used to learn a decision function), although they may be equivalent in some settings.
- edge_classes : list of str, default=None
Currently not implemented.
- node_features : array-like, default=None
Array of node features.
- edge_features : array-like, default=None
Array of edge features.
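When node_map is None, a map has to be generated from scratch. One deterministic way to do that, shown below as a sketch, is to sort the distinct Neo4j ids and number them from 0. The ordering ComptoxAI actually uses is not stated here, and build_node_map and features_for are hypothetical helpers. The sketch keeps one feature vector per node in a plain list, so the index selects an entry of the outer list. If your feature matrix stores nodes along columns, as the node_map description above suggests, index the other axis instead.

```python
from typing import Dict, Iterable, List


def build_node_map(node_ids: Iterable[int]) -> Dict[int, int]:
    """Map each Neo4j node id to a consecutive 0-based index (ordered by id)."""
    return {node_id: index for index, node_id in enumerate(sorted(set(node_ids)))}


def features_for(node_id: int, node_map: Dict[int, int], features: List[List[float]]) -> List[float]:
    """Look up a node's feature vector via its consecutive index."""
    return features[node_map[node_id]]


node_map = build_node_map([46954, 47667, 12])
# node_map == {12: 0, 46954: 1, 47667: 2}
features = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]  # one vector per indexed node
print(features_for(46954, node_map, features))  # [0.5, 0.5]
```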
- Attributes
- edges
- is_heterogeneous
Return True if graph is heterogeneous, False otherwise.
- nodes
Methods
- add_edge(self, edge): Add an edge to GraphSAGE.
- add_node(self, node, **kwargs): Add a node to GraphSAGE.
- add_edges
- add_nodes
- save_graph
-
add_edge(self, edge: Tuple[int, str, int])
Add an edge to GraphSAGE.
The edge must be a 3-tuple of the form:
( {ID of u}, {relationship label (str)}, {ID of v} )
If the edge does not have a label, use the empty string (‘’) as the second element of the edge.
- Parameters
- edge : Tuple[int, str, int]
A tuple to add to the GraphSAGE dataset.
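Below is a small hypothetical helper, as_graphsage_edge, that normalizes input into the (u, label, v) form described above and fills in '' for unlabeled edges. Accepting a bare (u, v) pair is a convenience added for this sketch, not documented ComptoxAI behaviour.

```python
from typing import Tuple


def as_graphsage_edge(edge: tuple) -> Tuple[int, str, int]:
    """Return (u, label, v); a bare (u, v) pair gets the empty-string label."""
    if len(edge) == 2:
        u, v = edge
        label = ""  # unlabeled edge
    elif len(edge) == 3:
        u, label, v = edge
    else:
        raise ValueError("expected (u, v) or (u, label, v)")
    if not isinstance(u, int) or not isinstance(v, int):
        raise TypeError("node ids must be integer Neo4j ids")
    if not isinstance(label, str):
        raise TypeError("label must be a string ('' if the edge is unlabeled)")
    return (u, label, v)


print(as_graphsage_edge((46954, 47667)))  # (46954, '', 47667)
```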
-
add_node(self, node: int, **kwargs)
Add a node to GraphSAGE.
A node is simply an ID corresponding to a node in the Neo4j graph. Under GraphSAGE, node features aren’t tied to the NetworkX digraph; instead, they are stored in _node_features.
- Parameters
- node : int
A Neo4j node ID.
- kwargs :
-
property is_heterogeneous
Return True if graph is heterogeneous, False otherwise.
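A common definition treats a graph as heterogeneous when it has more than one node type or more than one edge type. The sketch below applies that definition to plain Python data. is_heterogeneous_graph is a hypothetical function, and the check ComptoxAI performs internally is not documented here.

```python
from typing import Iterable, Tuple


def is_heterogeneous_graph(node_types: Iterable[str], edges: Iterable[Tuple[int, str, int]]) -> bool:
    """True if there is more than one distinct node type or edge type."""
    distinct_node_types = set(node_types)
    distinct_edge_types = {rel for _, rel, _ in edges}
    return len(distinct_node_types) > 1 or len(distinct_edge_types) > 1


# A single node type and a single (empty) edge label: homogeneous.
print(is_heterogeneous_graph(["Chemical", "Chemical"], [(0, "", 1)]))  # False
```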