Graph
- class comptox_ai.graph.Graph(data: GraphDataMixin)
A graph representation of ComptoxAI data.
The internal data storage can be in several different formats, each of which has advantages in different scenarios.
Read more in the User Guide.
- Parameters:
- data : comptox_ai.graph.io.GraphDataMixin
A graph data structure that is of one of the formats compliant with ComptoxAI’s standardized graph API.
- Attributes:
- format : {“graphsage”, “networkx”, “neo4j”}
Internal format of the graph data. The format determines many aspects of how you interact with the graph, including the set of methods that can be called on it and the types of models that you can construct without first converting to another format.
Methods
- add_edges(edges)
Add one or more edges to the graph.
- add_nodes(nodes)
Add one or more nodes to the graph.
- classes()
Get a list of ontology classes present in the graph.
- convert(to_fmt)
Convert the graph data structure into the specified format.
- edges()
Get all edges in the graph and return as an iterable of tuples.
- from_dgl()
Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).
- from_graphsage(prefix[, directory])
Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.
- from_neo4j([config_file, verbose])
Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.
- from_networkx()
Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.
- nodes()
Get all nodes in the graph and return as an iterable of tuples.
- is_heterogeneous
- node_id_map
- add_edges(edges: List[tuple] | tuple)
Add one or more edges to the graph.
- Parameters:
- edges : tuple or list of tuple
Edge or edges to add to the graph.
- add_nodes(nodes: List[tuple] | tuple)
Add one or more nodes to the graph.
- classes()
Get a list of ontology classes present in the graph.
- convert(to_fmt: str)
Convert the graph data structure into the specified format.
The actual graph contained in a comptox_ai.Graph can be in a variety of different formats. When the user loads a graph
- edges()
Get all edges in the graph and return as an iterable of tuples.
- Returns:
- iterable
Iterable over tuples containing graph edge triples.
- classmethod from_dgl()
Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).
- Raises:
- NotImplementedError
- classmethod from_graphsage(prefix: str, directory: str | None = None)
Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.
- Parameters:
- prefix : str
The prefix used at the beginning of each file name (see Notes for the format specification).
- directory : str, default=None
The directory (fully specified or relative) containing the data files to load.
Notes
The parameters should point to files with the following structure:
- {prefix}-G.json
JSON file containing a NetworkX ‘node link’ instance of the input graph. GraphSAGE usually expects there to be ‘val’ and ‘test’ attributes on each node indicating if they are part of the validation and test sets, but this isn’t enforced by ComptoxAI (at least not currently).
- {prefix}-id_map.json
A JSON object that maps graph node ids (integers) to consecutive integers (0-indexed).
- {prefix}-class_map.json (OPTIONAL)
A JSON object that maps graph node ids (integers) to a binary list of class memberships (e.g., {2: [0, 0, 1, 0, 1]} means that node 2 is a member of classes 3 and 5). NOTE: While this is shown as a mandatory component of a dataset in GraphSAGE’s documentation, ComptoxAI does not enforce it. NOTE: The notion of a class in GraphSAGE differs from the notion of a class in heterogeneous network theory. Here, a ‘class’ is a label to be used in a supervised learning setting (such as classifying chemicals as likely carcinogens versus likely non-carcinogens).
- {prefix}-feats.npy (OPTIONAL)
A NumPy ndarray containing numerical node features. NOTE: This serialization is currently not compatible with heterogeneous graphs, as GraphSAGE was originally implemented for nonheterogeneous graphs only.
- {prefix}-walks.txt (OPTIONAL)
A text file containing precomputed random walks along the graph. Each line is a pair of node integers (i.e., the consecutive indices given as values in the id_map file) indicating an edge included in random walks. The lines should be sorted in ascending order, keyed on the first item in each pair.
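As a rough illustration of the layout described in these notes, the sketch below writes a tiny dataset to a temporary directory using only the standard library. The prefix `toy`, the node ids, and the helpers `write_toy_dataset` and `member_classes` are made up for this example; the node-link dictionary is written by hand rather than through NetworkX, and ComptoxAI itself is not called.

```python
import json
import tempfile
from pathlib import Path


def write_toy_dataset(directory, prefix="toy"):
    """Write the three JSON files described above for a 3-node toy graph."""
    directory = Path(directory)
    node_ids = [101, 205, 307]  # hypothetical graph node ids

    # {prefix}-G.json: node-link representation of the graph
    graph = {
        "directed": True,
        "multigraph": False,
        "graph": {},
        "nodes": [{"id": n} for n in node_ids],
        "links": [
            {"source": 101, "target": 205},
            {"source": 205, "target": 307},
        ],
    }
    # {prefix}-id_map.json: node id -> consecutive 0-indexed integer
    id_map = {n: i for i, n in enumerate(node_ids)}
    # {prefix}-class_map.json: node id -> binary class-membership list
    class_map = {101: [1, 0, 0], 205: [0, 1, 0], 307: [0, 1, 1]}

    files = (("G", graph), ("id_map", id_map), ("class_map", class_map))
    for suffix, payload in files:
        path = directory / f"{prefix}-{suffix}.json"
        # json.dumps turns the integer keys into strings, as JSON requires
        path.write_text(json.dumps(payload))
    return directory


def member_classes(membership):
    """Return the 1-indexed class numbers flagged in a membership list."""
    return [i + 1 for i, flag in enumerate(membership) if flag]


with tempfile.TemporaryDirectory() as tmp:
    write_toy_dataset(tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
    loaded_id_map = json.loads((Path(tmp) / "toy-id_map.json").read_text())

example_classes = member_classes([0, 0, 1, 0, 1])
```

A directory written this way has the file names that from_graphsage looks for with prefix "toy", although that call is not exercised here. Note that node ids come back as strings after a JSON round trip.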
- classmethod from_neo4j(config_file: str | None = None, verbose: bool = False)
Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.
NOTE: Support for Neo4jData has been deprecated. This function will raise an error if called.
PREVIOUS_NOTE: All we do here is create a driver for the graph database; the Neo4j constructor handles building the node index and other important attributes. This is different from most of the other formats, where the attributes are provided by the constructor.
- Parameters:
- config_file : str, default=None
Path to a ComptoxAI configuration file. If None, ComptoxAI will search for a configuration file in the default location. For more information, refer to http://comptox.ai/docs/guide/building.html.
- Raises:
- NotImplementedError
Since Neo4jData is deprecated, this method no longer supports creating an object from Neo4j data.
- RuntimeError
PREVIOUS_NOTE: If the data in the configuration file does not point to a valid Neo4j graph database.
See also
comptox_ai.graph.Neo4jData
- classmethod from_networkx()
Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.
- nodes()
Get all nodes in the graph and return as an iterable of tuples.
- Returns:
- iterable
Iterable over 2-tuples containing graph nodes. The first element is the node’s integer ID and the second is the URI of that node (if available).
- class comptox_ai.graph.GraphSAGEData(graph: DiGraph, node_map: Iterable | None = None, edge_map: Iterable | None = None, node_classes: List[str] | None = None, edge_classes: List[str] | None = None, node_features: ndarray | DataFrame | None = None, edge_features: ndarray | DataFrame | None = None)
Internal representation of a GraphSAGE formatted graph/dataset.
This is essentially a NetworkX graph with a few extra data components needed to run the GraphSAGE algorithms. It also provides a more flexible way to work with node features, which are stored in a separate NumPy array (which, unfortunately, isn’t natively compatible with heterogeneous graphs).
- Parameters:
- graph : nx.DiGraph
A NetworkX directed graph containing the nodes and edges that define the topology of the ComptoxAI graph database. Nodes are identified by the ID assigned to them by Neo4j.
- node_map : Iterable, default=None
An iterable where each element maps a Neo4j node id (int) to a consecutively numbered index, used to map nodes to columns of the (optional) matrix of node features. If None, a node map will be generated from scratch.
- edge_map : Iterable, default=None
Currently not implemented (:TODO:)
- node_classes : list of str, default=None
Membership for classes to be used in supervised learning tasks. NOTE: there is a semantic difference between the notion of ‘node classes’ in an ontology / graph database (which specifies the semantic type(s) of entities) versus in supervised learning (a target variable used to learn a decision function), although they may be equivalent in some settings.
- edge_classes : list of str, default=None
Currently not implemented (:TODO:)
- node_features : array-like, default=None
Array of node features.
- edge_features : array-like, default=None
Array of edge features.
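The node_map parameter pairs each Neo4j node id with a consecutive index into the node feature matrix. A minimal sketch of building such a map follows; the helper name `build_node_map` and the choice to sort ids before numbering them are assumptions made for illustration, and ComptoxAI may generate its own map differently when node_map is None.

```python
from typing import Iterable, List, Tuple


def build_node_map(node_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Pair each distinct node id with a consecutive 0-based index."""
    # Sorting makes the numbering reproducible (an assumption of this sketch).
    unique_ids = sorted(set(node_ids))
    return [(node_id, index) for index, node_id in enumerate(unique_ids)]


node_map = build_node_map([46954, 47667, 0, 47667])
index_of = dict(node_map)  # lookup: Neo4j node id -> feature index
```

The resulting list of pairs is one possible shape for an iterable of id-to-index mappings; the dictionary form is convenient for lookups when assembling a feature matrix.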
- Attributes:
- edges
- is_heterogeneous
Return True if graph is heterogeneous, False otherwise.
- nodes
Methods
- add_edge(edge)
Add an edge to GraphSAGE.
- add_node(node, **kwargs)
Add a node to GraphSAGE.
- add_edges
- add_nodes
- save_graph
- add_edge(edge: Tuple[int, str, int])
Add an edge to GraphSAGE.
Each edge is a 3-tuple of the form:
( {ID of u}, {relationship label (str)}, {ID of v} )
If the edge does not have a label, use the empty string (‘’) as the second element of edge.
- Parameters:
- edge : Tuple[int, str, int]
A tuple to add to the GraphSAGE dataset.
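The edge format above can be sketched with a small helper that normalizes a missing label to the empty string. `make_edge` is a hypothetical function written for this example and is not part of ComptoxAI.

```python
from typing import Optional, Tuple

Edge = Tuple[int, str, int]


def make_edge(u: int, v: int, label: Optional[str] = None) -> Edge:
    """Build a (u, label, v) triple, using '' when there is no label."""
    return (u, label if label is not None else "", v)


labeled = make_edge(46954, 47667, "ns0__keyEventTriggeredBy")
unlabeled = make_edge(1, 2)
```

Either tuple has the shape that add_edge is documented to accept.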
- add_node(node: int, **kwargs)
Add a node to GraphSAGE.
A node is simply an ID corresponding to a node in the Neo4j graph. Node features aren’t tied to the NetworkX digraph under GraphSAGE; instead, they are stored in _node_features.
- Parameters:
- node : int
A Neo4j node ID.
- kwargs
- property is_heterogeneous
Return True if graph is heterogeneous, False otherwise.
- class comptox_ai.graph.NetworkXData(graph: DiGraph | None = None)
- Attributes:
- edges
- is_heterogeneous
- nodes
Methods
- NetworkxJsonEncoder(*[, skipkeys, ...])
When encoding JSON, sets are converted to lists.
- add_edge(edge)
Add one edge to the graph from a tuple.
- add_edges(edges)
Add one or more edges to the graph from a list of tuples.
- save_graph([format])
Save NetworkX representation of ComptoxAI's knowledge graph to disk in JSON "node-link" format.
- add_node
- add_nodes
- class NetworkxJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
When encoding JSON, sets are converted to lists.
Methods
- default(o)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
- encode(o)
Return a JSON string representation of a Python data structure.
- iterencode(o[, _one_shot])
Encode the given object and yield each string representation as available.
- default(o)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
- encode(o)
Return a JSON string representation of a Python data structure.
    >>> from json.encoder import JSONEncoder
    >>> JSONEncoder().encode({"foo": ["bar", "baz"]})
    '{"foo": ["bar", "baz"]}'
- iterencode(o, _one_shot=False)
Encode the given object and yield each string representation as available.
For example:
    for chunk in JSONEncoder().iterencode(bigobject):
        mysocket.write(chunk)
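The behavior described for NetworkxJsonEncoder (sets converted to lists) can be approximated with a short json.JSONEncoder subclass. The class below, `SetToListEncoder`, is a stand-in written for this example and should not be read as ComptoxAI's actual implementation.

```python
import json


class SetToListEncoder(json.JSONEncoder):
    """JSON encoder that serializes sets (and frozensets) as lists."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            # Element order is arbitrary because sets are unordered.
            return list(o)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


encoded = json.dumps(
    {"LABELS": {"Resource", "ns0__Chemical"}},
    cls=SetToListEncoder,
)
decoded = json.loads(encoded)
```

Passing the class through the `cls` argument of json.dumps is the standard way to plug in a custom encoder; objects that are neither JSON-native nor sets still raise TypeError.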
- add_edge(edge: tuple)
Add one edge to the graph from a tuple.
The tuple should be formatted as follows:
( {ID of u}, {relationship type}, {ID of v}, {dict of edge properties (leave empty if none)} )
- Parameters:
- edge : tuple
Tuple containing edge data (see above for format specification).
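To make the tuple layout concrete, the sketch below turns a 4-tuple into a node-link style link record using plain dictionaries. `edge_to_link` is a hypothetical helper; how NetworkXData stores the relationship type and properties internally is not stated in this reference, so the 'TYPE' key is borrowed from the save_graph notes as an assumption.

```python
from typing import Any, Dict, Tuple

EdgeTuple = Tuple[int, str, int, Dict[str, Any]]


def edge_to_link(edge: EdgeTuple) -> Dict[str, Any]:
    """Convert (u, relationship type, v, properties) into a link record."""
    u, rel_type, v, props = edge
    link = {"source": u, "target": v, "TYPE": rel_type}
    link.update(props)  # an empty dict adds nothing
    return link


link = edge_to_link((46954, "ns0__keyEventTriggeredBy", 47667, {}))
link_with_props = edge_to_link((1, "relatesTo", 2, {"weight": 0.5}))
```

The relationship name "relatesTo" and the "weight" property are invented for the second call and do not come from the ComptoxAI ontology.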
- add_edges(edges: List[tuple])
Add one or more edges to the graph from a list of tuples.
See also
add_edge
Add a single edge from a tuple
- save_graph(format='')
Save NetworkX representation of ComptoxAI’s knowledge graph to disk in JSON “node-link” format.
Notes
Users should not need to interact with these JSON files directly, but for reference they should be formatted similarly to the following example:
    {
        'directed': True,
        'multigraph': False,
        'graph': {},
        'nodes': [
            {
                'ns0__xrefPubchemCID': 71392231,
                'ns0__xrefPubChemSID': 316343675,
                'ns0__inchi': 'InChI=1S/C8H12Cl2N4S2/c1-5(9)3-11-7(15)13-14-8(16)12-4-6(2)10/h1-4H2,(H2,11,13,15)(H2,12,14,16)',
                'ns0__xrefCasRN': '61784-89-2',
                'uri': 'http://jdr.bio/ontologies/comptox.owl#chem_n1n2bis2chloroprop2en1ylhydrazine12dicarbothioamide',
                'ns0__xrefDtxsid': 'DTXSID70814050',
                'ns0__inchiKey': 'UAZDGQNKXGUQPD-UHFFFAOYSA-N',
                'LABELS': ['ns0__Chemical', 'Resource', 'owl__NamedIndividual'],
                'id': 0
            },
            ...
        ],
        'links': [
            {
                'TYPE': 'ns0__keyEventTriggeredBy',
                'source': 46954,
                'target': 47667
            },
            ...
        ]
    }
Notice that 'graph' is empty: the contents of the graph are entirely specified in the 'nodes' and 'links' lists.
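For readers who do inspect such a file, the sketch below pulls node and edge tuples out of a node-link dictionary with the standard library alone. The helper names, the placeholder URI, and the inline sample (a cut-down variant of the example above, written as valid JSON with double quotes) are assumptions for illustration; the tuple shapes mirror those described for Graph.nodes() and Graph.edges().

```python
import json

SAMPLE = """
{
  "directed": true,
  "multigraph": false,
  "graph": {},
  "nodes": [
    {"id": 0, "uri": "http://example.org/a", "LABELS": ["ns0__Chemical"]},
    {"id": 1, "LABELS": ["Resource"]}
  ],
  "links": [
    {"TYPE": "ns0__keyEventTriggeredBy", "source": 0, "target": 1}
  ]
}
"""


def node_tuples(data):
    """Return (id, uri) pairs; uri is None when a node has none."""
    return [(n["id"], n.get("uri")) for n in data["nodes"]]


def edge_triples(data):
    """Return (source, relationship type, target) triples."""
    return [(e["source"], e.get("TYPE", ""), e["target"]) for e in data["links"]]


data = json.loads(SAMPLE)
nodes = node_tuples(data)
edges = edge_triples(data)
```

Because everything of interest lives in the 'nodes' and 'links' lists, two list comprehensions are enough to recover the graph's contents.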