Graph

class comptox_ai.graph.Graph(data: GraphDataMixin)

A graph representation of ComptoxAI data.

The internal data storage can be in several different formats, each of which has advantages in different scenarios.

Read more in the User Guide.

Parameters:
data : comptox_ai.graph.io.GraphDataMixin

A graph data structure that is of one of the formats compliant with ComptoxAI’s standardized graph API.

Attributes:
format : {“graphsage”, “networkx”, “neo4j”}

Internal format of the graph data. The format determines many aspects of how you interact with the graph, including the set of methods that can be called on it and the types of models that you can construct without first converting to another format.

Methods

add_edges(edges)

Add one or more edges to the graph.

add_nodes(nodes)

Add one or more nodes to the graph.

classes()

Get a list of ontology classes present in the graph.

convert(to_fmt)

Convert the graph data structure into the specified format.

edges()

Get all edges in the graph and return as an iterable of tuples.

from_dgl()

Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).

from_graphsage(prefix[, directory])

Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.

from_neo4j([config_file, verbose])

Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.

from_networkx()

Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.

nodes()

Get all nodes in the graph and return as an iterable of tuples.

is_heterogeneous

node_id_map

add_edges(edges: List[tuple] | tuple)

Add one or more edges to the graph.

Parameters:
edges : tuple or list of tuple

Edge or edges to add to the graph.

add_nodes(nodes: List[tuple] | tuple)

Add one or more nodes to the graph.

classes()

Get a list of ontology classes present in the graph.

convert(to_fmt: str)

Convert the graph data structure into the specified format.

The actual graph contained in a comptox_ai.Graph can be in a variety of different formats. When the user loads a graph

edges()

Get all edges in the graph and return as an iterable of tuples.

Returns:
iterable

Iterable over tuples containing graph edge triples.
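A minimal sketch of consuming such triples. It assumes each triple has the shape (source ID, relationship label, target ID), mirroring the GraphSAGEData.add_edge format; the shape returned by Graph.edges() is not spelled out here, so treat that as an assumption. The sample data and the group_by_relationship helper are hypothetical, and the code is plain Python rather than a call into ComptoxAI.

```python
from collections import defaultdict

# Hypothetical edge triples, assumed shape: (source ID, relationship label, target ID).
# "ns0__exampleRelation" is a made-up label used only for illustration.
sample_edges = [
    (46954, "ns0__keyEventTriggeredBy", 47667),
    (46954, "ns0__exampleRelation", 120),
    (120, "ns0__keyEventTriggeredBy", 47667),
]


def group_by_relationship(edges):
    """Group (source, target) pairs by their relationship label."""
    grouped = defaultdict(list)
    for source, label, target in edges:
        grouped[label].append((source, target))
    return dict(grouped)


by_label = group_by_relationship(sample_edges)
print(by_label["ns0__keyEventTriggeredBy"])
```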

classmethod from_dgl()

Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).

Raises:
NotImplementedError

classmethod from_graphsage(prefix: str, directory: str | None = None)

Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.

Parameters:
prefix : str

The prefix used at the beginning of each file name (see the Notes below for the format specification).

directory : str, default=None

The directory (fully specified or relative) containing the data files to load.

Notes

The parameters should point to files with the following structure:

{prefix}-G.json

JSON file containing a NetworkX ‘node link’ instance of the input graph. GraphSAGE usually expects there to be ‘val’ and ‘test’ attributes on each node indicating if they are part of the validation and test sets, but this isn’t enforced by ComptoxAI (at least not currently).

{prefix}-id_map.json

A JSON object that maps graph node ids (integers) to consecutive integers (0-indexed).

{prefix}-class_map.json (OPTIONAL)

A JSON object that maps graph node ids (integers) to a binary list of class membership (e.g., {2: [0, 0, 1, 0, 1]} means that node 2 is a member of classes 3 and 5). NOTE: While this is shown as a mandatory component of a dataset in GraphSAGE’s documentation, we don’t enforce that. NOTE: The notion of a class in GraphSAGE is different from the notion of a class in heterogeneous network theory. Here, a ‘class’ is a label to be used in a supervised learning setting (such as classifying chemicals as likely carcinogens versus likely non-carcinogens).

{prefix}-feats.npy (OPTIONAL)

A NumPy ndarray containing numerical node features. NOTE: This serialization is currently not compatible with heterogeneous graphs, as GraphSAGE was originally implemented for homogeneous graphs only.

{prefix}-walks.txt (OPTIONAL)

A text file containing precomputed random walks along the graph. Each line is a pair of node integers (i.e., the values in the id_map file) indicating an edge included in the random walks. The lines should be sorted in ascending order, keyed on the first item in each pair.
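The file layout above can be sketched with the standard library alone. This is an illustration under stated assumptions, not ComptoxAI's own writer: the prefix "toy", the node ids, and the write_dataset helper are hypothetical, and the optional feats.npy and walks.txt files are left out.

```python
import json
import tempfile
from pathlib import Path

prefix = "toy"  # hypothetical prefix shared by every file name

# A hand-written NetworkX-style 'node link' dictionary for a three-node graph.
graph = {
    "directed": True,
    "multigraph": False,
    "graph": {},
    "nodes": [{"id": 10}, {"id": 11}, {"id": 12}],
    "links": [{"source": 10, "target": 11}, {"source": 11, "target": 12}],
}

# Map each graph node id to a consecutive, 0-indexed integer.
id_map = {node["id"]: index for index, node in enumerate(graph["nodes"])}

# Binary class membership per node; node 10 belongs to classes 3 and 5.
class_map = {10: [0, 0, 1, 0, 1], 11: [1, 0, 0, 0, 0], 12: [0, 1, 0, 0, 0]}


def write_dataset(directory):
    """Write the three JSON files and return the sorted file names."""
    directory = Path(directory)
    (directory / f"{prefix}-G.json").write_text(json.dumps(graph))
    (directory / f"{prefix}-id_map.json").write_text(json.dumps(id_map))
    (directory / f"{prefix}-class_map.json").write_text(json.dumps(class_map))
    return sorted(path.name for path in directory.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    written = write_dataset(tmp)
    # JSON object keys are always strings, so integer ids come back as "10", "11", ...
    reloaded_id_map = json.loads((Path(tmp) / f"{prefix}-id_map.json").read_text())

# Decode the 1-based class numbers that node 10 belongs to.
classes_of_node_10 = [i + 1 for i, flag in enumerate(class_map[10]) if flag]
print(written, reloaded_id_map, classes_of_node_10)
```

A directory written this way is the kind of input that from_graphsage(prefix, directory) is described as reading.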

classmethod from_neo4j(config_file: str | None = None, verbose: bool = False)

Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.

NOTE: Support for Neo4jData has been deprecated. This function raises an error if called.

PREVIOUS_NOTE: All we do here is create a driver for the graph database; the Neo4j constructor handles building the node index and other important attributes. This is different from most of the other formats, where the attributes are provided by the constructor.

Parameters:
config_file : str, default=None

Path to a ComptoxAI configuration file. If None, ComptoxAI will search for a configuration file in the default location. For more information, refer to http://comptox.ai/docs/guide/building.html.

Raises:
NotImplementedError

Since Neo4jData is deprecated, this method no longer supports creating an object from Neo4j data.

PREVIOUS_NOTE: RuntimeError

If the data in the configuration file does not point to a valid Neo4j graph database.

See also

comptox_ai.graph.Neo4jData

classmethod from_networkx()

Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.

nodes()

Get all nodes in the graph and return as an iterable of tuples.

Returns:
iterable

Iterable over 2-tuples containing graph nodes. The first element is the node’s integer ID and the second is the URI of that node (if available).
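A short sketch of working with (ID, URI) pairs of this kind. The sample data is hypothetical, and representing an unavailable URI as None is an assumption; the text above only says the URI is present "if available".

```python
# Hypothetical (node ID, URI) pairs; None stands in for an unavailable URI (assumption).
sample_nodes = [
    (0, "http://jdr.bio/ontologies/comptox.owl#chem_n1n2bis2chloroprop2en1ylhydrazine12dicarbothioamide"),
    (1, None),
]

# Build an ID -> URI lookup, skipping nodes without a URI.
uri_by_id = {node_id: uri for node_id, uri in sample_nodes if uri is not None}

# Keep track of the nodes that had no URI.
ids_without_uri = [node_id for node_id, uri in sample_nodes if uri is None]

print(sorted(uri_by_id), ids_without_uri)
```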

class comptox_ai.graph.GraphSAGEData(graph: DiGraph, node_map: Iterable | None = None, edge_map: Iterable | None = None, node_classes: List[str] | None = None, edge_classes: List[str] | None = None, node_features: ndarray | DataFrame | None = None, edge_features: ndarray | DataFrame | None = None)

Internal representation of a GraphSAGE formatted graph/dataset.

This is essentially a NetworkX graph with a few extra data components needed to run the GraphSAGE algorithms. It also provides a more flexible way to work with node features, which are stored in a separate NumPy array (which, unfortunately, isn’t natively compatible with heterogeneous graphs).

Parameters:
graph : nx.DiGraph

A NetworkX directed graph containing the nodes and edges that define the topology of the ComptoxAI graph database. Nodes are identified by the ID assigned to them by Neo4j.

node_map : Iterable, default=None

An iterable where each element maps a Neo4j node id (int) to a consecutively numbered index, used to map nodes to columns of the (optional) matrix of node features. If None, a node map will be generated from scratch.

edge_map : Iterable, default=None

Currently not implemented (:TODO:)

node_classes : list of str, default=None

Membership for classes to be used in supervised learning tasks. NOTE: there is a semantic difference between the notion of ‘node classes’ in an ontology / graph database (which specifies the semantic type(s) of entities) versus in supervised learning (a target variable used to learn a decision function), although they may be equivalent in some settings.

edge_classes : list of str, default=None

Currently not implemented (:TODO:)

node_features : array-like, default=None

Array of node features.

edge_features : array-like, default=None

Array of edge features.
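As an illustration of the node_map parameter described above, the sketch below builds a map from Neo4j node ids to consecutive indices. The build_node_map helper is hypothetical, and both the dictionary representation and the sorted ordering are assumptions; ComptoxAI may generate its map differently.

```python
def build_node_map(node_ids):
    """Map each distinct node id to a consecutive 0-based index (sorted order assumed)."""
    return {node_id: index for index, node_id in enumerate(sorted(set(node_ids)))}


# Hypothetical Neo4j node ids, including one duplicate.
neo4j_ids = [46954, 47667, 120, 46954]
node_map = build_node_map(neo4j_ids)
print(node_map)  # {120: 0, 46954: 1, 47667: 2}
```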

Attributes:
edges
is_heterogeneous

Return True if graph is heterogeneous, False otherwise.

nodes

Methods

add_edge(edge)

Add an edge to GraphSAGE.

add_node(node, **kwargs)

Add a node to GraphSAGE.

add_edges

add_nodes

save_graph

add_edge(edge: Tuple[int, str, int])

Add an edge to GraphSAGE.

An edge is a 3-tuple with the following format:

(
    {ID of u},
    {relationship label (str)},
    {ID of v}
)

If the edge does not have a label, you should use the empty string ('') as the second element of edge.

Parameters:
edge : Tuple[int, str, int]

A tuple to add to the GraphSAGE dataset.
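A small sketch of building edge tuples in this (u, label, v) format, with the empty string standing in for a missing label. The make_edge helper is hypothetical and is not part of ComptoxAI; its result is the kind of tuple that add_edge is described as accepting.

```python
from typing import Tuple

Edge = Tuple[int, str, int]


def make_edge(u: int, v: int, label: str = "") -> Edge:
    """Build a (u, label, v) triple; unlabeled edges get the empty string."""
    if not isinstance(u, int) or not isinstance(v, int):
        raise TypeError("node IDs must be integers")
    if not isinstance(label, str):
        raise TypeError("relationship label must be a string")
    return (u, label, v)


labeled = make_edge(46954, 47667, "ns0__keyEventTriggeredBy")
unlabeled = make_edge(1, 2)
print(labeled, unlabeled)
```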

add_node(node: int, **kwargs)

Add a node to GraphSAGE.

A node is simply an ID corresponding to a node in the Neo4j graph. Node features aren’t tied to the NetworkX digraph under GraphSAGE; instead, they are stored in _node_features.

Parameters:
node : int

A Neo4j node id

kwargs

property is_heterogeneous

Return True if graph is heterogeneous, False otherwise.

class comptox_ai.graph.NetworkXData(graph: DiGraph | None = None)

Attributes:
edges
is_heterogeneous
nodes

Methods

NetworkxJsonEncoder(*[, skipkeys, ...])

When encoding JSON, sets are converted to lists.

add_edge(edge)

Add one edge to the graph from a tuple.

add_edges(edges)

Add one or more edges to the graph from a list of tuples.

save_graph([format])

Save NetworkX representation of ComptoxAI's knowledge graph to disk in JSON "node-link" format.

add_node

add_nodes

class NetworkxJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

When encoding JSON, sets are converted to lists.
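The behaviour described above can be sketched with a json.JSONEncoder subclass. SetAwareEncoder is a hypothetical stand-in, not the ComptoxAI class itself, and sorting the set is an added choice to keep the output deterministic.

```python
import json


class SetAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes sets as JSON lists."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            # Sorting is an assumption made here for stable output; it requires
            # the elements to be mutually orderable.
            return sorted(o)
        # Let the base class raise TypeError for unsupported types.
        return super().default(o)


encoded = json.dumps({"LABELS": {"Resource", "ns0__Chemical"}}, cls=SetAwareEncoder)
print(encoded)  # {"LABELS": ["Resource", "ns0__Chemical"]}
```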

Methods

default(o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

encode(o)

Return a JSON string representation of a Python data structure.

iterencode(o[, _one_shot])

Encode the given object and yield each string representation as available.

default(o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)

encode(o)

Return a JSON string representation of a Python data structure.

>>> from json.encoder import JSONEncoder
>>> JSONEncoder().encode({"foo": ["bar", "baz"]})
'{"foo": ["bar", "baz"]}'

iterencode(o, _one_shot=False)

Encode the given object and yield each string representation as available.

For example:

for chunk in JSONEncoder().iterencode(bigobject):
    mysocket.write(chunk)

add_edge(edge: tuple)

Add one edge to the graph from a tuple.

The tuple should be formatted as follows:

(
    {ID of u},
    {relationship type},
    {ID of v},
    {dict of edge properties (leave empty if none)}
)

Parameters:
edge : tuple

Tuple containing edge data (see above for format specification).
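To make the 4-tuple format concrete, the sketch below converts such a tuple into an entry shaped like the 'links' items of the node-link JSON shown under save_graph. The edge_to_link helper is hypothetical, and merging the property dict into the link entry is an assumption about storage rather than documented behaviour; the "weight" property is likewise made up.

```python
def edge_to_link(edge):
    """Turn (u, relationship type, v, properties) into a node-link style dict."""
    u, rel_type, v, props = edge
    link = {"TYPE": rel_type, "source": u, "target": v}
    # Assumption: edge properties sit alongside the TYPE/source/target keys.
    link.update(props)
    return link


plain_edge = (46954, "ns0__keyEventTriggeredBy", 47667, {})
edge_with_props = (1, "ns0__keyEventTriggeredBy", 2, {"weight": 0.5})

print(edge_to_link(plain_edge))
print(edge_to_link(edge_with_props))
```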

add_edges(edges: List[tuple])

Add one or more edges to the graph from a list of tuples.

See also

add_edge

Add a single edge from a tuple

save_graph(format='')

Save NetworkX representation of ComptoxAI’s knowledge graph to disk in JSON “node-link” format.

Notes

Users should not need to interact with these JSON files directly, but for reference they should be formatted similarly to the following example:

{
    "directed": true,
    "multigraph": false,
    "graph": {},
    "nodes": [
        {
            "ns0__xrefPubchemCID": 71392231,
            "ns0__xrefPubChemSID": 316343675,
            "ns0__inchi": "InChI=1S/C8H12Cl2N4S2/c1-5(9)3-11-7(15)13-14-8(16)12-4-6(2)10/h1-4H2,(H2,11,13,15)(H2,12,14,16)",
            "ns0__xrefCasRN": "61784-89-2",
            "uri": "http://jdr.bio/ontologies/comptox.owl#chem_n1n2bis2chloroprop2en1ylhydrazine12dicarbothioamide",
            "ns0__xrefDtxsid": "DTXSID70814050",
            "ns0__inchiKey": "UAZDGQNKXGUQPD-UHFFFAOYSA-N",
            "LABELS": ["ns0__Chemical", "Resource", "owl__NamedIndividual"],
            "id": 0
        },
        ...
    ],
    "links": [
        {
            "TYPE": "ns0__keyEventTriggeredBy",
            "source": 46954,
            "target": 47667
        },
        ...
    ]
}

Notice that 'graph' is empty - the contents of the graph are entirely specified in the 'nodes' and 'links' lists.
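For reference, a file in this layout can be inspected with the standard json module. The sketch below parses a trimmed, hypothetical two-node document (not real ComptoxAI output) and tallies node labels and link types.

```python
import json
from collections import Counter

# A trimmed, hypothetical node-link document in the layout described above.
document = """
{
  "directed": true,
  "multigraph": false,
  "graph": {},
  "nodes": [
    {"id": 0, "LABELS": ["ns0__Chemical", "Resource", "owl__NamedIndividual"]},
    {"id": 1, "LABELS": ["Resource"]}
  ],
  "links": [
    {"TYPE": "ns0__keyEventTriggeredBy", "source": 0, "target": 1}
  ]
}
"""

data = json.loads(document)

# Count how often each label and each relationship type occurs.
label_counts = Counter(label for node in data["nodes"] for label in node["LABELS"])
link_types = Counter(link["TYPE"] for link in data["links"])

print(label_counts["Resource"], dict(link_types))
```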