Graph

class comptox_ai.graph.Graph(data: comptox_ai.graph.io.GraphDataMixin)

A graph representation of ComptoxAI data.

The internal data storage can be in several different formats, each of which has advantages in different scenarios.

Read more in the User Guide.

Parameters
data : comptox_ai.graph.io.GraphDataMixin

A graph data structure in one of the formats compliant with ComptoxAI’s standardized graph API.

Attributes
format : {“graphsage”, “networkx”, “neo4j”}

Internal format of the graph data. The format determines many aspects of how you interact with the graph, including the set of methods that can be called on it and the types of models that you can construct without first converting to another format.

Methods

add_edges(self, edges)

Add one or more edges to the graph.

add_nodes(self, nodes)

Add one or more nodes to the graph.

classes(self)

Get a list of ontology classes present in the graph.

convert(self, to_fmt)

Convert the graph data structure into the specified format.

edges(self)

Get all edges in the graph and return as an iterable of tuples.

from_dgl()

Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).

from_graphsage(prefix, directory)

Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.

from_neo4j(config_file, verbose)

Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.

from_networkx()

Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.

nodes(self)

Get all nodes in the graph and return as an iterable of tuples.

is_heterogeneous

node_id_map

add_edges(self, edges: Union[List[tuple], tuple])

Add one or more edges to the graph.

add_nodes(self, nodes: Union[List[tuple], tuple])

Add one or more nodes to the graph.

classes(self)

Get a list of ontology classes present in the graph.

convert(self, to_fmt: str)

Convert the graph data structure into the specified format.

The actual graph contained in a comptox_ai.Graph can be in a variety of different formats. When the user loads a graph

edges(self)

Get all edges in the graph and return as an iterable of tuples.

Returns
iterable

Iterable over tuples containing graph edge triples.

classmethod from_dgl()

Create a ComptoxAI graph, populating the contents from a DGL graph (not yet implemented).

Raises
NotImplementedError

classmethod from_graphsage(prefix: str, directory: str = None)

Create a new GraphSAGE data structure from files formatted according to the examples given in https://github.com/williamleif/GraphSAGE.

Parameters
prefix : str

The prefix used at the beginning of each file name (see Notes below for the format specification).

directory : str, default=None

The directory (fully specified or relative) containing the data files to load.

Notes

The parameters should point to files with the following structure:

{prefix}-G.json

JSON file containing a NetworkX ‘node link’ instance of the input graph. GraphSAGE usually expects there to be ‘val’ and ‘test’ attributes on each node indicating if they are part of the validation and test sets, but this isn’t enforced by ComptoxAI (at least not currently).

{prefix}-id_map.json

A JSON object that maps graph node ids (integers) to consecutive integers (0-indexed).

{prefix}-class_map.json (OPTIONAL)

A JSON object that maps graph node ids (integers) to a binary list of class membership (e.g., {2: [0, 0, 1, 0, 1]} means that node 2 is a member of classes 3 and 5). NOTE: While this is shown as a mandatory component of a dataset in GraphSAGE’s documentation, ComptoxAI does not enforce that. NOTE: The notion of a class in GraphSAGE is different from the notion of a class in heterogeneous network theory. Here, a ‘class’ is a label to be used in a supervised learning setting (such as classifying chemicals as likely carcinogens versus likely non-carcinogens).

{prefix}-feats.npy (OPTIONAL)

A NumPy ndarray containing numerical node features. NOTE: This serialization is currently not compatible with heterogeneous graphs, as GraphSAGE was originally implemented for homogeneous graphs only.

{prefix}-walks.txt (OPTIONAL)

A text file containing precomputed random walks along the graph. Each line is a pair of node integers (e.g., the second fields in the id_map file) indicating an edge included in random walks. The lines should be arranged in ascending order, starting with the first item in each pair.
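As a rough illustration of the layout described in these notes, the following sketch writes a minimal set of GraphSAGE-style files using only the standard library. The prefix `toy`, the three-node graph, and the helper names `write_graphsage_files` and `member_classes` are assumptions made for this example; they are not part of ComptoxAI, and the optional `-feats.npy` and `-walks.txt` files are omitted.

```python
import json
import os
import tempfile


def write_graphsage_files(directory, prefix):
    """Write minimal {prefix}-G.json, -id_map.json and -class_map.json files."""
    # Node-link graph with arbitrary (non-consecutive) integer node ids.
    graph = {
        "directed": True,
        "multigraph": False,
        "graph": {},
        "nodes": [{"id": 10}, {"id": 20}, {"id": 30}],
        "links": [{"source": 10, "target": 20}, {"source": 20, "target": 30}],
    }
    node_ids = [node["id"] for node in graph["nodes"]]
    # Map each node id to a consecutive, 0-indexed integer.
    # JSON object keys are always strings, so ids are stringified on disk.
    id_map = {str(node_id): index for index, node_id in enumerate(node_ids)}
    # Binary class membership per node (a node may belong to several classes).
    class_map = {"10": [1, 0], "20": [0, 1], "30": [1, 1]}

    contents = {"G": graph, "id_map": id_map, "class_map": class_map}
    paths = {}
    for suffix, payload in contents.items():
        path = os.path.join(directory, f"{prefix}-{suffix}.json")
        with open(path, "w") as handle:
            json.dump(payload, handle)
        paths[suffix] = path
    return paths


def member_classes(membership):
    """Return the 1-indexed classes flagged in a binary membership list."""
    return [position + 1 for position, flag in enumerate(membership) if flag]


with tempfile.TemporaryDirectory() as tmp:
    written = write_graphsage_files(tmp, "toy")
    file_names = sorted(os.path.basename(p) for p in written.values())
    with open(written["id_map"]) as handle:
        loaded_id_map = json.load(handle)
```

If files like these were produced for a real graph, the same prefix and directory would be the arguments to from_graphsage(prefix, directory).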

classmethod from_neo4j(config_file: str = None, verbose: bool = False)

Load a connection to a Neo4j graph database and use it to instantiate a comptox_ai.graph.io.Neo4j object.

NOTE: All we do here is create a driver for the graph database; the Neo4j constructor handles building the node index and other important attributes. This is different from most of the other formats, where the attributes are provided by the constructor.

Parameters
config_file : str, default=None

Path to a ComptoxAI configuration file. If None, ComptoxAI will search for a configuration file in the default location. For more information, refer to http://comptox.ai/docs/guide/building.html.

Raises
RuntimeError

If the data in the configuration file does not point to a valid Neo4j graph database.

classmethod from_networkx()

Create a new ComptoxAI graph from a JSON node-link graph file, storing the data as a NetworkX graph.

nodes(self)

Get all nodes in the graph and return as an iterable of tuples.

Returns
iterable

Iterable over 2-tuples containing graph nodes. The first element is the node’s integer ID and the second is the URI of that node (if available).

class comptox_ai.graph.Neo4jData(database: py2neo.database.Database, verbose: bool = False)

Internal representation of a connection to a Neo4j graph database containing ComptoxAI data.

Importantly, this data structure does not load the complete contents of the database into Python’s memory space. This places significantly less demand on system resources when not executing large queries or performing complex data manipulations. This representation can also offload a good deal of logic to Neo4j’s standard library when implementing various standardized operations.

The recommended way to instantiate this class is by calling comptox_ai.Graph.from_neo4j(), which handles establishing a database driver connection.

Parameters
driver : neo4j.Driver

A driver connected to a Neo4j graph database containing ComptoxAI data.

Attributes
edges
is_heterogeneous
nodes

Get a list of all nodes corresponding to a named individual in the ComptoxAI ontology.

Methods

add_edge(self, edge)

Add an edge to the graph and synchronize it to the remote database.

add_edges(self, edges)

Add a list of edges to the graph and synchronize them to the remote database.

add_node(self, node)

Add a node to the graph and synchronize it to the remote database.

add_nodes(self, nodes)

Add a list of nodes to the graph and synchronize them to the remote database.

node_labels(self)

Get all node labels from ns0.

run_query_in_session(self, query)

Submit a cypher query transaction to the connected graph database driver and return the response to the calling function.

save_graph

standardize_edge

standardize_node

add_edge(self, edge: tuple)

Add an edge to the graph and synchronize it to the remote database.

add_edges(self, edges: List[tuple])

Add a list of edges to the graph and synchronize them to the remote database.

add_node(self, node: tuple)

Add a node to the graph and synchronize it to the remote database.

Parameters
node : tuple of (int, label, **props)

Node to add to the graph.

add_nodes(self, nodes: List[tuple])

Add a list of nodes to the graph and synchronize them to the remote database.

node_labels(self)

Get all node labels from ns0.

Returns
set of str

Set of ontology labels (as strings) present in the graph schema.

property nodes

Get a list of all nodes corresponding to a named individual in the ComptoxAI ontology.

Returns
list of py2neo.Node

List of all Neo4j nodes corresponding to a named individual.

run_query_in_session(self, query: str)

Submit a cypher query transaction to the connected graph database driver and return the response to the calling function.

Parameters
query : str

String representation of the cypher query to be executed.

Returns
list of neo4j.Record

class comptox_ai.graph.NetworkXData(graph: networkx.classes.digraph.DiGraph = None)

Attributes
edges
is_heterogeneous
nodes

Methods

NetworkxJsonEncoder(*[, skipkeys, …])

When encoding JSON, sets are converted to lists.

add_edge(self, edge)

Add one edge to the graph from a tuple.

add_edges(self, edges)

Add one or more edges to the graph from a list of tuples.

save_graph(self[, format])

Save NetworkX representation of ComptoxAI’s knowledge graph to disk in JSON “node-link” format.

add_node

add_nodes

class NetworkxJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

When encoding JSON, sets are converted to lists.

Methods

default(self, o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

encode(self, o)

Return a JSON string representation of a Python data structure.

iterencode(self, o[, _one_shot])

Encode the given object and yield each string representation as available.

default(self, o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
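The set-to-list behavior described for NetworkxJsonEncoder can be approximated with a small `json.JSONEncoder` subclass. The class below, `SetToListEncoder`, is a hypothetical stand-in written for illustration; it is not ComptoxAI's actual implementation, and sorting the elements is an extra choice made here so that the output is deterministic.

```python
import json


class SetToListEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes sets as sorted lists."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            # Sets have no JSON equivalent, so emit a list instead.
            return sorted(o)
        # Let the base class raise TypeError for unsupported types.
        return super().default(o)


node = {"id": 0, "LABELS": {"Resource", "ns0__Chemical"}}
encoded = json.dumps(node, cls=SetToListEncoder)
```

Passing the encoder through the `cls` argument of `json.dumps` leaves every other type on the default serialization path.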
add_edge(self, edge: tuple)

Add one edge to the graph from a tuple.

The tuple should be formatted as follows:

(
    {ID of u},
    {relationship type},
    {ID of v},
    {dict of edge properties (leave empty if none)}
)

Parameters
edge : tuple

Tuple containing edge data (see above for format specification).
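To make the tuple layout concrete, here is a minimal sketch that builds and checks an edge tuple in that shape. The helper `make_edge` is hypothetical and exists only for this example, and the assumption that node IDs are integers is taken from the sample data in this reference rather than from a documented requirement.

```python
def make_edge(u, rel_type, v, props=None):
    """Build a (u, relationship type, v, properties) edge tuple."""
    if not isinstance(u, int) or not isinstance(v, int):
        raise TypeError("node IDs are assumed to be integers in this sketch")
    if not isinstance(rel_type, str):
        raise TypeError("relationship type must be a string")
    # An empty dict stands for "no edge properties".
    return (u, rel_type, v, dict(props or {}))


edge = make_edge(46954, "ns0__keyEventTriggeredBy", 47667)
```

A tuple like `edge` has the shape that add_edge describes; a list of such tuples would be the corresponding input for add_edges.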

add_edges(self, edges: List[tuple])

Add one or more edges to the graph from a list of tuples.

See also

add_edge

Add a single edge from a tuple

save_graph(self, format='')

Save NetworkX representation of ComptoxAI’s knowledge graph to disk in JSON “node-link” format.

Notes

Users should not need to interact with these JSON files directly, but for reference they should be formatted similarly to the following example:

{
    "directed": true,
    "multigraph": false,
    "graph": {},
    "nodes": [
        {
            "ns0__xrefPubchemCID": 71392231,
            "ns0__xrefPubChemSID": 316343675,
            "ns0__inchi": "InChI=1S/C8H12Cl2N4S2/c1-5(9)3-11-7(15)13-14-8(16)12-4-6(2)10/h1-4H2,(H2,11,13,15)(H2,12,14,16)",
            "ns0__xrefCasRN": "61784-89-2",
            "uri": "http://jdr.bio/ontologies/comptox.owl#chem_n1n2bis2chloroprop2en1ylhydrazine12dicarbothioamide",
            "ns0__xrefDtxsid": "DTXSID70814050",
            "ns0__inchiKey": "UAZDGQNKXGUQPD-UHFFFAOYSA-N",
            "LABELS": ["ns0__Chemical", "Resource", "owl__NamedIndividual"],
            "id": 0
        },
        ...
    ],
    "links": [
        {
            "TYPE": "ns0__keyEventTriggeredBy",
            "source": 46954,
            "target": 47667
        },
        ...
    ]
}

Notice that 'graph' is empty - the contents of the graph are entirely specified in the 'nodes' and 'links' lists.
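A node-link document of this shape can be inspected with the standard library alone. The sketch below parses a trimmed-down document and collects node labels and edge triples. The document itself is an assumption for illustration: most node properties are dropped, and the node IDs 0 and 1 and the link between them are made up.

```python
import json

# Trimmed-down node-link document; real files carry many more properties.
raw = """
{
    "directed": true,
    "multigraph": false,
    "graph": {},
    "nodes": [
        {"id": 0, "LABELS": ["ns0__Chemical", "Resource"]},
        {"id": 1, "LABELS": ["Resource"]}
    ],
    "links": [
        {"TYPE": "ns0__keyEventTriggeredBy", "source": 1, "target": 0}
    ]
}
"""

data = json.loads(raw)

# Index the label list of every node by its integer id.
labels_by_id = {node["id"]: node["LABELS"] for node in data["nodes"]}

# Represent each link as a (source, relationship type, target) triple.
triples = [(link["source"], link["TYPE"], link["target"]) for link in data["links"]]
```

Because the topology lives entirely in the two lists, no other key needs to be read to recover the nodes and edges.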

class comptox_ai.graph.GraphSAGEData(graph: networkx.classes.digraph.DiGraph, node_map: Iterable = None, edge_map: Iterable = None, node_classes: List[str] = None, edge_classes: List[str] = None, node_features: Union[numpy.ndarray, pandas.core.frame.DataFrame] = None, edge_features: Union[numpy.ndarray, pandas.core.frame.DataFrame] = None)

Internal representation of a GraphSAGE formatted graph/dataset.

This is essentially a NetworkX graph with a few extra data components needed to run the GraphSAGE algorithms. It also provides a more flexible way to work with node features, which are stored in a separate NumPy array. Unfortunately, that array isn’t natively compatible with heterogeneous graphs.

Parameters
graph : nx.DiGraph

A NetworkX directed graph containing the nodes and edges that define the topology of the ComptoxAI graph database. Nodes are identified by the ID assigned to them by Neo4j.

node_map : Iterable, default=None

An iterable where each element maps a Neo4j node id (int) to a consecutively numbered index, used to map nodes to columns of the (optional) matrix of node features. If None, a node map will be generated from scratch.

edge_map : Iterable, default=None

Currently not implemented.

node_classes : list of str, default=None

Membership for classes to be used in supervised learning tasks. NOTE: there is a semantic difference between the notion of ‘node classes’ in an ontology / graph database (which specifies the semantic type(s) of entities) versus in supervised learning (a target variable used to learn a decision function), although they may be equivalent in some settings.

edge_classes : list of str, default=None

Currently not implemented.

node_features : array-like, default=None

Array of node features.

edge_features : array-like, default=None

Array of edge features.
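The node_map parameter described above pairs each Neo4j node ID with a consecutive index. A minimal sketch of how such a map might be generated from scratch is shown below; the function name `build_node_map` and the choice to sort IDs before numbering are assumptions for illustration, not ComptoxAI's documented behavior.

```python
def build_node_map(node_ids):
    """Map each distinct node ID to a consecutive 0-indexed position."""
    # Sorting makes the numbering deterministic; duplicates are collapsed.
    return {node_id: index for index, node_id in enumerate(sorted(set(node_ids)))}


node_map = build_node_map([47667, 46954, 512, 46954])
```

The resulting indices run from 0 to one less than the number of distinct nodes, which is what lets them address positions in a feature matrix.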

Attributes
edges
is_heterogeneous

Return True if graph is heterogeneous, False otherwise.

nodes

Methods

add_edge(self, edge)

Add an edge to GraphSAGE.

add_node(self, node, **kwargs)

Add a node to GraphSAGE.

add_edges

add_nodes

save_graph

add_edge(self, edge: Tuple[int, str, int])

Add an edge to GraphSAGE.

The edge is a 3-tuple with the following format:

(
    {ID of u},
    {relationship label (str)},
    {ID of v}
)

If the edge does not have a label, you should use the empty string (‘’) as the second element of edge.

Parameters
edge : Tuple[int, str, int]

A tuple to add to the GraphSAGE dataset.
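The empty-string convention for unlabeled edges can be captured in a small typed helper. This is only a sketch: `to_graphsage_edge` and the `Edge` alias are hypothetical names introduced for this example and are not part of ComptoxAI.

```python
from typing import Optional, Tuple

# Alias matching the documented Tuple[int, str, int] edge shape.
Edge = Tuple[int, str, int]


def to_graphsage_edge(u: int, v: int, label: Optional[str] = None) -> Edge:
    """Build a (u, label, v) triple, using '' when there is no label."""
    return (u, label if label is not None else "", v)


labeled = to_graphsage_edge(46954, 47667, "ns0__keyEventTriggeredBy")
unlabeled = to_graphsage_edge(1, 2)
```

Normalizing a missing label to the empty string keeps every edge a uniform 3-tuple, so downstream code never has to special-case `None`.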

add_node(self, node: int, **kwargs)

Add a node to GraphSAGE.

A node is simply an ID corresponding to a node in the Neo4j graph. Node features aren’t tied to the NetworkX digraph under GraphSAGE; instead, they are stored in _node_features.

Parameters
node : int

A Neo4j node id.

kwargs :

property is_heterogeneous

Return True if graph is heterogeneous, False otherwise.