ComptoxAI Databases

ComptoxAI relies mainly on two databases:

  1. A graph database, implemented in Neo4j

  2. A feature database, implemented in MongoDB

Briefly, the graph database is designed to show the relationships between entities that are relevant to ComptoxAI, comprising a large, complex network structure. The feature database contains quantitative data tied to the entities that make up the graph database. We separate these into two databases largely for performance reasons - graph databases aren’t especially good at storing large quantities of numerical data for each entity, while relational and NoSQL databases (like MongoDB) don’t provide easy interaction with complex network structures.

The comptox_ai.db.GraphDB and comptox_ai.db.FeatureDB classes aim to make interacting with the two relatively painless. For example, you can extract a graph from the graph database, and in a single command, fetch the feature data corresponding to the entities in that graph.

Graph database software is changing constantly. We’re more than willing to consider migrating to a single database solution when a good option is available. If you think you know of a good alternative, let us know on GitHub.

class comptox_ai.db.FeatureDB(config_file=None, verbose=True)

A database of feature data for entities in ComptoxAI, implemented as a document store in MongoDB.

Parameters
config_file: str, default None

Relative path to a config file containing a “NEO4J” block, as described below. If None, ComptoxAI will look in the ComptoxAI root directory for either a “CONFIG.cfg” file or “CONFIG-default.cfg”, in that order. If no config file can be found in any of those locations, an exception will be raised.

verbose: bool, default True

Sets verbosity to on or off. If True, status information will be returned to the user occasionally.
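The lookup order described for config_file can be sketched as follows. This is a minimal illustration, not ComptoxAI's own code: the helper names find_config and read_block are hypothetical, the INI-style file format is an assumption based on the ".cfg" extension, and FileNotFoundError stands in for the unspecified exception type.

```python
import configparser
from pathlib import Path


def find_config(config_file=None, root="."):
    """Return the first config file found (hypothetical helper, not ComptoxAI API)."""
    if config_file is not None:
        candidates = [Path(config_file)]
    else:
        # Fall back to the two default names, in the documented order.
        candidates = [Path(root) / "CONFIG.cfg", Path(root) / "CONFIG-default.cfg"]
    for path in candidates:
        if path.is_file():
            return path
    # Assumption: the real exception type is not stated in the docs.
    raise FileNotFoundError(f"No config file found; tried: {candidates}")


def read_block(path, block):
    """Parse an INI-style file (assumed format) and return one block as a dict."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return dict(parser[block])
```

With both default files present in the root directory, find_config() returns "CONFIG.cfg", because it is checked first.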

Methods

is_connected(self)

Return True if the connection to MongoDB is active and valid.

fetch

get_corresponding_graph

is_connected(self)

Return True if the connection to MongoDB is active and valid.

Returns
bool

True if there is an active connection to MongoDB, otherwise False.

Notes

This performs a pretty crude test at the moment - basically, if the client can see any databases, the check passes. A more robust approach might be good to implement in the future.
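A check of the kind described in this note might look like the sketch below. It assumes a client object that behaves like pymongo.MongoClient, whose list_database_names() method returns the names of visible databases. The function name can_see_databases and the StubClient class are hypothetical and exist only so the example runs without a MongoDB server.

```python
def can_see_databases(client):
    """Crude connectivity check: passes if the client lists at least one database."""
    try:
        return len(client.list_database_names()) > 0
    except Exception:
        # Any failure to reach the server is treated as "not connected".
        return False


class StubClient:
    """Stand-in for a real MongoDB client (hypothetical, for illustration only)."""

    def __init__(self, names=None, fail=False):
        self._names = names or []
        self._fail = fail

    def list_database_names(self):
        if self._fail:
            raise ConnectionError("server unreachable")
        return list(self._names)


print(can_see_databases(StubClient(["admin", "comptox_ai"])))
print(can_see_databases(StubClient(fail=True)))
```

The weakness the note points to is visible here: a server that is reachable but holds no visible databases would also fail the check.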

class comptox_ai.db.GraphDB(config_file=None, verbose=True)

A Neo4j graph database containing ComptoxAI graph data.

Parameters
config_file: str, default None

Relative path to a config file containing a “NEO4J” block, as described below. If None, ComptoxAI will look in the ComptoxAI root directory for either a “CONFIG.cfg” file or “CONFIG-default.cfg”, in that order. If no config file can be found in any of those locations, an exception will be raised.

verbose: bool, default True

Sets verbosity to on or off. If True, status information will be returned to the user occasionally.

Methods

build_graph_cypher_projection(self, …[, …])

Create a new graph in the Neo4j Graph Catalog via a Cypher projection.

build_graph_native_projection(self, …[, …])

Create a new graph in the Neo4j Graph Catalog via a native projection.

drop_all_existing_graphs(self)

Delete all graphs currently stored in the GDS graph catalog.

drop_existing_graph(self, graph_name)

Delete a single graph from the GDS graph catalog by graph name.

fetch(self, field, operator, value[, what, …])

Create and execute a query to retrieve nodes, edges, or both.

fetch_edges(self, edges)

Fetch edges (relationships) from the Neo4j graph database.

fetch_nodes(self, nodes)

Fetch nodes from the Neo4j graph database.

get_features(self, graph)

Fetch arrays of features corresponding to entities in a graph from MongoDB.

list_existing_graphs(self)

Fetch a list of projected subgraphs stored in the GDS graph catalog.

run_cypher(self, qry_str[, verbose])

Evaluate a Cypher query on the Neo4j graph database.

build_graph_cypher_projection(self, graph_name, node_query, relationship_query, config_dict=None)

Create a new graph in the Neo4j Graph Catalog via a Cypher projection.

Examples

>>> g = GraphDB()
>>> g.build_graph_cypher_projection(...)

build_graph_native_projection(self, graph_name, node_proj, relationship_proj, config_dict=None)

Create a new graph in the Neo4j Graph Catalog via a native projection.

Parameters
graph_name: str

A (string) name for identifying the new graph. If a graph already exists with this name, a ValueError will be raised.

node_proj: str, list of str, or dict of

Node projection for the new graph. This can be either a single node label, a list of node labels, or a node projection in the format described under “Notes”.

Notes

ComptoxAI is meant to hide the implementation and usage details of graph databases from the user, but some advanced features do expose the syntax used in the Neo4j and MongoDB internals. This is especially true when building graph projections in the graph catalog. The following components

NODE PROJECTIONS:

(corresponding argument: `node_proj`)

Node projections take the following format:

{
  <node-label-1>: {
      label: <neo4j-label>,
      properties: <node-property-mappings>
  },
  <node-label-2>: {
      label: <neo4j-label>,
      properties: <node-property-mappings>
  },
  // ...
  <node-label-n>: {
      label: <neo4j-label>,
      properties: <node-property-mappings>
  }
}

where node-label-i is a name for a node label in the projected graph (it can be the same as or different from the label already in Neo4j), neo4j-label is a node label to match against in the graph database, and node-property-mappings are filters against Neo4j node properties, as defined below.
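As a rough illustration of the format above, the sketch below expands a single label or a list of labels into the full dict form. The helper expand_node_projection is hypothetical and not part of ComptoxAI. Using an empty dict for properties is an assumption, made here because the property mapping syntax is not spelled out on this page.

```python
def expand_node_projection(node_proj):
    """Normalize a label, a list of labels, or a projection dict to the dict form."""
    if isinstance(node_proj, dict):
        # Already in the full format; return a shallow copy unchanged.
        return dict(node_proj)
    if isinstance(node_proj, str):
        node_proj = [node_proj]
    # Each label maps to itself, with no property mappings (assumed default).
    return {label: {"label": label, "properties": {}} for label in node_proj}


# Labels taken from the example on this page.
simple = expand_node_projection(["Gene", "StructuralEntity"])

# A projection whose label in the projected graph differs from the Neo4j label.
# "GeneNode" is a made-up name for illustration.
renamed = expand_node_projection({"GeneNode": {"label": "Gene", "properties": {}}})

print(simple)
print(renamed)
```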

NODE PROPERTY MAPPINGS:

RELATIONSHIP PROJECTIONS:

Examples

>>> g = GraphDB()
>>> g.build_graph_native_projection(
...     graph_name="g1",
...     node_proj=["Gene", "StructuralEntity"],
...     relationship_proj="*"
... )

drop_all_existing_graphs(self)

Delete all graphs currently stored in the GDS graph catalog.

Returns
list

A list of dicts describing the graphs that were dropped as a result of calling this method. Each dict follows the same format as the elements of the list returned by list_existing_graphs().

drop_existing_graph(self, graph_name)

Delete a single graph from the GDS graph catalog by graph name.

Parameters
graph_name: str

A name of a graph, corresponding to the ‘graphName’ field in the graph’s entry within the GDS graph catalog.

Returns
dict

A dict describing the graph that was dropped as a result of calling this method. The dict follows the same format as the elements of the list returned by list_existing_graphs().

fetch(self, field, operator, value, what='both', register_graph=True, negate=False, query_type='cypher', **kwargs)

Create and execute a query to retrieve nodes, edges, or both.

Parameters
field: str

A property label.

what: {‘both’, ‘nodes’, ‘edges’}

The type of objects to fetch from the graph database. Note that this functions independently from any subgraph registered in Neo4j during query execution - if register_graph is True, an induced subgraph will be registered in the database, but the components returned by this method call may be only the nodes or edges contained in that subgraph.

filter: str

‘Cypher-like’ filter statement, equivalent to a WHERE clause used in a Neo4j Cypher query (analogous to SQL WHERE clauses).

query_type: {‘cypher’, ‘native’}

Whether to create a graph using a Cypher projection or a native projection. The ‘standard’ approach is to use a Cypher projection, but native projections can be (a) faster and (b) easier to use when creating very large subgraphs (e.g., all nodes of several or more types across all of ComptoxAI). See “Notes”, below, for more information, as well as https://neo4j.com/docs/graph-data-science/current/management-ops/graph-catalog-ops/#catalog-graph-create.
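To make the relationship between the field, operator, value, and negate arguments and a Cypher WHERE clause more concrete, here is a minimal sketch. It is not how fetch() is implemented: the helper build_where, the node variable "n", the list of accepted operators, and the property name commonName are all assumptions chosen for illustration. Production code would normally pass values as query parameters rather than formatting them into the query string.

```python
def build_where(field, operator, value, negate=False, var="n"):
    """Assemble a Cypher WHERE clause from fetch()-style arguments (hypothetical)."""
    allowed = {"=", "<>", "<", ">", "<=", ">=", "CONTAINS", "STARTS WITH", "ENDS WITH"}
    if operator not in allowed:
        raise ValueError(f"Unsupported operator: {operator!r}")
    if isinstance(value, bool):
        # Cypher booleans are lowercase; check bool before int, since bool is an int.
        literal = "true" if value else "false"
    elif isinstance(value, (int, float)):
        literal = repr(value)
    elif isinstance(value, str):
        # Escape backslashes and single quotes before wrapping in quotes.
        literal = "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    else:
        raise TypeError(f"Unsupported value type: {type(value).__name__}")
    condition = f"{var}.{field} {operator} {literal}"
    if negate:
        condition = f"NOT ({condition})"
    return f"WHERE {condition}"


print(build_where("commonName", "=", "Aspirin"))
print(build_where("commonName", "CONTAINS", "acid", negate=True))
```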

fetch_edges(self, edges)

Fetch edges (relationships) from the Neo4j graph database.

fetch_nodes(self, nodes)

Fetch nodes from the Neo4j graph database.

get_features(self, graph)

Fetch arrays of features corresponding to entities in a graph from MongoDB.

list_existing_graphs(self)

Fetch a list of projected subgraphs stored in the GDS graph catalog.

Returns
list

A list of graphs in the GDS graph catalog. If no graphs exist, this will be the empty list [].
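Because build_graph_native_projection raises a ValueError when a graph name is already taken, it can be useful to check the catalog listing first. The sketch below works on a plain list of dicts carrying the ‘graphName’ field mentioned above. The helper names are hypothetical, and the sample catalog is made-up data that stands in for a real call to list_existing_graphs().

```python
def graph_exists(catalog, graph_name):
    """Return True if any catalog entry has the given 'graphName'."""
    return any(entry.get("graphName") == graph_name for entry in catalog)


def require_free_name(catalog, graph_name):
    """Raise ValueError if the name is already present in the catalog listing."""
    if graph_exists(catalog, graph_name):
        raise ValueError(f"A graph named {graph_name!r} already exists")
    return graph_name


# Illustrative stand-in for the list returned by list_existing_graphs().
catalog = [{"graphName": "g1"}]

print(graph_exists(catalog, "g1"))
print(graph_exists([], "g1"))  # an empty catalog contains no graphs
```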

run_cypher(self, qry_str, verbose=False)

Evaluate a Cypher query on the Neo4j graph database.

Parameters
qry_str: str

A string containing the Cypher query to run on the graph database server.

Returns
list

The data returned in response to the Cypher query.

Examples

>>> from comptox_ai.db import GraphDB
>>> g = GraphDB()
>>> g.run_cypher("MATCH (c:Chemical) RETURN COUNT(c) AS num_chems;")
[{'num_chems': 719599}]
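
Since the result is a list of dicts keyed by the names in the RETURN clause, pulling out a single value takes one more step. The helper below is a hypothetical convenience, not part of ComptoxAI, and the rows variable simply copies the output shown in the example above instead of querying a live database.

```python
def column(rows, key):
    """Collect one named column from a list of record dicts."""
    return [row[key] for row in rows]


# Result copied from the run_cypher example above.
rows = [{"num_chems": 719599}]

# A COUNT query yields exactly one row, so take the first element.
num_chems = column(rows, "num_chems")[0]
print(num_chems)
```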