3. Graph Databases¶
ComptoxAI’s graph database is meant to completely replace traditional relational databases used for computational toxicology data. Rather than storing data in tables with keys linking rows of one table to rows of another, a graph database stores entities as nodes and the relationships that join entities as edges, resulting in a network-like data structure known to computer scientists as a graph.
3.1. Building and installing the database locally¶
(These instructions are incomplete - stay tuned for more info)
First, make sure you have the prerequisites installed.
Building the complete database consists of 4 steps:
Populating the ComptoxAI ontology with individuals for the core graph node types.
Importing the OWL file into Neo4j.
Cleaning up the imported data.
Merging other data into the graph database from external sources.
We’ll walk through these steps individually:
3.1.1. Populating the ontology with individuals¶
The core ontology for ComptoxAI is already provided as part of the
main code distribution, and can be found at data/comptox.owl
. We
use the ontology in the graph database build process because doing so
lets us control the consistency of node labels, data types, and
relationship labels in the final graph database (Neo4j provides no
out-of-box methods for doing so, which is both a strength and a
limitation of the software). The process relies upon the following
rough equivalencies between a populated OWL ontology and a Neo4j-style
graph database:
OWL Ontology |
Neo4j-style graph database |
---|---|
Ontology |
Graph database |
Class |
Node label |
Individual |
Node |
Object property |
Relationship |
Data property |
Node data |
3.1.2. Importing the ontology into Neo4j¶
Now that we have the ontology (preliminarily) populated with individuals and object properties that will become nodes and relationships in the graph database, we can use utility functions from APOC (Neo4j’s standard library) to actually parse the OWL file and load individuals into the (currently empty) graph database.
3.2. Connecting to our instance of the database¶
For users who don’t need (or want) to install the full copy of the database on their own computer, they can access a (possibly slightly out-of-date) version of the database hosted on our server and available for public use. This can be done by tweaking your configuration file to point to our public IP address:
Hosted database configuration settings
Contents of CONFIG.cfg:
[NEO4J]
Username = neo4j
Password = neo4j
Hostname = XXX.XXX.XXX.XXX <--- we'll update this soon!
Protocol = bolt
Port = 7687