# 1. Computational Toxicology

Computational toxicology is a field of research that has existed since roughly the mid-1980s, but only in the past decade has it seen significant growth and attention. Briefly, computational toxicology can be defined as the use of computational and informatics techniques to make discoveries regarding the toxic effects of chemicals on organisms and the environment. The term predictive toxicology is often used synonymously with computational toxicology.

## 1.1. Methods used in computational toxicology

Before ComptoxAI, the set of computational tools available to toxicologists was relatively limited. Let’s review the most common of these with examples and intuitive explanations:

### 1.1.1. Quantitative Structure-Activity Relationship (QSAR) modeling

Quantitative structure-activity relationship (QSAR) modeling is the process of building predictive models that accept structured descriptions of molecular structure and predict a particular measure of activity. This predictive model can be anything you want, but common choices include logistic regression, random forests, support vector machines, and (increasingly) artificial neural networks. The general form is:

$$\hat{y} = f(x) + \epsilon$$

where $x$ is a vector representing a chemical structure, $\hat{y}$ is a predicted activity value, $\epsilon$ is an arbitrary error term, and $f$ is the predictive model.
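As a rough illustration of the general form, here is a minimal sketch in which the predictive model is a logistic regression fit by batch gradient descent. Everything in it is an assumption made for illustration: the four binary features, the six toy chemicals, the activity labels, and the helper names `fit_logistic` and `predict_proba` are invented, and none of it comes from ComptoxAI. In practice you would more likely reach for an established machine learning library than hand-written gradient descent.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the chemical described by x is active."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data (made up): each row is a chemical, each column a binary
# structural feature. Here a chemical is "active" exactly when the
# first feature is present.
X = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
predictions = [int(predict_proba(w, b, x) >= 0.5) for x in X]
print(predictions)
```

The fitted model plays the role of the predictive model in the general form: it takes a feature vector for a chemical and returns a predicted activity. Checking predictions on the training data, as done here, only confirms that the fit ran; a real analysis would evaluate on held-out chemicals.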

Input features for a QSAR model are usually one of two types:

• Fingerprints: A sequence of binary features where each feature represents presence/absence of a particular structural characteristic. Some examples are whether the chemical contains an aromatic ring, contains a disulfide bond, or has fewer than 3 oxygen atoms. There are a number of different standardized fingerprints, including MACCS, PubChem fingerprints, and others. We like to use MACCS, which has 166 binary features and tends to perform best on most of the tasks we have tested it on. MACCS fingerprints are available for every chemical in ComptoxAI.

• Descriptors: Continuous values characterizing a chemical structure, including molecular weight, logP (the octanol-water partition coefficient), number of aromatic rings, etc.

Depending on the predictive model you use, it is often effective to mix fingerprints and molecular descriptors in the same analysis.
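One way to mix the two feature types is to concatenate each chemical's fingerprint bits with its descriptor values, after rescaling the descriptors so that large-valued ones (such as molecular weight) do not dominate the binary features. The sketch below assumes exactly that approach. The chemical names, bit patterns, and descriptor values are made up, the 4-bit fingerprints stand in for a real 166-bit MACCS fingerprint, and `standardize_columns` is a hypothetical helper.

```python
from statistics import mean, pstdev

# Hypothetical chemicals: names, fingerprint bits, and descriptor values
# (molecular weight, logP) are invented for illustration.
chemicals = {
    "chem_A": {"fingerprint": [1, 0, 1, 0], "descriptors": [180.2, 1.2]},
    "chem_B": {"fingerprint": [0, 1, 1, 0], "descriptors": [94.1, 1.5]},
    "chem_C": {"fingerprint": [1, 1, 0, 1], "descriptors": [312.4, 4.8]},
}


def standardize_columns(rows):
    """Scale each column to zero mean and unit variance.

    Constant columns (standard deviation of 0) are mapped to 0.0.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


names = list(chemicals)
scaled = standardize_columns([chemicals[n]["descriptors"] for n in names])

# One combined feature vector per chemical: fingerprint bits first,
# then the standardized descriptors.
features = {
    n: chemicals[n]["fingerprint"] + scaled_row
    for n, scaled_row in zip(names, scaled)
}

for name, vector in features.items():
    print(name, [round(v, 3) for v in vector])
```

Whether rescaling matters depends on the model: tree-based models such as random forests are largely insensitive to feature scale, while logistic regression, support vector machines, and neural networks usually benefit from it.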

#### 1.1.1.1. QSAR variations

You’ll often see other, similar terms used in place of QSAR. These usually aim to more precisely describe the type of activity being measured, but are all mechanistically the same thing. For example:

• QSPR (Quantitative structure-property relationship) predicts a chemical property.

• QSTR (Quantitative structure-toxicity relationship) predicts a measure of toxicity.

For simplicity’s sake, we usually just refer to everything as QSAR; it’s a good catch-all term.

#### 1.1.1.2. Is QSAR quantitative?

Scientists sometimes argue about whether a QSAR analysis is truly quantitative or whether it is instead qualitative. For example, when the predictive task is to determine whether an assay is active or inactive, some researchers say that this outcome is qualitative. We take the view that this is a dichotomous outcome that can be represented as a binary (1/0 or True/False) variable, and is therefore quantitative. If you are predicting a continuous quantity (e.g., lethal dose of a chemical), this is clearly quantitative no matter who you ask. But in the end, we feel that the difference is minor and not worth worrying about extensively.

### 1.1.2. Read-across

From a high level, read-across is a rather simple technique: To predict an unknown property of interest in a certain chemical, you gather a list of similar chemicals where the property is known, and extrapolate the likely value of that property based on the distribution of that property in the related chemicals. This can be done qualitatively (by simply assembling a table of chemicals and visually ‘reading across’ the table to inspect patterns) or quantitatively (by analyzing the trends in the property using statistical or computational techniques).

Read-across can be performed in a one-to-one, one-to-many, many-to-one, or many-to-many fashion, depending on your available data and other factors. Additionally, you can determine the unknown value(s) by interpolation (e.g., placing a chemical with a length-5 carbon chain between similar chemicals that have length-4 and length-6 carbon chains, and predicting a property halfway between the two known values) or extrapolation.
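The carbon-chain interpolation example can be sketched in a few lines. This is a minimal illustration under stated assumptions: the property values are made up, linear interpolation is only one of several ways to combine analogues, and the function names `interpolate_read_across` and `many_to_one_read_across` are hypothetical.

```python
from statistics import mean


def interpolate_read_across(target_x, analogues):
    """Linearly interpolate a property from the two nearest bracketing analogues.

    analogues: list of (x, value) pairs, where x is the structural variable
    being read across (here, carbon chain length) and value is the known
    property for that analogue.
    """
    below = [a for a in analogues if a[0] <= target_x]
    above = [a for a in analogues if a[0] >= target_x]
    if not below or not above:
        # The target is not bracketed by known analogues, so this would be
        # extrapolation, which this sketch deliberately does not attempt.
        raise ValueError("target lies outside the range of the analogues")
    lower = max(below, key=lambda a: a[0])
    upper = min(above, key=lambda a: a[0])
    if lower[0] == upper[0]:
        return lower[1]  # an analogue matches the target exactly
    fraction = (target_x - lower[0]) / (upper[0] - lower[0])
    return lower[1] + fraction * (upper[1] - lower[1])


def many_to_one_read_across(analogue_values):
    """Many-to-one read-across: summarize several analogues with their mean."""
    return mean(analogue_values)


# Made-up property values for chemicals with length-4 and length-6 chains.
analogues = [(4, 2.0), (6, 3.0)]

interpolated = interpolate_read_across(5, analogues)
averaged = many_to_one_read_across([value for _, value in analogues])
print(interpolated, averaged)
```

With only two analogues that are equally distant from the target, interpolation and averaging give the same answer. They diverge once the analogues are unevenly spaced or more than two are available.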

### 1.1.3. Trend Analysis

Trend analysis is similar to QSAR, but rather than predicting activity, you instead collect a group of chemicals already known to have a certain type of activity and build a predictive model that estimates trends in a chemical property of interest. For example, you can use trend analysis to predict the binding affinity of aryl hydrocarbon receptor agonists to the AHR protein. To do this, you need to (a) already have a set of chemicals known to bind to AHR, and (b) have measured binding affinities for a subset of those chemicals, which are used to build the predictive model. The trained model is then used to “fill in the gaps” for chemicals whose binding affinity has not been measured.
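The “fill in the gaps” step can be sketched as follows, assuming the simplest possible trend model: an ordinary least-squares line relating a single descriptor to binding affinity. The ligand names, descriptor values, and affinities are invented for illustration, and `fit_line` is a hypothetical helper. A real analysis would typically use many features and a more flexible model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical chemicals all known to bind the receptor. Each entry is
# (descriptor value, measured affinity); None marks a missing measurement.
binders = {
    "ligand_1": (1.0, 5.0),
    "ligand_2": (2.0, 6.0),
    "ligand_3": (3.0, None),
    "ligand_4": (4.0, 8.0),
    "ligand_5": (5.0, None),
}

# Train only on the subset with measured affinities.
measured = [(x, y) for x, y in binders.values() if y is not None]
slope, intercept = fit_line([x for x, _ in measured], [y for _, y in measured])

# Keep measured values as they are; predict only the missing ones.
filled = {
    name: (y if y is not None else slope * x + intercept)
    for name, (x, y) in binders.items()
}

for name, affinity in filled.items():
    print(name, round(affinity, 3))
```

Note that `ligand_5` lies outside the range of the measured descriptor values, so its estimate is an extrapolation of the trend and deserves less confidence than the estimate for `ligand_3`.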