Datasets

The database is currently hosted on the following remote file share server: https://bwsyncandshare.kit.edu/s/98NgmGCDty54kik/download/

Available Datasets

The following datasets can be downloaded either through the command line interface (CLI) or directly through the Python API.

| Name | Category | Description | No. Elements | Target Type |
|---|---|---|---|---|
| MCF_7 | organic | - | 26776 | regression |
| _price_small | organic | - | 80000 | |
| _test | organic | - | 3 | regression |
| _test2 | organic | - | 3 | regression |
| ames | organic | Ames Mutagenicity Assays | 6512 | classification |
| aqsoldb | organic | Aqueous Solubility | 9889 | regression |
| bace_cls | organic | Beta Secretase 1 Inhibitors | 1513 | classification |
| bace_reg | organic | Beta Secretase 1 Inhibitors | 1513 | classification |
| bbbp | organic | Blood-Brain Barrier Penetration | 1934 | classification |
| beet | organic | Honey Bee Toxicity | 254 | classification |
| bl_chembl_cls | organic | Briem & Lessel ChEMBL Dataset (Multi-Target Classification) | 8780 | classification |
| bl_chembl_reg | organic | Briem & Lessel ChEMBL Dataset (Multi-Target Regression) | 52484 | regression |
| clintox | organic | Clinical Toxicity | 1465 | classification |
| compas_1x | organic | Cata-Condensed Polybenzenoid Hydrocarbons (COMPAS-1x) | 34072 | regression |
| compas_3x | organic | Peri-Condensed Polybenzenoid Hydrocarbons (COMPAS-3x) | 39482 | regression |
| dpp4 | organic | DPP-4 inhibitors | 3933 | classification |
| dud_e | organic | DUD-E (Directory of Useful Decoys - Enhanced) Multi-Target Classification | 400040 | classification |
| elanos_bp | organic | - | 5431 | regression |
| elanos_vp | organic | - | 2704 | regression |
| electrum_oxstate | tmc | ELECTRUM CSD Oxidation State Classification | 39166 | classification |
| esol | organic | Water Solubility | 1127 | regression |
| freesolv | organic | Hydration Free Energy | 639 | regression |
| half_life | organic | Half-Life Biotransformation | 892 | regression |
| hiv | organic | HIV Inhibitors | 38040 | classification |
| hopv15_exp | organic | Harvard Organic Photovoltaic Dataset | 175 | regression |
| lipophilicity | organic | Octanol-Water Distribution Coefficient | 4199 | regression |
| muv | organic | MUV (Maximum Unbiased Validation) Multi-Target Classification | 93111 | classification |
| open_melting_point | organic | Melting Point | 27965 | regression |
| pcqm4mv2 | organic | - | 3378606 | regression |
| qm9 | organic | DFT properties of small molecules | 134000 | regression |
| qm9_smiles | organic | QM9 SMILES Dataset | 133882 | regression |
| riniker_1 | organic | RDKit Benchmarking Platform Subset I (Multi-Target Classification) | 168326 | classification |
| riniker_1_filtered | organic | RDKit Benchmarking Platform Subset I Filtered (Difficulty-Filtered Multi-Target Classification) | 105294 | classification |
| riniker_2 | organic | RDKit Benchmarking Platform Subset II (Multi-Target Classification) | 20922 | classification |
| sider | organic | Drug Side Effects | 1220 | classification |
| skin_irritation | organic | Skin Irritation and Corrosion | 1263 | classification |
| skin_sensitizers | organic | Skin Sensitization Hazard | 1263 | classification |
| synth_binary_global | organic | - | 249455 | classification |
| synth_binary_local | organic | - | 249455 | regression |
| tadf | organic | - | 460199 | regression |
| tmqm | tmc | tmQM Transition Metal Complex QM Properties | 100847 | regression |
| tmqmg | tmc | tmQMg Transition Metal Complex QM Properties | 63466 | regression |
| tox21 | organic | Toxicology | 7570 | classification |
| toxcast | organic | Toxicology | 6842 | classification |
| zinc250k | organic | ZINC250K Subset of Drug-like Molecules | 249455 | regression |
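
When choosing which dataset to download, it can help to filter the listing above by category, target type, and size. The following is a minimal, illustrative sketch in plain Python. It does not use the package's CLI or Python API. The `DatasetInfo` class, the `CATALOG` list, and the `select_datasets` helper are hypothetical names introduced only for this example, and `CATALOG` holds just a handful of rows copied from the table, not the full listing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetInfo:
    """One row of the dataset table (hypothetical helper type)."""
    name: str
    category: str
    num_elements: int
    target_type: Optional[str]  # None where the table lists no target type


# A few rows copied by hand from the table above (not the full listing).
CATALOG: List[DatasetInfo] = [
    DatasetInfo("clintox", "organic", 1465, "classification"),
    DatasetInfo("esol", "organic", 1127, "regression"),
    DatasetInfo("freesolv", "organic", 639, "regression"),
    DatasetInfo("tmqm", "tmc", 100847, "regression"),
    DatasetInfo("electrum_oxstate", "tmc", 39166, "classification"),
    DatasetInfo("_price_small", "organic", 80000, None),
]


def select_datasets(
    catalog: List[DatasetInfo],
    category: Optional[str] = None,
    target_type: Optional[str] = None,
    max_elements: Optional[int] = None,
) -> List[str]:
    """Return the names of all entries matching every filter that is set."""
    names = []
    for info in catalog:
        if category is not None and info.category != category:
            continue
        if target_type is not None and info.target_type != target_type:
            continue
        if max_elements is not None and info.num_elements > max_elements:
            continue
        names.append(info.name)
    return names


# Small regression datasets, e.g. for a quick experiment.
small_regression = select_datasets(
    CATALOG, target_type="regression", max_elements=2000
)
```

The names returned this way correspond to the Name column, which is the identifier to pass to whichever download interface is used.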