Datasets
The database is currently hosted on the following remote file share server: https://bwsyncandshare.kit.edu/s/98NgmGCDty54kik/download/
Available Datasets
The following datasets can be downloaded either with the command line interface (CLI) or directly through the Python API.
| Name | Category | Description | No. of Elements | Target Type |
|---|---|---|---|---|
| MCF_7 | organic | - | 26776 | regression |
| _price_small | organic | - | 80000 | |
| _test | organic | - | 3 | regression |
| _test2 | organic | - | 3 | regression |
| ames | organic | Ames Mutagenicity Assays | 6512 | classification |
| aqsoldb | organic | Aqueous Solubility | 9889 | regression |
| bace_cls | organic | Beta Secretase 1 Inhibitors | 1513 | classification |
| bace_reg | organic | Beta Secretase 1 Inhibitors | 1513 | classification |
| bbbp | organic | Blood-Brain Barrier Penetration | 1934 | classification |
| beet | organic | Honey Bee Toxicity | 254 | classification |
| bl_chembl_cls | organic | Briem & Lessel ChEMBL Dataset (Multi-Target Classification) | 8780 | classification |
| bl_chembl_reg | organic | Briem & Lessel ChEMBL Dataset (Multi-Target Regression) | 52484 | regression |
| clintox | organic | Clinical Toxicity | 1465 | classification |
| compas_1x | organic | Cata-Condensed Polybenzenoid Hydrocarbons (COMPAS-1x) | 34072 | regression |
| compas_3x | organic | Peri-Condensed Polybenzenoid Hydrocarbons (COMPAS-3x) | 39482 | regression |
| dpp4 | organic | DPP-4 Inhibitors | 3933 | classification |
| dud_e | organic | DUD-E (Directory of Useful Decoys - Enhanced) Multi-Target Classification | 400040 | classification |
| elanos_bp | organic | - | 5431 | regression |
| elanos_vp | organic | - | 2704 | regression |
| electrum_oxstate | tmc | ELECTRUM CSD Oxidation State Classification | 39166 | classification |
| esol | organic | Water Solubility | 1127 | regression |
| freesolv | organic | Hydration Free Energy | 639 | regression |
| half_life | organic | Half-Life Biotransformation | 892 | regression |
| hiv | organic | HIV Inhibitors | 38040 | classification |
| hopv15_exp | organic | Harvard Organic Photovoltaic Dataset | 175 | regression |
| lipophilicity | organic | Octanol-Water Distribution Coefficient | 4199 | regression |
| muv | organic | MUV (Maximum Unbiased Validation) Multi-Target Classification | 93111 | classification |
| open_melting_point | organic | Melting Point | 27965 | regression |
| pcqm4mv2 | organic | - | 3378606 | regression |
| qm9 | organic | DFT Properties of Small Molecules | 134000 | regression |
| qm9_smiles | organic | QM9 SMILES Dataset | 133882 | regression |
| riniker_1 | organic | RDKit Benchmarking Platform Subset I (Multi-Target Classification) | 168326 | classification |
| riniker_1_filtered | organic | RDKit Benchmarking Platform Subset I Filtered (Difficulty-Filtered Multi-Target Classification) | 105294 | classification |
| riniker_2 | organic | RDKit Benchmarking Platform Subset II (Multi-Target Classification) | 20922 | classification |
| sider | organic | Drug Side Effects | 1220 | classification |
| skin_irritation | organic | Skin Irritation and Corrosion | 1263 | classification |
| skin_sensitizers | organic | Skin Sensitization Hazard | 1263 | classification |
| synth_binary_global | organic | - | 249455 | classification |
| synth_binary_local | organic | - | 249455 | regression |
| tadf | organic | - | 460199 | regression |
| tmqm | tmc | tmQM Transition Metal Complex QM Properties | 100847 | regression |
| tmqmg | tmc | tmQMg Transition Metal Complex QM Properties | 63466 | regression |
| tox21 | organic | Toxicology | 7570 | classification |
| toxcast | organic | Toxicology | 6842 | classification |
| zinc250k | organic | ZINC250K Subset of Drug-like Molecules | 249455 | regression |
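When choosing a dataset for an experiment, it can help to treat the table above as data and filter it by category, target type, and size. The snippet below is a minimal sketch of that idea: it copies a few rows of the table into plain Python records. `DatasetInfo` and `select` are hypothetical helpers written for illustration only; they are not part of the package's Python API, and the row subset is just an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    """One row of the dataset table (hypothetical helper, not the package API)."""
    name: str
    category: str
    description: str
    num_elements: int
    target_type: str


# A hand-copied subset of the table above; extend it with further rows as needed.
DATASETS = [
    DatasetInfo("aqsoldb", "organic", "Aqueous Solubility", 9889, "regression"),
    DatasetInfo("bbbp", "organic", "Blood-Brain Barrier Penetration", 1934, "classification"),
    DatasetInfo("clintox", "organic", "Clinical Toxicity", 1465, "classification"),
    DatasetInfo("esol", "organic", "Water Solubility", 1127, "regression"),
    DatasetInfo("freesolv", "organic", "Hydration Free Energy", 639, "regression"),
    DatasetInfo("tmqm", "tmc", "tmQM Transition Metal Complex QM Properties", 100847, "regression"),
    DatasetInfo("electrum_oxstate", "tmc", "ELECTRUM CSD Oxidation State Classification", 39166, "classification"),
]


def select(datasets, category=None, target_type=None, max_elements=None):
    """Return the datasets matching all given filters, smallest first.

    A filter set to None is ignored.
    """
    result = []
    for ds in datasets:
        if category is not None and ds.category != category:
            continue
        if target_type is not None and ds.target_type != target_type:
            continue
        if max_elements is not None and ds.num_elements > max_elements:
            continue
        result.append(ds)
    # Sort by size so the quickest-to-train candidates come first.
    return sorted(result, key=lambda ds: ds.num_elements)


# Example: small organic regression datasets with at most 5000 elements.
small_regression = select(DATASETS, category="organic", target_type="regression", max_elements=5000)
print([ds.name for ds in small_regression])
```

For the subset above, this prints the names `freesolv` and `esol`. Keeping the metadata in frozen dataclasses makes the records hashable and safe to share, while sorting by `num_elements` keeps quick-to-run candidates at the front of the list.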