📁 Dataset
The OpenBioLink2021 dataset is a highly challenging benchmark containing about 4.5 million high-quality biomedical facts from various renowned biomedical knowledge bases. The dataset was split randomly with a ratio of 90-5-5.
| # Train | # Valid | # Test | # Entities | # Relations |
|---|---|---|---|---|
| 4,192,002 | 186,301 | 180,964 | 180,992 | 28 |
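
As a quick sanity check, the split sizes in the table can be summed and turned into shares of the whole. This is a minimal sketch: the three counts are copied from the table, while the variable names are purely illustrative and not part of any OpenBioLink API.

```python
# Split sizes copied from the table above.
splits = {"train": 4_192_002, "valid": 186_301, "test": 180_964}

# Total number of facts across all three splits.
total = sum(splits.values())

# Share of each split relative to the total.
shares = {name: count / total for name, count in splits.items()}

print(f"total triples: {total:,}")
for name, share in shares.items():
    print(f"{name:>5}: {share:.2%}")
```

With these counts the total comes to 4,559,267 facts, which matches the "about 4.5 million" figure quoted above.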
The dataset can be downloaded from Zenodo (KGID_HQ_DIR.zip) or loaded with the provided Python dataloader module, which is further documented here. Please make sure to obtain the dataset from one of these two sources, as other versions of OpenBioLink may differ.
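
After downloading the archive from Zenodo, it can be useful to look at its contents before extracting anything. The sketch below uses only the Python standard library and does not rely on the OpenBioLink dataloader. The helper name `list_archive_members` is hypothetical, and the archive is assumed to sit in the current working directory; nothing is assumed about the files inside it.

```python
import zipfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the member names of a zip archive without extracting it."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


# Assumption: the Zenodo download was saved under its original name
# in the current working directory.
archive_path = Path("KGID_HQ_DIR.zip")

if archive_path.exists():
    for name in list_archive_members(archive_path):
        print(name)
else:
    print(f"{archive_path} not found; download it from Zenodo first.")
```

Listing the members first keeps the check cheap and avoids writing anything to disk until the download is confirmed to be the expected archive.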