Participation
Please submit your approaches early; you can always overwrite your submission!
Note that your team name will be made publicly available in our leaderboards together with your model performance. For the awardees of each dataset, the team member information will also be made publicly available.
The code of all submissions has to be made public via a GitHub repository. It should include instructions on how to run the code from beginning to end (i.e., how to train a model, perform hyperparameter searches and evaluate it), as we/the community will rerun the experiments of the awardees. Everything should be reproducible. Examples of how we expect the repository to look are our embedding baseline repository and symbolic baseline repository.
Evaluation
⚠️ For approaches that rely heavily on random number generators (such as most embedding approaches), we require three submissions: for each submission, the model should be trained with a different random state (seed). All three models have to be trained with the same hyperparameters (if any).
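As a rough sketch of what this means in practice, the helper below trains the same model three times with identical hyperparameters and only the seed changing; `train_model` and the hyperparameter values are placeholders for your own training setup, not part of the challenge tooling.

```python
import random

import numpy as np
import torch

def train_three_seeds(train_model, hparams, seeds=(11, 22, 33)):
    """Train the same model three times: identical hyperparameters, different seeds.

    train_model is a placeholder callable for your own training routine; it is
    expected to accept a seed plus the hyperparameters and return a trained model.
    """
    models = []
    for seed in seeds:
        # Fix every random number generator the training code might rely on.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        models.append(train_model(seed=seed, **hparams))
    return models
```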
Protocol and metric
All models are evaluated using the hits@10 metric, which is formally defined in the following. Let \(\mathcal{K}\) be the set of all triples within the dataset, \(\mathcal{K}^{\text{test}}\) be the set of test triples and \(\mathcal{E}\) be the set of entities in \(\mathcal{K}\). Given a triple \((h, r, t)\) of \(\mathcal{K}^{\text{test}}\), the rank of \((t \vert h,r)\) is the filtered rank of object \(t\), i.e. the rank of the model score \(s(h,r,t)\) among the collection of all pseudo-negative-object scores

\[
\{\, s(h,r,t') \mid t' \in \mathcal{E},\ (h,r,t') \notin \mathcal{K} \,\}.
\]
The rank of \((h \vert r,t)\) is defined likewise over pseudo-negative-subject scores. hits@10 is then calculated as

\[
\text{hits@10} = \frac{1}{2\,\lvert \mathcal{K}^{\text{test}} \rvert} \sum_{(h,r,t) \in \mathcal{K}^{\text{test}}} \Big( \mathbb{1}\big[\operatorname{rank}(h \vert r,t) \le 10\big] + \mathbb{1}\big[\operatorname{rank}(t \vert h,r) \le 10\big] \Big),
\]

where \(\mathbb{1}[\cdot]\) is the indicator function.
To avoid artificially boosting performance: the score of the positive triple \(s(h,r,t)\) must not be treated differently from the scores of the pseudo-negative triples. Thus, every approach should generate the ranking of entities according to the random or ordinal policy for dealing with same-score entities (for further information, see page 4 of Knowledge Graph Embedding for Link Prediction: A Comparative Analysis; Rossi et al.).
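To make the protocol concrete, here is a minimal sketch of how a filtered rank under the random tie policy and the resulting hits@10 could be computed; the function names and the `known_mask` argument are our own illustration and not part of the provided evaluator.

```python
import numpy as np

def filtered_rank(scores, true_idx, known_mask, rng):
    """Filtered rank of the true entity under the 'random' tie policy.

    scores     : (num_entities,) model scores for every candidate entity
    true_idx   : index of the true entity of the test triple
    known_mask : boolean mask, True for entities that form a known triple in K
                 (other than the test triple itself); these are filtered out
    rng        : numpy random Generator used to break ties
    """
    true_score = scores[true_idx]
    candidates = scores[~known_mask]  # pseudo-negatives plus the true entity itself
    better = int(np.sum(candidates > true_score))
    ties = int(np.sum(candidates == true_score)) - 1  # equally scored pseudo-negatives
    # Random policy: place the positive uniformly at random among its ties
    # instead of always ranking it ahead of equally scored pseudo-negatives.
    return 1 + better + int(rng.integers(0, ties + 1))

def hits_at_10(head_ranks, tail_ranks):
    """hits@10 averaged over both prediction directions."""
    ranks = np.concatenate([np.asarray(head_ranks), np.asarray(tail_ranks)])
    return float(np.mean(ranks <= 10))
```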
⚠️ Two major things to take away here:
- The score of the positive triple \(s(h,r,t)\) must not be treated differently from the scores of the pseudo-negative triples.
- Entities with the same score must be ranked according to the random or ordinal policy.
Evaluator (Python package)
To create a submission for a model, our provided evaluator OBL2021Evaluator has to be used; it evaluates predictions in a standardized way. Detailed documentation of it can be found here.
First you have to prepare:
top10_heads
: torch.Tensor or numpy.array of shape (num_testing_triplets, 10). Top 10 ranked predictions for the head entity. The value at (i,j) is the ID of the predicted head entity with rank j+1 for the triple dl.testing[i].

top10_tails
: torch.Tensor or numpy.array of shape (num_testing_triplets, 10). Top 10 ranked predictions for the tail entity. The value at (i,j) is the ID of the predicted tail entity with rank j+1 for the triple dl.testing[i].
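As an illustration of how these two tensors could be assembled from per-entity model scores, the sketch below uses torch.topk; `score_heads` and `score_tails` are placeholders for your own model, and the assumption that dl.testing is a (num_testing_triplets, 3) tensor of (h, r, t) IDs is ours.

```python
import torch

def build_top10(dl, score_heads, score_tails):
    """Assemble the two (num_testing_triplets, 10) prediction tensors.

    score_heads(r, t) and score_tails(h, r) are placeholders for your model;
    each is expected to return one score per entity, shape (num_entities,).
    Assumes dl.testing is a (num_testing_triplets, 3) tensor of (h, r, t) IDs.
    """
    heads, tails = [], []
    for h, r, t in dl.testing.tolist():
        heads.append(torch.topk(score_heads(r, t), k=10).indices)
        tails.append(torch.topk(score_tails(h, r), k=10).indices)
    return torch.stack(heads), torch.stack(tails)
```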
You can then run the evaluation by calling the evaluator's eval(...) function with top10_heads, top10_tails and the set of test triples, which can be retrieved from the Dataset module via dl.testing. The eval(...) function creates a submission file in the location the script is started from and displays the calculated hits@10 metric, which you also have to submit.
Minimal working example
The following code produces the score and the file needed for submission. The file is generated in the location from where the script is started. For the submission, you need to copy and paste the line that starts with {'h10': ... to the submission form.
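The original snippet is not reproduced here; the following is a minimal sketch of such a script, assuming the challenge package exposes an OBL2021Dataset with testing and num_entities attributes alongside the documented OBL2021Evaluator. The random predictions are only a stand-in for a real model.

```python
import torch
from obl2021 import OBL2021Dataset, OBL2021Evaluator  # assumed import path

dl = OBL2021Dataset()   # assumed to load the challenge dataset and its splits
ev = OBL2021Evaluator()

num_test = dl.testing.shape[0]

# Dummy "model": ten random entity IDs per test triple.
# Replace these with the top-10 head/tail predictions of your own model.
top10_heads = torch.randint(0, dl.num_entities, (num_test, 10))
top10_tails = torch.randint(0, dl.num_entities, (num_test, 10))

# Writes the submission file to the current working directory and prints the
# line starting with {'h10': ...} that has to be pasted into the form.
ev.eval(top10_heads, top10_tails, dl.testing)
```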
The above code should result in the following output: