# 🚀 Participation

Note that your team name will be made publicly available on our leaderboards together with your model performance. For the awardees of each dataset, the team member information will also be made publicly available.

The code of all submissions has to be made public via a GitHub repository. It should include instructions on how to run the code from beginning to end (i.e., how to train a model, perform hyperparameter searches, and evaluate it), as we/the community will rerun the experiments of the awardees. Everything should be reproducible. Examples of how we expect a repository to look are our embedding baseline repository and symbolic baseline repository.

## Evaluation

 ⚠️ For approaches that rely heavily on random number generators (such as most embedding approaches), we require three submissions: for each submission, the model should be trained with a different random state (seed). All three models have to be trained with the same hyperparameters (if any).

### Protocol and metric

All models are evaluated using the hits@10 metric, which is formally defined as follows. Let $$\mathcal{K}$$ be the set of all triples within the dataset, $$\mathcal{K}^{\text{test}}$$ be the set of test triples, and $$\mathcal{E}$$ be the set of entities in $$\mathcal{K}$$. Given a triple $$(h, r, t)$$ of $$\mathcal{K}^{\text{test}}$$, the rank of $$(t \vert h,r)$$ is the filtered rank of the object $$t$$, i.e. the rank of the model score $$s(h,r,t)$$ among the collection of all pseudo-negative object scores

$\{s(h,r,t'): t' \in \mathcal{E} \:\text{and}\: (h,r,t') \notin \mathcal{K}\}$

Define the rank of $$(h \vert r,t)$$ likewise. Then hits@10 is calculated as:

$\text{Hits@10} = \frac{1}{2 \vert \mathcal{K}^{\text{test}} \vert} \sum_{(h,r,t) \in \mathcal{K}^{\text{test}}} \Big( 1(\text{rank}(t \vert h,r) \leq 10) + 1(\text{rank}(h \vert r,t) \leq 10) \Big)$
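The filtered ranking above can be sketched in a few lines of NumPy. This is an illustration only: the scores, true tails, and filter mask below are random stand-ins rather than outputs of a real model, and only the tail-side half of the metric is computed; the full metric averages the head and tail sides.

```python
import numpy as np

rng = np.random.default_rng(0)
num_test, num_entities = 20, 50

# Toy stand-ins (illustration only): random model scores s(h, r, t') for every
# candidate tail t', a random true tail per test triple, and a random filter mask.
scores = rng.standard_normal((num_test, num_entities))
true_tails = rng.integers(0, num_entities, size=num_test)
known = rng.random((num_test, num_entities)) < 0.05   # (h, r, t') already in K
known[np.arange(num_test), true_tails] = False        # never filter the test triple itself

def filtered_ranks(scores, true_tails, known, rng):
    """Filtered rank of the true tail, with random tie-breaking (rank 1 = best)."""
    true_scores = scores[np.arange(len(scores)), true_tails]
    candidates = ~known                               # pseudo-negatives plus the true tail
    better = (scores > true_scores[:, None]) & candidates
    tied = np.isclose(scores, true_scores[:, None]) & candidates
    tied[np.arange(len(scores)), true_tails] = False  # the true tail is not its own tie
    # Random policy: the true tail lands uniformly within its tie group.
    return 1 + better.sum(axis=1) + rng.integers(0, tied.sum(axis=1) + 1)

ranks = filtered_ranks(scores, true_tails, known, rng)
hits10_tail = (ranks <= 10).mean()  # tail-side half; average with the head side for Hits@10
print(hits10_tail)
```

With random scores the metric lands near 10 divided by the number of unfiltered candidates, which is the baseline any real model should beat.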

To avoid artificially boosting performance, the score of the positive triple $$s(h,r,t)$$ must not be treated differently from the scores of the pseudo-negative triples. Every approach should therefore generate the ranking of entities according to the random or ordinal policy for dealing with same-score entities. (For further information, see page 4 of "Knowledge Graph Embedding for Link Prediction: A Comparative Analysis", Rossi et al.)

 ⚠️ Two major things to take away here:

• The number of negative evaluation triples is equal to the number of entities in the dataset (corruption with all entities).
• Do not use the top policy (inserting the correct entity at place one of a group of same-score entities).

### Evaluator (Python package)

To create a submission for a model, our provided evaluator OBL2021Evaluator has to be used; it evaluates predictions in a standardized way. Detailed documentation of it can be found here.

First you have to prepare:

• top10_heads: torch.Tensor or numpy.array of shape (num_testing_triplets, 10) — the top-10 ranked predictions for the head entity. The value at (i, j) is the ID of the predicted head entity with rank j+1 for the triple dl.testing[i].
• top10_tails: torch.Tensor or numpy.array of shape (num_testing_triplets, 10) — the top-10 ranked predictions for the tail entity. The value at (i, j) is the ID of the predicted tail entity with rank j+1 for the triple dl.testing[i].
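Constructing these two arrays from per-triple score matrices can be sketched as follows. The random score matrices are stand-ins; a real model would fill them with $$s(h', r, t)$$ and $$s(h, r, t')$$ for every candidate entity.

```python
import numpy as np

rng = np.random.default_rng(0)
num_testing_triplets, num_entities = 8, 100

# Stand-in score matrices: one row per test triple, one column per candidate entity.
head_scores = rng.standard_normal((num_testing_triplets, num_entities))
tail_scores = rng.standard_normal((num_testing_triplets, num_entities))

# Column j holds the entity ID ranked j+1 (highest score first).
top10_heads = np.argsort(-head_scores, axis=1)[:, :10]
top10_tails = np.argsort(-tail_scores, axis=1)[:, :10]
print(top10_heads.shape, top10_tails.shape)  # (8, 10) (8, 10)
```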

You can then run the evaluation by calling the evaluator's eval(...) function with top10_heads, top10_tails, and the set of test triples, which can be retrieved from the dataset module as dl.testing. The eval(...) function creates a submission file in the location the script is started from and displays the calculated hits@10 metric, which you also have to submit.

## Minimal working example

The following code produces the score and the file needed for submission. The file is generated in the location from which the script is started. For the submission, copy the line that starts with {'h10': ... into the submission form.

The above code should result in the following output:
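The original listing is not reproduced here, so the following is a minimal sketch of the flow described above. The module and class names `obl2021`, `OBL2021DataLoader`, and the `dl.num_entities` attribute are assumptions inferred from the evaluator name; check the evaluator documentation for the real API. The "model" is a random stand-in, so the reported score is meaningless.

```python
import numpy as np

def random_top10(num_triples, num_entities, seed=0):
    """Stand-in 'model': random scores over all entities, then the top-10 IDs per triple."""
    rng = np.random.default_rng(seed)
    top10 = []
    for _ in range(2):  # head-side and tail-side predictions
        scores = rng.standard_normal((num_triples, num_entities))
        top10.append(np.argsort(-scores, axis=1)[:, :10])
    return top10[0], top10[1]

try:
    # Assumed module/class names -- verify against the evaluator documentation.
    from obl2021 import OBL2021DataLoader, OBL2021Evaluator
    dl = OBL2021DataLoader()
    top10_heads, top10_tails = random_top10(len(dl.testing), dl.num_entities)
    OBL2021Evaluator().eval(top10_heads, top10_tails, dl.testing)  # writes the submission file
except ImportError:
    # Package not installed: just demonstrate the expected array shapes.
    top10_heads, top10_tails = random_top10(num_triples=16, num_entities=200)
    print(top10_heads.shape, top10_tails.shape)
```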