DeepFake technology is developing fast, and realistic face-swaps are increasingly deceptive and hard to detect. At the same time, DeepFake detection methods are also improving. This creates a two-party game between DeepFake creators and detectors. We are organizing this competition to provide a common platform for benchmarking the adversarial game between current state-of-the-art DeepFake creation and detection methods. The expected outcomes are a comprehensive study of the current state of the DeepFake adversarial game and support for the research community in building better defenses against DeepFakes together.


The dataset used for this competition is CelebDF-v2, a large, high-quality DeepFake dataset released at CVPR 2020. The available annotations are the real/fake label of each video and the triplet metadata for each DeepFake video in the dataset. The triplet metadata records the target-donor-fake relationship, in which a donor person’s face is swapped onto the target person to create the final fake result.

This competition strictly follows the train/test split protocol of the CelebDF dataset.

Please follow the dataset request procedure on the dataset's GitHub site to download the CelebDF-v2 dataset beforehand.

Competition Protocols

The competition is divided into two adversarial tracks, DeepFake creation and DeepFake detection, which are evaluated against each other.

For the DeepFake creation track, participants are required to create face-swap results for 1000 specified target images in the CelebDF testing set, according to CelebDF’s triplet metadata, and then submit these 1000 face-swap images to the competition platform for evaluation. Face-swap submissions will be checked for validity by our evaluation programs and examined by the organizers at the end to rule out potential cheating. See the specific evaluation methods in the following sections.

For the DeepFake detection track, participants are required to develop their detection methods or models using ONLY the CelebDF training set. NO external dataset is permitted for method development, and the organizers will check solution reproducibility at the end. Detection models and inference code should then be submitted to the competition platform for evaluation. See the specific evaluation methods in the following sections.

The organizers will provide baseline implementations for training a DeepFake detection model on the CelebDF-v2 dataset. The baseline data for DeepFake creation are the existing CelebDF-v2 fake images in the test set.

Competition Platform

A code competition platform will be used, e.g. CodaLab. More information will be released later.

Evaluation Criteria

Each submission to the DeepFake creation track will first be checked by our face recognition model to make sure the face-swapped image has adequate ID similarity with the donor face. It will also be checked with similarity metrics (e.g. SSIM) to make sure the face-swapped image is adequately similar to the original target image in content and quality. The face-swap submission will then be fed to the DeepFake detection models submitted in the other track to evaluate its ability to fool them. The final result for a DeepFake creation submission will be a weighted score that combines the ID similarity, the content similarity, and the ability to fool detection models.
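As a rough illustration of how such a weighted score could be combined, here is a minimal Python sketch. The weights, the validity thresholds, and the names `CreationResult` and `creation_score` are all hypothetical; the organizers have not published the actual formula, so this only shows the general shape of a validity check followed by a weighted sum.

```python
from dataclasses import dataclass


@dataclass
class CreationResult:
    """Per-submission measurements, each assumed to be normalized to [0, 1]."""
    id_similarity: float       # similarity between the swapped face and the donor face
    content_similarity: float  # e.g. SSIM between the swapped image and the target image
    fooling_rate: float        # fraction of detector decisions that were fooled


def creation_score(result, w_id=0.25, w_content=0.25, w_fool=0.5,
                   min_id=0.3, min_content=0.5):
    """Combine the three measurements into one score.

    All weights and thresholds are illustrative assumptions,
    not values taken from the competition rules.
    """
    # Validity check: a submission that fails either similarity test scores zero.
    if result.id_similarity < min_id or result.content_similarity < min_content:
        return 0.0
    return (w_id * result.id_similarity
            + w_content * result.content_similarity
            + w_fool * result.fooling_rate)


if __name__ == "__main__":
    example = CreationResult(id_similarity=0.8, content_similarity=0.9, fooling_rate=0.5)
    print(creation_score(example))
```

Under these assumed thresholds, a heavily post-processed image whose content similarity drops too low would score zero no matter how well it fools the detectors.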

Each submission to the DeepFake detection track will be used to classify the fake images submitted in the DeepFake creation track against real images in the CelebDF test set. Classification metrics (e.g. AUC-ROC) will be used to evaluate the submitted detection models.

Note that after the competition ends, the organizers will re-check the top competitors’ solutions to rule out any potential violations or cheating. We reserve the right to invalidate the results of competitors who violate the rules.
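For readers unfamiliar with AUC-ROC, the following is a minimal pure-Python sketch of the metric, computed as the probability that a randomly chosen fake image receives a higher score than a randomly chosen real image. It assumes that label 1 means fake, label 0 means real, and that a higher score means "more likely fake"; the function name `auc_roc` is hypothetical, and the actual evaluation code may differ.

```python
def auc_roc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0 (real) or 1 (fake)  -- assumed label convention
    scores: iterable of detector outputs, higher = more likely fake
    """
    fake_scores = [s for y, s in zip(labels, scores) if y == 1]
    real_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not fake_scores or not real_scores:
        raise ValueError("AUC-ROC needs at least one fake and one real sample")

    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0   # fake correctly ranked above real
            elif f == r:
                wins += 0.5   # ties count as half
    return wins / (len(fake_scores) * len(real_scores))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(auc_roc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice in practice.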


Top competitors will be invited to co-author the competition summary paper, which will be submitted to IJCB’21.
In addition, prize money will be awarded to the top three competitors in both tracks.
First Place: $2,000
Second Place: $1,200
Third Place: $800


Chinese Academy of Sciences, Institute of Automation, China.
— Dr. Bo Peng
— Prof. Wei Wang
— Prof. Jing Dong
— Prof. Qi Li
— Prof. Zhenan Sun

Ocean University of China.
— Dr. Yuezun Li

University at Buffalo, State University of New York, USA.
— Prof. Siwei Lyu


Tianjin Academy for Intelligent Recognition Technologies

Alibaba Security (Contact: yitong.yyt@alibaba-inc.com)


This competition is funded by the National Key Research and Development Program of China (2020AAA0140003).


Q: Can a team take part in both tracks?
A: Yes. The same participants can enroll in both the creation and detection tracks at the same time.

Q: What is the evaluation frequency?
A: The intended evaluation frequency is one round per day; that is, we will re-run the adversarial evaluation between the two tracks once every day.

Q: How many submissions can be made?
A: To ensure fairness, each participant can select at most two submissions to be evaluated in each evaluation round for either track. This prevents potential biases caused by repeated submissions of identical or very similar face-swap data or detection models.

Q: Can the creation track submit post-processed Celeb-DF v2 test images?
A: DeepFake creation participants are encouraged to create brand-new realistic face-swap results. However, you are also allowed to generate results based on the current CelebDF fake images (i.e. the baseline data). This includes, but is not limited to, post-processing, enhancing, retouching, or adding adversarial noise to the current data, as long as the results meet our requirements on the image content/quality metrics (note that SSIM, for example, may decrease greatly when images are post-processed too harshly).

Q: Can the detection track use the Celeb-DF v2 test set for model training and validation?
A: Absolutely not. Using the test set for model training or model selection (validation) is forbidden. Please act as if you do not have the Celeb-DF v2 test split when you are working on the detection track. Top solutions will be strictly checked for reproducibility after the competition.
Using re-created or post-processed data for training is permitted, as long as the data are derived solely from the CelebDF training set.

Q: Can the detection track use pretrained models obtained on ImageNet or the DFDC dataset?
A: Yes, as long as they are publicly available to all participants and have been online since before 2021.2.8 24:00 Beijing Time. This includes any public pretrained models from ImageNet or other image recognition datasets. Public pretrained models from the DFDC dataset are also allowed, e.g. selimsef’s model.
Note that using publicly available pretrained models is permitted, but using any publicly available (or unavailable) dataset other than the Celeb-DF v2 train set for model training, finetuning, or any other purpose is not allowed.

Q: What counts as a potential violation or cheating behavior?
A: The following are considered violations:
— Off-line sharing of any kind between different teams.
— For the creation track, cheating or adversarial attacks on our automatic checking algorithms for result ID and result content/quality. These will be ruled out by the final check.
— For the detection track, using extra training data outside the Celeb-DF v2 train set, or using the Celeb-DF v2 test set in the training phase.
— Any other behavior determined to be a violation by the organizers.