Driven by big data and deep convolutional neural networks (CNNs), face recognition performance is becoming comparable to that of humans. Using private large-scale training datasets, several groups have achieved very high accuracy on LFW, i.e., 97% to 99%. While there are many open-source implementations of CNNs, no large-scale face dataset is publicly available. The current situation in face recognition is that data is more important than algorithms. To solve this problem, we propose a semi-automatic way to collect face images from the Internet and build a large-scale dataset containing 10,575 subjects and 494,414 images, called CASIA-WebFace. To the best of our knowledge, this dataset ranks second in size in the literature, smaller only than Facebook's private dataset (SFC). We encourage data-hungry methods to be trained on this dataset and to report performance on LFW.
The statistics of the proposed CASIA-WebFace dataset are shown in Table 1. Except for Facebook's SFC dataset, CASIA-WebFace has the largest scale. Because of user privacy concerns, SFC may never be opened to the research community. The features of Microsoft's WDRef dataset have been publicly available since 2012, but it is inflexible for advanced research. Among the datasets listed in the table, CASIA-WebFace+LFW is the most suitable combination for large-scale face recognition in the wild. If you feel that accuracy on LFW has been saturated by current state-of-the-art methods, BLUFR is a more challenging protocol for reporting your results.
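As a rough illustration of how the headline statistics above (subjects, images, images per subject) could be checked against a downloaded copy, here is a minimal Python sketch. It assumes, hypothetically, that the images are stored with one sub-folder per subject under a single root directory; the function names, the `IMAGE_SUFFIXES` set, and the folder layout are illustrative assumptions, not part of the official release.

```python
from pathlib import Path

# Assumption: image files use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def images_per_subject(root):
    """Count images in each subject folder.

    Assumes a hypothetical layout of root/<subject_id>/<image files>.
    Returns a dict mapping folder name to image count.
    """
    counts = {}
    for subject_dir in sorted(Path(root).iterdir()):
        if subject_dir.is_dir():
            counts[subject_dir.name] = sum(
                1
                for f in subject_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def summarize(counts):
    """Return (number of subjects, total images, mean images per subject)."""
    subjects = len(counts)
    total = sum(counts.values())
    mean = total / subjects if subjects else 0.0
    return subjects, total, mean


# Figures quoted in the text: 10,575 subjects and 494,414 images.
PUBLISHED_SUBJECTS = 10_575
PUBLISHED_IMAGES = 494_414
# Works out to a little under 47 images per subject on average.
published_mean = PUBLISHED_IMAGES / PUBLISHED_SUBJECTS
```

Calling `summarize(images_per_subject(path_to_dataset))` on a local copy and comparing the result with the published figures is one simple way to confirm that a download is complete.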
Publication and Results:
To illustrate the quality of CASIA-WebFace, we train a deep CNN on it and compare its accuracy to state-of-the-art methods such as DeepFace and DeepID2. Please refer to the following technical report for details.
♦ Dong Yi, Zhen Lei, Shengcai Liao and Stan Z. Li, “Learning Face Representation from Scratch”. arXiv preprint arXiv:1411.7923. 2014. (pdf)
The above reference should be cited in all documents and papers that report experimental results based on the CASIA WebFace database.
Download Instructions:
To apply for the database, please follow the steps below:
1. Download and print the document Agreement for using CASIA WebFace database.
2. Sign the agreement.
3. Send the agreement to firstname.lastname@example.org.
4. After one day, check your email for a login account and password for our website, which are sent if your application has been approved.
5. Download the CASIA WebFace database from our website with the authorized account within 48 hours.
Copyright Note and Contacts:
The database is released for research and educational purposes. We hold no liability for any undesirable consequences of using the database. All rights of the CASIA WebFace database are reserved.
 LFW, http://vis-www.cs.umass.edu/lfw/ (LFW test set)
 D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. “Bayesian face revisited: A joint formulation”. In ECCV 2012, pages 566–579. Springer, 2012.
 Y. Sun, X. Wang, and X. Tang. “Deep learning face representation by joint identification-verification”. arXiv preprint arXiv:1406.4773, 2014.
 Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. “Deepface: Closing the gap to human-level performance in face verification”. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1701–1708. IEEE, 2014.
 CARC, http://bcsiriuschen.github.io/CARC/