APSIPA ASC 2024

Multibiometrics Using a Single Face Image

Koichi Ito, Taito Tonosaki, Takafumi Aoki, Tetsushi Ohki, and Masakatsu Nishigaki

Graduate School of Information Sciences, Tohoku University, Japan & Faculty of Informatics, Shizuoka University, Japan

Abstract

Multibiometrics, which authenticates individuals by combining multiple biometric traits rather than relying on a single trait, has been investigated as a way to improve recognition performance. Previous studies have either combined individually acquired biometric traits or have not fully considered the convenience of the system. Focusing on a single face image, we propose a novel multibiometric method that combines five biometric traits, i.e., face, iris, periocular, nose, and eyebrow, all of which can be extracted from a single face image. The proposed method does not sacrifice the convenience of biometrics since only a single face image is used as input. Through a variety of experiments using the CASIA Iris Distance database, we demonstrate the effectiveness of the proposed multibiometric method.

The Challenge & The Solution

Traditional Problem

Standard biometric systems face a dilemma: increasing accuracy by combining traits (such as face, fingerprint, and iris) typically requires multiple sensors, which makes the system more expensive, more complex, and less convenient for the user. While face recognition is highly convenient, its performance can degrade with aging, cosmetics, or pose changes. Fingerprint and iris recognition, though more stable, require dedicated sensors, so combining them reduces convenience and increases system cost.


Proposed Solution

Our method uses just one face image to extract five biometric traits. This maintains the high convenience of face recognition while leveraging the power of multibiometrics to achieve superior accuracy and stability. This approach addresses the limitations of previous multibiometric systems that either combined individually acquired traits or did not fully consider user convenience.


Figure 1: Biometric traits extracted from a single face image, whose combination is considered in this paper.

How It Works: A 3-Step Process

The core of our method is a streamlined process that goes from a single image to a robust authentication score: extract the biometric regions, analyze each with a dedicated CNN, and fuse the resulting matching scores.


Figure 2: Overview of the proposed multibiometrics approach using a single face image.

1. Trait Extraction

Using Mediapipe FaceMesh, keypoints are detected on the face and used to automatically crop images of the five biometric traits: face, periocular, iris, nose, and eyebrow. Face images are normalized for rotation, and iris images are further extracted from periocular images using a dedicated method. A minimal cropping sketch follows Figure 4.


Figure 3: Keypoints extracted using Mediapipe FaceMesh.


Figure 4: Flow of iris extraction from a periocular image.
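To make the cropping step concrete, here is a minimal Python sketch built on Mediapipe FaceMesh. It is not the implementation used in the paper: the landmark-index sets for each region and the crop margin are illustrative assumptions, and the dedicated iris-extraction step of Figure 4 is not reproduced.

```python
import cv2
import numpy as np
import mediapipe as mp

# Hypothetical landmark-index sets per region; the paper does not list its
# exact indices, so these are illustrative picks from the FaceMesh topology.
REGIONS = {
    "left_periocular": [33, 133, 159, 145],     # around the left eye
    "nose":            [1, 2, 98, 327],         # nose tip and wings
    "left_eyebrow":    [70, 63, 105, 66, 107],  # left brow arc
}

def crop_regions(image_bgr, margin=0.15):
    """Detect FaceMesh keypoints and crop one bounding box per region."""
    h, w = image_bgr.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                         max_num_faces=1) as face_mesh:
        result = face_mesh.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return {}
    lm = result.multi_face_landmarks[0].landmark  # normalized (x, y) in [0, 1]
    crops = {}
    for name, idxs in REGIONS.items():
        xs = np.array([lm[i].x for i in idxs]) * w
        ys = np.array([lm[i].y for i in idxs]) * h
        # Pad the tight landmark box by a relative margin (assumed value).
        mx = margin * (xs.max() - xs.min())
        my = margin * (ys.max() - ys.min())
        x0, x1 = int(max(xs.min() - mx, 0)), int(min(xs.max() + mx, w))
        y0, y1 = int(max(ys.min() - my, 0)), int(min(ys.max() + my, h))
        crops[name] = image_bgr[y0:y1, x0:x1]
    return crops
```

In the paper, the cropped face image is also rotation-normalized before feature extraction; that alignment step is omitted here for brevity.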

2. Feature Extraction

Specialized convolutional neural networks (CNNs) are used for feature extraction: ResNet-18 trained with ArcFace loss for face recognition, and SE-ResNet-18 trained with cross-entropy (CE) loss for periocular, iris, nose, and eyebrow recognition. Data augmentation techniques are applied during training. A sketch of the ArcFace head follows Figure 6.


Figure 5: Network architecture of ResNet-18 for face recognition.


Figure 6: Network architecture of SE-ResNet-18 for other traits.
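For readers unfamiliar with ArcFace, the following PyTorch sketch shows an additive angular margin head of the kind used to train the face branch. The scale s and margin m are common defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin (ArcFace) classification head.
    s and m are assumed defaults, not the paper's hyperparameters."""
    def __init__(self, embed_dim, num_classes, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class weights.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m to the target-class angle only.
        one_hot = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(one_hot, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```

At test time the head is discarded; the L2-normalized embedding itself is compared with cosine similarity, which is exactly the matching score used in step 3.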

3. Score Fusion

The matching score for each biometric trait is calculated using cosine similarity. For iris recognition, the score is a weighted average of the matching scores of the iris subimages, where each weight is derived from the subimage's mask ratio so that subimages with less occlusion contribute more. The final matching score is a weighted sum of all the per-trait matching scores.
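A minimal sketch of this fusion, assuming the mask ratio is the fraction of valid (unoccluded) iris pixels in each subimage; the per-trait weights are left as caller-supplied parameters since the paper's exact values are not reproduced here.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def iris_score(sub_scores, valid_ratios):
    """Weighted average over iris subimages; subimages with a larger valid
    (unoccluded) ratio get more weight. The exact weighting rule is assumed."""
    w = np.asarray(valid_ratios, dtype=float)
    return float(np.dot(w / w.sum(), np.asarray(sub_scores, dtype=float)))

def fused_score(trait_scores, trait_weights):
    """Final score: weighted sum of the per-trait matching scores.
    Weights are normalized to sum to 1 for convenience."""
    w = np.asarray(trait_weights, dtype=float)
    return float(np.dot(w / w.sum(), np.asarray(trait_scores, dtype=float)))
```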

Experiments & Discussion

We evaluated the proposed method using the CASIA Iris Distance database, which consists of 2,567 images of 142 faces captured with a near-infrared camera at a distance of 2.4 m to 3.0 m. This database is particularly challenging due to blurred iris images. We divided it into training (84 subjects, 1,531 images), validation (28 subjects, 497 images), and test (30 subjects, 513 images) subsets.

Recognition accuracy was assessed using standard metrics: Equal Error Rate (EER), False Rejection Rate (FRR) at 0.1% False Acceptance Rate (FRR@FAR0.1%), and FRR at 0.01% FAR (FRR@FAR0.01%). Lower values indicate higher recognition accuracy; keeping FRR low at a small fixed FAR is especially important for high-security applications, where minimizing false acceptances is paramount.
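As a reference, here is one common way to estimate EER from genuine and impostor score arrays; this is a generic threshold sweep, not the paper's evaluation code.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate EER from genuine/impostor similarity scores (higher = closer).
    Sweeps every observed score as a threshold and returns the point where
    the false rejection and false acceptance rates cross."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    frr = np.array([(genuine < t).mean() for t in thresholds])
    far = np.array([(impostor >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```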

Our CNN models (ResNet-18 and SE-ResNet-18) were pre-trained on ImageNet and fine-tuned on the CASIA Iris Distance training data. Training used a learning rate of 0.0001, reduced by a factor of 0.1 when the validation loss did not improve for 10 consecutive epochs, with early stopping after 30 non-improving epochs. A batch size of 4 and the Adam optimizer were used. Data augmentation (random crop, color jitter, random erasing, random perspective, horizontal shift) was applied, each with 50% probability. This schedule can be expressed compactly in PyTorch, as sketched below.
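A minimal PyTorch sketch of this schedule, assuming a classification loss and standard DataLoaders; the model, loss, and loaders are caller-supplied placeholders, not the authors' training code.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

def fine_tune(model, loss_fn, train_loader, val_loader,
              device="cuda", max_epochs=1000, patience=30):
    """Fine-tuning loop following the described schedule; a sketch,
    not the authors' code."""
    model.to(device)
    opt = Adam(model.parameters(), lr=1e-4)
    # Cut the learning rate by 0.1 after 10 epochs without improvement.
    sched = ReduceLROnPlateau(opt, mode="min", factor=0.1, patience=10)
    best, stall = float("inf"), 0
    for _ in range(max_epochs):
        model.train()
        for x, y in train_loader:            # batch size 4 in the paper
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / len(val_loader)
        sched.step(val_loss)
        if val_loss < best:
            best, stall = val_loss, 0
        elif (stall := stall + 1) >= patience:  # early stop after 30 stagnant epochs
            break
    return model
```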

We conducted experiments on single biometric traits and various combinations, comparing our proposed score-level fusion with feature-level fusion to demonstrate its effectiveness. The results are summarized below.

Results

Combining different biometric traits improves recognition accuracy across the evaluated metrics; lower values indicate fewer errors.


Feature-level fusion was also evaluated but consistently resulted in higher error rates than score-level fusion: the best score-level fusion EER was 0.337%, while the best feature-level fusion EER was 1.397%.

Key Takeaways & Conclusion

Convenience without Compromise: This work successfully demonstrates that it is possible to achieve the high accuracy of multibiometrics without the inconvenience of multiple sensors, using only a single face image.

🏆 Superior Performance via Fusion: Fusing scores from multiple traits dramatically improves performance. The combination of periocular, iris, nose, and eyebrow traits yielded the best result, with an Equal Error Rate (EER) of just 0.337%.

🤔 Facial Parts Outperform the Whole: Interestingly, the best-performing combination excluded the overall face trait. This suggests that a combination of distinct, stable facial regions can be more powerful for recognition than the entire face, which is more susceptible to variations.

💡 Future Work: While our method shows high accuracy, further research is needed to improve feature-level fusion and explore new trait combinations to further reduce error rates in various challenging environments.