(This guest article was originally posted in Biometric Technology Today).
Today’s best user authentication systems are multi-factor. Users are verified by using –
- something they know (e.g. a PIN or password)
- something they have (e.g. a key fob)
- something they are (biometric authentication).
The “something they are” factor provides a very accurate and reliable user authentication technique by identifying the individual from a unique physiological or behavioural characteristic, e.g., fingerprint, voice, face, lip movement or keystroke analysis. These biometric modalities are accurate, easy to use and should be difficult to compromise.
However, experience has shown that biometric modalities are susceptible, in varying degrees, to “spoofing”. This is formally defined as “the presenting of an artificial replication of a piece of biometric data to the biometric system in order to try and gain access.” Artificial fingers, high resolution iris images, a photograph of the authorised user or an audio voice recording are all examples of spoof attacks on different biometric modalities. Creating these spoofs is not difficult in today’s connected world – images and video of faces are visible on social media, and when someone talks or uses a phone-based application, their voice can be recorded. Even fingerprints and DNA can be farmed from anything that’s touched.
Biometric Authentication vendors have invested significantly in recent times to ensure their modalities are more resistant to spoofing attacks. Accordingly, spoofing techniques have become much more sophisticated. Take, for example, the evolution of the fingerprint spoof. Early fingerprint-based authentication systems could be compromised by relatively crude spoofing techniques. The famous ‘gummy bear’ spoof, devised by Japanese researcher Tsutomu Matsumoto in 2002, used a combination of a latent fingerprint on glass, gelatin from Gummy Bear sweets and a plastic mould to fool 4 out of 5 fingerprint sensors at the time. Over the years fingerprint sensors have developed advanced anti-spoofing capabilities and can now detect pulse, temperature, and capacitance, validating the presence of a live person and thus increasing robustness to ‘gummy bear’ derivative attacks. Fingerprint spoofing techniques have likewise evolved. For example, Deep Learning systems, which leverage the fact that smartphone fingerprint sensors are small and match against multiple partial fingerprints, have been used to create synthetic prints that can unlock smartphones (https://newstarget.com/2017-05-02-master-fingerprints-can-unlock-almost-any-phone-bypassing-fingerprint-security-in-seconds.html).
Biometric Authentication on mobile devices is particularly important. The number of smartphone users topped 3 billion in 2018 and increasingly these devices are being used for more security sensitive applications such as mobile banking, m-Commerce and as the primary computing platform for today’s corporate mobile workforce. Legacy authentication techniques (e.g., password entry) are very cumbersome on such small form factor devices, especially when on the move.
As Biometric Authentication increases in popularity, the ability of these systems to differentiate between a live person and a spoof has become critical. This capability is commonly referred to as liveness detection.
Facial Recognition
Amongst the various biometric modalities in use today, Facial Recognition (FR) has gained greatly in popularity in recent times, especially for user authentication on mobile devices. The increasing popularity of “selfies” shows that users are very comfortable with this form of interaction. FR is easy to use. Unlike the use of fingerprints, for example, there is no physical interaction required between device and user, which can be cumbersome on small form factor mobile devices.
As well as securing access to the smartphone and its installed applications, (e.g. https://play.google.com/store/search?q=face+lock&c=apps), there are other strategic applications for FR technology on mobile devices. Consider Identity Verification (IDV) vendors – these companies provide KYC & AML solutions to banks, financial institutions & government agencies, allowing them to validate the identity of new customers during the onboarding process. The leading IDV companies are now deploying mobile solutions where FR is playing a key role. The IDV vendor’s onboarding workflows now commonly ask the user to produce a government-issued ID (e.g. driver’s license) and then match the photo on the ID against the person holding it. The IDV vendor’s mobile app will first ascertain the validity of the ID document and will then prompt the user to take a selfie. The vendor’s FR technology will validate that the selfie is a match to the photo on the ID document. Although this ID checking process can negate the need for a costly and inconvenient physical face-to-face ID check at the bank, the mobile selfie validation step is very prone to spoofing and the FR requires strong liveness detection capability.
In these mobile-based use-cases the ability of the FR technology to validate the person’s identity and “liveness” can be limited by the capability of the mobile device itself. High-end smartphones (e.g. Apple’s iPhone X) can use infrared/depth sensors to create an artificial 3D scan that is secure enough to verify digital payments. These systems have inherently strong liveness detection capabilities and are thus more difficult to spoof. The vast majority of mobile devices, however, have more basic cameras and thus support less sophisticated FR capabilities. Applications leveraging these capabilities are thus more susceptible to spoofing attacks.
Combating Facial Recognition spoofs
There are a number of well-known spoofing techniques commonly used to compromise FR systems. All have been shown to successfully compromise smartphone-based FR systems, to the chagrin of the handset vendors, e.g. https://www.biometricupdate.com/201903/web-video-photos-or-siblings-can-spoof-samsung-galaxy-s10-facial-recognition
Spoofing techniques include –
- Print attack: The attacker uses someone’s photo, printed or displayed on a digital device. This is the most common type of attack, since most people have facial pictures available on the internet, or their photos can be obtained without permission.
- Warped photo attack: consists of bending a printed photo in any direction to simulate facial motion.
- Replay/video attack: A more sophisticated way to trick the system, usually requiring a looped video of the victim’s face. This approach makes behaviour and facial movements look more “natural” than holding up someone’s photo. This type of attack exhibits physiological signs of life that are not present in photos, such as eye blinking, facial expressions, and head and mouth movements, and it can be easily performed using tablets or large smartphones.
- 3D mask attack: During this type of attack, a mask is used as the tool of choice for spoofing. It’s an even more sophisticated attack than playing a face video. In addition to natural facial movements, it enables ways to deceive some extra layers of protection such as depth sensors deployed on very high-end smartphones.
To combat these spoofing attacks FR systems need strong liveness detection capability – i.e. the ability to detect the presence of a “live” person as opposed to a static image/video of the subject. There are a number of liveness detection techniques currently in use and these can be broadly categorised into hardware and software-based solutions –
- Hardware-based solutions use specialised sensors to detect liveness in the subject. These sensor-based solutions use a variety of approaches including –
- depth-mapping and 3D-sensing techniques
- sensors to measure/compare the reflectance information of real faces and fake materials
- thermal imaging sensors
- sensors to detect facial vein patterns.
Such solutions are much more prevalent in the bespoke high-end FR systems typically found in airports and border security. The cost of these sensors prohibits their use for liveness detection in the majority of commodity mobile devices.
- Software-based solutions do not require specialist sensors and, to ascertain liveness, leverage the standard camera/microphone available on today’s commodity devices. As such, software-based techniques are most commonly used for liveness detection in mobile-based FR. Some software-based techniques take multiple images of the user at varying distances from the mobile device and then, from these 2D maps, build up a ‘3D image’ and match to the enrolled profile – providing robustness to image/video attacks. The most common software-based techniques, however, ask the user to perform a certain action (e.g. head movement, blinking, speaking a pre-defined phrase) and then ascertain, from analysis of the resultant audio/video, whether or not the challenge was responded to correctly. A successful response to such a challenge is a strong indication of liveness.
IDV vendors who supply KYC solutions (see above) commonly use software-based liveness detection during the ID checking process. These challenge/response solutions typically ask the user to:
- move their head in the X,Y plane,
- blink, smile, frown,
- speak a random sequence of digits/words.
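The core logic of such a challenge/response check can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the function names are hypothetical, and the scoring here is a toy digit comparison standing in for the trained video/audio analysis a real system would use.

```python
import secrets

def generate_challenge(num_digits: int = 4) -> str:
    """Produce an unpredictable digit string for the user to speak/mime.
    Unpredictability is what defeats pre-recorded replay videos."""
    return "".join(secrets.choice("0123456789") for _ in range(num_digits))

def score_response(challenge: str, recognised: str) -> float:
    """Toy scoring: fraction of challenge digits matched in order.
    A real system would derive `recognised` and its confidence by
    running a speech/lip-reading model over the captured video."""
    matches = sum(1 for c, r in zip(challenge, recognised) if c == r)
    return matches / len(challenge)

def is_live(challenge: str, recognised: str, threshold: float = 0.75) -> bool:
    """Liveness decision: did the subject respond correctly enough?"""
    return score_response(challenge, recognised) >= threshold

print(is_live("4721", "4721"))  # correct response scores 1.0 → True
print(is_live("4721", "9999"))  # no digits match, scores 0.0 → False
```

The essential property is that the challenge is generated fresh per session, so a static photo or a previously farmed video cannot contain the correct response.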
Striking the correct balance between usability and accuracy is critical. Robust liveness detection solutions that use a challenge/response interaction requiring too much time & effort on the part of the user will struggle to be adopted. Solutions that, for example, require the user to assume a number of positions whilst, at all times, ensuring their head is within an on-screen oval, are cumbersome to use. Solutions that require a highly optimal environmental setting – e.g. a very well lit face, minimal camera movement – may require the user to continually repeat the challenge/response sequence until the captured video/audio is suitable for analysis. Conversely, some of the easier-to-use options are more readily spoofed – a liveness check that asks the user to simply blink or smile can be compromised with a video of the subject doing so, farmed from their social media accounts.
Ideally the convenience and accuracy of liveness detection solutions should be configurable. For applications that require strong security, e.g., mobile bank transfers, the solution must be absolutely sure of liveness and thus a more sophisticated user interaction is acceptable. The ability to readily adjust accuracy & usability to the particular circumstance is key.
Liopa has developed LipSecure, a software-based liveness checker that uses the challenge/response technique. When the user presents to the FR system, a random sequence of digits appears on screen and the user is asked to speak/mime the digit string. Liopa’s core expertise is in Visual Speech Recognition (VSR) and the company have developed an AI-based ‘automated lipreading’ system. LipSecure leverages this technology to analyse the user’s response to the digit challenge. LipSecure will return a score indicating how close a match the response & challenge are, and thus the level of confidence that a live person is present. LipSecure’s ease-of-use and accuracy can be tuned to the customer’s use case by altering both the number of digits in the challenge phrase, and the threshold score that determines liveness. LipSecure will work on any device with a standard camera. On-device SDKs are available that allow mobile applications to access the cloud-based LipSecure service.
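The tuning described above – trading convenience against confidence by varying challenge length and the accept threshold – can be sketched as a simple policy table. The preset values below are illustrative assumptions for this article, not LipSecure's actual defaults.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LivenessPolicy:
    num_digits: int   # length of the spoken/mimed digit challenge
    threshold: float  # minimum match score treated as "live"

# Illustrative presets: longer challenges and higher thresholds
# trade user convenience for confidence in the liveness decision.
POLICIES = {
    "low":    LivenessPolicy(num_digits=3, threshold=0.6),
    "medium": LivenessPolicy(num_digits=4, threshold=0.75),
    "high":   LivenessPolicy(num_digits=6, threshold=0.9),  # e.g. bank transfers
}

def decide(score: float, policy: LivenessPolicy) -> bool:
    """Accept the session as live only if the service's match score
    between challenge and response meets the policy's threshold."""
    return score >= policy.threshold

print(decide(0.8, POLICIES["medium"]))  # 0.8 >= 0.75 → True
print(decide(0.8, POLICIES["high"]))    # 0.8 < 0.9  → False
```

The same captured response can thus pass a low-security check but fail a high-security one, which is the configurability the preceding paragraphs call for.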
Liopa believes that LipSecure is the first commercially available application of VSR technology. The ability to decipher speech from a video of lip movement has many applications. Liopa is initially focusing on constrained vocabulary use cases such as liveness detection, where only the digits 0-9 are used. The technology is also being used to help tracheostomy patients better communicate with their carers. Such patients have impaired/no speech capability but can generally move their lips normally. Liopa’s VSR application will monitor the patient’s lip movements and decipher which phrase, from a defined list, the patient is attempting to utter. VSR technology can also be used to identify the occurrence of certain keywords and phrases in videos of people speaking. This could be useful in situations where the audio track either doesn’t exist or is of very poor quality, for example in CCTV footage.
VSR can also be used to improve the accuracy of the ‘Voice UI’. Voice-driven applications are now ubiquitous – as is evident from the rise in popularity of virtual personal assistants (Siri, Cortana, Alexa, etc.) and voice-activated in-car command systems. The technology underpinning these applications uses Audio Speech Recognition (ASR) to decipher speech from the audio waveform. ASR performs very well in ‘clean’ environments, but word accuracy generally degrades in noisier, real-world environments. VSR, which is agnostic to audio noise, can be combined with ASR to improve overall system performance in situations where background noise exists and where a camera can be trained on the head of the speaker.
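One simple way to combine the two streams is to weight each recogniser's confidence by an estimate of the audio quality, so the noise-agnostic visual stream dominates in loud environments. The sketch below is an illustrative fusion rule only – the linear 0–20 dB SNR ramp is an assumption for this example, not a description of any production system.

```python
def fuse_confidences(asr_conf: float, vsr_conf: float, snr_db: float) -> float:
    """Weighted fusion of audio (ASR) and visual (VSR) word confidences.
    The audio weight shrinks as the estimated signal-to-noise ratio
    drops, so lip-reading carries the decision in noisy settings."""
    audio_weight = min(max(snr_db / 20.0, 0.0), 1.0)  # clamp to [0, 1]
    return audio_weight * asr_conf + (1.0 - audio_weight) * vsr_conf

# Quiet room: trust the audio recogniser.
print(fuse_confidences(asr_conf=0.9, vsr_conf=0.6, snr_db=20))  # 0.9
# Noisy street: lean entirely on lip reading.
print(fuse_confidences(asr_conf=0.3, vsr_conf=0.6, snr_db=0))   # 0.6
```

In practice the fusion would operate on per-word or per-phoneme scores from both models, but the principle is the same: the system degrades gracefully as either sensing channel becomes unreliable.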
(You can access this article in its original format here).