A verification system compares a person’s live speech with their stored speech pattern

What is Voice Biometrics and why should you use it?

Voice biometrics is the science of using a person’s voice as a uniquely identifying biological characteristic in order to authenticate them. Also referred to as voice verification or speaker recognition, voice biometrics enables fast, frictionless and highly secure access for a range of use cases from call center, mobile and online applications to chatbots, IoT devices and physical access.

Massive advances in neural networks over the past 2-3 years have enabled the development of voice biometric algorithms that are faster, more accurate, and can authenticate users with a smaller amount of speech. In fact, ID R&D is now able to exceed the accuracy of a 4-digit PIN in many use cases.

Like other biometric modalities, voice biometrics offers significant security advantages over authentication methods that are based on something you know (like a password or answer to a “secret” question) or something you have (like your mobile phone). Voice biometrics also improves the customer experience by removing the frustration associated with cumbersome login processes and lost or stolen credentials.

Voice Biometric Advantages

Enhance the customer experience with fast, frictionless authentication

Improve security and minimize breaches due to compromised passwords, phishing, etc.

Reduce threats by identifying known fraudsters

Instantly identify users and personalize the interaction

Free agents from time spent verifying users and resetting passwords

Enable natural login for digital channels, including chatbots and virtual assistants

Use as part of a two-factor authentication process to increase security without adding effort

How does Voice Biometrics work?

More than 70 body parts, each with a unique size and shape, contribute to how a person speaks. Voice biometrics relies on the fact that human voice characteristics correlate strongly with the physiological qualities of how a person creates speech. Unlike other methods of authentication, voice biometrics does not rely on a secret, such as a passphrase the person must remember. It isn’t what the person says that is being authenticated, it’s who is speaking.

Voice biometric systems work by extracting the characteristics that distinguish one person’s speech from other people’s. The result is a “voiceprint,” analogous to a fingerprint. A voiceprint is also called a “voice template.”

Voice recognition systems enroll a known person by creating an initial template, often merging several templates from samples of that person’s speech for higher accuracy.  The initial template is called the enrollment template or enrollment voiceprint.

Diagram of biometric enrollment for voice authentication.
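The enrollment step described above can be sketched in a few lines of Python. This is only an illustration, not ID R&D's method: it assumes each speech sample has already been turned into a fixed-length numeric template (the extraction model itself is out of scope), and it merges templates by simple averaging, which is one common choice among several. The function names and the three-number templates are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so every sample carries equal weight."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_enrollment_template(sample_templates):
    """Merge several per-sample templates into one enrollment template.

    Assumption: merging is done by averaging unit-length templates.
    Real engines may use a different (possibly proprietary) strategy.
    """
    if not sample_templates:
        raise ValueError("at least one sample template is required")
    dim = len(sample_templates[0])
    if any(len(t) != dim for t in sample_templates):
        raise ValueError("all templates must have the same length")
    normalized = [l2_normalize(t) for t in sample_templates]
    mean = [sum(t[i] for t in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


# Toy templates standing in for three enrollment utterances by one speaker.
samples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
enrollment_template = build_enrollment_template(samples)
```

Averaging tends to smooth out variation between individual recordings, which is the intuition behind enrolling with several samples rather than one.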

To verify an enrolled person’s identity, the biometric voice recognition system captures a new speech sample, creates a template from the sample, and compares it against the enrollment template.  A strong match between templates indicates that the same person spoke both samples, thus verifying the person’s identity. This manner of using voice recognition is called Speaker Verification.  It is a one-to-one match between the enrollment template and someone claiming to be the enrolled person.

Diagram of biometric matching for voice authentication
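As a rough sketch of the one-to-one comparison described above, the snippet below scores two templates with cosine similarity and accepts the claim when the score clears a threshold. Cosine similarity is a widely used way to compare speaker embeddings, but it is an assumption here, as are the function names, the toy templates, and the threshold of 0.7. A production system would tune its threshold on real data.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length templates."""
    if len(a) != len(b):
        raise ValueError("templates must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("templates must be non-zero")
    return dot / (norm_a * norm_b)


def verify_speaker(enrollment_template, probe_template, threshold=0.7):
    """One-to-one match: does the new sample match the claimed identity?

    Returns (accepted, score). The threshold is a hypothetical value.
    """
    score = cosine_similarity(enrollment_template, probe_template)
    return score >= threshold, score


enrolled = [0.9, 0.1, 0.05]        # enrollment template (toy values)
same_speaker = [0.85, 0.15, 0.0]   # new sample from the enrolled person
other_speaker = [0.1, 0.2, 0.95]   # new sample from someone else

accepted_same, score_same = verify_speaker(enrolled, same_speaker)
accepted_other, score_other = verify_speaker(enrolled, other_speaker)
```

With these toy values the first comparison is accepted and the second is rejected. Raising the threshold makes the system stricter, at the cost of rejecting more genuine users.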

Another way to use voice recognition is to compare a voice sample from an unknown identity against multiple enrollment templates.  The goal is to find the person within the set of enrollment templates.  This manner of using voice biometrics is called Speaker Identification.  There are significant limits to accuracy for Speaker Identification, so businesses should consult with an expert to understand if a one-to-many use case with voice will be practical.
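A minimal sketch of the one-to-many case follows, again assuming cosine similarity over fixed-length templates with hypothetical names and a hypothetical threshold. The second function illustrates one commonly cited reason accuracy becomes harder as the set grows: if each comparison has a small, independent chance of a false match (an idealized assumption), the chance of at least one false match rises with the number of enrolled templates.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length templates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_speaker(probe_template, gallery, threshold=0.7):
    """One-to-many match: find the best-scoring enrolled speaker.

    `gallery` maps a speaker ID to an enrollment template.
    Returns (speaker_id, score), or (None, score) if nobody clears the threshold.
    """
    best_id, best_score = None, -1.0
    for speaker_id, template in gallery.items():
        score = cosine_similarity(probe_template, template)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


def chance_of_any_false_match(per_comparison_far, gallery_size):
    """Probability of at least one false match, assuming independent comparisons."""
    return 1.0 - (1.0 - per_comparison_far) ** gallery_size


gallery = {"alice": [0.9, 0.1, 0.05], "bob": [0.1, 0.2, 0.95]}
known_id, known_score = identify_speaker([0.85, 0.15, 0.0], gallery)
unknown_id, unknown_score = identify_speaker([0.0, 1.0, 0.0], gallery)

# A 0.1% false accept rate per comparison, checked against 1,000 templates.
risk = chance_of_any_false_match(0.001, 1000)
```

Under the independence assumption, a per-comparison false accept rate of 0.1% grows to roughly a 63% chance of some false match across 1,000 templates, which is why one-to-many use cases deserve careful evaluation.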

The use of voice biometrics for authentication is increasing in popularity due to improvements in accuracy, fueled largely by advances in AI, and heightened customer expectations for easy and fast access to information. Frequent password-associated data breaches are another reason for broader adoption as companies look for ways to better protect customer data.

When it comes to accuracy, it’s not just about keeping the wrong person out. Companies also have to minimize “false rejects” that cause headaches for existing customers and agents. The “Equal Error Rate” (EER) is the operating point at which the false accept rate and the false reject rate are equal. Of course, the goal is to make both error types extremely small, ideally letting no impostors through while rejecting only a negligible number of valid users.
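To make the EER concrete, here is a small sketch that sweeps a decision threshold over two sets of match scores and reports the point where the false accept rate (FAR) and false reject rate (FRR) are closest. The scores are made-up illustrative values, not measurements from any product, and the function names are hypothetical.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when scores at or above the threshold are accepted."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep every observed score as a threshold; return (eer, threshold).

    With finite data FAR and FRR may never be exactly equal, so the EER is
    approximated by their average at the threshold where the gap is smallest.
    """
    best = None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = error_rates(genuine_scores, impostor_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


# Illustrative scores: genuine = same speaker, impostor = different speaker.
genuine = [0.91, 0.88, 0.95, 0.62, 0.79, 0.85, 0.93, 0.81, 0.90, 0.87]
impostor = [0.12, 0.35, 0.28, 0.66, 0.41, 0.09, 0.22, 0.30, 0.18, 0.25]

eer, eer_threshold = equal_error_rate(genuine, impostor)
```

In practice a deployment rarely operates exactly at the EER threshold. A bank might choose a stricter threshold that lowers false accepts and tolerates a few more false rejects.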

Types of Voice Authentication

Voice authentication can be accomplished using text-dependent speaker recognition or text-independent voice recognition biometrics.

Text-dependent voice verification is where a person speaks a specific passphrase, usually consisting of a few words, like “My voice is my passphrase.”

Text-independent voice verification is a passive voice biometric approach in which the user can say anything, enabling authentication to happen quickly in the background during their normal interaction with an agent, IVR, or application.

About IDVoice

IDVoice by ID R&D is a robust AI-driven biometric voice recognition engine that provides both text-dependent and text-independent voice verification for mobile, web and telephone channels, as well as physical access and IoT device integration. The product is built on an innovative Convolutional Neural Network and an advanced, modified x-vector approach to feature extraction for unmatched accuracy, and is ranked #1 in the industry’s leading benchmark challenge.

IDVoice is language independent, works with ultra-short utterances and has the smallest footprint available. Download the IDVoice product collateral or visit our IDVoice Text Dependent Verification and IDVoice Text Independent Verification pages to learn more.

Threats to Voice Verification systems

While voice biometrics offers a secure way to authenticate users, it is not immune to threats. Advances in machine learning, recording technology and synthetic speech are enabling high-quality voice spoofing, or voice “deepfakes,” capable of tricking both humans and voice biometric systems into thinking they are hearing a real person. These attacks can be used to gain unauthorized access to accounts.

Combatting voice spoofing requires liveness detection technology capable of distinguishing between a live voice and a recorded, synthetic or computer-generated version of the voice. You can learn more about voice anti-spoofing here.
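One way to picture how liveness detection fits alongside voice matching is a decision that requires both checks to pass. The sketch below assumes two scores are already available, a match score against the enrollment template and a liveness score from an anti-spoofing model. The names, score ranges and thresholds are hypothetical and do not describe any specific product's API.

```python
from dataclasses import dataclass


@dataclass
class AuthDecision:
    accepted: bool
    reason: str


def authenticate(match_score, liveness_score,
                 match_threshold=0.7, liveness_threshold=0.5):
    """Accept only when the voice is both live and a match.

    Assumption: both scores are "higher is better" values in [0, 1].
    """
    # Check liveness first: a replayed or synthetic voice can match well.
    if liveness_score < liveness_threshold:
        return AuthDecision(False, "possible spoof: liveness check failed")
    if match_score < match_threshold:
        return AuthDecision(False, "voice does not match enrollment template")
    return AuthDecision(True, "live voice matches enrollment template")


genuine_attempt = authenticate(match_score=0.93, liveness_score=0.88)
replay_attack = authenticate(match_score=0.95, liveness_score=0.12)
wrong_speaker = authenticate(match_score=0.31, liveness_score=0.90)
```

The replay case shows why a high match score alone is not enough: a good recording of the enrolled person can match the template closely while still failing the liveness check.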

Want to learn more?

Unlike other solutions, ID R&D’s core voice authentication technology works in any language without retraining, works across channels with a calibration setting, and is designed from the beginning to be noise-tolerant. Ready to learn more about our voice authentication solutions?
