Hey, Siri: Voice is trending upwards as the computing interface of choice

Consumers are increasingly using their voice to command devices and applications, and the uptake of voice-based technology shows no signs of abating. VoiceBot.ai looked at the usage of voice-driven personal assistants in the US, and reported the following:

  • One quarter of all US households have a smart speaker, such as an Alexa-powered Amazon Echo – a 40 per cent increase since 2018
  • Half of US consumers said they have tried a voice assistant in the car. This group was equally split between those using Bluetooth to connect to their smartphone voice assistant, and those using the voice solution that came pre-installed in the car
  • Three in five consumers use voice assistants on smartphones at least monthly

[Source: Voice Assistant Consumer Adoption Report]

In addition to smartphones, cars and smart speakers, many other devices are becoming voice-enabled, including:

  • desktops/laptops
  • smart TVs
  • watches
  • factory-based industrial machines/robots

The major issue with voice-driven devices and applications is their poor performance in noisy conditions. Background noise degrades the accuracy of speech recognition technology, which can result in a very poor user experience. If you’ve ever tried using Siri in a crowded restaurant, you can relate.

The advanced AI technology that drives speech recognition is only useful if the device can ‘hear’ you! This is a constraining factor for the global market for speech and voice recognition, which is expected to reach $18.3bn by 2023.

What is lip reading technology?

Many different types of technology can help solve this problem – including our LipRead solution.

LipRead uses the movement of your lips to interpret what you are saying. It can improve the accuracy of existing speech recognition systems, or provide the speech recognition on its own when no audio exists.
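To make the two modes above concrete, here is a minimal sketch of one generic way to combine an audio recogniser with a visual one, often called late fusion. It is an illustration only, not a description of how LipRead works: the function name `fuse_hypotheses`, the `audio_weight` parameter and all of the probabilities are hypothetical.

```python
from typing import Dict, Optional


def fuse_hypotheses(
    audio_probs: Optional[Dict[str, float]],
    visual_probs: Dict[str, float],
    audio_weight: float,
) -> str:
    """Pick the most likely word from audio and visual word probabilities.

    audio_weight (0.0 to 1.0) is an assumed measure of how much to trust
    the audio: high in a quiet room, low in a noisy one.
    """
    # No audio at all: rely on the visual (lip reading) hypotheses alone.
    if not audio_probs:
        return max(visual_probs, key=visual_probs.get)

    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0.0 and 1.0")

    # Weighted average of the two scores for every candidate word.
    words = set(audio_probs) | set(visual_probs)
    fused = {
        word: audio_weight * audio_probs.get(word, 0.0)
        + (1.0 - audio_weight) * visual_probs.get(word, 0.0)
        for word in words
    }
    return max(fused, key=fused.get)


# Hypothetical scores: noise makes the audio favour "ball", but the lips
# never close for a "b", so the visual scores favour "call".
noisy_audio = {"ball": 0.45, "call": 0.40, "tall": 0.15}
lips = {"call": 0.50, "tall": 0.40, "ball": 0.10}

print(fuse_hypotheses(noisy_audio, lips, audio_weight=0.3))
print(fuse_hypotheses(None, lips, audio_weight=0.3))
```

In this sketch, lowering `audio_weight` as the environment gets noisier lets the visual evidence take over, and passing `None` for the audio scores corresponds to the case where no audio exists.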

Of course, LipRead requires a camera to capture video of the subject talking. This would have been an issue two to three years ago, but cameras have become increasingly prevalent across consumer devices. For example –

  • Today’s smartphone cameras are becoming significantly more sophisticated. Camera resolution is increasing, and rear-facing cameras are becoming multi-lens, with wide-angle, telephoto and depth lenses available
  • The latest smart speakers (e.g. Amazon Echo Show/Spot, Lenovo Smart Display) are now appearing with embedded cameras
  • Various camera options are available in cars, including:
    • The camera in smartphone-based auto-assistants (Apple CarPlay, Android Auto)
    • Driver monitoring cameras embedded at the factory
    • Mounted cameras, such as those used by Advanced Driver Assistance Systems (ADAS), and the inward-facing cameras in Head-Up Display units

Poor user experience of voice-driven applications in noisy conditions is a serious threat to players like Amazon, Apple and Google, who are investing heavily in the voice interface. The rising prevalence of embedded cameras in today’s consumer devices allows for the use of Visual Speech Recognition technology, such as LipRead, to improve accuracy and help the technology realise its true potential.
