Long before AI assistants, the first voice-activated solution for changing channels on the TV was shouting at the youngest child in the family or the person sitting closest to the television screen. And while it may have little effect on the eventual outcome, shouting at the TV is still the couch supporters' preferred tactic for encouraging their favourite sports team or athlete.
What did you say?
Today, AI assistants with voice recognition like Siri, Cortana and Alexa can be found on our phones and in our homes. But when these systems first came out, they could be tricky to use, requiring you to speak in your best 'telephone' voice. And if you had a strong accent, you were basically out of luck, something I first encountered a couple of years ago when I had a chance to try one out at the NXP Software offices in Leuven.
I was embarrassed to discover that while it could understand the Flemish engineers' English just fine, my Scottish accent, even diluted by living abroad for many years, proved too much for it. And I was not alone. This Washington Post article from 2018 highlights the challenges faced by these devices, even in the United States, never mind Glasgow.
Getting smarter
However, as I found out on a recent visit to family in the UK, there have been some major improvements to these systems. I was used to my grandchildren fighting over which song to play on Spotify, but was quite surprised to find that the smart speaker was now also able to understand me. Admittedly, I did not try anything too fancy, but I was able to get it to set a timer and stop the alarm. So, what has changed?
For a start, most of these systems now have at least a few more years' worth of voice samples to learn from. Machine learning algorithms have also continued to develop, so while Alexa and her siblings may not yet be up to taking over the world, it does seem that they are learning to understand a wider range of dialects.
By using multiple microphones, these devices can now pick up people speaking from further away and are better at separating speech from background noise. For example, Amazon's Echo smart speaker uses an array of seven microphones, and the latest smartphones have three or four. The underlying sound processing technologies and algorithms are also improving, making it easier for these systems to hear us even in noisy environments.
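To give a flavour of how a microphone array helps (and not how any particular device actually implements it), here is a minimal delay-and-sum beamforming sketch in Python. The idea is simple: if you know roughly which direction the voice is coming from, you can time-align the microphone signals so the speech adds up while the background noise tends to cancel out. The sampling rate, delays and noise levels below are all made up for illustration.

```python
import numpy as np

def delay_and_sum(signals, delays_samples):
    """Align and average microphone signals (delay-and-sum beamforming).

    signals: 2-D array with one row per microphone.
    delays_samples: per-microphone delay (in samples) that aligns sound
    arriving from the direction of interest. Speech from that direction
    adds up coherently; uncorrelated background noise partly averages out.
    """
    aligned = [np.roll(sig, -d) for sig, d in zip(signals, delays_samples)]
    return np.mean(aligned, axis=0)

# Toy example: two microphones at 16 kHz, a 440 Hz "voice" tone that
# reaches the second microphone 5 samples later, plus random noise.
fs = 16_000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
mic1 = voice + 0.5 * rng.standard_normal(fs)
mic2 = np.roll(voice, 5) + 0.5 * rng.standard_normal(fs)

enhanced = delay_and_sum(np.stack([mic1, mic2]), delays_samples=[0, 5])
print("noise power, single mic:", np.var(mic1 - voice))
print("noise power, two mics:  ", np.var(enhanced - voice))
```

Even with just two microphones, the residual noise power roughly halves; real devices combine more microphones with far more sophisticated filtering, but the principle is the same.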
But perhaps the biggest change has been in the field of Natural Language Processing (NLP). NLP is a sub-field of artificial intelligence focused on getting computers to understand human language as we speak it. While NLP has been around since the 1950s, it is only in more recent years that machine learning techniques and a move to a statistical rather than a rules-based approach have boosted the capabilities of these devices. I also suspect that the vastly superior computing power we have in even our mobile phones today, compared to what was available in the fifties, is a further and significant contributor to their ability to understand us.
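As a toy illustration of what "statistical rather than rules-based" means (and emphatically not the pipeline that Alexa or Siri actually use), the sketch below trains a tiny bag-of-words classifier to map commands to intents. Instead of hand-writing a rule for every phrasing, the model learns word statistics from examples; the commands, intent labels and choice of scikit-learn are all my own assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: spoken commands paired with intent labels.
commands = [
    "set a timer for ten minutes",
    "start a five minute timer",
    "stop the alarm",
    "turn the alarm off",
    "play my favourite song on spotify",
    "put some music on",
]
intents = ["set_timer", "set_timer", "stop_alarm", "stop_alarm",
           "play_music", "play_music"]

# Learn word statistics per intent instead of hand-writing matching rules.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(commands, intents)

# Phrasings the model has never seen still land on a sensible intent.
print(model.predict(["could you set a timer please"]))  # ['set_timer']
print(model.predict(["switch that alarm off"]))         # ['stop_alarm']
```

The appeal of the statistical approach is exactly this: new wordings, and to some extent new accents once the audio is transcribed, can be handled by feeding the system more examples rather than more rules.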
Coming out of the kitchen
Now that these devices can recognise what I'm saying, I expect we will soon see them coming out of the kitchen and turning up in other areas where we need to operate hands-free. While I personally look forward to the death of the TV remote control, the car is where voice control potentially offers the most benefits. The combination of noise-cancelling technologies to clean up the sound and smarter NLP algorithms will hopefully mean that we will be able to keep our hands on the wheel and our eyes on the road while we ask Alexa or her automotive sister to change the radio station or find and dial a phone number for us.
I also think that putting voice recognition in the car could be a key enabler for the transition to autonomous driving. When we can talk to our vehicles, we will begin to trust them, which is one of the biggest hurdles to their acceptance. I certainly look forward to being able to hop into an autonomous vehicle and tell it to drive me to the office while I sit in the back sipping my morning coffee and catching up on the news.