Alex Cowan, Liopa’s Senior Machine Learning Scientist, joined the company in April 2017. As she starts her fifth year with the company, we sat down to learn a bit more about what she does for Liopa.
Q How did you find Liopa initially?
A I was working as a research fellow under Liopa’s co-founder Darryl Stewart, developing models for speaker verification, and that work really intrigued me. Once I began working for Liopa, I moved into visual speech recognition, which was a bit of a different area. I’m always learning in this role.
Q What is your work mainly focused on?
A For Liopa I’m doing a lot of proof-of-concept stuff – taking ideas from research to development. I’m engaged in training and evaluation of models and pipeline components that we use in our VSR system – and reviewing current State of the Art techniques in this area.
For instance, I helped develop our Kaldi-based system, building the models we use for VSR, and I’ve also implemented end-to-end techniques for VSR.
Q What is Active Speaker Detection?
A It’s an automated way of knowing, “When does speaking start in this video?”. If the beginning of speech can be detected, then analysis can be started to read the lips from that point onwards.
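The idea can be sketched very simply. A minimal illustration in Python, assuming a hypothetical per-frame "mouth motion" score in place of a real visual detector: speech onset is taken as the first frame where motion stays above a threshold for a few consecutive frames.

```python
def detect_speech_onset(mouth_motion, threshold=0.5, min_frames=3):
    """Return the index of the first frame where the (hypothetical)
    mouth-motion score stays above `threshold` for at least
    `min_frames` consecutive frames, or None if none is found."""
    run = 0
    for i, score in enumerate(mouth_motion):
        if score > threshold:
            run += 1
            if run == min_frames:
                # Back up to where the sustained movement began.
                return i - min_frames + 1
        else:
            run = 0
    return None

# Silence for four frames, then sustained mouth movement.
scores = [0.1, 0.2, 0.1, 0.0, 0.8, 0.9, 0.7, 0.8]
print(detect_speech_onset(scores))  # 4
```

A real system would derive the per-frame score from the video itself (e.g. lip-region motion), but the thresholding-with-hysteresis shape of the logic is the same.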
Q What do you enjoy most about this work?
A I like developing the models, and I really enjoy working in a lab environment.
Q What’s changing about this kind of technology?
A Things are constantly changing, and we need to stay on top of that. I’m looking at new techniques for VSR – using TensorFlow – and implementing other approaches and comparing them to what we have. The field has moved on a lot in the past couple of years.
Also we are seeing more movement towards end-to-end approaches.
Q Can you explain a bit more? What is that?
A Kaldi is a more traditional hybrid speech recognition approach – like a pipeline with modules that you optimise separately – it’s powerful and gets good results. But the current State of the Art practice is taking an end-to-end approach – that’s what we’re seeing now.
It basically means applying a single deep learning model to the whole problem, rather than chaining separate modules – you get rid of all those different moving parts.
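The contrast between the two architectures can be sketched with toy stand-ins. Every function below is illustrative only, not a real VSR component: the "hybrid" version chains separately optimisable stages, while the "end-to-end" version maps raw frames straight to an output.

```python
# Toy contrast: modular (hybrid) pipeline vs end-to-end model.
# All functions are illustrative stand-ins, not real VSR components.

def extract_lip_features(frames):
    # Stand-in visual front end: average pixel value per frame.
    return [sum(f) / len(f) for f in frames]

def classify_visemes(features):
    # Stand-in viseme model: threshold each feature.
    return ["open" if x > 0.5 else "closed" for x in features]

def decode_words(visemes):
    # Stand-in decoder: map the viseme sequence to a label.
    return "speech" if "open" in visemes else "silence"

def hybrid_pipeline(frames):
    # Each stage can be trained and optimised separately.
    return decode_words(classify_visemes(extract_lip_features(frames)))

def end_to_end_model(frames):
    # One function from raw frames to output; in practice this would
    # be a single deep network trained on (video, text) pairs.
    return "speech" if any(sum(f) / len(f) > 0.5 for f in frames) else "silence"

frames = [[0.1, 0.2], [0.9, 0.8], [0.7, 0.6]]
print(hybrid_pipeline(frames), end_to_end_model(frames))  # speech speech
```

The practical trade-off is the one described above: the modular pipeline gives you interpretable, independently tunable stages, while the end-to-end model removes the hand-designed interfaces between them at the cost of needing much more training data.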
Q What universities or companies are paving the way in this field?
A In AVSR, we are seeing papers released by Google DeepMind and Meta AI, as well as academic institutions like Oxford University, Imperial College London, Trinity College Dublin, and the University of Nottingham, among others. This is still very much an open research problem, so it’s being looked at from an academic perspective.
Q Is Google open about it?
A Yes, they are publishing papers on it – where they outline their techniques. Meta AI has also released some code. The large tech companies have advantages in the amount of data they have access to.
At the end of the day, you need a lot of data to run and train these systems.
Q Which internal projects are you most involved with at Liopa?
A I’m doing a lot of work in the automobile sector, but some of it isn’t something we can discuss publicly yet. With lots of vehicles now operating on voice-driven commands, there is research happening in visual voice activity detection for in-car systems.
The idea of being able to work out when speech starts in a given video has other application areas as well. If you can work out when someone is starting to speak, you can apply VSR from that point forwards. It’s useful for data harvesting too – there’s a huge amount of data, but you only want to analyse the footage from the point speech starts.
Q What’s the most challenging thing about your work?
A It’s generally a challenging domain – computer vision, speech recognition, deep learning – you have to be good at all these areas.
Because ML and VSR are constantly moving fields, it’s very much an open research problem – so there are a lot of new techniques. I spend a lot of time keeping up with what’s new, and knowing what’s current State of the Art.
That’s also what makes it really interesting – that’s why I enjoy it too.
Q What would you love to see happen in terms of this technology being applied?
A It would be really interesting to see this technology applied to in-car systems and see that go out into the market.
Q How is this going to change the world or help people?
A Personally, I think our SRAVI application is already helping people in hospital, when they’re in a situation where they’re struggling to communicate. Any time a technology helps people communicate, or raises the level of understanding, that benefits a lot of people in the world.