Who We Are: Domhnall Boyle, Software Engineer
Q How do I pronounce your name?
A It’s like “Donal”.
(Thanks, I was saying it wrong!)
Q How long have you been with the company?
A Since the end of 2019. I came on board straight after graduating.
Q Had you done a placement year with Liopa?
A No, I did my placement year with the enterprise software firm SAP.
Q What do you work on at Liopa?
A Most of the time I’m working on the back-end of the lipreading system, which is based on a deep neural network – that’s what we’re running on the back-end at the moment.
I’m also doing a side project on speech synthesis – rather than trying to recognise what the lips are saying, it converts lip movements into an audio file. The idea is that we could pass that audio to Google’s ASR system, so we’d have recognition on the audio as well as the lip recognition. But it’s quite early days with that. My main work is on the SRAVI back-end.
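To make the two-path idea concrete, here is a minimal sketch in Python. All of the function names and return values are hypothetical stand-ins – the real models and the ASR service are not shown in the interview – but the structure illustrates running lip recognition directly while also synthesising audio from the lip movements and feeding that to a separate ASR system.

```python
# Hypothetical sketch of the two recognition paths described above.
# Each stand-in function represents a component mentioned in the text.

def lip_recognition(video: bytes) -> str:
    """Stand-in for the Kaldi-based lipreading model."""
    return "i need water"

def lips_to_audio(video: bytes) -> bytes:
    """Stand-in for the lip-to-speech synthesis model."""
    return b"\x00" * 16000  # placeholder for synthesised audio samples

def asr(audio: bytes) -> str:
    """Stand-in for an off-the-shelf ASR service such as Google's."""
    return "i need water"

def recognise_both_ways(video: bytes) -> tuple[str, str]:
    """Return (direct lip reading, ASR result on synthesised audio)."""
    return lip_recognition(video), asr(lips_to_audio(video))
```

In a real system the two results could then be compared or combined to improve overall accuracy.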
Q Had you had experience before with AI and machine learning?
A Just towards the tail end of my degree – there were a few modules, such as AI and data science, that got me interested. Then I did a final-year project that was ML-based. That defined the path I wanted to go down.
Q Is the AI element what interested you in Liopa?
A Yes, using AI, and I liked the computer vision side of things as well. I did my final-year project on that combination of computer vision and AI, and it’s what lends lip-reading software its uniqueness. There aren’t many companies in Belfast using computer vision and AI, so once I found Liopa I was very interested in applying.
Q So you found Liopa on a search?
A I knew Dr Darryl Stewart (co-founder of Liopa) because he’d taught a module at university. Then I looked up his company and that’s how I found out about Liopa.
Q Which degree programme were you in?
A Computer science – an MEng degree. It was five years, including the year in industry, and I completed the degree at QUB.
Q What is the most challenging thing about the back-end development for the lipreading technology?
A There are a lot of variables – we have to get users to use the app correctly for it to work properly. Lighting, head pose, which way the camera is facing, the way they’re talking, whether they’re overemphasising words – all of these impact the performance of the back-end. That’s a challenge, and we’re always trying to improve on it. We also work with a back-end library called Kaldi – there’s a learning curve, because Kaldi is large and complex, but the performance is really good; it compares well with Google’s ASR.
Q Is Kaldi a dataset?
A It’s a speech recognition toolkit – you use a dataset to train the model with Kaldi. My work is more about taking the trained model, wrapping an API around it, and making it accessible to the user from the app.
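As a rough illustration of what “wrapping the trained model in an API” can look like, here is a minimal sketch using only Python’s standard library. The `recognise` function and its return value are hypothetical stand-ins for the trained lipreading model – the interview does not describe the actual service interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def recognise(video_bytes: bytes) -> dict:
    """Hypothetical stand-in for decoding a video with the trained model."""
    return {"phrase": "i need water", "confidence": 0.92}

class RecognitionHandler(BaseHTTPRequestHandler):
    """Accepts a POSTed video and returns the recognised phrase as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        video = self.rfile.read(length)          # raw video bytes from the app
        body = json.dumps(recognise(video)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8080), RecognitionHandler).serve_forever()
```

In practice the endpoint would load the trained model once at startup and run inference per request, but the shape – model behind a small HTTP API, called by the mobile app – is the same.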
Q So you’re not involved with the training of the model?
A Not with SRAVI, but with the speech synthesis work I’m involved in a lot of training. SRAVI is mostly already trained, so our work there is mostly about trying to improve it and working on the customer experience – for example, we’re experimenting with having one phrase list versus multiple phrase lists.
Q So speech synthesis is a lot more open-ended, not confined to a particular phrase list?
A Yes — we’re trying to keep it as wide open as possible, and we’re evaluating what the performance would be like. It would be better if it’s not constrained to a particular phrase list.
Q Where did the idea for speech synthesis come from?
A Darryl had read a paper on a related topic and we built the idea from there. The paper, “Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis,” is from the International Institute of Information Technology.