Computer Vision is a sub-discipline within AI. This field is growing rapidly in size and importance, so we’ve pulled together a detailed explanation of what it involves. Liopa’s technology uses Computer Vision (CV) to detect lip movements as part of its AI-based lipreading solution.
What’s the purpose of Computer Vision?
Computer Vision is the way that computers use pattern recognition to analyse images. “It involves letting a computer understand visual media,” says Liopa’s CTO, Fabian Campbell-West. “This could mean a single image or it could be a video. It could include media captured by a car’s dash cam, or a long-range telescope, or anything in between.”
What techniques does Computer Vision entail?
Fabian points out that this field is so interesting because it is interdisciplinary. “CV is the task of making the computer understand the visual image or video,” he says. “It’s about putting an image into a format that a computer can make sense of. As developers, we ask ourselves, ‘How do you represent an image with a set of instructions that can be understood for a certain application?’”
What kinds of things would an average person use Computer Vision for?
You may have encountered quite advanced CV in photo-editing software such as Adobe Photoshop without even realising it.
Fabian notes that CV is a very broad term, and when we’re using CV we could be asking a computer any of the following questions:
- Can you filter this selfie?
- Can you generate an image that looks like this?
- Can you blur something out?
- Can you make people in the background disappear?
- Can you improve the quality of an image to make things clearer?
- Can you recognize a particular face?
Is Computer Vision new?
Fabian says: “Not at all. The field is mature, and it’s used in a lot of apps that people may not realise. Modern social media filters – in Snapchat and so on – are one example. The re-touching techniques used by Photoshop, and the like, have been developed since the 1970s.”
What are the more advanced applications?
Even though the simpler techniques are commonplace, the more advanced CV innovations are all coming from the same basic discipline. It begins with an algorithm or a set of rules, for instance, that can smooth noise or jagged edges in an image.
Fabian explains, “This advances all the way up to unmanned vehicles, spacecraft, drones, and other technologies that can ‘see’ for us.”
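To make the simple end of that range concrete, here is a minimal sketch of one rule for smoothing noise: replace each pixel with the average of itself and its immediate neighbours (a 3x3 mean filter). The function name `box_blur` and the tiny greyscale image are assumptions made for illustration; this is not a description of how Photoshop or Liopa's software works.

```python
def box_blur(image):
    """Return a smoothed copy of a greyscale image (a list of rows of numbers)."""
    height = len(image)
    width = len(image[0])
    smoothed = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather the pixel and its neighbours, skipping positions off the edge.
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) / len(neighbours))
        smoothed.append(row)
    return smoothed


# A flat grey patch with one bright "noise" pixel in the middle (hypothetical data).
noisy = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
smoothed = box_blur(noisy)
print(smoothed[1][1])  # the spike of 100 is pulled down to 20.0
```

The same averaging also spreads some of the spike into the neighbouring pixels, which is why heavy smoothing makes an image look blurred.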
How does this fit into the grand scheme of AI?
Fabian says, “If the goal of AI, in general, is to create autonomous agents – then the goal of CV is to replicate the sight part of that.
“Anything you could do with the human eyes goes into the field. Wavelengths – spectrums – they produce images based on how light interacts with the human eye. Some animals have very different eyesight to us – a lot of the early research was on other species such as birds of prey. Researchers were trying to understand how they see. They’d use that information to build a computer model.”
What areas of Computer Vision have evolved the most?
Seeing something and understanding it means you must break the image down into parts that fit together as a whole. Humans do this without thinking about it. CV researchers need to consider how to train a computer to do this, and they’ve made huge leaps forward in this field in the past five years.
According to the organisation AI Sciences, the accuracy of CV has gone up from 50% to 99% in the past decade, making Computer Vision more accurate than humans at analysing images.
But this only covers some aspects of understanding – not all.
Fabian explains: “For example, we break down the subject of an image or a video and recognize its component parts – while also seeing the full picture. Segmentation was a very primitive way of doing this – if you had an image of a boat on the water, you might look at the sea and notice that it looks blue and most of the boat looks white – that’s a primitive form of colour segmentation.
“Building and understanding an image from a low level, from pixel level – red, green and blue colour values – that’s how we represent images digitally. That was the very early stage. Now, the question is: ‘Can you recognize this is what a face looks like? Can you recognize all the parts of a face in this image?’”
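The boat-on-water example can be sketched in a few lines. The snippet below stores each pixel as a (red, green, blue) tuple with values from 0 to 255 and labels it by colour alone. The threshold of 200, the label names, and the sample pixels are all assumptions chosen for illustration; a real segmentation method would be considerably more robust.

```python
def label_pixel(pixel, bright=200):
    """Label one (red, green, blue) pixel using simple colour rules."""
    r, g, b = pixel
    # Near-white: all three channels are high.
    if r >= bright and g >= bright and b >= bright:
        return "boat"
    # Blue-dominant: the blue channel is the strongest.
    if b > r and b > g:
        return "water"
    return "other"


def segment(image):
    """Apply the colour rules to every pixel in a list of rows."""
    return [[label_pixel(pixel) for pixel in row] for row in image]


# A hypothetical 2x3 image: blue sea, one white boat pixel, one brown pixel.
image = [
    [(20, 60, 200), (250, 250, 250), (20, 60, 200)],
    [(10, 40, 180), (15, 50, 190), (90, 60, 30)],
]
labels = segment(image)
for row in labels:
    print(row)
```

Rules like these only describe colour. They say nothing about what a boat is, which is the gap between primitive segmentation and recognizing the parts of an object.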
Facial recognition needs to work across all ages, ethnicities, and skin tones, making the problem more complex.
But Fabian notes that this is “mostly a solved problem – generally speaking it’s something you can lift and put in applications.”
There are limitations when it comes to a computer’s ‘understanding’ of an image.
Fabian explains: “If I showed you an image of a classroom full of chairs and desks, you’d guess it was a classroom. If a chair was hanging from a ceiling, you’d think it was odd, but you’d still know it was a classroom. A computer is still a long way from understanding what any of these things mean.”
In other words, Fabian says, “If you have a complex thing at a weird angle or in an unusual configuration – the computer needs complex skills to understand what’s in that image.”
Thanks for reading Part 1. Look out for an additional article coming soon:
- In Part 2 we’ll examine how Computer Vision can be used in a generative way – and the legal and societal implications
- The piece will describe how Liopa’s team of AI researchers have benefitted from the advancements in this field, explaining how Liopa’s technology uses Computer Vision to read lips