David Bowie famously sang about ‘the gift of sound and vision’, two seemingly simple abilities that we take for granted. But how easy are they really? Say someone throws you a ball and you catch it. What could be simpler?
If you break this down, it’s actually an incredibly complex action. The image of the ball passes through your eye to your retina, which processes it quickly before transmitting it to the brain, where it receives more comprehensive analysis from the visual cortex. From there, within a split second, the image is passed on through the rest of the cortex, which compares it with previous knowledge, classifies the object and the situation, predicts the ball’s path and triggers an action: raising your hand to catch it. How could you possibly recreate this in a machine?
Scientists have been working towards this since the 1950s, faced with the challenge of replicating three different components of human vision: the eye, the visual cortex and the rest of the brain.
All three are incredibly complex challenges, but scientists have had the most success with analogues for the eye, building sensors and image processors that match, and in some cases even surpass, the human eye. With larger, more optically sophisticated lenses, sub-pixels at the scale of nanometres, and the ability to record thousands of images per second, the precision of modern cameras is unprecedented.
However sophisticated these devices are, without a brain behind them they’re little more than pinhole cameras: on their own they could never recognise a ball, let alone catch one!
While the hardware’s great, the software’s the problem.
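To get a feel for the software side, take just one small piece of the catch: predicting the ball’s path. The Python sketch below is a toy illustration, not how any real vision system works. It assumes the ball has already been found in two consecutive frames, that positions are given in metres in a flat two-dimensional scene, and that air resistance can be ignored. The function names and numbers are made up for this example.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed: no air resistance)


def estimate_velocity(p0, p1, dt):
    """Estimate the ball's velocity (vx, vy) at the moment of the first sighting.

    p0 and p1 are (x, y) positions in metres, seen dt seconds apart.
    Under constant gravity the average vertical speed over the interval
    is the speed at its midpoint, so we add back half of the change.
    """
    vx = (p1[0] - p0[0]) / dt
    vy = (p1[1] - p0[1]) / dt + 0.5 * G * dt
    return vx, vy


def predict_catch_point(p0, velocity, catch_height):
    """Return (x, t): where and when the falling ball reaches catch_height.

    Returns None if the ball never gets there.
    """
    x0, y0 = p0
    vx, vy = velocity
    # Solve y0 + vy*t - 0.5*G*t^2 = catch_height for the later root t.
    discriminant = vy * vy + 2 * G * (y0 - catch_height)
    if discriminant < 0:
        return None  # the ball never rises as high as catch_height
    t = (vy + math.sqrt(discriminant)) / G
    if t < 0:
        return None  # the ball is already below catch_height and falling
    return x0 + vx * t, t


# Two hypothetical sightings of the ball, a tenth of a second apart.
first, second = (0.0, 1.5), (0.5, 1.85095)
velocity = estimate_velocity(first, second, dt=0.1)
print(predict_catch_point(first, velocity, catch_height=1.5))
```

Even this takes a fair amount of reasoning, and it skips everything that makes the real problem hard: spotting the ball, judging its distance, and knowing that it is something worth catching.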
The Art of Vision
The human brain is the most sophisticated organ in the body and is virtually a vision machine, with billions of cells dedicated to decoding unfiltered images from the retina. Sets of neurons work together to recognise certain patterns, with higher-level networks picking out meta-patterns.
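The idea of simple pattern detectors feeding higher-level ones can be sketched in a few lines of Python. This is only a loose analogy, not a model of real neurons: the tiny black-and-white image, the function names and the rule for what counts as a corner are all assumptions made for illustration. Two low-level detectors find edges, and a higher-level detector reports a corner wherever both kinds of edge meet.

```python
def vertical_edges(img):
    """Low-level detector: 1 where brightness changes between left/right neighbours."""
    return [[int(row[c] != row[c + 1]) for c in range(len(row) - 1)] for row in img]


def horizontal_edges(img):
    """Low-level detector: 1 where brightness changes between up/down neighbours."""
    return [
        [int(img[r][c] != img[r + 1][c]) for c in range(len(img[0]))]
        for r in range(len(img) - 1)
    ]


def corners(img):
    """Higher-level detector built on the edge maps.

    Looks at every 2x2 window and reports its top-left (row, col)
    when the window contains both a vertical and a horizontal edge.
    """
    v = vertical_edges(img)
    h = horizontal_edges(img)
    found = []
    for r in range(len(img) - 1):
        for c in range(len(img[0]) - 1):
            has_vertical = v[r][c] or v[r + 1][c]
            has_horizontal = h[r][c] or h[r][c + 1]
            if has_vertical and has_horizontal:
                found.append((r, c))
    return found


# A bright 2x2 square on a dark background.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(corners(image))  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

The corner detector never looks at the pixels directly; it only combines what the edge detectors report, which is the spirit of patterns built on patterns.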
This is all incredibly complex, but the massive strides recently made in computational power have brought machines closer to imitating it. Still, while you could theoretically build a system to process every possible configuration of a movement, the brain is far more sophisticated than that. The human mind is the ultimate operating system, uniting memory, attention and cognition in the most complex neural network imaginable. The challenge facing computer scientists and AI specialists is the old philosophical conundrum: how can the brain (and consciousness) know and describe itself?
However, scientists aren’t giving up that easily. The future of computer vision lies in integrating the powerful but specialised systems we’ve already created with ones that work in a wider context, simulating memory or attention. This is an incredibly exciting time to enter the field, and there are great courses available to get you started; check out these offers from Groupon.
Computer vision is only in its early stages, but it’s already everywhere. It’s in our cameras, recognising faces and smiles. It’s in self-driving cars, reading traffic signs and waiting for pedestrians. It’s in factory robots, monitoring for problems and navigating around human co-workers, when they aren’t replacing them in their jobs.
Humans are still much better at seeing and reacting to visual stimuli than machines, but it’s a minor miracle that machines can see and react at all.