Through The Looking Glass: An Overview of Visual Recognition

Niño Ross Rodriguez
Face finding from the movie Mission Impossible: Ghost Protocol

It’s like something straight out of a sci-fi movie: machines, robots and androids identifying objects and faces with ease. These days, though, early versions of visual recognition (or image recognition) technology are already available through services from the likes of Google and IBM.

Let’s take a journey through how computers and devices took their first steps in looking at our world.

Google Images[1]

Do you remember the days when Google was just a search engine and all you could search for was simple text? Google’s developers then expanded their search product to return images as results. The search engine indexed millions of images, and Google Image Search was born.

A few years later, Google introduced the Search by Image feature, which let users run a reverse image search directly in Google Search, without any third-party add-ons.

Google Images

“There’s An App For That™”[2]

Technologies were still young and limited back then. IBM Watson was still Deep Blue playing chess. The cloud was still a dodgy place to keep your files. You pretty much relied on Google to search for anything. Image recognition apps were either built by companies with their own algorithms, which were expensive to produce, or piggybacked on Google’s Image Search, the easier and cheaper solution.

Google went on to release an app version of its image search, likely developed in parallel with its web image search features. Called Google Goggles[3], it let users search by taking a picture. The app could also recognise labels and landmarks without a text-based search.

This meant people could search for virtually anything, immediately, using their smartphones, without the arduous wait of getting back to a desktop.

Google Goggles

Visual Recognition and Web Apps

Let’s fast-forward to today, when technology has had a sudden growth spurt and given us, users and developers alike, endless possibilities to play with.

We can now tinker with artificial intelligence, machine learning, deep learning, natural language processing and visual recognition, to name just a few. Of course, we could always hire developers who specialise in artificial intelligence and machine learning, but that would cost us an arm and a leg. Alternatively, services from IBM Watson[4] and Google Cloud Platform[5] give developers easy-to-use tools at an affordable price.

More Than Meets The Eye

With the possibilities being endless, here are a few examples of how we could use the technologies at our disposal:

“Eye See It” App

Our visually impaired friends may sometimes need help identifying an object or reading small text. With an app installed on their smartphones, it would be as easy as taking a picture: the app would reply by voice, telling the user what it has seen. A user could photograph an unfamiliar object and the app could reply, “I am most definitely looking at a sports car.”
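The reply step could be sketched roughly as follows, assuming the recognition service hands back a list of (label, confidence) pairs. The `describe` function and the confidence thresholds are hypothetical, chosen only to show how a score might be turned into hedged spoken wording; a real app would then pass the resulting sentence to a text-to-speech engine.

```python
from typing import List, Tuple

# Hypothetical cut-offs mapping a confidence score (0.0 to 1.0) to wording.
# These values are illustrative assumptions, not taken from any real service.
CONFIDENCE_PHRASES = [
    (0.90, "I am most definitely looking at"),
    (0.70, "I am probably looking at"),
    (0.40, "I might be looking at"),
]


def describe(labels: List[Tuple[str, float]]) -> str:
    """Turn (label, confidence) pairs into one sentence for a voice reply."""
    if not labels:
        return "I could not recognise anything in this picture."
    # Speak only about the single most confident label.
    label, score = max(labels, key=lambda pair: pair[1])
    for threshold, phrase in CONFIDENCE_PHRASES:
        if score >= threshold:
            # Pick "a" or "an" from the label's first letter.
            article = "an" if label[:1].lower() in ("a", "e", "i", "o", "u") else "a"
            return f"{phrase} {article} {label}."
    return "I am not sure what this is."


print(describe([("car", 0.75), ("sports car", 0.93)]))
# prints: I am most definitely looking at a sports car.
```

Reporting only the top label keeps the spoken reply short, which matters more for a voice interface than listing every candidate.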

Lamborghini Centenario

“Plant-Eye-tion” App

A possible idea for the plant lovers out there who may not be as well versed as our gardener friends. Gardeners are walking encyclopedias of plant knowledge, especially about the plants they look after. What if we had an app that could take a picture of a flower or plant and tell you its name? Beyond that, it could give you the plant’s full details and help you keep it alive, telling you how many times a day it needs to be watered and whether it prefers sun or shade. By sharing information relevant to your surroundings, it could turn a hobbyist into an expert.

“Eye Keeper”

Another possibility is curating relevant and safe content for your social media wall in real time. Think about it: you’re at an event, and a big digital wall aggregates the pictures people take and upload to social media or to the event’s servers. Currently, you have two choices. You can have dedicated team members monitor and curate the content, or you can give up control and allow every post with the relevant tag onto the wall. The first costs a lot of time and money; the second is a huge risk.

The visual recognition service can act as a gatekeeper, analysing each image before it reaches the live screen and automatically preventing inappropriate images from being displayed on the big screen.
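A minimal sketch of that gatekeeper could look like the code below. It assumes the recognition service returns a likelihood rating per content category; the rating scale is loosely modelled on the likelihood values that Google Cloud Vision’s SafeSearch detection reports, but the `Post` and `Gatekeeper` classes, the category names and the blocking policy are all hypothetical. Images that fail the check are held back for a human rather than discarded.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Ordered likelihood scale (assumed; loosely modelled on SafeSearch-style ratings).
LIKELIHOOD = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Hypothetical policy: block a post if any category is rated POSSIBLE or higher.
BLOCK_AT = LIKELIHOOD.index("POSSIBLE")


@dataclass
class Post:
    image_url: str
    # Per-category ratings, e.g. {"adult": "VERY_UNLIKELY"}, assumed to come
    # from a visual recognition service before the post reaches the wall.
    ratings: Dict[str, str]


@dataclass
class Gatekeeper:
    wall: List[Post] = field(default_factory=list)  # shown on the live screen
    held: List[Post] = field(default_factory=list)  # kept back for human review

    def is_safe(self, post: Post) -> bool:
        for level in post.ratings.values():
            # Unrecognised rating strings are treated as UNKNOWN (rank 0).
            rank = LIKELIHOOD.index(level) if level in LIKELIHOOD else 0
            # Hold anything the service could not rate, or rated too likely.
            if rank == 0 or rank >= BLOCK_AT:
                return False
        return True

    def submit(self, post: Post) -> None:
        (self.wall if self.is_safe(post) else self.held).append(post)


gate = Gatekeeper()
gate.submit(Post("https://example.com/a.jpg", {"adult": "VERY_UNLIKELY", "violence": "UNLIKELY"}))
gate.submit(Post("https://example.com/b.jpg", {"adult": "LIKELY", "violence": "VERY_UNLIKELY"}))
print(len(gate.wall), len(gate.held))  # prints: 1 1
```

Treating an unrated image as unsafe errs on the side of caution: a few harmless pictures may wait for review, but nothing unchecked reaches the big screen.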

I Can Show You The World...

Artificial intelligence has taken its first few steps toward seeing into our world. In the near future, we might not need machine learning and deep learning anymore to “see” and “learn.” Google Lens[6] has already taken the first step in unlocking the potential of combining visual recognition, deep learning and augmented reality.

I, too, took a step toward seeing what this technology can do. Have a play with our little demo below. I highly recommend using your mobile phone to scan images. Bear in mind, though, that this is only a proof of concept, so you might still get weird replies from the web app. Happy clicking!

Nat's Eye http://adel.ph/eye

How can we help your system "see" better?