Science | 2017-11-08

Just have a look at some of these 2017 stats:

Photo uploads to Facebook every day
Photos and videos Instagrammed every day
Video uploaded to YouTube every minute
Trademarks registered globally each year
Imaging examinations / X-rays in the US each year

More video content is uploaded in 30 days than the major U.S. television networks have created in 30 years – it would take over 100,000 years to watch it all…

and over one million years to tag it all by hand.

The Internet landscape is being fundamentally changed by this flood of video and images, most of it user-generated. The reality is that our visual economy is generating images exponentially faster than companies and individuals can understand and profit from them.

Only a small part of visual content is currently annotated (if at all), and that annotation relies mostly on human intervention, which is time-intensive, subjective, expensive and task-dependent, and is limited to what can be described in text. The vast majority of this visual content carries no meaningful metadata at all and therefore cannot be understood by machines or harnessed by organisations. The world is visually impaired.

Whether it be through a 10-second loop shared on Snapchat or a 5-minute Facebook Live video, as social media evolves, it is crucial that businesses learn how and where they can most effectively connect with consumers.

“WeSee is able to generate enhanced value from video and images.”

WeSee believes that by enriching known information about any visual content, its value will increase. Armed with automatic detection, extraction and analysis, our team of seasoned media professionals, mathematicians and scientists are aligned to solve key problems facing the digital industry. Using deep learning algorithms to bring insights into visual content not only benefits the companies producing or collating image and video material, but those who are tasked with understanding, utilising and monetising it.

WeSee can unlock this data because unlike other systems, we don’t just read metadata,
we see and understand the very DNA of visual content.

How does it work?

WeSee’s deep learning-based computer vision imitates the human brain’s ability to understand and process all we see. Utilising the latest in artificial intelligence, our core Visual Intelligence Engine (VIE) is built on a biologically inspired programming pattern called a Convolutional Neural Network (CNN), an artificial neural network that specialises in analysing visual imagery.

By emulating the response of an individual neuron to visual stimuli, and then mimicking the interconnections between those neurons the way our brain does, our VIE needs relatively little pre-processing compared with other image-classification techniques. Its major advantage is the ability to learn filters that previously had to be hand-engineered: analysing multiple discrete layers in content and applying our rule-based algorithms, it reads visual content in a more human-like fashion, and in some scenarios even better than humans.
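To make the idea of a learned filter concrete, here is a minimal sketch of the convolution operation at the heart of a CNN. This is an illustrative toy, not WeSee’s VIE: the filter below is hand-written for clarity, whereas a CNN would learn such values from training data.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over an image and record its response at each
    position (valid padding, stride 1) -- the core CNN operation."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A hand-engineered vertical-edge filter. In a CNN, values like these
# are learned from data rather than designed by hand.
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

# Toy 5x5 image: bright left half, dark right half.
image = np.array([[9, 9, 0, 0, 0]] * 5, dtype=float)

# The response is strongest where the bright-to-dark edge sits.
response = conv2d(image, edge_filter)
```

Stacking many such filters in successive layers, with non-linearities between them, is what lets a network build up from edges to shapes to whole objects.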

Unlike other systems built on open-source code and frameworks trained on open-source data, our technology takes a new, smart approach to video classification: it combines machine learning, deep learning and proprietary rule-based algorithms with specialists dedicated to collating, sorting and tagging data.

Let’s think this through in real-world terms

Rather than simply seeing data, like the individual letters on a sign, our processing goes deeper to understand the shape or colour of the sign, such as a red octagon. We look at surroundings, like the fact it is next to a road. All of these layers of scale, detail and context bring human-like perception to the conclusion that it is a ‘stop sign’.
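That layered reasoning can be sketched as a toy rule-based step sitting on top of lower-level detections. The cue names and the function below are hypothetical illustrations of the idea, not WeSee’s actual algorithms.

```python
def classify_sign(cues):
    """Combine several visual cues (shape, colour, context) into one
    label, mirroring the layered reasoning described in the text.
    A toy rule; real systems learn such combinations from data."""
    if (cues.get("shape") == "octagon"
            and cues.get("colour") == "red"
            and cues.get("context") == "roadside"):
        return "stop sign"
    return "unknown"

# Each cue would come from a lower layer of the vision pipeline.
label = classify_sign({"shape": "octagon",
                       "colour": "red",
                       "context": "roadside"})
```

The point is that no single cue is decisive; the label emerges only when shape, colour and context agree.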

Or consider a black blob within an image: does it have eyes, ears or legs? How are they positioned relative to each other? What is it doing? Where is it? By reading multiple layers and cue points simultaneously, the way humans do in milliseconds, we decipher that it’s not just a piece of coal but a ‘black cat’.


Now, what if the sign or cat goes out of view within the video stream? What if the camera angle changes? Our VIE is unique in that it still understands the context of the visual content and can classify the piece as ‘automotive’ or ‘nature’ accordingly. Overlaid with pre-defined expressions, it can then read emotions. By spotting something that is not behaving as you would expect, or something that is where it shouldn’t be, it can filter adult content or highlight medical discrepancies. Customised to your needs, VIE builds on the contextual relevance of any visual content, making it an incredibly powerful tool for analysing your image or video material, better and faster than humans can.
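One simple way a classifier can keep its context when an object briefly leaves the frame is to smooth per-frame labels over a sliding window, so a few occluded frames don’t flip the overall classification. The sketch below is a hypothetical illustration of that idea, not a description of how VIE is implemented.

```python
from collections import Counter

def smooth_labels(frame_labels, window=5):
    """Majority-vote each frame's label over a sliding window so a
    briefly occluded object does not change the video's category."""
    smoothed = []
    for i in range(len(frame_labels)):
        lo = max(0, i - window // 2)
        hi = min(len(frame_labels), i + window // 2 + 1)
        votes = Counter(frame_labels[lo:hi])
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed

# The car leaves the frame for one moment, but the context survives.
labels = ["car", "car", "unknown", "car", "car", "car"]
stable = smooth_labels(labels)
```

Real systems use far richer temporal models, but the principle is the same: context across frames outweighs any single frame’s reading.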

So, whether you are trying to identify a troubled cat within a video stream, or troublesome indicators in a CAT scan, the power of WeSee is about to unlock visual content in incredible new ways.

The end of the road for little white lies

Beyond merely recognising people and faces, our Visual Intelligence Engine is poised to read feelings, intentions and moods – to bring emotional intelligence at scale. By training our systems to look at suspicious behaviour, WeSee technology will help reduce fraud in insurance claims and recruitment applications.

Just some of our use cases

From broadcast content to advertising technology and beyond, these are just some of the ways in which we have helped our global clients so far.