COMPUTER VISION
Computer vision is a field of Artificial Intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs - and take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand.
Computer vision works much the same as human vision, except humans have a head start. Human sight has the advantage of a lifetime of context to train it to tell objects apart, judge how far away they are, determine whether they are moving and spot when something is wrong in an image.
Computer vision trains machines to perform these functions, but it has to do so in much less time, using cameras, data and algorithms rather than retinas, optic nerves and a visual cortex. Because a system trained to inspect products or watch a production asset can analyze thousands of products or processes a minute, noticing defects or issues imperceptible to the human eye, it can quickly surpass human capabilities.
Computer vision is used in industries ranging from energy and utilities to manufacturing and automotive - and the market is continuing to grow. It is expected to reach USD 48.6 billion by 2022.
HOW DOES IT WORK:
Computer vision needs lots of data. It runs analyses of that data over and over until it discerns distinctions and ultimately recognizes images. For example, to train a computer to recognize automobile tires, it needs to be fed vast quantities of tire images and tire-related items in order to learn the differences and recognize a tire, especially one with no defects.
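To make the tire example concrete, here is a minimal sketch of loading labeled training images in Python with PyTorch and torchvision; the data/tire and data/not_tire folder layout is a hypothetical example, not part of any particular pipeline:

```python
# A minimal sketch of preparing labeled image data for training.
# Hypothetical layout: data/tire/*.jpg and data/not_tire/*.jpg.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Resize every image to a fixed size and convert it to a tensor.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# ImageFolder infers the class label of each image from its subdirectory name.
dataset = datasets.ImageFolder(root="data", transform=preprocess)

# Batches of (images, labels) pairs are what the model analyzes over and over.
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```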
Two essential technologies are used to accomplish this: a type of machine learning called deep learning, and a convolutional neural network (CNN).
Machine learning uses algorithmic models that enable a computer to teach itself about the context of visual data. If enough data is fed through the model, the computer will “look” at the data and teach itself to tell one image from another. Algorithms enable the machine to learn by itself, rather than someone programming it to recognize an image.
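In code, that self-teaching amounts to a loop of predicting, measuring the error and adjusting. Here is a minimal sketch, assuming the loader from the previous example and a deliberately tiny classifier rather than a realistic architecture:

```python
# A minimal sketch of a model teaching itself from labeled images:
# repeated rounds of predict -> measure error -> adjust weights.
import torch
from torch import nn

model = nn.Sequential(                     # deliberately tiny classifier
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 2),           # two classes: tire / not tire
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):                     # each pass over the data refines the model
    for images, labels in loader:
        optimizer.zero_grad()
        predictions = model(images)             # "look" at the data
        loss = loss_fn(predictions, labels)     # how wrong were the guesses?
        loss.backward()                         # work out how to be less wrong
        optimizer.step()                        # adjust and try again
```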
A CNN helps a machine learning or deep learning model “look” by breaking images down into pixels that are given tags or labels. It uses the labels to perform convolutions (a mathematical operation on two functions that produces a third function) and makes predictions about what it is “seeing”. The neural network runs convolutions and checks the accuracy of its predictions over a series of iterations until the predictions begin to match the actual content of the images. It is then recognizing or seeing images in a way similar to humans.
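The convolution operation itself is simple enough to sketch directly. The example below slides a small filter over an image and sums element-wise products at each position; note that deep learning “convolutions” skip the kernel flip of the textbook definition, as this sketch does:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image, summing element-wise products.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A classic vertical-edge filter: it responds strongly where brightness
# changes from left to right, the kind of hard edge a CNN discerns first.
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

image = np.random.rand(8, 8)              # stand-in for an 8x8 grayscale image
print(convolve2d(image, edge_kernel))     # a 6x6 map of edge responses
```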
Much like a human making out an image at a distance, a CNN first discerns hard edges and simple shapes, then fills in information as it runs further iterations of its predictions. A CNN is used to understand single images. A recurrent neural network (RNN) is used in a similar way for video applications, helping computers understand how the pictures in a series of frames are related to one another.
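Here is a minimal sketch of that CNN-plus-RNN pattern for video, with purely illustrative layer sizes and input shapes:

```python
# A small CNN summarizes each frame; a recurrent layer (a GRU here)
# relates the frame summaries to one another across time.
import torch
from torch import nn

class VideoModel(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(              # per-frame feature extractor
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # -> one 16-dim vector per frame
            nn.Flatten(),
        )
        self.rnn = nn.GRU(16, 32, batch_first=True)  # relates frames in order
        self.head = nn.Linear(32, num_classes)

    def forward(self, clips):                  # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))  # run the CNN on every frame
        feats = feats.view(b, t, -1)           # regroup features by clip
        _, hidden = self.rnn(feats)            # summarize the frame sequence
        return self.head(hidden[-1])

clips = torch.randn(4, 8, 3, 64, 64)           # 4 clips of 8 frames each
print(VideoModel()(clips).shape)               # torch.Size([4, 2])
```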
COMPUTER VISION APPLICATIONS:
There is a lot of research being done in the computer vision field, but it’s not just research. Real-world applications demonstrate how important computer vision is to endeavors in business, entertainment, transportation, healthcare and everyday life. A key driver for the growth of these applications is the flood of visual information flowing from smartphones, security systems, traffic cameras and other visually instrumented devices. This data could play a major role in operations across industries, but today it largely goes unused. The information creates a test bed to train computer vision applications and a launchpad for them to become part of a range of human activities:
- 1.) IBM used computer vision to create My Moments for the 2018 Masters golf tournament. IBM Watson watched hundreds of hours of Masters footage and could identify the sights (and sounds) of significant shots. It curated these key moments and delivered them to fans as personalized highlight reels.
- 2.) Google Translate lets users point a smartphone camera at a sign in another language and almost immediately obtain a translation of the sign in their preferred language.
- 3.) The development of self-driving vehicles relies on computer vision to make sense of the visual input from a car’s cameras and other sensors. It’s essential to identify other cars, traffic signs, lane markers, pedestrians, bicycles and all of the other visual information encountered on the road.
- 4.) IBM is applying computer vision technology with partners like Verizon to bring intelligent AI to the edge, and to help automotive manufacturers identify quality defects before a vehicle leaves the factory.
Many organizations don’t have the resources to fund computer vision labs and create deep learning models and neural networks. They may also lack the computing power required to process huge sets of visual data. Companies such as IBM are helping by offering computer vision software development services. These services deliver pre-built learning models available from the cloud - and also ease demand on computing resources. Users connect to the services through an application programming interface (API) and use them to develop computer vision applications.
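Consuming such a service usually reduces to an authenticated HTTP request. The sketch below is hypothetical throughout: the endpoint, credential and response format are placeholders, not any particular vendor’s API:

```python
# A minimal sketch of calling a cloud vision service over a REST API.
import requests

API_URL = "https://api.example.com/v1/classify"   # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                          # placeholder credential

with open("tire.jpg", "rb") as f:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": f},
    )

# A typical response might list predicted labels with confidence scores.
print(response.json())
```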
IBM has also introduced a computer vision platform that addresses both developmental and computing resource concerns. IBM Maximo Visual Inspection includes tools that enable subject matter experts to label, train and deploy deep learning vision models - without coding or deep learning expertise. The vision models can be deployed in local data centers, in the cloud and on edge devices.
While it’s getting easier to obtain resources to develop computer vision applications, an important question to answer early on is: what exactly will these applications do? Understanding and defining specific computer vision tasks can focus and validate projects and applications and make it easier to get started.
Here are a few examples of established computer vision tasks:
- Image classification sees an image and can classify it (a dog, an apple, a person’s face). More precisely, it is able to accurately predict that a given image belongs to a certain class. For example, a social media company might want to use it to automatically identify and segregate objectionable images uploaded by users. (Classification and detection are both illustrated in the first sketch after this list.)
- Object detection can use image classification to identify a certain class of image and then detect and tabulate its appearances in an image or video. Examples include detecting damage on an assembly line or identifying machinery that requires maintenance.
- Object tracking follows or tracks an object once it is detected. This task is often executed with images captured in sequence or with real-time video feeds. Autonomous vehicles, for example, need not only to classify and detect objects such as pedestrians, other cars and road infrastructure, but also to track those objects in motion to avoid collisions and obey traffic laws. (A bare-bones tracker appears in the second sketch after this list.)
- Content-based image retrieval uses computer vision to browse, search and retrieve images from large data stores based on the content of the images rather than on metadata tags associated with them. This task can incorporate automatic image annotation that replaces manual image tagging. These tasks can be used in digital asset management systems and can increase the accuracy of search and retrieval. (The third sketch after this list shows a simple embedding-based retrieval loop.)
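First, a minimal sketch of image classification and object detection using pretrained torchvision models; the weights download on first use, and the file name street.jpg is a hypothetical example:

```python
import torch
from PIL import Image
from torchvision import models, transforms

image = Image.open("street.jpg").convert("RGB")

# Classification: one label for the whole image.
cls_weights = models.ResNet18_Weights.DEFAULT
classifier = models.resnet18(weights=cls_weights).eval()
prep = cls_weights.transforms()            # preprocessing matched to the weights
with torch.no_grad():
    class_id = classifier(prep(image).unsqueeze(0)).argmax().item()

# Detection: a box and a label for each object found in the image.
det_weights = models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = models.detection.fasterrcnn_resnet50_fpn(weights=det_weights).eval()
with torch.no_grad():
    result = detector([transforms.ToTensor()(image)])[0]

print(class_id, result["boxes"].shape, result["labels"])
```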
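Second, a bare-bones tracker that keeps an object’s identity across frames by matching detections on overlap (IoU); production trackers also model motion and handle occlusion, which this sketch deliberately ignores:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2): intersection area divided by union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def track(prev_tracks, detections, next_id, threshold=0.3):
    # prev_tracks maps track IDs to last-frame boxes; returns updated tracks.
    tracks = {}
    for box in detections:
        # Find the best-overlapping previous track not yet claimed this frame.
        matches = [(iou(b, box), tid) for tid, b in prev_tracks.items()
                   if tid not in tracks]
        score, tid = max(matches, default=(0, None))
        if score >= threshold:
            tracks[tid] = box                  # same object keeps its ID
        else:
            tracks[next_id] = box              # a new object enters the scene
            next_id += 1
    return tracks, next_id

tracks, next_id = track({}, [(0, 0, 10, 10)], next_id=1)
tracks, next_id = track(tracks, [(1, 1, 11, 11), (50, 50, 60, 60)], next_id)
print(tracks)   # the box near (0, 0) keeps ID 1; the new box gets ID 2
```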
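Third, a simple embedding-based retrieval loop: each stored image is reduced to a feature vector by a pretrained CNN, and a query returns the images whose vectors lie closest to its own. The file names are hypothetical:

```python
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
prep = weights.transforms()
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()          # drop the classifier; keep features
backbone.eval()

@torch.no_grad()
def embed(path):
    # One 512-dimensional feature vector describing the image's content.
    return backbone(prep(Image.open(path).convert("RGB")).unsqueeze(0))[0]

store = {p: embed(p) for p in ["a.jpg", "b.jpg", "c.jpg"]}   # the image library

def search(query_path, k=2):
    # Rank stored images by cosine similarity to the query's embedding.
    q = embed(query_path)
    scores = {p: torch.cosine_similarity(q, v, dim=0).item()
              for p, v in store.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(search("query.jpg"))                 # the k most visually similar images
```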