Azure Cognitive Services: Computer Vision


Azure provides a robust set of artificial intelligence (AI) features called Cognitive Services that developers can easily utilize within their own applications. The Cognitive Services offering is currently divided into five main categories: Vision, Speech, Language, Knowledge, and Search.

One of the AI services under the Vision category is Computer Vision. This service analyzes an image and extracts information about the content within the image.

Examples of what Computer Vision can do include:

  • Tag visual items
  • Categorize an image
  • Describe an image
  • Recognize faces within an image
  • Evaluate colors and generate thumbnails
  • Provide details for moderating content within images

Tags, Category, and Description

Computer Vision can tag visual items within an image, drawing on more than 2,000 recognizable objects, people, scenery, and actions. Each tag comes with a confidence score, and the results are sorted by that score from highest to lowest.

In addition to tagging, it can return taxonomy-based categories drawn from a list of 87 concepts, like faces, food, nature, and abstract.

Using the tags, Computer Vision will generate a description of the image displayed as human-readable text in a complete sentence.
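As a rough sketch of how an application might ask for these three features, the snippet below builds (but does not send) a request to the Analyze Image REST endpoint. The API version in the path (v3.2), the example endpoint host, the placeholder key, and the image URL are all assumptions rather than details from this post, so check them against your own Computer Vision resource.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url,
                          features=("Tags", "Categories", "Description")):
    """Build a POST request asking Computer Vision to analyze one image URL."""
    # Assumption: the v3.2 REST path; other API versions use a different path.
    query = urllib.parse.urlencode(
        {"visualFeatures": ",".join(features)}, safe=","
    )
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical endpoint, key, and image URL, for illustration only.
request = build_analyze_request(
    "https://example.cognitiveservices.azure.com",
    "YOUR_SUBSCRIPTION_KEY",
    "https://example.com/pug.jpg",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it runs without a subscription key.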

Let’s look at an example:

Pug
Pic: Instagrammer @snissenful

Tags

  Name      Confidence
  dog       0.99915123
  sitting   0.9753965
  indoor    0.9698432
  black     0.8994449
  laying    0.7277232
  white     0.6921732
  pug       0.6921732
  bulldog   0.11564628

Category

  Name         Confidence
  animal_dog   0.98828125

Description

  Caption                                 Confidence
  a large black dog lying on the ground   0.8761196

Above are the results returned from Computer Vision for an image of my pug, Jasmine. For the tags, it was able to identify a black and white dog lying indoors. Also, based on the confidence scores returned, it was able to determine the dog is most likely a pug. It returned the correct category of animal – dog, but the description of “a large black dog lying on the ground” isn’t quite accurate. It should be “a small, portly pug flopping on the couch.”
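One way an application might act on those confidence scores is to keep only the tags above a cutoff. The sketch below uses the tag values from the pug example; the dictionary keys simply mirror the table columns, and the 0.5 threshold is an arbitrary choice for illustration, not a value recommended by the service.

```python
def confident_tags(tags, threshold=0.5):
    """Return tag names whose confidence meets the threshold, highest first."""
    kept = [tag for tag in tags if tag["confidence"] >= threshold]
    kept.sort(key=lambda tag: tag["confidence"], reverse=True)
    return [tag["name"] for tag in kept]


# Tag results copied from the pug example above.
pug_tags = [
    {"name": "dog", "confidence": 0.99915123},
    {"name": "sitting", "confidence": 0.9753965},
    {"name": "indoor", "confidence": 0.9698432},
    {"name": "black", "confidence": 0.8994449},
    {"name": "laying", "confidence": 0.7277232},
    {"name": "white", "confidence": 0.6921732},
    {"name": "pug", "confidence": 0.6921732},
    {"name": "bulldog", "confidence": 0.11564628},
]

print(confident_tags(pug_tags))
```

With this cutoff, "pug" is kept and "bulldog" is dropped, which matches the reading of the scores given above.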

Faces

Face detection is another feature of Computer Vision. This AI technology can detect human faces within an image and return the coordinates of each face, as well as each person’s estimated gender and age.

Far Reach Partners

Faces

  Index   Age   Gender   Top   Left   Width   Height
  0       52    Male     167   558    65      65
  1       51    Male     77    666    64      64
  2       43    Female   117   311    54      54
  3       34    Male     69    149    52      52
  4       49    Female   126   459    50      50

Above are the results returned from Computer Vision for an image of the Far Reach partners. It detected the faces of all five partners and identified the appropriate gender for each. The estimated ages were accurate for a couple of the partners, but the others were off by 10 years. Some of us must be aging faster than others.
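The coordinates describe a rectangle by its top-left corner plus a width and height. Many drawing and cropping tools expect corner coordinates instead, so a small conversion like the sketch below can help. The dictionary keys mirror the table columns above and are an assumption about how the results would be stored in your application.

```python
def face_box(face):
    """Convert top/left/width/height into (left, top, right, bottom) corners."""
    left, top = face["left"], face["top"]
    return (left, top, left + face["width"], top + face["height"])


# Face results copied from the table above.
faces = [
    {"age": 52, "gender": "Male", "top": 167, "left": 558, "width": 65, "height": 65},
    {"age": 51, "gender": "Male", "top": 77, "left": 666, "width": 64, "height": 64},
    {"age": 43, "gender": "Female", "top": 117, "left": 311, "width": 54, "height": 54},
    {"age": 34, "gender": "Male", "top": 69, "left": 149, "width": 52, "height": 52},
    {"age": 49, "gender": "Female", "top": 126, "left": 459, "width": 50, "height": 50},
]

for index, face in enumerate(faces):
    print(index, face["gender"], face["age"], face_box(face))
```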

Colors

Computer Vision can also perceive color schemes using an algorithm that is capable of extracting individual colors from an image. The colors are analyzed in three different contexts: foreground, background, and as a whole. An accent color is extracted from an image and represents the most visible color to users through a mix of dominant colors and saturation.

Flower
Pic: Instagrammer @snissenful


Tags

  Name     Confidence
  plant    0.964872241
  flower   0.942176044
  red      0.8848852
  garden   0.133545712
  flora    0.07647245
  leaf     0.0756237

Category

  Name           Confidence
  plant_flower   0.85546875

Description

  Caption                  Confidence
  a close up of a flower   0.983600438

Colors

  Dominant color background   Green
  Dominant color foreground   Red
  Accent color                #C40724

In this example, Computer Vision determined a dominant background color of green, a dominant foreground color of red, and an accent color with the hex value of #C40724. Not bad!
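To use the accent color in code, for example to tint a border or button around the image, it usually needs to be split into red, green, and blue components. A minimal sketch of that conversion, using the hex value from this example (the function name is just an illustration):

```python
def hex_to_rgb(hex_color):
    """Convert a six-digit hex color such as '#C40724' to an (r, g, b) tuple."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected six hex digits, got {hex_color!r}")
    # Each pair of hex digits is one 0-255 channel.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#C40724"))  # (196, 7, 36)
```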

Thumbnails

Not all images are suited for all devices. Therefore, it’s sometimes necessary to generate different thumbnail sizes to provide a better user experience on certain devices. Computer Vision is able to generate thumbnails by identifying a region of interest (ROI). It uses a thumbnail algorithm to recognize the main object, which is the region of interest, and remove distracting elements from the image. It then crops the image based on the identified ROI. And finally, it changes the aspect ratio to fit the target thumbnail dimensions. 

Cat
Pic: Instagrammer @snissenful

In this example, the original image seen above results in the thumbnail images below. The thumbnail images have been automatically cropped according to the different target thumbnail dimensions based on the region of interest, which in this case is the kitty in a bucket in the picture. This feature is referred to as smart cropping.

cat thumbnails
Pic: Instagrammer @snissenful
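The crop-then-resize idea can be illustrated with a little arithmetic. The sketch below is not the service's actual algorithm, which this post does not describe in detail; it simply picks the largest window with the target aspect ratio and centers it on a region of interest as far as the image edges allow. The image size and ROI values are hypothetical.

```python
def crop_window(image_w, image_h, roi, target_w, target_h):
    """Return (left, top, width, height) of a crop with the target aspect ratio.

    roi is (left, top, width, height). The crop is centered on the ROI
    where possible and clamped so it stays inside the image.
    """
    roi_left, roi_top, roi_w, roi_h = roi
    target_ratio = target_w / target_h

    # Largest window with the target aspect ratio that fits in the image.
    if image_w / image_h > target_ratio:
        crop_h = image_h
        crop_w = round(crop_h * target_ratio)
    else:
        crop_w = image_w
        crop_h = round(crop_w / target_ratio)

    # Center the window on the ROI, then clamp it to the image bounds.
    center_x = roi_left + roi_w / 2
    center_y = roi_top + roi_h / 2
    left = min(max(round(center_x - crop_w / 2), 0), image_w - crop_w)
    top = min(max(round(center_y - crop_h / 2), 0), image_h - crop_h)
    return left, top, crop_w, crop_h


# Hypothetical 1000x800 image with the subject toward the upper right.
roi = (600, 200, 200, 200)
print(crop_window(1000, 800, roi, 100, 100))  # square thumbnail
print(crop_window(1000, 800, roi, 200, 100))  # wide thumbnail
```

After cropping, the window would be scaled down to the target dimensions, which no longer distorts the image because the aspect ratios already match.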

There are several other capabilities within Computer Vision beyond those described above. You can also…

  • Identify celebrities and landmarks using domain-specific models
  • Define your own models for performing custom image recognition
  • Extract text using optical character recognition (OCR) and handwriting recognition
  • Evaluate images for potential adult content to assist with content moderation
  • Determine other image metadata such as dimensions, image type, and format

A picture is worth a thousand words. And with the help of Computer Vision, developers can easily determine a few of those words through code.