Researchers have found a new method for improving object recognition in artificial intelligence.

The researchers identified an "off-the-shelf" algorithm that aggregates different perspectives, allowing an artificial-intelligence system to recognize four times as many objects as one that uses only a single perspective. They then presented a new algorithm of their own that is 10 times as fast as the first, making it practical for household robots, MIT reported.

"If you just took the output of looking at it from one viewpoint, there's a lot of stuff that might be missing, or it might be the angle of illumination or something blocking the object that causes a systematic error in the detector," says Lawson Wong, a graduate student in electrical engineering and computer science and lead author on the new paper. "One way around that is just to move around and go to a different viewpoint."

To test their approach, the researchers examined scenarios containing 20 to 30 different images of household objects on a table, some of which included multiple instances of the same object.

They identified an algorithm that "doesn't discard any of the hypotheses it generates across successive images, but it doesn't attempt to canvass them all, either," the researchers reported. Instead, the algorithm samples the hypotheses at random; because the hypotheses overlap, a modest number of samples tends to converge on a consensus about the correspondences between any two images. To keep the number of required samples low, the researchers found a way to simplify the hypotheses.
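The sampling-and-consensus idea can be illustrated with a small sketch. This is hypothetical code, not the authors' implementation: the hypotheses, object names, and voting threshold are all invented for illustration. Each hypothesis is a set of proposed matches between detections in two images; matches that recur across many sampled hypotheses survive as the consensus.

```python
import random
from collections import Counter

# Each hypothesis is a set of (object_in_view_1, object_in_view_2) matches.
# These detections are hypothetical, not data from the paper.
hypotheses = [
    frozenset({("mug", "A"), ("bowl", "B")}),
    frozenset({("mug", "A"), ("bowl", "C")}),
    frozenset({("mug", "A"), ("bowl", "B"), ("spoon", "D")}),
    frozenset({("mug", "E")}),
]

def consensus_matches(hypotheses, n_samples=500, min_support=0.5, seed=0):
    """Sample hypotheses at random and keep the individual matches that
    appear in at least `min_support` of the sampled hypotheses."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        votes.update(rng.choice(hypotheses))
    return {m for m, c in votes.items() if c / n_samples >= min_support}

print(consensus_matches(hypotheses))
```

Because ("mug", "A") appears in three of the four hypotheses, it is drawn far more often than the alternatives and reliably makes the consensus set, while rare matches like ("mug", "E") are filtered out.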

"Suppose that the algorithm has identified three objects from one perspective and four from another. The most mathematically precise way to compare hypotheses would be to consider every possible set of matches between the two groups of objects: the set that matches objects 1, 2, and 3 in the first view to objects 1, 2, and 3 in the second; the set that matches objects 1, 2, and 3 in the first to objects 1, 2, and 4 in the second; the set that matches objects 1, 2, and 3 in the first view to objects 1, 3, and 4 in the second, and so on. In this case, if you include the possibilities that the detector has made an error and that some objects are occluded from some views, that approach would yield 304 different sets of matches," the researchers reported.
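The combinatorial blow-up described above can be reproduced by brute-force enumeration. The sketch below is illustrative, not the paper's accounting: it counts only the partial matchings in which some subset of first-view objects maps to distinct second-view objects (unmatched objects standing in for detector errors or occlusion), which already yields 73 match sets for 3 versus 4 objects; the researchers' figure of 304 evidently reflects a richer model of errors and occlusions.

```python
from itertools import combinations, permutations

def count_match_sets(n_first, n_second):
    """Count every way to match a subset of the objects in the first view
    to distinct objects in the second view. Objects left unmatched on
    either side represent detector errors or occlusions."""
    total = 0
    for k in range(min(n_first, n_second) + 1):
        # Choose which k first-view objects are matched, then count the
        # ordered assignments of k distinct second-view objects to them.
        for _subset in combinations(range(n_first), k):
            total += sum(1 for _ in permutations(range(n_second), k))
    return total

print(count_match_sets(3, 4))  # 73 partial matchings for 3 vs 4 objects
```

Even this stripped-down count grows rapidly with the number of objects per view, which is why considering every possible set of matches quickly becomes impractical.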

As an alternative, the researchers' algorithm considered each object in the first group separately and evaluated the likelihood of its mapping onto each object in the second group; this approach required only 20 comparisons.
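The factored alternative can be sketched as follows. Again this is hypothetical code with invented likelihood scores, not the authors' method: each first-view object independently picks its best second-view candidate, falling back to "no match" (occlusion or detector error) when nothing beats a baseline score, so the work grows as a product of the group sizes rather than combinatorially. In this accounting, 3 objects scored against 4 candidates costs 12 pairwise evaluations; the article's figure of 20 presumably also counts the error and occlusion options.

```python
def best_match_per_object(scores, no_match_score=0.1):
    """For each first-view object, independently pick the second-view
    object with the highest match likelihood, or None when no candidate
    beats the `no_match_score` baseline. `scores[i][j]` is the
    (hypothetical) likelihood that first-view object i is
    second-view object j."""
    assignment = {}
    comparisons = 0
    for i, row in enumerate(scores):
        best_j, best_s = None, no_match_score  # start at the "no match" option
        for j, s in enumerate(row):
            comparisons += 1
            if s > best_s:
                best_j, best_s = j, s
        assignment[i] = best_j
    return assignment, comparisons

# Hypothetical likelihoods: 3 first-view objects vs 4 second-view objects.
scores = [
    [0.90, 0.05, 0.02, 0.01],
    [0.10, 0.70, 0.10, 0.05],
    [0.05, 0.05, 0.05, 0.05],  # nothing beats the baseline: likely occluded
]
assignment, comparisons = best_match_per_object(scores)
print(assignment)   # {0: 0, 1: 1, 2: None}
print(comparisons)  # 12
```

The trade-off is that treating objects independently ignores interactions between matches (two first-view objects could both claim the same second-view object), which is exactly the simplification that keeps the hypothesis space small.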

The findings were published in a recent edition of the International Journal of Robotics Research.