Brain-inspired computer vision

A distinctive hallmark of the brain is its ability to automatically discover and model objects at multiple scales from repeated exposure to unlabeled contextual data, and then to robustly detect the learned objects under nonideal conditions such as partial occlusion and varying view angles. Replicating such capabilities in a machine would require three key ingredients: (i) access to large-scale perceptual data of the kind that humans experience, (ii) flexible representations of objects, and (iii) an efficient unsupervised learning algorithm.

Most existing object recognition programs rely on supervised training with bounding boxes or object labels across hundreds of images. The human brain, however, learns to recognize objects in varied contexts without repeated training.

Inspired by the brain’s unsupervised learning ability, researchers incorporated basic computational principles that the brain likely uses to perform visual recognition and developed Structural Unsupervised Viewlets Models (SUVMs) of humans, cars, and airplanes, among other targets.

Fortunately, the Internet provides unprecedented access to vast amounts of visual data. This paper leverages that data to develop a scalable framework for unsupervised learning of object prototypes: brain-inspired, flexible, scale- and shift-invariant representations of deformable objects (e.g., humans, motorcycles, cars, airplanes) composed of parts, their different configurations and views, and their spatial relationships.

The authors developed a series of viewlets, which are images depicting pieces of objects, such as an arm, in different poses or orientations, together with a spatial map of how the pieces fit together to form an entire object. The authors tested the SUVMs on two existing visual datasets.
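
The pairing of viewlets with a spatial map can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names Viewlet, Detection, SPATIAL_MAP, and consistency, the offset values, and the tolerance rule are all hypothetical. Offsets are stored relative to other parts and multiplied by an object scale, which is one simple way to obtain shift and scale invariance, and parts that are not detected are skipped, loosely mirroring tolerance to partial occlusion.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Viewlet:
    """Hypothetical viewlet: one object part seen in one pose or orientation."""
    part: str  # e.g. "arm"
    view: str  # e.g. "raised"


@dataclass(frozen=True)
class Detection:
    """A viewlet matched at an image position (pixels)."""
    viewlet: Viewlet
    x: float
    y: float


# Hypothetical spatial map: expected offset (dx, dy) from the first part to
# the second, expressed in units of object scale. Values are illustrative.
SPATIAL_MAP = {
    ("head", "torso"): (0.0, 1.0),
    ("torso", "arm"): (0.5, 0.0),
}


def consistency(detections, spatial_map, scale=1.0, tol=0.25):
    """Fraction of checkable part pairs whose observed offset lies within
    tol * scale of the expected offset. Pairs with a missing part are
    skipped rather than penalized."""
    by_part = {d.viewlet.part: d for d in detections}
    checked = 0
    matched = 0
    for (a, b), (dx, dy) in spatial_map.items():
        if a not in by_part or b not in by_part:
            continue  # part not detected (e.g. occluded)
        checked += 1
        observed_dx = by_part[b].x - by_part[a].x
        observed_dy = by_part[b].y - by_part[a].y
        error = hypot(observed_dx - dx * scale, observed_dy - dy * scale)
        if error <= tol * scale:
            matched += 1
    return matched / checked if checked else 0.0
```

Because only relative offsets are compared, translating every detection by the same amount leaves the score unchanged, and the scale argument lets one map describe the same object at different sizes.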

The face and human SUVMs recognized human faces correctly without false positives. The airplane SUVM performed less well, a result that the authors attribute to the relatively small number of training images presented. Adding relative probabilities to viewlets could increase the sensitivity of the models, which could potentially be applied to videos, according to the authors.