
What’s In a Monkey’s Mind?

Image courtesy of Bob Brewer via Unsplash.

Our visual system is akin to a computer. Along the brain’s inferotemporal (IT) cortex, neural computations transform raw sensory inputs into useful representations of the physical world. Current research suggests a surprising parallel between these transformations in the IT cortex and the computer graphics pipelines that convert 3D scenes into view-dependent 2D images.

To perform this 3D-to-2D transformation, computer graphics uses a generative model, which describes how the features of a 3D scene give rise to sensory measurements when the scene is projected onto a 2D surface. Run in the forward direction, the generative model produces 2D images. Take a ball of a certain size, color, and shading: knowing these 3D properties, along with the viewpoint, is enough to create a 2D image of the ball.
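The ball example can be made concrete with a toy forward model. The sketch below is only an illustration, not the graphics pipeline used in the study: the names `Ball`, `Disc`, and `render` are hypothetical, and the projection is a simplified pinhole camera in which the ball's outline is approximated as a disc whose size shrinks with depth.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    """A 3D ball in camera coordinates (hypothetical toy scene)."""
    x: float
    y: float
    z: float          # depth: distance from the camera along the viewing axis
    radius: float
    color: tuple      # RGB triple


@dataclass
class Disc:
    """The ball's 2D image: a flat, colored disc on the image plane."""
    u: float
    v: float
    radius: float
    color: tuple


def render(ball: Ball, focal_length: float = 1.0) -> Disc:
    """Forward direction of the generative model: 3D scene -> 2D image.

    Assumes a simplified pinhole camera in which everything is scaled by
    focal_length / depth. Real renderers also handle shading, occlusion,
    and the exact outline of the projected sphere.
    """
    if ball.z <= 0:
        raise ValueError("the ball must be in front of the camera")
    scale = focal_length / ball.z
    return Disc(ball.x * scale, ball.y * scale, ball.radius * scale, ball.color)


# A small ball nearby and a ball twice as large, twice as far away.
near = render(Ball(0.0, 0.0, 2.0, 1.0, (255, 0, 0)))
far = render(Ball(0.0, 0.0, 4.0, 2.0, (255, 0, 0)))
print(near)
print(far)
```

In this toy setup the two balls produce the same disc, which hints at why running the model in reverse is hard: a single 2D image can be consistent with more than one 3D scene.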

There are two main hypotheses for how visual transformations are organized in the IT cortex. The prevailing view is that high-level statistics underlie vision: the IT cortex functions like a deep convolutional neural network (DCNN) trained to distinguish between object categories and identities. A study led by Ilker Yildirim, an assistant professor of psychology at Yale, and Hakan Yilmaz, a psychology PhD student, proposes a second hypothesis: the graphics model is present in the IT cortex but used in reverse—a computation they refer to as “inverse graphics.”

The researchers homed in on a specific IT network: the multiarea body-processing network in macaques. To study inverse graphics as a mechanism for macaque vision, they used both neuroscientific and machine learning approaches. They obtained macaque neural data by taking single-cell recordings from two regions of the IT cortex involved in perceiving bodies: the middle superior temporal sulcus body patch and the anterior superior temporal sulcus body patch. The recordings were acquired as the monkeys were shown images from the “Monkey View Set,” a set of images depicting computer-generated 3D monkey models with different body shapes, postures, and viewpoints.

To model how inverse graphics might manifest in the macaque IT cortex, the researchers created a different kind of DCNN, which they named the Body Inference Network (BIN). Unlike standard DCNNs, which are typically trained to classify objects in images, BIN learns to reverse the graphics-based generative model. “Instead of going from the world to images with computer graphics, now with BIN, we want to be able to learn the inverse mapping from images to 3D worlds,” Yildirim said. BIN works by taking a flat 2D image of a body and predicting its 3D form, including parameters such as shape, pose, rotation, and viewpoint. By showing BIN images from the Monkey View Set, the researchers were able to assess similarities in image processing between the macaque IT cortex and an inverse graphics-based model.

The researchers used a multivariate statistical technique known as representational similarity analysis (RSA) to compare neural and BIN responses to the Monkey View Set images. RSA works like this: imagine each brain area and each layer of the BIN model as a map, with each image represented as a point. The position of each point depends on how strongly the neurons or the model’s units react to that image. By comparing these maps, RSA shows whether the brain and the BIN organize or “see” the images in similar ways. “RSA is how we know whether the representational space with respect to the brain of the stimuli is aligned with the representational space of the stimuli with respect to BIN,” Yildirim said.
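For readers who want to see the idea in code, here is a minimal sketch of RSA on made-up numbers. It is not the study's analysis: the response values are invented, and the choice of dissimilarity (one minus the Pearson correlation) and of Pearson correlation to compare the two maps are assumptions made for illustration. Published analyses often use other measures, such as rank correlations.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def dissimilarities(responses):
    """One dissimilarity (1 - correlation) per pair of images.

    `responses` holds one list per image: how strongly each neuron
    (or model unit) reacted to that image.
    """
    pairs = combinations(range(len(responses)), 2)
    return [1.0 - pearson(responses[i], responses[j]) for i, j in pairs]


def rsa_score(responses_a, responses_b):
    """Compare two systems by correlating their pairwise dissimilarities."""
    return pearson(dissimilarities(responses_a), dissimilarities(responses_b))


# Invented responses to four images: three recorded neurons, four model units.
# Images 0 and 1 are treated alike by both systems, as are images 2 and 3.
neural = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [3.0, 1.0, 2.0],
    [2.9, 1.2, 2.1],
]
model = [
    [0.20, 0.40, 0.60, 0.90],
    [0.25, 0.35, 0.65, 0.85],
    [0.90, 0.10, 0.50, 0.30],
    [0.80, 0.20, 0.45, 0.35],
]

score = rsa_score(neural, model)
print(f"RSA score: {score:.3f}")
```

Note that the two systems never need the same number of units: RSA compares how each one arranges the images relative to one another, not the raw responses themselves. A score near 1 means the two arrangements agree closely.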

The researchers found a high degree of similarity between the representational spaces for the IT cortex and BIN, suggesting that inverse graphics is indeed the underlying computation for 3D vision. “Since the monkeys were just looking passively at the images, we believe that these inverse graphics computations occur spontaneously,” Yildirim said.

The researchers hope that their discovery will inspire the incorporation of inverse graphics in other machine vision systems to better understand primate visual capabilities. Look around you right now: what seems 3D to us may just be our brain’s internal representation of 2D inputs.