Using object reconstruction as top-down attentional feedback yields a shape bias and robustness in object recognition
Seoyoung Ahn, Hossein Adeli, Gregory Zelinsky, Stony Brook University, United States
Posters 2 Poster
Pacific Ballroom H-O
Fri, 26 Aug, 19:30 - 21:30 Pacific Time (UTC -8)
Many theories of vision posit the existence of top-down inference in visual perception, but little is known about how this visual inference occurs in the brain and the role it plays in robust object recognition. Here we built an iterative encoder-decoder network that generates an object reconstruction (a visualized prediction about the possible appearance of an object) and uses it as top-down attentional feedback to bias feed-forward processing toward forming one globally coherent object representation (e.g., shape). We tested this model on a challenging out-of-distribution digit recognition task, MNIST-C, in which 15 different types of transformation and corruption are applied to handwritten digit images. The proposed model showed strong generalization performance against various image perturbations, on average outperforming all other models, including feed-forward CNNs and more advanced models (e.g., adversarially trained networks). Our model is particularly robust to corruptions such as blur, noise, and occlusion, where shape perception plays an important role, consistent with our suggestion that an object reconstruction is used by top-down attention to impose a shape bias on the perception of an object in the visual input.
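The loop described in the abstract (encode, reconstruct, feed the reconstruction back as attention, re-encode) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ReconstructionFeedbackNet`, the single-layer encoder and decoder, the random untrained weights, the multiplicative form of the attention mask, and the number of iterations are all assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class ReconstructionFeedbackNet:
    """Hypothetical toy model: random weights, no training."""

    def __init__(self, n_pixels=784, n_hidden=64, n_classes=10, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.05, (n_hidden, n_pixels))   # encoder
        self.W_dec = rng.normal(0.0, 0.05, (n_pixels, n_hidden))   # decoder
        self.W_cls = rng.normal(0.0, 0.05, (n_classes, n_hidden))  # classifier

    def encode(self, x):
        return np.tanh(self.W_enc @ x)

    def decode(self, h):
        # Reconstruction with pixel values in (0, 1).
        return sigmoid(self.W_dec @ h)

    def forward(self, image, n_steps=3):
        x = image.ravel().astype(float)
        mask = np.ones_like(x)  # first pass is purely feed-forward
        for _ in range(n_steps):
            h = self.encode(x * mask)    # feed-forward pass on the attended input
            recon = self.decode(h)       # top-down object reconstruction
            mask = recon / recon.max()   # assumed: reconstruction acts as a spatial gain
        probs = softmax(self.W_cls @ h)
        return probs, recon.reshape(image.shape)


if __name__ == "__main__":
    net = ReconstructionFeedbackNet()
    image = np.random.default_rng(1).random((28, 28))  # stand-in for a digit image
    probs, recon = net.forward(image)
    print(probs.shape, recon.shape)
```

In this sketch the reconstruction is normalized and multiplied into the input, so pixels the decoder predicts to belong to the object are amplified on the next pass. A trained model would learn the encoder and decoder jointly; the feedback mechanism in the actual work may differ.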