Deep convolutional neural networks fail to classify images ‘in the wild’
Michelle Greene, Jennifer Hart, Bates College, United States
Session:
Posters 3 (Poster)
Location:
Pacific Ballroom H-O
Presentation Time:
Sat, 27 Aug, 19:30 - 21:30 Pacific Time (UTC -8)
Abstract:
In the last decade, deep convolutional neural networks (DNNs) have revolutionized computer vision and are credited with achieving human-level classification accuracy. However, the large-scale image databases that support their training are sampled by convenience from the internet and may reflect more idealized versions of scenes than those found ‘in the wild’. To what extent do these networks generalize to first-person visual experience? We created two datasets that sampled visual experience from cell phones or head-mounted video. Both datasets were classified by a DNN (AlexNet) pretrained on the Places database, and the classification labels were compared to ground-truth labels obtained from human observers. Strikingly, the network's top-1 and top-5 accuracies on these datasets were much lower than on the Places test set, and its classification entropy was higher. Together, these results reveal a critical gap in the abilities of DNNs, which may have implications for neural models that incorporate DNNs.
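The metrics named above can be sketched as follows, assuming the network emits a softmax probability vector per image; the toy probabilities, labels, and function names are illustrative assumptions, not data or code from the study:

```python
import math

def topk_accuracy(probs, labels, k=1):
    # Fraction of samples whose true label is among the k highest-scoring classes.
    correct = 0
    for p, y in zip(probs, labels):
        topk = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        correct += y in topk
    return correct / len(labels)

def mean_entropy(probs):
    # Average Shannon entropy (in bits) of the predicted class distributions;
    # higher values indicate more diffuse (less confident) predictions.
    def h(p):
        return -sum(q * math.log2(q) for q in p if q > 0)
    return sum(h(p) for p in probs) / len(probs)

# Toy softmax outputs for three images over four scene classes (illustrative only).
probs = [
    [0.70, 0.10, 0.10, 0.10],  # confident and correct (true label 0)
    [0.25, 0.30, 0.25, 0.20],  # diffuse; wrong at top-1 (true label 0)
    [0.05, 0.05, 0.10, 0.80],  # confident and correct (true label 3)
]
labels = [0, 0, 3]

print(topk_accuracy(probs, labels, k=1))  # 2 of 3 correct at top-1
print(topk_accuracy(probs, labels, k=2))  # all 3 correct within the top 2
print(mean_entropy(probs))
```

A drop in top-k accuracy together with a rise in mean entropy, as reported in the abstract, would indicate the network is both less correct and less confident on first-person imagery than on its curated test set.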