Objects or Context? Learning From Temporal Regularities in Continuous Visual Experience With an Infant-inspired DNN
Cliona O'Doherty, Rhodri Cusack, Trinity College Dublin, Ireland
Posters 2 Poster
Pacific Ballroom H-O
Fri, 26 Aug, 19:30 - 21:30 Pacific Time (UTC -8)
Current deep neural network (DNN) models of human vision are focused on static, unnaturalistic supervised learning mechanisms, which are not present in human infants and which ignore the dynamics of naturalistic experience. Here, we build an infant-inspired learning mechanism into a self-supervised DNN, using contrastive learning to find commonalities in naturalistic video over various timescales. We hypothesised that commonalities across longer timescales (e.g., one minute) would reflect scene context, which changes relatively slowly. We assessed the learned representations with test images in which either objects or backgrounds were changed. We found that the temporal contrastive learning approach led to representations that reflected scene context more than those of a baseline supervised network, which learned an object-centric embedding. However, at even longer (5 min) timescales, both object and context knowledge could be learned. This illustrates that temporal structure in naturalistic visual input can be a powerful resource for learning, and demonstrates the importance of embracing dynamic training signals when implementing more human-like DNN models.
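To make the idea of contrastive learning over a timescale concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that frames falling within a chosen time window of each other are treated as positive pairs and that an InfoNCE-style loss is used; the abstract does not name the loss. The function names `sample_temporal_pairs` and `info_nce`, and the parameters `fps`, `window_s` and `temperature`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_temporal_pairs(n_frames, fps, window_s, n_pairs, rng):
    """Sample (anchor, positive) frame indices at most `window_s` seconds apart.

    Assumption: temporal proximity alone defines a positive pair.
    """
    max_offset = max(1, int(round(window_s * fps)))
    anchors = rng.integers(0, n_frames, size=n_pairs)
    offsets = rng.integers(-max_offset, max_offset + 1, size=n_pairs)
    # Clip so positives stay inside the video; this never widens the gap.
    positives = np.clip(anchors + offsets, 0, n_frames - 1)
    return anchors, positives


def info_nce(z_anchor, z_positive, temperature=0.1):
    """InfoNCE-style loss: row i of z_anchor should match row i of z_positive.

    All other rows in the batch act as negatives (an assumed design choice).
    """
    z_a = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    z_p = z_positive / np.linalg.norm(z_positive, axis=1, keepdims=True)
    logits = z_a @ z_p.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In this sketch, changing `window_s` (for example from one second to one minute or five minutes) is what changes which regularities the embedding is pushed to capture: a wider window pairs frames that share slowly changing content, such as scene context, even when the objects in view differ.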