Social Inference from Relational Visual Information: An Investigation with Graph Neural Network Models
Manasi Malik, Leyla Isik, Johns Hopkins University, United States
Poster Session 1
Pacific Ballroom H-O
Thu, 25 Aug, 19:30 - 21:30 Pacific Time (UTC -7)
Humans effortlessly recognize social interactions from visual input, such as distinguishing helping versus hindering interactions. Attempts to model this ability typically rely on generative inverse planning models, which make predictions based on simulations of agents' inferred goals. However, these models are computationally expensive and intractable on natural videos. Further, evidence suggests that recognizing social interactions is largely a visual process, separate from complex mental simulation. Yet, bottom-up visual models have not been able to reproduce human behavior. We hypothesize that humans rely on relational visual information in particular, which is absent from standard neural network models, to recognize social interactions. We propose a graph neural network model, SocialGNN, that uses relational visual information to recognize social interactions between agents. We find that SocialGNN aligns with human interaction judgments significantly better than a matched neural network model without graph structure and, unlike inverse planning models, can operate on both animated and natural videos. These results show that adding relevant inductive biases to artificial vision systems allows them to make human-like social judgments without incurring high computational costs. Our findings further show that humans can make complex social interaction judgments based on visual information alone, and may rely on structured, graph-like representations.
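The abstract does not specify SocialGNN's architecture, but the core idea it names — a graph whose nodes are agents and whose edges carry their relations, with judgments read out from the graph — can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the toy per-agent features, the single round of message passing, the mean-pooling readout, and the three-way label set are hypothetical stand-ins, not the authors' model.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def message_passing(node_feats, adj, w_self, w_msg):
    # One round: each agent combines its own features with a sum of
    # features from agents it is related to (given by the adjacency matrix).
    msgs = adj @ node_feats
    return relu(node_feats @ w_self + msgs @ w_msg)

def social_gnn_sketch(frames, adj, w_self, w_msg, w_out):
    # frames: (T, N, D) array of toy visual features, one D-vector
    # per agent (N agents) per video frame (T frames).
    per_frame = []
    for feats in frames:
        h = message_passing(feats, adj, w_self, w_msg)
        per_frame.append(h.mean(axis=0))       # graph-level readout per frame
    video_repr = np.mean(per_frame, axis=0)    # pool over time
    return video_repr @ w_out                  # scores over interaction labels

# Toy example: two agents, hypothetical labels {help, hinder, neither}.
rng = np.random.default_rng(0)
T, N, D, H, C = 8, 2, 16, 32, 3
frames = rng.normal(size=(T, N, D))
adj = np.array([[0.0, 1.0], [1.0, 0.0]])       # the two agents are related
w_self = 0.1 * rng.normal(size=(D, H))
w_msg = 0.1 * rng.normal(size=(D, H))
w_out = 0.1 * rng.normal(size=(H, C))
scores = social_gnn_sketch(frames, adj, w_self, w_msg, w_out)
print(scores.shape)  # one score per interaction label
```

The point of the sketch is the inductive bias the abstract highlights: the relation between agents enters explicitly through the adjacency term, rather than being left implicit in pixel-level features as in a standard feedforward network.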