In this paper we present a multimodal approach to categorizing user posts by discussion topic. To integrate heterogeneous information extracted from the posts, i.e. text, visual content, and information about user interactions with the online platform, we deploy graph convolutional networks, which have recently proven effective in classification tasks on knowledge graphs. As a case study we analyze violent online political extremism content, a challenging task because of the particularly high semantic level at which extremist ideas are discussed. We demonstrate the potential of neural networks on graphs for classifying multimedia content and, perhaps more importantly, the effectiveness of multimedia analysis techniques in aiding domain experts who perform qualitative data analysis. Our conclusions are supported by extensive experiments on a large collection of extremist posts. This research was produced with the aid of VOX-Pol Research Mobility Programme funding and supervision by VOX-Pol colleagues at Dublin City University.