Visual Affect Around the World

Every culture and language is unique. This project focuses on that uniqueness as it relates to human affect, specifically sentiment and emotion semantics, and on how these manifest in social multimedia. We develop sets of sentiment- and emotion-polarized visual concepts by adapting semantic structures called adjective-noun pairs (ANPs) to a multilingual context. We designed a new language-dependent method for the automatic discovery of these adjective-noun constructs and applied this pipeline to a social multimedia platform to build a large-scale multilingual visual sentiment ontology (MVSO). Our unified ontology is organized hierarchically into multilingual clusters of visually detectable nouns and sub-clusters of emotionally biased versions of these nouns. As part of our effort to better understand visual affect, we release here the dataset of more than 15.6K sentiment-biased visual concepts across 12 languages, along with language-specific detector banks and more than 7.36M images and their metadata.

This webpage describes each of the above-mentioned components and provides them for download.

To download the dataset, please fill out the request form at the end of this page.


The construction of our multilingual visual sentiment ontology (MVSO) begins with crawling images and metadata based on emotion keywords. Image tags (t1, ..., t5) are labeled with part-of-speech tags; adjectives and nouns are combined into candidate adjective-noun pair (ANP) combinations, while other tags are ignored (in red). Finally, the candidate ANPs are filtered by several criteria that remove incorrect pairs (in red).
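The candidate-formation and filtering steps above can be sketched as follows. This is a minimal illustration, not the MVSO pipeline: it assumes the tags already carry part-of-speech labels from some tagger, pairs every adjective with every noun on the same image, and uses a simple minimum image count (the hypothetical `min_images` parameter) as a stand-in for the filtering criteria, which are described in the paper rather than here.

```python
from collections import Counter

# Hypothetical input: each tag is paired with a coarse POS label ("ADJ", "NOUN", ...).
TaggedTag = tuple[str, str]


def candidate_anps(tagged_tags: list[TaggedTag]) -> list[str]:
    """Pair every adjective tag with every noun tag on the same image.

    Tags that are neither adjectives nor nouns are ignored.
    """
    adjectives = [word for word, pos in tagged_tags if pos == "ADJ"]
    nouns = [word for word, pos in tagged_tags if pos == "NOUN"]
    return [f"{adj} {noun}" for adj in adjectives for noun in nouns]


def filter_anps(images: list[list[TaggedTag]], min_images: int = 2) -> dict[str, int]:
    """Keep candidate ANPs that occur on at least `min_images` images.

    The frequency threshold is a placeholder for the real filtering criteria.
    """
    counts = Counter()
    for tags in images:
        # set(): count each candidate ANP at most once per image
        counts.update(set(candidate_anps(tags)))
    return {anp: n for anp, n in counts.items() if n >= min_images}
```

For example, an image tagged `abandoned/ADJ puppy/NOUN street/NOUN the/DET` yields the candidates "abandoned puppy" and "abandoned street"; only pairs that recur across enough images survive the filter.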

We perform a two-stage clustering that hierarchically organizes our ANPs across languages into a multilingual ontology over visual sentiment concepts. We discovered 3,329 sub-clusters of ANP concepts; for example, some sub-clusters contain ANPs like "little pony" and "little horse" together. In the example clustering below, the data are projected to 2D using t-SNE. On the left, six clusters composed of different sets of nouns are shown, including the sunlight/rays/glow and dog/cat/pony clusters. On the right, we show the sub-clustering of ANPs within the dog/cat/pony cluster (Cluster A), which groups these nouns as modified by sentiment-biasing adjectives into ANPs like "funny dog"/"funny cats" and "adopted dog"/"abandoned puppy".
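A minimal sketch of a two-stage organization (noun clusters first, then ANP sub-clusters inside each noun cluster), assuming every noun and every ANP already has a numeric vector, for example from word embeddings. Plain k-means is swapped in here as an illustration and may differ from the clustering used in the paper; the function names, vectors, and cluster counts are hypothetical.

```python
import math
from collections import defaultdict

Vector = list[float]
ANP = tuple[str, str]  # (adjective, noun), so word order per language does not matter


def kmeans(points: list[Vector], k: int, iters: int = 20) -> list[int]:
    """Plain k-means returning a cluster index for each point.

    Centroids start at the first k points so results are deterministic
    (a simplification; k-means++ initialization is more common).
    """
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def two_stage_cluster(anps, noun_vecs, anp_vecs, k_nouns, k_anps):
    """Stage 1 clusters nouns; stage 2 sub-clusters the ANPs of each noun cluster.

    Returns {noun_cluster_id: {sub_cluster_id: [ANP, ...]}}.
    """
    nouns = sorted({noun for _adj, noun in anps})
    noun_cluster = dict(zip(nouns, kmeans([noun_vecs[n] for n in nouns], k_nouns)))

    # Group ANPs by the stage-1 cluster of their noun.
    groups = defaultdict(list)
    for anp in anps:
        groups[noun_cluster[anp[1]]].append(anp)

    hierarchy = {}
    for cluster_id, members in groups.items():
        sub_labels = kmeans([anp_vecs[a] for a in members], k_anps)
        subclusters = defaultdict(list)
        for anp, lab in zip(members, sub_labels):
            subclusters[lab].append(anp)
        hierarchy[cluster_id] = dict(subclusters)
    return hierarchy
```

Running stage 1 on nouns alone means a sub-cluster can only mix ANPs whose nouns were already judged similar, which is how pairs such as "funny dog" and "funny cats" can end up side by side.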

The Multilingual Visual Sentiment Ontology comes in two flavors: fully automatic and crowd-verified (15,630 ANPs), both covering 12 languages.

Image Data

This part of the dataset consists of the 7,368,364 URL links to Flickr images associated with the full MVSO. Due to copyright restrictions, we cannot distribute the original images, and we make no guarantee that the provided URL links will persist. The three largest image sets, in order, come from English, Spanish, and Italian.

On the right, we show example images from "around the world" organized by affective visual concepts. The top set shows images of the concept "old market" from three different cultures/languages, and the bottom set shows images of "good food". Even though the conceptual reference is the same, each culture's sentimental expression of these concepts may differ markedly.

ANP Sentiment Scores

We compute sentiment scores for each ANP using two popular, publicly available sentiment tools: SentiStrength and SentiWordNet. Details on how we combined the outputs of these tools and applied them across all languages can be found in our paper. In a separate study, we assessed the fidelity of these automatic sentiment scores through crowd verification and confirmed that the net distortion introduced by the automatic tools is small. Median sentiment scores per language are shown below.
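The per-language medians can be computed as sketched below. The (language, ANP, score) triple layout is a hypothetical input format, not the layout of the released files, and each score is assumed to already be the combined output of the two tools.

```python
from collections import defaultdict
from statistics import median


def median_sentiment_by_language(scored_anps):
    """Median ANP sentiment score for each language.

    `scored_anps` is an iterable of (language, anp, score) triples
    (a hypothetical format used only for illustration).
    """
    by_lang = defaultdict(list)
    for language, _anp, score in scored_anps:
        by_lang[language].append(score)
    # Sort by language name so the output order is stable.
    return {lang: median(scores) for lang, scores in sorted(by_lang.items())}
```

The median is less sensitive than the mean to a handful of strongly polarized ANPs, which matters when comparing languages whose ANP sets differ greatly in size.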

In addition to the sentiment scores, we also provide emotion score mappings for each ANP across all languages. Emotion scores were computed from smoothed co-occurrence counts between ANPs and an expanded dictionary of emotion keywords in the image tag data.
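One way to read "smoothed co-occurrence counts" is sketched below: count the images on which an ANP appears together with any keyword of each emotion, add a constant `alpha` to every count (additive smoothing, an assumed choice), and normalize so the scores sum to 1. The input layout (each image as a list of tag strings, the ANP itself appearing as a tag, each emotion mapped to a keyword set) and the function name are hypothetical.

```python
from collections import Counter


def emotion_scores(images, anp, emotion_keywords, alpha=1.0):
    """Additively smoothed emotion distribution for one ANP.

    images: list of tag lists (hypothetical layout).
    emotion_keywords: {emotion: set of keywords}, the expanded dictionary.
    """
    counts = Counter()
    for tags in images:
        tag_set = set(tags)
        if anp in tag_set:
            for emotion, keywords in emotion_keywords.items():
                # Count each image at most once per emotion.
                if tag_set & keywords:
                    counts[emotion] += 1
    total = sum(counts[e] + alpha for e in emotion_keywords)
    if total == 0:  # only possible when alpha == 0 and nothing co-occurs
        return {e: 0.0 for e in emotion_keywords}
    return {e: (counts[e] + alpha) / total for e in emotion_keywords}
```

Smoothing keeps emotions that never co-occur with a rare ANP from receiving a hard zero, so rare ANPs get softer, less overconfident distributions.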

Detector Banks

We constructed several benchmark models for detecting ANP visual concepts using convolutional neural networks (CNNs). Using Caffe, we fine-tuned an AlexNet architecture on our MVSO image dataset for each of the six major languages. These models achieve dramatically better concept detection performance than previous ANP work such as (Borth et al. 2013) and (Chen et al. 2014).
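A detector bank turns an image into one score per ANP concept. The sketch below covers only the post-processing step, assuming the fine-tuned network ends in one output unit per ANP that is read as a multi-class softmax. The raw scores (`logits`) would come from a forward pass of the Caffe model, which is not shown, and `top_anps` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


def top_anps(logits, anp_names, k=3):
    """Return the k most probable ANP concepts with their probabilities.

    anp_names[i] labels output unit i of the (assumed) ANP classification layer.
    """
    probs = softmax(logits)
    ranked = sorted(zip(anp_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

The full probability vector can also serve as a mid-level feature for downstream tasks, while the top-k list is the human-readable summary of which concepts were detected.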


To download the MVSO data, please fill out the following fields, and we will immediately email you download instructions. People who request and use the dataset (or code) must agree that 1) the use of the data is restricted to research or educational purposes only, 2) all copyright and license restrictions associated with the dataset/code will be followed, and 3) the authors of MVSO and their affiliated organizations make no warranties regarding the database or software, including but not limited to warranties of non-infringement.

Requested data (select any): Ontology, Image Data, ANP Sentiment Scores, Detector Banks, Image Sentiment Dataset

Fields: Name, Affiliation, Email

Citation for MVSO dataset

Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Brendan Jou*, Tao Chen*, Nikolaos Pappas*, Miriam Redi*, Mercan Topkara* and Shih-Fu Chang.
ACM International Conference on Multimedia (MM), Brisbane, Australia, 2015.

Citation for MVSO Image Sentiment Dataset

Columbia MVSO Image Sentiment Dataset (arXiv preprint)
Vaidehi Dalmia, Hongyi Liu and Shih-Fu Chang.