Open Access

Large-scale geo-facial image analysis

  • Mohammad T. Islam1,
  • Connor Greenwell1,
  • Richard Souvenir2 and
  • Nathan Jacobs1
EURASIP Journal on Image and Video Processing 2015, 2015:17

DOI: 10.1186/s13640-015-0070-9

Received: 31 January 2015

Accepted: 11 May 2015

Published: 10 June 2015

Abstract

While face analysis from images is a well-studied area, little work has explored the dependence of facial appearance on the geographic location from which the image was captured. To fill this gap, we constructed GeoFaces, a large dataset of geotagged face images, and used it to examine the geo-dependence of facial features and attributes, such as ethnicity, gender, or the presence of facial hair. Our analysis illuminates the relationship between raw facial appearance, facial attributes, and geographic location, both globally and in selected major urban areas. Some of our experiments, and the resulting visualizations, confirm prior expectations, such as the predominance of ethnically Asian faces in Asia, while others highlight novel information that can be obtained with this type of analysis, such as the major city with the highest percentage of people with a mustache.

Keywords

Faces Geolocation Images Facial attributes

1 Introduction

What do people look like in southern Kenya? What does the average person from Tokyo look like and how do they differ from people in Jakarta or Los Angeles? Such questions are the focus of anthropological studies on human diversity [1], where the traditional approach relies on direct observation, which requires extensive manual effort. This severely limits the types of questions that can be addressed. A computational model of such variations could greatly expand our understanding of contemporary human diversity and enable applications in a wide range of disciplines, including the following: anthropology, sociology, fashion, security, and computer graphics. This avenue of analysis is enabled by the convergence of two phenomena. First, every day, a growing number of (geotagged) images are uploaded to social media sites. On one popular social media site [2], geotagged photos are uploaded at a rate of around 500 per minute, or 260 million per year. Second, the state-of-the-art algorithms in computer vision have reached a level of accuracy and robustness that allows detailed scene information (e.g., people, objects, background) to be automatically extracted from images.

Our goal in this work is to explore and analyze the geospatial structure of facial appearance using publicly available imagery (Fig. 1). To support this effort, we constructed a dataset by gathering geotagged images and extracting aligned frontal face patches. This resulted in a dataset, GeoFaces, of approximately 0.8 million geotagged faces, which, to our knowledge, is the largest publicly available dataset of its kind. In addition, for each facial image patch, we also provide automatically extracted visual attributes such as gender, ethnicity, and facial hair.
Fig. 1

Human facial appearance differs for many reasons, including ethnicity, gender, and hair style. In this work, we explore the relationship between such visual attributes and geographic location

We use the GeoFaces dataset to explore the location dependence of human face appearance and visual attributes using a variety of statistical models. This analysis highlights the strong underlying patterns hidden in the data. In addition to the dataset, the main contributions of this work are the following: (1) visualizations, constructed using techniques from machine learning, that highlight the geo-dependence of visual appearance and facial attributes, (2) quantitative results that further illuminate the dependence, and (3) an evaluation of several methods for estimating the location of a face at the continental, sub-continental, and country scale.

2 Related work

The advent of inexpensive GPS-enabled cameras and persistent connectivity has led to a profusion of publicly available, geotagged imagery. Such images have been used to extract a wide variety of geospatial information, including 3D scene models [3], local weather conditions [4], land cover type [5], and architectural styles [6]. To our knowledge, little work has used automated approaches for exploring the relationship between facial appearance and geographic location. Despite limited research on the geo-dependence of face appearance, there is a significant amount of research in a number of related problems.

2.1 Face image analysis

The human face is one of the most intensely studied object types in computer vision, with active research on a variety of subproblems. We give a brief overview of recent work in the following areas: detection [7, 8], pose normalization [7, 9, 10], attribute estimation [10–12], and recognition/verification [13, 14]. Modern approaches for face detection use machine learning techniques to automatically determine if an image patch contains a face. A variety of methods have been proposed, including the approach by Shen et al., which uses exemplar-based image retrieval [7], and the approach by Scherbaum et al. [8], which uses a traditional AdaBoost-based technique augmented with novel synthetic training imagery. Approaches for pose normalization [7, 9, 10] use either 2D or 3D warping and often rely on the output from the detector to guide the selection of warp parameters. For attribute estimation, Kumar et al. developed a method for pairwise face verification by comparing sets of human-describable features and visually descriptive similes [11]. Another approach built generative models for opposing facial attributes (smiling-to-frowning, etc.) [12]. Xiong et al. recently introduced IntraFace, a tool for identifying human facial features [10]. Recent work in face recognition has progressed along two fronts, developing methods for extracting more robust features [13] and using improved learning-based algorithms for classification [14].

We make use of recently developed commercial and academic tools for face detection, pose normalization, and attribute extraction. Our work uses these tools to address a higher-level question, “How does expected face appearance depend on location?”

2.2 Large-scale image datasets

Many large-scale image datasets have been introduced recently to advance research in vision-related areas such as object detection [15–17], classification [18, 19], and outdoor scene analysis [20, 21]. Similarly, for the task of image geolocalization, datasets [22–25], which contain millions of geotagged images, have been collected from Internet search engines and photo-sharing websites. We use a similar approach for constructing the GeoFaces dataset, except that the existing datasets have focused on scenes and our focus is faces. Many large facial image datasets have been developed, but most are targeted at facial recognition and have size on the order of several thousands [26–28]. To our knowledge, the large-scale geotagged face image dataset we have constructed is the first of its kind. It is significantly larger than existing face datasets and is the only one that provides geotagged imagery.

2.3 Geo-dependence of scene appearance

The relationship between scene appearance and location has been of significant recent research interest, with work focusing on image localization [25, 29–32], detecting architectural styles [33], and extracting geo-informative features [32, 34]. These methods attempt to automatically discover and/or exploit the relationship between scene appearance and geographic location. We address many of the same issues but extend this line of research and examine the geo-dependence of facial appearance and facial attributes.

3 GeoFaces

To build GeoFaces, a large dataset of geolocated face patches, we downloaded geotagged imagery from Flickr [35] with face-related tags (e.g., face, portrait, men, family, friends). For each image, a commercial face detector [36] was used to detect faces and fiducial points. The detector was tuned to find frontal (or nearly frontal) faces. From 3.14 million images, 3.8 million face patches were extracted. For each face patch, the detector reported the estimated pose direction and detection confidence and also the locations and confidences of pre-defined fiducial control points (e.g., eyes, nose, mouth). Each face patch was automatically aligned to a common reference frame using a similarity transform, with eye centers as control points.
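The alignment step can be illustrated with a minimal sketch. A similarity transform (rotation, uniform scale, translation) is fully determined by two point correspondences, so the two eye centers are enough to fix it. The function names and the reference eye positions used in the usage note are assumptions made for illustration; the source does not give the reference frame coordinates or the implementation.

```python
import numpy as np

def similarity_from_eyes(src_left, src_right, dst_left, dst_right):
    """Return a 2x3 matrix mapping the detected eye centers onto reference ones.

    Treating 2D points as complex numbers, a similarity transform is
    z -> a*z + b, where a encodes rotation and scale and b the translation.
    """
    s0, s1 = complex(*src_left), complex(*src_right)
    d0, d1 = complex(*dst_left), complex(*dst_right)
    a = (d1 - d0) / (s1 - s0)   # rotation and uniform scale
    b = d0 - a * s0             # translation
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag,  a.real, b.imag]])

def apply_transform(M, point):
    """Apply the 2x3 transform M to a single (x, y) point."""
    x, y = point
    return M @ np.array([x, y, 1.0])
```

For example, with hypothetical reference eye positions (70, 90) and (130, 90), `similarity_from_eyes((10, 20), (30, 40), (70, 90), (130, 90))` yields a matrix that sends the two detected eye centers exactly onto the reference positions; the same matrix would then be used to warp the whole patch.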

3.1 Dataset validation and cleanup

To eliminate false positives and non-frontal faces, we filtered the original set of facial image patches. Initially, we relied on the confidence values provided by the face detection software. Specifically, we retained images with an estimated pose of zero degrees (directly facing the camera), and we empirically determined that a detection confidence greater than 600 (the face detector assigns a confidence value from 0 to 1000 to each of the detected faces) filtered most of the non-face false positive detections. This simple thresholding preserved roughly 30 % of the face patches and eliminated most of the non-frontal patches or face-like patterns that were initially detected.

While the detector was, in general, quite reliable, we observed that the detection confidence values and pose estimates were often unreliable for small image patches (i.e., inter-pupillary distance of ≤10 pixels). For additional filtering, we trained a classifier using the detected pose and the correlation of the intensity gradient of the image patch with a set of reference faces as features. Using roughly 100 examples (split evenly between positive and negative front-facing patches), we trained a C-support vector machine (SVM) classifier with linear kernel (c=1) [37]. Of the 3.8 million original face patches, this process resulted in 0.8 million face patches in GeoFaces. Visual inspection of the resulting images suggests more accurate filtering than relying solely on the confidence estimates of the face detector. Figure 2 shows representative initial detections and final aligned patches from the dataset. The geographic locations of these patches are shown in Fig. 3.
Fig. 2

Representative images before (a) and after (b) processing

Fig. 3

Distribution of images in the GeoFaces dataset

3.2 Extracting facial attributes

For each face in our dataset, we extracted facial attributes using IntraFace [10]. The software computes five facial attributes: beard, mustache, gender, glasses, and race. Except for glasses (“Eyeglasses”, “Sunglasses”, “No glasses”) and race (“Asian”, “Black”, “Indian”, “White”), the attributes are binary. The output is a real value for each attribute that reflects the degree of confidence in the selected label.

Currently, we make a hard assignment to the particular binary/categorical label and discard the confidence values. While it may be preferable in some cases to use the confidence values, such as when an image is labeled as “No beard” and “No mustache” but the person has visible stubble, we find that the final labels are quite accurate.

3.3 GeoFaces summary

The image processing pipeline (face detection, alignment, attribute detection) took roughly 5 s per image, with most of the computation spent on attribute detection. GeoFaces will evolve as we collect more images and improve the methods for detecting, aligning, and filtering. The full dataset, including face patches and visual attribute values, is freely available online [38]. The remainder of this work describes various ways of using this dataset to better understand the relationship between human appearance and geographic location. See the Appendix for additional non-geographic analysis of the dataset, such as how the expected size of a face relates to the textual tags of the enclosing image.

4 Geo-dependence of facial appearance and attributes

Using GeoFaces, we visualized trends in facial appearance and attributes both across the globe and within smaller regions.

4.1 Geo-facial appearance

We explored the relationship between facial appearance and geographic location using Principal Component Analysis (PCA), a statistical model that is commonly called Eigenfaces [39] when applied to facial image analysis. For computational efficiency and to remove background clutter, each face patch was resized to 200 × 200 and pixels outside of a pre-defined elliptical region were ignored. This elliptical mask has major and minor axes of 157 pixels and 130 pixels, respectively, and has a center at the middle of the face patch. We construct a vector from all RGB pixel values under the mask and use a standard approach, based on the singular value decomposition (SVD), to estimate the global average face, μ; the PCA components (Eigenfaces), E; and the coefficients for each image, \(C_{i}\). Figure 4 shows the top three Eigenfaces and corresponding distribution of PCA coefficients mapped by image location. Based on observing multiple regions of smoothly varying coefficients, the first and third Eigenfaces appear to be related to geographic location. The effect is less visible with the second Eigenface, which appears to encode the direction of lighting on the face.
Fig. 4

Geographic distribution of Eigenface coefficients. Each map shows the expected value of the coefficient of the corresponding Eigenface image across the globe (blue and red indicate lower and higher values, respectively)
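The Eigenface computation in Section 4.1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that 157 and 130 are full axis lengths and that the major axis is vertical (the text does not say), and the function names are hypothetical.

```python
import numpy as np

def elliptical_mask(size=200, major=157, minor=130):
    """Boolean mask of an ellipse centered in a size x size patch.

    Assumption: major/minor are full axis lengths, major axis vertical.
    """
    yy, xx = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    return ((yy - c) / (major / 2.0)) ** 2 + ((xx - c) / (minor / 2.0)) ** 2 <= 1.0

def eigenfaces(patches, mask, k):
    """PCA via SVD on the RGB pixels under the mask.

    patches: array of shape (n, H, W, 3).
    Returns mu (d,), E (d, k), and per-image coefficients C (n, k).
    """
    X = patches[:, mask, :].reshape(len(patches), -1).astype(np.float64)
    mu = X.mean(axis=0)                      # global average face
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    E = Vt[:k].T                             # top-k Eigenfaces as columns
    C = (X - mu) @ E                         # coefficients for each image
    return mu, E, C
```

With this layout, a face is reconstructed from its coefficients as `E @ C[i] + mu`, which matches the reconstruction formula used for the location-dependent average faces.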

Using the Eigenface representation, we estimated average faces for different parts of the world and observed, perhaps unsurprisingly, that the expected appearance of a face depends on geographic location. For a given location, l, we computed a location-dependent average face, \(\hat{f}_{l} = \mathbb{E}[f|l]\), by estimating the weighted average of nearby Eigenface coefficients (with a Gaussian weight function centered at l with σ=5°) and reconstructing the corresponding image from the Eigenface coefficients. Specifically, when computing the average face for a given location, l, the weight for a face located at \(x_{i}\) is \(w_{li} = e^{-(x_{i}-l)^{2}/\sigma^{2}}\). We compute the Eigenface coefficients \(C_{l}\) of the average image located at l as follows: \(C_{l}=\frac{\sum_{i} w_{li} C_{i}}{\sum_{i} w_{li}}\). Finally, we reconstruct the average face \(\hat{f}_{l}\) from Eigenface coefficients \(C_{l}\) and global average face μ using the following formula: \(\hat{f}_{l}= E C_{l} + \mu\). The average images for a set of locations around the globe are shown in Fig. 5. Locations with a low number of images are omitted by filtering based on the total weight \(\left(\sum_{i} w_{li}<50\right)\). While it is clear that average facial appearance depends on geographic location, these images do not capture the wide variety of facial appearances that can be seen in a particular place.
Fig. 5

A map of locally weighted average images. The spatial similarities demonstrate the geo-dependence of facial appearance
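The locally weighted average face can be sketched directly from the formulas above. The sketch assumes that distance is measured as plain Euclidean distance in latitude/longitude degrees, ignoring longitude wrap-around and spherical geometry; the source does not specify this, and the function name is hypothetical.

```python
import numpy as np

def local_average_face(l, locations, C, E, mu, sigma=5.0, min_weight=50.0):
    """Locally weighted average face at location l = (lat, lon) in degrees.

    locations: (n, 2) image locations, C: (n, k) Eigenface coefficients,
    E: (d, k) Eigenfaces, mu: (d,) global average face.
    Returns None when the total weight is below min_weight.
    """
    d2 = np.sum((locations - np.asarray(l, dtype=float)) ** 2, axis=1)
    w = np.exp(-d2 / sigma ** 2)                 # w_li = exp(-(x_i - l)^2 / sigma^2)
    total = w.sum()
    if total < min_weight:                       # too few nearby faces
        return None
    C_l = (w[:, None] * C).sum(axis=0) / total   # weighted mean of coefficients
    return E @ C_l + mu                          # reconstruct the average face
```

Averaging in coefficient space and then reconstructing is equivalent to averaging the PCA reconstructions themselves, because the reconstruction is linear in the coefficients.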

4.2 Geo-facial dictionary learning

The Eigenface representation used in the previous section implicitly assumes that the distribution of face appearance in a particular location is well represented by a Gaussian distribution. In this section, we explore a model, based on dictionary learning, that can capture multi-modal structure in the data.

For a subset of faces (n≈250k), we used sparse dictionary learning [40] to find k=300 representative cluster centers and assigned each face to one of the centers. For computational reasons, we used the top 50 PCA coefficients as our feature representation. Figure 6 shows the 32 cluster centers with the largest membership, from largest to smallest along rows. These average images capture variations in gender, pose, lighting, and ethnicity.
Fig. 6

The mean images of clusters found using dictionary learning

We used a kernel density estimate (KDE) (using a Gaussian kernel with σ=5°) to approximate the geographic distribution of faces assigned to a particular cluster. Figure 7 shows the difference between this conditional distribution and the distribution of all faces, regardless of cluster membership. For example, members of the topmost cluster, which appear similar to an Asian female, are more likely to be found in China and Japan than in Europe or the Eastern United States. We also found that many cluster centers shared similar spatial distributions. The primary visual difference between clusters with similar spatial distributions appears to depend on gender, viewpoint, and lighting differences. This motivated us to group face clusters in a manner not based solely on image appearance.
Fig. 7

The conditional distribution of geo-facial cluster members. False-color images representing the geographic distributions (red more common, blue less common) of images assigned to a given geo-facial cluster, together with the cluster center image (inset)
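The difference maps described above can be sketched as a conditional kernel density estimate minus the overall estimate, both evaluated on a latitude/longitude grid. This is a brute-force illustration under stated assumptions: a standard isotropic Gaussian kernel in degree space, densities normalized to sum to one over the grid, and hypothetical function names. It is not suited to the full dataset as written because it forms a grid-by-points distance matrix.

```python
import numpy as np

def gaussian_kde_grid(points, grid_lat, grid_lon, sigma=5.0):
    """Gaussian KDE of (n, 2) lat/lon points, normalized over the grid."""
    glat, glon = np.meshgrid(grid_lat, grid_lon, indexing="ij")
    grid = np.stack([glat.ravel(), glon.ravel()], axis=1)           # (g, 2)
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)  # (g, n)
    dens = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)
    dens /= dens.sum()
    return dens.reshape(glat.shape)

def cluster_excess(points, labels, cluster, grid_lat, grid_lon, sigma=5.0):
    """Conditional density of one cluster minus the density of all faces."""
    cond = gaussian_kde_grid(points[labels == cluster], grid_lat, grid_lon, sigma)
    overall = gaussian_kde_grid(points, grid_lat, grid_lon, sigma)
    return cond - overall
```

Positive values mark grid cells where members of the cluster are over-represented relative to all faces, and negative values mark cells where they are under-represented.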

We form super-clusters by grouping clusters based on the conditional geographic distribution of their members. For each conditional KDE, we sampled it on a grid and vectorized the resulting matrix. We then applied non-negative matrix factorization (NMF) to the resulting matrix and extracted three components, each of which can be reshaped and visualized as a false-color map. Figure 8 shows the three resulting geospatial components and representative faces for the cluster centers that are best described by the given distribution. Super-clusters appear to differ primarily in ethnicity and skin tone, which are known to depend on geographic location [1]. Within a super-cluster, the dominant variations appear to be hair style, pose, gender, and lighting conditions, which are less dependent on geographic location.
Fig. 8

Visualizing geo-facial super-clusters. (top) The distribution of faces from super-clusters formed by grouping geo-facial clusters with similar geographic distributions. (bottom) For each super-cluster, 16 representative geo-facial cluster centers
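The super-cluster step can be sketched with a small non-negative matrix factorization. The sketch uses the classic multiplicative update rules for the Frobenius objective, which is an assumption: the source does not say which NMF algorithm or library was used. Rows of `V` are taken to be the vectorized conditional KDEs, one per cluster.

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n, m) as W (n, r) @ H (r, m).

    Uses multiplicative updates that keep W and H non-negative and do not
    increase the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

With r=3, each row of `H` can be reshaped to the sampling grid and shown as a false-color map, and one plausible way to assign a cluster to a super-cluster is `W.argmax(axis=1)`; that assignment rule is an illustration, not a detail given in the text.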

4.3 Geo-facial attributes

Moving beyond appearance-based features, we also explored the geo-dependence of the facial attributes extracted using IntraFace [10]. We computed the relative distribution of each pair of attribute values (e.g., “Male” versus “Female”, “Beard” versus “No beard”) and plotted the relative density histogram (Fig. 9), which shows the relative density of pairs of attribute values. For fixed-sized bins, the intensity represents the relative frequency, λ, of a pair of labels:
$$ \lambda\left(n_{1}, n_{2}\right) = \frac{n_{1} - n_{2}}{n_{1} + n_{2} + p} \qquad (1) $$

where \(n_{1}\) and \(n_{2}\) are the number of facial images in a region of interest with a particular attribute value and p is a pseudo-count (for this work, p=20), which serves as a prior that reduces noise in the visualization caused by regions with few faces.

Fig. 9

Visualizing the relative densities of paired attributes (a–c). In each false-color map, red represents higher concentrations of the first attribute value, and blue represents higher concentrations of the second. White indicates an equal concentration of both of the attributes

At the global scale, we first group images into 5° square bins, an area of roughly 340,000 km² (about the size of Germany). For the set of images in each bin, the color of the tile represents the relative frequency of the attribute pair at that location. Many of the global attribute distributions are unsurprising and follow expected geographic distributions. For example, comparing the relative frequency of the “Asian” and “White” attribute values for race reveals distinct, and opposite, modes in Southeast Asia and Europe. Other patterns are perhaps more unexpected. Our dataset contains a higher proportion of “Female” images in eastern Asia and the western United States and a higher proportion of “Male” images in the Middle East.
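A minimal sketch of the relative density histogram follows: faces are counted per 5° bin for each of the two attribute values, and Eq. 1 is applied bin by bin. The function name and the boolean `is_first` encoding of the attribute pair are assumptions made for illustration.

```python
import numpy as np

def relative_density_map(lat, lon, is_first, bin_deg=5.0, p=20.0):
    """Relative frequency (Eq. 1) of two attribute values on a lat/lon grid.

    lat, lon: arrays of image locations in degrees.
    is_first: boolean array, True where the face has the first attribute value.
    Returns an array of shape (180 / bin_deg, 360 / bin_deg).
    """
    lat_edges = np.arange(-90.0, 90.0 + bin_deg, bin_deg)
    lon_edges = np.arange(-180.0, 180.0 + bin_deg, bin_deg)
    bins = [lat_edges, lon_edges]
    n1, _, _ = np.histogram2d(lat[is_first], lon[is_first], bins=bins)
    n2, _, _ = np.histogram2d(lat[~is_first], lon[~is_first], bins=bins)
    return (n1 - n2) / (n1 + n2 + p)   # pseudo-count p damps sparse bins
```

The effect of the pseudo-count is easy to see on two bins with the same 3:1 ratio: with 30 and 10 faces the value is 20/60 ≈ 0.33, while with only 3 and 1 faces it is 2/24 ≈ 0.08, so sparsely populated bins stay close to white in the map.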

For regions of high image density, we were able to observe finer-grained patterns. Figure 10 shows the relative density histograms, with a bin size of about 6.5 km², for several attribute pairs for London, Los Angeles, New York, and the whole world. The red “hot spots” in LA and NYC correspond to regions of these cities with large Asian populations (e.g., Chinatown in NYC). Also, there is a high proportion of “Beard” faces (compared to “No Beard”) in the heart of downtown London. At first glance, it is not always clear if these types of observations are due to particular biases in the dataset or cultural norms.
Fig. 10

Maps showing the relative densities of paired attributes in selected cities (a–d). In each graph, red represents higher concentrations of the first attribute value, and blue represents higher concentrations of the second. White indicates an equal concentration of both of the attributes

For the regions surrounding ten major cities (New York, London, Paris, Mumbai, Rio de Janeiro, Hong Kong, Sydney, Beijing, Los Angeles, Tokyo), the stacked bar charts in Fig. 11 show the ratios of certain attribute values in each city. Despite inherent biases in the data due to source (e.g., geotagged images uploaded to social media sites), we observed that the distribution of the ethnicity attribute seems to follow expected patterns. Also, we observed that facial hair is much more common in some cities (Paris and Mumbai) than in others (Hong Kong and Tokyo).
Fig. 11

Visualizing the distribution of five facial attributes from ten major world cities (a–e)

To further explore the relationship between global location and facial attributes, we sought to find groups of faces that are nearby spatially as well as similar in terms of attributes. Many well-known unsupervised clustering techniques could be applied to this problem. We employed the normalized cuts algorithm [41] on a graph-based representation of the dataset, where nodes are facial images and edges encode attribute similarity. Each node was connected to its five spatially nearest neighbors, and the edge weight encodes the similarity between attribute values and the pixel intensity of the corresponding facial regions. The clustering process identified groups of faces from similar regions that share similar attributes.

Interestingly, although the clustering process was unsupervised, the discovered clusters tend to align with geographically meaningful regions. Figure 12 shows the results of an experiment with 15 clusters. Regions such as the Eastern United States, Central America, and Europe are evident in the clustering output. Average face images from several groups illustrate the correspondence between clusters and geographic regions.
Fig. 12

World-level clusters and average faces for each of the four largest clusters. From left to right: Western US, Eastern US, Europe, and Southeast Asia

These results demonstrate that facial appearance and attributes are strongly dependent on the geographic location where the image was captured. In this section, our focus was on using unsupervised methods and visualization, which is useful for human understanding of the data. In the following section, we describe several experiments that use supervised learning to relate appearance and location and demonstrate how this can enable a novel application.

5 Supervised geo-facial analysis

We used two supervised learning methods to better understand the relationship between geographic location and face appearance: canonical correlation analysis (CCA) and linear discriminant analysis (LDA). We use both methods to extract location-dependent features that support observations about facial appearance in various world regions and, for LDA, enable the novel application of estimating the geographic location of a face.

5.1 Location-dependent component images

CCA is a multivariate statistical tool for exploring relationships between paired sets of variables. Given two datasets \(A\in \mathbb{R}^{m\times n}\) and \(B\in \mathbb{R}^{p\times n}\) containing paired observations, CCA finds sets of projection vectors \((u_{1},u_{2},\ldots)\) and \((v_{1},v_{2},\ldots)\) such that the random variables \(\left(u_{1}^{\mathsf{T}}A, v_{1}^{\mathsf{T}}B\right)\) are maximally correlated. That is, it finds \(u_{1}, v_{1}\) such that \(\rho = \text{corr}\left(u_{1}^{\mathsf{T}}A, v_{1}^{\mathsf{T}}B\right)\) is maximized. The pair of vectors \((u_{1},v_{1})\) is called the first canonical pair, and subsequent canonical pairs are defined similarly.
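For readers who want to see the definition in executable form, the following is a minimal sketch of CCA using covariance whitening followed by an SVD. It follows the column-per-observation layout of A and B used above. The small ridge term `reg` is an assumption added for numerical stability, and the function name is hypothetical; the source does not describe its implementation.

```python
import numpy as np

def cca(A, B, reg=1e-8):
    """Canonical correlation analysis for A (m, n) and B (p, n).

    Columns are paired observations. Returns U (m, k), V (p, k) and the
    canonical correlations rho (k,), sorted in decreasing order, k = min(m, p).
    """
    n = A.shape[1]
    A0 = A - A.mean(axis=1, keepdims=True)
    B0 = B - B.mean(axis=1, keepdims=True)
    Saa = A0 @ A0.T / (n - 1) + reg * np.eye(A.shape[0])
    Sbb = B0 @ B0.T / (n - 1) + reg * np.eye(B.shape[0])
    Sab = A0 @ B0.T / (n - 1)
    La = np.linalg.cholesky(Saa)              # Saa = La La^T
    Lb = np.linalg.cholesky(Sbb)              # Sbb = Lb Lb^T
    K = np.linalg.solve(La, Sab)              # La^{-1} Sab
    K = np.linalg.solve(Lb, K.T).T            # La^{-1} Sab Lb^{-T}
    Uw, rho, Vwt = np.linalg.svd(K, full_matrices=False)
    U = np.linalg.solve(La.T, Uw)             # map back from whitened space
    V = np.linalg.solve(Lb.T, Vwt.T)
    return U, V, rho
```

The first columns of `U` and `V` form the first canonical pair, and `rho[0]` equals the correlation between `U[:, 0] @ A` and `V[:, 0] @ B` up to the effect of the ridge term.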

For geo-facial analysis, we sought to find a set of component images and corresponding coefficients that are strongly location dependent. Let A be our set of Eigenface coefficients, one for each image. Let B be an indicator variable encoding image location. The non-zero entry corresponds to the latitude/longitude bin where the image was captured (we use 6° square spatial bins). Performing CCA on this paired data resulted in a projection of our PCA basis and a projection of our locations, as represented by our bin structure. The results of this method applied to our full dataset are shown in Fig. 13. The top three components and their corresponding geographic distribution show a strong location dependence. Based on the distribution maps, it appears that the first three components correspond to the extent to which the face is East Asian, African, or Indian, respectively.
Fig. 13

The top three location-dependent CCA component images. Each map shows the expected value of the coefficient of the corresponding image across the globe. The distributions of the CCA coefficients are more strongly related to geographic location than the PCA coefficients (Fig. 4). The color blue indicates lower values of CCA coefficients while red indicates higher values
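The location indicator B used in this experiment can be sketched as a one-hot matrix over 6° latitude/longitude bins, with one column per image. Dropping bins that contain no images is an assumption made here to keep the matrix small; the function name is hypothetical.

```python
import numpy as np

def location_indicator(lat, lon, bin_deg=6.0):
    """One-hot encoding of image locations over square lat/lon bins.

    Returns B of shape (n_used_bins, n_images) with a single 1 per column,
    and the flat ids of the bins that are actually occupied.
    """
    n_lat = int(round(180.0 / bin_deg))
    n_lon = int(round(360.0 / bin_deg))
    i = np.clip(((lat + 90.0) // bin_deg).astype(int), 0, n_lat - 1)
    j = np.clip(((lon + 180.0) // bin_deg).astype(int), 0, n_lon - 1)
    bin_id = i * n_lon + j
    used, rows = np.unique(bin_id, return_inverse=True)
    B = np.zeros((len(used), len(lat)))
    B[rows, np.arange(len(lat))] = 1.0
    return B, used
```

Because every column sums to one, the covariance of B is rank deficient, so a CCA routine applied to it needs some form of regularization or a dropped reference bin.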

5.2 Directions of facial appearance variation

In the previous section, CCA was used to convert weakly location-dependent Eigenfaces into strongly location-dependent components. This analysis was dominated by global population patterns and obscured some of the local structure in facial appearance variation. Here, we focus our analysis on local facial appearance variations. Instead of using a global indicator variable to represent geolocation, we performed CCA on the images from small spatial areas and used a linear model for geolocation.

For a given location, l, we created a filtered dataset, \(A_{l}\), of all faces within 10°. We used the latitude and longitude of these faces as the paired dataset, \(B_{l}\). From CCA, the element of the first canonical pair that corresponds to \(B_{l}\) is a vector, \(v_{l}\), which represents the direction with the most significant face appearance change. Figure 14 shows the resulting direction field for this analysis over multiple locations overlaid on a world map. Visually, the computed gradients match our intuition. For example, in the area around the Mediterranean, the directions are mostly vertical because of the strong differences in appearance between Africa and Europe. We see similar patterns between the USA and Mexico and between India and East Asia.
Fig. 14

A direction field estimated using CCA between face appearance and geographic location in local neighborhoods. The lines show the cardinal direction that is most correlated with facial appearance change. The lines are color-coded by the correlation coefficients, from blue (low) to red (high)

5.3 Location-dependent face classification

We address the novel application of estimating the geographic location of an image using only a face it captures. We compared three commonly used classifiers, linear discriminant analysis (LDA) [42], random decision forests (RF) [43] (150 trees), and an SVM [37] (C-SVM, linear kernel with c=1), using three feature representations: PCA, one based on histogram of oriented gradients (HOG) [44, 45], and another based on local binary pattern (LBP) [46, 47] features. For the PCA feature vector, we used the top 50 features. For the HOG-based feature, we used a cell size of 8×8 pixels and a block size of 2×2 and computed the features from a 210×168 region from the center of the patch. The final, concatenated feature vector is 17,360 dimensional. To reduce the dimensionality of the feature, we selected the top 300 PCA coefficients and used them as our HOG-based feature representation. We used a similar approach to compute the LBP-based descriptors with 21×21 cells.

We trained a one-vs-all (OVA) classifier for each classification method and each feature representation. For each class, the training set consisted of 1500 positive examples and 1500 negative examples sampled from the rest of the world. The remainder of the dataset was used for testing. Table 1 shows the average accuracy, across all continents, for all methods. The results show that LDA+LBP outperformed all others. In the remainder of this section, we analyze the performance of this method in greater detail.
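The one-vs-all protocol can be sketched as balanced sampling followed by a two-class linear discriminant. The discriminant below is a plain Fisher LDA with a small ridge term and a midpoint threshold, which is an assumption: the source cites LDA [42] but does not describe the exact variant. Function names are hypothetical.

```python
import numpy as np

def sample_ova_split(labels, target, n_per_side=1500, seed=0):
    """Pick n_per_side positives and negatives for training; test on the rest."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == target)
    neg = np.flatnonzero(labels != target)
    train = np.concatenate([rng.choice(pos, n_per_side, replace=False),
                            rng.choice(neg, n_per_side, replace=False)])
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test

def fit_fisher_lda(X, y, reg=1e-6):
    """Two-class Fisher discriminant; predict positive when X @ w + b > 0.

    X: (n, d) features, y: boolean array marking positive examples.
    """
    mu1, mu0 = X[y].mean(axis=0), X[~y].mean(axis=0)
    Sw = np.cov(X[y], rowvar=False) + np.cov(X[~y], rowvar=False)
    Sw += reg * np.eye(X.shape[1])            # ridge term for stability
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = -0.5 * w @ (mu1 + mu0)                # threshold at the class midpoint
    return w, b
```

Repeating this for every region, with `labels` holding each face's region and `X` holding its PCA, HOG, or LBP feature vector, gives one classifier per region whose held-out accuracy can then be averaged.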
Table 1

The average accuracy for continental-level one-vs-all classifiers for different combinations of features and classifiers

| Classifier | PCA | HOG | LBP |
| --- | --- | --- | --- |
| LDA | 62 % | 62 % | 63 % |
| RF | 61 % | 61 % | 61 % |
| SVM | 61 % | 54 % | 55 % |

Table 2 shows the overall accuracy of each OVA classifier. The respective classifiers for Africa and Asia were significantly more accurate than the others. Figure 15 shows the proportion of faces from different continents (columns) that were classified as being from a particular continent (rows). This affinity matrix shows that, for example, fewer people from Asia were misclassified as being from Europe (25 %) than were misclassified as being from Africa (41 %). There is clearly a pattern, but overall, the accuracy is fairly low. We speculate that one significant source of error is the large spatial area and diversity of a continent.
Table 2

The accuracy for all one-vs-all continental-level classifiers

| Continent | Acc. (%) |
| --- | --- |
| Asia | 73 |
| Africa | 70 |
| Europe | 60 |
| Americas | 56 |
| Oceania | 56 |

Fig. 15

Continental-level affinity matrix. The proportion of faces classified as positive for different continents (columns) when trained one-vs-all for a particular continent (rows). For example, for a classifier trained to distinguish between European and non-European faces (row 3), we found that 46 % of the faces in the Americas (column 2) were labeled as being European
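The affinity matrix can be computed from the decision scores of the one-vs-all classifiers as sketched below. The convention that a score above zero means "labeled positive", and the function name, are assumptions made for illustration.

```python
import numpy as np

def affinity_matrix(scores, true_region, n_regions):
    """Proportion of faces from each region labeled positive by each classifier.

    scores: (n_regions, n_faces); scores[r, i] > 0 means classifier r
            labels face i as positive.
    true_region: (n_faces,) integer region index of each face.
    Returns M with M[r, c] = fraction of faces from region c (column)
    labeled positive by the classifier trained for region r (row).
    """
    positive = scores > 0
    M = np.zeros((n_regions, n_regions))
    for c in range(n_regions):
        M[:, c] = positive[:, true_region == c].mean(axis=1)
    return M
```

In practice the faces used to train each classifier would be excluded from `scores` so that the proportions reflect held-out data only.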

We also trained classifiers on 23 sub-continental regions [48]. Table 3 lists the sub-continental regions and classification accuracies of the classifiers with the best performance. Figure 16 shows the affinities (proportion of positives) between the sub-continental regions. The block diagonal structure of the matrix shows four distinct clusters. For example, Australia, New Zealand, Europe, North America, and Central and West Asia form a cluster indicating that faces from these regions look similar. Finally, we trained OVA classifiers at the country level, which is mainly a political, rather than geographic, partitioning. Table 4 shows the ten countries with the highest classifier accuracy and the ten with the lowest. Figure 17 shows maps for the ten target countries color-coded based on the percentage of faces labeled as positive by the given country’s OVA classifier.
Table 3

The sub-continental regions with the highest accuracy classifiers

| Sub-continent | Acc. (%) |
| --- | --- |
| Middle Africa | 86 |
| Western Africa | 83 |
| Eastern Africa | 80 |
| Eastern Asia | 78 |
| Southern Asia | 76 |
| South-Eastern Asia | 74 |

Fig. 16

Sub-continental-level affinity matrix. The proportion of faces classified as positive for different sub-continents (columns) when trained one-vs-all for a particular sub-continent (rows). The block diagonal structure of the matrix shows four distinct clusters corresponding to the most common ethnic groups in the world

Table 4

Countries (identified by ISO country code) for which the one-vs-all classifier had the highest (a) and lowest (b) localization accuracy

| (a) Most distinctive: country | Acc. (%) | (b) Least distinctive: country | Acc. (%) |
| --- | --- | --- | --- |
| GHA | 82 | SWE | 55 |
| ETH | 81 | FRA | 55 |
| TWN | 77 | ARG | 55 |
| NGA | 75 | BEL | 54 |
| KHM | 74 | NZL | 54 |
| MMR | 74 | GBR | 54 |
| HKG | 73 | USA | 54 |
| BGD | 73 | DEU | 53 |
| GTM | 73 | AUS | 51 |
| LAO | 72 | CAN | 49 |

Fig. 17

False-color maps depicting the country-level affinity of facial appearance for selected target countries (a–j). Colors vary from white (≤30 % positives) to green (≥70 % positives)

6 Conclusion

We used a large dataset of geotagged face patches, collected from the Internet, to explore the geo-dependence of human facial appearance. We applied statistical techniques to explore this geo-dependence and found that there is rich structure in this relationship that is not fully explained by differences in the distribution of ethnic or racial groups. In our analysis, we rely on existing techniques for face detection, pose estimation, appearance normalization, and attribute estimation. Our work with the GeoFaces dataset highlights the need for continued improvements to these core algorithms. Such improvements will increase the number and quality of facial image patches we can use, which will increase the accuracy of our higher-level analysis.

There are many potential future applications of this type of analysis, both within computer vision and in other domains. In computer vision, we envision the learned models supporting the creation of algorithms for facial image detection, recognition, and alignment that are tailored for particular geographic locations. Outside of computer vision, we envision a wide range of users posing questions to our system. Such questions could come from sociologists (e.g., “What are current trends in facial hair?” [49]), security officers (e.g., “Where is this person probably from?” [50]), or school children (e.g., “What do people look like in Bangladesh?”).

This work was made possible by the availability of large repositories of geotagged images and the maturity of facial image analysis algorithms. In addition to the immediate applications to facial imagery, we envision that this work will motivate similar work for other types of objects, both natural and man-made.

We continue to expand the GeoFaces dataset, improve the underlying computer vision tools, refine our analysis techniques, and attempt to reduce dataset bias by finding alternative sources of geotagged face imagery. For future work, we plan to investigate the time-varying aspects of facial appearance by collecting geotagged images with known capture time. As a long-term goal, we plan to enable interactive analysis, at finer geographic scales and over a wider variety of attributes.

7 Appendix

7.1 Non-geographic data analysis

In addition to analyzing the geo-dependence of face appearance, the GeoFaces dataset enables us to explore non-geographic dependence. Here, we focus on the relationship between facial appearance and the textual tags applied to images, which are often added by the individual who uploaded the image.

Table 5 shows, for various textual tags, the distribution of the number of faces found in an image. For example, for images tagged “Wedding,” 25.6 % of the images contained a single face, and 5.7 % contained two or more faces. For images tagged “Portrait,” 20.0 % of images contained a single face, but only 1.2 % contained two or more faces. For all tags, a large percentage of images (between 68.7 % and 85.9 %) contained no usable faces, owing to our aggressive filtering. This type of analysis could be used in the future to guide the selection of search keywords that are likely to result in a large number of high-quality faces.
Table 5

The distribution of the number of frontal faces per image for different tags

| Tags | 0 | 1 | 2 | 3 | ≥4 |
|---|---|---|---|---|---|
| Face | 85.9 % | 13.6 % | 0.3 % | 0.1 % | 0.1 % |
| Family | 78.6 % | 18.4 % | 1.5 % | 0.7 % | 0.8 % |
| Friends | 75.7 % | 21.1 % | 1.7 % | 0.8 % | 0.8 % |
| Group | 84.8 % | 11.8 % | 1.3 % | 0.8 % | 1.3 % |
| Party | 74.0 % | 22.9 % | 1.7 % | 0.7 % | 0.7 % |
| Portrait | 78.9 % | 20.0 % | 0.6 % | 0.3 % | 0.3 % |
| Wedding | 68.7 % | 25.6 % | 2.6 % | 1.4 % | 1.7 % |
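
A per-tag distribution of this kind could be tabulated roughly as follows. This is a sketch under stated assumptions: the function name `face_count_distribution`, the `(tags, number_of_faces)` record format, and the toy records are hypothetical and do not reflect the actual GeoFaces pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

BINS = ["0", "1", "2", "3", ">=4"]  # same face-count bins as the table columns


def face_count_distribution(
    records: Iterable[Tuple[List[str], int]],
) -> Dict[str, Dict[str, float]]:
    """For each tag, return the percentage of images in each face-count bin.
    `records` yields (tags, number_of_detected_frontal_faces) per image."""
    counts: Dict[str, Counter] = {}
    for tags, n_faces in records:
        label = str(n_faces) if n_faces < 4 else ">=4"
        for tag in set(tags):  # count an image once per distinct tag
            counts.setdefault(tag, Counter())[label] += 1
    result: Dict[str, Dict[str, float]] = {}
    for tag, counter in counts.items():
        total = sum(counter.values())
        result[tag] = {b: 100.0 * counter[b] / total for b in BINS}
    return result


# Toy records (made up for illustration only).
toy_records = [
    (["wedding", "family"], 1),
    (["wedding"], 0),
    (["wedding"], 5),
    (["wedding"], 0),
    (["family"], 2),
]
toy_dist = face_count_distribution(toy_records)
```

The share of images with two or more faces for a tag is then the sum of the last three bins of its row.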

Figure 18 shows the differences in the expected location of detected faces for different tags. Each heatmap was formed by counting the number of images that contained a face centered at each pixel and then convolving the resulting 2D histogram with a Gaussian kernel. For example, in images tagged “Face,” most of the faces are centrally located in the upper half of the image, but for images tagged “Family,” “Friends,” “Group,” or “Party,” faces are more frequently found on the left and right sides of the image. In images tagged “Portrait,” the typical horizontal position of a face is much more constrained than the vertical position.
Fig. 18

Visualizing the expected face location in images with different tags. Red (blue) pixels are more (less) likely to be the center of a face
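
The heatmap construction (count face centers, then smooth with a Gaussian) can be sketched in plain Python as below. Several details are assumptions made for illustration: face centers are normalized by image width and height so that images of different sizes share one grid, the grid resolution `bins` and the kernel width `sigma` are arbitrary, and the kernel is truncated at three standard deviations. None of these values come from the paper.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Grid = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """1D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid: Grid, kernel: Sequence[float]) -> Grid:
    """Convolve every row with the kernel, treating out-of-range cells as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    return [
        [sum(kernel[k + radius] * row[x + k]
             for k in range(-radius, radius + 1) if 0 <= x + k < width)
         for x in range(width)]
        for row in grid
    ]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def face_location_heatmap(
    centers: Iterable[Tuple[float, float]], bins: int = 32, sigma: float = 1.5
) -> Grid:
    """Count face centers on a bins x bins grid, then smooth with a Gaussian.
    `centers` holds (x, y) positions normalized to [0, 1] (an assumption)."""
    hist = [[0.0] * bins for _ in range(bins)]
    for x, y in centers:
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        hist[row][col] += 1.0
    kernel = gaussian_kernel(sigma)
    # A 2D Gaussian is separable: blur along rows, then along columns.
    return transpose(blur_rows(transpose(blur_rows(hist, kernel)), kernel))


# Toy input: ten faces near the upper center and one in the lower left.
toy_centers = [(0.5, 0.25)] * 10 + [(0.2, 0.6)]
toy_heat = face_location_heatmap(toy_centers, bins=16, sigma=1.0)
```

With SciPy available, `scipy.ndimage.gaussian_filter` applied to the 2D histogram would replace the hand-written blur.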

Figure 19 shows the distribution of face size, measured as the inter-ocular distance as a fraction of the image width, for different tags. This shows, for example, that faces in images tagged “Face” are more likely to occupy a large part of the image than faces in images tagged “Family”. There is also a pronounced increase in probability at around 0.1 for each tag, likely due to the popular posing of full-body portraits.
Fig. 19

Visualizing the distribution of relative face size in images with different tags. The horizontal axis corresponds to inter-ocular distance relative to the image size, and the vertical axis shows relative frequency
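
The relative face size and its histogram could be computed along the following lines. This is a hedged sketch: the function names, the eye-coordinate inputs, the bin count, and the upper limit `max_size` are hypothetical choices made for illustration, and the toy coordinates are invented.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def relative_face_size(left_eye: Point, right_eye: Point, image_width: int) -> float:
    """Inter-ocular distance in pixels divided by the image width in pixels."""
    return math.dist(left_eye, right_eye) / image_width


def relative_frequency(
    sizes: Iterable[float], n_bins: int = 10, max_size: float = 0.5
) -> List[float]:
    """Histogram of sizes over [0, max_size], normalized to sum to 1.
    Sizes at or above max_size are clipped into the last bin."""
    counts = [0] * n_bins
    n = 0
    for size in sizes:
        idx = min(int(size / max_size * n_bins), n_bins - 1)
        counts[idx] += 1
        n += 1
    return [c / n for c in counts] if n else [0.0] * n_bins


# Toy eye positions (left eye, right eye, image width in pixels).
toy_faces = [
    ((100, 120), (180, 120), 640),  # 80 / 640 = 0.125
    ((300, 200), (348, 200), 640),  # 48 / 640 = 0.075
    ((280, 90), (360, 90), 640),    # 80 / 640 = 0.125
    ((100, 100), (420, 100), 640),  # 320 / 640 = 0.5, clipped to last bin
]
toy_sizes = [relative_face_size(l, r, w) for l, r, w in toy_faces]
toy_freq = relative_frequency(toy_sizes)
```

Computing `toy_freq` separately for the images carrying each tag gives one curve per tag, comparable across tags because each histogram is normalized.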

Declarations

Acknowledgements

This work was partially supported by the NSF (CNS-1156822) and DARPA (D11AP00255). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor. We gratefully acknowledge Flickr users ennuiislife, hotlantavoyeur, rocketboom, besighyawn, and scubabix for allowing us to use their images via Creative Commons licenses. See http://cs.uky.edu/~tarik/papers/geofacial/ for additional information regarding the original images.

Authors’ Affiliations

(1) Department of Computer Science, University of Kentucky
(2) Department of Computer Science, University of North Carolina at Charlotte

References

  1. RC Lewontin, W Freeman, Human diversity (Scientific American Library, New York, 1982).
  2. Twitter. http://twitter.com
  3. S Agarwal, Y Furukawa, N Snavely, I Simon, B Curless, SM Seitz, R Szeliski, Building Rome in a day. Commun. ACM. 54(10), 105–112 (2011).
  4. H Zhang, M Korayem, DJ Crandall, G LeBuhn. International World Wide Web Conference (ACM, 2012), pp. 749–758.
  5. D Leung, S Newsam, in IEEE Conference on Computer Vision and Pattern Recognition. Proximate sensing: inferring what-is-where from georeferenced photo collections (IEEE, 2010).
  6. C Doersch, S Singh, A Gupta, J Sivic, AA Efros. ACM Trans. Graphics (SIGGRAPH). 31(4), 101:1–101:9 (2012).
  7. X Shen, Z Lin, J Brandt, Y Wu. IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2013).
  8. K Scherbaum, J Petterson. IEEE International Conference on Computer Vision (IEEE, 2013).
  9. O Rudovic, M Pantic, in IEEE International Conference on Computer Vision. Shape-constrained Gaussian process regression for facial-point-based head-pose normalization (IEEE, 2011).
  10. X Xiong, F De la Torre, in IEEE Conference on Computer Vision and Pattern Recognition. Supervised descent method and its applications to face alignment (IEEE, 2013).
  11. N Kumar, AC Berg, PN Belhumeur, SK Nayar, in IEEE International Conference on Computer Vision. Attribute and simile classifiers for face verification (IEEE, 2009).
  12. D Parikh, K Grauman, in IEEE International Conference on Computer Vision. Relative attributes (IEEE, 2011).
  13. D Yi, Z Lei, SZ Li, in IEEE Conference on Computer Vision and Pattern Recognition. Towards pose robust face recognition (IEEE, 2013).
  14. X Cao, D Wipf, F Wen, G Duan, in IEEE International Conference on Computer Vision. A practical transfer learning algorithm for face verification (IEEE, 2013).
  15. J Deng, W Dong, R Socher, L-J Li, K Li, L Fei-Fei, in IEEE Conference on Computer Vision and Pattern Recognition. ImageNet: a large-scale hierarchical image database (IEEE, 2009).
  16. BC Russell, A Torralba, KP Murphy, WT Freeman, LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77(1-3), 157–173 (2008).
  17. A Torralba, R Fergus, WT Freeman, 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1958–1970 (2008).
  18. M Everingham, L Van Gool, CK Williams, J Winn, A Zisserman, The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010).
  19. G Griffin, A Holub, P Perona, Caltech-256 object category dataset. Technical report, California Institute of Technology, (2007).
  20. N Jacobs, N Roman, R Pless, in IEEE Conference on Computer Vision and Pattern Recognition. Consistent temporal variations in many outdoor scenes (IEEE, 2007).
  21. S Narasimhan, C Wang, S Nayar, in European Conference on Computer Vision. All the images of an outdoor scene (Springer Berlin Heidelberg, 2002).
  22. Y Avrithis, Y Kalantidis, G Tolias, E Spyrou, in ACM-MM. Retrieving landmark and non-landmark images from community photo collections (ACM, 2010).
  23. G Tolias, Y Avrithis, in IEEE International Conference on Computer Vision. Speeded-up, relaxed spatial matching (IEEE, 2011).
  24. T Weyand, J Hosang, B Leibe, in European Conference on Trends and Topics in Computer Vision. An evaluation of two automatic landmark building discovery algorithms for city reconstruction (Springer Berlin Heidelberg, 2012).
  25. J Hays, AA Efros, in IEEE Conference on Computer Vision and Pattern Recognition. IM2GPS: estimating geographic information from a single image (IEEE, 2008).
  26. PJ Phillips, H Moon, SA Rizvi, PJ Rauss, The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 22(10), 1090–1104 (2000).
  27. GB Huang, M Ramesh, T Berg, E Learned-Miller, Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical report, University of Massachusetts, Amherst (2007).
  28. V Jain, E Learned-Miller, FDDB: A benchmark for face detection in unconstrained settings. Technical report, University of Massachusetts, Amherst (2010).
  29. DJ Crandall, L Backstrom, D Huttenlocher, J Kleinberg, in International World Wide Web Conference. Mapping the world’s photos (ACM, 2009).
  30. P Serdyukov, V Murdock, R Van Zwol, in SIGIR. Placing Flickr photos on a map (ACM, 2009).
  31. Q Fang, J Sang, C Xu, in ACM-MM. GIANT: Geo-informative attributes for location recognition and exploration (ACM, 2013).
  32. E Kalogerakis, O Vesselova, J Hays, AA Efros, A Hertzmann, in IEEE International Conference on Computer Vision. Image sequence geolocation with human travel priors (IEEE, 2009).
  33. C Doersch, S Singh, A Gupta, J Sivic, AA Efros, What Makes Paris Look like Paris? ACM Trans. Graphics (SIGGRAPH). 31(4), 101:1–101:9 (2012).
  34. S Lee, H Zhang, DJ Crandall, in IEEE Winter Conference on Applications of Computer Vision. Predicting geo-informative attributes in large-scale image collections using convolutional neural networks (IEEE, 2015).
  35. Flickr, http://flickr.com
  36. Detecting Facial Parts. http://www.omron.com/ecb/products/mobile/okao03.html. Omron, 2013.
  37. C-C Chang, C-J Lin, LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  38. GeoFaces: A Dataset of Geotagged Face Images. http://geofaces.csr.uky.edu 2014.
  39. M Turk, A Pentland, Eigenfaces for recognition. J. Cognitive Neurosci. 3(1), 71–86 (1991).
  40. J Mairal, F Bach, J Ponce, G Sapiro, Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60 (2010).
  41. J Shi, J Malik, Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2000).
  42. K Etemad, R Chellappa, Discriminant analysis for recognition of human face images. J. Opt. Soc. Am. A 14(8), 1724–1733 (1997).
  43. G Fanelli, M Dantone, L Van Gool, in IEEE Conference on Automatic Face and Gesture Recognition. Real time 3D face alignment with random forests-based active appearance models (IEEE, 2013).
  44. A Albiol, D Monzo, A Martin, J Sastre, A Albiol, Face recognition using HOG-EBGM. Pattern Recognit. Lett. 29(10), 1537–1543 (2008).
  45. O Déniz, G Bueno, J Salido, F De la Torre, Face recognition using histograms of oriented gradients. Pattern Recognit. Lett. 32, 1598–1603 (2011).
  46. T Ahonen, A Hadid, M Pietikainen, Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006).
  47. D Maturana, D Mery, A Soto, in International Conference of the Chilean Computer Science Society. Face recognition with local binary patterns, spatial pyramid histograms and naive Bayes nearest neighbor classification (IEEE, 2009).
  48. Composition of macro geographical (continental) regions, geographical sub-regions, and selected economic and other groupings. http://unstats.un.org/unsd/methods/m49/m49regin.htm. United Nations Statistics Division 2013.
  49. DE Robinson, Fashions in shaving and trimming of the beard: the men of the illustrated London news, 1842-1972. Am. J. Sociol. 81(5), 1133–1141 (1976).
  50. S Fu, H He, Z Hou, Learning race from face: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2483–2509 (2014).

Copyright

© Islam et al. 2015

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.