WorldPop Research Group
University of Southampton
26 May 2020
The peanutButter web application allows you to produce your own gridded population estimates by spreading your estimates of people per building evenly across a map of buildings. This is a quick and simple approach that uses high resolution maps of building footprints, assuming that the same number of people live in each building across entire regions or settlement types (e.g. urban and rural). WorldPop partners with Maxar Technologies, Ecopia.AI, and the Bill and Melinda Gates Foundation for recent maps of buildings (Ecopia.AI and Maxar Technologies 2020, Dooley and Tatem 2020).
The “peanut butter” approach can be applied in a bottom-up or top-down fashion. Bottom-up uses estimates of average household sizes for each settlement type (e.g. urban and rural) and then applies these to each building in the corresponding settlement type. Top-down takes user-defined population totals for administrative units and spreads those people evenly across all buildings within a given unit, so that the total per administrative unit matches the total defined by the user. The preferred approach will depend on what information you have available: estimates of people per building (bottom-up) or estimates of people per administrative unit (top-down).
Code for the peanutButter R package is openly available from WorldPop on GitHub: https://github.com/wpgp/peanutButter.
Use the sliders to explore population parameters until you find a combination that produces reasonable estimates (shown in the ‘Results’ table) of total population, urban population, and rural population for the country as a whole.
Adjust the age-sex sliders to select the demographic group(s) that you would like your gridded population estimates to represent. Note: this will not change the population totals shown in the ‘Results’ table.
Use the “Gridded Population Estimates” button to generate a gridded population map (a GeoTIFF raster) that is produced by applying your population parameters to building footprints in each approximately 100 m grid cell across the country.
(optional) Use the “Settings” and/or “Source Files” button(s) to save the input data.
Upload a GeoJSON file containing polygons and their associated population totals by clicking on the “Browse” button and selecting your GeoJSON file. Ensure that you have the correct country selected.
Adjust the age-sex sliders to select the demographic group(s) that you would like your gridded population estimates to represent.
Use the “Gridded Population Estimates” button to generate a gridded population map (a GeoTIFF raster) that is produced by disaggregating each polygon’s population total evenly across the buildings inside the polygon.
(optional) Use the “Source Files” button to save the input data.
The peanutButter method requires you to provide estimates of three population characteristics for both urban and rural settlement types using expert opinion:
Areas have been pre-classified as ‘urban’ based on two simple rules that are consistent across all countries. A ~100 m grid cell is classified as ‘urban’ if it is part of a grouping of contiguous cells with: 1) 1,500 or more cells; and 2) 5,000 or more buildings (Dooley and Tatem 2020). All grid cells containing building centroids that do not meet the ‘urban’ criteria are classified as ‘rural’.
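The two rules above can be illustrated with a small sketch. This is not the code used to produce the source rasters (Dooley and Tatem 2020). It assumes that groupings are formed from cells that contain at least one building, and that cells count as contiguous when they share an edge or a corner. Neither detail is stated here, and the function name `classify_urban` and its parameters are hypothetical.

```python
from collections import deque


def classify_urban(building_counts, min_cells=1500, min_buildings=5000):
    """Label each cell 'urban', 'rural', or None (no buildings).

    building_counts: list of rows, each a list of building counts per cell.
    Assumption: built cells are contiguous when they share an edge or a corner.
    """
    n_rows = len(building_counts)
    n_cols = len(building_counts[0]) if n_rows else 0
    labels = [[None] * n_cols for _ in range(n_rows)]
    seen = [[False] * n_cols for _ in range(n_rows)]

    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or building_counts[r0][c0] <= 0:
                continue
            # Breadth-first search collects one grouping of contiguous built cells.
            group, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                group.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n_rows and 0 <= cc < n_cols
                                and not seen[rr][cc]
                                and building_counts[rr][cc] > 0):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            # Apply both rules to the grouping as a whole.
            n_buildings = sum(building_counts[r][c] for r, c in group)
            is_urban = len(group) >= min_cells and n_buildings >= min_buildings
            for r, c in group:
                labels[r][c] = "urban" if is_urban else "rural"
    return labels


# Toy grid with small thresholds so the result is easy to check by eye.
grid = [
    [5, 6, 0, 0],
    [7, 0, 0, 1],
    [0, 0, 0, 2],
]
labels = classify_urban(grid, min_cells=3, min_buildings=15)
```

In this toy grid the three cells in the upper left form one grouping (3 cells, 18 buildings) that meets both reduced thresholds, while the two cells on the right form a grouping that does not.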
The population is estimated using the following formula:
Population = Pop_Urban + Pop_Rural
Pop_Urban = Buildings_Urban x PropResidential_Urban x UnitsPerBuilding_Urban x PeoplePerUnit_Urban
Pop_Rural = Buildings_Rural x PropResidential_Rural x UnitsPerBuilding_Rural x PeoplePerUnit_Rural
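As a minimal sketch, the formula can be written as a pair of small functions. The function names and the input values are illustrative only; they are not part of the peanutButter package and are not the app's default values. The same arithmetic could be applied to a single grid cell or to a whole country.

```python
def settlement_population(buildings, prop_residential, units_per_building, people_per_unit):
    """Population for one settlement type (urban or rural)."""
    return buildings * prop_residential * units_per_building * people_per_unit


def total_population(urban, rural):
    """Sum of urban and rural populations; each argument is a dict of the four inputs."""
    return settlement_population(**urban) + settlement_population(**rural)


# Illustrative inputs only (not defaults from the app).
urban = dict(buildings=1000, prop_residential=0.8, units_per_building=1.1, people_per_unit=5.0)
rural = dict(buildings=2000, prop_residential=0.9, units_per_building=1.0, people_per_unit=5.0)

pop_urban = settlement_population(**urban)   # about 4,400 people
pop_rural = settlement_population(**rural)   # about 9,000 people
population = total_population(urban, rural)  # about 13,400 people
```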
You can supply your own inputs (i.e. raster of building count, raster of urban/rural classification) to produce gridded population estimates using the “aggregator” function of the peanutButter R package.
Mean people per housing unit
This parameter defines the average number of people in a housing unit and it can be defined for urban and rural areas separately. References for determining these values could include:
Mean housing units per building
This parameter defines the average number of housing units per residential building. This number would be expected to be near 1 for rural areas and slightly higher in urban areas.
Proportion residential buildings
This parameter defines the proportion of buildings that are residential (total building counts from Dooley and Tatem 2020). Some buildings are non-residential like factories, shops, barns, and sheds. In addition, there are some erroneous building footprints that do not represent buildings and others that may represent more than one building.
We provide moderately informative default values for population parameters, but we strongly urge users to modify the defaults based on a thorough investigation of available data sources. Our process for selecting default values was:
Look up average household sizes (UN 2019 or UN 2017). We assumed that household sizes were equal in urban and rural areas. If no estimate was available, we assumed an average of 5 people per housing unit.
Assume there was an average of 1.1 housing units per building in urban areas and 1 housing unit per building in rural areas.
Adjust the proportion of buildings that are residential until the total population matches the World Bank estimate, assuming the same proportion of residential buildings for urban and rural areas. For countries where population estimates were available from the WorldPop Open Population Repository, we used those total populations instead.
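Because the last step assumes a single proportion of residential buildings for urban and rural areas, that proportion can also be solved for directly instead of being adjusted by hand. The sketch below shows this closed-form shortcut; it is an illustration of the arithmetic, not a description of how the defaults were actually computed. The function name and the building counts are hypothetical.

```python
def calibrate_prop_residential(target_population, buildings_urban, buildings_rural,
                               units_urban=1.1, units_rural=1.0, people_per_unit=5.0):
    """Solve for the shared proportion of residential buildings.

    Rearranges: target = prop * people_per_unit
                         * (buildings_urban * units_urban + buildings_rural * units_rural)
    """
    people_if_all_residential = people_per_unit * (
        buildings_urban * units_urban + buildings_rural * units_rural
    )
    if people_if_all_residential <= 0:
        raise ValueError("Building counts and parameters must imply a positive population.")
    # A result above 1 would mean the other parameters cannot reach the target.
    return target_population / people_if_all_residential


# Illustrative inputs only: 1,000 urban buildings, 2,000 rural buildings.
prop = calibrate_prop_residential(13000, buildings_urban=1000, buildings_rural=2000)

# Plugging the result back into the bottom-up formula recovers the target total.
check = prop * 5.0 * (1000 * 1.1 + 2000 * 1.0)
```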
The top-down approach produces gridded population estimates by disaggregating user-provided population totals within user-provided polygon boundaries (e.g. administrative boundaries). There are three steps involved in this process:
Calculate the total number of buildings in each polygon using the source data from Dooley and Tatem (2020) that were derived from high resolution building footprints (Ecopia.AI and Maxar Technologies 2020).
Calculate the average number of people per building by dividing the user-provided population total for each polygon by the total number of buildings in that polygon.
Produce gridded estimates by multiplying the average people per building by the total number of buildings in each ~100 m grid cell.
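The three steps above can be sketched as follows. This is a simplified illustration, not the package's implementation: it assumes the raster has been flattened to a list of cells, that each cell has already been assigned to the polygon containing its centroid, and that NA is represented by `None`. The function name `top_down` is hypothetical.

```python
def top_down(cell_polygon, cell_buildings, polygon_totals):
    """Spread polygon population totals evenly across buildings.

    cell_polygon:   polygon id for each cell (None if outside every polygon)
    cell_buildings: building count for each cell
    polygon_totals: dict mapping polygon id to its population total
    Returns a list with one estimate per cell (None stands in for NA).
    """
    # Step 1: total number of buildings in each polygon.
    buildings_per_polygon = {}
    for pid, n in zip(cell_polygon, cell_buildings):
        if pid is not None:
            buildings_per_polygon[pid] = buildings_per_polygon.get(pid, 0) + n

    # Step 2: average people per building in each polygon.
    people_per_building = {}
    for pid, total in polygon_totals.items():
        n = buildings_per_polygon.get(pid, 0)
        people_per_building[pid] = total / n if n > 0 else None

    # Step 3: people per building multiplied by buildings in each cell.
    estimates = []
    for pid, n in zip(cell_polygon, cell_buildings):
        rate = people_per_building.get(pid)
        estimates.append(None if rate is None else rate * n)
    return estimates


# Illustrative inputs only: polygon "A" has 40 buildings, polygon "B" has none.
cell_polygon = ["A", "A", "A", "B", "B", None]
cell_buildings = [10, 30, 0, 0, 0, 4]
estimates = top_down(cell_polygon, cell_buildings, {"A": 200, "B": 50})
```

Here polygon "A" averages 5 people per building, so its cells sum back to the supplied total of 200. Polygon "B" has no buildings, so its cells are returned as NA, and the cell outside every polygon receives no estimate.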
User-provided polygons will be ignored if they are too small to overlap the centroid of at least one 100-m grid cell from the building raster (see Source Data). If this happens, the total population of the gridded population estimates (i.e. summing all cells) will exclude the populations from these small polygons.
If a user-provided polygon contains zero buildings, the gridded population estimates will contain NA for the grid cells in that polygon.
Areas of your polygons that exceed the national boundaries used in the building footprints (see Source Data) will not contain gridded population estimates.
Gridded population estimates for specific demographic group(s) are produced the same way for the top-down and bottom-up methods using ~100 m gridded estimates of the proportion of population in each age-sex group (WorldPop et al. 2018). There are two steps involved:
Sum the individual age-sex proportions for each grid cell across the age-sex groups selected using the sliders.
Multiply the resulting grid of proportions by the gridded estimates of total population from the bottom-up or top-down method.
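The two steps above can be sketched as follows. This is an illustration only, assuming flattened lists of cells and `None` for NA; the function name and the age-sex group labels are hypothetical and do not reflect the naming used in the ‘agesex’ source files.

```python
def demographic_estimates(total_population, agesex_proportions, selected_groups):
    """Population in the selected age-sex groups for each cell.

    total_population:   population estimate per cell (None stands in for NA)
    agesex_proportions: dict mapping group label to a list of proportions per cell
    selected_groups:    group labels to include
    """
    estimates = []
    for i, pop in enumerate(total_population):
        if pop is None:
            estimates.append(None)
            continue
        # Step 1: sum the proportions of the selected groups in this cell.
        proportion = sum(agesex_proportions[g][i] for g in selected_groups)
        # Step 2: multiply by the cell's total population estimate.
        estimates.append(pop * proportion)
    return estimates


# Illustrative inputs only; group labels are hypothetical.
proportions = {
    "female_0_4": [0.08, 0.10, 0.09],
    "male_0_4": [0.07, 0.10, 0.09],
    "female_5_9": [0.06, 0.05, 0.07],
}
total_population = [100.0, 200.0, None]
under_five = demographic_estimates(total_population, proportions, ["female_0_4", "male_0_4"])
```

With these inputs, the first cell has about 15 children under five (100 × 0.15), the second has about 40 (200 × 0.20), and the third remains NA.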
To produce gridded population estimates for specific age-sex groups using your own inputs (i.e. population raster, or region-specific age-sex proportions), you can use the “demographic” function from the peanutButter R package. For an example of the age-sex information needed, please refer to the ‘agesex’ files downloaded when clicking on the “source files” button for any country.
In the peanutButter app, there are several source datasets working behind the scenes that you can download using the “Download Source” button.
There are two data sets describing building patterns (Dooley and Tatem 2020) that were derived from building footprints (Ecopia.AI and Maxar Technologies 2020):
There are also two source datasets that provide the proportion of population in each demographic group for every ~100 m grid cell (WorldPop et al. 2018, Pezzulo et al. 2017, Carioli et al. in prep). The age-sex source files include:
The time period associated with gridded population estimates produced by peanutButter depends on several factors. The spatial distribution of the population across the country depends on the spatial distribution of building footprints. The age-sex breakdown depends on the source demographic data. The population totals depend on the population data that you provide.
Building footprints
The satellite imagery used to create the building footprints is from Maxar’s Vivid imagery mosaics. The exact date depends on the specific satellite image used in the mosaic at a given location. The average mosaic image is less than 20 months old (Ecopia.AI and Maxar Technologies 2020).
Demographic data
The demographic data (WorldPop et al. 2018) describing age-sex proportions and their spatial distribution are based on projections that represent the year 2020, but they do not account for potential effects of population displacement and migration.
Population data
Bottom-up: The time-point represented by the gridded population estimates is influenced by the date of your settings for “people per housing unit”, “housing units per residential building”, and “proportion of buildings that are residential”. The date reflected by our default values is usually 2018-2019. For more information, refer to our sources described in the sub-section “Default Values” from More Info: Bottom-up.
Top-down: The time-point of the gridded population estimates is influenced by the date of the population totals in your uploaded GeoJSON file.
The peanutButter R package was developed by the WorldPop Research Group within the Department of Geography and Environmental Science at the University of Southampton. Funding was provided by the Bill and Melinda Gates Foundation (INV-002697). Ecopia.AI and Maxar Technologies (2020) provided high resolution building footprints based on recent satellite imagery. Gridded age-sex data were provided by the WorldPop Global High Resolution Population Denominators Project led by Alessandro Sorichetta with funding from the Bill and Melinda Gates Foundation (OPP1134076). Development of the peanutButter R package was led by Doug Leasure. Claire Dooley developed the source rasters of building counts and urban/rural settlements. Maksym Bondarenko maintains WorldPop’s Shiny server. Professor Andy Tatem provides oversight of the WorldPop Research Group.
Leasure DR, Dooley CA, Bondarenko M, Tatem AJ. 2020. peanutButter: An R package to produce rapid-response gridded population estimates from building footprints, version 0.1.0. WorldPop Research Group, University of Southampton. doi:10.5258/SOTON/WP00667
GNU General Public License v3.0 (GNU GPLv3)
Carioli A, Pezzulo C, Hanspal S, Hilber T, Hornby G, Kerr D, Tejedor-Garavito N, Nilsen K, Pistolesi L, Adamo S, Mills J, Nieves JJ, Chamberlain H, Bondarenko M, Lloyd C, Ves N, Koper P, Yetman G, Gaughan A, Stevens F, Linard C, James W, Sorichetta A, and Tatem AJ. In prep. Population structure by age and sex: a multi-temporal subnational perspective.
Ecopia.AI and Maxar Technologies. 2020. Digitize Africa.
Dooley CA, Tatem AJ. 2020. Gridded maps of building patterns throughout sub-Saharan Africa, version 1.0. WorldPop Research Group, University of Southampton. Source of building footprints “Ecopia Vector Maps Powered by Maxar Satellite Imagery” (c) 2020. doi:10.5258/SOTON/WP00666
Pezzulo C, Hornby GM, Sorichetta A, Gaughan AE, Linard C, Bird TJ, Kerr D, Lloyd CT, Tatem AJ. 2017. Sub-national mapping of population pyramids and dependency ratios in Africa and Asia. Sci. Data 4:170089 doi:10.1038/sdata.2017.89
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project - Funded by the Bill and Melinda Gates Foundation (OPP1134076).