Gianluca Boo1, Roland Hosner2, Pierre Z Akilimali3, Edith Darin1, Heather R Chamberlain1, Warren C Jochem1, Patricia Jones1, Roger Shulungu Runika4, Henri Marie Kazadi Mutombo4,5, Attila N Lazar1 and Andrew J Tatem1
1WorldPop Research Group, University of Southampton, Southampton, United Kingdom
2Flowminder Foundation, Stockholm, Sweden
3École de Santé Publique de Kinshasa, Kinshasa, Democratic Republic of the Congo
4Institut National de la Statistique, Kinshasa, Democratic Republic of the Congo
5Bureau Central du Recensement, Kinshasa, Democratic Republic of the Congo
This report is a supplement to the modelled gridded population estimates for the Haut-Katanga, Haut-Lomami, Ituri, Kasaï, Kasaï-Oriental, Lomami and Sud-Kivu provinces in the Democratic Republic of the Congo (DRC) (2021). The report describes the processing of the microcensus data collected in these provinces between March and May 2021, which are used as input for the Bayesian statistical model that produces the gridded population estimates, following an approach described by Wardrop et al. (2018). The data processing consists of four main steps: 1) attribute selection and pre-processing, 2) processing of listed persons, 3) processing of listed households and 4) processing of listed clusters.
We accessed the most recent version of the microcensus data, which was pre-checked, pre-processed and consolidated by the École de Santé Publique de Kinshasa and the Flowminder Foundation, and selected the following attributes.
Attribute | Description | Format |
---|---|---|
x_cluster_ezid | Unique identifier of the enumeration zone (EZ) | Integer [1442 unique values] |
x11_building_id | Unique identifier of the building | String [free text] |
a2_gpslong | GPS coordinate of the building [longitude] | Numeric |
a2_gpslat | GPS coordinate of the building [latitude] | Numeric |
a2_gpsprec | GPS accuracy | Numeric [meters] |
a3_buildingtype | Building type | Integer [1 to 2 (residential building), 3 to 4 (collective residential building), 5 (non-residential building), 6 (mixed use), or 7 (non-functional)] |
b1_hhid | Unique identifier of the household | String [free text] |
b4_hhocc | Is there at least one person who lives in this housing unit? | Logic [TRUE or FALSE] |
c4_consent | Agree to take part in the study | Logic [TRUE or FALSE] |
c5_hhsize | Number of individuals in the household, including visitors that stayed last night | Integer [1 to 30] |
c10_lastnight | Did the person stay here last night? | Logic [TRUE or FALSE] |
c16_nrmonths | During the past 12 months, how many months of the year has the person lived in this household? | Numeric [0 to 12 months, -98 (Don’t know) or -99 (Prefer not to say)] |
c17_reason | What is the main reason the person does not live in the household all year round? | Integer [1 (moved during the last year), 6 (look for/take up temporary work), 8 (be close to school/university/other educational institute), 9 (seek or receive medical care), 10 (spend time with family members or friends), or 11 (other)] |
c17_other | What is the other reason the person does not live in the household all year round? | String [free text] |
c11_gender | Is the person male or female? | String [F (female) or M (male)] |
c12_age | How old was the person at their last birthday? | Integer [0 to 99 years] |
We pre-processed and recoded the selected attributes into more actionable formats (e.g., logical for binary attributes) to facilitate the processing steps presented below.
We retrieved individual microcensus records for the 367,831 listed persons and subsequently selected the de jure population (United Nations 1991) — hereafter named residents — according to the following criteria.
The person spent at least six months in the household during the preceding 12 months [c16_nrmonths >= 6], or
the person is likely to spend at least the next six months in the household [c17_reason %in% c(1, 6, 8, 9, 10, 11)] and
the person spent the previous night in the household [c10_lastnight == TRUE].
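The criteria above can be sketched in base R as follows. This is a minimal illustration on hypothetical toy records, not the code used for the report. It assumes that the second and third criteria are combined with a logical AND before being joined to the first with OR, and that missing values count as not meeting a criterion.

```r
# Hypothetical toy records; column names follow the attribute table above.
persons <- data.frame(
  c16_nrmonths  = c(12, 3, 2, 7, -98),
  c17_reason    = c(NA, 1, 6, NA, 10),
  c10_lastnight = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Criterion 1: at least six of the preceding 12 months in the household.
# The codes -98 and -99 fail this test because they are below 6.
long_stay <- !is.na(persons$c16_nrmonths) & persons$c16_nrmonths >= 6

# Criteria 2 and 3 (assumed to be combined with AND): a listed reason for
# partial residence and presence in the household the previous night.
# %in% returns FALSE for NA, so missing answers do not qualify.
likely_stay <- persons$c17_reason %in% c(1, 6, 8, 9, 10, 11) &
  persons$c10_lastnight %in% TRUE

persons$resident <- long_stay | likely_stay
residents <- persons[persons$resident, ]
```

In this toy example only the third record is dropped, since the person neither stayed six months nor spent the previous night in the household.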
The implementation of these criteria involved dropping 4,373 listed persons. The remaining 363,458 residents were subsequently allocated to a unique age group (i.e. 0, 1-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79 and 80+) and sex group (i.e. F and M).
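One way to allocate ages to these groups is base R's `cut()` with left-closed intervals. The sketch below uses hypothetical ages and is only an illustration of the grouping, not the report's own code.

```r
# Hypothetical ages in completed years, including a missing value.
age <- c(0, 3, 4, 5, 17, 59, 60, 80, 99, NA)

# Lower bounds of the 18 groups; right = FALSE gives intervals [lower, upper).
breaks <- c(0, 1, seq(5, 80, by = 5), Inf)
labels <- c("0", "1-4",
            paste(seq(5, 75, by = 5), seq(9, 79, by = 5), sep = "-"),
            "80+")

age_group <- cut(age, breaks = breaks, labels = labels, right = FALSE)
```

Records with a missing age keep a missing age group, which is consistent with retaining residents whose age was not reported.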
To tackle age-reporting issues for under-ones (i.e. months were sometimes reported as years), individuals with the attribute c12_age==0 as well as individuals whose attribute c17_other contained any of the following keywords were allocated to the 0 age group: nouvelle, nourrisson, nourrison, nourisson, bebe, naissance, nouveau, beb3, mois, semaine de vie or moin.+an.
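The keyword matching can be sketched with a single regular expression and `grepl()`. The records below are hypothetical, and the sketch assumes that the free text is unaccented and that matching is case-insensitive; neither assumption is stated in the report.

```r
# Keywords listed above, joined into one alternation pattern.
keywords <- c("nouvelle", "nourrisson", "nourrison", "nourisson", "bebe",
              "naissance", "nouveau", "beb3", "mois", "semaine de vie",
              "moin.+an")
pattern <- paste(keywords, collapse = "|")

# Hypothetical records: free-text reason and reported age.
c17_other <- c("nouveau ne", "bebe de 3 mois", "travail", NA, "moins d'un an")
c12_age   <- c(1, 2, 30, 0, 1)

# A record goes to the 0 age group if the reported age is 0 or if the
# free text matches a keyword; grepl() returns FALSE for missing text.
is_infant <- (!is.na(c12_age) & c12_age == 0) |
  grepl(pattern, c17_other, ignore.case = TRUE)
```

Accented spellings such as "bébé" would not match this pattern unless the text is normalised first.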
A total of 827 residents had no age or no sex reported but were not discarded from the data. The count of residents belonging to each age and sex group is presented in the interactive plot below.
We retrieved individual microcensus records for the 85,954 listed households and selected the 80,970 households with at least one resident. We subsequently considered the following scenarios to determine whether a household was eligible for imputation in the case of non-response.
Household interview completed: i) the household comprised at least one resident at the time of the survey, ii) the surveyor talked to a potential respondent and iii) the respondent agreed to be interviewed [b4_hhocc==TRUE and c4_consent==TRUE], or
Household interview refused: i) the household comprised at least one resident at the time of the survey, ii) the surveyor talked to a potential respondent and iii) the respondent refused to be interviewed [b4_hhocc==TRUE and c4_consent==FALSE].
Of the 80,970 households with at least one resident, 3,098 refused to be interviewed and were considered eligible for imputation (see the processing of listed clusters section).
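The household scenarios can be expressed as a simple classification on `b4_hhocc` and `c4_consent`. The following sketch uses hypothetical households; the `status` and `impute` columns and the "other" label are illustrative names that do not appear in the microcensus data.

```r
# Hypothetical households; column names follow the attribute table above.
households <- data.frame(
  b1_hhid    = c("h1", "h2", "h3", "h4"),
  b4_hhocc   = c(TRUE, TRUE, FALSE, TRUE),
  c4_consent = c(TRUE, FALSE, NA, NA)
)

occupied <- households$b4_hhocc %in% TRUE

# "completed" and "refused" mirror the two scenarios; anything else
# (e.g. missing consent) is given a placeholder "other" label.
households$status <- ifelse(
  occupied & households$c4_consent %in% TRUE, "completed",
  ifelse(occupied & households$c4_consent %in% FALSE, "refused", "other")
)

# Only refused interviews are flagged as eligible for imputation.
households$impute <- households$status == "refused"
```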
We accessed the most recent version of the microcensus cluster monitoring data, which was pre-checked, pre-processed and consolidated by the École de Santé Publique de Kinshasa and the Flowminder Foundation, and selected the following attributes.
Attribute | Description |
---|---|
ez_id | Unique identifier of the enumeration zone (EZ) |
cluster_accessible | Was the cluster accessible to the surveyor? |
cluster_reason | What is the reason for inaccessibility? |
cluster_security_issue | Was there any security issue reported? |
cluster_surveyed | Was the cluster surveyed? |
cluster_empty_people | Was the cluster empty (people)? |
cluster_partial_coverage | Was the cluster only partly surveyed? |
cluster_building_count | Count of listed buildings |
cluster_people_count | Count of listed persons |
cluster_comments | Comments |
The map below shows the location of the microcensus clusters and whether these clusters were successfully surveyed and considered for population modelling.
We linked the microcensus cluster monitoring data with the individual records pre-processed for listed persons and listed households to obtain summaries at the cluster level. After linking and aggregating the records at the cluster level, we imputed 13,259 residents in 3,098 eligible households based on the mean household size for each cluster.
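A cluster-level mean imputation of this kind can be sketched in base R as follows. The data and the column names `residents`, `refused` and `residents_imputed` are hypothetical. The sketch also assumes that the cluster mean is computed over interviewed households only and is not rounded, which the report does not specify.

```r
# Hypothetical household records; NA marks a refused household whose
# number of residents is unknown.
hh <- data.frame(
  ez_id     = c(1, 1, 1, 2, 2),
  residents = c(4, 6, NA, 3, NA),
  refused   = c(FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Mean household size of the interviewed households in each cluster.
cluster_mean <- tapply(hh$residents[!hh$refused], hh$ez_id[!hh$refused], mean)

# Refused households receive the mean household size of their cluster.
hh$residents_imputed <- ifelse(
  hh$refused,
  cluster_mean[as.character(hh$ez_id)],
  hh$residents
)

# Cluster-level totals including the imputed residents.
cluster_total <- tapply(hh$residents_imputed, hh$ez_id, sum)
```

Here the refused household in cluster 1 receives 5 residents (the mean of 4 and 6) and the one in cluster 2 receives 3, giving cluster totals of 15 and 6.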
The microcensus data processing described in this report enabled us to produce summaries of population counts and age and sex breakdowns at the cluster level within the seven provinces included in the GRID3 Mapping for Health Project. Of the 1,596 sampled clusters, 99 clusters were not surveyed, 3 clusters were only partially listed, 94 clusters were surveyed but empty of population and 3 clusters were dropped as outliers or anomalies. The remaining 1,397 clusters were considered for population modelling.
These data were produced by the WorldPop Research Group at the University of Southampton as part of the GRID3 Mapping for Health Project. This project was delivered under the leadership of the Ministry of Public Health, Hygiene and Prevention of the DRC and funded by Gavi, the Vaccine Alliance (RM 86720420A2). The project was led by the Flowminder Foundation and the Center for International Earth Science Information Network (CIESIN) at Columbia University, in collaboration with the WorldPop Research Group at the University of Southampton and national partners including, but not limited to, the École de Santé Publique de Kinshasa and both the Bureau Central du Recensement and the Institut National de la Statistique. This work was a continuation of the GRID3 (Geo-Referenced Infrastructure and Demographic Data for Development) programme funded by the Bill and Melinda Gates Foundation (BMGF) and the United Kingdom’s Foreign, Commonwealth & Development Office (INV 009579, formerly OPP 1182425). The study was approved by the Faculty Ethics Committee of the University of Southampton (ERGO II 62716).
G Boo, R Hosner, PZ Akilimali, E Darin, HR Chamberlain, WC Jochem, P Jones, R Shulungu Runika, HM Kazadi Mutombo, AN Lazar and AJ Tatem. 2021. Modelled gridded population estimates for the Haut-Katanga, Haut-Lomami, Ituri, Kasaï, Kasaï-Oriental, Lomami and Sud-Kivu provinces in the Democratic Republic of the Congo (2021), version 3.0. WorldPop, University of Southampton, Flowminder Foundation, École de Santé Publique de Kinshasa, Bureau Central du Recensement and Institut National de la Statistique. DOI: 10.5258/SOTON/WP00720
This report may be redistributed following the terms of a Creative Commons Attribution 4.0 International (CC BY 4.0) license.