Gianluca Boo1, Roland Hosner2, Pierre Z Akilimali3, Edith Darin1, Heather R Chamberlain1, Warren C Jochem1, Patricia Jones1, Roger Shulungu Runika4, Henri Marie Kazadi Mutombo4,5, Attila N Lazar1 and Andrew J Tatem1
1WorldPop Research Group, University of Southampton, Southampton, United Kingdom
2Flowminder Foundation, Stockholm, Sweden
3École de Santé Publique de Kinshasa, Kinshasa, Democratic Republic of the Congo
4Institut National de la Statistique, Kinshasa, Democratic Republic of the Congo
5Bureau Central du Recensement, Kinshasa, Democratic Republic of the Congo
This report is a supplement to the modelled gridded population estimates for the Haut-Katanga, Haut-Lomami, Ituri, Kasaï, Kasaï-Oriental, Lomami and Sud-Kivu provinces in the Democratic Republic of the Congo (DRC) (2021). The report describes the sampling design developed to select microcensus clusters in these provinces. The enumeration of these clusters is used to derive input data for a Bayesian statistical model used to produce the gridded population estimates, following an approach described by Wardrop et al. (2018). The design consists of four main steps: 1) definition of the sampling frame and strata, 2) definition of the sample size by province, 3) definition of the sample size by province and settlement type and 4) selection of the microcensus clusters. These steps build on previous work (Boo et al. 2020) carried out in the western part of the DRC.
The sampling frame was defined using gridEZ, an algorithm that generates gridded enumeration zones from a user-defined target population and geographic size (Dooley 2019). We used gridded building counts as a proxy for the distribution of population and set 80 buildings as the target size and six hectares as the maximum areal extent (Dooley et al. 2020). We defined the strata as the seven provinces, further partitioned by settlement type. We derived the settlement types from GHS-SMOD data reclassified into urban (classes 23 to 30), periurban (classes 14 to 22), village (classes 11, 12 and 13) and hamlet (classes 9 and 10) areas (GHSL 2019). The sampling units produced using gridEZ can have irregular shapes as a consequence of the aggregation algorithm. After consultation with the local partners, we discarded the sampling units located in the territoires of Djugu (Ituri) and Shabunda (Sud-Kivu) because of logistical and security considerations. The table below shows the distribution of the sampling units within the strata defined by province and settlement type.
Province | Urban | Periurban | Village | Hamlet |
---|---|---|---|---|
Kasaï | 1570 | 2069 | 5047 | 2264 |
Kasaï-Oriental | 2541 | 988 | 1702 | 1012 |
Lomami | 1365 | 1704 | 3862 | 3842 |
Haut-Lomami | 1445 | 1830 | 3637 | 4079 |
Haut-Katanga | 8778 | 2348 | 3936 | 6244 |
Sud-Kivu | 3820 | 2099 | 4396 | 6598 |
Ituri | 2043 | 1530 | 5090 | 9933 |
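The reclassification of GHS-SMOD classes into the four settlement types can be sketched as a simple lookup. This is a minimal illustration in Python, not the code used to produce the strata. The function name `settlement_type` and the handling of values outside the listed ranges (returned as `None`) are assumptions made for the example.

```python
from typing import Optional

# Inclusive class ranges as listed in the text above.
SETTLEMENT_CLASSES = {
    "urban": range(23, 31),      # classes 23 to 30
    "periurban": range(14, 23),  # classes 14 to 22
    "village": range(11, 14),    # classes 11, 12 and 13
    "hamlet": range(9, 11),      # classes 9 and 10
}


def settlement_type(smod_class: int) -> Optional[str]:
    """Return the settlement type for a GHS-SMOD class value.

    Values outside the listed ranges return None (an assumption:
    the report does not say how such values were treated).
    """
    for name, classes in SETTLEMENT_CLASSES.items():
        if smod_class in classes:
            return name
    return None
```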
The sample size was dictated by logistical constraints, namely the time available (41 working days) and the number of surveyors available (210 in total, 30 in each province). We assumed that each team would comprise six surveyors and would be expected to survey one cluster per day. With five teams per province working for 41 days, this gave 205 clusters per province and an overall sample size of 1,435 microcensus clusters. This number was subsequently increased by approximately 10%, to 1,596 clusters, to account for potential additional accessibility constraints. The table below shows the sample size breakdown by province with the related logistical information.
Province | Surveyors | Teams | Days | Sample Size (original) | Sample Size (increased) |
---|---|---|---|---|---|
Kasaï | 30 | 5 | 41 | 205 | 228 |
Kasaï-Oriental | 30 | 5 | 41 | 205 | 228 |
Lomami | 30 | 5 | 41 | 205 | 228 |
Haut-Lomami | 30 | 5 | 41 | 205 | 228 |
Haut-Katanga | 30 | 5 | 41 | 205 | 228 |
Sud-Kivu | 30 | 5 | 41 | 205 | 228 |
Ituri | 30 | 5 | 41 | 205 | 228 |
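The arithmetic behind the table can be reproduced with a few lines of Python. The inputs come from the report. The rounding rule used to get from 205 to 228 is an assumption: the report only states an increase of approximately 10%, and rounding the increased figure up to a multiple of four (so that it splits evenly across four settlement types) is one rule that happens to reproduce 228.

```python
import math

SURVEYORS_PER_PROVINCE = 30
TEAM_SIZE = 6
WORKING_DAYS = 41
N_PROVINCES = 7
N_SETTLEMENT_TYPES = 4

# One cluster per team per day.
teams_per_province = SURVEYORS_PER_PROVINCE // TEAM_SIZE
original_per_province = teams_per_province * WORKING_DAYS
original_total = original_per_province * N_PROVINCES

# Assumption (not stated in the report): add 10%, then round up to
# the next multiple of the number of settlement types.
increased_per_province = (
    math.ceil(original_per_province * 1.10 / N_SETTLEMENT_TYPES)
    * N_SETTLEMENT_TYPES
)
increased_total = increased_per_province * N_PROVINCES

print(teams_per_province, original_per_province, original_total)
print(increased_per_province, increased_total)
```

Under this rule the effective increase is 228 / 205, or roughly 11%, which is consistent with the "approximately 10%" stated above.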
The sample size by province was further refined by settlement type to select microcensus clusters representing urban, periurban, village and hamlet areas. Within provinces, we evaluated different sample size scenarios by assessing the statistical distribution of the number of buildings across settlement types. As in previous work, we sought the optimal sample size for each stratum based on the lowest average Kolmogorov-Smirnov (K-S) statistic over 1,000 simulations, where a lower value indicates greater similarity between the sample and the entire population (Boo et al. 2020). The plot below shows the average K-S statistics for different sample sizes by province and settlement type.
The plot shows that the average K-S statistics follow similar patterns across the different strata, especially within the same province. This pattern suggested that the gridEZ algorithm produced sampling units with a relatively homogeneous number of buildings. For this reason, we defined the sample size for each settlement type by allocating an equal number of units (57), summing to the increased sample size (228) at the province level. The table below shows the sample size breakdown by province and settlement type.
Province | Sample Size (increased) | Urban | Periurban | Village | Hamlet |
---|---|---|---|---|---|
Kasaï | 228 | 57 | 57 | 57 | 57 |
Kasaï-Oriental | 228 | 57 | 57 | 57 | 57 |
Lomami | 228 | 57 | 57 | 57 | 57 |
Haut-Lomami | 228 | 57 | 57 | 57 | 57 |
Haut-Katanga | 228 | 57 | 57 | 57 | 57 |
Sud-Kivu | 228 | 57 | 57 | 57 | 57 |
Ituri | 228 | 57 | 57 | 57 | 57 |
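The K-S comparison used to evaluate the sample size scenarios can be sketched as follows. This is an illustrative Python version using only the standard library, not the authors' implementation. The function names, the fixed seed and the use of sampling without replacement are assumptions. The `population` argument stands for the building counts of all sampling units in one stratum.

```python
import random
from bisect import bisect_right


def ks_statistic(sample, population):
    """Two-sample K-S statistic: the largest absolute difference
    between the two empirical cumulative distribution functions."""
    a = sorted(sample)
    b = sorted(population)
    d = 0.0
    # The maximum difference is attained at an observed value.
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def mean_ks(population, sample_size, n_sim=1000, seed=0):
    """Average K-S statistic over n_sim random samples drawn
    without replacement from the stratum (assumed design)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        sample = rng.sample(population, sample_size)
        total += ks_statistic(sample, population)
    return total / n_sim
```

Calling `mean_ks` for a range of sample sizes within each stratum gives curves comparable in spirit to those described above: the average statistic falls as the sample grows, and strata with similar building-count distributions produce similar curves.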
We randomly selected 57 sampling units independently in each of the 28 strata (seven provinces partitioned into four settlement types). We subsequently linked the sampling units to administrative data to retrieve the attributes for different administrative levels. The map below shows the locations of the centroids of the selected microcensus clusters.
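The stratified selection can be sketched as a simple random sample drawn independently within each stratum. The Python below is a minimal illustration under stated assumptions: the frame layout (tuples of unit identifier, province and settlement type), the function name `select_clusters`, the seed and the unit identifiers are all hypothetical. The example frame reuses the Kasaï counts from the first table.

```python
import random
from collections import defaultdict


def select_clusters(frame, n_per_stratum=57, seed=2021):
    """Draw n_per_stratum units without replacement in each stratum.

    frame: iterable of (unit_id, province, settlement_type) tuples.
    Raises ValueError if a stratum holds fewer than n_per_stratum units.
    """
    strata = defaultdict(list)
    for unit_id, province, settlement in frame:
        strata[(province, settlement)].append(unit_id)

    rng = random.Random(seed)
    selected = {}
    # Sort the strata so that the draw is reproducible for a given seed.
    for stratum in sorted(strata):
        selected[stratum] = rng.sample(strata[stratum], n_per_stratum)
    return selected


# Example frame with hypothetical unit identifiers for one province.
counts = {
    ("Kasaï", "urban"): 1570,
    ("Kasaï", "periurban"): 2069,
    ("Kasaï", "village"): 5047,
    ("Kasaï", "hamlet"): 2264,
}
frame = [
    (f"{province}-{settlement}-{i}", province, settlement)
    for (province, settlement), n in counts.items()
    for i in range(n)
]
selected = select_clusters(frame)
```

Applied to all seven provinces, this yields 28 strata of 57 units each, or 1,596 clusters in total.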
The sampling design described in this report enabled us to produce 1) a sampling frame including all the accessible areas within the seven provinces included in the GRID3 Mapping for Health Project and 2) a number of sampling units representing the microcensus clusters. The population of these clusters is fully enumerated in a dedicated microcensus.
These data were produced by the WorldPop Research Group at the University of Southampton as part of the GRID3 Mapping for Health Project. This project was delivered under the leadership of the Ministry of Public Health, Hygiene and Prevention of the DRC and funded by Gavi, the Vaccine Alliance (RM 86720420A2). The project was led by the Flowminder Foundation and the Center for International Earth Science Information Network (CIESIN) at Columbia University, in collaboration with the WorldPop Research Group at the University of Southampton and national partners including, but not limited to, the École de Santé Publique de Kinshasa and both the Bureau Central du Recensement and the Institut National de la Statistique. This work was a continuation of the GRID3 (Geo-Referenced Infrastructure and Demographic Data for Development) programme funded by the Bill and Melinda Gates Foundation (BMGF) and the United Kingdom’s Foreign, Commonwealth & Development Office (INV 009579, formerly OPP 1182425). The study was approved by the Faculty Ethics Committee of the University of Southampton (ERGO II 62716).
G Boo, R Hosner, PZ Akilimali, E Darin, HR Chamberlain, WC Jochem, P Jones, R Shulungu Runika, HM Kazadi Mutombo, AN Lazar and AJ Tatem. 2021. Modelled gridded population estimates for the Haut-Katanga, Haut-Lomami, Ituri, Kasaï, Kasaï-Oriental, Lomami and Sud-Kivu provinces in the Democratic Republic of the Congo (2021), version 3.0. WorldPop, University of Southampton, Flowminder Foundation, École de Santé Publique de Kinshasa, Bureau Central du Recensement and Institut National de la Statistique. DOI: 10.5258/SOTON/WP00720
This report may be redistributed following the terms of a Creative Commons Attribution 4.0 International (CC BY 4.0) license.