This is a request for comments: The post describes a potential future project that might benefit multiple effective altruistic organizations, particularly in global health and development. I would love to hear your thoughts on all angles of this:
- Is this worth doing? How would you rate importance, tractability, neglectedness?
- All other thoughts are welcome too, even (especially?) if they are critical.
- It would be particularly great to hear from practitioners who work in developing countries.
The full text of the article follows:
For many regions of the world, there are no authoritative and openly accessible place names. Solving this problem would help all development and humanitarian work. And solving this problem might be possible with the right mix of people and skills. I'm considering to dedicate my work time to this. The present post explains the idea, with the aim of getting as much feedback as possible.
A fictional map, showing a sparsely populated region and a place with unknown name.
Problem: wait, no place names?
Approximately two billion people live in places that are not labeled on any map, like this village in the Democratic Republic of Congo. Google Maps shows "Walikale" as the address, but that's a territory of almost 25,000 square kilometers, or the name of a town about 100 km away. Other maps and gazetteers are no better: OpenStreetMap, Who's on First
The lack of place names has many negative effects, like slower development, difficult humanitarian work, or worse governance. Consider the example of bednet distributions against malaria. This is familiar to me, because I have worked at the Against Malaria Foundation for the last 2.5 years. Bednets are an essential tool to fight malaria, and so countries like the Democratic Republic of Congo distribute them to all households every three years. Without reliable lists of location names:
- Planning the distribution is hard: how to communicate where nets should go, and how many are needed in each place?
- People miss out on nets: entire villages might be left out if the distribution agents aren't aware of them.
- Monitoring is hard: Typically, AMF selects a subset of locations randomly and sends independent monitoring teams there to verify the distribution. This only works if there are place names.
- Inconsistencies make everything difficult: For example, we want independent monitoring teams, but we can't use their data if location names don't match those used by the distribution teams.
- New technologies cannot be used: Distributions would benefit from better digital data collection tools, dashboards, satellite-based population estimates, etc. The adoption of these tools is often difficult because they require place names and boundaries.
An important consequence is that distributions cost more. Funders regularly buy +5% or +10% extra nets to ensure that there are no stockouts despite low-quality planning data. In addition, there is an engineering cost of dealing with missing and inconsistent data. When I was at AMF, a significant part of my work time was spent on these problems.
Bednet distributions are an area that I know well, but the example generalizes. Most non-profits, NGOs, and governments working in least developed countries would benefit from better place names. Better data would improve routine healthcare, make it easier to organize fair elections, and much more. Yet, such is nobody's core responsibility, and thus nobody has solved the problem yet.
The goal
I want to build the world's best dataset of place names, boundaries, and metadata (such as population estimates), with a focus on least developed countries.
The current state of the art looks roughly like the graphic below, which comes from the excellent overview State of the Gazetteer in 2023 by Who's on First. Regions with orange dots have data on local administrative areas (e.g., village or town names/boundaries). They are missing in most of Africa and large parts of Asia. I want to make sure this map becomes fully covered.
Coverage of local administrative names in the WOF gazetteer (source)
The most useful data has places with on the order of 100 households. This is a typical rural village or a small urban neigborhood. At this level of detail, the data is fine-grained enough for detailled planning. And while practicioners can always use higher levels of the location hierarchy or merge places, the opposite is more difficult.
To make the data fully useful, we also want boundary polygons. This allows us to link places to other data sources, e.g., GPS coordinates or satellite imagery.
Finally, we want estimates of the coverage and quality of the data. We want to know what percentage of settlements and people are covered, and get a list of the unknown settlements. For metadata like population numbers, it would be great to have confidence intervals or some other measurement of accuracy. All data should also come with information on provenance and recency.
Candidate solutions
Solutions to this problem of missing place names involve identifying good data sources, cleaning and aggregating the data, publishing the result in all relevant places to ensure its use, and finally maintaining the data so it stays up-to-date.
A few candidate data sources:
- Governments, ministries of health.
- NGOs like the Against Malaria Foundation, but also local actors such as SANRU in DRC. These could provide not only place names, but also GPS coordinates of households or population estimates.
- The UN Office for Coordination of Humanitarian affairs (see for example their data sets at humdata.org).
- Academic initiatives such as the Columbia Population Research Center or GRID3
- Initiatives by for-profit organizations like Facebook.
- New satellite-based data sets like World Settlement Footprint and Sentinel-2 Land Use.
The people at the Who's on First gazetteer have recently completed gathering and importing place data for India. I highly recommend their write-up of the Karmashapes project. Their approach and techniques might translate to other parts of the world.
Potential challenges
Building the world's best place data set is not without challenges. Below is an initial bullet list. This is an area where feedback and ideas would be particularly welcome.
- There are many existing initiatives addressing parts of the problem. We need to better understand why they have not yet produced the data that we'd like to see.
- Authoritiative data might not exist. People have disputes around boundaries. Some places (e.g., South Sudan) might now know or want the concept of "village name".
- There is a risk of colonialistic thinking, where we (the rich first-worlders) believe to know what less developed countries need. Also, the data is potentially dual-use: it can serve dictators as well as humanitarian organizations.
- The data set needs to be maintained to stay relevant.
- The data set should be published under an open license to maximize usage.
- It's unclear how to make a living when working on this project.
Conclusion
With this early project write-up, I am primarily looking for feedback. Any thoughts are welcome: Is this a good idea? Why or why not? Is this an important problem to solve? How could it be done? Whom should I talk to? What is missing?
Don't hesitate to reach out to me, Jonas Wagner ltlygwayh@gmail.com.
OpenStreetMap's humanitarian team (https://www.hotosm.org/) works on similar topics, it seems like to me.
Thanks! This seems very relevant. I will try to contact the team.
Are you familiar with What Three Words?
Yes, I know about What Three Words. Thanks for the suggestion! It's a good opportunity to clarify the different aims of my project and W3W.
W3W is essentially the same as a GPS coordinate, except more memorable and easier to pronounce. A W3W place does not necessarily correspond to anything particular in the real world (like a settlement). Thus, W3W does not provide any added value for planning purposes.
There are some other downsides, such as W3W being proprietary and based on (IMO) bad design choices (e.g., hard to localize).
A better alternative to W3W is https://maps.google.com/pluscodes/. Plus Codes are indeed useful in places where some form of address is needed, and they are seeing some adoption in developing countries.
My goal is somewhat different: I would like to collect and publish the natural, given names of places, along with boundaries and metadata. The ideal unit here is the settlement, village, community, or neighborhood -- this is the level at which the data would most support humanitarian work, health services, elections, infrastructure development, etc.
For more on this, and why I think we shouldn't advocate for W3W, see: https://shkspr.mobi/blog/2019/03/why-bother-with-what-three-words/ for theoretical reasons and https://w3w.me.ss/ for some practical examples.
As you mention, https://plus.codes is indeed much better, although this is only tangentially related to your project
Can you expand on why the ideal unit is "the settlement, village, community, or neighborhood"?
Here are some reasons why I think that units of ~100 households are ideal. The post itself has more examples.
It's best for detailed planning. There is a type of humanitarian/development work that tries to reach every household in a region. Think vitamin A supplementation, vaccination programs, bednet distributions, cash transfers, ... For these, one typically needs logistics per settlement, such as a contact person/agent/community health worker, some means of transportation, a specific amount of bednets/simcards/..., etc.
Of course, the higher levels of the location hierarchy (health areas, counties, districts, ...) are also needed. But these are often not sufficient for planning. Also note that some programs use other units of planning altogether (e.g., schools or health centers), but the settlement is common.
It's great for monitoring. The interventions mentioned above typically want to reach 100% settlement coverage. It makes sense to monitor things at that level, i.e., ensure that each settlement is reached.
It's great for research. Many organizations use household sampling surveys. These are typically clustered, which means that researchers select a given number of "enumeration units", and then sample a fixed number of households in each unit. Ideally, these enumeration units have roughly even size, clear and well-understood boundaries, and known population counts. The type of locations that I'm aiming for would make good enumeration units.
This type of place name is used and known. For example, people in the region will know where "Kalamu" is. There will likely be a natural contact person, such as a village chief. There will be a road that leads there and a way to obtain transportation. One can ask questions like "is there cellphone coverage in Kalamu" and get a good answer. In the majority of cases, a place name is a well-understood, unambiguous and meaningful concept.
The final reason is about data availability: settlement names are usually the most detailed names available, and their names are reasonably stable and accepted. The data exists, we only need to collect and aggregate and publish it. In contrast, streets or buildings often don't have names, so we can't easily have more fine-grained data than place names. Plus, there are some solutions like Plus Codes for situations where address-like data are preferred.
I can also confirm that an early employee of W3W told me that supporting development work was one the main original aims of W3W.
After a quick read, this was my first thought too (ie that promoting & advocating for the use of "what three words" might be an easier solution)
As an aid worker, I think this is a very interesting idea. Many folks have already mentioned a few tools that can be useful, such as OpenStreetMap. I would also love to see one that could also assess the current needs of these communities.