Introduction
Geocoding is the process of translating a text input like “Ungewitterweg, Berlin” into a location given by latitude and longitude, such as 52.544022/13.147589. Whenever you search for a location in OpenStreetMap or Google Maps, this is exactly what happens (and sometimes more, but we won’t focus on that now).
For a pet project of mine (notfellchen.org) I wanted to do exactly that. When an animal is added for adoption, the user must enter a location, which is geocoded and saved with its coordinates. When another user who wants to adopt a pet in their area visits the site, they enter their location and the site searches for all animals within a specific radius.
How is that done? I’ll show you!
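Once every animal has coordinates, the radius search boils down to comparing distances. Here is a minimal, framework-free sketch, assuming coordinates are already geocoded. The `Animal` class and `animals_within` helper are hypothetical names, and the filter uses the great-circle (haversine) distance. A real Django app would more likely push this filtering into the database.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Animal:
    """Hypothetical model: an adoptable animal with geocoded coordinates."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def animals_within(animals: list, lat: float, lon: float, radius_km: float) -> list:
    """Return the animals whose location lies within radius_km of (lat, lon)."""
    return [a for a in animals if haversine_km(lat, lon, a.lat, a.lon) <= radius_km]


if __name__ == "__main__":
    animals = [
        Animal("Mia", 52.544022, 13.147589),  # Ungewitterweg, Berlin
        Animal("Rex", 48.52, 9.05),           # roughly Tübingen
    ]
    # Search 25 km around central Berlin (52.52, 13.405).
    print([a.name for a in animals_within(animals, 52.52, 13.405, radius_km=25)])  # prints ['Mia']
```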
Nominatim
Nominatim is software that uses OpenStreetMap data for geocoding. It can also do the reverse and find an address for any location on the planet. It powers the search on OpenStreetMap itself, so it’s quite production-ready. We could use the public API (while obeying its usage policy), but it’s nicer to run our own instance. That way we don’t strain the resources of a donation-funded organization, and we improve user privacy.
Nominatim works by importing geodata from a PBF file into a PostgreSQL database, which is later queried to provide location data. The process is described below.
DNS records
So let’s start by setting the DNS records so that the domain geocoding.example.org points to your server. Adjust as needed.
Name | Type | Target |
---|---|---|
geocoding.example.org | CNAME | server1.example.org |
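Once the record is in place, you can check it from any machine. This small sketch uses only Python’s standard library. It resolves both names and reports whether they share an address, which the CNAME should guarantee. The helper names are made up for this post.

```python
import socket


def resolved_addresses(hostname: str) -> set:
    """Return every IP address (IPv4 and IPv6) the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def points_to_same_server(name: str, target: str) -> bool:
    """True if the two names share at least one resolved address."""
    return bool(resolved_addresses(name) & resolved_addresses(target))
```

For example, points_to_same_server("geocoding.example.org", "server1.example.org") should return True once the record has propagated (with your real domain names substituted).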
Docker-compose Configuration
We will use Docker Compose to run the official Nominatim Docker image.
It bundles Nominatim together with its PostgreSQL database. I usually prefer a central database for multiple services (it makes backups easier, for example), but for Nominatim a separate database is good for two reasons:
- the import process (described later) will not slow down the database for other services
- it’s easier to nuke everything if things go wrong
The following environment variables will be used to configure the container:
PBF_URL
: The URL from which to download the PBF file containing the geodata we will import. PBF files can be obtained from Geofabrik. It is highly recommended to first download the file to a server of your own and point this URL there, so that Geofabrik’s resources are not affected if something goes wrong. Feel free to use the preset URL for Germany (while it still works) if you want to experiment.
REPLICATION_URL
: Where to get updates from. For example, Geofabrik’s updates for the Europe extract are available at https://download.geofabrik.de/europe-updates/. Other extracts on Geofabrik follow the pattern https://download.geofabrik.de/$CONTINENT/$COUNTRY-updates/.
POSTGRES_*
: PostgreSQL tuning parameters. The current settings allow imports on a resource-constrained system. See the PostgreSQL tuning docs for more info.
NOMINATIM_PASSWORD
: Database password.
IMPORT_STYLE
: See below.
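The URL pattern for REPLICATION_URL is easy to get slightly wrong. This tiny helper builds it from continent and country names. It assumes Geofabrik’s lowercase naming as described above (for example europe and germany). The pbf_url companion is a hypothetical addition for the matching download. Remember that PBF_URL should ideally point at your own mirror of that file instead.

```python
from typing import Optional

GEOFABRIK_BASE = "https://download.geofabrik.de"


def replication_url(continent: str, country: Optional[str] = None) -> str:
    """Geofabrik updates URL for a whole continent or for a country extract."""
    if country is None:
        return f"{GEOFABRIK_BASE}/{continent}-updates/"
    return f"{GEOFABRIK_BASE}/{continent}/{country}-updates/"


def pbf_url(continent: str, country: str) -> str:
    """Geofabrik download URL for the latest PBF extract of a country."""
    return f"{GEOFABRIK_BASE}/{continent}/{country}-latest.osm.pbf"
```

replication_url("europe", "germany") yields the same value as the REPLICATION_URL in the configuration further down.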
Import Styles
Import styles determine how much “resolution” the geocoding has. The following options are available:
admin
: Only import administrative boundaries and places.
street
: Like the admin style, but also adds streets.
address
: Import all data necessary to compute addresses down to house number level.
full
: Default style that also includes points of interest.
extratags
: Like the full style, but also adds most of the OSM tags into the extratags column.
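The styles build on each other, so choosing one boils down to picking the least detailed style that still covers what your application looks up. The sketch below encodes that rule. The feature names are made up for illustration. For town-level searches it returns admin, which is also the style used in the configuration later in this post.

```python
# Import styles from least to most detailed; each style imports everything
# the previous one does, plus more.
IMPORT_STYLES = ["admin", "street", "address", "full", "extratags"]

# Hypothetical feature names mapped to the first style that provides them,
# paraphrasing the list of styles above.
FIRST_STYLE_WITH = {
    "places": "admin",
    "streets": "street",
    "house_numbers": "address",
    "points_of_interest": "full",
    "extra_tags": "extratags",
}


def minimal_style(*features: str) -> str:
    """Return the smallest import style that covers all requested features."""
    if not features:
        return IMPORT_STYLES[0]
    # The most detailed style required by any feature wins.
    return max((FIRST_STYLE_WITH[f] for f in features), key=IMPORT_STYLES.index)
```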
The import style has a huge impact on how long the import takes and how much space it requires. Be aware that the import times below were measured on a machine with 32 GB RAM, 4 CPUs and SSDs. They are not fixed numbers and will vary with your hardware and the size of your extract. My import with the admin style took 12 hours.
Style | Import time | DB size | DB size after drop |
---|---|---|---|
admin | 4h | 215 GB | 20 GB |
street | 22h | 440 GB | 185 GB |
address | 36h | 545 GB | 260 GB |
Explaining “after drop” (from the docs):
About half of the data in Nominatim’s database is not really used for serving the API. It is only there to allow the data to be updated from the latest changes from OSM. For many uses these dynamic updates are not really required. If you don’t plan to apply updates, the dynamic part of the database can be safely dropped using the following command:
./utils/setup.php --drop
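I have not done this myself, so I don’t have any experience with it, but it’s probably a good idea if you don’t need up-to-date data. Note that setup.php belongs to older Nominatim releases. In Nominatim 4.x (which the Docker image below uses), the equivalent command is nominatim freeze.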
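A quick bit of arithmetic on the table shows how much the drop actually frees per style. This only reuses the numbers quoted above, and your own sizes will differ.

```python
def space_saved(db_size_gb: float, after_drop_gb: float) -> tuple:
    """Return (GB freed, fraction of the database freed) by the drop."""
    saved = db_size_gb - after_drop_gb
    return saved, saved / db_size_gb


# Sizes from the table above: (DB size, DB size after drop) in GB.
SIZES_GB = {"admin": (215, 20), "street": (440, 185), "address": (545, 260)}

if __name__ == "__main__":
    for style, (before, after) in SIZES_GB.items():
        saved, fraction = space_saved(before, after)
        print(f"{style}: {saved} GB freed ({fraction:.0%})")
```

For these three styles the drop frees about 91%, 58% and 52% of the database respectively. “About half” is therefore a fair summary for street and address, while an admin database shrinks far more.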
Reverse Proxy
As with most of my projects, Nominatim runs on a server where the mash-playbook has deployed Traefik as the application proxy. I’ll therefore use Traefik labels to configure the reverse proxy, but the same could be achieved with Caddy or Nginx.
Complete configuration
```yaml
services:
  nominatim:
    environment:
      - PBF_URL=https://cdn.hyteck.de/osm/germany-latest.osm.pbf
      - REPLICATION_URL=https://download.geofabrik.de/europe/germany-updates/
      - POSTGRES_SHARED_BUFFERS=1GB
      - POSTGRES_MAINTENANCE_WORK_MEM=1GB
      - POSTGRES_AUTOVACUUM_WORK_MEM=500MB
      - POSTGRES_EFFECTIVE_CACHE_SIZE=1GB
      - IMPORT_STYLE=admin
      - NOMINATIM_PASSWORD=VERYSECRET
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"
      - "traefik.http.routers.nominatim.rule=Host(`geocoding.example.org`)"
      - "traefik.http.routers.nominatim.service=nominatim-service"
      - "traefik.http.routers.nominatim.entrypoints=web-secure"
      - "traefik.http.routers.nominatim.tls=true"
      - "traefik.http.routers.nominatim.tls.certResolver=default"
      - "traefik.http.services.nominatim-service.loadbalancer.server.port=8080"
    container_name: nominatim
    image: mediagis/nominatim:4.4
    restart: always
    networks:
      - traefik
    volumes:
      - nominatim-data:/var/lib/postgresql/14/main
      - nominatim-flatnode:/nominatim/flatnode
    shm_size: 1gb
volumes:
  nominatim-flatnode:
  nominatim-data:
networks:
  traefik:
    name: "traefik"
    external: true
```
Importing
Now we are ready to go! Before you type docker-compose up -d, let me explain what it will do:
- Start the database
- Download the PBF file from the given URL
- Import the PBF file into the database. This is where you are most likely to run into errors caused by resource constraints.
- Start the Nominatim server
If you are ready, let’s go: docker-compose up -d. Monitor what Nominatim is doing with docker logs -f nominatim and make a cup of tea. This will take a while (probably several hours).
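Instead of watching the logs, you can also poll the server until it answers. This is a minimal sketch. It assumes Nominatim’s /status endpoint, which replies with the plain text OK once the database is usable. Before the import finishes, the web server may not even accept connections, so every network error is simply treated as “not ready yet”. fetch_status and wait_until_ready are hypothetical helper names.

```python
import time
import urllib.request
from typing import Callable, Optional


def fetch_status(base_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the body of <base_url>/status, or None if the request failed."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return None


def wait_until_ready(
    check: Callable[[], Optional[str]],
    attempts: int = 60,
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns "OK"; give up after `attempts` tries."""
    for attempt in range(attempts):
        if check() == "OK":
            return True
        if attempt < attempts - 1:  # no need to sleep after the last try
            sleep(delay_s)
    return False
```

A call such as wait_until_ready(lambda: fetch_status("https://geocoding.example.org")) then blocks for up to an hour with the default settings. Passing check and sleep in as functions keeps the polling loop testable without a running server.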
Testing
You can test your server by visiting the domain. Try /?q=CITYNAME to see an actual search result.
Example: https://geocoding.example.org/?q=tuebingen
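To use the instance from code rather than the browser, a small client is enough. This sketch uses the /search endpoint with format=jsonv2, which returns a JSON list of results whose lat and lon fields are strings. BASE_URL is the example domain from above, the User-Agent value is a made-up placeholder, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

BASE_URL = "https://geocoding.example.org"  # your own instance (example domain)


def search_url(query: str, base_url: str = BASE_URL, limit: int = 1) -> str:
    """Build a Nominatim /search URL that asks for JSON output."""
    params = {"q": query, "format": "jsonv2", "limit": limit}
    return f"{base_url}/search?{urllib.parse.urlencode(params)}"


def parse_first_hit(body: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the first result, or None if nothing matched."""
    results = json.loads(body)
    if not results:
        return None
    first = results[0]
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(first["lat"]), float(first["lon"])


def geocode(query: str, base_url: str = BASE_URL) -> Optional[Tuple[float, float]]:
    """Geocode a free-text query against the instance (performs a network call)."""
    request = urllib.request.Request(
        search_url(query, base_url),
        headers={"User-Agent": "geocoding-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_first_hit(response.read().decode("utf-8"))
```

If the place is part of your import, geocode("tuebingen") returns a (latitude, longitude) tuple. Otherwise it returns None.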
Result
You should now have a running Nominatim instance that you can use for geocoding 🎉. Initially I wanted to show in the same post how you’d use this server to power area search in Django, but that will be in part 2. Feel free to ping me with questions, preferably at @moanos@gay-pirate-assassins.de
Oh and one last thing:
Legal requirements
Data from OpenStreetMap is licensed under the Open Database License (ODbL). The ODbL allows you to use OSM data for any purpose you like, but attribution is required. When showing map data, you’d usually display a small badge in a corner of the map. Geocoding also needs attribution, as per this guideline.
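In practice this means passing the attribution along to wherever your users see geocoding results. This is a minimal sketch. It assumes the JSON responses of your instance carry Nominatim’s licence field: search returns a list of results, reverse returns a single object. The fallback text and the helper name are my own choices, not wording taken from the guideline.

```python
import json

# Fallback text (an assumption; check the attribution guideline for your use case).
DEFAULT_ATTRIBUTION = "© OpenStreetMap contributors"


def attribution_for(body: str) -> str:
    """Return the licence string from a Nominatim JSON response, or a fallback."""
    data = json.loads(body)
    results = data if isinstance(data, list) else [data]
    for result in results:
        if isinstance(result, dict) and "licence" in result:
            return result["licence"]
    return DEFAULT_ATTRIBUTION
```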