Understanding Overpass, the API of OpenStreetMap

If you want to use spatial data from OpenStreetMap[1], one way to get it is via Overpass[2], a read-only API of the OpenStreetMap database. While the API is extremely flexible, its unique language, Overpass QL, is not very intuitive. This article takes a very simple query and dissects it into its smallest pieces. This will help you understand (and write) such queries.

An overpass at the junction of the M1 and M7 motorways in Hungary, 1968. FORTEPAN/UVATERV. Creative Commons CC-BY-SA-3.0

Author’s note: although this page has all the information of the original post on my personal website, I recommend visiting the site for the best reading experience.

Before we begin, I have to emphasize the importance of reading the official documentation: most of my knowledge about Overpass QL comes from the Language Reference and the Language Guide. However, these documents are quite dry and therefore not very easy to follow, which is why I am providing this step-by-step tutorial.

This article is not meant to be an all-encompassing guide to OpenStreetMap. I therefore strongly recommend checking out the OSM Wiki to learn what OpenStreetMap really is and, more specifically, the beginners’ guide to understanding OSM data, which covers the concepts we will use below.

Museums in Szeged

We will use Overpass Turbo, a friendly interface to the Overpass API, to run a simple query that finds museums in Szeged, a city in Hungary.

The Overpass Turbo interface showing a query to get museums in Szeged and its results. Map data © OpenStreetMap contributors.

We want to get the location of these museums and additional tags associated with them.[3] We will either get a point (one latitude/longitude pair) or the outline of the building.[4] We can get these by typing tourism=museum in Szeged in the query wizard window of Overpass Turbo and hitting the build and run query button. You can do this yourself by going to Overpass Turbo, opening the query wizard, and pasting the snippet there.

Overpass Turbo generates a query for us. After removing the comments that are added automatically (and that are not very useful unless you already know well how Overpass works), we get the following:

[out:json][timeout:25];
{{geocodeArea:Szeged}}->.searchArea;
(
  node["tourism"="museum"](area.searchArea);
  way["tourism"="museum"](area.searchArea);
  relation["tourism"="museum"](area.searchArea);
);
out body;
>;
out skel qt;

Line-by-line analysis

What is going on here? Let’s take this query apart! Note that just like in many programming languages, the semicolon ; at the end of the line denotes the end of a statement.

The [out:json][timeout:25]; statement

[out:json][timeout:25]; sets overall query parameters. In this case, we state that we want the output in JSON format and that the script should time out after waiting for 25 seconds.

The output format could be xml, json, csv, custom, or popup. XML and JSON are similar to each other: a non-tabular format that has all the information you requested. CSV directly generates tabular data to be used in spreadsheets or elsewhere; it also requires further parameters. The other two options are rarely used. Documentation for all these options can be found here.
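To make the settings line concrete, here is a minimal Python sketch that assembles it from its parts. The helper name settings_line is hypothetical, and the CSV variant assumes the [out:csv(...)] field-list syntax described in the Overpass QL documentation, where fields starting with :: are special fields and plain names are tag keys.

```python
def settings_line(out_format="json", timeout=25, csv_fields=None):
    """Build the first line of an Overpass QL query (hypothetical helper)."""
    if out_format == "csv":
        # CSV output needs an explicit list of columns.
        if not csv_fields:
            raise ValueError("csv output requires at least one field")
        out_format = "csv(" + ",".join(csv_fields) + ")"
    return f"[out:{out_format}][timeout:{timeout}];"


print(settings_line())
print(settings_line("csv", 60, ["::id", "::lat", "::lon", "name"]))
```

The first call reproduces the line used in this article; the second shows what the extra parameters for CSV might look like.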

As for the timeout: sometimes you might notice that your query times out before it finishes. In this case, you might want to increase the timeout setting. Be mindful, though, that the API servers provide a free service you share with others. Do not query too much data at the same time and do not let your queries run for too long: your query might get denied if you request too much or too often.
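If you call the API from a script instead of Overpass Turbo, the same timeout reasoning applies on the client side. The sketch below uses only the Python standard library; the endpoint URL is an assumption (one public instance, others exist), and build_request and run_query are hypothetical helper names. The query passed in must be plain Overpass QL, without Overpass Turbo shortcuts.

```python
import urllib.parse
import urllib.request

# Assumption: one public Overpass instance; substitute the one you use.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_request(query, url=OVERPASS_URL):
    """Wrap an Overpass QL query in a POST request (nothing is sent yet)."""
    payload = urllib.parse.urlencode({"data": query}).encode("utf-8")
    return urllib.request.Request(url, data=payload, method="POST")


def run_query(query, server_timeout=25):
    """Send the query, waiting slightly longer than the [timeout:...] value."""
    request = build_request(query)
    # urlopen raises urllib.error.HTTPError if the server rejects the request.
    with urllib.request.urlopen(request, timeout=server_timeout + 5) as response:
        return response.read().decode("utf-8")
```

Keeping the client-side timeout a few seconds above the server-side one is a design choice, not a requirement: it lets the server report its own timeout before the client gives up.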

The {{geocodeArea:Szeged}}->.searchArea; statement

{{geocodeArea:Szeged}}->.searchArea; assigns the area called Szeged to a variable called searchArea.

Aerial photo of Szeged, taken in 1989. The Votive Church is in the middle of the photo with the Tisza river behind it. The white building in the park on the left of the river is the Móra Ferenc Museum. FORTEPAN/Urbán Tamás. Creative Commons CC-BY-SA-3.0

{{geocodeArea:Szeged}} is actually not pure Overpass QL but an Overpass Turbo Shortcut. These shortcuts are provided by Overpass Turbo to make writing queries easier. In this case, the shortcut is used to find the area called Szeged, so that we do not have to manually find its ID to run the query.[5]

If you click Export then under Query select Convert to OverpassQL, you get the actual query with the shortcuts expanded into their actual values. In our case, we can see that the line gets replaced with area(3601025105)->.searchArea;, so we now know that the area ID of Szeged is 3601025105.[*]
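The expansion can be imitated in a few lines of Python. This sketch assumes the Overpass convention that an area derived from a relation has the ID 3600000000 plus the relation ID, which would make 1025105 the relation behind the area above; the function names are hypothetical, and the relation ID still has to be looked up separately.

```python
# Assumed convention: areas derived from relations get this offset.
RELATION_AREA_OFFSET = 3_600_000_000


def area_id_from_relation(relation_id):
    """Return the Overpass area ID for an OSM relation ID."""
    return RELATION_AREA_OFFSET + relation_id


def expand_geocode_area(query, name, relation_id):
    """Replace the Overpass Turbo shortcut with a plain area(...) statement."""
    shortcut = "{{geocodeArea:" + name + "}}"
    return query.replace(shortcut, f"area({area_id_from_relation(relation_id)})")


line = "{{geocodeArea:Szeged}}->.searchArea;"
print(expand_geocode_area(line, "Szeged", 1025105))
# area(3601025105)->.searchArea;
```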

( and );

( and ); indicate that we would like to get a union of the searches that we see on the next three lines.

node[...](...);, way[...](...);, and relation[...](...);

node["tourism"="museum"](area.searchArea); can be split up in the following parts:

  • node indicates that we want to search for nodes (points)
  • ["tourism"="museum"] is a filter for the value of the tourism key of the node: it should exist and its value should be museum (see relevant Wiki page).
  • (area.searchArea) means that we want to filter by area: the parentheses show that this is a filter, area shows that we filter by area and .searchArea is the variable we defined above. We could use an area ID here instead of a variable; since we know that the area ID of Szeged is 3601025105, a direct way to write this would be (area:3601025105). Notice that we now have a colon to indicate that it is an ID. (See the Wiki for more information on this).

way[...](...); and relation[...](...); are essentially the same but searching for ways and relations instead of nodes. Finally, as mentioned above, ( and ); takes the union of these three queries.
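Because the three statements differ only in the element type, they can be generated mechanically. The following Python sketch (with the hypothetical helpers tag_statement and union_of) rebuilds the union block from its parts: element type, tag filter, and area filter.

```python
def tag_statement(element_type, key, value, area_set="searchArea"):
    """One query statement: element type + tag filter + area filter."""
    return f'{element_type}["{key}"="{value}"](area.{area_set});'


def union_of(statements):
    """Wrap statements in ( ... ); so that their results are combined."""
    body = "\n".join("  " + statement for statement in statements)
    return "(\n" + body + "\n);"


museum_union = union_of(
    tag_statement(element_type, "tourism", "museum")
    for element_type in ("node", "way", "relation")
)
print(museum_union)
```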

Getting the data, part 1: out body;

out is a command to actually gather the data. We will tackle the body part when we discuss the out skel qt; line; for now, let's just accept it as a parameter of out (it tells the API what we want to get).

If we do not add out;, we will not get actual results. Try running the following code in Overpass Turbo by pasting it there and pressing the Run button:

[out:json][timeout:25];
{{geocodeArea:Szeged}}->.searchArea;
(
  node["tourism"="museum"](area.searchArea);
  way["tourism"="museum"](area.searchArea);
  relation["tourism"="museum"](area.searchArea);
);

Overpass Turbo will not display a map because it receives no geo data. It will display a response similar to the following one in the Data tab:

{
  "version": 0.6,
  "generator": "Overpass API 0.7.55.5 2ca3f387",
  "osm3s": {
    "timestamp_osm_base": "2019-03-16T15:03:02Z",
    "timestamp_areas_base": "2019-03-16T14:53:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [ ]
}

This happens because we did not specify the aspects of the data points we actually want to get.
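In a script, this situation is easy to detect: the response parses fine, but its elements list is empty. A minimal Python sketch, using a shortened copy of the response above (the copyright field is left out for brevity):

```python
import json

response_text = """
{
  "version": 0.6,
  "generator": "Overpass API 0.7.55.5 2ca3f387",
  "osm3s": {
    "timestamp_osm_base": "2019-03-16T15:03:02Z",
    "timestamp_areas_base": "2019-03-16T14:53:02Z"
  },
  "elements": [ ]
}
"""

response = json.loads(response_text)
elements = response["elements"]

if not elements:
    # Valid JSON, but nothing was printed by the query.
    print("No elements returned: is an out statement missing?")
```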

To make this a bit clearer, we have to understand that Overpass QL allows for some abstraction. When we ran the following part of the query:

(
  node["tourism"="museum"](area.searchArea);
  way["tourism"="museum"](area.searchArea);
  relation["tourism"="museum"](area.searchArea);
);
out body;

Overpass interpreted this as assigning the union to the default result set (which is named _), then asked the server to output that default set. If we were to write the query more explicitly, we could write:

(
  node["tourism"="museum"](area.searchArea);
  way["tourism"="museum"](area.searchArea);
  relation["tourism"="museum"](area.searchArea);
)->.my_museums;
.my_museums out body;

Just like when we defined the search area, the -> operator is used to assign a result set to a variable (whose name must start with a period). In programming terms, if we do not return this variable, it just gets destroyed after our query runs. out is used in this case to actually "collect" its results. It does not return a new set; it only outputs the contents of the one it is given.
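One way to picture named sets is as entries in a dictionary, with out copying the contents of one entry into the response. The toy model below is only an analogy written in Python, not how the Overpass server is implemented; the class and method names are made up for illustration.

```python
class ToyInterpreter:
    """Toy model of result sets: a dict of named lists plus a response."""

    def __init__(self):
        self.sets = {"_": []}   # "_" stands for the default set
        self.output = []        # what the client eventually receives

    def store(self, elements, into="_"):
        # Models "statement->.name;" (or the default set if no name is given).
        self.sets[into] = list(elements)

    def out(self, from_set="_"):
        # Models ".name out;": copies the set's contents into the response.
        self.output.extend(self.sets[from_set])


toy = ToyInterpreter()
toy.store(["museum A", "museum B"], into="my_museums")
print(toy.output)  # []  (nothing has been output yet)
toy.out("my_museums")
print(toy.output)  # ['museum A', 'museum B']
```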

Getting the data, part 2: the > command

The atomic data type of OpenStreetMap is a node. Ways (lines) consist of nodes and relations consist of relations, ways, and/or nodes.[6]

Downwards recursion, >, is the command to get the elements that build up the element in question. When we call > on a result set, it gets its members. As the relevant Wiki section puts it, it returns:

  • all nodes that are part of a way which appears in the input set; plus
  • all nodes and ways that are members of a relation which appears in the input set; plus
  • all nodes that are part of a way which appears in the result set

The last of these three means that we do not need to use > twice to get the nodes of the ways that are parts of a relation in our original data; they are gathered for us automatically. [**]
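The three rules can be mimicked on in-memory data. The Python sketch below is a toy version of > that works on dictionaries shaped like Overpass JSON elements (ways carry a nodes list, relation members carry type and ref); the IDs and the recurse_down name are made up for illustration.

```python
def recurse_down(input_set, database):
    """Toy version of '>': collect the members of ways and relations."""
    result = {}
    for element in input_set:
        if element["type"] == "way":
            # Rule 1: nodes of ways in the input set.
            for node_id in element["nodes"]:
                result[("node", node_id)] = database[("node", node_id)]
        elif element["type"] == "relation":
            # Rule 2: node and way members of relations in the input set.
            for member in element["members"]:
                if member["type"] in ("node", "way"):
                    key = (member["type"], member["ref"])
                    result[key] = database[key]
    # Rule 3: nodes of the ways that ended up in the result set.
    for element in list(result.values()):
        if element["type"] == "way":
            for node_id in element["nodes"]:
                result[("node", node_id)] = database[("node", node_id)]
    return result


database = {
    ("node", 1): {"type": "node", "id": 1},
    ("node", 2): {"type": "node", "id": 2},
    ("node", 3): {"type": "node", "id": 3},
    ("way", 10): {"type": "way", "id": 10, "nodes": [1, 2, 1]},
    ("relation", 100): {
        "type": "relation",
        "id": 100,
        "members": [
            {"type": "way", "ref": 10, "role": "outer"},
            {"type": "node", "ref": 3, "role": "label"},
        ],
    },
}

result = recurse_down([database[("relation", 100)]], database)
print(sorted(result))
# [('node', 1), ('node', 2), ('node', 3), ('way', 10)]
```

Nodes 1 and 2 appear only because of the third rule: they belong to way 10, which itself entered the result as a member of the relation.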

We need this recursion because if a museum we got in the result set is defined as a way or relation (because it is not just one node but the outline of a building), just having the details of the way (or relation) is not enough to actually display it. We also need the details of the nodes that are members of this way (or relation).

For example, one of the museums in Szeged is called Fekete Ház. Its data in OpenStreetMap is the following (with some node IDs omitted for brevity):

{
  "type": "way",
  "id": 92877718,
  "nodes": [
    1076957101,
    4794113061,
    ...,
    1076957101
  ],
  "tags": {
    "building": "public",
    "name": "Fekete Ház",
    "tourism": "museum",
    "website": "http://moramuzeum.hu/latogatoi-informaciok/kiallitohelyeink/fekete-haz/"
  }
}

We know that we need nodes 1076957101, 4794113061, 1076957033 to display the outline of the building, but we do not get their details (their coordinates and other tags) from the server unless we specifically ask for them with the recursion statement.
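On the client side, drawing such a way means looking up each of its node IDs among the returned elements. The Python sketch below shows that lookup; the IDs and coordinates are made up for illustration and are not real OpenStreetMap data.

```python
def outline(way, elements):
    """Return the way's outline as (lat, lon) pairs, in node order."""
    coordinates = {
        element["id"]: (element["lat"], element["lon"])
        for element in elements
        if element["type"] == "node"
    }
    # A KeyError here means a node was not returned by the server.
    return [coordinates[node_id] for node_id in way["nodes"]]


# Hypothetical response: one closed way (first node == last node) and its nodes.
elements = [
    {"type": "way", "id": 1000, "nodes": [1, 2, 3, 1]},
    {"type": "node", "id": 1, "lat": 46.25, "lon": 20.14},
    {"type": "node", "id": 2, "lat": 46.25, "lon": 20.15},
    {"type": "node", "id": 3, "lat": 46.26, "lon": 20.15},
]

print(outline(elements[0], elements))
# [(46.25, 20.14), (46.25, 20.15), (46.26, 20.15), (46.25, 20.14)]
```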

Getting the data, part 3: out skel qt;

At this point in the query, we have already output the nodes, ways, and relations we searched for (with out body;), and after the recurse down (>) statement our result set holds some new nodes (and potentially ways). These new elements have not yet been returned, so we need another out statement.

The out statement can take a few parameters, as listed on the Wiki.

Let’s get qt right out of the way: it is a parameter to describe how the data should be sorted in the output. qt sorts the output by quadtile index, which is roughly geographical and faster than the default sorting by object ID.

Another parameter for out is the detail level of the data we return. The default level is body, which we used in the first out statement. It returns coordinates, key-value tags, and roles within relations – everything that you need to fully use the data. Another possible value for this parameter is skel, used in the second out statement of the query. Here we only return the information that is essential to display the object: IDs and coordinates (or member IDs, if the object is not a node but a way or relation).

For example, if we were to change our last line from out skel to out body, we would learn that one of the nodes that constitute the outline of one of the museum buildings is also a memorial plaque.

If we use body, we receive the following information about this node:

{
  "type": "node",
  "id": 4556840669,
  "lat": 46.2442204,
  "lon": 20.1416321,
  "tags": {
    "historic": "memorial",
    "material": "stone",
    "memorial": "plaque",
    "name": "Varga Mátyás"
  }
}

whereas with skel we really only get whatever is needed to display it (as a part of a line):

{
  "type": "node",
  "id": 4556840669,
  "lat": 46.2442204,
  "lon": 20.1416321
}
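The relationship between the two detail levels can be sketched in Python as a simple key filter. This only approximates skel for nodes and ways (relations are left out), and to_skel is a hypothetical helper, not part of any Overpass client.

```python
# Keys that survive in skel output for nodes and ways (relations not covered).
SKEL_KEYS = ("type", "id", "lat", "lon", "nodes")


def to_skel(element):
    """Drop everything that is not needed to draw the element."""
    return {key: value for key, value in element.items() if key in SKEL_KEYS}


body_node = {
    "type": "node",
    "id": 4556840669,
    "lat": 46.2442204,
    "lon": 20.1416321,
    "tags": {
        "historic": "memorial",
        "material": "stone",
        "memorial": "plaque",
        "name": "Varga Mátyás",
    },
}

print(to_skel(body_node))
# {'type': 'node', 'id': 4556840669, 'lat': 46.2442204, 'lon': 20.1416321}
```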

Part of our query has already been returned with out body before we called out skel on the second part (where we got additional data points with >). Therefore, although we only get basic data of the new elements, we do not lose the already retrieved tag information of the previously returned elements.

Getting centroids

It might be the case that you just want to display the locations of the museums on a map and therefore you don’t actually need the outlines of the buildings, just their overall location as a point.

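The documentation for out shows that you can request just the center of each object by using out center. Obviously, for nodes, this returns the node itself; for ways and relations, you get a single center point (strictly speaking the center of the object's bounding box, which we will loosely call the centroid here).[7]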

Therefore, we can change the end of our query by replacing

out body;
>;
out skel qt;

with

out body center; // 'body' can be omitted because it is the default value

and we get the centroids as expected. Notice that we only have one out statement here; this is because it immediately returns the centroid without the need to use > to get the constituent elements of ways and relations.
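When reading such a response in a script, nodes and other elements need slightly different handling: nodes carry lat and lon directly, while ways and relations carry them inside a center object. A minimal Python sketch, with made-up IDs and coordinates and a hypothetical location helper:

```python
def location(element):
    """Return one (lat, lon) pair for any element from an 'out center' response."""
    if element["type"] == "node":
        return element["lat"], element["lon"]
    # Ways and relations: the point is nested under "center".
    center = element["center"]
    return center["lat"], center["lon"]


# Hypothetical elements, shaped like an 'out center' JSON response.
elements = [
    {"type": "node", "id": 1, "lat": 46.25, "lon": 20.14},
    {"type": "way", "id": 1000, "center": {"lat": 46.26, "lon": 20.15},
     "nodes": [2, 3, 4, 2]},
]

print([location(element) for element in elements])
# [(46.25, 20.14), (46.26, 20.15)]
```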

A more easily readable version of the query

If I were to write the query by hand, I would probably put it in the following format:

[out:json][timeout:25];
{{geocodeArea:Szeged}}->.searchArea;
(
  node["tourism"="museum"](area.searchArea);
  way["tourism"="museum"](area.searchArea);
  relation["tourism"="museum"](area.searchArea);
)->.my_museums;
.my_museums out;
.my_museums > -> .full_set;
.full_set out skel qt;

Just adding a few variable names makes it much easier to read: it becomes much clearer what data we act on at each step.

Conclusion

This article has taken a simple Overpass QL query and explained it in depth. Of course, this has just scratched the surface of what the Overpass API is capable of — however, having attained some familiarity with the topic, the Language Reference and the Language Guide will be much easier to understand.

As a next step, you might want to check out the Overpass API by Example page for more complex queries.

If you have any questions or comments, please add them below.

[1] OpenStreetMap is more than just a free map: it is also a free database of global geographical data. If you are not familiar with it, check out its Wiki to learn more.
[2] Read more about the Overpass API and its various endpoints here.
[3] Tags are a way of storing information about nodes, ways, or relations in key/value pairs. For example, a way might have the tag “highway=primary” indicating that it is a primary road. Read more on the relevant Wiki page.
[4] This depends on how the volunteer editors added the museum to the database. Is something missing? Follow the Beginners’ Guide to learn how to edit the map.
[5] You can read more about Overpass Turbo shortcuts on the Wiki.
[6] Read the relevant Wiki page to learn more about elements and their types.
[7] Note that this centroid can be outside of the shape if the shape is concave.

[*] IDs like this or any of the specific results (node details, tags, dates, anything) can change as editors add data to the map. Therefore, these specifics might get outdated in this article at one point, but this should not hinder the understanding of its material.
[**] If we use the recurse down relations command, >>, it also recursively returns all relations that are members of a relation appearing in the input set. This is important if we have relations that are built up of relations in our initial result set. One could typically just use >> to ensure they get all the data.

Originally published at https://hann.io on January 17, 2020.

Transportation Economist by training, currently working as a Data Scientist at Booking.com. Map and geo data enthusiast. Avid hiker. Personal website: hann.io
