How to get geo coordinates for POIs and show them on OpenStreetMap without any database backend, CGI, PHP or other external database service, just by using symlinks and the gatling HTTP daemon

by reinhard@finalmedia.de

2015/07/06

I found a hackish solution for this problem.

Why hackish?

Sure, I could also take the boring route: plain JavaScript, one JSON file per country, and POI data loaded via AJAX on demand. You would first specify the target country in the search form (or filter for it), fetch the matching JSON or GeoJSON file from the server via AJAX and parse it for the coordinates. The whole thing can be handled in JavaScript, et voilà, done. But... that is boring, and it's not what I want.

I want a solution additionally satisfying the following needs:

So here is a solution.

See a live demo in action: http://cdn.osterbruecken.de/ostermap (in German).

How is it done?

First I fetched a dump of the geonames database. For this test case I only needed data for Germany, so I fetched this file:

	http://download.geonames.org/export/dump/DE.zip

I extracted the file DE.txt from it, parsed the tab-separated file (TSV) with tr and cut (you can use awk instead if you like) and used grep to select all POIs marked with ";P", i.e. feature class P (populated places).

I reduced the character set and converted the names to lowercase, allowing only the following characters:

	a-zöäü ß .-

You can use the following pipeline to do that:

	# fields 2,5,6,7 of DE.txt: name, latitude, longitude, feature class (P = populated place)
	tr "\t" ";" < DE.txt | cut -d";" -f2,5,6,7 | grep ";P$" |\
	tr -d "," | cut -d";" -f1,2,3 | tr ";" "," |\
	tr -dc "0-9a-zA-ZöäüÖÄÜß\n ,.-" | tr "A-ZÖÄÜ" "a-zöäü" > cities.txt

This writes all matching lines to a new file called cities.txt, in the following format:

	city,lat,lon

UPDATE: The geonames.org data was not very satisfying. So I switched to the official OpenStreetMap database dumps from http://download.geofabrik.de, in this case germany-latest.osm.pbf (more than 2.4 GB, around 40 GB uncompressed), and extracted all cities and street names from it. Use osmconvert (built from osmconvert.c) to extract the data. Hint: build a 64-bit executable and use a machine with a lot of RAM. Processing the germany-latest.osm dataset needs about 14 GB of RAM and takes several hours to finish.

	./osmconvert germany-latest.osm.pbf --max-objects=900000000 --all-to-nodes \
	--csv="name @lat @lon" --csv-separator="," | grep -v -E "^," > cities.txt

Process your cities.txt and sort out all duplicate names (it's a quick hack; perhaps I will rename those in an improved version later). With -u and the sort key restricted to the first field, sort keeps only one line per name:

	sort -t, -k1,1 -u cities.txt > uniq_cities.txt

All cities are now stored in the file uniq_cities.txt, one per line with their coordinates, like this:

	zwötzen,50.84858,12.08635
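Before turning these lines into symlinks, it may be worth dropping malformed lines. If a name itself contains a comma, or a coordinate is missing, the line no longer has exactly the three fields name,lat,lon. The following is a minimal filter sketch. The function name valid_poi_lines is hypothetical, and it assumes plain decimal coordinates. You would call it like this: valid_poi_lines < uniq_cities.txt > clean_cities.txt

```bash
#!/bin/bash
# Hypothetical helper: pass through only lines of the form "name,lat,lon"
# where name contains no comma and lat/lon are plain decimal numbers.
# Everything else (extra commas, empty coordinates) is dropped.
valid_poi_lines() {
	grep -E '^[^,]+,-?[0-9]+(\.[0-9]+)?,-?[0-9]+(\.[0-9]+)?$'
}
```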

Then I wrote a small script that reads those lines and turns them into lots of broken symlinks, putting them into a folder called "search".

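	#!/bin/bash
	# turn every "name,lat,lon" line into a broken symlink search/<name> -> URL
	mkdir -p search
	while IFS= read -r line
	do
		# keep digits, dots, commas and minus signs, then take lat,lon as lat/lon
		coords="$(printf '%s\n' "$line" | tr -dc "0-9.,-" | cut -d"," -f2,3 | tr "," "/")"
		# name: allowed characters only, lowercase
		name="$(printf '%s\n' "$line" | cut -d"," -f1 | tr -dc "a-zA-ZöäüÖÄÜß. -" | tr "A-ZÖÄÜ" "a-zöäü")"
		# skip names that became empty or are "." / ".."
		case "$name" in ""|.|..) continue ;; esac
		ln -s "http://osm.org/#map=/$coords" "search/$name"
	done < uniq_cities.txt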

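The name of each broken symlink is the name of the city, and the symlink points to a URL like this

	http://osm.org/#map=/<lat>/<lon>

with the coordinates of the city filled in. The zoom level, which would normally go between "map=" and the first slash, is simply left empty here.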

Sure, this is just an example. You can use your own tile server and your own map, as I did in the "Ostermap" demo mentioned above.

In this way you'll get a lot of broken symlinks like these:

	...
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 ührde -> http://osm.org/#map=/51.70547/10.20814
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhrendorf -> http://osm.org/#map=/53.86275/9.41756
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhrsleben -> http://osm.org/#map=/52.20087/11.26443
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhry -> http://osm.org/#map=/52.29693/10.85758
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhsmannsdorf -> http://osm.org/#map=/51.33048/14.90316
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhyst -> http://osm.org/#map=/51.36469/14.506
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhyst am taucher -> http://osm.org/#map=/51.19249/14.21843
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uichteritz -> http://osm.org/#map=/51.20652/11.92215
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uiffingen -> http://osm.org/#map=/49.5024/9.59269
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uigenau -> http://osm.org/#map=/49.31204/11.01731
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uigendorf -> http://osm.org/#map=/48.18048/9.57969
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 uissigheim -> http://osm.org/#map=/49.67984/9.57134
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 ulbargen -> http://osm.org/#map=/53.37535/7.58291
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 ulbering -> http://osm.org/#map=/48.35362/13.01465
	lrwxrwxrwx 1 user group  23 Jun 23 23:00 ulberndorf -> http://osm.org/#map=/50.87472/13.67231
	...

Why these broken symlinks?

Because gatling can serve them directly. gatling is a tiny and really fast HTTP server by Felix von Leitner.

I was already using gatling to serve Leaflet and my own maps and tiles, which I made with glosm.

The project "Ostermap" is the demo linked at the top of this article. I only rendered the map for Saarland.

Gatling recognizes broken symlinks. If the link target contains "://" (as in "http://" or "https://"), gatling turns it into a valid HTTP redirect and sends your browser to that URL. This is a really nice feature. Thanks, fefe. This way, requesting the name of any city redirects you to Leaflet or OpenStreetMap at that city's coordinates.
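The decision described above can be sketched in a few lines of shell. This is not gatling's source code, only a simplified model of the behaviour for illustration. The function name resolve_poi is hypothetical, and the printed "302" is just a placeholder for "some redirect". The key point is that test -L and readlink work on the link itself, so a dangling link still yields its target string.

```bash
#!/bin/bash
# Hypothetical model of the lookup, NOT gatling's implementation:
# a (possibly dangling) symlink whose target contains "://" becomes a redirect,
# anything else is treated as "not found".
resolve_poi() {   # resolve_poi <folder> <name>
	local link="$1/$2" target
	# -L tests the link itself, so it is true even if the target does not exist
	if [ -L "$link" ]; then
		target="$(readlink "$link")"
		case "$target" in
			*://*) echo "302 $target"; return 0 ;;
		esac
	fi
	echo "404"
	return 1
}
```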

So I can start gatling locally and enter the following URL in a browser:

	http://127.0.0.1/search/ulbargen

to get the geo-location of the city ulbargen and directly show it on the map.

Furthermore, you can supply a minimal vanilla JavaScript snippet. It takes the value of an input box, converts the text to lowercase and navigates to the subfolder as described above (e.g. "search/ulbargen") by rewriting window.location. This tiny index.htm does the magic:

	<html>
	<head><meta charset="utf-8"></head>
	<body>
	<input title="please enter the name of the point of interest" id="name" value="ulbargen">
	<input type="button" value="suche" onclick="window.location.href = 'search/' + encodeURIComponent(document.getElementById('name').value.trim().toLowerCase())">
	</body>
	</html>

If an unknown POI name is entered, gatling simply responds with 404 File Not Found. You could additionally write an AJAX script that catches the 404 response and displays something like "sorry, POI not found. please retry". Or you can specify your own 404 error page.

OK, I get it. But why use broken symlinks instead of regular files fetched with JavaScript?

First: by storing the geo-information in a broken symlink, I get very compact storage for the coordinates. It is not subject to the minimum block size that the underlying filesystem allocates for every regular file.

If you store the coordinates in regular files instead, also named after the city, this is not very efficient. The whole "database" of this example would need over 240 MB in total, because every file occupies about 4 KB on disk. Even though it contains only a few bytes of coordinates, each regular file takes up a full 4096-byte block on your drive (see the Wikipedia article on block size if you want to know more). So lots of small regular files waste a lot of storage capacity.

If you don't believe it, create such files and compare their sizes with the following commands:

	echo hello > regular_file
	ls -slh1 regular_file
	stat regular_file
	du -hcs regular_file

	ln -s "hello again" symlink_file
	ls -slh1 symlink_file
	stat symlink_file
	du -hcs symlink_file
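The same comparison can be run at a larger scale in one go. Here is a small sketch that creates many tiny regular files and the same number of symlinks with identical content, then compares the space du reports. The count n=1000 and the sample URL are arbitrary values chosen for illustration. The exact numbers depend on your filesystem.

```bash
#!/bin/bash
# Sketch: allocated space of n small regular files vs. n symlinks.
# n and the payload URL are arbitrary example values.
n=1000
payload="http://osm.org/#map=/49.49361/7.26694"
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
mkdir "$work/files" "$work/links"
for i in $(seq 1 "$n"); do
	printf '%s\n' "$payload" > "$work/files/poi$i"   # one small regular file
	ln -s "$payload" "$work/links/poi$i"             # one broken symlink
done
# du -sk reports allocated space in KiB, not the apparent file size
files_kib="$(du -sk "$work/files" | cut -f1)"
links_kib="$(du -sk "$work/links" | cut -f1)"
echo "$n regular files: $files_kib KiB"
echo "$n symlinks:      $links_kib KiB"
```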

Sure, you could use a smaller block size, but only by reformatting the block device. On ext2/3/4 the block size is fixed when the filesystem is created (e.g. with mke2fs -b), and tune2fs cannot change it afterwards. Even then, the minimum block size of ext3 is 1024 bytes. This would be no out-of-the-box solution, and it could have disadvantages for other services on your system.


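With symlinks, the whole "database" is only about 1.8 MB in total, since each symlink just needs its inode (128 bytes in this case). On ext2/3/4, a symlink target shorter than 60 bytes is stored directly inside the inode (a so-called fast symlink). The URLs used here are well below that length, so no data block is allocated, and nothing gets "blown up" to the 4 KB block size of the underlying filesystem.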
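To check that your entries really stay below that inline limit, you can count the symlinks whose target is 60 bytes or longer. Here is a minimal sketch (count_slow_symlinks is a hypothetical name). The 60-byte threshold applies to ext2/3/4 only, and other filesystems have their own rules. For millions of entries, a find-based variant would be faster than a shell glob.

```bash
#!/bin/bash
# Sketch: count symlinks in a folder whose target is 60 bytes or longer,
# i.e. too long to be stored inline in an ext2/3/4 inode.
count_slow_symlinks() {   # count_slow_symlinks <folder>
	local link bytes slow=0
	for link in "$1"/*; do
		[ -L "$link" ] || continue
		# wc -c counts bytes, so an umlaut in the target counts as two
		bytes="$(readlink "$link" | tr -d '\n' | wc -c)"
		if (( bytes >= 60 )); then
			slow=$((slow + 1))
		fi
	done
	echo "$slow"
}
```

For example, count_slow_symlinks search should print 0 for links generated by the script above.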

You can also make a tarball of the folder to distribute the "database". The xz-compressed tarball is about 1.1 MB.
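Here is a small sketch of packing and unpacking that tarball. The function names pack_db and unpack_db are hypothetical. tar archives symlinks as symlinks and only follows them with -h, so the broken links survive the round trip unchanged. It assumes xz is installed for tar's -J option.

```bash
#!/bin/bash
# Sketch: pack the symlink folder into an xz tarball and unpack it elsewhere.
# tar stores the links themselves (no -h), so dangling links are preserved.
pack_db() {     # pack_db <folder> <archive.tar.xz>
	tar -cJf "$2" "$1"
}
unpack_db() {   # unpack_db <archive.tar.xz> <destination>
	mkdir -p "$2"
	tar -xJf "$1" -C "$2"
}
```

For example: pack_db search search.tar.xz on the server, unpack_db search.tar.xz /srv/www on the target machine.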

You can simply add a new POI like this:

	ln -s "http://osm.org/#map=/49.49361/7.26694" "osterbrücken"

and remove it by deleting the symlink:

	rm osterbrücken

You can also specify a zoom level for individual POIs if you want to. Just use:

	ln -s "http://osm.org/#map=14/49.49361/7.26694" "osterbrücken"
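Adding POIs by hand can be wrapped in a small helper that applies the same name normalisation as the generator script and takes an optional zoom level. This is a sketch: add_poi is a hypothetical name, not part of the original setup. It uses ln -sfn so that an existing entry is replaced instead of causing a "File exists" error.

```bash
#!/bin/bash
# Hypothetical helper: add or replace one POI symlink.
# Usage: add_poi <folder> <name> <lat> <lon> [zoom]
add_poi() {
	local dir="$1" lat="$3" lon="$4" zoom="${5:-}" name
	# same normalisation as the generator script: allowed chars, lowercase
	name="$(printf '%s' "$2" | tr -dc "a-zA-ZöäüÖÄÜß. -" | tr "A-ZÖÄÜ" "a-zöäü")"
	if [ -z "$name" ]; then
		echo "add_poi: name is empty after normalisation" >&2
		return 1
	fi
	mkdir -p "$dir"
	# -f replaces an existing link; an empty zoom gives the "#map=/lat/lon" form
	ln -sfn "http://osm.org/#map=$zoom/$lat/$lon" "$dir/$name"
}
```

For example, add_poi search Osterbrücken 49.49361 7.26694 14 creates search/osterbrücken with zoom level 14.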

Improvements

You could also distribute street names this way, for example by making each city a subfolder and putting its streets into it as symlinks. This would work with gatling's built-in directory indexing.
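Here is a sketch of that layout. It assumes a hypothetical input file with lines of the form city,street,lat,lon. None of the steps above produce such a file. It also assumes the names are already cleaned (lowercase, no "/" or ","). The links go into a separate folder, called "streets" here, so that a city folder does not clash with the city's own symlink in "search".

```bash
#!/bin/bash
# Sketch with a hypothetical input format: city,street,lat,lon
# Creates <out>/<city>/<street> -> URL; gatling's directory index then
# lists all streets of a city under /<out>/<city>/.
make_street_links() {   # make_street_links <input-file> <output-folder>
	local city street lat lon
	while IFS=, read -r city street lat lon; do
		[ -n "$city" ] && [ -n "$street" ] || continue
		mkdir -p "$2/$city"
		ln -sfn "http://osm.org/#map=/$lat/$lon" "$2/$city/$street"
	done < "$1"
}
```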

Since city names are not unique, I should also consider generating a folder for each duplicate name and putting one symlink per location into it.
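One way to do this is sketched below. The function name build_search_dir and the "lat,lon" naming of the inner links are illustrative choices, not the author's implementation. Unique names stay plain symlinks, and a name that occurs more than once becomes a folder with one symlink per location. The input is the name,lat,lon list before de-duplication by name, with exact duplicate lines removed first (e.g. sort -u cities.txt). It needs bash 4 or later for the associative array.

```bash
#!/bin/bash
# Sketch: plain symlink for unique names, folder of "lat,lon" links for
# names that occur more than once. Requires bash >= 4 (local -A).
build_search_dir() {   # build_search_dir <input-file> <output-folder>
	local name lat lon url
	local -A count=()
	# first pass: count how often each name occurs
	while IFS=, read -r name lat lon; do
		[ -n "$name" ] || continue
		count["$name"]=$(( ${count["$name"]:-0} + 1 ))
	done < "$1"
	mkdir -p "$2"
	# second pass: create the links
	while IFS=, read -r name lat lon; do
		[ -n "$name" ] || continue
		url="http://osm.org/#map=/$lat/$lon"
		if (( ${count["$name"]} > 1 )); then
			mkdir -p "$2/$name"
			ln -sfn "$url" "$2/$name/$lat,$lon"
		else
			ln -sfn "$url" "$2/$name"
		fi
	done < "$1"
}
```

With gatling's directory indexing, a request for /search/neustadt/ would then list every Neustadt with a link to its coordinates.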

Alternatives

Have a look at RFC 5870, "A Uniform Resource Identifier for Geographic Locations ('geo' URI)". It defines a URI scheme for geo-coordinates in WGS-84 (World Geodetic System 1984). But then you have to evaluate this URI in your client application. Also, gatling expects "://", and a geo URI is just "geo:74.4294,19.0245", so you won't get a successful redirect. You would have to change gatling's source code in http.c to parse this correctly.
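Evaluating such a URI on the client side mostly means mapping it to a map URL. Here is a minimal sketch of that conversion in shell (geo_to_osm is a hypothetical name). It assumes the default WGS-84 reference system and simply ignores an optional altitude and any parameters such as ";u=" (uncertainty) or ";crs=".

```bash
#!/bin/bash
# Sketch: convert a geo URI (RFC 5870) into an osm.org map URL.
# Assumes WGS-84; altitude and ";..." parameters are discarded.
geo_to_osm() {   # geo_to_osm <geo-uri>
	local uri="$1" coords lat lon
	case "$uri" in geo:*) ;; *) return 1 ;; esac
	coords="${uri#geo:}"
	coords="${coords%%;*}"   # drop ;crs=..., ;u=... and other parameters
	lat="${coords%%,*}"
	lon="${coords#*,}"
	lon="${lon%%,*}"         # drop an optional altitude
	printf 'http://osm.org/#map=/%s/%s\n' "$lat" "$lon"
}
```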

Downloads

You can fetch my pois.de.txt.xz (51 MB) with 4,919,091 entries in the format "name,lat,lon". The dataset is based on an extract of the OpenStreetMap database dump (20150701), so it is licensed under the Open Data Commons Open Database License (ODbL), copyright © OpenStreetMap contributors.