I grew up in (39.0483284, -95.67803955): Dataset of Geocoded US Cities

I’m working on a mapping project at the moment and have delved into the world of geopoints and GIS.  Joy oh joy.

From my day job I have seen that there is a wealth of free data available at data.gov, but it’s not always in a form that is immediately usable for an average hacking session.

Example one: http://www.data.gov/geodata/g602088/

The data presented here seems to be of a very high quality:  a list of every populated place in the US, the population (2000 census), and most importantly: a geocoded location.  This is a set of data that a data warehouse would be happy to sell you for a thousand or so dollars.  However, the data is offered for download as a shapefile.

Now don’t get me wrong: shapefiles are completely standard raw materials for GIS professionals, and the format is (semi) open.  But for a guy like me, who isn’t going to buy a license for ESRI GIS software or delve into PostGIS, a shapefile is still a step away from what I actually need.

Enter open source software.

After some Googling I came across a great little C tool called shp2txt: a small executable built on top of the open source Shapelib C library.  The library parses the shapefile and harvests the stored data out of the dBASE-formatted datastore (the .dbf file) that accompanies it.
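To give a feel for what "harvesting the dBASE datastore" involves, here is a minimal Python sketch that reads a .dbf attribute table using only the standard library. This is not the code from shp2txt or the C library; the function name read_dbf is my own, and the sketch assumes a plain dBASE III layout, treats every field as text, and ignores memo fields and character-set flags. It also doesn't touch the .shp geometry file at all.

```python
import struct


def read_dbf(data: bytes):
    """Parse the bytes of a dBASE III (.dbf) table into a list of dicts.

    Hypothetical helper for illustration: all values come back as
    stripped strings, and rows flagged as deleted are skipped.
    """
    # Bytes 4-11 of the header: record count (uint32), header size
    # (uint16) and record size (uint16), all little-endian.
    num_records, header_size, record_size = struct.unpack_from("<IHH", data, 4)

    # Field descriptors are 32 bytes each, start at byte 32, and are
    # followed by a single 0x0D terminator byte.
    fields = []
    offset = 32
    while offset < header_size - 1 and data[offset] != 0x0D:
        name = data[offset:offset + 11].split(b"\x00", 1)[0].decode("ascii")
        length = data[offset + 16]  # field width in bytes
        fields.append((name, length))
        offset += 32

    rows = []
    for i in range(num_records):
        start = header_size + i * record_size
        record = data[start:start + record_size]
        if record[:1] == b"*":  # first byte is the deletion flag
            continue
        pos = 1
        row = {}
        for name, length in fields:
            # Fields are fixed-width and space-padded.
            row[name] = record[pos:pos + length].decode("latin-1").strip()
            pos += length
        rows.append(row)
    return rows
```

Calling read_dbf(open("some_file.dbf", "rb").read()) would return one dict per record, keyed by the field names stored in the file (the filename here is just a placeholder).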

So for me (open data + open source software) = usable dataset.

One open turn deserves another, so here is the dataset that came out of the shapefile:

cities2.csv.tar.gz — gzipped CSV, 1.2MB
cities2.sql.tar.gz — gzipped SQL dump file, 1.2MB

The dataset contains the following fields:

mysql> describe cities2;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id         | int(11)     | YES  |     | NULL    |       |
| xcoord     | varchar(15) | YES  |     | NULL    |       |
| ycoord     | varchar(15) | YES  |     | NULL    |       |
| z          | varchar(10) | YES  |     | NULL    |       |
| m          | varchar(10) | YES  |     | NULL    |       |
| citiesx020 | varchar(10) | YES  |     | NULL    |       |
| feature    | varchar(25) | YES  |     | NULL    |       |
| name       | varchar(50) | YES  |     | NULL    |       |
| poprange   | varchar(25) | YES  |     | NULL    |       |
| pop2000    | varchar(15) | YES  |     | NULL    |       |
| fips55     | varchar(10) | YES  |     | NULL    |       |
| county     | varchar(40) | YES  |     | NULL    |       |
| fips       | varchar(40) | YES  |     | NULL    |       |
| state      | varchar(5)  | YES  |     | NULL    |       |
| statefips  | varchar(15) | YES  |     | NULL    |       |
| display    | varchar(10) | YES  |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
16 rows in set (0.00 sec)
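Note that everything except id is stored as text, so the coordinates need converting before you can do any math with them. Here is a rough Python sketch of loading the CSV and finding the city nearest to a point. It rests on a few assumptions: that the CSV has a header row using the column names above, that xcoord is longitude and ycoord is latitude (the usual GIS convention), and that both are in decimal degrees. The function names are my own.

```python
import csv
import io
import math


def load_cities(csv_file):
    """Yield (name, state, lat, lon) tuples from an open CSV file.

    Assumes a header row with the column names from the table above.
    Rows whose coordinates don't parse as numbers are skipped.
    """
    for row in csv.DictReader(csv_file):
        try:
            lon = float(row["xcoord"])  # assumption: x is longitude
            lat = float(row["ycoord"])  # assumption: y is latitude
        except (KeyError, TypeError, ValueError):
            continue
        yield row.get("name"), row.get("state"), lat, lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing a slightly above 1.
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def nearest_city(cities, lat, lon):
    """Return the (name, state, lat, lon) tuple closest to the given point."""
    return min(cities, key=lambda c: haversine_km(lat, lon, c[2], c[3]))


# Made-up sample rows, just to show the shape of the input.
sample = io.StringIO(
    "name,state,xcoord,ycoord\n"
    "Place A,KS,-95.67803955,39.0483284\n"
    "Place B,MO,-94.5786,39.0997\n"
)
cities = list(load_cities(sample))
closest = nearest_city(cities, 39.05, -95.68)
```

For the real file you would pass open("cities2.csv", newline="") instead of the in-memory sample, assuming that is what the archive unpacks to. A linear scan with min() is fine for a one-off lookup; for lots of lookups you'd want a spatial index instead.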

Data attributions:

The original data set was downloaded from data.gov and is not subject to any usage restrictions per the data.gov Data Policy.

You are free to use the derivative SQL and CSV files that I produced for any use.

Posted in dataset, Geekery, Hackery
