When developing map applications in a .NET environment it is almost natural to
use MS SQL Server (Express) as the supporting database for the application.
There is however one major drawback to this database - it doesn't have any
built-in geospatial functionality. This is why I created the User Defined Types
SqlLatLng and SqlBounds to handle geospatial data directly in the database. I
could have gone with some of the specialized databases, but integrating
geospatial functionality into SQL Server allows you to have all the data stored
in one place (that works directly in your .NET environment).
The nice thing about the SQL Server 2005
Express edition is that it is free. This of course also means that you are
not going to get all that you are getting in the very expensive Full version.
The main limitations are:
- You are limited to working on only one
processor (dual core won't help you),
- The database can use a maximum of
1GB or RAM,
- The database size is limited to 4GB.
But what
you are getting is still quite a lot. I tried putting it to the test and loaded
the complete NGA country file as
well as the BGN
list of U.S. locations. Those two represent over 6 million locations in the
world. Every significant column had its own index for faster search. Afterwards
I loaded a shapefile countries of the world (I made a KML version available here)
in order to create faster reverse geocoding capability. On top of that went the
CIA World Factbook
to have some description and information about locations.
Even with all
this data I had barely passed 1.5GB in database size, so I continued to load
shapefiles. I loaded the complete set of U.S. Zip Code and Country shapes
available from the U.S. Census Bureau.
On top of that I added a complete list of French
departments, Canadian
provinces. I'm still loading data into the database as all the above (fully
indexed) doesn't even bring me to 2GB.
Feeding data into a database is not
enough, one needs some functions and formulas to handle the data. The SQL
Server Integration Assembly that I created has some generic functions for
calculating whether a SqlLatLng value is within a SqlBounds value. The
LineEncloses function calculates whether a SqlLatLng value is inside a complex
shape. Finally there are distance calculations and more.
I used these
functions combined with some T-SQL programming for searching in order to create
some database specific Stored Procedures in order to geocode and reverse geocode
locations.
So what are the results? Well a simple geocode lookup where you
search for the coordinates of a named location takes < 1 second. This is
mainly due to the indexing capabilities of the database.
More interesting
is the reverse geocoding result. Reverse geocoding of random locations around
the world took 2-5 seconds to complete if you want to include every little town
in the world. For zip code determination of a location inside the U.S. it
took 2-3 seconds.
For reverse geocoding it is not possible to rely on
indices as the distance to each relevant location has to be calculated and
sorted so the result times will be higher than indexed values. Furthermore the
result times will be proportional to the data in the database. I decided to feed
a lot of data into the database to test it, but for most applications a single
(or a couple) of country data sets will suffice. This should mean much faster
result times.
So even though you have decided to work in a M$ environment
it is still possible to create a powerful geospatially enabled database at a
very low cost. All of the above is free (except for the SQL
Server Integration Assembly which requires a license), meaning that the only
cost you will have is the time to set up the database and feed the data.