Reimers.dk

.NET, AJAX and Google APIs brought together
Welcome to Reimers.dk Sign in | Join | Help
in Search

Jacob Reimers Weblog

How Much Geospatial Ability Can You Squeeze Out of SQL Server 2005 Express?

When developing map applications in a .NET environment it is almost natural to use MS SQL Server (Express) as the supporting database for the application. There is however one major drawback to this database - it doesn't have any built-in geospatial functionality. This is why I created the User Defined Types SqlLatLng and SqlBounds to handle geospatial data directly in the database. I could have gone with some of the specialized databases, but integrating geospatial functionality into SQL Server allows you to have all the data stored in one place (that works directly in your .NET environment).

The nice thing about the SQL Server 2005 Express edition is that it is free. This of course also means that you are not going to get all that you are getting in the very expensive Full version. The main limitations are:

  • You are limited to working on only one processor (dual core won't help you),
  • The database can use a maximum of 1GB or RAM,
  • The database size is limited to 4GB.

But what you are getting is still quite a lot. I tried putting it to the test and loaded the complete NGA country file as well as the BGN list of U.S. locations. Those two represent over 6 million locations in the world. Every significant column had its own index for faster search. Afterwards I loaded a shapefile countries of the world (I made a KML version available here) in order to create faster reverse geocoding capability. On top of that went the CIA World Factbook to have some description and information about locations.

Even with all this data I had barely passed 1.5GB in database size, so I continued to load shapefiles. I loaded the complete set of U.S. Zip Code and Country shapes available from the U.S. Census Bureau. On top of that I added a complete list of French departments, Canadian provinces. I'm still loading data into the database as all the above (fully indexed) doesn't even bring me to 2GB.

Feeding data into a database is not enough, one needs some functions and formulas to handle the data. The SQL Server Integration Assembly that I created has some generic functions for calculating whether a SqlLatLng value is within a SqlBounds value. The LineEncloses function calculates whether a SqlLatLng value is inside a complex shape. Finally there are distance calculations and more.

I used these functions combined with some T-SQL programming for searching in order to create some database specific Stored Procedures in order to geocode and reverse geocode locations.

So what are the results? Well a simple geocode lookup where you search for the coordinates of a named location takes < 1 second. This is mainly due to the indexing capabilities of the database.

More interesting is the reverse geocoding result. Reverse geocoding of random locations around the world took 2-5 seconds to complete if you want to include every little town in the world. For zip code determination of a location inside the U.S. it took 2-3 seconds.

For reverse geocoding it is not possible to rely on indices as the distance to each relevant location has to be calculated and sorted so the result times will be higher than indexed values. Furthermore the result times will be proportional to the data in the database. I decided to feed a lot of data into the database to test it, but for most applications a single (or a couple) of country data sets will suffice. This should mean much faster result times.

So even though you have decided to work in a M$ environment it is still possible to create a powerful geospatially enabled database at a very low cost. All of the above is free (except for the SQL Server Integration Assembly which requires a license), meaning that the only cost you will have is the time to set up the database and feed the data.

Published 18. februar 2007 03:32 by jjrdk
Filed under: , ,

Comments

 

lof said:

Very cool!!

Do you know if your functions/assemly would work in SQL 2000? My database is well over 4 GB, over half TB.

Thanks

Jakub

februar 19, 2007 20:51
 

jjrdk said:

SQL Server 2000 doesn't support CLR assemblies, so you cannot create user defined types in the same way. You will have to write the geographic calculations in T-SQL.

februar 20, 2007 00:15
 

jjrdk said:

Just to elaborate - you can also create a setup where you analyze the data on the webserver or have an SQL Server 2005 which queries the SQL Server 2000 from a function. Both setups have the overhead of transferring data before anaylzing = more data transferred than necessary.

februar 20, 2007 06:16
 

lof said:

Can the free version of SQL 2005 use linked tables? Then I could possibly link my large tables and use your CLR assembly.

I use linked tables to query Oracle databases from SQL 2k. It works reasonable well on tabels and views with milions of records.

februar 20, 2007 11:34
 

jjrdk said:

I haven't actually tried it. Synonyms seems to be the preferred, but there are several ways of calling external data using CLR and SQL Server 2005. Have a look here - http://www.google.com/search?q=sql+server+2005+linked+tables

februar 20, 2007 11:56
Anonymous comments are disabled