Avoid duplicate addresses in a database table

I am trying to avoid reinventing the wheel when it comes to storing each street address in a table only once. Uniqueness constraints alone will not work in some common situations, because the same address can be written in several ways:

    100 W 5th Ave
    100 West 5th Ave
    100 W 5th
    200 N 6th Ave Suite 405
    200 N 6th Ave #405

I could implement some business logic or a trigger that normalizes all fields before inserting, and then put a uniqueness constraint on several columns of the table, but it would be easy to miss cases with something that varies as much as street addresses do.
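As a rough illustration of that normalization step, here is a minimal Java sketch. The class name `AddressNormalizer` and its abbreviation table are hypothetical and deliberately incomplete; a real table would need far more entries.

```java
import java.util.Locale;
import java.util.Map;

class AddressNormalizer {
    // Hypothetical, deliberately incomplete abbreviation table (an assumption,
    // not an official list of street suffixes or directionals).
    private static final Map<String, String> ABBREVIATIONS = Map.of(
        "WEST", "W", "EAST", "E", "NORTH", "N", "SOUTH", "S",
        "AVENUE", "AVE", "STREET", "ST");

    // Uppercases the input, collapses whitespace, and replaces every token
    // that has an entry in the table with its abbreviation.
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder();
        for (String token : raw.trim().toUpperCase(Locale.ROOT).split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(ABBREVIATIONS.getOrDefault(token, token));
        }
        return out.toString();
    }
}
```

With this sketch, "100 W 5th Ave" and "100 West 5th Ave" both normalize to "100 W 5TH AVE", but "100 W 5th" becomes "100 W 5TH" and still slips past a uniqueness constraint, which is exactly the kind of missed case I am worried about.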

The ideal would be a universal identifier for each address, perhaps based on GPS coordinates. Before saving a new address, I would look up its GUID and check whether that GUID already exists in the address table.

An organization such as Mapquest, the Postal Service, FedEx, or the US government probably has such a system.

Has anyone found a good solution for this?

Here is my current address table (generated by JPA):

    CREATE TABLE address (
        id bigint NOT NULL,
        "number" character varying(255),
        dir character varying(255),
        street character varying(255),
        "type" character varying(255),
        trailingdir character varying(255),
        unit character varying(255),
        city character varying(255),
        state character varying(255),
        zip integer,
        zip4 integer,
        CONSTRAINT address_pkey PRIMARY KEY (id)
    )
+4
4 answers

I settled on the USC WebGIS service because of its nice web services interface and because it was easy to sign up for.

Geocodes are not suitable as a unique key for street addresses, though, for a number of reasons. For example, geocoding cannot distinguish between different units in a condominium or apartment building.
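To make the unit problem concrete, here is a small Java sketch of a coordinate-based key. `GeocodeKey` is a hypothetical helper and the five-decimal precision (roughly one metre of latitude) is an assumption, not something any geocoding service prescribes.

```java
import java.util.Locale;

class GeocodeKey {
    // Hypothetical key: latitude and longitude rounded to 5 decimal places
    // and joined with a comma. The unit is not part of the key, because a
    // geocoder that resolves only the building has no unit to contribute.
    static String of(double latitude, double longitude) {
        return String.format(Locale.ROOT, "%.5f,%.5f", latitude, longitude);
    }
}
```

If a geocoder returns the same point for "200 N 6th Ave Suite 405" and for any other suite in that building, both addresses get the same key, so the key alone cannot tell them apart.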

I decided to use the parsed address from the geocoding result and put a unique constraint on the street number, street name, unit, city, state and zip code. This is not ideal, but it works for what I am doing.
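The same check can be sketched in application code. The Java class below is a hypothetical in-memory stand-in for the database constraint, not the code I actually run; the names `AddressRegistry`, `key` and `add` are made up for illustration. It maps a missing unit to an empty string because, as far as I know, PostgreSQL by default treats NULL values as distinct in a unique constraint, so two rows with a NULL unit would not conflict.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class AddressRegistry {
    private final Set<String> seen = new HashSet<>();

    // Builds one comparison key from the parsed fields. A null field becomes
    // "" so that two addresses without a unit still compare equal.
    static String key(String number, String street, String unit,
                      String city, String state, String zip) {
        return String.join("|", clean(number), clean(street), clean(unit),
                           clean(city), clean(state), clean(zip));
    }

    private static String clean(String field) {
        return field == null ? "" : field.trim().toUpperCase(Locale.ROOT);
    }

    // Returns true if the key is new, false if an equal key was already stored.
    boolean add(String key) {
        return seen.add(key);
    }
}
```

Calling `add` twice with keys built from the same parsed fields returns true the first time and false the second, which is the point at which the application would reuse the existing row instead of inserting a new one.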

0

Look up the address on Google Maps and use the spelling it uses.

+4

You need something like regular expression support. You could write some kind of automaton function that parses the tokens and tries to match them, then expands or shortens them to abbreviations. As a quick and dirty fix, I would look at glob()-like functions, which support the * and ? wildcards on Unix.
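A minimal sketch of the regular expression idea, in Java, applied to the unit designator at the end of an address. The class `UnitNormalizer` and its keyword list (suite, ste, apt, unit) are assumptions chosen for illustration, and the pattern will misfire on unusual street names.

```java
import java.util.regex.Pattern;

class UnitNormalizer {
    // Matches a trailing unit written as "Suite 405", "Ste. 405", "Apt 405",
    // "Unit 405", "Suite #405" or "#405". The keyword list is hypothetical.
    private static final Pattern UNIT = Pattern.compile(
        "\\s+(?:(?:suite|ste|apt|unit)\\.?\\s+#?|#)\\s*(\\w+)\\s*$",
        Pattern.CASE_INSENSITIVE);

    // Rewrites a trailing unit designator to the short form "#<unit>".
    // Addresses without a recognizable unit are returned trimmed but unchanged.
    static String normalize(String address) {
        return UNIT.matcher(address).replaceFirst(" #$1").trim();
    }
}
```

With this, "200 N 6th Ave Suite 405" and "200 N 6th Ave #405" both come out as "200 N 6th Ave #405".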

0

I was not looking for address verification or normalization, although address verification is a good idea. What I need is a unique identifier for each street address so that I can avoid duplicate entries.

It seems that geocoding can provide a solution. When geocoding, the input can be the street address, and the output will have latitude and longitude coordinates with sufficient accuracy to resolve a particular building.

The ambiguity of street addresses is a bigger problem than I thought. This is from the Wikipedia page on geocoding:

"... in Boston, Massachusetts, there are several 100 Washington streets because several cities were annexed without changing street names."

The Wikipedia page on geocoding also lists resources (many of them free) for performing geocoding.

0
