I think that there is a good solution: supervised learning/segmentation with direct user verification.

You have the user enter a free-form address, then translate it into a structured address and show that back to them. If they correct any fields, you look at the corrections, try to judge whether the final result is actually correct, and if it is, feed it back in as a new training example.
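To make that loop concrete, here is a rough Python sketch. Everything in it is made up for illustration: `naive_parse` is a placeholder comma-splitter standing in for a real learned segmenter, the four-field schema is an assumption, and the "is the final result correct" check is just a crude test that every confirmed value appears somewhere in what the user originally typed.

```python
from dataclasses import dataclass, field

# Hypothetical field schema, assumed for this sketch only.
FIELDS = ("street", "city", "state", "zip")


def naive_parse(free_form: str) -> dict:
    """Placeholder segmenter: split on commas and map positionally onto FIELDS.

    A real system would use a trained model here; this only produces
    a first guess for the user to verify.
    """
    parts = [p.strip() for p in free_form.split(",")]
    parts += [""] * (len(FIELDS) - len(parts))  # pad missing fields
    return dict(zip(FIELDS, parts))


@dataclass
class FeedbackStore:
    """Collects (raw text, verified structure) pairs for later retraining."""
    examples: list = field(default_factory=list)

    def record(self, free_form: str, predicted: dict, confirmed: dict):
        # Which fields did the user change?
        corrected = sorted(
            k for k in FIELDS if predicted.get(k, "") != confirmed.get(k, "")
        )
        # Crude sanity check (an assumption, not a real validator):
        # every confirmed value must occur in the original text.
        raw = free_form.lower()
        plausible = all(v.lower() in raw for v in confirmed.values() if v)
        if plausible:
            self.examples.append((free_form, confirmed))
        return corrected, plausible


raw = "1600 Pennsylvania Ave NW, Washington, DC 20500"
guess = naive_parse(raw)
# The user fixes the state/zip split that the placeholder parser got wrong.
fixed = dict(guess, state="DC", zip="20500")
store = FeedbackStore()
corrected, plausible = store.record(raw, guess, fixed)
```

The stored pairs are what you would periodically retrain the segmenter on. A submission that fails the sanity check is simply dropped here; a real service would probably want something smarter, such as comparing against known addresses.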

Maybe this could be done as a service with iframes (like reCAPTCHA). And since the information in a full address is basically entirely public knowledge (at least in the U.S.), you could keep all of it around in full detail.