Information:
I have a program that generates XML sitemaps for Google Webmaster Tools (among other things).
GWT reports errors for some sitemaps because the URLs contain character sequences such as ã¾, ã <, ã €, etc.**
GWT says:
We require your Sitemap to be encoded in UTF-8 format (you can usually do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters &, ', ", <, and >.
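As a minimal sketch of the two requirements in that message (UTF-8 output, and entity escapes for `&`, `'`, `"`, `<`, `>`), the class below writes a sitemap through a writer with an explicit UTF-8 charset. `SitemapWriter` and its hand-written `escapeXml` helper are hypothetical stand-ins, not the program's actual code and not Commons Lang's implementation; depending on the Commons Lang version, `StringEscapeUtils.escapeXml` may also turn non-ASCII characters into numeric character references, which this helper deliberately does not do.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SitemapWriter {

    // Replaces only the five XML special characters with predefined entities.
    // Non-ASCII characters are left as-is; the UTF-8 writer encodes them.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes with an explicit UTF-8 charset instead of relying on the
    // platform default (which FileWriter used before Java 18).
    static void write(Path file, List<String> urls) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
            for (String url : urls) {
                w.write("  <url><loc>" + escapeXml(url) + "</loc></url>\n");
            }
            w.write("</urlset>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("sitemap", ".xml");
        write(out, List.of("http://domain/folder/a.shtml?x=1&y=2"));
        System.out.println(Files.readString(out, StandardCharsets.UTF_8));
    }
}
```

The XML declaration says `encoding="UTF-8"`, so the bytes on disk must actually be UTF-8; declaring one encoding and writing another is a common source of this kind of error.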
In my XML files, special characters are written out as (HTML) entities.
XML file fragment:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://domain/folder/listing-ã.shtml</loc>
        ...
Are my URLs correctly UTF-8 encoded?
If not, how do I do that in Java?
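One common approach, sketched here under the assumption that the URL path contains raw (not yet percent-encoded) non-ASCII characters, is to percent-encode them as UTF-8 with `java.net.URI`: the multi-argument constructor quotes characters that are illegal in a path, and `toASCIIString()` encodes the remaining non-ASCII characters as UTF-8 octets. `SitemapUrls`, `toAsciiUrl`, and the example path are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SitemapUrls {

    // Builds an ASCII-only URL from its parts. Illegal path characters
    // (e.g. spaces) are quoted by the constructor; non-ASCII characters
    // are percent-encoded as UTF-8 bytes by toASCIIString().
    static String toAsciiUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // U+307E (HIRAGANA LETTER MA) stands in for any non-ASCII character.
        System.out.println(toAsciiUrl("http", "domain", "/folder/listing-\u307E.shtml"));
    }
}
```

The result is plain ASCII, so it survives regardless of how the file is later decoded; it should still be XML-escaped before going into `<loc>`, since a query string can contain `&`. The multi-argument `URI` constructors always quote `%`, so a path that is already percent-encoded would be encoded a second time.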
Below is the line in my program where I add the URL to the sitemap:
siteMap.addUrl(StringEscapeUtils.escapeXml(countryName+"/"+twoCharFile.getRelativeFileName().toLowerCase()));
** I am not sure which of these causes the error; perhaps the first two.
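One plausible reading of those sequences, offered as an assumption rather than a diagnosis, is that they are UTF-8 bytes that were decoded as ISO-8859-1 (or windows-1252) somewhere in the pipeline. The hypothetical `MojibakeDemo` below reproduces this for U+307E (HIRAGANA LETTER MA, UTF-8 bytes `E3 81 BE`): decoded as ISO-8859-1 it becomes `ã`, the invisible control character U+0081, and `¾`, which displays as `ã¾`.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates the mistake: text encoded as UTF-8, bytes decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake; ISO-8859-1 maps every byte to one char,
    // so the original bytes can be recovered as long as none were lost.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u307E";
        String garbled = misdecode(original);
        // Print the code points of the garbled string.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
        System.out.println(repair(garbled).equals(original));
    }
}
```

Running `main` prints `U+00E3 U+0081 U+00BE` and then `true`. If this is what is happening, the cleaner fix is to read the source names and write the sitemap with an explicit UTF-8 charset, rather than repairing strings afterwards.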