Remove diacritics using Go
How can I remove all diacritics from a given UTF-8 encoded string using Go? For example, convert the string "žůžo" to "zuzo". Is there a standard way?
You can use the libraries described in Normalizing Text in Go.
Here is how to apply these libraries:
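```go
// Example derived from: http://blog.golang.org/normalization
package main

import (
	"fmt"
	"unicode"

	"golang.org/x/text/transform"
	"golang.org/x/text/unicode/norm"
)

// isMn reports whether r is a nonspacing mark, such as the combining
// caron or ring above that NFD splits off from an accented letter.
func isMn(r rune) bool {
	return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

func main() {
	// Decompose (NFD), drop the nonspacing marks, then recompose (NFC).
	t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
	result, _, err := transform.String(t, "žůžo")
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // zuzo
}
```

To expand the existing answer a bit: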
The Internet standard for comparing strings from different character sets is called "PRECIS" (Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols) and is documented in RFC 7564 (since obsoleted by RFC 8264). There is also a Go implementation at golang.org/x/text/secure/precis.
None of the standard profiles will do what you want, but it is fairly simple to define a new one. You would apply Unicode Normalization Form D (“D” for “Decomposition”, meaning that accents are split off into their own combining characters), remove any combining marks as part of the additional mapping rule, and then recompose the result with the normalization rule. Something like this:
```go
package main

import (
	"fmt"
	"unicode"

	"golang.org/x/text/secure/precis"
	"golang.org/x/text/transform"
	"golang.org/x/text/unicode/norm"
)

func main() {
	loosecompare := precis.NewIdentifier(
		// Additional mapping rule: decompose (NFD), then drop nonspacing marks.
		precis.AdditionalMapping(func() transform.Transformer {
			return transform.Chain(norm.NFD, transform.RemoveFunc(func(r rune) bool {
				return unicode.Is(unicode.Mn, r)
			}))
		}),
		precis.Norm(norm.NFC), // This is the default; be explicit though.
	)

	p, err := loosecompare.String("žůžo")
	if err != nil {
		panic(err)
	}
	fmt.Println(p, loosecompare.Compare("žůžo", "zuzo"))
	// Prints "zuzo true"
}
```

This lets you extend the comparison later with more options (e.g. width mapping, case mapping, etc.).
It is also worth noting that stripping accents is almost never what you actually want when comparing strings like this; however, without knowing your use case, I can't make that assertion about your project. To avoid a proliferation of PRECIS profiles, it is good practice to use one of the existing profiles where possible. Also note that no effort has been made to optimize the example profile.