The text-icu library contains many Unicode utilities. We will also need the text library to convert our String to Text. I installed them by adding the following two lines to build-depends in my cabal file:
```cabal
build-depends:
    -- other packages...
  , text-icu >= 0.7.0.1 && < 1
  , text
```
With the dependencies established, we can remove accents using the following process:
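- Convert the String input to Text.
- Normalize the input (see the documentation for why this is necessary). Decomposing to NFD turns each accented letter into a base letter followed by a separate combining accent.
- Filter out the accents.
- Convert the result back to a String.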
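To see why the normalization step matters, here is a small sketch of just that step. The helper name `decompose` is hypothetical; it uses the same `normalize NFD` call as the function in this answer (text-icu's `Data.Text.ICU.Normalize`, within the version range above).

```haskell
import qualified Data.Text as T
import Data.Text.ICU.Normalize

-- | Hypothetical helper: canonical decomposition (NFD) of a String.
-- A precomposed letter such as 'é' (U+00E9) becomes the base letter
-- 'e' followed by a separate combining accent (U+0301), which is what
-- makes the accent filterable as its own character.
decompose :: String -> String
decompose = T.unpack . normalize NFD . T.pack
```

For example, `decompose "\233"` yields the two-character string `"e\769"`.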
With this in mind, we offer the following function:
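```haskell
import qualified Data.Text as T
import Data.Text.ICU.Char
import Data.Text.ICU.Normalize

-- | Remove accents from a String: decompose it to NFD, then drop every
-- character that has the Unicode Diacritic property.
canonicalForm :: String -> String
canonicalForm s = T.unpack noAccents
  where
    -- Drop the accents, which NFD has split off as separate characters.
    noAccents = T.filter (not . property Diacritic) normalizedText
    -- NFD turns e.g. 'é' into 'e' followed by a combining acute accent.
    normalizedText = normalize NFD (T.pack s)
```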
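One thing worth knowing: the Unicode Diacritic property also covers some ordinary spacing characters, such as `^` and `` ` ``, so `canonicalForm` removes those too. If that matters, a possible alternative (swapped in here, not the approach above) keeps the NFD step but drops only combining marks of general category Mn, using `generalCategory` from base's `Data.Char`. The name `stripCombiningMarks` is hypothetical.

```haskell
import Data.Char (GeneralCategory (NonSpacingMark), generalCategory)
import qualified Data.Text as T
import Data.Text.ICU.Normalize

-- | Hypothetical variant: NFD-decompose, then remove only nonspacing
-- combining marks (general category Mn). Spacing characters such as
-- '^' and '`' are kept, unlike a filter on the Diacritic property.
stripCombiningMarks :: String -> String
stripCombiningMarks =
  T.unpack
    . T.filter ((/= NonSpacingMark) . generalCategory)
    . normalize NFD
    . T.pack
```

For example, `stripCombiningMarks "a^b\233"` keeps the caret and yields `"a^be"`.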
If your input is already Text, you can skip the calls to T.pack and T.unpack.
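A minimal sketch of such a Text-only version, applying the same NFD normalization and Diacritic filter directly to Text. The name `removeAccents` is hypothetical.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.ICU.Char
import Data.Text.ICU.Normalize

-- | Hypothetical Text -> Text variant of canonicalForm: decompose to NFD,
-- then drop characters with the Diacritic property. No String round trip.
removeAccents :: Text -> Text
removeAccents = T.filter (not . property Diacritic) . normalize NFD
```

For example, `removeAccents (T.pack "Crème brûlée")` evaluates to `"Creme brulee"`.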
Adam hammes