Let's say I have the following code:
String description = "ββββββ« β¬ This description ββ β¬ β β is a mess. β« β¬ βββββ";
I want to remove the non-Latin characters: β , β¬ , β , β , β« , β¬ and β .
The result should be the following: This description is a mess.
I know there are probably more symbol characters like these (Wingdings-style glyphs), so instead of specifying what I want to delete, I think it's better to list what I want to keep: Basic Latin and Latin-1 Supplement characters.
I found that I can use the following code to remove everything except the Basic Latin characters:
String clean_description = description.replaceAll("[^\\x00-\\x7F]", "").trim();
But is there a way to also preserve the Latin-1 Supplement characters?
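One possible approach, sketched here rather than offered as a definitive answer: Basic Latin covers U+0000 to U+007F and Latin-1 Supplement covers U+0080 to U+00FF, so widening the upper bound of the character class from `\x7F` to `\xFF` should keep both blocks. The class name `Latin1Filter`, the helper `keepLatin1`, and the sample string are hypothetical and only there for illustration.

```java
public class Latin1Filter {

    // Hypothetical helper: deletes every character outside U+0000-U+00FF
    // (Basic Latin plus Latin-1 Supplement), then trims the ends.
    static String keepLatin1(String input) {
        return input.replaceAll("[^\\x00-\\xFF]", "").trim();
    }

    public static void main(String[] args) {
        // Sample input written with Unicode escapes so the source file
        // encoding does not matter; \u00E9 is a Latin-1 Supplement letter,
        // the other escapes are symbols outside the two Latin blocks.
        String description =
                "\u265E\u2708 This caf\u00E9 description \u2709 is a mess. \u2620";
        System.out.println(keepLatin1(description));
    }
}
```

Note that this only deletes characters; spaces that surrounded a removed symbol stay in place, so interior runs of spaces may remain. The range also keeps control characters (U+0000 to U+001F and U+007F to U+009F), which may or may not be what you want.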