Removing garbage characters from string using regex

I want to remove all characters from a string other than a-z and A-Z. I wrote the following function for this, and it works fine.

    public String stripGarbage(String s) {
        String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
        String result = "";
        for (int i = 0; i < s.length(); i++) {
            if (good.indexOf(s.charAt(i)) >= 0) {
                result += s.charAt(i);
            }
        }
        return result;
    }

Can someone tell me a better way to achieve the same thing? Perhaps a regex would be the best option.

Regards,

Harry

+4
6 answers

Here you go:

    String result = s.replaceAll("[^a-zA-Z0-9]", "");

But if you understand your own code and it is readable, perhaps you already have the better solution:

Some people, faced with a problem, think: "I know, I will use regular expressions." Now they have two problems.
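If you do go with the regex, note that String.replaceAll compiles the pattern on every call, so for repeated use it may be worth compiling it once. A minimal sketch of that idea follows; the class name RegexStripper and the constant GARBAGE are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

final class RegexStripper {
    // Compiled once and reused; matches any run of characters
    // that are not ASCII letters or digits.
    private static final Pattern GARBAGE = Pattern.compile("[^a-zA-Z0-9]+");

    static String stripGarbage(String s) {
        // Replace every matched run with the empty string.
        return GARBAGE.matcher(s).replaceAll("");
    }
}
```

Calling RegexStripper.stripGarbage("Hello, World! 123") should give "HelloWorld123".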

+4

The following should be faster than anything using a regular expression, and faster than your initial attempt.

    public String stripGarbage(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if ((ch >= 'A' && ch <= 'Z') ||
                (ch >= 'a' && ch <= 'z') ||
                (ch >= '0' && ch <= '9')) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

Key points:

  • Using a StringBuilder is significantly faster than concatenating strings in a loop. (The latter generates N - 1 garbage strings and copies N * (N + 1) / 2 characters to build a string containing N characters.)

  • If you have a good estimate of the length of the result String, it is a good idea to preallocate the StringBuilder with that capacity. (But if you do not have a good estimate, the cost of the internal reallocations, etc., amortizes to O(N), where N is the final string length, so this is not usually a major concern.)

  • Testing whether a character falls within (up to 3) character ranges will be significantly faster on average than searching for the character in a 62-character string.

  • A switch statement may be faster, especially if there are more character ranges. However, in this case, listing all the letters and digits as case labels would take many more lines of code.

  • If the non-garbage characters match existing predicates of the Character class (e.g. Character.isLetter(char), etc.), you can use those. This would be a good option if you want to match any letter or digit, and not just ASCII letters and digits.

  • Other alternatives to consider are a HashSet<Character> or a boolean[] indexed by character, pre-populated with the non-garbage characters. These approaches work well if the set of non-garbage characters is not known at compile time.
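The boolean[] alternative from the last point could be sketched roughly as follows. This is only an illustration under the assumption that the allowed characters arrive as a String at run time; the class name TableStripper and its members are hypothetical and do not come from the answer.

```java
final class TableStripper {
    // Lookup table indexed by char value; true means "keep this character".
    private final boolean[] keep = new boolean[Character.MAX_VALUE + 1];

    // 'allowed' holds the non-garbage characters, known only at run time.
    TableStripper(String allowed) {
        for (int i = 0; i < allowed.length(); i++) {
            keep[allowed.charAt(i)] = true;
        }
    }

    String strip(String s) {
        // Preallocate for the worst case, where every character is kept.
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (keep[ch]) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

For example, new TableStripper("abc123").strip("a-b_c 1!2?3 xyz") should return "abc123". The table costs 64K booleans per instance, which buys a constant-time check per character regardless of how many characters are allowed.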

+3

This regex works (JavaScript syntax):

    result = s.replace(/[^A-Z0-9a-z]/ig, '');

Here s is the string passed in, and result is the string containing only letters and digits.

+1

This works:

    public static String removeGarbage(String s) {
        String r = "";
        for (int i = 0; i < s.length(); i++) {
            // Use "[A-Za-z0-9]" if you also want to include digits
            if (s.substring(i, i + 1).matches("[A-Za-z]")) {
                r = r.concat(s.substring(i, i + 1));
            }
        }
        return r;
    }

(Edit: although this is not very efficient.)

0

I know this post is old, but in C# you can shorten Stephen C's answer a bit using the System.Char structure.

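    public String RemoveNonAlphaNumeric(String value) {
        // Start empty, with capacity for value.Length characters
        // (passing value itself would seed the builder with the whole input)
        StringBuilder sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++) {
            char ch = value[i];
            if (Char.IsLetterOrDigit(ch)) {
                sb.Append(ch);
            }
        }
        return sb.ToString();
    }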

It does the same thing, just more compactly.

Char has some really useful methods for inspecting text. Here are some for future reference:

    Char.GetNumericValue()
    Char.IsControl()
    Char.IsDigit()
    Char.IsLetter()
    Char.IsLower()
    Char.IsNumber()
    Char.IsPunctuation()
    Char.IsSeparator()
    Char.IsSymbol()
    Char.IsWhiteSpace()

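Since the question is about Java, here is a rough Java counterpart of the same idea, using java.lang.Character.isLetterOrDigit in place of the C# Char method. It is only a sketch; the class name UnicodeStripper is invented for illustration. Note that, unlike the ASCII range checks in the earlier answer, this predicate also accepts non-ASCII letters and digits.

```java
final class UnicodeStripper {
    static String removeNonAlphaNumeric(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            // Keeps any Unicode letter or digit, including accented
            // and non-Latin letters, not just A-Z, a-z and 0-9.
            if (Character.isLetterOrDigit(ch)) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

For example, UnicodeStripper.removeNonAlphaNumeric("a1-b2!") should return "a1b2".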
0

    /**
     * Replace every character outside the ASCII range 1..127 with a space.
     */
    private static StringBuffer goodBuffer = new StringBuffer();

    // Static initializer for ASCII
    static {
        for (int c = 1; c < 128; c++) {
            goodBuffer.append((char) c);
        }
    }

    public String stripGarbage(String s) {
        // String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
        String good = goodBuffer.toString();
        String result = "";
        for (int i = 0; i < s.length(); i++) {
            if (good.indexOf(s.charAt(i)) >= 0) {
                result += s.charAt(i);
            } else {
                result += " ";
            }
        }
        return result;
    }

0

Source: https://habr.com/ru/post/1311411/

