
Java regex for stripping XML tags, but not tag content

I have the following Java code:

str = str.replaceAll("<.*?>.*?</.*?>|<.*?/>", ""); 

This turns the line like this:

 How now <fizz>brown</fizz> cow. 

into:

 How now cow. 

However, I want it to strip only the <fizz> and </fizz> tags, or standalone self-closing tags such as <fizz/>, and leave the element's content in place. So I need a regex that turns this into:

 How now brown cow. 

Or, with a more complex string, one that turns this:

 How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow. 

into:

 How now brown cow. 

I tried this:

 str = str.replaceAll("<.*?></.*?>|<.*?/>", ""); 

And it does not work at all. Any ideas? Thanks in advance!

6 answers
 "How now <fizz>brown</fizz> cow.".replaceAll("<[^>]+>", "") 
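A minimal, self-contained version of this one-liner, applied to both strings from the question, might look like the sketch below (the class name StripTags and the method stripTags are hypothetical). The pattern <[^>]+> matches a '<', then one or more characters that are not '>', then '>', so each match ends at the first closing bracket without needing a lazy quantifier.

```java
public class StripTags {
    // Hypothetical helper: removes every <...> tag and keeps the text between tags.
    static String stripTags(String input) {
        // [^>]+ cannot run past a '>', so each match covers exactly one tag
        return input.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("How now <fizz>brown</fizz> cow."));
        System.out.println(stripTags("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }
}
```

Both calls print "How now brown cow.". One small difference from <.*?> is that the + requires at least one character, so an empty "<>" is left alone.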

You were almost there;)

Try the following:

 str = str.replaceAll("<.*?>", "");

You can also try:

 str = str.replaceAll("<.*?>", ""); 

Please see the example below for a better understanding:

 public class StringUtils {
     public static void main(String[] args) {
         System.out.println(StringUtils.replaceAll("How now <fizz>brown</fizz> cow."));
         System.out.println(StringUtils.replaceAll("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
     }

     // Removes every tag (opening, closing or self-closing) and keeps the text.
     public static String replaceAll(String strInput) {
         return strInput.replaceAll("<.*?>", "");
     }
 }

Output:

 How now brown cow.
 How now brown cow.

While the other answers are correct, none of them explains why.

Your regular expression <.*?>.*?</.*?>|<.*?/> does not work because its first alternative matches a whole element: the opening tag, everything inside it, and the closing tag. You can see that in action on Debuggex.
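A small sketch that makes this concrete (the class name OriginalRegexDemo is hypothetical): the first alternative matches "<fizz>brown</fizz>" as a single unit, so "brown" is deleted together with the tags.

```java
public class OriginalRegexDemo {
    // First pattern from the question: a whole element (opening tag, content,
    // closing tag), or a self-closing tag.
    static String apply(String input) {
        return input.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");
    }

    public static void main(String[] args) {
        // The match is "<fizz>brown</fizz>", so "brown" disappears with the tags.
        // The brackets make the two remaining spaces visible.
        System.out.println("[" + apply("How now <fizz>brown</fizz> cow.") + "]");
    }
}
```

This prints "[How now  cow.]": note the double space, since the spaces on both sides of the removed element are kept.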

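Your second attempt, <.*?></.*?>|<.*?/>, does not work because its first alternative matches from the start of a tag through the first closing tag that directly follows a '>', swallowing everything in between, and if the string contains no such pair (and no "/>"), nothing matches at all. This is a bit of a mouthful, but stepping through it in a regex visualizer such as Debuggex makes it clearer.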
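Running the second attempt on both strings from the question shows both failure modes. This is a minimal sketch; the class name SecondAttemptDemo is hypothetical.

```java
public class SecondAttemptDemo {
    // Second pattern from the question.
    static String apply(String input) {
        return input.replaceAll("<.*?></.*?>|<.*?/>", "");
    }

    public static void main(String[] args) {
        // Contains neither "></" nor "/>", so nothing matches and the string is unchanged.
        System.out.println(apply("How now <fizz>brown</fizz> cow."));
        // The first "></" in this string sits between "<yoda/>" and "</buzz>",
        // so one match runs from "<buzz>" through "</buzz>" and removes all the content.
        System.out.println("[" + apply("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.") + "]");
    }
}
```

The first line prints the input unchanged; the second prints "[How  cow.]".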

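The regular expression you need is much simpler: <.*?>. It matches each tag on its own, whether it is an opening, closing or self-closing tag, so replacing the matches with an empty string removes the tags and keeps the text between them.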
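To see that <.*?> picks out each tag separately, a sketch like the following (class and method names are hypothetical) collects every match with java.util.regex.Matcher.find() before doing the replacement. Because .*? is lazy, each match stops at the nearest '>'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMatchesDemo {
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Hypothetical helper: returns every substring matched by <.*?>, in order.
    static List<String> tags(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.";
        // Each tag is a separate match: opening, closing and self-closing alike
        System.out.println(tags(s));
        // Replacing those matches with "" leaves only the text
        System.out.println(TAG.matcher(s).replaceAll(""));
    }
}
```

This prints the five tags [<buzz>, <fizz>, </fizz>, <yoda/>, </buzz>] followed by "How now brown cow.".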


It is not elegant, but it is easy to follow. The regular expression below removes plain opening and closing XML tags (a bare tag name with no attributes), such as those in:

<url>"www.xml.com"</url>, <body>"This is xml"</body>

Regular expression:

 to_replace='<\w*>|<\/\w*>',value="" 
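The to_replace/value syntax above is not Java; a hypothetical translation to String.replaceAll might look like the sketch below (the class name WordTagDemo is an assumption). Since \w* only matches word characters, this pattern leaves self-closing tags such as <yoda/> and tags with attributes in place.

```java
public class WordTagDemo {
    // Java string literal for the pattern <\w*>|<\/\w*>
    // (escaping '/' is optional in Java regex, so the backslash is dropped here).
    static String apply(String input) {
        return input.replaceAll("<\\w*>|</\\w*>", "");
    }

    public static void main(String[] args) {
        System.out.println(apply("<url>\"www.xml.com\"</url>, <body>\"This is xml\"</body>"));
        // <yoda/> survives because "/" is not a word character
        System.out.println(apply("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }
}
```

The first call prints "www.xml.com", "This is xml" (with the quotes); the second prints "How now brown<yoda/> cow.".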

If you want to parse an XML log file, you can use the regular expression <[^<]+< in Java. With <name>DEV</name>, this way the output is printed as name> DEV. You just have to experiment with the regex.

