Java regex for stripping XML tags, but not tag content
I have the following Java code:

    str = str.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");

This turns a string like this:

    How now <fizz>brown</fizz> cow.

into:

    How now cow.

However, I want it to strip just the <fizz> and </fizz> tags, or standalone <fizz/> tags, and leave the element's content alone. So, a regex that would turn the string above into:

    How now brown cow.

Or, using a more complex string, one that turns:

    How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.

into:

    How now brown cow.

I tried this:

    str = str.replaceAll("<.*?></.*?>|<.*?/>", "");

And it does not work at all. Any ideas? Thanks in advance!
You can also try:

    str = str.replaceAll("<.*?>", "");

Please see the example below for a better understanding:

    public class StringUtils {
        public static void main(String[] args) {
            System.out.println(StringUtils.replaceAll("How now <fizz>brown</fizz> cow."));
            System.out.println(StringUtils.replaceAll("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
        }

        // Removes every tag (opening, closing, or self-closing) and keeps the text between tags.
        public static String replaceAll(String strInput) {
            return strInput.replaceAll("<.*?>", "");
        }
    }

Output:

    How now brown cow.
    How now brown cow.

While there are other correct answers, none of them gives any explanation.
The reason your regular expression <.*?>.*?</.*?>|<.*?/> does not work is that it matches any tags as well as everything inside them. You can see that in action on debuggex.
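To see this without a visualizer, here is a small sketch that runs the first pattern on both sample strings from the question. The class and method names (FirstAttemptDemo, stripWithFirstAttempt) are made up for illustration. Note that the result keeps the space on each side of the removed span, so two spaces are left behind.

```java
public class FirstAttemptDemo {
    // First pattern from the question: opening tag, any text, closing tag,
    // or alternatively a self-closing tag.
    static final String FIRST_ATTEMPT = "<.*?>.*?</.*?>|<.*?/>";

    // Hypothetical helper, named for this example only.
    public static String stripWithFirstAttempt(String input) {
        return input.replaceAll(FIRST_ATTEMPT, "");
    }

    public static void main(String[] args) {
        // The whole element, content included, is matched and removed.
        System.out.println(stripWithFirstAttempt("How now <fizz>brown</fizz> cow."));
        // prints "How now  cow."

        // Two matches here: "<buzz>now <fizz>brown</fizz>" and "<yoda/></buzz>".
        System.out.println(stripWithFirstAttempt("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
        // prints "How  cow."
    }
}
```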
The reason your second attempt <.*?></.*?>|<.*?/> does not work is that it matches from the start of a tag up to the first closing tag that immediately follows another tag. That is a bit of a mouthful, but you can better understand what is going on in this example.
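The same kind of sketch for the second pattern, again with made-up names (SecondAttemptDemo, stripWithSecondAttempt). In the simple string no closing tag comes directly after another tag's ">", and there is no "/>" anywhere, so nothing matches and the string comes back unchanged. In the nested string, "</buzz>" directly follows "<yoda/>", so the match runs from "<buzz>" all the way to "</buzz>".

```java
public class SecondAttemptDemo {
    // Second pattern from the question: a tag immediately followed by a closing tag,
    // or alternatively a self-closing tag.
    static final String SECOND_ATTEMPT = "<.*?></.*?>|<.*?/>";

    // Hypothetical helper, named for this example only.
    public static String stripWithSecondAttempt(String input) {
        return input.replaceAll(SECOND_ATTEMPT, "");
    }

    public static void main(String[] args) {
        // No ">" is directly followed by "</", and there is no "/>": no match at all.
        System.out.println(stripWithSecondAttempt("How now <fizz>brown</fizz> cow."));
        // prints "How now <fizz>brown</fizz> cow."

        // The lazy "<.*?>" has to stretch to the ">" of "<yoda/>" before "</" can match.
        System.out.println(stripWithSecondAttempt("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
        // prints "How  cow."
    }
}
```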
The regular expression you need is much simpler: <.*?>. It simply matches every tag, regardless of whether it opens or closes an element. Visualization.
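If you call this a lot, one possible variation is to compile the pattern once and reuse it. This is only a sketch: TagStripper and strip are names invented here, and the Pattern.DOTALL flag is my own addition, not part of the answer above. By default "." in Java does not match line terminators, so without that flag a tag that is broken across two lines would be left in place.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Same pattern as above, compiled once. DOTALL is an assumption added for this
    // sketch so that "." also matches newlines inside a tag.
    private static final Pattern TAG = Pattern.compile("<.*?>", Pattern.DOTALL);

    // Hypothetical helper: removes every tag and keeps the text between tags.
    public static String strip(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("How now <fizz>brown</fizz> cow."));
        // prints "How now brown cow."
        System.out.println(strip("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
        // prints "How now brown cow."
    }
}
```

Like any regex approach, this treats the input as plain text rather than parsing it, so a ">" inside an attribute value would end the match early.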