Regex: match every character up to a specific character ONLY if that specific character exists

Basically, I'm reading data from the Wikipedia API, which returns JSON containing the page source in wiki markup. I used the JSON API to filter out the part of the page I want, and now I want to format the text so that all links, etc. are removed.

The markup writes links like this: [[wiki page|display text]]
But a link can also be written like this: [[wiki page]]

So what I'm trying to do is retrieve the display text if the pipe character exists, but if not, I just want the wiki page text.

This is my code for this right now, which should determine whether there is a pipe character and handle both forms correctly, but does not:

    private static String format(String s) {
        return s.replaceAll("\\[\\[.+?(\\]\\]|\\|)", "").replace("[[", "").replace("]]", "").trim();
    }

When I do this, it sometimes removes all of the text for a plain [[wiki page]] link, but it works if the pipe character is present. How do I get this to work correctly?

2 answers

You can use:

    private static String format(String s) {
        return s.replaceAll("\\[\\[(?:[^|\\]]*\\|)?(.+?)\\]\\]", "$1");
    }
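The pattern in the question matches from "[[" up to the first "]]" or "|" and replaces all of it with an empty string, so a plain [[wiki page]] link is deleted together with its text. The pattern above instead makes the "wiki page|" prefix optional and keeps whatever follows it. A minimal sketch of how this might be exercised is below; the class name WikiLinkFormatter, the constant WIKI_LINK, and the sample sentence are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class WikiLinkFormatter {
    // Same pattern as in the answer, compiled once so it can be reused.
    // (?:[^|\]]*\|)?  optionally consumes "wiki page|"
    // (.+?)           captures the text that should be kept
    private static final Pattern WIKI_LINK =
            Pattern.compile("\\[\\[(?:[^|\\]]*\\|)?(.+?)\\]\\]");

    // Replaces every [[...]] link with its display text
    // (or with the page name when there is no pipe).
    static String format(String s) {
        return WIKI_LINK.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Hypothetical input containing both link forms.
        String input = "See [[Java (programming language)|Java]] and [[Regex]].";
        System.out.println(format(input)); // prints: See Java and Regex.
    }
}
```

Compiling the pattern once is only a design choice for repeated calls; String.replaceAll with the same pattern string gives the same result.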


 ((?<=\\[\\[)[^|]*|(?<=\\|).*?)(?=\\]\\]) 

You can use this. Grab $1. See the demo:

https://regex101.com/r/rO0yD8/2
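Unlike a replaceAll approach, this pattern is meant for extracting the text rather than rewriting the string. A rough sketch of grabbing group 1 with Matcher.find() follows; the class name WikiLinkExtractor and the helper extract are hypothetical, and the inputs are assumed to contain a single link each.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinkExtractor {
    // Pattern from this answer: either the text right after "[[" (no pipe),
    // or the text right after "|", in both cases followed by "]]".
    private static final Pattern LINK_TEXT =
            Pattern.compile("((?<=\\[\\[)[^|]*|(?<=\\|).*?)(?=\\]\\])");

    // Returns group 1 of the first match, or null if nothing matches.
    static String extract(String link) {
        Matcher m = LINK_TEXT.matcher(link);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("[[wiki page|display text]]")); // prints: display text
        System.out.println(extract("[[wiki page]]"));              // prints: wiki page
    }
}
```

One caveat: because [^|]* can also consume "]" characters, an input with several pipe-less links on the same line (for example "[[a]] and [[b]]") matches from the first link through to the last one, so this sketch is only suitable when each link is processed separately.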
