Regular expression to replace relative link with root relative link

I have a line of text containing HTML with all types of links (relative, absolute, root-relative). I need a regular expression that PHP's preg_replace can execute to replace all relative links with root-relative links without affecting any of the other links. I already have the root path.

Substituted links:

    <tag ... href="path/to_file.ext" ... >   --->   <tag ... href="/basepath/path/to_file.ext" ... >
    <tag ... href="path/to_file.ext" ... />  --->   <tag ... href="/basepath/path/to_file.ext" ... />

Untouched links:

    <tag ... href="/any/path" ... >
    <tag ... href="/any/path" ... />
    <tag ... href="protocol://domain.com/any/path" ... >
    <tag ... href="protocol://domain.com/any/path" ... />
2 answers

If you just want to change the base URI, you can use the BASE element:

 <base href="/basepath/"> 

But note that changing the base URI affects all relative URIs, not just relative URI paths.

Otherwise, if you really want to use a regular expression, note that the relative paths you want to match are of the type path-noscheme (see RFC 3986):

    path-noscheme = segment-nz-nc *( "/" segment )
    segment       = *pchar
    segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                  ; non-zero-length segment without any colon ":"
    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
    pct-encoded   = "%" HEXDIG HEXDIG
    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

So, the beginning of the URI should match:

 ^([a-zA-Z0-9._~!$&'()*+,;=@-]|%[0-9a-fA-F]{2})+($|/) 
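A minimal PHP sketch of that check, assuming the pattern is applied to the bare href value (without the surrounding quotes). `is_relative_path()` is a hypothetical helper name; the character class contains no spaces, and the hyphen is placed last so it is unambiguously literal.

```php
<?php
// Pattern from the answer: a non-empty first segment with no ":" (RFC 3986
// path-noscheme), followed by the end of the string or a "/".
// The "-" sits at the end of the class so PCRE always treats it as a literal.
const RELATIVE_PATH_PATTERN = '#^(?:[a-zA-Z0-9._~!$&\'()*+,;=@-]|%[0-9a-fA-F]{2})+(?:$|/)#';

// Hypothetical helper: true when $href starts like a path-noscheme reference.
function is_relative_path(string $href): bool
{
    return preg_match(RELATIVE_PATH_PATTERN, $href) === 1;
}

// The sample values from the question.
$samples = [
    'path/to_file.ext',
    '/any/path',
    'protocol://domain.com/any/path',
];
foreach ($samples as $href) {
    printf("%-32s %s\n", $href, is_relative_path($href) ? 'rewrite' : 'leave alone');
}
```

Root-relative values fail because the first character must come from the class (so `/` is rejected), and absolute URIs fail because `:` is not allowed in the first segment.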

But please use a proper HTML parser to parse the HTML. You can then query the DOM for the href attributes and check each value against the regular expression above.
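A sketch of that parser-based approach using PHP's DOM extension (`DOMDocument` and `DOMXPath`), assuming the input is an HTML fragment and `$rootPath` ends with "/". `rewrite_relative_links()` is a hypothetical name, and wrapping the fragment in an `<html><body>` skeleton with a charset meta tag is just one way to get the fragment back out unchanged apart from the rewritten links.

```php
<?php
// Hypothetical helper: prefix every relative href in an HTML fragment with $rootPath.
// Assumes $rootPath ends with "/" (e.g. "/basepath/") and that ext-dom is available.
function rewrite_relative_links(string $html, string $rootPath): string
{
    // Same path-noscheme check as the regular expression above.
    $pattern = '#^(?:[a-zA-Z0-9._~!$&\'()*+,;=@-]|%[0-9a-fA-F]{2})+(?:$|/)#';

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so it ends up inside <body>; the meta tag tells libxml it is UTF-8.
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit every element that has an href attribute, whatever its tag name.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href]') as $element) {
        $href = $element->getAttribute('href');
        if (preg_match($pattern, $href) === 1) {
            $element->setAttribute('href', $rootPath . $href);
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewrite_relative_links('<p><a href="path/to_file.ext">x</a> <a href="/any/path">y</a></p>', '/basepath/'), "\n";
```

Because the parser decides what counts as a tag and an attribute, `href="something"` in plain text is never touched, which is the main advantage over running a regex on the raw HTML.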


I came up with this:

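 // $root_path must end with "/", e.g. "/basepath/"; href= and the quotes are written back
 $html = preg_replace('#href=(["\'])([^/"\':][^"\':]*)\1#', 'href=$1' . $root_path . '$2$1', $html);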

This may be too simplistic. The obvious drawback I see is that it will also match href="something" when it appears outside a tag, but hopefully it gets you started.
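If you stay with a regex, one way to reduce that drawback is to require the match to start at a `<` and not cross a `>`, so only href attributes inside a tag are rewritten. This is a sketch under assumptions, not an HTML-aware solution: `rewrite_relative_hrefs()` is a hypothetical name, `$rootPath` is assumed to end with "/", and skipping values that start with `#` or `?` (fragment-only and query-only links) goes beyond what the question asks for.

```php
<?php
// Hypothetical helper: rewrite relative href values, but only inside a tag.
// Assumes $rootPath ends with "/" (e.g. "/basepath/").
function rewrite_relative_hrefs(string $html, string $rootPath): string
{
    // Group 1: "<tag ... href=" (cannot cross "<" or ">"), group 2: the quote,
    // group 3: a value that does not start with "/", "#" or "?" and contains no ":".
    $pattern = '~(<[^<>]*?\shref=)(["\'])([^/"\':#?][^"\':]*)\2~i';

    return preg_replace_callback(
        $pattern,
        static function (array $m) use ($rootPath): string {
            // Rebuild the attribute with the same quote character around the new value.
            return $m[1] . $m[2] . $rootPath . $m[3] . $m[2];
        },
        $html
    ) ?? $html; // preg_replace_callback() returns null on a PCRE error
}

echo rewrite_relative_hrefs('<a href="path/to_file.ext">a</a> href="outside"', '/basepath/'), "\n";
```

`preg_replace_callback()` is used instead of a replacement string so that a `$` or `\` in the root path cannot be misread as a backreference. Attributes written as `href = "..."` (with spaces around `=`) are not matched by this pattern.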

