I am having problems parsing a URL that has no path but has a slash in the query string. For example: http://example.com?q=a/b
I know that such a URL is most likely invalid (*): it needs at least a slash as the path, i.e. http://example.com/?q=a/b.
Every browser in which I tried such a URL corrects it properly. And that's basically what I want to reproduce: identify and fix such a URL.
Using parse_url, however, produces:
    var_dump( parse_url('http://example.com?q=a/b') );

    array(3) {
      ["scheme"]=>
      string(4) "http"
      ["host"]=>
      string(15) "example.com?q=a"
      ["path"]=>
      string(2) "/b"
    }
By contrast, a URL without a slash in the query string works fine:
    var_dump( parse_url('http://example.com?q=ab') );

    array(3) {
      ["scheme"]=>
      string(4) "http"
      ["host"]=>
      string(11) "example.com"
      ["query"]=>
      string(4) "q=ab"
    }
All of the external libraries I've tried (Jwage\Purl, League\Url, Sabre\Uri) basically do the same thing, which surprises me a bit.
Why do (all?) browsers get this "right" and (all?) PHP libraries get this "wrong"?
Besides trying to catch these cases with a regular expression before parsing the URL (which may be unreliable, and avoiding that is why I want to use a library in the first place), what are my alternatives?
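To make the "fix it before parsing" idea concrete, here is a minimal sketch of what I have in mind, using plain string functions rather than a regular expression. The function name ensureRootPath is hypothetical (it is not part of PHP or of any of the libraries above), and it assumes an absolute URL of the form scheme://authority...; anything else is returned unchanged. It only inserts the missing slash and does not validate the URL in any other way.

```php
<?php
// Hypothetical helper for illustration; not part of PHP or any library.
// Inserts "/" as the path when the authority is followed directly by "?" or "#".
function ensureRootPath(string $url): string
{
    $schemeEnd = strpos($url, '://');
    if ($schemeEnd === false) {
        return $url; // no "scheme://" prefix, nothing to inspect
    }

    // If "/", "?" or "#" appears before "://", the "://" is not the scheme
    // separator (e.g. it sits inside a query string), so leave the URL alone.
    if (strcspn($url, '/?#') < $schemeEnd) {
        return $url;
    }

    $authorityStart = $schemeEnd + 3;
    // The authority runs up to the first "/", "?" or "#" after "scheme://".
    $authorityEnd = $authorityStart + strcspn($url, '/?#', $authorityStart);

    // An authority followed directly by "?" or "#" means the path is empty.
    if ($authorityEnd < strlen($url) && $url[$authorityEnd] !== '/') {
        return substr($url, 0, $authorityEnd) . '/' . substr($url, $authorityEnd);
    }

    return $url;
}

$fixed = ensureRootPath('http://example.com?q=a/b');
$parts = parse_url($fixed);
```

With this, the problematic URL becomes http://example.com/?q=a/b before it reaches parse_url, and URLs that already have a path are passed through untouched. It still feels like a workaround, though, since it duplicates part of what the parser should be doing.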
(*) I consulted three sources: RFC 1738, RFC 3986, and the WHATWG URL Standard, and they disagree on what is considered valid.