Parsing a URL without a path, but with a slash in the query string

Question


I am having trouble parsing a URL that has no path but contains a slash in the query string. For example: http://example.com?q=a/b

I know that such a URL is most likely invalid (*), since it needs at least a slash as the path: http://example.com/?q=a/b.

Every browser in which I tried such a URL fixes it automatically. That is basically what I want to reproduce: identify and fix such a URL.

Using parse_url, however, produces:

    var_dump( parse_url('http://example.com?q=a/b') );

    array(3) {
      ["scheme"]=>
      string(4) "http"
      ["host"]=>
      string(15) "example.com?q=a"
      ["path"]=>
      string(2) "/b"
    }

By contrast, a URL without a slash in the query string is parsed correctly:

    var_dump( parse_url('http://example.com?q=ab') );

    array(3) {
      ["scheme"]=>
      string(4) "http"
      ["host"]=>
      string(11) "example.com"
      ["query"]=>
      string(4) "q=ab"
    }

All of the external libraries I've tried (Jwage\Purl, League\Url, Sabre\Uri) basically do the same thing, which surprises me a bit.

Why do (all?) browsers get this "right" and (all?) PHP libraries get this "wrong"?

Besides trying to catch these cases with a regular expression before parsing the URL (which may be unreliable, and avoiding that is why I wanted to use a library in the first place), what are my alternatives?

(*) I consulted three sources: RFC 1738, RFC 3986, and the WHATWG URL Standard, and they all disagree on what is considered valid.

+5

url php

Rotora Jul 15 '15 at 11:42


2 answers

cars10m · Answer 1 · 2015-07-15T12:13:26+0000

If you still want to use a regex, the following should produce the URL you are looking for:

    $url = preg_replace('~^([^/]+://[^/?#]+)\?~', '$1/?', $url);

The URL must start with a scheme name of at least one character, followed by "://" and a host name of at least one character ("localhost" would also be acceptable). The pattern then inserts a '/' before the '?', but only if no '/' appears between the host name and the '?'. The pattern uses '~' as its delimiter so that the slashes inside it do not need to be escaped.
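
The replacement above can be wrapped in a small helper and combined with parse_url. This is only a minimal sketch: the function name normalize_empty_path is made up for illustration, and it assumes the input is an absolute URL with a scheme and a host. Newer PHP releases may already parse the unmodified URL as expected, in which case the helper should be harmless.

```php
<?php
// Hypothetical helper (not part of PHP): inserts the missing "/" path
// when the host is followed directly by the "?" of the query string.
function normalize_empty_path(string $url): string
{
    // scheme, "://", host without "/", "?" or "#", then a literal "?"
    $fixed = preg_replace('~^([^/]+://[^/?#]+)\?~', '$1/?', $url);

    // preg_replace returns null on a regex error; fall back to the input
    return $fixed ?? $url;
}

$fixed = normalize_empty_path('http://example.com?q=a/b');
// $fixed is now 'http://example.com/?q=a/b'

$parts = parse_url($fixed);
// host: "example.com", path: "/", query: "q=a/b"
```

URLs that already have a path, such as http://example.com/?q=a/b, are returned unchanged.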

Anne · Answer 2 · 2015-11-08T09:13:24+0000

The WHATWG URL Standard comes closest to what browsers do. Other software is not quite aligned with it yet, although for PHP https://phppackages.org/p/esperecyan/url may work. (I have not tried it.)
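
If pulling in a library is not an option, another possibility is to split the URL with the generic regular expression given in RFC 3986, Appendix B, which stops the authority at the first '/', '?' or '#'. The sketch below is an illustration only: the helper name split_uri is made up, it does not follow the WHATWG algorithm, and it does not separate userinfo or port from the host.

```php
<?php
// Hypothetical helper: splits a URI reference into its five generic
// components using the regular expression from RFC 3986, Appendix B.
function split_uri(string $uri): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m);

    // Groups that did not take part in the match may be missing from $m,
    // so absent components are reported as empty strings here.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

$parts = split_uri('http://example.com?q=a/b');
// authority: "example.com", path: "" (empty), query: "q=a/b"

// An empty path next to a non-empty authority can then be set to "/".
if ($parts['authority'] !== '' && $parts['path'] === '') {
    $parts['path'] = '/';
}
```

Because the authority component cannot contain '?', the slash inside the query string no longer affects where the host ends.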
