Regular expression for image url

I already parse pages using HtmlAgilityPack and get most img sources. However, many websites include image URLs in places other than img src attributes (e.g. inline JavaScript, a different attribute, a different element). I would like to cast a slightly wider net and run a regex over the entire HTML string, matching the following:

  • It must start with http://, https://, // or /
  • Followed by any number of valid URL path characters
  • It should end with one of: .jpeg, .jpg, .png, or .gif

I suppose this would be easy to write, but I am not great with regular expressions. I assume the parts will look something like this:

  • ^((https?\:\/\/)|(\/{1,2}))
  • (any ideas?)
  • (.(jpe?g|png|gif))$

Can someone help me fill in the blanks?

Thanks.

Answer

(https?:)?//?[^\'"<>]+?\.(jpg|jpeg|gif|png)
2 answers

There are a number of regular expressions for matching URLs, but none of them is fully reliable. However, this one should come close to satisfying your conditions.

The characters allowed in a URL [1] include alphanumerics and $-_.+!*'(), plus +/?%#&; the RFC [2] additionally lists characters such as = and ;.

@(https?:)?//?[^'"<>]+?\.(jpg|jpeg|gif|png)@
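To see how this pattern behaves when scanned over a whole page rather than a single attribute, here is a minimal sketch in Python. Python's `re` module is used only for illustration (the pattern itself should carry over to other engines), and the helper name `find_image_urls` and the sample HTML are made up for this example.

```python
import re

# The answer's pattern, unchanged. A raw triple-quoted string lets both quote
# characters appear inside the character class without escaping.
IMAGE_URL = re.compile(r"""(https?:)?//?[^'"<>]+?\.(jpg|jpeg|gif|png)""")


def find_image_urls(html: str) -> list[str]:
    """Return every substring of html that the pattern matches, in order."""
    return [m.group(0) for m in IMAGE_URL.finditer(html)]


# Hypothetical page fragment: one img src, one custom attribute, one inline script.
SAMPLE_HTML = """
<img src="http://example.com/a/photo.jpg">
<div data-bg='//cdn.example.com/img/banner.png'></div>
<script>var hero = "/static/hero.gif";</script>
"""

for url in find_image_urls(SAMPLE_HTML):
    print(url)
# http://example.com/a/photo.jpg
# //cdn.example.com/img/banner.png
# /static/hero.gif
```

Two things to keep in mind with this sketch: the lazy `+?` stops at the first image extension it finds, so `/x/a.png.bak` yields `/x/a.png`, and the match is case-sensitive unless a flag such as `re.IGNORECASE` is added.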

(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*\.(?:jpg|gif|png))(?:\?([^#]*))?(?:#(.*))?
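This second pattern resembles the URI-splitting regex from RFC 3986, Appendix B, with the path constrained to end in an image extension, so it seems better suited to taking apart one candidate URL than to scanning a page. A small sketch, assuming Python's `re` and a whole-string match via `fullmatch` (both are choices made for this example, as is the helper name `split_image_url`):

```python
import re

# The second answer's pattern, unchanged: scheme, authority, a path ending in
# an image extension, then an optional query and fragment.
URL_PARTS = re.compile(
    r"(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*\.(?:jpg|gif|png))(?:\?([^#]*))?(?:#(.*))?"
)


def split_image_url(url: str):
    """Return (scheme, authority, path, query, fragment), or None if no match."""
    m = URL_PARTS.fullmatch(url)
    return m.groups() if m else None


print(split_image_url("https://example.com/img/cat.png?w=200#top"))
# ('https', 'example.com', '/img/cat.png', 'w=200', 'top')
print(split_image_url("/static/hero.gif"))
# (None, None, '/static/hero.gif', None, None)
print(split_image_url("photo.jpeg"))
# None
```

The last call returns `None` because the alternation lists only jpg, gif and png; `.jpeg` would have to be added (for example as `jpe?g`) to meet the requirement in the question.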