Regex to get file name from url

Question

I am trying to write a regex to get the file name from the url if it exists.

This is what I have so far:

(?:[^/][\d\w\.]+)+$

So from url http://www.foo.com/bar/baz/filename.jpg I have to match filename.jpg

Unfortunately, my pattern matches anything that comes after the last slash.

How can I tighten it so that it only grabs it if it looks like a file name?

+6

regex

shenku Jan 23 '13 at 5:40

11 answers

The above examples cannot get the file name "file-1.name.zip" from this URL:

 "http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1"

So, I created my version of REGEX:

 [^/\\&\?]+\.\w{3,4}(?=([\?&].*$|$))

Explanation:

 [^/\\&\?]+        # file name - a run of characters that are not URL delimiters
 \.\w{3,4}         # file extension - 3 or 4 word characters
 (?=([\?&].*$|$))  # positive lookahead - the file name must be at the end of the string or be followed by query-string parameters, which are ignored
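A quick way to try the pattern, sketched in JavaScript. The function name `fileNameFromUrl` is made up for illustration, and the slash inside the character class is escaped so the pattern can sit in a regex literal.

```javascript
// Hypothetical helper (not part of the answer): returns the matched
// file name, or null when the pattern finds nothing.
function fileNameFromUrl(url) {
  const pattern = /[^\/\\&?]+\.\w{3,4}(?=([?&].*$|$))/;
  const match = pattern.exec(url);
  return match ? match[0] : null;
}

console.log(fileNameFromUrl("http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1"));
// → file-1.name.zip
console.log(fileNameFromUrl("http://www.foo.com/bar/baz/filename.jpg"));
// → filename.jpg
```

Keep in mind that a URL with no path, such as http://www.foo.com, still yields www.foo.com, because the host name also ends in a dot followed by three word characters.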

+19

janeks malinovskis Oct 08 '14 at 9:05

This works well for me.

 (\w+)(\.\w+)+(?!.*(\w+)(\.\w+)+)

+13

deleter1 Jul 13 '14 at 7:54

 (?:.+\/)(.+)

Match everything up to the last slash (/), then capture everything after that slash. Use subpattern $1.
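For illustration, here is how reading the first capture group might look in JavaScript. The helper name `lastSegment` is an assumption, not something from the answer.

```javascript
// Hypothetical helper: capture group 1 holds whatever follows the last slash.
function lastSegment(url) {
  const match = /(?:.+\/)(.+)/.exec(url);
  return match ? match[1] : null;
}

console.log(lastSegment("http://www.foo.com/bar/baz/filename.jpg")); // → filename.jpg
console.log(lastSegment("http://www.foo.com/bar/baz"));              // → baz
```

As the second call shows, this pattern returns the last path segment whether or not it looks like a file name, so an extra check is needed if that matters.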

+8

yolo May 05, '15 at 20:58

This can also work:

 (\w+\.)+\w+$

0

Sina Iravanian Jan 23 '13 at 6:05

You know what your delimiters look like, so you don't need a regex. Just split the string. Since you didn't specify a language, here is an implementation in Perl:

 use strict;
 use warnings;
 my $url = "http://www.foo.com/bar/baz/filename.jpg";
 my @url_parts = split /\//, $url;
 my $filename = $url_parts[-1];
 if (index($filename, ".") > 0) {
     print "It appears as though we have a filename of $filename.\n";
 }
 else {
     print "It seems as though the end of the URL ($filename) is not a filename.\n";
 }

Of course, if you need to worry about specific file name extensions (png, jpg, html, etc.), adjust the check accordingly.
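The same split-based idea can be sketched in JavaScript. The function name `fileNameBySplit` is hypothetical, and the dot check copies the Perl heuristic rather than validating real extensions.

```javascript
// Hypothetical helper mirroring the Perl logic: split on "/", take the
// last piece, and accept it only if a dot appears after the first character.
function fileNameBySplit(url) {
  const parts = url.split("/");
  const last = parts[parts.length - 1];
  return last.indexOf(".") > 0 ? last : null;
}

console.log(fileNameBySplit("http://www.foo.com/bar/baz/filename.jpg")); // → filename.jpg
console.log(fileNameBySplit("http://www.foo.com/bar/baz"));              // → null
```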

0

user554546 Jan 23 '13 at 9:10

 > echo "http://www.foo.com/bar/baz/filename.jpg" | sed 's/.*\/\([^\/]*\..*\)$/\1/g'
 filename.jpg

0

Vijay Jan 23 '13 at 9:41

Assuming you are using JavaScript:

 var fn = window.location.href.match(/([^/])+/g);
 fn = fn[fn.length - 1]; // get the last element of the array
 alert(fn.substring(0, fn.indexOf('.'))); // alerts the file name up to the first dot (extension dropped)

0

Aj Jan 23 '13 at 11:24

Here is a regex you can use:

 \/([\w.][\w.-]*)(?<!\/\.)(?<!\/\.\.)(?:\?.*)?$

The names "." and ".." are not treated as file names.

You can play with this regex at https://regex101.com/r/QaAK06/1/
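A minimal JavaScript sketch of using it, assuming an engine that supports lookbehind (recent Node.js and current browsers do). The helper name `fileNameWithoutDotDirs` is made up for this example.

```javascript
// Hypothetical helper: capture group 1 is the file name; "." and ".."
// are rejected by the two negative lookbehinds.
function fileNameWithoutDotDirs(url) {
  const pattern = /\/([\w.][\w.-]*)(?<!\/\.)(?<!\/\.\.)(?:\?.*)?$/;
  const match = pattern.exec(url);
  return match ? match[1] : null;
}

console.log(fileNameWithoutDotDirs("http://www.foo.com/bar/baz/filename.jpg?v=1")); // → filename.jpg
console.log(fileNameWithoutDotDirs("http://www.foo.com/bar/.."));                   // → null
```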

0

Andy ko Aug 2 '18 at 21:21

I use this:

 (?<=\/)[^\/\?#]+(?=[^\/]*$)

Explanation:

(?<=): a positive lookbehind, asserting that the string contains this expression without including it in the match.

(?<=\/): a positive lookbehind for the forward slash "/", meaning I am looking for an expression that is preceded by a forward slash, without matching the slash itself.

[^\/\?#]+: one or more characters that are not "/", "?" or "#", which strips the search parameters and the hash.

(?=[^\/]*$): a positive lookahead for any run of non-slash characters followed by the end of the string. This ensures that the last slash-delimited segment is selected.

Usage example:

 const urlFileNameRegEx = /(?<=\/)[^\/\?#]+(?=[^\/]*$)/;
 const testCases = [
   "https://developer.mozilla.org/en-US/docs/Web/API/MutationObserverInit#yo",
   "https://developer.mozilla.org/static/fonts/locales/ZillaSlab-Regular.subset.bbc33fb47cf6.woff2",
   "https://developer.mozilla.org/static/build/styles/locale-en-US.520ecdcaef8c.css?is-nice=true"
 ];
 // Template literals need backticks for ${...} to be interpolated.
 testCases.forEach(testStr =>
   console.log(`The file of ${testStr} is ${urlFileNameRegEx.exec(testStr)[0]}`)
 );

0

deckele May 22, '19 at 13:37

Try instead:

 (?:[^/]*+)$(?<=\..*)

-1

wafdude123 Jul 07 '13 at 12:40

slier · Accepted Answer · 2013-01-23T09:07:29+0000

Non-PCRE

 (?:[^/][\d\w\.]+)$(?<=\.\w{3,4})

PCRE

 (?:[^/][\d\w\.]+)$(?<=(?:\.jpg)|(?:\.pdf)|(?:\.gif)|(?:\.jpeg)|(more_extension))

Since you are testing with the JavaScript-based regexpal.com (which does not support lookbehind), try this instead:

 (?=\w+\.\w{3,4}$).+
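As a rough sketch of how this lookahead version behaves in JavaScript (the function name `fileNameByLookahead` is only for illustration):

```javascript
// Hypothetical helper: the lookahead pins the match start to the point where
// "word chars, dot, 3 or 4 word chars, end of string" begins; .+ then consumes it.
function fileNameByLookahead(url) {
  const match = /(?=\w+\.\w{3,4}$).+/.exec(url);
  return match ? match[0] : null;
}

console.log(fileNameByLookahead("http://www.foo.com/bar/baz/filename.jpg")); // → filename.jpg
console.log(fileNameByLookahead("http://www.foo.com/bar/baz"));              // → null
```

Because \w does not cover characters such as "-", a name like file-1.name.zip would only be matched from name.zip onward, so widen the character class if such names are expected.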
