Regex to get file name from url

I am trying to write a regex to get the file name from the url if it exists.

This is what I have so far:

(?:[^/][\d\w\.]+)+$ 

So from the URL http://www.foo.com/bar/baz/filename.jpg I need to match filename.jpg

Unfortunately, it matches anything after the last slash.

How can I tighten it so that it only grabs it if it looks like a file name?

+6
11 answers

Non-PCRE

 (?:[^/][\d\w\.]+)$(?<=\.\w{3,4}) 

PCRE

 (?:[^/][\d\w\.]+)$(?<=(?:\.jpg)|(?:\.pdf)|(?:\.gif)|(?:\.jpeg)|(more_extension)) 

Demo

Since you are testing with the JavaScript-based regexpal.com (which does not support lookbehind), try this instead:

 (?=\w+\.\w{3,4}$).+ 
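
As a rough sketch of how the lookahead version behaves in JavaScript (the helper name `fileNameByLookahead` and the second test URL are assumptions made for illustration, not part of the answer):

```javascript
// Lookahead-only pattern from the answer above.
const lookaheadRe = /(?=\w+\.\w{3,4}$).+/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function fileNameByLookahead(url) {
  const m = lookaheadRe.exec(url);
  return m ? m[0] : null;
}

console.log(fileNameByLookahead("http://www.foo.com/bar/baz/filename.jpg")); // "filename.jpg"
console.log(fileNameByLookahead("http://www.foo.com/bar/baz/"));             // null
// \w does not match "-", so a hyphenated name is only partially matched.
console.log(fileNameByLookahead("http://www.foo.com/bar/file-1.jpg"));       // "1.jpg"
```

The last call shows a limitation: because `\w+` cannot cross a hyphen, the match starts only after the last non-word character in the name.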
+5

The examples above cannot extract the file name "file-1.name.zip" from this URL:

 "http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1" 

So I wrote my own version of the regex:

 [^/\\&\?]+\.\w{3,4}(?=([\?&].*$|$)) 

Explanation:

 [^/\\&\?]+         # file name: a run of characters that are not URL delimiters
 \.\w{3,4}          # file extension: 3 or 4 word characters
 (?=([\?&].*$|$))   # positive lookahead: ensures the file name is at the end of the string,
                    # or is followed only by query string parameters, which are ignored
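
A minimal sketch of this pattern in JavaScript, run against the two URLs mentioned in this thread (the helper name `fileNameFromUrl` is hypothetical, and the forward slash is escaped only to keep the regex literal unambiguous):

```javascript
// Pattern from the answer above, written as a JavaScript regex literal.
const fileNameRe = /[^\/\\&\?]+\.\w{3,4}(?=([\?&].*$|$))/;

// Hypothetical helper: returns the file name, or null when none is found.
function fileNameFromUrl(url) {
  const m = fileNameRe.exec(url);
  return m ? m[0] : null;
}

console.log(fileNameFromUrl("http://www.foo.com/bar/baz/filename.jpg"));
// "filename.jpg"
console.log(fileNameFromUrl("http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1"));
// "file-1.name.zip"
console.log(fileNameFromUrl("http://www.foo.com/bar/baz/"));
// null
```

One caveat: for a URL with no path at all, such as "http://www.foo.com", the host itself ends in a dot followed by three word characters, so it would be returned as if it were a file name.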
+19

This works well for me.

 (\w+)(\.\w+)+(?!.*(\w+)(\.\w+)+) 
+13

 (?:.+\/)(.+) 

Match everything up to the last slash (/), then capture everything after that slash. Use the $1 subpattern.
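
A small JavaScript sketch of the capture-group approach (the helper name `lastSegment` is an assumption for illustration). It also shows that this pattern returns whatever follows a slash, whether or not it looks like a file name:

```javascript
// Group 1 holds everything after the last slash that still leaves text to capture.
const lastSegmentRe = /(?:.+\/)(.+)/;

// Hypothetical helper: returns capture group 1, or null when there is no match.
function lastSegment(url) {
  const m = lastSegmentRe.exec(url);
  return m ? m[1] : null;
}

console.log(lastSegment("http://www.foo.com/bar/baz/filename.jpg")); // "filename.jpg"
// A query string is captured along with the name.
console.log(lastSegment("http://a.com/file.css?x=1"));               // "file.css?x=1"
// With a trailing slash the regex backtracks to the previous slash.
console.log(lastSegment("http://www.foo.com/bar/baz/"));             // "baz/"
```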

+8

This can also work:

 (\w+\.)+\w+$ 
0

You know what your delimiters look like, so you don't need a regex. Just split the string. Since you did not specify a language, here is an implementation in Perl:

 use strict;
 use warnings;

 my $url = "http://www.foo.com/bar/baz/filename.jpg";
 my @url_parts = split /\//, $url;   # split on forward slashes
 my $filename = $url_parts[-1];      # last path segment

 # Treat the segment as a file name only if it contains a dot after its first character.
 if (index($filename, ".") > 0) {
     print "It appears as though we have a filename of $filename.\n";
 } else {
     print "It seems as though the end of the URL ($filename) is not a filename.\n";
 }

Of course, if you need to worry about certain file name extensions (png, jpg, html, etc.), then adjust them accordingly.
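
For comparison, the same split-based idea might look like this in JavaScript (a sketch only; the function name `fileNameBySplit` is hypothetical):

```javascript
// Split on "/" and inspect the last segment, mirroring the Perl logic above.
function fileNameBySplit(url) {
  const parts = url.split("/");
  const last = parts[parts.length - 1];
  // Accept the segment only if it contains a dot after its first character.
  return last.indexOf(".") > 0 ? last : null;
}

console.log(fileNameBySplit("http://www.foo.com/bar/baz/filename.jpg")); // "filename.jpg"
console.log(fileNameBySplit("http://www.foo.com/bar/baz/"));             // null
console.log(fileNameBySplit("http://www.foo.com/bar/baz"));              // null
```

Like the Perl version, this does not strip a query string or fragment, so those would need to be removed first if they can appear.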

0

 > echo "http://www.foo.com/bar/baz/filename.jpg" | sed 's/.*\/\([^\/]*\..*\)$/\1/g'
 filename.jpg
0

Assuming you are using JavaScript:

 var fn = window.location.href.match(/([^/])+/g);
 fn = fn[fn.length - 1]; // get the last element of the array
 alert(fn.substring(0, fn.indexOf('.'))); // alerts the file name up to its first dot (extension dropped)
0

Here is a regex you can use:

 \/([\w.][\w.-]*)(?<!\/\.)(?<!\/\.\.)(?:\?.*)?$ 

The names "." and ".." are not treated as regular file names.

You can play with this regex here: https://regex101.com/r/QaAK06/1/
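
A brief JavaScript sketch of this regex, assuming an engine with lookbehind support (for example a recent Node.js or Chromium-based browser). The helper name `fileNameSkippingDots` is hypothetical:

```javascript
// Regex from the answer above; group 1 is the file name without any query string.
const dotSafeRe = /\/([\w.][\w.-]*)(?<!\/\.)(?<!\/\.\.)(?:\?.*)?$/;

// Hypothetical helper: returns group 1, or null when the last segment is not a file name.
function fileNameSkippingDots(url) {
  const m = dotSafeRe.exec(url);
  return m ? m[1] : null;
}

console.log(fileNameSkippingDots("http://www.foo.com/bar/baz/filename.jpg")); // "filename.jpg"
console.log(fileNameSkippingDots("http://www.foo.com/bar/baz/file.css?v=1")); // "file.css"
console.log(fileNameSkippingDots("http://www.foo.com/bar/.."));               // null
console.log(fileNameSkippingDots("http://www.foo.com/bar/baz/"));             // null
```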

0

I use this:

 (?<=\/)[^\/\?#]+(?=[^\/]*$) 

Explanation:

(?<=): a positive lookbehind, asserting that the string contains this expression without including it in the match.

(?<=\/): a positive lookbehind for the forward slash "/", meaning I am looking for an expression that is preceded by a forward slash but does not include it.

[^\/\?#]+: one or more characters that are not "/", "?" or "#", which strips the search parameters and the hash.

(?=[^\/]*$): a positive lookahead for any run of non-slash characters followed by the end of the line. This ensures that the last slash-delimited segment is selected.

Usage example:

 const urlFileNameRegEx = /(?<=\/)[^\/\?#]+(?=[^\/]*$)/;
 const testCases = [
   "https://developer.mozilla.org/en-US/docs/Web/API/MutationObserverInit#yo",
   "https://developer.mozilla.org/static/fonts/locales/ZillaSlab-Regular.subset.bbc33fb47cf6.woff2",
   "https://developer.mozilla.org/static/build/styles/locale-en-US.520ecdcaef8c.css?is-nice=true"
 ];
 // Template literals need backticks for the ${...} placeholders to be interpolated.
 testCases.forEach(testStr => console.log(`The file of ${testStr} is ${urlFileNameRegEx.exec(testStr)[0]}`))
0

Try this instead:

 (?:[^/]*+)$(?<=\..*) 
-1
