Extract specific src attributes from script tags

Using a regular expression (RE), I want to get the JS file names from the input content that contain jquery as a substring.

This is my code:

Step 1: Extract the JS file names from the content.

>>> data = """    <script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
...     <script type="text/javascript" src="js/jquery-migrate-1.2.1.min.js"/>
...     <script type="text/javascript" src="js/jquery-ui.min.js"/>
...     <script type="text/javascript" src="js/abc_bsub.js"/>
...     <script type="text/javascript" src="js/abc_core.js"/>
...     <script type="text/javascript" src="js/abc_explore.js"/>
...     <script type="text/javascript" src="js/abc_qaa.js"/>"""
>>> import re
>>> re.findall('src="js/([^"]+)"', data)
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js', 'abc_bsub.js', 'abc_core.js', 'abc_explore.js', 'abc_qaa.js']

Step 2: Keep only the JS file names that contain "jquery" as a substring.

>>> [ii for ii in re.findall('src="js/([^"]+)"', data) if "jquery" in ii]
['jquery-1.9.1.min.js', 'jquery-migrate-1.2.1.min.js', 'jquery-ui.min.js']

Can I fold Step 2 into Step 1, that is, get the same result with the RE pattern alone?

1 answer

Of course. One way is to use

re.findall('src="js/([^"]*jquery[^"]*)"', data)

This matches everything between src="js/ and the next double quote, as long as that text contains jquery somewhere. If you know more about where jquery appears in the name (for example, if it is always at the beginning), you can adjust the regular expression accordingly.
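A sketch of how the pattern breaks down, using re.VERBOSE so each piece can carry a comment. The second pattern is one possible adjustment, assuming jquery always starts the file name; the names anywhere and at_start are just illustrative. With the question's data both return the same three names.

```python
import re

# Input from the question.
data = """
<script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
<script type="text/javascript" src="js/jquery-migrate-1.2.1.min.js"/>
<script type="text/javascript" src="js/jquery-ui.min.js"/>
<script type="text/javascript" src="js/abc_bsub.js"/>
<script type="text/javascript" src="js/abc_core.js"/>
<script type="text/javascript" src="js/abc_explore.js"/>
<script type="text/javascript" src="js/abc_qaa.js"/>
"""

# The answer's pattern, spelled out with re.VERBOSE (whitespace and # comments ignored).
anywhere = re.compile(r'''
    src="js/      # literal prefix before the file name
    (             # start of the captured file name
      [^"]*       # any characters except a double quote ...
      jquery      # ... then "jquery" somewhere in the name ...
      [^"]*       # ... then the rest of the name up to the closing quote
    )
    "             # closing quote of the src attribute
''', re.VERBOSE)

# Assumed variant: "jquery" is always at the start of the file name.
at_start = re.compile(r'src="js/(jquery[^"]*)"')

print(anywhere.findall(data))
print(at_start.findall(data))
```

Because [^"] cannot match a double quote, neither pattern can run past the end of one src value into the next tag.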

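If jquery must appear as a whole word rather than as part of a longer word, surround it with word boundaries (\b):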

re.findall(r'src="js/([^"]*\bjquery\b[^"]*)"', data)
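To see the difference, here is a small comparison on hypothetical file names (myjquery.js is made up for illustration): the plain pattern accepts a name where jquery is only part of a longer word, while the \b version rejects it.

```python
import re

# Hypothetical input: the last file contains "jquery" only inside a longer word.
data = """
<script type="text/javascript" src="js/jquery-ui.min.js"/>
<script type="text/javascript" src="js/abc_core.js"/>
<script type="text/javascript" src="js/myjquery.js"/>
"""

# Substring match: "jquery" anywhere in the name.
plain = re.findall(r'src="js/([^"]*jquery[^"]*)"', data)

# Whole-word match: "jquery" must be bounded by non-word characters (or the string edges).
whole_word = re.findall(r'src="js/([^"]*\bjquery\b[^"]*)"', data)

print(plain)       # ['jquery-ui.min.js', 'myjquery.js']
print(whole_word)  # ['jquery-ui.min.js']
```

Note that \b treats the underscore as a word character, so a name such as jquery_ui.js would also be rejected by the whole-word pattern.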