Python Regex to parse individual Android user agent device name

I am parsing Android user agents with Python 2.5, and so far I have worked out a regex that handles the majority of Android user agents and captures the major and minor versions:

(?P<browser>Android) (?P<major_version>\d*).(?P<minor_version>\d*) 

The above regex works for the example below:

 Mozilla/5.0 (Linux; U; Android 2.2; en-gb; Nexus One Build/FRF50) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 

However, I would also like to know which Android device it is. Looking at the Android user agents listed at the link below, I see a general pattern for where the device name appears: http://www.botsvsbrowsers.com/category/6/index.html

Basically, the device name always appears after the language tag, such as "en-gb;", and before "Build/".

So, how do I change my regex so that, in the example above, it captures "Nexus One"?

Another example of a user agent for Android would be:

 Mozilla/5.0 (Linux; U; Android 2.1; en-us; HTC Legend Build/cupcake) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17 

In the above example, I want to get "HTC Legend".

2 answers

Try the following:

 (?P<browser>Android) (?P<major_version>\d*)\.(?P<minor_version>\d*);[^;]*;(?P<device>[ \w]+) Build\/ 
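As a rough sketch of how this pattern could be used from Python: the function name parse_android_ua is made up for illustration, and the user agent is the first one from the question. Note that [ \w]+ also picks up the space after the second semicolon, so the sketch strips the captured device name.

```python
import re

# Pattern from the answer above, compiled once so it can be reused.
ANDROID_RE = re.compile(
    r"(?P<browser>Android) (?P<major_version>\d*)\.(?P<minor_version>\d*)"
    r";[^;]*;(?P<device>[ \w]+) Build\/"
)


def parse_android_ua(user_agent):
    """Return a dict of the named groups, or None if the pattern does not match.

    Hypothetical helper for illustration only.
    """
    match = ANDROID_RE.search(user_agent)
    if match is None:
        return None
    info = match.groupdict()
    # The device group starts with the space that follows "en-gb;", so trim it.
    info["device"] = info["device"].strip()
    return info


ua = ("Mozilla/5.0 (Linux; U; Android 2.2; en-gb; Nexus One Build/FRF50) "
      "AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1")
info = parse_android_ua(ua)
```

Here info["device"] is "Nexus One", and info["major_version"] and info["minor_version"] are both the string "2". A user agent that does not fit the pattern gives None, so callers should check for that before indexing.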

Alternatively, this variant matches whitespace with \s and requires at least one digit in each version part:

 (?P<browser>Android)\s(?P<major_version>\d+)\.(?P<minor_version>\d+);[^;]*;\s(?P<device>.+)\sBuild 
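A minimal sketch of running the \s-based pattern over several user agents, assuming the two examples from the question plus one made-up string that is not an Android user agent. The helper name device_names is hypothetical. Because (?P<device>.+) is greedy, the device group runs up to the last whitespace followed by "Build" on the line, which is fine for these strings since "Build" occurs only once.

```python
import re

# The \s-based pattern shown above; the consumed \s after the second
# semicolon means the device group has no leading space to strip.
ANDROID_RE = re.compile(
    r"(?P<browser>Android)\s(?P<major_version>\d+)\.(?P<minor_version>\d+)"
    r";[^;]*;\s(?P<device>.+)\sBuild"
)


def device_names(user_agents):
    """Return the device name for each user agent, or None where nothing matches.

    Hypothetical helper for illustration only.
    """
    names = []
    for user_agent in user_agents:
        match = ANDROID_RE.search(user_agent)
        if match is None:
            names.append(None)
        else:
            names.append(match.group("device"))
    return names


user_agents = [
    "Mozilla/5.0 (Linux; U; Android 2.2; en-gb; Nexus One Build/FRF50) "
    "AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Mozilla/5.0 (Linux; U; Android 2.1; en-us; HTC Legend Build/cupcake) "
    "AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17",
    "not an android user agent",  # made-up non-matching input
]
names = device_names(user_agents)
```

With these inputs, names is ["Nexus One", "HTC Legend", None].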
