I am trying to extract some HTML from different blogs and noticed that different providers use the same tag in different ways.
For example, here are two major providers that write the meta generator tag differently:
- Blogger:
<meta content='blogger' name='generator'/> (content first, name second, and yes, single quotes!)
- WordPress:
<meta name="generator" content="WordPress.com" /> (name first, content second)
Is there a way to extract the content value in all of these cases (single or double quotes, content attribute before or after name)?
PS: Although I am using Java, an answer in terms of regular expressions in general would probably help more people.
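
To make the requirement concrete, here is a minimal sketch of the kind of pattern I have in mind. It is an assumption about how this could work, not a confirmed solution: a lookahead checks that the tag carries name=generator somewhere, so the content attribute can be captured regardless of attribute order, and a backreference makes the closing quote match the opening one. The class and method names (`GeneratorMeta`, `generatorOf`) are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GeneratorMeta {
    // Hypothetical pattern, assembled piece by piece:
    //  1. "<meta" followed by a word boundary
    //  2. lookahead: somewhere before '>' there is name=<quote>generator<same quote>
    //  3. content=<quote>VALUE<same quote>; VALUE is captured as group 3
    //  4. the rest of the tag up to '>'
    private static final Pattern GENERATOR = Pattern.compile(
        "<meta\\b"
        + "(?=[^>]*\\bname\\s*=\\s*([\"'])generator\\1)"
        + "[^>]*\\bcontent\\s*=\\s*([\"'])(.*?)\\2"
        + "[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Returns the content value of the first generator meta tag, if any.
    static Optional<String> generatorOf(String html) {
        Matcher m = GENERATOR.matcher(html);
        return m.find() ? Optional.of(m.group(3)) : Optional.empty();
    }

    public static void main(String[] args) {
        String blogger = "<meta content='blogger' name='generator'/>";
        String wordpress = "<meta name=\"generator\" content=\"WordPress.com\" />";
        System.out.println(generatorOf(blogger).orElse("(none)"));
        System.out.println(generatorOf(wordpress).orElse("(none)"));
    }
}
```

The pattern syntax itself (lookahead, backreferences, lazy quantifier) is not Java-specific, so it should carry over to most regex engines that support those features. It will likely still miss unusual markup, such as unquoted attribute values or a `>` inside an attribute, so an HTML parser (jsoup, for example) may be the more robust route.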