Selecting an HTML text element with a regex?

I want to search for © in an HTML document and get the entity to which the copyright is attributed.

The copyright line appears in several different forms:

<p class="bg-copy">&copy; 2011  The New York Times Company</p>

or

<a href="http://www.nytimes.com/ref/membercenter/help/copyright.html">
&copy; 2011</a> 
<a href="http://www.nytco.com/">The New York Times Company</a>

or

<br>Published since 1996<br>Copyright &copy; CounterPunch<br>
All rights reserved.<br>

I want to ignore dates and intermediate tags and just get "The New York Times Company" or "CounterPunch".

I have not been able to find much on using regex with JavaScript or jQuery, and I get the impression that it can lead to serious headaches. If there is a better approach, let me know.

+5
2 answers

Ideally this would be handled through the DOM. That said, a regex along these lines should work:

&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)

Demo: Rubular

Explanation:

&copy; // copyright symbol
[\s\d]* // followed by spaces or digits
(?:<\/.+?>[^>]*>)? // maybe followed by a closing tag and another opening one
([^<]*) // then capture anything up to the next tag

This needs only plain JavaScript, not jQuery. Use match with a regex literal (/regex/):

var result = string.match(/&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)/);
// result is null when nothing matches; otherwise result[1] holds the captured name
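
As a quick check, here is a minimal sketch that runs this pattern against the three snippets from the question. The helper name extractCopyrightHolder and the samples array are made up for illustration; only the pattern itself comes from the answer. It assumes the markup is available as a raw string with the &copy; entity still unescaped.

```javascript
// Hypothetical helper (not from the original answer): returns the text
// captured after the copyright symbol, or null if the pattern does not match.
function extractCopyrightHolder(html) {
  const pattern = /&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)/;
  const match = html.match(pattern);
  // match[1] is the capture group: everything up to the next "<".
  return match ? match[1].trim() : null;
}

// The three forms of the copyright line shown in the question.
const samples = [
  '<p class="bg-copy">&copy; 2011  The New York Times Company</p>',
  '<a href="http://www.nytimes.com/ref/membercenter/help/copyright.html">\n' +
    '&copy; 2011</a> \n' +
    '<a href="http://www.nytco.com/">The New York Times Company</a>',
  '<br>Published since 1996<br>Copyright &copy; CounterPunch<br>\n' +
    'All rights reserved.<br>',
];

for (const html of samples) {
  console.log(extractCopyrightHolder(html));
}
// prints:
// The New York Times Company
// The New York Times Company
// CounterPunch
```

The trim() call is an addition of this sketch; the capture group can otherwise carry trailing whitespace when the name is followed by a line break before the next tag.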
+2
// Keep only the innermost elements whose text contains the © sign
$('*:contains(©)').filter(function(){
    return $(this).find('*:contains(©)').length === 0;
}).text();

http://jsfiddle.net/unloco/kGPYA/
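
The selector above returns the element's whole text, symbol and year included. One possible follow-up step, sketched here in plain JavaScript, strips that prefix. The function name stripCopyrightPrefix is hypothetical, and the sketch assumes the holder's name sits in the same element as the © sign, as in the first example from the question.

```javascript
// Hypothetical post-processing step (not part of the original answer).
// Takes the text of the matched element and removes everything up to and
// including the copyright sign, plus any whitespace and digits after it.
function stripCopyrightPrefix(text) {
  // \u00A9 is the © character; [\s\S]*? lazily skips any leading text.
  return text.replace(/^[\s\S]*?\u00A9[\s\d]*/, '').trim();
}

console.log(stripCopyrightPrefix('\u00A9 2011  The New York Times Company'));
// prints: The New York Times Company
```

For markup like the second example, where the name lives in a sibling link, the innermost element containing © holds only the symbol and the year, so this step would return an empty string. The regex approach on the raw HTML may be the better fit there.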

0
