How can I get the contents of the <title> tag if it cannot be parsed as XML?

Question

I am using PHP with libcurl to load a page. Now I need to get the contents of its <title> tag, along with some other information. I tried parsing the page with SimpleXML, but had no luck because the page is not valid XML. Can you suggest another easy way to get the contents of the <title> tag? Thanks.

+4

html xml php curl libcurl

Aleksejs Popovs Jun 28 '10 at 19:25

4 answers

Or you can use Simple HTML DOM.

+1

Sarfraz Jun 28 '10 at 19:32

You can use this script to get the page title.

 # Script Title.txt
 var str page, content
 cat $page > $content
 stex -r -c "^<title&</title&\>^" $content

Save this short script as C:/Scripts/Title.txt. It is written in biterscripting. Start biterscripting and enter this command:

 script "C:/Scripts/Title.txt" page("http://stackoverflow.com/questions/3135488/how-can-i-get-pages-title-tags-content-if-it-cant-be-parsed-as-xml")

It will get the title of this page (the one you are viewing). Pass any other URL or local file path as the value of page(), enclosed in double quotes. When I ran this command, I got:

How can I get the contents of the <title> tag if it cannot be parsed as XML? - Stack Overflow

You can call this script from any executable or batch file.
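The script above finds the title by pattern matching instead of parsing the page. For readers who want to stay in PHP, the language of the question, the same idea can be sketched with preg_match. This is a swapped-in PHP technique, not a port of the biterscripting code, and the function name title_by_pattern is hypothetical. It assumes the page contains at most one relevant <title> element; a <title> that appears inside a comment or a script would fool it.

```php
<?php
// Hypothetical helper: pattern-based <title> extraction, no HTML parser involved.
function title_by_pattern(string $html): ?string
{
    // i: case-insensitive; s: "." also matches newlines.
    // [^>]* tolerates attributes on the opening tag.
    if (preg_match('~<title\b[^>]*>(.*?)</title\s*>~is', $html, $m) !== 1) {
        return null; // no <title>...</title> pair found
    }

    // Decode entities such as &lt; and collapse runs of whitespace.
    $text = html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Works on markup that is not valid XML (uppercase tags, nothing closed).
echo title_by_pattern("<HTML><TITLE>\n  Get the &lt;title&gt; tag\n</TITLE>"), "\n";
// prints: Get the <title> tag
```

The result is null when no title is present, so the caller can tell a missing title apart from an empty one.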

0

PM Jun 28 '10 at 19:47

Try using the Yahoo YQL console. You can query almost any URL and have the results returned as XML. You can even add an XPath expression to narrow them down.

http://developer.yahoo.com/yql/console/

Perhaps you can call this service using curl. It is very convenient.

0

misterte Jun 29 '10 at 1:05

Artefacto · Accepted Answer · 2010-06-28T19:27:31+0000

You can use DOMDocument::loadHTML.

This will echo "The title":

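<?php
// Deliberately malformed markup: <head> is never closed.
$doc = <<<HTML
<html>
<head>
<title>The title</title>
<body>
hhhhhh
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$d = new DOMDocument;
$d->loadHTML($doc);
$ts = $d->getElementsByTagName("title");
if ($ts->length > 0) {
    echo $ts->item(0)->textContent;
}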
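Since the question loads the page with libcurl, here is a minimal sketch of how the DOMDocument approach might be wrapped for reuse. The helper names extract_title and fetch_html are hypothetical, as are the curl options chosen; adjust them to your setup. The demo at the end runs on an inline string, so it needs no network access.

```php
<?php
// Hypothetical helper: returns the trimmed <title> text, or null if there is none.
function extract_title(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings internally instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $dom->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }
    return trim($titles->item(0)->textContent);
}

// Hypothetical helper: fetches a page body with the curl extension.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    return is_string($body) ? $body : null; // curl_exec() returns false on failure
}

// Demo on an inline, deliberately malformed document.
$sample = '<html><head><title> The title </title><body><p>unclosed';
echo extract_title($sample), "\n";
// prints: The title
```

With a live page, the two helpers would be combined as extract_title(fetch_html($url) ?? ''), which yields null if either the download or the title lookup fails.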
