How can I get the contents of the <title> tag if it cannot be parsed as XML?

I am using PHP with libcurl to load a page. Now I need to get the contents of its <title> tag and other information. I tried parsing the page with SimpleXML, but had no luck because the page is not valid XML. Can you suggest another easy way to get the contents of the <title> tag? Thanks.

+4
4 answers

You can use DOMDocument::loadHTML.

This will echo "The title":

<?php
$doc = <<<HTML
<html>
<head>
<title>The title</title>
<body>
hhhhhh
HTML;

libxml_use_internal_errors(true);
$d = new DOMDocument;
$d->loadHTML($doc);
$ts = $d->getElementsByTagName("title");
if ($ts->length > 0) {
    echo $ts->item(0)->textContent;
}
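Since the question loads the page with libcurl, the same approach can be wrapped around a curl fetch. This is only a minimal sketch: `extract_title` and `fetch_html` are hypothetical helper names, not part of any library, and the curl options shown are one reasonable choice rather than a requirement.

```php
<?php
// Hypothetical helper: return the text of the first <title> element,
// or null if the markup has none. Tolerates invalid HTML.
function extract_title(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $dom->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }
    return trim($titles->item(0)->textContent);
}

// Hypothetical helper: download a page body with curl, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Usage with an in-memory document, so no network access is needed:
echo extract_title('<html><head><title>The title</title><body>hhhhhh'); // The title
```

For a live page, something like `extract_title(fetch_html($url) ?? '')` should work, assuming the curl extension is enabled.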
+3

Or you can use Simple HTML DOM.

+1

You can use this script to get the page title.

 # Script Title.txt
 var str page, content
 cat $page > $content
 stex -r -c "^<title&</title&\>^" $content

Save this script as C:/Scripts/Title.txt. It is written in biterscripting. Start biterscripting and enter this command:

 script "C:/Scripts/Title.txt" page("http://stackoverflow.com/questions/3135488/how-can-i-get-pages-title-tags-content-if-it-cant-be-parsed-as-xml") 

It will get the title of this page (the one you are viewing). Use any other URL or local file path as the value of page(), enclosed in double quotes. When I ran this command, I got:

How can I get page's &lt;title&gt; tag's content if it can't be parsed as XML? - Stack Overflow

You can call this script from any executable or batch file.

0

Try the Yahoo YQL console. You can query almost any URL and have the results returned as XML. You can even add an XPath expression to narrow them down.

http://developer.yahoo.com/yql/console/

You could probably call this service using curl. It is very convenient.

0
