I need to turn HTML into equivalent Markdown-structured text.
Note: I'm looking for a quick and easy way to do this using PHP and Python.
As I program in PHP, some people recommend Markdownify for this job, but unfortunately the code is no longer being updated and in fact it does not work. sourceforge.net/projects/markdownify carries this notice: "NOTE: unsupported - do you want to maintain this project? Contact me! Markdownify is an HTML to Markdown converter written in PHP. See it as the successor to html2text.php, since it has better design, better performance and fewer corner cases."
From what I could find, I have only two good options: a Python script (html2text) and a Ruby script (html2markdown).
So, from PHP, I need to pass in the HTML code, call the Ruby/Python script, and get the output back.
(By the way, someone asked a similar question here ("how to call ruby script from php?"), but without practical information for my case.)
Following the Tin Man's suggestion (below), I got to this:
PHP code:
    $t = '<p><b>Hello</b><i>world!</i></p>';
    $scaped = preg_quote($t, "/");
    $program = 'python html2md.py';
    //exec($program.' '.$scaped,$n); print_r($n); exit; //Works!!!
    $input = $t;
    $descriptorspec = array(
        array('pipe', 'r'), // stdin is a pipe that the child will read from
        array('pipe', 'w'), // stdout is a pipe that the child will write to
        array('file', './error-output.txt', 'a') // stderr is a file to write to
    );
    $process = proc_open($program, $descriptorspec, $pipes);
    if (is_resource($process)) {
        fwrite($pipes[0], $input);
        fclose($pipes[0]);
        $r = stream_get_contents($pipes[1]);
        fclose($pipes[1]);
        $return_value = proc_close($process);
        echo "command returned $return_value\n";
        print_r($pipes);
        print_r($r);
    }
Python Code:
    #! /usr/bin/env python
    import html2text
    import sys
    print html2text.html2text(sys.argv[1])
    #print "Hi!" #works!!!
With the above, I get the following:
    command returned 1
    Array
    (
        [0] => Resource id #17
        [1] => Resource id #18
    )
And the file "error-output.txt" says:
    Traceback (most recent call last):
      File "html2md.py", line 5, in <module>
        print html2text.html2text(sys.argv[1])
    IndexError: list index out of range
Any ideas?
Ruby code (still being looked at):
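One thing worth checking: with proc_open the HTML is written to the child's stdin and no command-line argument is passed, so sys.argv holds only the script name and sys.argv[1] would raise exactly this IndexError. Below is a minimal sketch of a script that accepts either source. It assumes html2text is importable, and `read_html` is a hypothetical helper name, not part of html2text.

```python
import sys


def read_html(argv, stdin):
    """Return the HTML from the first CLI argument if given, else from stdin.

    `read_html` is a hypothetical helper for this sketch.
    """
    if len(argv) > 1:
        return argv[1]
    return stdin.read()


def main():
    # Imported here so the helper above can be used without html2text installed.
    import html2text

    html = read_html(sys.argv, sys.stdin)
    sys.stdout.write(html2text.html2text(html))
```

Calling main() at the bottom of html2md.py should then work both with the proc_open version (stdin) and with the exec version (argument).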
    #!/usr/bin/env ruby
    require_relative 'html2markdown'
    puts HTML2Markdown.new("<h1>#{ ARGF.read }</h1>").to_s
Just for the record, I tried PHP's simpler exec() before, but I had problems with some special characters that are very common in HTML.
PHP code:
    echo exec('./hi.rb');
    echo exec('./hi.py');
Ruby Code:
    #!/usr/bin/ruby
    puts "Hello World!"
Python Code:
    #!/usr/bin/python
    import sys
    print sys.argv[1]
Both work fine. But when the string is a little more complicated:
    $h = '<p><b>Hello</b><i>world!</i></p>';
    echo exec("python hi.py $h");
This did not work.
This is because the HTML string contains characters that are special to the shell and need escaping. I got it working with this:
    $t = '<p><b>Hello</b><i>world!</i></p>';
    $scaped = preg_quote($t, "/");
Now it works, as I described here.
I'm using: Fedora 14, ruby 1.8.7, Python 2.7, perl 5.12.2, PHP 5.3.4, nginx 0.8.53
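For what it's worth, preg_quote is meant for escaping regular expressions. It probably helps here only because the backslashes it adds before `<`, `>` and `!` also happen to work as shell escapes. PHP's escapeshellarg is the function intended for shell arguments. The same idea can be sketched in Python. This assumes Python 3.7+ (newer than the 2.7 I have installed), and the inline child script is just a stand-in for hi.py.

```python
import shlex
import subprocess
import sys

html = "<p><b>Hello</b><i>world!</i></p>"

# Unquoted, a shell would read < and > as redirections.
# shlex.quote wraps the string so the shell sees it as one literal argument.
quoted = shlex.quote(html)

# Passing an argument list bypasses the shell entirely, so no quoting is needed.
# The -c script below simply echoes its first argument back, like hi.py.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", html],
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout
```

Either approach keeps the HTML intact as a single argument. Sending it through stdin avoids the quoting question altogether.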