Is there an easy way in Linux to extract the text of a website from the command line?

I am looking for a command-line tool that turns HTML into the text that would actually be displayed on the site, in other words, the equivalent of selecting everything in a web browser and pasting it into a text editor.

Does anyone know of something in Ubuntu that does this? I am trying to write a script to parse some web pages, and would rather work with the text that appears on the site than deal with the HTML directly.

Thanks,

Dan

+6
html linux bash parsing
3 answers
lynx -dump http://example.com/ 
+12

If you already have the HTML file:

 lynx -dump file.html > file.txt 

Otherwise, use @Ignacio's answer.

+7

I think you need lynx:

 lynx -dump http://stackoverflow.com > file 
+3
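If lynx isn't installed and you can't add it, a rough fallback is to strip the tags yourself. This is a minimal sketch using only Python's standard-library `html.parser`; the class and function names are made up for illustration, and it won't handle entities, layout, or malformed markup nearly as well as `lynx -dump` does:

```python
# Minimal sketch: extract visible text from HTML using only the
# standard library, as a fallback when lynx is not available.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []   # collected text fragments
        self.skip = 0     # depth inside <script>/<style>, whose content is not displayed

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    sample = "<html><body><h1>Hi</h1><p>there</p><script>x=1</script></body></html>"
    print(html_to_text(sample))  # prints "Hi there"
```

For a real script you would feed it the page source (e.g. read from a file or fetched with `urllib.request`), but for anything beyond quick parsing, lynx or a proper HTML library is the better tool.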
