Is there an easy way in Linux to extract the text of a website from the command line?

I am looking for a command-line tool that turns HTML into the text that would actually be displayed on the site, in other words, the equivalent of selecting everything in a web browser and pasting it into a text editor.

Does anyone know of something in Ubuntu that does this? I am trying to write a script to parse some web pages, and would rather work with the text that appears on the site than deal with the HTML directly.

Thanks,

Dan

+6
html linux bash parsing
3 answers
lynx -dump http://example.com/ 
+12

If you already have the HTML file:

 lynx -dump file.html > file.txt 

Otherwise, use @Ignacio's answer.

+7

I think you need lynx:

 lynx -dump http://stackoverflow.com > file 
+3
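If lynx isn't installed and you can't add it, a rough fallback is to strip the tags yourself. This is a minimal sketch using only Python's standard-library `html.parser`; the class and function names are made up for illustration, and it won't handle entities, layout, or malformed markup nearly as well as `lynx -dump` does:

```python
# Minimal sketch: extract visible text from HTML using only the
# standard library, as a fallback when lynx is not available.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []   # collected text fragments
        self.skip = 0     # depth inside <script>/<style>, whose content is not displayed

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    sample = "<html><body><h1>Hi</h1><p>there</p><script>x=1</script></body></html>"
    print(html_to_text(sample))  # prints "Hi there"
```

For a real script you would feed it the page source (e.g. read from a file or fetched with `urllib.request`), but for anything beyond quick parsing, lynx or a proper HTML library is the better tool.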
