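First, don't try to use regex for this. The odds are good that you'll end up with a fragile/brittle solution that breaks as soon as the HTML changes, or that becomes unmaintainable.
You can get part of the way there quickly by using Nokogiri to parse the HTML and extract the text: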
require 'nokogiri'
html = '
<html>
<body>
<p>This is
some text.</p>
<p>This is some more text.</p>
<pre>
This is
preformatted
text.
</pre>
</body>
</html>
'
# Parse the markup, then print the concatenated text nodes.
doc = Nokogiri::HTML(html)
puts doc.text
# Output:
# >> This is
# >> some text.
# >> This is some more text.
# >>
# >> This is
# >> preformatted
# >> text.
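This works because Nokogiri returns the text nodes, which are essentially the whitespace around the tags plus the text inside them. If you do a preliminary cleanup of the HTML with tidy, you can sometimes get much nicer output.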
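The problem appears when you compare the output of an HTML parser with what a browser displays. The browser is concerned with presenting the HTML as pleasingly as possible, no matter how malformed or broken that HTML is. A parser is not designed to do that.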
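You can massage the HTML before extracting the content: remove extraneous line breaks such as "\n" and "\r", then replace the <br> tags with line breaks. There are many questions here on SO that explain how to replace tags with something else. I think the Nokogiri cheat sheet covers it too.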
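If you really want to do it right, you'll also need to decide what to do with <li> tags inside <ul> and <ol>, and with tables.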
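An alternative approach is to capture the output of a text browser such as lynx. Several years ago I needed to extract keywords from websites that didn't use meta-keyword tags, and I found a text browser that let me grab the rendered output that way. I no longer have the source, so I can't check which one it was.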
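A rough sketch of the text-browser route, assuming lynx is installed and on the PATH. The helper names are invented for illustration, and the flags used (`-dump`, `-stdin`, `-force_html`, `-nolist`) are worth checking against your local lynx man page:

```ruby
require 'open3'

# Hypothetical helper: true when a lynx executable is found on the PATH.
def lynx_available?
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    File.executable?(File.join(dir, 'lynx'))
  end
end

# Hypothetical helper: pipe HTML through lynx and return the rendered text,
# or nil when lynx is missing or exits with an error.
def lynx_dump(html)
  return nil unless lynx_available?

  output, status = Open3.capture2(
    'lynx', '-dump', '-stdin', '-force_html', '-nolist',
    stdin_data: html
  )
  status.success? ? output : nil
end

rendered = lynx_dump('<ul><li>one</li><li>two</li></ul>')
puts rendered || 'lynx is not available'
```

The appeal is that the browser makes all the layout decisions for lists, tables and line breaks. The cost is an external dependency and a process spawn per document.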