Search and replace with a Ruby regular expression

I have a BLOB/text column in MySQL that contains HTML. I need to change part of the markup, so I decided to do it in a Ruby script. Ruby isn't essential here, but it would be nice to see an answer that uses it. The markup is as follows:

    <h5>foo</h5>
    <table>
    <tbody>
    </tbody>
    </table>
    <h5>bar</h5>
    <table>
    <tbody>
    </tbody>
    </table>
    <h5>meow</h5>
    <table>
    <tbody>
    </tbody>
    </table>

I need to change only the first block, <h5>foo</h5>, to <h2>something_else</h2>, leaving the rest of the string untouched.

I can't seem to come up with the correct PCRE regular expression in Ruby.

3 answers
    # The regex literal syntax using %r{...} allows / in your regex without escaping
    new_str = my_str.sub( %r{<h5>[^<]+</h5>}, '<h2>something_else</h2>' )

Using String#sub instead of String#gsub replaces only the first match. If you need to choose what "foo" is dynamically, you can use string interpolation in the regular expression literal:

 new_str = my_str.sub( %r{<h5>#{searchstr}</h5>}, "<h2>#{replacestr}</h2>" ) 
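One caveat with interpolation: if the search text may contain characters that are special in a regex (such as `.`, `*` or parentheses), interpolating it raw changes what the pattern means. A minimal sketch, assuming hypothetical sample values for my_str, searchstr and replacestr, that escapes the text with Regexp.escape first:

```ruby
# Hypothetical sample values; only the variable names come from the answer.
my_str     = "<h5>foo (v1.0)</h5> <table></table> <h5>foo (v1.0)</h5>"
searchstr  = "foo (v1.0)"
replacestr = "something_else"

# Regexp.escape backslash-escapes regex metacharacters, so the parentheses
# and the dot in searchstr are matched literally instead of as a group
# and a wildcard.
pattern = %r{<h5>#{Regexp.escape(searchstr)}</h5>}

# sub replaces only the first match; the second heading is left alone.
new_str = my_str.sub(pattern, "<h2>#{replacestr}</h2>")
puts new_str
```

Without the escape, the parentheses in this sample would be read as a capture group and the pattern would not match the heading at all.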

Then again, if you know exactly what "foo" is, you do not need a regular expression at all; a plain string works as the pattern:

    new_str = my_str.sub( "<h5>#{searchstr}</h5>", "<h2>#{replacestr}</h2>" )

or even, modifying the string in place:

    my_str[ "<h5>#{searchstr}</h5>" ] = "<h2>#{replacestr}</h2>"
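The index-assignment form behaves a little differently from sub, which is worth seeing once. The sketch below uses hypothetical sample data; it shows that the assignment mutates the string in place, touches only the first occurrence, and raises IndexError when the text is missing, whereas sub would simply return an unchanged copy:

```ruby
# Hypothetical sample data; .dup guards against frozen string literals.
my_str     = "<h5>foo</h5> <table></table> <h5>foo</h5>".dup
replacestr = "something_else"

# String#[]= with a string index replaces the first occurrence in place.
my_str["<h5>foo</h5>"] = "<h2>#{replacestr}</h2>"
puts my_str

# Unlike sub, the assignment raises IndexError if the text is not present.
begin
  my_str["<h5>missing</h5>"] = "<h2>x</h2>"
rescue IndexError => e
  puts "not found: #{e.message}"
end
```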

If you need to run code to determine the replacement, you can use the block form of sub:

    new_str = my_str.sub %r{<h5>([^<]+)</h5>} do |full_match|
      # The expression returned from this block will be used as the replacement string
      # $1 will be the matched content between the h5 tags.
      "<h2>#{replacestr}</h2>"
    end
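As an illustration of actually using the captured text inside the block, here is a small sketch. The renames hash is a hypothetical lookup table, not something from the question; the block looks up the captured heading and falls back to the original text when no rename is defined:

```ruby
# Hypothetical sample markup and rename table.
my_str  = "<h5>foo</h5> <table></table> <h5>bar</h5>"
renames = { "foo" => "something_else" }

new_str = my_str.sub(%r{<h5>([^<]+)</h5>}) do |full_match|
  heading = $1  # text captured between the h5 tags
  # Use the mapped name if there is one, otherwise keep the heading text.
  "<h2>#{renames.fetch(heading, heading)}</h2>"
end

puts new_str
```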

Whenever I have to parse or modify HTML or XML, I reach for a parser. I almost never consider regex unless the task is absolutely trivial.

Here's how to do it using Nokogiri, without any regex:

    text = <<EOT
    <h5>foo</h5>
    <table>
    <tbody>
    </tbody>
    </table>
    <h5>bar</h5>
    <table>
    <tbody>
    </tbody>
    </table>
    <h5>meow</h5>
    <table>
    <tbody>
    </tbody>
    </table>
    EOT

    require 'nokogiri'

    fragment = Nokogiri::HTML::DocumentFragment.parse(text)
    print fragment.to_html

    fragment.css('h5').select{ |n| n.text == 'foo' }.each do |n|
      n.name = 'h2'
      n.content = 'something_else'
    end

    print fragment.to_html

After parsing, this is what Nokogiri returned from the fragment:

    # >> <h5>foo</h5>
    # >> <table><tbody></tbody></table><h5>bar</h5>
    # >> <table><tbody></tbody></table><h5>meow</h5>
    # >> <table><tbody></tbody></table>

And this is the output after the modification has run:

    # >> <h2>something_else</h2>
    # >> <table><tbody></tbody></table><h5>bar</h5>
    # >> <table><tbody></tbody></table><h5>meow</h5>
    # >> <table><tbody></tbody></table>

Use String#gsub with the regex <h5>[^<]+<\/h5> (note that gsub replaces every match; use String#sub if only the first one should change):

    >> current = "<h5>foo</h5>\n <table>\n <tbody>\n </tbody>\n </table>"
    >> updated = current.gsub(/<h5>[^<]+<\/h5>/){"<h2>something_else</h2>"}
    => "<h2>something_else</h2>\n <table>\n <tbody>\n </tbody>\n </table>"
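The irb sample above contains a single h5, so it does not show how gsub and sub differ. A short sketch on markup with three headings, modeled on the question's sample (the variable names here are illustrative), makes the difference visible:

```ruby
# Sample markup with three headings, modeled on the question.
html = "<h5>foo</h5> <table></table> <h5>bar</h5> <table></table> <h5>meow</h5>"

pattern     = %r{<h5>[^<]+</h5>}
replacement = "<h2>something_else</h2>"

first_only = html.sub(pattern, replacement)   # replaces the first match only
every_one  = html.gsub(pattern, replacement)  # replaces all matches

puts first_only
puts every_one
```

For the task as stated in the question, where bar and meow must stay as h5, sub is the method that fits.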

Note: you can easily test Ruby regular expressions in your browser.


Source: https://habr.com/ru/post/651111/

