I assume that there will be many Recruit Profile profiles, followed by tables that complete all the details. The following method takes your entire HTML page, finds only those spaces, and for each of them, it finds the following table, and then finds the necessary fields somewhere below this table:
require 'nokogiri'
def recruits_details(html,fields=%W[Name #{"EDU ID"} Phone Email Gender])
doc = Nokogiri::HTML(html)
recruit_labels = doc.xpath('//span[b[text()="Recruit Profile"]]')
recruit_labels.map do |recruit_label|
recruit_table = recruit_label.at_xpath('following-sibling::table')
Hash[ fields.map do |field_label|
label_td = recruit_table.at_xpath(".//td[b[text()='#{field_label}']]")
[field_label, label_td.at_xpath('following-sibling::td/text()').text ]
end ]
end
end
require 'pp'
pp recruits_details(html_string)
XPath, .//foo[bar[text()="jim"]], :
- "foo" - node
- ... "bar"
- ... "bar" "jim"
XPath, following-sibling::... " , node, ...
XPath .../text() Text node; text ( ) node.
Nokogiri xpath , , at_xpath , .
source
share