I mainly use Ruby for this, and my plan of attack so far looks like this:
Use the rdf and rdf-rdfa gems, plus either rdf-microdata or mida, to parse the data at any URI. I think it is best to map everything onto a single schema, for example schema.org. As an illustration, take this YAML file, which tries to describe the conversion from data-vocabulary and opengraph to schema.org:
# Schema X to schema.org conversion
# data-vocabulary
DV:
  name: name
  street-address: streetAddress
  region: addressRegion
  locality: addressLocality
  photo: image
  country-name: addressCountry
  postal-code: postalCode
  tel: telephone
  latitude: latitude
  longitude: longitude
  type: type
# opengraph
OG:
  title: name
  type: type
  image: image
  site_name: site_name
  description: description
  latitude: latitude
  longitude: longitude
  street-address: streetAddress
  locality: addressLocality
  region: addressRegion
  postal-code: postalCode
  country-name: addressCountry
  phone_number: telephone
  email: email
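A minimal sketch of how a file like this could be loaded and flattened into one lookup hash, using only the standard library. The inline MAPPING_YAML string and the build_all_tags helper are hypothetical names for illustration; in practice the text would come from the file on disk.

```ruby
require 'yaml'

# Hypothetical inline copy of part of the mapping file; in practice this
# text would be read from disk, e.g. with File.read.
MAPPING_YAML = <<~YAML
  DV:
    name: name
    street-address: streetAddress
    tel: telephone
  OG:
    title: name
    phone_number: telephone
YAML

# Merge every vocabulary section into one hash of
# source tag => schema.org property name.
def build_all_tags(yaml_text)
  YAML.safe_load(yaml_text).values.reduce({}) { |merged, section| merged.merge(section) }
end

all_tags = build_all_tags(MAPPING_YAML)
puts all_tags['street-address']
puts all_tags['title']
```

If the same source tag appears in more than one section, the later section wins in this sketch. That is harmless as long as both sections map the tag to the same schema.org name.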
Then I can store the information I find in one format and re-display it using the schema.org syntax.
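For the re-display step, here is a rough sketch that renders a stored hash as schema.org microdata. The to_microdata helper and its item_type parameter are hypothetical names, and the output is deliberately flat: on schema.org the address properties belong to a nested PostalAddress, which this sketch does not attempt.

```ruby
require 'erb'

# Render a flat hash of schema.org property => value as microdata.
# `item_type` is a schema.org type name such as "Thing" or "BarOrPub".
def to_microdata(results, item_type: 'Thing')
  props = results.map do |prop, value|
    %(  <span itemprop="#{ERB::Util.html_escape(prop)}">#{ERB::Util.html_escape(value)}</span>)
  end
  opening = %(<div itemscope itemtype="https://schema.org/#{ERB::Util.html_escape(item_type)}">)
  [opening, *props, '</div>'].join("\n")
end

html = to_microdata({ 'name' => 'Tom & Jerry Bar', 'telephone' => '555-0100' },
                    item_type: 'BarOrPub')
puts html
```

Values are HTML-escaped before being written out, since they come from pages I do not control.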
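Types are the other problem. In schema.org the most generic type is "Thing" (Thing), so that can serve as the default. A more specific type needs its own mapping: for example, opengraph's "bar" would become "BarOrPub" (BarOrPub).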
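One way to handle types is a lookup table with a generic fallback. This is only a sketch: OG_TYPE_TO_SCHEMA and schema_type_for are hypothetical names, and apart from "bar" the entries are illustrative guesses rather than a vetted mapping.

```ruby
# Hypothetical opengraph type => schema.org type table. Only the "bar"
# entry comes from the discussion here; the others are illustrative guesses.
OG_TYPE_TO_SCHEMA = {
  'bar'        => 'BarOrPub',
  'restaurant' => 'Restaurant',
  'hotel'      => 'Hotel'
}.freeze

# Fall back to the generic "Thing" when no specific mapping is known.
def schema_type_for(og_type)
  OG_TYPE_TO_SCHEMA.fetch(og_type.to_s.strip.downcase, 'Thing')
end

puts schema_type_for('bar')
puts schema_type_for('something-else')
```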
EDIT:
Here is the parsing code I have so far (all_tags is a hash of source tags mapped to their schema.org names):
require 'rdf/rdfa'

results = {}
RDF::RDFa::Reader.open(url) do |reader|
  reader.each_statement do |statement|
    # Use the last path or fragment segment of the predicate URI as the tag name.
    tag = statement.predicate.to_s.split('/')[-1].split('#')[-1]
    Rails.logger.debug "rdf tag: #{tag}"
    Rails.logger.debug "rdf predicate: #{statement.predicate}"
    if all_tags.key?(tag)
      Rails.logger.debug "Found mapping for #{statement.predicate} and #{all_tags[tag]}"
      # Store the value under its schema.org name.
      results[all_tags[tag]] = statement.object.to_s.strip
    end
  end
end
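To try the mapping logic without fetching a page or installing the RDF gems, the same idea can be sketched over plain [predicate, object] pairs. The local_name and map_pairs helpers are hypothetical names, and the sample pairs are made up for illustration.

```ruby
# Take the part of a predicate URI after the last "/" or "#".
def local_name(predicate)
  predicate.to_s.split(%r{[/#]}).last
end

# `pairs` is an array of [predicate, object] strings, standing in for the
# statements an RDF reader would yield.
def map_pairs(pairs, all_tags)
  pairs.each_with_object({}) do |(predicate, object), results|
    target = all_tags[local_name(predicate)]
    results[target] = object.to_s.strip if target
  end
end

all_tags = { 'title' => 'name', 'phone_number' => 'telephone' }
pairs = [
  ['http://ogp.me/ns#title', ' Example Bar '],
  ['http://ogp.me/ns#phone_number', '555-0100'],
  ['http://ogp.me/ns#locale', 'en_US'] # no mapping, so it is skipped
]
mapped = map_pairs(pairs, all_tags)
p mapped
```

Predicates without an entry in all_tags are silently dropped here; logging them instead would make it easier to spot tags that still need a mapping.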