YAML loading with line number for each key

Let's say I have a YAML file that looks like this:

```yaml
en:
  errors:
    # Some comment
    format: "%{attribute} %{message}"

    # One more comment
    messages:
      "1": "Message 1"
      "2": "Message 2"

  long_error_message: |
    This is a
    multiline message

  date:
    format: "YYYY-MM-DD"
```

How can I read this into a Ruby Hash like this?

```ruby
{
  'en': {
    'errors': {
      'format': { value: '%{attribute} %{message}', line: 4 },
      'messages': {
        '1': { value: 'Message 1', line: 8 },
        '2': { value: 'Message 2', line: 9 }
      }
    },
    'long_error_message': { value: "This is a\nmultiline message", line: 11 },
    'date': {
      'format': { value: 'YYYY-MM-DD', line: 16 }
    }
  }
}
```

I tried using the tip from "YAML: Find key line number?" as a starting point and implemented a Psych::Handler, but it seemed that I would have to rewrite a lot of Psych's code to make it work.

Any ideas how I can solve this?

ruby yaml
4 answers

It looks like you want to take every scalar that is a mapping value and replace it with a hash containing a value key, holding the original value, and a line key, holding the line number.

The following almost works. The main problem is multi-line strings, where the reported line number is the start of the next thing in the YAML. By the time the handler's scalar method is called, the parser has already moved past the scalar of interest, so mark gives the line of the position at which the parser knew the scalar had ended. In most cases in your example this does not matter, but for the multi-line case it gives the wrong value. I can't see a way to get the parser's mark for the start of a scalar without going into Psych's C code.

```ruby
require 'psych'

# Psych's first step is to parse the Yaml into an AST of Node objects,
# so we open the Node class and add a way to track the line.
class Psych::Nodes::Node
  attr_accessor :line
end

# We need to provide a handler that will add the line to the node
# as it is parsed. TreeBuilder is the "usual" handler that
# creates the AST.
class LineNumberHandler < Psych::TreeBuilder

  # The handler needs access to the parser in order to call mark
  attr_accessor :parser

  # We are only interested in scalars, so here we override
  # the method so that it calls mark and adds the line info
  # to the node.
  def scalar value, anchor, tag, plain, quoted, style
    mark = parser.mark
    s = super
    s.line = mark.line
    s
  end
end

# The next step is to convert the AST to a Ruby object.
# Psych does this using the visitor pattern with the ToRuby
# visitor. Here we patch ToRuby rather than inherit from it
# as it makes the last step a little easier.
class Psych::Visitors::ToRuby

  # This is the method for creating hashes. There may be problems
  # with Yaml mappings that have tags. The optional third argument
  # keeps the override callable from Psych versions that pass a
  # "tagged" flag.
  def revive_hash hash, o, tagged = false
    o.children.each_slice(2) { |k,v|
      key = accept(k)
      val = accept(v)

      # This is the important bit. If the value is a scalar,
      # we replace it with the desired hash.
      if v.is_a? ::Psych::Nodes::Scalar
        val = { "value" => val, "line" => v.line + 1 } # line is 0 based, so + 1
      end

      # Code dealing with << (for merging hashes) omitted.
      # If you need this you will probably need to copy it
      # in here. See the method:
      # https://github.com/tenderlove/psych/blob/v2.0.13/lib/psych/visitors/to_ruby.rb#L333-L365

      hash[key] = val
    }
    hash
  end
end

yaml = get_yaml_from_wherever # placeholder: read your YAML string here

# Put it all together
handler = LineNumberHandler.new
parser = Psych::Parser.new(handler)
# Provide the handler with a reference to the parser
handler.parser = parser

# The actual parsing
parser.parse yaml
# We patched ToRuby rather than inherit so we can use to_ruby here
puts handler.root.to_ruby
```
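If your Psych is recent enough (as far as I know, Psych 3.1 and later, bundled with Ruby 2.6+), every AST node already records where it starts in start_line (0-based), which sidesteps the parser.mark limitation described above. The following is a minimal sketch under that assumption: instead of patching ToRuby, it walks the tree returned by Psych.parse and reports the line of each key. The helper name with_key_lines is hypothetical, and aliases and << merge keys are not handled.

```ruby
require 'psych'

# Hypothetical helper: converts a Psych AST into plain Ruby data, wrapping
# every scalar mapping value in { value:, line: }.
# Assumes a Psych version whose nodes expose a 0-based start_line.
# Aliases and << merge keys are not handled in this sketch.
def with_key_lines(node)
  case node
  when Psych::Nodes::Document
    with_key_lines(node.root)
  when Psych::Nodes::Mapping
    node.children.each_slice(2).map do |key_node, value_node|
      value =
        if value_node.is_a?(Psych::Nodes::Scalar)
          # Use the key's start line, converted to 1-based numbering.
          { value: value_node.to_ruby, line: key_node.start_line + 1 }
        else
          with_key_lines(value_node)
        end
      [key_node.to_ruby, value]
    end.to_h
  when Psych::Nodes::Sequence
    node.children.map { |child| with_key_lines(child) }
  else
    node.to_ruby
  end
end

yaml = <<~YAML
  en:
    errors:
      # Some comment
      format: "%{attribute} %{message}"

      # One more comment
      messages:
        "1": "Message 1"
        "2": "Message 2"

    long_error_message: |
      This is a
      multiline message

    date:
      format: "YYYY-MM-DD"
YAML

result = with_key_lines(Psych.parse(yaml))
pp result
```

Taking the line from the key node rather than the value node is a deliberate choice, since the question asks for the line of each key; for a block scalar such as long_error_message, both start on the same line anyway.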

I suggest you go with @matt's solution. Besides being more careful, it handles plain scalars correctly.


The trick is to monkeypatch TreeBuilder#scalar:

```ruby
y = '
en:
  errors:
    # Some comment
    format: "%{attribute} %{message}"

    # One more comment
    messages:
      "1": "Message 1"
      "2": "Message 2"

  long_error_message: |
    This is a
    multiline message

  date:
    format: "YYYY-MM-DD"'

require 'yaml'

yphc = Class.new(YAML.parser.handler.class) do
  def scalar value, anchor, tag, plain, quoted, style
    value = { value: value, line: $line } if style > 1
    $line = $parser.mark.line + 1 # handle multilines properly
    super value, anchor, tag, plain, quoted, style
  end
end

$parser = Psych::Parser.new(yphc.new)
# more careful handling required for multidocs
result = $parser.parse(y).handler.root.to_ruby[0]
```

We are almost done. All that remains is to strip the line-number wrappers from the keys, so that only the leaf values keep them. I deliberately left this logic out of the handler.

```ruby
def unmark_keys hash
  hash.map do |k,v|
    [k.is_a?(Hash) ? k[:value] : k, v.is_a?(Hash) ? unmark_keys(v) : v]
  end.to_h
end

p unmark_keys result
#⇒ {"en"=>
#⇒   {"errors"=>
#⇒     {
#⇒       "format"=>{:value=>"%{attribute} %{message}", :line=>4},
#⇒       "messages"=>
#⇒         {
#⇒           "1"=>{:value=>"Message 1", :line=>8},
#⇒           "2"=>{:value=>"Message 2", :line=>9}
#⇒         }
#⇒     },
#⇒    "long_error_message"=>{
#⇒       :value=>"This is a\nmultiline message\n", :line=>11
#⇒    },
#⇒    "date"=>{"format"=>{:value=>"YYYY-MM-DD", :line=>16}}
#⇒   }
#⇒ }
```

Of course, you can get rid of the global variables, etc. I tried to keep the core implementation as simple as possible.

Here we go. Hope this helps.

UPD: Thanks to @matt for pointing out that the code above does not work on plain (unquoted) scalars:

```yaml
key1: val1
key2: val2
```

This syntax is allowed by YAML, but the approach above cannot handle it correctly: no line number is returned for such values. Apart from the missing support for plain scalars, lines are reported correctly for everything else; please refer to the comments on this answer for details.
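The reason plain scalars are skipped is the style > 1 test in the handler: Psych passes a numeric style to scalar, and Psych::Nodes::Scalar::PLAIN is 1, so only quoted and block scalars get wrapped. A small check on the key1/key2 example above, with one value quoted for comparison:

```ruby
require 'psych'

# Parse a tiny document and collect the style of each mapping value.
doc = Psych.parse("key1: val1\nkey2: \"val2\"\n")
styles = doc.root.children.each_slice(2).map { |key, value| [key.value, value.style] }.to_h

# Plain scalars report PLAIN (1); double-quoted ones DOUBLE_QUOTED (3).
pp styles
```

Changing the condition so it also accepts Psych::Nodes::Scalar::PLAIN would wrap plain values too, but then plain keys get wrapped as well, which is what unmark_keys has to undo.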


I took @matt's solution and created a version that does not require monkey-patching. It also handles values that span multiple lines and the YAML << (merge) operator.

```ruby
require "psych"
require "pp"

ValueWithLineNumbers = Struct.new(:value, :lines)

class Psych::Nodes::ScalarWithLineNumber < Psych::Nodes::Scalar
  attr_reader :line_number

  def initialize(*args, line_number)
    super(*args)
    @line_number = line_number
  end
end

class Psych::TreeWithLineNumbersBuilder < Psych::TreeBuilder
  attr_accessor :parser

  def scalar(*args)
    node = Psych::Nodes::ScalarWithLineNumber.new(*args, parser.mark.line)
    @last.children << node
    node
  end
end

class Psych::Visitors::ToRubyWithLineNumbers < Psych::Visitors::ToRuby
  def visit_Psych_Nodes_ScalarWithLineNumber(node)
    visit_Psych_Nodes_Scalar(node)
  end

  private

  # The optional third argument keeps this callable from Psych
  # versions that pass a "tagged" flag.
  def revive_hash(hash, node, tagged = false)
    node.children.each_slice(2) do |k, v|
      key = accept(k)
      val = accept(v)

      if v.is_a? Psych::Nodes::ScalarWithLineNumber
        start_line = end_line = v.line_number + 1

        if k.is_a? Psych::Nodes::ScalarWithLineNumber
          start_line = k.line_number + 1
        end
        val = ValueWithLineNumbers.new(val, start_line..end_line)
      end

      if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
        case v
        when Psych::Nodes::Alias, Psych::Nodes::Mapping
          begin
            hash.merge! val
          rescue TypeError
            hash[key] = val
          end
        when Psych::Nodes::Sequence
          begin
            h = {}
            val.reverse_each do |value|
              h.merge! value
            end
            hash.merge! h
          rescue TypeError
            hash[key] = val
          end
        else
          hash[key] = val
        end
      else
        hash[key] = val
      end
    end
    hash
  end
end

# Usage (yaml holds the YAML source string):
handler = Psych::TreeWithLineNumbersBuilder.new
handler.parser = Psych::Parser.new(handler)

handler.parser.parse(yaml)

ruby_with_line_numbers = Psych::Visitors::ToRubyWithLineNumbers.create.accept(handler.root)

pp ruby_with_line_numbers
```

I posted the text above as well as some comments and examples
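For reference, the << branch in revive_hash mirrors how stock Psych treats YAML merge keys: the mapping behind the alias is merged into the enclosing mapping. A minimal illustration with plain Psych, assuming a version whose safe_load accepts the aliases: keyword (aliases are rejected by default); the defaults/development keys are made up for the example:

```ruby
require 'psych'

yaml = <<~YAML
  defaults: &defaults
    adapter: postgres
  development:
    <<: *defaults
    database: dev
YAML

# safe_load rejects aliases unless aliases: true is passed.
data = Psych.safe_load(yaml, aliases: true)
pp data
```

Here development ends up with both adapter and database, which is the behaviour the SHOVEL handling above preserves while adding line numbers.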


We can add the line numbers manually: recurse through the hash that Psych parses and look up the line number of each key in the file's lines. The following code produces the structure you specified.

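```ruby
require 'psych'

# Replace each leaf value with { "value" => value, "line" => n }, where n is
# the 1-based line of its key. This is a text search, so it is a heuristic:
# the search for a key starts just after its parent's line, which keeps a
# repeated key such as "format" from matching an earlier occurrence, but
# unusual layouts (flow mappings, key-like text inside values) can fool it.
def add_line_numbers(lines, hash, from = 0)
  # Iterate over a copy so the original hash can be updated safely.
  hash.dup.each do |key, value|
    # Match the key at the start of a line, optionally quoted, then a colon.
    pattern = /^\s*["']?#{Regexp.escape(key.to_s)}["']?\s*:/
    index = (from...lines.size).find { |i| lines[i] =~ pattern }

    if value.is_a?(Hash)
      add_line_numbers(lines, value, index + 1)
    else
      hash[key] = { "value" => value, "line" => index + 1 }
    end
  end
end

yaml_file = File.expand_path('../foo.yml', __FILE__)
lines = File.readlines(yaml_file)
# readlines keeps the trailing newlines, so join without a separator.
data = Psych.load(lines.join)
add_line_numbers(lines, data)
puts data
```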
