Mixed XML decoding in Go while preserving order

I need to extract offers from XML while preserving the order of the nodes:

    <items>
      <offer />
      <product>
        <offer />
        <offer />
      </product>
      <offer />
      <offer />
    </items>

The following structure will decode the values, but in two different slices, which will lead to the loss of the original order:

    type Offers struct {
        Offers   []offer `xml:"items>offer"`
        Products []offer `xml:"items>product>offer"`
    }

Any ideas?

1 answer

One way is to implement a custom UnmarshalXML method. Let's say our input is as follows:

    <doc>
      <head>My Title</head>
      <p>A first paragraph.</p>
      <p>A second one.</p>
    </doc>

We want to deserialize the document and preserve the order of the heading and the paragraphs. To keep the order, we need a slice. To accommodate both head and p, we need an interface. We could define our document as follows:

    type Document struct {
        XMLName  xml.Name `xml:"doc"`
        Contents []Mixed  `xml:",any"`
    }

The ,any annotation collects every otherwise unmatched child element into Contents. Its element type is Mixed, which we still need to define:

    type Mixed struct {
        Type  string      // just keep "head" or "p" in here
        Value interface{} // keep the value, we could use string here, too
    }

We need more control over the deserialization process, so we turn Mixed into an xml.Unmarshaler by implementing UnmarshalXML. We choose a code path based on the name of the start element, e.g. head or p. Here we just fill our Mixed structure with some values, but you can do basically anything here:

    func (m *Mixed) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
        switch start.Name.Local {
        case "head", "p":
            // Decode the element's character data into a plain string.
            var e string
            if err := d.DecodeElement(&e, &start); err != nil {
                return err
            }
            m.Value = e
            m.Type = start.Name.Local
        default:
            return fmt.Errorf("unknown element: %s", start.Name.Local)
        }
        return nil
    }

Putting it all together, using the above types might look like this:

    func main() {
        s := `
        <doc>
          <head>My Title</head>
          <p>A first paragraph.</p>
          <p>A second one.</p>
        </doc>
        `
        var doc Document
        if err := xml.Unmarshal([]byte(s), &doc); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("#%v", doc)
    }

This prints:

    #{{ doc} [{head My Title} {p A first paragraph.} {p A second one.}]}

We kept the order and preserved some type information. Instead of a single type like Mixed, you could decode into many different types. The cost of this approach is that your container, here the document's Contents field, holds values typed as an interface. To do something specific to an element, you need a type assertion (or a type switch) or some helper method.
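As a rough sketch of the "many different types" variant, the following decodes head and p into two separate types and recovers them with a type switch. The Head and Paragraph types and the describe and render helpers are hypothetical names made up for this illustration; they are not part of the code above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sample is the document used throughout this answer.
const sample = `<doc><head>My Title</head><p>A first paragraph.</p><p>A second one.</p></doc>`

// Head and Paragraph are hypothetical concrete types, one per element name.
type Head struct {
	Title string `xml:",chardata"`
}

type Paragraph struct {
	Text string `xml:",chardata"`
}

// Mixed wraps whichever concrete value was decoded.
type Mixed struct {
	Value interface{}
}

type Document struct {
	XMLName  xml.Name `xml:"doc"`
	Contents []Mixed  `xml:",any"`
}

// UnmarshalXML picks the concrete type from the element name.
func (m *Mixed) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	switch start.Name.Local {
	case "head":
		var h Head
		if err := d.DecodeElement(&h, &start); err != nil {
			return err
		}
		m.Value = h
	case "p":
		var p Paragraph
		if err := d.DecodeElement(&p, &start); err != nil {
			return err
		}
		m.Value = p
	default:
		return fmt.Errorf("unknown element: %s", start.Name.Local)
	}
	return nil
}

// describe uses a type switch to get back to the concrete type.
func describe(m Mixed) string {
	switch v := m.Value.(type) {
	case Head:
		return "heading: " + v.Title
	case Paragraph:
		return "paragraph: " + v.Text
	default:
		return fmt.Sprintf("unexpected type %T", v)
	}
}

// render decodes s and describes each element in document order.
func render(s string) ([]string, error) {
	var doc Document
	if err := xml.Unmarshal([]byte(s), &doc); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(doc.Contents))
	for _, m := range doc.Contents {
		out = append(out, describe(m))
	}
	return out, nil
}

func main() {
	lines, err := render(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

The type switch keeps the element-specific logic in one place. If it starts to grow, a small interface with a shared method on each concrete type may be the tidier helper.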

Full code on the Go Playground: https://play.golang.org/p/fzsUPPS7py
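The same idea can be carried over to the items XML from the question. The sketch below is only one possible shape: the question does not show the fields of an offer, so the id attribute, the Offer, Product and Node types, and the flatten and decodeOffers helpers are all assumptions made for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sample mirrors the question's XML; the id attributes are made up
// so that the resulting order is visible.
const sample = `<items>
  <offer id="1"/>
  <product>
    <offer id="2"/>
    <offer id="3"/>
  </product>
  <offer id="4"/>
  <offer id="5"/>
</items>`

// Offer is a hypothetical shape for an offer element.
type Offer struct {
	ID string `xml:"id,attr"`
}

// Product groups the offers nested inside a product element.
type Product struct {
	Offers []Offer `xml:"offer"`
}

// Node holds either an Offer or a Product.
type Node struct {
	Value interface{}
}

type Items struct {
	XMLName xml.Name `xml:"items"`
	Nodes   []Node   `xml:",any"`
}

// UnmarshalXML dispatches on the element name, as Mixed does above.
func (n *Node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	switch start.Name.Local {
	case "offer":
		var o Offer
		if err := d.DecodeElement(&o, &start); err != nil {
			return err
		}
		n.Value = o
	case "product":
		var p Product
		if err := d.DecodeElement(&p, &start); err != nil {
			return err
		}
		n.Value = p
	default:
		return fmt.Errorf("unknown element: %s", start.Name.Local)
	}
	return nil
}

// flatten returns every offer in document order, whether it sits
// directly under items or inside a product.
func flatten(items Items) []Offer {
	var out []Offer
	for _, n := range items.Nodes {
		switch v := n.Value.(type) {
		case Offer:
			out = append(out, v)
		case Product:
			out = append(out, v.Offers...)
		}
	}
	return out
}

// decodeOffers parses s and returns the flattened offers.
func decodeOffers(s string) ([]Offer, error) {
	var items Items
	if err := xml.Unmarshal([]byte(s), &items); err != nil {
		return nil, err
	}
	return flatten(items), nil
}

func main() {
	offers, err := decodeOffers(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, o := range offers {
		fmt.Println(o.ID)
	}
}
```

Keeping Product as its own node, rather than flattening during decoding, leaves the grouping available if it matters later; flatten can be dropped if only the nested structure is needed.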
