Bash, remove empty XML tags

I need help a few questions using bash tools

  • I want to remove empty xml tags from a file, for example:
 <CreateOfficeCode>
      <OperatorId>ve</OperatorId>
      <OfficeCode>1234</OfficeCode>
      <CountryCodeLength>0</CountryCodeLength>
      <AreaCodeLength>3</AreaCodeLength>
      <Attributes></Attributes>
      <ChargeArea></ChargeArea>
 </CreateOfficeCode>

to become:

 <CreateOfficeCode>
      <OperatorId>ve</OperatorId>
      <OfficeCode>1234</OfficeCode>
      <CountryCodeLength>0</CountryCodeLength>
      <AreaCodeLength>3</AreaCodeLength>
 </CreateOfficeCode>

for this i did it with this command

sed -i '/><\//d' file

which is not so strict, it looks more like a trick, it would be more appropriate to find <pattern></pattern>and remove it. Sentence?

  1. Secondly, how to go from:
 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
       <CreateOfficeCode>
       </CreateOfficeCode>
 </CreateOfficeGroup>

at

 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
 </CreateOfficeGroup>
  1. Generally? of:
 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
       <CreateOfficeCode>
            <OperatorId>ve</OperatorId>
            <OfficeCode>1234</OfficeCode>
            <CountryCodeLength>0</CountryCodeLength>
            <AreaCodeLength>3</AreaCodeLength>
            <Attributes></Attributes>
            <ChargeArea></ChargeArea>
       </CreateOfficeCode>
       <CreateOfficeSize>
            <Chairs></Chairs>
            <Tables></Tables>
       </CreateOfficeSize>
 </CreateOfficeGroup>

at

 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
       <CreateOfficeCode>
            <OperatorId>ve</OperatorId>
            <OfficeCode>1234</OfficeCode>
            <CountryCodeLength>0</CountryCodeLength>
            <AreaCodeLength>3</AreaCodeLength>
       </CreateOfficeCode>
 </CreateOfficeGroup>

Can you answer the questions as individuals? Thank you very much!

+4
source share
3 answers
sed '#n
1h;1!H
$ { x
:remtag
  s#\(\n* *\)*<\([^>]*>\)\( *\n*\)*</\2##g
  t remtag

  p
  }' YourFile

(posix version of so --posixin GNU sed)

  • recursively remove an empty tag from the lower arm to the upper, until a more empty tag appears.
  • XML, - <tag1 prop="<tag2></tag2>"> ... , , xml.
+3

XMLStarlet - XML- . , , - ( ) XML, :

:

xmlstarlet ed \
  -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
  input.xml

:

strip_recursively() {
  local doc last_doc
  IFS= read -r -d '' doc 
  while :; do
    last_doc=$doc
    doc=$(xmlstarlet ed \
           -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
           /dev/stdin <<<"$last_doc")
    if [[ $doc = "$last_doc" ]]; then
      printf '%s\n' "$doc"
      return
    fi
  done
}
strip_recursively <input.xml

/dev/stdin - ( ) XMLStarlet; .


, , XML, , Python.

#!/usr/bin/env python

import xml.etree.ElementTree as etree
import sys

doc = etree.parse(sys.stdin)
def prune(parent):
    ever_changed = False
    while True:
        changed = False
        for el in parent.getchildren():
            if len(el.getchildren()) == 0:
                if ((el.text is None or el.text.strip() == '') and
                    (el.tail is None or el.tail.strip() == '')):
                    parent.remove(el)
                    changed = True
            else:
                changed = changed or prune(el)
        ever_changed = changed or ever_changed
        if changed is False:
            return ever_changed

prune(doc.getroot())
print etree.tostring(doc.getroot())
+4

sed:

sed -i ':a;N;$!ba;s/<\([^>]*\)>[ \t\n]*<\/\1>//g;s/\([\n][\t\n ]*[\n]\)/\n/g;' yourfile.xml

script (:l;N;$!bl) (: a - a; N - ; $! bl - )

(<\([^>]*\)>) - ([ \t\n]*) - (<\/\1>). , , \1. .

, (s/[\n][\n]*/\n/g) .

+2

All Articles