Removing all tags of a specific type in emacs

I have an XML file. (I know, just reading that you're already excited.)

Now there are some tags that I want to completely remove:

    <qwerty option=1>
        <nmo>sdfsdf</nmo>
        <blue>sdfsdf</blue>
    </qwerty>

This is a large file. How can I remove all nmo and blue tags, including their contents, in Emacs or with some other tool I can use on my Mac?

3 answers

I assume your XML file is well formed. I also assume that, contrary to your example, your real data is a little more complicated than one tag per line (apart from the root). Otherwise, we agree it would be as simple as deleting the lines that contain those tags?
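For that simple case, where every element to remove sits on a line of its own, you don't even need Emacs. Here is a minimal POSIX shell sketch under exactly that assumption; the function name drop_tag_lines is hypothetical, not something from this answer.

```shell
#!/bin/sh
# Hypothetical helper: delete every line containing an opening tag for one of
# the given names, e.g. <nmo> or <nmo attr="...">, but not <nmox>.
# ASSUMPTION: each element, with its content, sits on a single line.
drop_tag_lines() {
  # Turn each tag name into a sed argument pair: -e '/<name[ >]/d'.
  # The for-list is expanded once, so rewriting "$@" inside the loop is safe.
  for tag in "$@"; do
    set -- "$@" -e "/<$tag[ >]/d"
    shift
  done
  sed "$@"
}

# Usage: drop_tag_lines nmo blue < input.xml > output.xml
```

A closing tag on a later line would survive, which is why the Emacs function that follows searches for the opening and closing tags separately.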

Here is a suggestion for a function that could do the trick:

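    (defun my-remove-tag (tag)
      "Delete every TAG element, including its contents, in the current buffer."
      (save-excursion
        (goto-char (point-min))
        (let ((case-fold-search nil))   ; match tag names case-sensitively
          ;; Opening tag: <TAG> or <TAG followed by whitespace and attributes>.
          ;; NOERROR = t makes the loop stop cleanly when no match is left.
          (while (re-search-forward
                  (concat "<" (regexp-quote tag) "\\(?:[ \t\n][^>]*\\)?>") nil t)
            ;; Delete from the start of the opening tag through the next </TAG>.
            (delete-region (match-beginning 0)
                           (search-forward (concat "</" tag ">")))))))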

You can then call it from the buffer holding your XML, for example with M-: (eval-expression), once for each tag you want gone (nmo, blue or qwerty):

    (my-remove-tag "nmo")
    (my-remove-tag "qwerty")

It works by searching for an opening tag, then for the next closing tag of the same name, and deleting everything in between. The opening tag may carry attributes; the regular expression for it allows for them.

Case-insensitive matching is turned off by binding case-fold-search to nil, and the previous value is restored when the function returns. Point is also restored afterwards, by the standard save-excursion form.
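Outside Emacs, the same "opening tag, then closing tag" idea can be approximated with sed, as long as each element to remove fits on one line and has no child elements. This is a minimal sketch with a hypothetical function name, not a translation of the function above: sed works line by line, so unlike the Emacs version it cannot delete an element whose closing tag is on a later line.

```shell
#!/bin/sh
# Hypothetical helper: remove <name>...</name> elements (opening tag with or
# without attributes, through the closing tag) wherever they occur in a line.
# ASSUMPTION: the element has no child elements and does not span lines;
# nested or multi-line elements are left untouched.
strip_elements() {
  for tag in "$@"; do
    # BRE: "<tag", optional " attributes", ">", text without "<", "</tag>".
    set -- "$@" -e "s#<$tag\\( [^>]*\\)\\{0,1\\}>[^<]*</$tag>##g"
    shift
  done
  sed "$@"
}

# Usage: strip_elements nmo blue < input.xml > output.xml
```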

Update

I removed the outer let. There is no need to save and restore the case-fold-search value manually: the let binding only shadows the global value, which comes back automatically when the let exits.


Emacs has commands for moving over balanced expressions, or "sexps". In nxml-mode, the sexp commands work on tags: put point on an opening <, then press C-M-f (forward-sexp) to move to the end of the tag, or C-M-k (kill-sexp) to kill it. The variable nxml-sexp-element-flag controls whether these commands stop at the end of the opening tag (the default) or at the end of the matching closing tag. I prefer the latter.

To remove those tags, first turn on nxml-sexp-element-flag with M-x customize-variable RET nxml-sexp-element-flag RET. Then search for the tag you want to remove, move point to its opening < and press C-M-k. Record all of this as a keyboard macro and repeat it through the whole file until the search fails (C-u 0 C-x e repeats the last keyboard macro until it hits an error).


I believe a more general approach is to use an XML-oriented tool such as XSLT (don't worry, nobody likes it), but it comes in handy whenever you have to work with XML (don't worry, nobody likes that either).

So here we go:

Here is the XSL file. It copies everything in the source XML file and replaces the nodes you want to delete with nothing. It also pretty-prints the output, so the result looks nicer than it would if you had done the replacement with a regular expression.

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        exclude-result-prefixes="msxsl">
      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>
      <!-- Copy everything -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>
      <!-- Find any node named nmo or blue and replace it with nothing -->
      <xsl:template match="nmo | blue"/>
    </xsl:stylesheet>

This is the example I used for testing:

    <?xml version="1.0" encoding="utf-8"?>
    <nodes>
      <qwerty option="1">
        <nmo>sdfsdf</nmo>
        <blue>sdfsdf</blue>
      </qwerty>
      <nodes>
        <qwerty option="1">
          <nmo>sdfsdf</nmo>
          <blue>sdfsdf</blue>
        </qwerty>
      </nodes>
      <nodes>
        <qwerty option="1">
          <nmo>sdfsdf</nmo>
          <blue>sdfsdf</blue>
        </qwerty>
        <other node=""/>
        <nodes>
          <qwerty option="1">
            <nmo>sdfsdf</nmo>
            <blue>sdfsdf</blue>
          </qwerty>
          <qwerty option="1">
            <nmo>sdfsdf</nmo>
            <blue>sdfsdf</blue>
          </qwerty>
          <qwerty option="1">
            <nmo>sdfsdf</nmo>
            <blue>sdfsdf</blue>
          </qwerty>
        </nodes>
      </nodes>
    </nodes>

And this is the result that I get:

    <?xml version="1.0"?>
    <nodes>
      <qwerty option="1"/>
      <nodes>
        <qwerty option="1"/>
      </nodes>
      <nodes>
        <qwerty option="1"/>
        <other node=""/>
        <nodes>
          <qwerty option="1"/>
          <qwerty option="1"/>
          <qwerty option="1"/>
        </nodes>
      </nodes>
    </nodes>

Notice that it also turned the now-empty qwerty elements into self-closing tags.

The command line looks something like this:

    xsltproc ./remove-nodes.xsl ./nodes-to-be-removed.xml > result.xml

You can run it from an Emacs shell, or from any Emacs function that starts a process (shell-command, call-process, and so on). See man xsltproc for more; this is really basic usage. It was already installed on my Fedora machine, and since XML is so widespread, I would expect it to be either preinstalled on a Mac or easy to install.
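If you would rather not keep the stylesheet in a separate file, it can be embedded in a shell function. This is a minimal sketch, assuming xsltproc is on your PATH; the function name remove_nodes is hypothetical, and the msxsl namespace declaration is dropped because xsltproc does not need it.

```shell
#!/bin/sh
# Hypothetical wrapper: write the stylesheet from this answer to a temporary
# file, run xsltproc on the given XML file, print the result to stdout.
# ASSUMPTION: xsltproc is installed and on PATH.
remove_nodes() {
  xsl=$(mktemp "${TMPDIR:-/tmp}/remove-nodes.XXXXXX") || return 1
  cat > "$xsl" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Copy everything -->
  <xsl:template match="@* | node()">
    <xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
  </xsl:template>
  <!-- Replace nmo and blue elements with nothing -->
  <xsl:template match="nmo | blue"/>
</xsl:stylesheet>
EOF
  xsltproc "$xsl" "$1"   # $1: input XML file
  status=$?
  rm -f "$xsl"           # clean up the temporary stylesheet
  return "$status"
}

# Usage: remove_nodes nodes-to-be-removed.xml > result.xml
```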

