XML parsing via command line

Question


So, I have an XML file that I want to parse from a Bash script using xmlstarlet (or an alternative, if someone can give me an example).

The basic structure is as follows:

    <character>
      <literal>恵</literal>
      <misc>
        <stroke_count>10</stroke_count>
      </misc>
      <reading_meaning>
        <rmgroup>
          <reading r_type="ja_on">ケイ</reading>
          <reading r_type="ja_on">エ</reading>
          <reading r_type="ja_kun">めぐ.む</reading>
          <reading r_type="ja_kun">めぐ.み</reading>
          <meaning>favor</meaning>
          <meaning>blessing</meaning>
          <meaning>grace</meaning>
          <meaning>kindness</meaning>
        </rmgroup>
      </reading_meaning>
    </character>

There are other fields in there too, and the number of values and entries can vary. Basically, I would like to get all the readings, meanings, the stroke count, etc., and build an HTML table from them with Bash.

The file is also large, with many characters to look up, so I would like a script that takes $1 and uses it to look up values based on a tag. Ideally, it would work like this:

 kanjilookup.sh 恵

It would then create an HTML table from the matching content.

Thoughts? (I am also open to using another program, such as xpath.)


xml bash xpath xmlstarlet

user798080 Feb 17 '13 at 5:57


2 answers

There is no longer any reason to use XSLT now that there is XQuery; XQuery is much nicer.

For example, with my XQuery interpreter (Xidel), you can run it directly, without a separate query file:

    xidel --printed-node-format xml characters.xml -e "(character:='恵')[2]" -e - <<<'xquery version "1.0";
    (<title>{$character}</title>,
     for $char in //character[literal eq $character]
     return
       <table>
         <tbody>
           <caption>{$character}</caption>
           <tr> <td>Stroke count</td> <td>{$char/misc/stroke_count/text()}</td> </tr>
           { for $reading in $char//rmgroup/reading
             return <tr> <td>Reading ({$reading/@r_type/data(.)})</td> <td>{$reading/text()}</td> </tr> }
           { for $meaning in $char//rmgroup/meaning
             return <tr> <td>Meaning</td> <td>{$meaning/text()}</td> </tr> }
         </tbody>
       </table>
    )'

This creates a table similar to the one in the XSLT answer. (You do, however, need to add <?xml version="1.0" encoding="utf-8"?> at the top of characters.xml.)


Benibela Feb 17 '13 at 12:13


Eero helenius · Accepted Answer · 2013-02-17T10:19:52+0000

As suggested by @thatotherguy, you probably want to do this with something like XSLT instead of Bash. You can parse XML with Bash, but it will probably get quite complicated.

Following @thatotherguy's suggestion, you can create an XSLT stylesheet that looks something like this:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <!-- kanjilookup.xsl (the XML declaration must come first, so the comment goes after it) -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:param name="character"/>
      <xsl:output method="html" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <!-- From https://stackoverflow.com/questions/9611569/xsl-how-do-you-capitalize-first-letter -->
      <xsl:variable name="vLower" select="'abcdefghijklmnopqrstuvwxyz'"/>
      <xsl:variable name="vUpper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

      <xsl:template name="capitalize">
        <xsl:param name="string"/>
        <xsl:value-of select="concat(translate(substring($string, 1, 1), $vLower, $vUpper), substring($string, 2))"/>
      </xsl:template>

      <xsl:template match="/">
        <xsl:if test="string-length($character) = 0 or not(//literal[. = $character])">
          <xsl:message terminate="yes">ERR: No input character given, or character not found.</xsl:message>
        </xsl:if>
        <xsl:apply-templates select="characters/character[literal[. = $character]]"/>
      </xsl:template>

      <xsl:template match="character">
        <xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html>&#10;</xsl:text>
        <html>
          <head/>
          <body>
            <table>
              <tbody>
                <xsl:apply-templates/>
              </tbody>
            </table>
          </body>
        </html>
      </xsl:template>

      <xsl:template match="literal">
        <caption>
          <xsl:value-of select="."/>
        </caption>
      </xsl:template>

      <xsl:template match="stroke_count">
        <tr>
          <td>
            <xsl:call-template name="capitalize">
              <xsl:with-param name="string" select="translate(local-name(), '_', ' ')"/>
            </xsl:call-template>
          </td>
          <td><xsl:value-of select="."/></td>
        </tr>
      </xsl:template>

      <xsl:template match="misc | reading_meaning | rmgroup">
        <xsl:apply-templates/>
      </xsl:template>

      <xsl:template match="reading | meaning">
        <tr>
          <td>
            <xsl:call-template name="capitalize">
              <xsl:with-param name="string" select="local-name()"/>
            </xsl:call-template>
            <xsl:apply-templates select="@r_type"/>
          </td>
          <td>
            <xsl:value-of select="."/>
          </td>
        </tr>
      </xsl:template>

      <xsl:template match="@r_type">
        <xsl:value-of select="concat(' ', '(', ., ')')"/>
      </xsl:template>
    </xsl:stylesheet>

Let's say you have a file called characters.xml:

    <characters>
      <character>
        <literal>恵</literal>
        <misc>
          <stroke_count>10</stroke_count>
        </misc>
        <reading_meaning>
          <rmgroup>
            <reading r_type="ja_on">ケイ</reading>
            <reading r_type="ja_on">エ</reading>
            <reading r_type="ja_kun">めぐ.む</reading>
            <reading r_type="ja_kun">めぐ.み</reading>
            <meaning>favor</meaning>
            <meaning>blessing</meaning>
            <meaning>grace</meaning>
            <meaning>kindness</meaning>
          </rmgroup>
        </reading_meaning>
      </character>
    </characters>

You can run kanjilookup.xsl on it using XMLStarlet as follows:

 xml tr kanjilookup.xsl -s character=恵 characters.xml
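The question asks for a `kanjilookup.sh 恵` style command. A minimal sketch of such a wrapper is just a thin layer over the same `tr` call. It assumes the stylesheet above is saved as kanjilookup.xsl next to characters.xml, that XMLStarlet is installed under the name `xmlstarlet` (many distributions use that name rather than `xml`), and it uses two hypothetical environment variables, KANJI_XSL and KANJI_XML, to override those file names.

```shell
#!/usr/bin/env bash
# kanjilookup.sh (sketch): print an HTML table for a single kanji.
# Assumptions: XMLStarlet is installed as "xmlstarlet" (some systems name it
# "xml"), and the hypothetical KANJI_XSL / KANJI_XML environment variables may
# override the default stylesheet and data file names.

kanjilookup() {
  local char=${1:-}
  local xsl=${KANJI_XSL:-kanjilookup.xsl}
  local data=${KANJI_XML:-characters.xml}

  # Refuse to run without a character to look up.
  if [[ -z $char ]]; then
    echo "usage: kanjilookup.sh KANJI" >&2
    return 2
  fi

  # -s passes the argument to the stylesheet as the string parameter
  # "character", exactly like the one-off command above.
  xmlstarlet tr "$xsl" -s "character=$char" "$data"
}
```

To turn this into the command from the question, append `kanjilookup "$@"` as the last line of kanjilookup.sh and redirect its output, e.g. `./kanjilookup.sh 恵 > 恵.html`. Keeping the lookup logic in the stylesheet means the shell part never has to understand the XML at all.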

This produces HTML like the following (pretty-printed):

    <!DOCTYPE html>
    <html>
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      </head>
      <body>
        <table>
          <tbody>
            <caption>恵</caption>
            <tr><td>Stroke count</td><td>10</td></tr>
            <tr><td>Reading (ja_on)</td><td>ケイ</td></tr>
            <tr><td>Reading (ja_on)</td><td>エ</td></tr>
            <tr><td>Reading (ja_kun)</td><td>めぐ.む</td></tr>
            <tr><td>Reading (ja_kun)</td><td>めぐ.み</td></tr>
            <tr><td>Meaning</td><td>favor</td></tr>
            <tr><td>Meaning</td><td>blessing</td></tr>
            <tr><td>Meaning</td><td>grace</td></tr>
            <tr><td>Meaning</td><td>kindness</td></tr>
          </tbody>
        </table>
      </body>
    </html>

Of course, you will need to adapt the XSLT stylesheet to suit your needs.
