Parsing with SAX and Handling Character Entities

I am parsing a MathML expression with SAX (although the fact that it is MathML may not be entirely relevant). An example input line:

<math xmlns='http://www.w3.org/1998/Math/MathML'> <mrow> <mo>&lambda;</mo> </mrow> </math> 

To get the SAX parser to accept this line, I wrap it in a little extra markup:

 <?xml version="1.0"?> <!DOCTYPE doc_type [ <!ENTITY nbsp "&#160;"> <!ENTITY amp "&#38;"> ]> <body> <math xmlns='http://www.w3.org/1998/Math/MathML'> <mrow> <mo>&lambda;</mo> </mrow> </math> </body> 

Now when I run the SAX parser, I get an exception:

 [Fatal Error] :5:86: The entity "lambda" was referenced, but not declared.
 org.xml.sax.SAXParseException: The entity "lambda" was referenced, but not declared.
     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

I know how to fix this one case, though. I add this declaration to the internal DTD subset (note that &#955; is the lowercase lambda that MathML's "lambda" entity stands for, while &#923; would be the capital letter):

  <!ENTITY lambda "&#955;"> 

That gives me

 <?xml version="1.0"?> <!DOCTYPE doc_type [ <!ENTITY nbsp "&#160;"> <!ENTITY amp "&#38;"> <!ENTITY lambda "&#955;"> ]> <body> <math xmlns='http://www.w3.org/1998/Math/MathML'> <mrow> <mo>&lambda;</mo> </mrow> </math> </body> 

Now the parser accepts the input without complaint.

The problem, however, is that I cannot add an ENTITY declaration for every character entity that may be used in MathML (for example, "part", "notin" and "sum").

How can I rewrite this input so that it parses no matter which MathML character entities it contains?

1 answer

Use a DOCTYPE declaration that references the MathML DTD:

 <!DOCTYPE math PUBLIC "-//W3C//DTD MathML 3.0//EN" "http://www.w3.org/Math/DTD/mathml3/mathml3.dtd"> 

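or a local copy of the same. The MathML DTD declares the MathML character entities, so the parser can resolve references such as &lambda;, &part;, &notin; and &sum; without a hand-written ENTITY declaration for each one.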
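
As a rough illustration of the "local copy" variant, here is a minimal Java sketch. It assumes the input carries the DOCTYPE shown above with math as the root element (no body wrapper), and it installs an org.xml.sax.EntityResolver that answers the MathML public identifier itself instead of letting the parser fetch the DTD over the network. The class name MathMLEntityDemo, the method parseText and the constant LOCAL_DTD_STUB are hypothetical names made up for this example. The stub declares only two entities so that the sketch stays self-contained; in real use the resolver would return an InputSource for a downloaded copy of mathml3.dtd and the entity files it references.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class MathMLEntityDemo {

    // ASSUMPTION: tiny stand-in for a local copy of mathml3.dtd.
    // A real copy declares the full set of MathML entities.
    static final String LOCAL_DTD_STUB =
            "<!ENTITY lambda \"&#955;\">\n"
          + "<!ENTITY sum \"&#8721;\">\n";

    // Parses the given XML and returns the concatenated character data.
    static String parseText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Serve the MathML DTD locally when the parser asks for it;
        // returning null falls back to the parser's default resolution.
        reader.setEntityResolver((publicId, systemId) -> {
            if ("-//W3C//DTD MathML 3.0//EN".equals(publicId)) {
                return new InputSource(new StringReader(LOCAL_DTD_STUB));
            }
            return null;
        });

        // Collect all text nodes, including expanded entity references.
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString().trim();
    }
}
```

Calling MathMLEntityDemo.parseText with a document that starts with the MathML DOCTYPE and contains &lambda; should hand the expanded character to the characters callback rather than raising the "referenced, but not declared" error. Resolving the DTD locally also avoids a network round trip on every parse.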
