I am trying to parse several standard XML documents that use the MARCXML schema from various sources.
Here are the first few lines of an example XML file to process ...
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
      <marc:record>
        <marc:leader>00925njm 22002777a 4500</marc:leader>
and one without namespace prefixes ...
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <collection xmlns="http://www.loc.gov/MARC21/slim">
      <record>
        <leader>01142cam 2200301 a 4500</leader>
Key point: before I can evaluate XPath expressions later in the program, I have to go through a routine of adding the document's namespaces to an XmlNamespaceManager built from the document's NameTable (they are not added by default). This seems unnecessary to me.
    Regex xmlNamespace = new Regex("xmlns:(?<PREFIX>[^=]+)=\"(?<URI>[^\"]+)\"", RegexOptions.Compiled);

    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(xmlRecord);

    XmlNamespaceManager nsMgr = new XmlNamespaceManager(xmlDoc.NameTable);
    MatchCollection namespaces = xmlNamespace.Matches(xmlRecord);
    foreach (Match n in namespaces)
    {
        nsMgr.AddNamespace(n.Groups["PREFIX"].ToString(), n.Groups["URI"].ToString());
    }
The XPath call looks something like this:
    XmlNode leaderNode = xmlDoc.SelectSingleNode(".//" + LeaderNode, nsMgr);
Here LeaderNode is a configurable value: "marc:leader" for the first example and "leader" for the second.
Is there a better, more efficient way to do this? Note: suggestions that use LINQ are welcome, but mainly I would like to know how to solve this with XmlDocument.
EDIT: I took the advice of GreyWizardx and now I have the following code ...
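    if (LeaderNode.Contains(":"))
    {
        string prefix = LeaderNode.Substring(0, LeaderNode.IndexOf(':'));
        // DocumentElement is the root element; FirstChild would be the
        // <?xml ...?> declaration when the document has one.
        XmlNode root = xmlDoc.DocumentElement;
        string nameSpace = root.GetNamespaceOfPrefix(prefix);
        nsMgr.AddNamespace(prefix, nameSpace);
    }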
Now there is no more dependence on Regex!
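For comparison, here is a minimal sketch of another option, assuming every document uses the MARC21 slim namespace URI shown in the samples above. XPath matches prefixed names by namespace URI, so one locally chosen prefix can be registered once and reused whether the document writes marc:leader or a default-namespace leader. The class name MarcLeaderSketch, the helper ReadLeader, and the prefix "m" are hypothetical names made up for this illustration.

```csharp
using System;
using System.Xml;

static class MarcLeaderSketch
{
    // Namespace URI copied from the sample documents in the question.
    const string MarcNs = "http://www.loc.gov/MARC21/slim";

    // Hypothetical helper: returns the text of the first leader element,
    // or null when the document contains none in the MARC namespace.
    public static string ReadLeader(string xmlRecord)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(xmlRecord);

        var nsMgr = new XmlNamespaceManager(xmlDoc.NameTable);
        // "m" is an arbitrary local prefix. It only has to map to the same
        // URI as the document's namespace, not to the document's own prefix.
        nsMgr.AddNamespace("m", MarcNs);

        XmlNode leaderNode = xmlDoc.SelectSingleNode("//m:leader", nsMgr);
        return leaderNode?.InnerText;
    }

    static void Main()
    {
        // Trimmed-down versions of the two sample documents.
        string prefixed =
            "<marc:collection xmlns:marc=\"http://www.loc.gov/MARC21/slim\">" +
            "<marc:record><marc:leader>00925njm 22002777a 4500</marc:leader></marc:record>" +
            "</marc:collection>";
        string unprefixed =
            "<collection xmlns=\"http://www.loc.gov/MARC21/slim\">" +
            "<record><leader>01142cam 2200301 a 4500</leader></record>" +
            "</collection>";

        Console.WriteLine(ReadLeader(prefixed));
        Console.WriteLine(ReadLeader(unprefixed));
    }
}
```

With this approach the configured XPath would always be written with the local prefix (for example "m:leader"), so it no longer needs to vary per source. It does not help if some sources use a different namespace URI or none at all.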