Reading XML Files with the XmlTextReader Class

In the previous article, I presented the XmlTextWriter class as a noncached, forward-only means of writing XML data. In this article, you’ll look at the reciprocal class for reading XML data: the XmlTextReader class. The XmlTextReader class is also a sequential, forward-only class, meaning that you cannot jump directly to an arbitrary node; you must read every node from the beginning of the file until the end (or until you’ve reached the desired node). Therefore, this class is most useful in scenarios where you’re dealing with small files or the application requires the reading of the entire file. Also, note that the XmlTextReader class does not validate the XML against a DTD or schema. It does check that the document is well-formed, and Read throws an XmlException when it is not, but beyond that the class assumes that the XML being read is valid. In this week’s article, I’ll illustrate the following aspects of using the XmlTextReader class:

Reading and parsing XML nodes
Retrieving names and values

Reading and Parsing XML Nodes

As mentioned, the XmlTextReader does not provide a means of randomly reading a specific XML node. As a result, the application reads each node of an XML document, determining along the way whether the current node is what is needed. This is typically accomplished by constructing an XmlTextReader object and then calling the XmlTextReader::Read method in a loop until that method returns false. The code will generally look like the following:

// skeleton code to enumerate an XML file's nodes
try
{
  XmlTextReader* xmlreader = new XmlTextReader(fileName);
  while (xmlreader->Read())
  {
    // parse based on NodeType
  }
}
catch (Exception* ex)
{
}
__finally
{
}

As each call to the Read method will read the next node in the XML file, your code must be able to distinguish between node types. This includes everything from the XML file’s opening declaration node to element and text nodes and even includes special nodes for comments and whitespace. The XmlTextReader::NodeType property is an enum of type XmlNodeType that indicates the exact type of the currently read node. Table 1 lists the different types defined by the XmlNodeType type.

Table 1 has been abbreviated to show only those XmlNodeType values that are currently used by the NodeType property.

Table 1: XmlNodeType Enum Values

XmlNodeType Value	Description
Attribute	An attribute defined within an element
CDATA	Identifies a block of data that will not be parsed by the XML reader
Comment	A plain-text comment
DocumentType	Document type declaration
Element	Represents the beginning of an element
EndElement	The end element tag—for example, </author>
EntityReference	An entity reference
None	The state the reader is in before Read has been called
ProcessingInstruction	An XML processing instruction
SignificantWhitespace	White space between markup tags in a mixed content model
Text	The text value of an element
Whitespace	White space between tags
XmlDeclaration	The XML declaration node that starts the file/document
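
The skeleton's "parse based on NodeType" step usually turns into a switch over the node type. The sketch below is plain standard C++ rather than Managed C++, so it can be compiled without the .NET runtime. NodeKind, Node, and describe are hypothetical stand-ins for a few of the XmlNodeType values in Table 1 and for what the reader reports after each Read; they are not part of the .NET API.

```cpp
#include <string>

// Hypothetical mirror of a few XmlNodeType values from Table 1.
enum class NodeKind { XmlDeclaration, Comment, Element, EndElement, Text, Whitespace };

// Hypothetical snapshot of what the reader exposes after one Read call.
struct Node {
    NodeKind kind;
    std::string name;   // plays the role of the Name property
    std::string value;  // plays the role of the Value property
};

// Decide how to treat a node based solely on its type.
std::string describe(const Node& node) {
    switch (node.kind) {
        case NodeKind::Element:    return "start of <" + node.name + ">";
        case NodeKind::EndElement: return "end of </" + node.name + ">";
        case NodeKind::Text:       return "text: " + node.value;
        case NodeKind::Comment:    return "comment: " + node.value;
        case NodeKind::Whitespace: return "whitespace (usually ignored)";
        default:                   return "other node";  // anything not handled above
    }
}
```

In the real loop, the switch would sit inside the while (xmlreader->Read()) body and test xmlreader->NodeType against the XmlNodeType values instead.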

Now that you see how to discern node types, look at a sample XML file and a code snippet that will read and output to the console all found nodes within that file. This will illustrate what the XmlTextReader returns to you with each Read and what you should look for in your code as you enumerate through the file’s nodes. Here first is a simple XML file:

<?xml version="1.0" encoding="us-ascii"?>
<!-- Test comment -->
<emails>
  <email language="EN" encrypted="no">
    <from>Tom@ArcherConsultingGroup.com</from>
    <to>BillG@microsoft.com</to>
    <copies>
      <copy>Krista@ArcherConsultingGroup.com</copy>
    </copies>
    <subject>Buyout of Microsoft</subject>
    <message>Dear Bill...</message>
  </email>
</emails>

Now for the code. The following code snippet opens an XML file and, within a while loop, enumerates all nodes found by the XmlTextReader. As each node is read, its NodeType, Name, and Value properties are output to the console:

// Loop to enumerate and output all nodes of an XML file
String* format = S"XmlNodeType::{0,-12}{1,-10}{2}";
XmlTextReader* xmlreader = new XmlTextReader(fileName);
while (xmlreader->Read())
{
  String* out = String::Format(format,
                               __box(xmlreader->NodeType),
                               xmlreader->Name,
                               xmlreader->Value);
  Console::WriteLine(out);
}

Looking at the file and code listings, you should easily be able to see how each of the lines in Figure 1 was formed.

Figure 1: Enumerating all the nodes of an XML file

Retrieving Names and Values

Looking at Figure 1, you can see that, to retrieve the value for a given element, you need to look programmatically for a node of type XmlNodeType::Text. However, here’s the problem. Once you’ve reached that node, you no longer know the element name to which that text applies because that part was read the previous time through the loop. To illustrate what I mean, locate the from element in Figure 1. During that iteration of the loop, what you know is that the NodeType value is XmlNodeType::Element and that its Name property is “from”. However, you won’t know its value until the next time through the loop when you read the next node, which is the XmlNodeType::Text node for that element. At that point, you can then use the reader’s Value property to get the element’s text value.

Therefore, there are two ways to read the names and values of the elements your code needs. One way is to keep track of the current element as you’re enumerating the file. Then, when you reach a text node, you’ll know to which element the text applies. Here’s a code snippet to illustrate how to do that:

// Loop to read the names and values of all elements
String* format = S"{0,-20}{1}";
String* currentElement;
XmlTextReader* xmlreader = new XmlTextReader(fileName);
while (xmlreader->Read())
{
  if (xmlreader->NodeType == XmlNodeType::Element)
  {
    currentElement = xmlreader->Name;
  }
  else if (xmlreader->NodeType == XmlNodeType::Text)
  {
    String* out = String::Format(format,
                                 currentElement,
                                 xmlreader->Value);
    Console::WriteLine(out);
  }
}

Running this code against the test XML file shown earlier yields the results shown in Figure 2 where only the elements are displayed and each element name is properly associated with its value.

Figure 2: Using two reads to get each element’s name and value
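
The bookkeeping behind this first approach can be tried without the .NET runtime. The following is a plain standard C++ model, not Managed C++: NodeKind, Node, and pairNamesWithValues are hypothetical names, and the vector stands in for the nodes that successive Read calls would report.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of a few XmlNodeType values.
enum class NodeKind { Element, EndElement, Text, Whitespace };

// Hypothetical snapshot of the reader's state after one Read call.
struct Node {
    NodeKind kind;
    std::string name;
    std::string value;
};

// Pair each Text node with the name of the most recently seen Element node.
std::vector<std::pair<std::string, std::string>>
pairNamesWithValues(const std::vector<Node>& nodes) {
    std::vector<std::pair<std::string, std::string>> result;
    std::string currentElement;  // remembered across iterations
    for (const Node& node : nodes) {
        if (node.kind == NodeKind::Element) {
            currentElement = node.name;
        } else if (node.kind == NodeKind::Text) {
            result.emplace_back(currentElement, node.value);
        }
        // EndElement and Whitespace nodes are skipped.
    }
    return result;
}
```

Elements that contain only child elements produce no pair, because no Text node ever follows them directly. One limitation of the approach as sketched: text that appears after a child's end tag would be reported under the child's name, since EndElement nodes do not reset the remembered name.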

Keeping in mind that the XmlTextReader is a forward-only reader, there are also methods to tell it what to read next. For example, the XmlTextReader::ReadString method will read the entire contents of the current element or text node into a String object. Here’s a loop that illustrates using the ReadString method:

// Loop to read each element's string value
String* format = S"{0,-20}{1}";
XmlTextReader* xmlreader = new XmlTextReader(fileName);
while (xmlreader->Read())
{
  if (xmlreader->NodeType == XmlNodeType::Element)
  {
    String* out = String::Format(format,
                                 xmlreader->Name,
                                 xmlreader->ReadString());
    Console::WriteLine(out);
  }
}

While the ReadString method would seem to be much cleaner than the first approach (of using two distinct reads to obtain the element’s name and value), take a look at Figure 3.

Figure 3: Using the ReadString method

As you can see, with this latest modification you now have several blank entries. This is because the code is no longer looking for an element node followed by a text node, which would indicate an element with text data. Now, the code is simply saying “give me the entire string representing each element.” Some elements, such as the <emails> node, contain no text data of their own. Therefore, you need to know what your data looks like before calling methods such as ReadString.

In most cases where you’re looking for the value of an element, you know the name of that element. Therefore, you would simply insert conditional logic into your code to call the ReadString method only for the desired elements:

if (0 == String::Compare(xmlreader->Name, S"subject", true))...
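
The effect of such a filter can be modeled in plain standard C++ (again, not Managed C++). equalsIgnoreCase and textOf are hypothetical helpers, as are NodeKind and Node. The first imitates, for ASCII names only, the case-insensitive comparison that String::Compare performs when its third argument is true.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of a few XmlNodeType values.
enum class NodeKind { Element, EndElement, Text };

// Hypothetical snapshot of the reader's state after one Read call.
struct Node {
    NodeKind kind;
    std::string name;
    std::string value;
};

// ASCII-only, case-insensitive equality test.
bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Collect text only for elements whose name matches `wanted`.
std::vector<std::string> textOf(const std::vector<Node>& nodes,
                                const std::string& wanted) {
    std::vector<std::string> values;
    bool inWanted = false;  // true while inside a matching element
    for (const Node& node : nodes) {
        if (node.kind == NodeKind::Element) {
            inWanted = equalsIgnoreCase(node.name, wanted);
        } else if (node.kind == NodeKind::EndElement) {
            inWanted = false;
        } else if (node.kind == NodeKind::Text && inWanted) {
            values.push_back(node.value);
        }
    }
    return values;
}
```

Because XML element names are case-sensitive, an exact comparison may be the safer choice when the casing used in the file is known; the case-insensitive form simply mirrors the snippet above.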

Looking Ahead

In this article, you learned how to enumerate XML files using the XmlTextReader class. You also saw code snippets detailing how to parse for specific node types and two different methods for reading the names and values of element nodes. In the next article, I’ll cover three more important issues regarding the XmlTextReader class: skipping to content, ignoring whitespace, and reading attributes.

Download the Code

To download the accompanying source code for the demo, click here.

About the Author

Tom Archer owns his own training company, Archer Consulting Group, which specializes in educating and mentoring .NET programmers and providing project management consulting. If you would like to find out how the Archer Consulting Group can help you reduce development costs, get your software to market faster, and increase product revenue, contact Tom through his Web site.

Reading XML Files with the XmlTextReader Class | Developer.com (2024)
