Thursday 16 May 2013

StAX (Streaming API for XML)

Java with XML Streaming API

As we know, XML is platform independent, so almost every technology needs to work with it. Java offers several ways to process XML: JAXB, StAX, SAX and DOM. This post focuses on the StAX parsing API.

StAX stands for Streaming API for XML.
  • It is a streaming, pull-parsing Java API for reading and writing XML documents.
  • A StAX parser is fast, easy to use and memory efficient.
  • The primary goal of the StAX API is to give parsing control to the developer by exposing a simple iterator-based API.
  • StAX was created to address limitations in the two most prevalent parsing APIs, SAX and DOM. It is not as powerful or flexible as TrAX (Transformation API for XML) or JDOM.



StAX vs SAX (Simple API for XML)

StAX is a bidirectional API: it can both read and write XML documents. SAX is read-only, so another API is needed to write XML documents. StAX is a pull-parsing API, whereas SAX is a push-parsing API.


StAX vs DOM (Document Object Model)

  • The DOM parser builds an in-memory tree of objects representing the entire XML document. Once loaded, the DOM tree can be navigated freely and accessed in any order, which gives developers maximum flexibility. The cost of this flexibility is a large memory footprint and more processor time.
  • Streaming refers to a programming model in which XML documents are transmitted and parsed serially at application runtime. Stream-based parsers can start generating output immediately, and infoset elements can be discarded and garbage collected as soon as they have been used. Streaming therefore requires less memory, reduces processor requirements and performs well. It is particularly useful when an application has strict memory limitations.
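
To make the "navigated freely" point about DOM concrete, here is a minimal sketch. The class name DomRandomAccess, the method name and the sample XML are assumptions made up for this illustration. It loads the whole document into memory and then reads the last name element before the first one, which a forward-only streaming parser cannot do without buffering.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomRandomAccess {

    // Returns "<last name>,<first name>" to show that the tree can be read in any order.
    static String lastThenFirst(String xml) {
        try {
            // The whole document is parsed into an in-memory tree first.
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList names = document.getElementsByTagName("name");
            String last = names.item(names.getLength() - 1).getTextContent();
            String first = names.item(0).getTextContent();
            return last + "," + first;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        String xml = "<employees><employee><name>A</name></employee>"
                + "<employee><name>B</name></employee></employees>";
        System.out.println(lastThenFirst(xml)); // prints B,A
    }
}
```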

XML parser API feature details

Feature                      | SAX             | DOM            | StAX            | TrAX
-----------------------------|-----------------|----------------|-----------------|----------
API Type                     | Push, streaming | In-memory tree | Pull, streaming | XSLT rule
Ease of Use                  | Medium          | High           | High            | Medium
XPath Capability             | No              | Yes            | No              | Yes
CPU and Memory Efficiency    | Good            | Varies         | Good            | Varies
Forward Only                 | Yes             | No             | Yes             | No
Read XML                     | Yes             | Yes            | Yes             | Yes
Write XML                    | No              | Yes            | Yes             | Yes
Create, Read, Update, Delete | No              | Yes            | No              | No



Note:- TrAX stands for Transformation API for XML. It is included in later versions of JAXP (Java API for XML Processing), which provides packages both for XML parsing and for XML transformations (TrAX).

Processing any XML document requires three components:

  • XML document.

  • Parser API

  • Client code


There are two approaches to parsing an XML document:

  • pull-parsing model

  • push-parsing model

What is the pull-parsing model?

Answer:- In pull parsing, the client application controls the parsing of the XML document by pulling events from the parser, so parsing happens according to the client's requirements. The client code invokes the parsing API's methods to request data; the parser then reads the XML document, extracts the requested data and returns it. In the push model, by contrast, data is "pushed" to the client whether or not the client actually needs it. Pull-parsing libraries can also be smaller than push-parsing libraries.
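
The pull model can be sketched as follows. This is only an illustration: the class name PullFirstName and the inline sample XML are assumptions, not part of the demos in this post. The client asks for one event at a time and simply stops asking once it has found the first name element.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PullFirstName {

    // Returns the text of the first <name> element, or null if there is none.
    static String firstName(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The client requests the next event; nothing arrives unasked.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "name".equals(reader.getLocalName())) {
                        // Stop pulling here: later events are never requested.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee><id>1</id><name>GAURAV</name>"
                + "<email>gaurav@yahoo.co.in</email></employee></employees>";
        System.out.println(firstName(xml)); // prints GAURAV
    }
}
```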

What is the push-parsing model?

Answer:- In push parsing, the parser API reads the XML document and, whenever an event is generated, pushes the corresponding data to the client application and continues. SAX is a push-parsing API. When the SAX parser encounters the beginning of an XML element, it calls the startElement method on our handler object. It "pushes" the information from the XML into our object, hence the name push-parsing model.
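
For comparison, a push-style sketch with SAX might look like this. The class name PushElementNames and the sample XML are assumptions for illustration only. Note that the client never asks for an event: the parser drives the loop and calls startElement for every element it meets.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PushElementNames {

    // Collects the name of every element the parser pushes to the handler.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attributes) {
                // Called by the parser, not by the client code.
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<employees><employee><id>1</id></employee></employees>";
        System.out.println(elementNames(xml)); // prints [employees, employee, id]
    }
}
```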



The StAX core API is divided into two categories:

  •  Cursor API and

  •  Event Iterator API

StAX also offers an API for writing XML documents.

  • It comes in the same two flavours: a low-level, cursor-based API (XMLStreamWriter) and a higher-level, event-based API (XMLEventWriter).
  • The cursor-based API is useful in data-binding scenarios (for example, creating a document from application data), while the event-based API is generally useful in pipelining scenarios, where a new document is constructed from the data provided by input documents.
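
A pipelining scenario could be sketched as below: events read from an input document are passed straight to an XMLEventWriter, except for the ones the client chooses to drop. The class name EventPipeline, the choice of dropping email elements and the sample XML are assumptions for illustration. The sketch also assumes email elements are not nested inside each other.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class EventPipeline {

    // Copies the input document to a string, leaving out <email> elements.
    static String dropEmail(String xml) {
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            boolean insideEmail = false;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement() && isEmail(event.asStartElement().getName())) {
                    insideEmail = true;
                }
                if (!insideEmail) {
                    // Events are immutable objects, so they can be handed on as they are.
                    writer.add(event);
                }
                if (event.isEndElement() && isEmail(event.asEndElement().getName())) {
                    insideEmail = false;
                }
            }
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not copy the XML", e);
        }
        return out.toString();
    }

    private static boolean isEmail(QName name) {
        return "email".equals(name.getLocalPart());
    }

    public static void main(String[] args) {
        String xml = "<employee><name>GAURAV</name>"
                + "<email>gaurav@yahoo.co.in</email></employee>";
        System.out.println(dropEmail(xml));
    }
}
```

The exact form of the XML declaration in the output depends on the StAX implementation, so no expected output is shown here.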

Cursor API:-

The cursor API traverses an XML document in much the same way a forward-only JDBC ResultSet traverses rows: the cursor always moves forward and never goes back.
The two main interfaces of the cursor API are XMLStreamReader and XMLStreamWriter.

The XMLStreamReader interface of the StAX cursor API reads XML documents in the forward direction only. Its methods can pull data from the stream or skip unwanted events, making it possible to:

  • Get the value of an attribute specified in XML document
  • Read XML content of the document
  • Determine whether an element has content or it is empty
  • Get indexed access to a collection of attributes
  • Get indexed access to a collection of namespaces
  • Get the name of the current event (if applicable)
  • Get the content of the current event (if applicable)
The XMLStreamReader.next method loads the properties of the next event in the stream. We can then access those properties by using the XMLStreamReader.getLocalName and XMLStreamReader.getText methods.
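
As a small sketch of the indexed attribute access mentioned in the list above, the helper below collects the attributes of the first element with a given name. The class name CursorAccessors, the method name and the sample XML are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CursorAccessors {

    // Returns the attributes of the first element called elementName (empty if not found).
    static Map<String, String> attributesOf(String xml, String elementName) {
        Map<String, String> attributes = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        // Attributes are only available while the cursor is on START_ELEMENT.
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            attributes.put(reader.getAttributeLocalName(i),
                                    reader.getAttributeValue(i));
                        }
                        break;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        String xml = "<employees><employee id=\"1234\" name=\"KUMAR GAURAV\"/></employees>";
        System.out.println(attributesOf(xml, "employee"));
    }
}
```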

Implementation of XMLStreamReader for reading an XML file

XMLStreamReaderDemo.java

package com.gaurav.staxparsers;

import java.io.FileReader;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/*  CONSTANTS SPECIFIED FOR PARSING THE XML DOCUMENT.
 XMLStreamConstants.START_ELEMENT
 XMLStreamConstants.END_ELEMENT
 XMLStreamConstants.PROCESSING_INSTRUCTION
 XMLStreamConstants.CHARACTERS
 XMLStreamConstants.COMMENT
 XMLStreamConstants.SPACE
 XMLStreamConstants.START_DOCUMENT
 XMLStreamConstants.END_DOCUMENT
 XMLStreamConstants.ENTITY_REFERENCE
 XMLStreamConstants.ATTRIBUTE
 XMLStreamConstants.DTD
 XMLStreamConstants.CDATA
 XMLStreamConstants.NAMESPACE
 XMLStreamConstants.NOTATION_DECLARATION
 XMLStreamConstants.ENTITY_DECLARATION

 */

public class XMLStreamReaderDemo {

    public static void main(String[] args) throws Exception {

        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
        FileReader fileReader = new FileReader("C://employeeDetails.xml");
        XMLStreamReader xmlStreamReader = xmlInputFactory
                .createXMLStreamReader(fileReader);
        try {

            int eventType = xmlStreamReader.getEventType();

            while (true) {
                switch (eventType) {

                case XMLStreamConstants.START_DOCUMENT:
                    System.out.println("DOCUMENT READING STARTED");
                    System.out.println("******************************");
                    break;

                case XMLStreamConstants.START_ELEMENT:
                    System.out.println("Start Tag : "
                            + xmlStreamReader.getName());
                    for (int i = 0, n = xmlStreamReader.getAttributeCount(); i < n; ++i)
                        System.out.println("Attribute : "
                                + xmlStreamReader.getAttributeName(i) + "="
                                + xmlStreamReader.getAttributeValue(i));
                    break;

                case XMLStreamConstants.CHARACTERS:
                    if (xmlStreamReader.isWhiteSpace())
                        break;

                    System.out.println("Value : " + xmlStreamReader.getText());
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    System.out.println("End Tag :" + xmlStreamReader.getName());
                    break;

                case XMLStreamConstants.END_DOCUMENT:
                    System.out.println("******************************");
                    System.out.println("DOCUMENT READING COMPLETED.");
                    break;
                }

                if (!xmlStreamReader.hasNext())
                    break;

                eventType = xmlStreamReader.next();
            }
        } catch (Exception e) {
            Logger.getLogger(XMLStreamReaderDemo.class.getName()).log(
                    Level.SEVERE, null, e);
        } finally {
            xmlStreamReader.close();
            // XMLStreamReader.close() does not close the underlying reader.
            fileReader.close();
        }
    }
}


Note:- employeeDetails.xml file content is as below:-

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<employees>
    <employee>
        <id>1</id>
        <name>GAURAV</name>
        <email>gaurav@yahoo.co.in</email>
    </employee>
    <recCount>1</recCount>
</employees>


Result:-

DOCUMENT READING STARTED
******************************
Start Tag : employees
Start Tag : employee
Start Tag : id
Value : 1
End Tag :id
Start Tag : name
Value : GAURAV
End Tag :name
Start Tag : email
Value : gaurav@yahoo.co.in
End Tag :email
End Tag :employee
Start Tag : recCount
Value : 1
End Tag :recCount
End Tag :employees
******************************
DOCUMENT READING COMPLETED.


Implementation of XMLStreamWriter for writing an XML file

XMLStreamWriterDemo.java

package com.gaurav.staxparsers;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XMLStreamWriterDemo {

    public static void main(String[] args) {

        XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();

        // XMLStreamWriter.close() does not close the underlying stream,
        // so the FileOutputStream is managed by try-with-resources.
        try (FileOutputStream outputStream = new FileOutputStream(
                "C://GeneratedXMLUsingXMLStreamWriter.xml")) {

            XMLStreamWriter xmlStreamWriter = xmlOutputFactory
                    .createXMLStreamWriter(outputStream, "UTF-8");

            xmlStreamWriter.writeStartDocument("UTF-8", "1.0");

            /*
             * To remove encoding="UTF-8" from the XML declaration, create the
             * writer from a java.io.FileWriter and call the no-argument
             * writeStartDocument() instead of the two statements above:
             *
             * xmlStreamWriter = xmlOutputFactory.createXMLStreamWriter(
             *         new FileWriter("C://GeneratedXMLUsingXMLStreamWriter.xml"));
             * xmlStreamWriter.writeStartDocument();
             */

            xmlStreamWriter.writeStartElement("employees");
            xmlStreamWriter.writeStartElement("employee");
            xmlStreamWriter.writeAttribute("id", "1234");
            xmlStreamWriter.writeAttribute("name", "KUMAR GAURAV");
            xmlStreamWriter.writeAttribute("designation", "SOFTWARE ENGINEER");
            xmlStreamWriter.writeEndElement();
            xmlStreamWriter.writeEndElement();
            xmlStreamWriter.writeEndDocument();

            // Flush and close the writer before reporting success.
            xmlStreamWriter.flush();
            xmlStreamWriter.close();

            Logger.getLogger(XMLStreamWriterDemo.class.getName())
                    .info("Contents are written successfully in the specified XML file using XMLStreamWriter");

        } catch (XMLStreamException xmlse) {
            Logger.getLogger(XMLStreamWriterDemo.class.getName()).log(
                    Level.SEVERE, null, xmlse);
        } catch (IOException ioe) {
            Logger.getLogger(XMLStreamWriterDemo.class.getName()).log(
                    Level.SEVERE, null, ioe);
        }
    }
}

Result:- content of GeneratedXMLUsingXMLStreamWriter.xml

<?xml version="1.0" encoding="UTF-8"?>
<employees><employee id="1234" name="KUMAR GAURAV" designation="SOFTWARE ENGINEER">
</employee>
</employees>


Event Iterator API:-

  • This API has two main interfaces: XMLEventReader and XMLEventWriter.
  • The event iterator API parses the XML document and returns event objects.
  • There are events for elements, attributes, text, comments and so on.
  • It works like a Java collection iterator. XMLEvent is the basic event interface, and XMLEventReader.nextEvent() is the key method, which returns the next event in the XML document.
  • nextEvent() plays the same role as the next() method of the Iterator interface in the collections framework.
  • The available event types are START_DOCUMENT, START_ELEMENT, END_ELEMENT, CHARACTERS, PROCESSING_INSTRUCTION, COMMENT, SPACE, END_DOCUMENT, ENTITY_REFERENCE, ATTRIBUTE, DTD, CDATA, NAMESPACE, NOTATION_DECLARATION and ENTITY_DECLARATION.
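
The iterator style described above can be sketched as follows. The class name EventIteratorSketch and the "START/TEXT/END" labels are assumptions for illustration. Each call to nextEvent() returns an XMLEvent object, which is then narrowed with asStartElement(), asCharacters() or asEndElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class EventIteratorSketch {

    // Describes element and text events; other event types are ignored.
    static List<String> describe(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    lines.add("START " + event.asStartElement().getName().getLocalPart());
                } else if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                    lines.add("TEXT " + event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    lines.add("END " + event.asEndElement().getName().getLocalPart());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // prints [START employee, START id, TEXT 1, END id, END employee]
        System.out.println(describe("<employee><id>1</id></employee>"));
    }
}
```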


Implementation of XMLEventReader for reading an XML file

XMLEventReaderDemo.java

package com.gaurav.staxparsers;

import java.io.FileReader;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.XMLEvent;

public class XMLEventReaderDemo {
    public static void main(String args[]) {

        XMLEventReader xmlEventReader = null;

        try {

            XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
            FileReader fileReader = new FileReader("C://employeeDetails.xml");
            xmlEventReader = xmlInputFactory.createXMLEventReader(fileReader);

            while (xmlEventReader.hasNext()) {
                XMLEvent xmlEvent = xmlEventReader.nextEvent();
                if (xmlEvent.isCharacters()
                        && ((Characters) xmlEvent).isWhiteSpace())
                    continue;

                System.out.println(xmlEvent);
            }
        } catch (Exception exception) {
            Logger.getLogger(XMLEventReaderDemo.class.getName()).log(
                    Level.SEVERE, null, exception);
        } finally {
            try {
                // The reader is still null if the file could not be opened.
                if (xmlEventReader != null) {
                    xmlEventReader.close();
                }
            } catch (XMLStreamException e) {
                Logger.getLogger(XMLEventReaderDemo.class.getName()).log(
                        Level.SEVERE, null, e);
            }
        }
    }
}

Result:-

<?xml version="1.0" encoding='null' standalone='yes'?>
<employees>
<employee>
<id>
1
</id>
<name>
GAURAV
</name>
<email>
gaurav@yahoo.co.in
</email>
</employee>
<recCount>
1
</recCount>
</employees>
ENDDOCUMENT


Implementation of XMLEventWriter for writing an XML file

XMLEventWriterDemo.java

package com.gaurav.staxparsers;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class XMLEventWriterDemo {

    public static void main(String args[]) {
       
        XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory xmlEventFactory = XMLEventFactory.newInstance();
        XMLEventWriter xmlEventWriter = null;
       
        try {
           
            xmlEventWriter = xmlOutputFactory.createXMLEventWriter(new
             FileOutputStream( "C://GeneratedXMLUsingXMLEventWriter.xml"),"UTF-8");
           
            XMLEvent event = xmlEventFactory.createStartDocument("UTF-8","1.0");
           
            /*
             * To remove encoding="UTF-8" from the XML declaration, replace the
             * two statements above with the following (this requires importing
             * java.io.FileWriter):
             *
             * xmlEventWriter = xmlOutputFactory.createXMLEventWriter(new FileWriter(
             *         "C://GeneratedXMLUsingXMLEventWriter.xml"));
             * XMLEvent event = xmlEventFactory.createStartDocument();
             */
            xmlEventWriter.add(event);

            event = xmlEventFactory.createStartElement("employees",
                    "http://www.javatechtipssharedbygaurav.com", "employee");
            xmlEventWriter.add(event);

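            // Note: this declares the prefix "gaurav", while the elements use
            // the prefix "employees". Pass "employees" here instead if the
            // generated document must be namespace well-formed.
            event = xmlEventFactory.createNamespace("gaurav",
                    "http://www.javatechtipssharedbygaurav.com");
            xmlEventWriter.add(event);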

            event = xmlEventFactory.createAttribute("id", "1234");
            xmlEventWriter.add(event);

            event = xmlEventFactory.createAttribute("name", "KUMAR GAURAV");
            xmlEventWriter.add(event);
           
            event = xmlEventFactory.createStartElement("employees",
                    "http://www.javatechtipssharedbygaurav.com", "company");
            xmlEventWriter.add(event);

            event = xmlEventFactory.createAttribute("software", "java");
            xmlEventWriter.add(event);

            event = xmlEventFactory.createEndElement("employees",
                    "http://www.javatechtipssharedbygaurav.com", "company");
            xmlEventWriter.add(event);

            event = xmlEventFactory.createEndElement("employees",
                    "http://www.javatechtipssharedbygaurav.com", "employee");
            xmlEventWriter.add(event);

            // Flush before reporting success.
            xmlEventWriter.flush();

            Logger.getLogger(XMLEventWriterDemo.class.getName())
                    .info("Contents are written successfully in the specified XML file using XMLEventWriter");

        } catch (XMLStreamException xmle) {
            Logger.getLogger(XMLEventWriterDemo.class.getName()).log(
                    Level.SEVERE, null, xmle);
        } catch (IOException ioe) {
            Logger.getLogger(XMLEventWriterDemo.class.getName()).log(
                    Level.SEVERE, null, ioe);
        } finally {
            try {
                // The writer is still null if the file could not be created.
                if (xmlEventWriter != null) {
                    xmlEventWriter.close();
                }
            } catch (XMLStreamException xmle) {
                Logger.getLogger(XMLEventWriterDemo.class.getName()).log(
                        Level.SEVERE, null, xmle);
            }
        }
    }
}

Result:- content of GeneratedXMLUsingXMLEventWriter.xml

<?xml version="1.0" encoding="UTF-8"?>
<employees:employee xmlns:gaurav="http://www.javatechtipssharedbygaurav.com" id="1234" name="KUMAR GAURAV">
<employees:company software="java">
</employees:company>
</employees:employee>


Advantages of the pull-parsing model:-


  • In the pull-parsing model, parsing happens according to the client's requirements.
  • Pull-parsing libraries can be smaller.
  • The client code that interacts with the parser API is also smaller, even for more complex documents.
  • Filtering elements is simpler because the client knows when a specific element arrives and can decide at that point what to do with it.
  • Pull clients can read multiple documents at one time with a single thread.
  • A StAX pull parser can filter XML documents so that elements unnecessary to the client are ignored, and it can support XML views of non-XML data.
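
The filtering point can be sketched with the event API's EventFilter. The class name FilteredNames and the decision to keep only start-element events are assumptions for illustration. The filtered reader hands the client only the events the filter accepts, so everything else is skipped without any extra client code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class FilteredNames {

    // Returns the local names of all elements, in document order.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader all = factory.createXMLEventReader(new StringReader(xml));
            // EventFilter has a single accept(XMLEvent) method, so a method reference works.
            XMLEventReader startsOnly =
                    factory.createFilteredReader(all, XMLEvent::isStartElement);
            while (startsOnly.hasNext()) {
                names.add(startsOnly.nextEvent().asStartElement()
                        .getName().getLocalPart());
            }
            startsOnly.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<employees><employee><id>1</id><name>GAURAV</name></employee></employees>";
        System.out.println(elementNames(xml)); // prints [employees, employee, id, name]
    }
}
```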


Reference taken from Oracle and IBM.
