Tuesday, 4 February 2014

What is XPath?

How can we use XPath In Java PART - 1?



XPath :- XPath stands for XML Path Language. It is a query language for selecting specific nodes from an XML document, and it is used to find information in an XML file. It works much like an SQL query, where we execute a query to retrieve specific data from a database.

XPath uses path expressions to navigate the document tree, selecting nodes inside the XML document by a variety of criteria. These expressions look very much like traditional file system paths, and each one specifies a pattern that selects a set of XML nodes. XPath 1.0 became a W3C Recommendation on 16 November 1999. XPath is designed to be used by XSLT, XPointer and other XML processing applications.


Reference taken from http://www.w3schools.com


XPath Data Model : As we now know, XPath views an XML document as a tree of nodes, similar to the Document Object Model (DOM).

Seven types of nodes are available in the XPath data model:
  •  Root node (only one per document)
  •  Element nodes
  •  Attribute nodes
  •  Namespace nodes
  •  Processing instruction nodes
  •  Comment nodes
  •  Text nodes

Description of the XPath node types
  •  Root Node :- The root node contains the entire document and is written as a single slash (/).
  •  Element Nodes :- Every element in the XML document is an XPath element node, for example employee, name, designation, java-experience and salary in the sample XML below.
  •  Attribute Nodes :- An element node is the parent of each attribute defined on that element in the XML source document. In the sample XML below, id is an attribute of the employee element.
  •  Namespace Nodes :- A namespace is declared as xmlns:prefix="URI".
An XML namespace is a collection of names, identified by a URI reference, that are used in XML documents as element types and attribute names. For example:-
            <t:table xmlns:t="http://www.javatechtipssharedbygaurav.com/search/label/CoreJava">

Although a declaration like this looks like an attribute in the XML source, XPath treats it as a namespace node, not as an attribute node.

  •  Processing Instruction Nodes :- A processing instruction node has two parts: a name, which is the processing instruction's target and is returned by the name() function, and a string value. The string value is everything after the target and the whitespace that follows it, excluding the ?> that closes the processing instruction.
Note:- The XML declaration is not a processing instruction. Therefore, there is no processing instruction node corresponding to the XML declaration.

  •  Comment Nodes :- Every comment in the XML document becomes a comment node, except comments that occur within the document type declaration. The string value of a comment is its content, not including the opening <!-- or the closing -->.
          For example : <!--Test is a comment-->

  •  Text Nodes :- These nodes contain the character data of an element. For example, KUMAR GAURAV is the text node of the name element in the sample XML below.
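To see these node types from Java, here is a minimal sketch using the JAXP XPath API that ships with the JDK (javax.xml.xpath). It is not the code promised for the next post. The class name XPathNodeTypesDemo, the eval helper and the xml-stylesheet processing instruction (with its style.xsl href) are hypothetical additions, included so that every node type described above, apart from namespace nodes, appears in one small document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathNodeTypesDemo {

    // Trimmed copy of the sample XML, plus a hypothetical xml-stylesheet
    // processing instruction so that each node type above is present
    static final String XML =
            "<?xml version=\"1.0\"?>"
            + "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>"
            + "<employees>"
            + "<!--Test is a comment-->"
            + "<employee id=\"1234\"><name>KUMAR GAURAV</name></employee>"
            + "</employees>";

    // Parses the XML and returns the XPath string value of the expression's result
    static String eval(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("name(/*)"));                        // element node under the root
        System.out.println(eval("/employees/employee/@id"));         // attribute node
        System.out.println(eval("/employees/employee/name/text()")); // text node
        System.out.println(eval("/employees/comment()"));            // comment node
        System.out.println(eval("name(/processing-instruction())")); // PI name (its target)
        System.out.println(eval("/processing-instruction()"));       // PI string value
        // The XML declaration is not a processing instruction, so only one PI is counted
        System.out.println(eval("count(/processing-instruction())"));
    }
}
```

Running main should print employees, 1234, KUMAR GAURAV, Test is a comment, xml-stylesheet, the processing instruction's data (without the leading whitespace or the closing ?>) and a count of 1.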
           
           
XPath data types   

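XPath also provides a library of standard functions (such as count(), last() and position()), and the Java XPath API (package javax.xml.xpath) provides factory methods through the XPathFactory class. The XPathConstants class defines a few supporting constants, listed below, that select the data type XPath.evaluate() returns:-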

  •  XPathConstants.NODESET :- Represents a node-set, returned as an org.w3c.dom.NodeList. It can be empty or contain many nodes.
  •  XPathConstants.NODE :- Represents a single node, returned as an org.w3c.dom.Node: the first selected node, or null if nothing matches. That node may itself have any number of child nodes.
  •  XPathConstants.BOOLEAN :- Represents the value true or false, returned as a java.lang.Boolean.
  •  XPathConstants.STRING :- Represents a sequence of zero or more characters, returned as a java.lang.String. For a node-set, this is the string value of the first node.
  •  XPathConstants.NUMBER :- Represents a floating point number, returned as a java.lang.Double. XPath 1.0 (and therefore XSLT 1.0) has no integer data type, so all numbers are treated as double-precision floating point numbers.
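A minimal sketch of how each XPathConstants value maps to a Java type, run against a trimmed copy of the sample XML. The class XPathReturnTypesDemo and its helper methods (countNodes, firstNodeName, isTrue, string, number) are hypothetical names. Only XPathFactory, XPath.evaluate and XPathConstants come from the standard javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathReturnTypesDemo {

    // Two employees from the sample XML, trimmed to the fields used here
    static final String XML = "<employees>"
            + "<employee id=\"1234\"><name>KUMAR GAURAV</name><salary>50000</salary></employee>"
            + "<employee id=\"2341\"><name>KUMAR AADITYA</name><salary>70000</salary></employee>"
            + "</employees>";

    // Parses the XML and evaluates the expression, asking for the given return type
    static Object evaluate(String expression, QName returnType) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc, returnType);
    }

    // NODESET -> org.w3c.dom.NodeList, which may be empty
    static int countNodes(String expression) throws Exception {
        NodeList nodes = (NodeList) evaluate(expression, XPathConstants.NODESET);
        return nodes.getLength();
    }

    // NODE -> org.w3c.dom.Node: the first selected node, or null if nothing matches
    static String firstNodeName(String expression) throws Exception {
        Node node = (Node) evaluate(expression, XPathConstants.NODE);
        return node == null ? null : node.getNodeName();
    }

    // BOOLEAN -> java.lang.Boolean; a non-empty node-set converts to true
    static boolean isTrue(String expression) throws Exception {
        return (Boolean) evaluate(expression, XPathConstants.BOOLEAN);
    }

    // STRING -> java.lang.String; for a node-set, the string value of its first node
    static String string(String expression) throws Exception {
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    // NUMBER -> java.lang.Double, since XPath 1.0 has no integer type
    static double number(String expression) throws Exception {
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countNodes("//name"));                  // 2
        System.out.println(firstNodeName("//employee/*"));         // name
        System.out.println(isTrue("//employee[salary > 60000]"));  // true
        System.out.println(string("//employee[@id='2341']/name")); // KUMAR AADITYA
        System.out.println(number("sum(//salary)"));               // 120000.0
    }
}
```

Note that sum(//salary) comes back as the Double 120000.0 even though both salaries are whole numbers. This reflects the floating point NUMBER type described above.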

XPath syntax:-

As we now know, XPath uses path expressions to select nodes from an XML document. These expressions look much like file system paths.

  •  nodename :- Selects all child nodes named "nodename" of the current node
  •  / :- Selects from the root node
  •  // :- Selects matching nodes at any depth, no matter where they are. At the start of an expression it searches the whole document, while .// searches below the current node
  •  . :- Selects the current node
  •  .. :- Selects the parent of the current node
  •  @ :- Selects attributes
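The operators ., .., @ and a bare nodename are relative, so their result depends on which node is current. The sketch below uses XPath.evaluate(expression, node), which treats the node passed in as the context node. The class XPathContextNodeDemo and its helper are hypothetical, and the XML is a trimmed copy of the sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathContextNodeDemo {

    // Trimmed copy of the sample XML
    static final String XML = "<employees>"
            + "<employee id=\"1234\"><name>KUMAR GAURAV</name></employee>"
            + "<employee id=\"2341\"><name>KUMAR AADITYA</name></employee>"
            + "</employees>";

    // Evaluates an expression with the second employee element as the current node
    static String evalFromSecondEmployee(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Absolute path (starts with /): locate the context node from the root
        Node employee = (Node) xpath.evaluate("/employees/employee[2]", doc, XPathConstants.NODE);
        // The expression is then resolved against that employee element
        return xpath.evaluate(expression, employee);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evalFromSecondEmployee("name"));           // KUMAR AADITYA (child element)
        System.out.println(evalFromSecondEmployee("@id"));            // 2341 (attribute)
        System.out.println(evalFromSecondEmployee("name(.)"));        // employee (current node)
        System.out.println(evalFromSecondEmployee("name(..)"));       // employees (parent)
        System.out.println(evalFromSecondEmployee("count(.//name)")); // 1 (below the current node)
        System.out.println(evalFromSecondEmployee("count(//name)"));  // 2 (leading // starts at the root)
    }
}
```

The last two lines show the difference between .// and a leading //. Only the first stays under the current node.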

A Sample XML

<employees>
    <!--Test is a comment-->
    <employee id="1234">
        <name>KUMAR GAURAV</name>
        <designation>SYSTEM ENGINEER</designation>
        <java-experience>7</java-experience>
        <salary>50000</salary>
    </employee>
    <employee id="2341">
        <name>KUMAR AADITYA</name>
        <designation>PROGRAM MANAGER</designation>
        <java-experience>15</java-experience>
        <salary>70000</salary>
    </employee>
</employees>


XPath syntax examples:-

  •  employees :- Selects all child elements named employees of the current node. Evaluated against the document, this is the root element
  •  /employees :- Selects the root element employees
  •  /employees/employee :- Selects all employee elements that are children of employees
  •  //employee :- Selects all employee elements, no matter where they are in the XML source document
  •  employees//employee :- Selects all employee elements that are descendants of the employees element, no matter where they are under it
  •  //@id :- Selects all attributes named id

Note: If the path starts with a slash (/), it always represents an absolute path, evaluated from the root node!
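A sketch that runs each of the syntax examples above against a copy of the sample XML and prints the selected nodes. The class XPathPathExamples and its select helper are hypothetical. Attributes are printed as name=value so they can be told apart from elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPathExamples {

    // Copy of the sample XML, written without whitespace between tags
    static final String XML = "<employees>"
            + "<!--Test is a comment-->"
            + "<employee id=\"1234\"><name>KUMAR GAURAV</name>"
            + "<designation>SYSTEM ENGINEER</designation>"
            + "<java-experience>7</java-experience><salary>50000</salary></employee>"
            + "<employee id=\"2341\"><name>KUMAR AADITYA</name>"
            + "<designation>PROGRAM MANAGER</designation>"
            + "<java-experience>15</java-experience><salary>70000</salary></employee>"
            + "</employees>";

    // Returns a label per selected node: the element name, or name=value for attributes
    static List<String> select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
                labels.add(node.getNodeName() + "=" + node.getNodeValue());
            } else {
                labels.add(node.getNodeName());
            }
        }
        return labels;
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "employees", "/employees", "/employees/employee",
            "//employee", "employees//employee", "//@id"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + select(expression));
        }
    }
}
```

Because every expression here is evaluated against the document node, employees and /employees select the same root element. They would differ if evaluate were given some other node as the context.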


Predicates

Predicates are used to find a specific node, or a node that contains a specific value. Predicates are always written inside square brackets.

  •  /employees/employee[1] :- Selects the first employee element that is a child of the employees element (XPath positions start at 1, not 0)
  •  /employees/employee[last()] :- Selects the last employee element that is a child of the employees element
  •  /employees/employee[last()-1] :- Selects the last but one employee element that is a child of the employees element
  •  /employees/employee[position()<3] :- Selects the first two employee elements that are children of the employees element
  •  //employee[@id] :- Selects all employee elements that have an attribute named id
  •  //employee[@id='1234'] :- Selects all employee elements that have an id attribute with the value '1234'
  •  /employees/employee[salary>50000] :- Selects all employee children of employees that have a salary element with a value greater than 50000
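The predicate examples can be checked the same way. In this sketch (hypothetical class XPathPredicateExamples, with a copy of the sample XML), each selected employee is reported by the text of its name child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPredicateExamples {

    // Copy of the sample XML, written without whitespace between tags
    static final String XML = "<employees>"
            + "<!--Test is a comment-->"
            + "<employee id=\"1234\"><name>KUMAR GAURAV</name>"
            + "<designation>SYSTEM ENGINEER</designation>"
            + "<java-experience>7</java-experience><salary>50000</salary></employee>"
            + "<employee id=\"2341\"><name>KUMAR AADITYA</name>"
            + "<designation>PROGRAM MANAGER</designation>"
            + "<java-experience>15</java-experience><salary>70000</salary></employee>"
            + "</employees>";

    // Returns the text of the <name> child of every employee the expression selects
    static List<String> employeeNames(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList employees = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < employees.getLength(); i++) {
            // "name" is relative, so it is resolved against each selected employee
            names.add(xpath.evaluate("name", employees.item(i)));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "/employees/employee[1]",
            "/employees/employee[last()]",
            "/employees/employee[last()-1]",
            "/employees/employee[position()<3]",
            "//employee[@id]",
            "//employee[@id='1234']",
            "/employees/employee[salary>50000]"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + employeeNames(expression));
        }
    }
}
```

The last expression returns only KUMAR AADITYA. The comparison is numeric, and 50000 is not greater than 50000.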


Selecting unknown nodes:-

XPath wildcards can be used to select unknown XML elements.

  •  * :- Matches any element node
  •  @* :- Matches any attribute node
  •  node() :- Matches any node of any type

Examples of wildcard use:-

  •  /employees/* :- Selects all child elements of the employees element
  •  //* :- Selects all elements in the XML source document
  •  //employee[@*] :- Selects all employee elements that have at least one attribute
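A sketch for the wildcard expressions, again with a hypothetical class (XPathWildcardExamples) and a copy of the sample XML. It also compares * with node(). Under employees, node() additionally matches the comment node, which DOM reports with the name #comment. The XML string has no whitespace between tags. With the indented sample, node() would also return the whitespace text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathWildcardExamples {

    // Copy of the sample XML, written without whitespace between tags
    static final String XML = "<employees>"
            + "<!--Test is a comment-->"
            + "<employee id=\"1234\"><name>KUMAR GAURAV</name>"
            + "<designation>SYSTEM ENGINEER</designation>"
            + "<java-experience>7</java-experience><salary>50000</salary></employee>"
            + "<employee id=\"2341\"><name>KUMAR AADITYA</name>"
            + "<designation>PROGRAM MANAGER</designation>"
            + "<java-experience>15</java-experience><salary>70000</salary></employee>"
            + "</employees>";

    // Returns the DOM node name of every selected node
    // (elements and attributes by name, comments as "#comment")
    static List<String> nodeNames(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "/employees/*",      // element children only
            "/employees/node()", // any child node, including the comment
            "//*",               // every element in the document
            "//employee[@*]",    // employees with at least one attribute
            "//employee[1]/@*"   // all attributes of the first employee
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + nodeNames(expression));
        }
    }
}
```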


Combining multiple XPath expressions

The | operator combines several path expressions and selects the union of the nodes they match.

  •  //employee/name | //employee/designation :- Selects all name and designation elements of all employee elements
  •  //name | //java-experience :- Selects all name and java-experience elements in the XML source document
  •  /employees/employee/name | //designation :- Selects all name elements of the employee elements of employees, AND all designation elements in the XML source document
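The | operator can be tried the same way. This sketch (hypothetical class XPathUnionExamples, with a copy of the sample XML) prints the text of every node selected by each combined expression. The last expression is an extra example, not from the list above. Because | produces a node-set, a node selected by both sides appears only once.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathUnionExamples {

    // Copy of the sample XML, written without whitespace between tags
    static final String XML = "<employees>"
            + "<!--Test is a comment-->"
            + "<employee id=\"1234\"><name>KUMAR GAURAV</name>"
            + "<designation>SYSTEM ENGINEER</designation>"
            + "<java-experience>7</java-experience><salary>50000</salary></employee>"
            + "<employee id=\"2341\"><name>KUMAR AADITYA</name>"
            + "<designation>PROGRAM MANAGER</designation>"
            + "<java-experience>15</java-experience><salary>70000</salary></employee>"
            + "</employees>";

    // Returns the text content of every node the expression selects
    static List<String> textValues(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "//employee/name | //employee/designation",
            "//name | //java-experience",
            "/employees/employee/name | //designation",
            // Both sides select the same two name elements, so the union has two nodes
            "//name | /employees/employee/name"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + textValues(expression));
        }
    }
}
```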

Note:- In the next post, we will see the complete Java code for these XPath expressions.
