Tuesday, April 29, 2008

GSoC 2008 is On

The all-important initial phase of Google Summer of Code 2008 (GSoC) came to an end last week. The selected project proposals were published on the GSoC website on 21st April. I too applied this year, and my proposal to implement type alternatives for Apache Xerces2-J was among the accepted ones. The Apache Software Foundation received around 30 slots this year, and one of them was allocated to my proposal (yipeee!!!). I consider this a great opportunity to learn and master XML and XML Schema while contributing to a world-renowned open source project.

Apache Xerces2-J is a high-performance, standards-compliant XML parser. It currently supports a number of XML-related standards such as XML 1.0, XML 1.1, DOM, SAX, JAXP and XML Schema 1.0. A variety of open source and proprietary software projects use Apache Xerces2-J as their core XML parsing and processing engine. The reason for its immense popularity is probably the large number of standards it supports and how well it supports them. A repackaged version of Apache Xerces2-J even ships with Sun's popular JDK.

The Xerces development team is currently working on getting Xerces2-J to support XML Schema 1.1, the latest XML Schema specification. Like its predecessor, the XML Schema 1.1 specification consists of three main parts, namely the primer, structures and datatypes. Type alternatives are a feature of the XML Schema 1.1 structures specification. They are one of the most significant additions to the XML Schema standard, providing a well-organized mechanism for conditional type assignment, which has been on XML Schema feature wish lists for years.

With type alternatives, XML elements can be assigned types based on one or more conditions (hence the name conditional type assignment). The conditions are specified as XPath 2.0 expressions, and the relationship between a condition and the corresponding type is expressed using the 'alternative' element.
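The selection rule behind this can be sketched in plain Java. This is a minimal illustration under stated assumptions, not how Xerces2-J implements it: the classes `TypeAlternative` and `TypeSelector` are hypothetical, and Java predicates over the element's attributes stand in for real XPath 2.0 evaluation (XML Schema 1.1 only lets the test look at the element's attributes, not its children). Alternatives are tried in document order, the first test that holds picks the type, a trailing alternative without a test acts as a default, and if nothing applies the element keeps its declared type.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical model of one 'alternative': a test over attributes plus the type it selects. */
final class TypeAlternative {
    // null models an alternative without a 'test' attribute (the trailing default)
    final Predicate<Map<String, String>> test;
    final String typeName;

    TypeAlternative(Predicate<Map<String, String>> test, String typeName) {
        this.test = test;
        this.typeName = typeName;
    }
}

final class TypeSelector {
    /**
     * Tries the alternatives in document order and returns the type of the first
     * one whose test holds (a test-less default always holds). If none applies,
     * the element keeps its declared type.
     */
    static String selectType(List<TypeAlternative> alternatives,
                             String declaredType,
                             Map<String, String> attributes) {
        for (TypeAlternative alt : alternatives) {
            if (alt.test == null || alt.test.test(attributes)) {
                return alt.typeName;
            }
        }
        return declaredType;
    }

    public static void main(String[] args) {
        // Hypothetical alternatives keyed on a 'kind' attribute (stand-ins for XPath tests).
        List<TypeAlternative> alternatives = List.of(
                new TypeAlternative(a -> "short".equals(a.get("kind")), "xs:short"),
                new TypeAlternative(a -> "byte".equals(a.get("kind")), "xs:byte"));
        System.out.println(selectType(alternatives, "valueType", Map.of("kind", "byte"))); // xs:byte
        System.out.println(selectType(alternatives, "valueType", Map.of("kind", "long"))); // valueType
        System.out.println(selectType(alternatives, "valueType", Map.of()));              // valueType
    }
}
```

Keeping the alternatives in an ordered list matters: when more than one test could be true, only the first one in document order decides the type.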

The XPath expression that specifies the condition is given as the value of the 'test' attribute, and the corresponding type as the value of the 'type' attribute. Alternatively, one can use a 'simpleType' or 'complexType' child element to specify the type instead of the 'type' attribute.
As a complete example, consider an element named 'value' whose declared type is 'valueType'. Depending on the actual value of its 'kind' attribute, a 'value' element can have a different governing type. When XML Schema validation is performed, each element is validated against its governing type.

Type alternatives add a lot more flexibility to the way XML Schema documents are used, and they give more freedom and power to the schema author. With type alternatives, XML elements with the same name can have different governing types. In the above example, different 'value' elements can take one of three types (integer, short or byte). Also, an element can have a governing type that is different from its declared type. In XML Schema 1.0 this was only possible if the instance document itself named the type through the xsi:type attribute.
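To see the effect on an instance document, the sketch below uses the JDK's built-in DOM parser to give each 'value' element a governing type from its 'kind' attribute and then check its content against that type. It is only an illustration under assumptions: the attribute values 'integer', 'short' and 'byte', the class `ValueTypeCheck` and its methods are hypothetical, a `switch` replaces real XPath 2.0 tests, and because the definition of 'valueType' is not given here, elements that fall back to it are not checked.

```java
import java.io.StringReader;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: per-element governing type plus a content check, via JDK DOM. */
final class ValueTypeCheck {

    /** Stand-in for the schema's alternatives: picks a governing type from @kind. */
    static String governingType(Element value) {
        switch (value.getAttribute("kind")) { // "" if the attribute is absent
            case "integer": return "xs:integer";
            case "short":   return "xs:short";
            case "byte":    return "xs:byte";
            default:        return "valueType"; // no test matched: keep the declared type
        }
    }

    /** Checks text against the lexical form and value range of the governing type. */
    static boolean isValid(String type, String text) {
        if (type.equals("valueType")) {
            return true; // valueType's definition is not given, so it is left unchecked
        }
        String s = text.trim(); // integer types collapse surrounding whitespace
        if (!s.matches("[+-]?[0-9]+")) {
            return false;
        }
        BigInteger v = new BigInteger(s);
        switch (type) {
            case "xs:short": return inRange(v, Short.MIN_VALUE, Short.MAX_VALUE);
            case "xs:byte":  return inRange(v, Byte.MIN_VALUE, Byte.MAX_VALUE);
            default:         return true; // xs:integer has no bounds
        }
    }

    private static boolean inRange(BigInteger v, long min, long max) {
        return v.compareTo(BigInteger.valueOf(min)) >= 0
            && v.compareTo(BigInteger.valueOf(max)) <= 0;
    }

    /** Returns one "type valid|invalid" entry per 'value' element, in document order. */
    static List<String> check(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        NodeList values = doc.getElementsByTagName("value");
        List<String> results = new ArrayList<>();
        for (int i = 0; i < values.getLength(); i++) {
            Element e = (Element) values.item(i);
            String type = governingType(e);
            results.add(type + (isValid(type, e.getTextContent()) ? " valid" : " invalid"));
        }
        return results;
    }

    public static void main(String[] args) {
        String xml = "<values>"
                + "<value kind=\"integer\">123456789012</value>"
                + "<value kind=\"short\">40000</value>"
                + "<value kind=\"byte\">-12</value>"
                + "</values>";
        // Prints [xs:integer valid, xs:short invalid, xs:byte valid]
        System.out.println(check(xml));
    }
}
```

The second element is rejected even though 40000 is a perfectly good integer: its governing type is xs:short, whose range is -32768 to 32767. Same element name, different governing types, different validation outcomes.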

All in all, type alternatives are a very interesting and useful feature for XML Schema authors, which makes it very important for Xerces2-J to support them. I have worked a lot with XML and XML APIs like DOM and SAX in the past, and I have used Apache Xerces2-J on a number of occasions too. But to be honest, I haven't really worked much with XML Schema. So this really is a big learning opportunity for me. I have been studying the XML Schema specs for the last few weeks and have already added a whole bunch of XML Schema material to my knowledge base.

My heart is itching to start on the coding, but I know there are a lot of things to study, analyze and clarify before I get to that point. Wish me luck!!
