How to express series of generic data points in XML and read them without pain

Foreword

It is often desirable to store generic data point series in an XML format. XML lets you decorate the series with attributes, nest them in complex data objects, mix different data series and data objects in one data file, and load them with concise LINQ to XML code.

A generic data point is a structure with just two required properties, {X, Y}, expressing the point's position in 2D space. Each dimension has its own "base" type (numeric, DateTime, etc.), hence the term "generic". A data point series contains one or more data points whose dimensions have matching base types.

Generic Data Points Series XML format

The data point series XML format must be designed so that it can be validated against an XML schema.

First, we have to define the root element. Suppose it is called Items. To be safe, we require it to declare the default XML namespace urn:PointSeries-schema, so the root element looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Items xmlns="urn:PointSeries-schema">
</Items>

 

The root element contains an unlimited number of data point series. A first attempt is to define the point series elements as follows:

<Items xmlns="urn:PointSeries-schema">
  <Points …>
  </Points>
  <Points …>
  </Points>
</Items>

 

That won't work: different point series elements can contain points of different base types, so the point series elements themselves can be of different types. XML Schema does not allow elements of different types to share the same name in the same scope. Hence point series elements of different types must have different names.

Point series elements are named according to the following patterns:

  • <Points.BaseType …> if both data series dimensions have the same base type, e.g. <Points.Double …>.
  • <Points.XBaseType.YBaseType …> if the data series dimensions have different base types, e.g. <Points.DateTime.Int …>.

The BaseType, XBaseType and YBaseType parts of a point series element name are collectively called "type strings". We need to agree on how these type strings are defined and how they map to XSD-defined data types and .NET data types.

Table 1. XSD type to .NET type to type string mapping

| XSD type | Description | Examples | Type string | .NET type |
| --- | --- | --- | --- | --- |
| xsd:int | An integer that can be represented as a four-byte, two's complement number | -2147483648, 2147483645, …, -3, -2, -1, 0, 1, 2, 3, … | Int | System.Int32 |
| xsd:double | IEEE 754 64-bit floating-point number | -INF, 1.401E-90, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, 3.4E42 | Double | System.Double |
| xsd:dateTime | A particular moment in Coordinated Universal Time, up to an arbitrarily small fraction of a second | 1999-05-31T13:20:00.000-05:00, 1999-05-31T18:20:00.000Z, 1999-05-31T13:20:00.000, 1999-05-31T13:20:00.000-05:32 | DateTime | System.DateTime |
| xsd:date | A specific day in history | 0044-03-15, 0001-01-01, 1969-06-27, 2000-10-31, 2001-11-17 | Date | System.DateTime |
| xsd:gMonth | A month in no particular year | --01--, --02--, --03--, …, --09--, --10--, --11--, --12-- | Month | System.Int32 |

 

This table is a partial list of XSD simple types. You can extend it with other XSD types and, moreover, with any types you derive from XSD simple types using the XML Schema derivation rules.

According to the mapping above, the <Points.Double …> point series element, for example, should contain points of xsd:double type in both the x and y dimensions, and these points will be loaded as points with System.Double x, y properties.

A point element looks like <Point x="2008-01-01" y="-20"/>, with required x and y attributes.

Below is an excerpt from the example input XML data file:
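The mapping in Table 1 can be written down as a pair of lookup tables. The following is only a minimal sketch: the TypeStrings class and its member names are hypothetical and are not taken from the attached code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: encodes Table 1 as two dictionaries keyed by type string.
static class TypeStrings
{
    // Type string -> .NET type, transcribed from Table 1.
    static readonly Dictionary<string, Type> ClrTypes = new Dictionary<string, Type>
    {
        { "Int", typeof(int) },
        { "Double", typeof(double) },
        { "DateTime", typeof(DateTime) },
        { "Date", typeof(DateTime) },
        { "Month", typeof(int) }
    };

    // Type string -> XSD simple type name, transcribed from Table 1.
    static readonly Dictionary<string, string> XsdTypes = new Dictionary<string, string>
    {
        { "Int", "xsd:int" },
        { "Double", "xsd:double" },
        { "DateTime", "xsd:dateTime" },
        { "Date", "xsd:date" },
        { "Month", "xsd:gMonth" }
    };

    public static Type ToClrType(string typeString)
    {
        Type result;
        if (!ClrTypes.TryGetValue(typeString, out result))
            throw new ArgumentException("Unknown type string: " + typeString);
        return result;
    }

    public static string ToXsdType(string typeString)
    {
        string result;
        if (!XsdTypes.TryGetValue(typeString, out result))
            throw new ArgumentException("Unknown type string: " + typeString);
        return result;
    }
}
```

Extending the format with another XSD type then amounts to adding one entry to each dictionary and the matching complex types to the schema.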

<?xml version="1.0" encoding="utf-8"?>
<Items xmlns="urn:PointSeries-schema">
    <Points.Int.Double YName="y=x^2">
        <Point x="0" y="0"/>
        <Point x="1" y="0.01"/>
        …
    </Points.Int.Double>
    <Points.Date.Int YName="temperature" XName="Date">
        <Point x="2008-01-01" y="-20"/>
        <Point x="2008-02-01" y="-25"/>
        …
    </Points.Date.Int>
    <Points.Month.Double YName="2008 year month temperatures" XName="Month">
        <Point x="--01--" y="-20.8"/>
        <Point x="--02--" y="-25.2"/>
        …
    </Points.Month.Double>
    …
</Items>

Note that point series elements are decorated with the optional YName and XName attributes, which are intended to hold the labels of the y and x dimensions.

The XML format of generic data point series is defined by an XML schema, an excerpt of which follows:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" 
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Root element -->
  <xs:element name="Items" type="itemsType"/>
  
  <!-- Root element type -->
  <xs:complexType name="itemsType">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="Points.Int" type="pointsIntIntType"/>
      <xs:element name="Points.Int.DateTime" type="pointsIntDttmType"/>
      …
      <xs:element name="Points.Double" type="pointsDblDblType"/>
      <xs:element name="Points.Double.Int" type="pointsDblIntType"/>
      …
    </xs:choice>
  </xs:complexType>
  
  <!-- Point Series Type attributes -->
  <xs:attributeGroup name="pointSetAttributes">
    <xs:attribute name="YName" type="xs:string" use="optional" />
    <xs:attribute name="XName" type="xs:string" use="optional" />
  </xs:attributeGroup>

  <!-- Point Series Types -->
  <xs:complexType name="pointsIntIntType">
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" type="xs:int" use="required" />
            <xs:attribute name="y" type="xs:int" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  <xs:complexType name="pointsIntDttmType">
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" type="xs:int" use="required" />
            <xs:attribute name="y" type="xs:dateTime" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  …  
  <xs:complexType name="pointsDblIntType">
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" type="xs:double" use="required" />
            <xs:attribute name="y" type="xs:int" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  <xs:complexType name="pointsDblDblType">
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" type="xs:double" use="required" />
            <xs:attribute name="y" type="xs:double" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  …
</xs:schema>

 

This schema defines the <Items …> root element, whose expected content is defined by an xs:choice group. You should reduce the content of that group to just the point series element types your application expects.

In the code attached to this article, the choice group contains all point series elements corresponding to the members of the Cartesian product of the set of base types from Table 1 with itself.

The rest of the schema is a long list of complex type definitions. Each of them defines a point series with specific x, y base types.

Now the data format is completely defined and we can proceed to reading the data.
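To see what that Cartesian product amounts to, the element names can be enumerated with a LINQ cross join. This is a hedged sketch, not part of the attached code: the ChoiceSketch class is hypothetical, and it applies the naming patterns introduced earlier (a single type string when x and y share a base type, two otherwise).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lists every point series element name for the Table 1 type strings.
static class ChoiceSketch
{
    static readonly string[] TypeStrings = { "Int", "Double", "DateTime", "Date", "Month" };

    public static IEnumerable<string> ElementNames()
    {
        // Cross join of the type string set with itself: 5 x 5 = 25 names.
        return from x in TypeStrings
               from y in TypeStrings
               select x == y ? "Points." + x : "Points." + x + "." + y;
    }
}
```

With the five type strings of Table 1, the choice group therefore holds 25 element declarations, and each added base type increases that number quadratically.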

Reading the Data

In the code attached to this post (see below), all data reading code lives in the Loader class library project, which produces the XmlDataPointSeries.Loader assembly. The Loader class contains the data reading and parsing code, and two supplementary classes, DataPoint and DataPointSeries, store the results.

Here is the full Loader class code:

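using System;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Resources;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

/// <summary>
/// Class to load DataPointSeries collection from a file.
/// </summary>
/// <remarks>
/// On error, the Errors property contains a string describing what went wrong while loading.
/// </remarks>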
public class Loader
{
  // XML Schema resource name.
  const string schemaFileName = "schema.xsd";
  // XML namespace that must be used in XML data files.
  const string namespaceName = "urn:PointSeries-schema";

  /// <summary>
  /// Gets the error string.
  /// </summary>
  /// <remarks>
  /// List of descriptions of the errors that occurred while loading or validating the input file.
  /// </remarks>
  /// <value>The error string or null.</value>
  public string Errors { get; private set; }

  /// <summary>
  /// Loads DataPointSeries collection from the file specified.
  /// </summary>
  /// <param name="fileName">DataPointSeries collection XML file Name.</param>
  /// <returns>DataPointSeries collection on success; null on error.</returns>
  public DataPointSeries[] Load(string fileName)
  {
    Errors = null;
    XmlReader reader = null;
    try
    {
      // Get xml schema stream from the "schemaFileName" resource.
      Assembly assembly = Assembly.GetAssembly(this.GetType());
      ResourceManager rm = new ResourceManager(assembly.GetName().Name + ".g", assembly);
      using (XmlTextReader schemaReader = new XmlTextReader(rm.GetStream(schemaFileName)))
      {
        // Prepare XmlReaderSettings for input file validation.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(namespaceName, schemaReader);
        StringBuilder sbErrors = null;
        settings.ValidationEventHandler += (sender, e) =>
        {
          if (sbErrors == null)
            sbErrors = new StringBuilder();
          sbErrors.AppendFormat("Validation error: {1}{0}Line={2}, position={3}{0}"
            , System.Environment.NewLine, e.Exception.Message
            , e.Exception.LineNumber, e.Exception.LinePosition);
        };

        // Load and validate the file.
        reader = XmlReader.Create(fileName, settings);
        XDocument xml = XDocument.Load(reader);
        if (sbErrors != null)
        { // Validation error(s) occurred.
          Errors = sbErrors.ToString();
          return null;
        }

        XNamespace xns = namespaceName;
        // Check the root element (i.e. Items in "urn:PointSeries-schema" xmlns).
        XElement rootElement = xml.Element(xns + "Items");
        if (rootElement == null)
        {
          Errors = string.Format("Root element {0} is missing", xns + "Items");
          return null;
        }
        // Parse the Points.XXX elements.
        return rootElement.Elements().Select<XElement, DataPointSeries>(
          (item) => {
            // Parse item tag name for X/Y type strings.
            string xType, yType;
            getXYTypeStrings(item.Name.ToString(), out xType, out yType);
            // Optional attributes.
            var yName = item.Attribute("YName");
            var xName = item.Attribute("XName");
            DataPointSeries series = new DataPointSeries()
            {
              YName = yName == null ? "" : yName.Value,
              XName = xName == null ? "" : xName.Value,
              XTypeString = xType,
              YTypeString = yType
            };
            // Read the raw x, y attribute strings of every Point element.
            var points = from p in item.Elements(xns + "Point")
                         select new { X = p.Attribute("x").Value, Y = p.Attribute("y").Value };
            foreach (var pt in points)
            {
              series.AddPoint(pt.X, pt.Y);
            }
            return series;
          }).ToArray();
      }
    }
    catch (Exception ex)
    {
      Errors = ex.Message;
    }
    finally
    {
      if (reader != null)
        reader.Close();
    }

    return null;
  }

  /// <summary>
  /// Parses the point series element tag and returns x,y type strings.
  /// </summary>
  /// <param name="tagName">Tag name.</param>
  /// <param name="xType">Output Type of the x-dimension.</param>
  /// <param name="yType">Output Type of the y-dimension.</param>
  static void getXYTypeStrings(string tagName, out string xType, out string yType)
  {
    int n = tagName.IndexOf('}');
    Debug.Assert(n > 0, "n > 0");
    const string pointsTagPrefix = "Points";
    int pointsTagPrefixLength = pointsTagPrefix.Length;
    Debug.Assert(tagName.Length > n + pointsTagPrefixLength + 1, "tagName.Length > n + pointsTagPrefixLength + 1");
    string xyTypes = tagName.Substring(n + pointsTagPrefixLength + 2);
    n = xyTypes.IndexOf('.');
    if (n < 0)
    {
      xType = xyTypes;
      yType = xyTypes;
    }
    else
    {
      xType = xyTypes.Substring(0, n);
      yType = xyTypes.Substring(n + 1);
    }
  }
}

 

This class isn't static only because it stores the Errors string when an error happens during data reading. The Load method does all the interesting work. It rests on two pillars:

  1. Data is loaded and validated by an XmlReader object against the XML schema defined above.
  2. The loaded XML data is parsed and stored into the target objects with LINQ to XML.

This approach makes the Loader class very generic and rather concise.

By design, the Load method fails on any error that occurs while opening, reading, parsing or validating the file, and stores the error descriptions in the string returned by the Errors property. It collects all validation errors reported by the XmlReader into the Errors string, which gives the user a chance to fix all of them at once.

The schema file is stored as a resource in the Loader assembly. The Load method uses the ResourceManager class to get the resource stream and passes it to an XmlReaderSettings instance. The ValidationType property of that instance is set to ValidationType.Schema, and a ValidationEventHandler is attached to collect all validation error strings into a StringBuilder.

The XmlReader is created with the reader = XmlReader.Create(fileName, settings) call, and the XML is loaded into memory with xml = XDocument.Load(reader). If no errors have been reported at this point, the data is loaded and valid against the schema.

The Load method gets the root element with
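The validate-while-loading step can be tried in isolation. The sketch below is only an illustration under stated assumptions: ValidationSketch and its tiny inline schema are hypothetical stand-ins (not the article's schema.xsd), and both the schema and the data are read from strings rather than from a resource and a file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class ValidationSketch
{
    // A deliberately tiny stand-in schema (hypothetical, not the article's schema).
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
        targetNamespace='urn:PointSeries-schema' elementFormDefault='qualified'>
      <xs:element name='Items'>
        <xs:complexType>
          <xs:sequence>
            <xs:element name='Point' maxOccurs='unbounded'>
              <xs:complexType>
                <xs:attribute name='x' type='xs:int' use='required'/>
                <xs:attribute name='y' type='xs:int' use='required'/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>";

    // Returns all validation messages, or an empty string if the XML is valid.
    public static string Validate(string xml)
    {
        var errors = new StringBuilder();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add("urn:PointSeries-schema", XmlReader.Create(new StringReader(Xsd)));
        // With a handler attached, validation errors are reported here instead of being thrown.
        settings.ValidationEventHandler += (sender, e) => errors.AppendLine(e.Message);
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            XDocument.Load(reader); // Validation happens while the document is read.
        }
        return errors.ToString();
    }
}
```

Calling Validate with a document whose y attribute is not an integer returns a non-empty message, while a well-formed, valid document returns an empty string. Malformed XML still throws an XmlException, which is why the Load method also wraps everything in a try/catch block.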

XNamespace xns = namespaceName;
XElement rootElement = xml.Element(xns + "Items");

 

Note the XNamespace variable: it ensures that the Items element is looked up in the right XML namespace. After that, the Load method parses the loaded XML and returns the result with the

return rootElement.Elements().Select<XElement, DataPointSeries>(…)

 

statement. DataPointSeries instances are created by the lambda, which:

  1. Extracts the data series base types from the XElement tag name with the getXYTypeStrings method.
  2. Gets the optional attributes.
  3. Creates an instance of the DataPointSeries class.
  4. Fills the Points property of that instance by calling the DataPointSeries.AddPoint method.

The AddPoint method is rather long. It takes the x, y values as strings and converts them into .NET types based on the type strings extracted from the XElement tag name. It consists of nested switch statements that handle all supported combinations of x, y types. This certainly makes the Loader class less generic, but it hides from the user some subtleties of converting XML Schema types to .NET types (e.g. the xsd:double type allows the NaN, INF and -INF values, which cannot be converted to the right System.Double values with the Convert.ToDouble method or its counterparts). If you prefer, you can skip the conversion to specific types and store the x, y values as plain object values.
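Step 1 above can also be done without index arithmetic on the expanded "{namespace}name" string. The following is an alternative sketch, not the author's getXYTypeStrings: it relies on XName.LocalName, and the TagNameSketch class is a hypothetical name.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical alternative to getXYTypeStrings, based on the element's local name.
static class TagNameSketch
{
    public static void GetXYTypeStrings(XName name, out string xType, out string yType)
    {
        // "Points.Int.Double" -> { "Points", "Int", "Double" }
        string[] parts = name.LocalName.Split('.');
        if (parts.Length < 2 || parts.Length > 3 || parts[0] != "Points")
            throw new ArgumentException("Not a point series element: " + name);
        xType = parts[1];
        // A single type string means both dimensions share the same base type.
        yType = parts.Length == 3 ? parts[2] : parts[1];
    }
}
```

Inside the lambda it would be called as GetXYTypeStrings(item.Name, out xType, out yType), passing the XName itself instead of item.Name.ToString().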
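As an illustration of the conversion subtleties, here is a minimal sketch of converting a single attribute value by its type string. It is not the AddPoint implementation from the attached code: XsdValueSketch is a hypothetical name, and the choice of the System.Xml.XmlConvert class, whose ToDouble method understands the XSD literals INF, -INF and NaN, is an assumption about one reasonable way to do it.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical helper: converts one XML attribute value according to its type string.
static class XsdValueSketch
{
    public static object Parse(string typeString, string value)
    {
        switch (typeString)
        {
            case "Int":
                return XmlConvert.ToInt32(value);
            case "Double":
                // XmlConvert understands the XSD literals INF, -INF and NaN.
                return XmlConvert.ToDouble(value);
            case "DateTime":
            case "Date":
                return XmlConvert.ToDateTime(value, XmlDateTimeSerializationMode.RoundtripKind);
            case "Month":
                // "--01--" -> 1 (Table 1 maps xsd:gMonth to System.Int32).
                return int.Parse(value.Trim('-'), CultureInfo.InvariantCulture);
            default:
                throw new NotSupportedException("Unknown type string: " + typeString);
        }
    }
}
```

A per-dimension function like this avoids nesting one switch inside another, at the cost of boxing every value into an object.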

Using the Code

You can download the source code for this post here: http://cid-39d56f0c7a08d703.skydrive.live.com/embedrow.aspx/.Public/XmlDataPointSeries/XmlDataPointSeries060409.zip

 

The download contains a Visual Studio 2008 SP1 solution targeting .NET Framework 3.5, with three projects. The main part is the Loader class library project described above.

The second project is a simple console application that loads the XML data from the file given as the first command line argument and either reports errors or displays the results of parsing the XML data.

The third project is more important. It is the unit test project, containing tests for every data point series type supported by the Loader class. It provides examples of what data the XML format currently supports and what that data can look like.


About ovpwp

I have been engaged in programming, maintenance and supply of computer equipment since 1970. I started with computers of the M, BESM and Minsk series and received the relevant education, to the extent that was possible in the USSR at the time. Then I followed the usual path: ES series, IBM 360/370, Sun SPARC, IBM Power & PS/2, PC. I started programming in machine code, then came various assemblers, Algol, FORTRAN, PL/1, C, C++, VB, C#. Those are only the languages I used at production scale; there were, of course, others such as Lisp, Modula, Pascal, Java, etc. Currently I prefer the .NET platform for desktop development. I don't really like web programming (probably because of the inability to do quality design), but I have enough experience in site building in a LAMP environment using PHP.