This is just some general housekeeping.

Several users reported a problem downloading .ZIP files from this blog; a problem which I could not reproduce until I tried downloading under Internet Explorer. Evidently, my web host did not have the correct MIME type set; it was using application/zip (works on Firefox only) instead of application/octet-stream (works on Firefox and IE).

Apologies to any readers who experienced this problem; it has now been rectified.

Instances of the IEnumerable interface (either generic or non-generic) are termed sequences. This is a term which has gained increasing significance with the advent of iterator blocks and, more recently, LINQ. As the most primitive form in which an aggregation of objects can be represented, sequences have diverse uses and are the bread and butter of general-purpose code that operates on multiple objects at a time. They can represent static or dynamic sets of objects which can be either persistent or procedural in origin. A sequence can be anything from an array to a list to a random number generator. By operating on sequences in their most primitive form, we can completely abstract their implementation and write code which regards them as equivalents:

// property which returns a sequence used below
public IEnumerable<int> RandomNumbers {
    get {
        Random rand = new Random(0);            // ensure sequence is deterministic
        while (true) yield return rand.Next();
    }
}

// demonstrates how the LINQ operators can work on many different sequence types
public void QuerySequences() {
    // this sequence is a list, but we don't care about that
    List<string> cheeses = new List<string>() { "feta", "brie", "camembert", "cheddar", "edam" };
    Console.WriteLine(cheeses.Where((x) => x.Length > 4).First());

    // this sequence is a finite range of integers
    Console.WriteLine(Enumerable.Range(100, 200).Where((x) => x > 100).First());

    // this sequence is an infinite set of random numbers
    Console.WriteLine(RandomNumbers.Where((x) => x > 50).First());
}

Until LINQ and the concept of ‘deferred execution’ came along (which really just means that a sequence is dynamic and isn’t persisted in memory), we tended to regard sequences as being static and persistent. As a result, we used them to very little effect, aside from the all important foreach statement (which was the original use case for IEnumerable). The way we use sequences has changed a lot; the result of a deferred LINQ query can be thought of as a deeply-nested set of iterator blocks, with each progression through the sequence triggering a progression through the inner sequence, and each progression through the inner sequence triggering a progression through the next nested sequence, and so on. This can be quite a concept to grasp when initially working with LINQ. However, while the way we use sequences has changed, the way we define them still represents a great many opportunities for innovation.
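To make the nesting of iterators a little more concrete, here is a minimal sketch that records the order in which each stage of a deferred query runs. The names DeferredDemo, Numbers and Run are invented for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo {
    // Iterator block that logs each element as it is pulled through.
    public static IEnumerable<int> Numbers(List<string> log) {
        for (int i = 1; i <= 5; i++) {
            log.Add("produce " + i);
            yield return i;
        }
    }

    public static List<string> Run() {
        var log = new List<string>();

        // Defining the query does not execute any of the stages.
        IEnumerable<int> query = Numbers(log)
            .Where(x => { log.Add("filter " + x); return x % 2 == 0; })
            .Select(x => { log.Add("project " + x); return x * 10; });
        log.Add("query defined");

        // First() pulls elements through the chain one at a time
        // and stops as soon as a single result is available.
        int first = query.First();
        log.Add("result " + first);
        return log;
    }
}
```

Calling DeferredDemo.Run() should give a log of "query defined", "produce 1", "filter 1", "produce 2", "filter 2", "project 2", "result 20". Elements 3 to 5 are never produced, because nothing asked for them.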

Concatenated Sequences

The Concat<> operator in LINQ does far more than simply join two sequences together. As it is a deferred operator, it returns a dynamic iterator that always represents the concatenation of the two (or however many you want) original sequences. In this way, the concatenation can be used much like a view can be used in a database:

// define two independent lists
List<string> first = new List<string>() { "apple", "orange", "grape", "peach", "nectarine" };
List<string> second = new List<string>() { "bacon", "beef", "chicken", "pancetta" };

// concatenate the two lists and output the contents
IEnumerable<string> wrapped = first.Concat(second);
Console.WriteLine(wrapped.Aggregate((x, y) => String.Format("{0}, {1}", x, y)));

// alter the contents of the original lists
first.Remove("orange");
second.Insert(2, "pork");

// output the contents - the changes are reflected in the output
Console.WriteLine(wrapped.Aggregate((x, y) => String.Format("{0}, {1}", x, y)));

This means that we can alter, reorder or even clear either of the original lists and still expect our changes to be reflected in the concatenated sequence. More than that, we can pass the object representing the concatenation to other methods or components – the linkage will always be preserved and we will never have to update or synchronise the sequences.

The Bigger Picture

So, let’s say that we have an object – the result of a combination of deferred LINQ operators – that represents a view on our original data. Recall that this data might come from a single location, or possibly dozens of different, independent sequences. Wouldn’t it be great if we could combine this “view” with the power of data-binding? Well, great news! We can. Data-binding requires an instance of IList (again, either generic or non-generic) to use as a data source, but the flexibility of the .NET Framework’s definition of a “list” is broad enough to make it easy to wrap a sequence with the functionality provided by IList.

Important: In this context, I’m talking about binding to something analogous to a database view. This means one-way (read-only) data-binding. Although it would be possible to implement a read/write IList wrapper for a sequence, it would be an inefficient solution.

After defining a wrapper class called ListSequenceWrapper<T>, which implements IList<T> and references the sequence via the property InnerSequence, we implement the required functionality as follows:

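  • IndexOf(item) – Easily simulated using LINQ (note that this expression yields the item count, rather than -1, when the item is absent, so that case needs a separate check):
    return InnerSequence.TakeWhile((x) => !Object.Equals(x, item)).Count();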
  • Insert() – Explicit implementation, throws NotSupportedException.
  • RemoveAt() – Explicit implementation, throws NotSupportedException.
  • this[] – Get uses the LINQ ElementAt<> operator, set throws NotSupportedException.
  • Add() – Explicit implementation, throws NotSupportedException.
  • Clear() – Explicit implementation, throws NotSupportedException.
  • Contains() – Uses the LINQ Contains<> operator.
  • CopyTo() – Simple foreach loop with an index variable.
  • Count – Uses the LINQ Count<> operator.
  • IsReadOnly – Unconditionally returns true.
  • Remove() – Explicit implementation, throws NotSupportedException.
  • GetEnumerator() – Simply calls the corresponding method on the wrapped sequence.
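The list above can be sketched roughly as follows. This is a minimal illustration written from the description, not the contents of the downloadable ListSequenceWrapper.cs; in particular, IndexOf is written as a plain loop here (my own choice) so that it can return -1 when the item is missing.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class ListSequenceWrapper<T> : IList<T> {
    public ListSequenceWrapper(IEnumerable<T> innerSequence) {
        if (innerSequence == null) throw new ArgumentNullException("innerSequence");
        InnerSequence = innerSequence;
    }

    // The wrapped (possibly deferred) sequence.
    public IEnumerable<T> InnerSequence { get; private set; }

    // Walks the sequence; returns -1 if the item is not found.
    public int IndexOf(T item) {
        int index = 0;
        foreach (T current in InnerSequence) {
            if (Object.Equals(current, item)) return index;
            index++;
        }
        return -1;
    }

    public T this[int index] {
        get { return InnerSequence.ElementAt(index); }
        set { throw new NotSupportedException(); }
    }

    public bool Contains(T item) { return InnerSequence.Contains(item); }

    public void CopyTo(T[] array, int arrayIndex) {
        foreach (T item in InnerSequence) array[arrayIndex++] = item;
    }

    // Re-evaluated on every call, so it always reflects the current sequence.
    public int Count { get { return InnerSequence.Count(); } }

    public bool IsReadOnly { get { return true; } }

    public IEnumerator<T> GetEnumerator() { return InnerSequence.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // Mutating members are explicit implementations and are not supported.
    void IList<T>.Insert(int index, T item) { throw new NotSupportedException(); }
    void IList<T>.RemoveAt(int index) { throw new NotSupportedException(); }
    void ICollection<T>.Add(T item) { throw new NotSupportedException(); }
    void ICollection<T>.Clear() { throw new NotSupportedException(); }
    bool ICollection<T>.Remove(T item) { throw new NotSupportedException(); }
}
```

Because every member defers to InnerSequence, members such as Count and the indexer re-enumerate the sequence each time they are called. That is the price paid for always reflecting the current state of the underlying data.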

With this class at our disposal, we could easily bind our earlier concatenated sequence to a Windows Forms ComboBox control:
(assumes a BindingSource component has been associated with the ComboBox and the form also contains a Button control)

List<string> first = new List<string>() { "apple", "orange", "grape", "peach", "nectarine" };
List<string> second = new List<string>() { "bacon", "beef", "chicken", "pancetta" };

ListSequenceWrapper<string> wrapped = new ListSequenceWrapper<string>(first.Concat(second));
myBindingSource.DataSource = wrapped;

myButton.Click += (x, y) => {
    first.Remove("orange");
    second.Insert(2, "pork");
    myBindingSource.DataSource = null;      // force re-binding
    myBindingSource.DataSource = wrapped;
};

Other Applications

In case the possibilities aren’t immediately obvious, here are some potential applications for data-binding to a dynamic/deferred sequence:

  • A concatenation of a sequence of folders and a sequence of files would allow you to apply a sort to both sequences while maintaining folder-first order. Binding to the concatenation would remove the need to store and maintain the union in a separate collection.
  • Binding to a procedural-style sequence (such as Enumerable.Range) would make it possible to measure the performance of data-binding to a long list without occupying a lot of memory.
  • The drop-down portion of a ComboBox may contain both static/hard-coded choices as well as dynamic items. Binding to a concatenated sequence would allow independent edits to one sequence while maintaining the integrity of the other.
  • A user-interface may assign some significance to the order of the sequence it is bound to, while the order may be unimportant to the underlying sequence. Rationalising the sequence as a list avoids unnecessarily augmenting the data representation.
  • It may be useful in some applications to bind to a non-deterministic data source. For example, a “Tip of the Day” control might be bound to a sequence that changes each time it is observed.

Final Words

If there is anything that readers should take away with them after reading this article, it’s the need to move away from thinking of IEnumerable as only defining sequences with a backing store, like IList or ICollection. There is immense power in using sequences that represent logical transformations of data, rather than physically-redundant copies. Utilising sequences in this way can reduce memory usage, streamline and beautify code, as well as tipping the balance between feasibility and infeasibility of a solution. If developers were to exploit sequences to their own ends as much as they do when using LINQ, I think we could see some very exciting code in future.

Source Code

ListSequenceWrapper.cs

During the development of an Excel-based report generation system, I frequently had to grapple with the appallingly slow speed of Office Automation using VSTO. Virtually every call to a method or property on one of the COM objects exposed by VSTO carries with it a significant performance cost. Thankfully, it is possible to read/write cell values using arrays, allowing huge amounts of data to be ferried between the CLR and Excel with minimal calls. Unfortunately, however, there are some operations that still require a lot of method calls.

One such example is the task of inserting a large number of rows in an existing worksheet. I found myself in a situation where we had to ‘finish’ pro-forma spreadsheets with dynamic data by inserting the appropriate number of rows into the sheet (maintaining the formatting in the first ‘pro-forma row’) and then populating the values. The latter part is simple, using the array technique mentioned earlier, but the process of inserting the rows – which can number in the thousands in some cases – was causing a huge performance hit.

It’s not often that algorithms have to be designed specifically to minimise the number of method calls, but I think I’ve come up with a winner. I exploit the fact that Excel is able to insert multiple rows from the clipboard, and make use of the efficiency gained by using powers of 2. The algorithm is structured thusly:

  1. Let index represent the number of the row we most recently inserted.
  2. Let step represent the number of rows we can insert per method call (will always be a power of 2).
  3. While index is less than the total number of rows to be inserted:
    1. Copy the range of rows starting at the insert position and spanning step rows.
    2. Insert the copied rows at the insert position.
    3. Increment index by step.
    4. Let diff represent the number of rows remaining to be inserted.
    5. If diff is greater than the number of rows represented by twice the value of step, double the value of step.
    6. Otherwise, set step to the value of diff.

So, essentially, the number of rows inserted doubles on each iteration (taking advantage of the fact that there are more and more rows that can be copied each time) – until we pass the half-way point, then only one further insertion occurs. As a result, the number of method calls on Excel objects approximates to O(log n).
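To see the doubling in action without involving Excel at all, the loop can be reduced to a small planner that only computes the step sizes. The class and method names (InsertPlanner, PlanSteps) are made up for this sketch; the arithmetic mirrors the steps listed above.

```csharp
using System;
using System.Collections.Generic;

public static class InsertPlanner {
    // Returns the number of rows inserted by each copy/insert call
    // when growing a single pro-forma row to rowCount rows.
    public static List<int> PlanSteps(int rowCount) {
        var steps = new List<int>();
        int index = 1;   // rows present so far (the pro-forma row counts as one)
        int step = 1;    // rows to copy and insert on the next call

        while (index < rowCount) {
            steps.Add(step);
            index += step;

            int diff = rowCount - index;          // rows still to be inserted
            step = (diff > step * 2) ? step * 2   // keep doubling
                                     : diff;      // final top-up (may be zero)
        }
        return steps;
    }
}
```

For example, PlanSteps(100) yields 1, 2, 4, 8, 16, 32, 36: seven copy/insert round trips rather than ninety-nine.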

Source Code

// using Excel = Microsoft.Office.Interop.Excel;

void InsertRows(Excel.Range firstRow, int rowCount) {
    Excel.Worksheet worksheet = firstRow.Worksheet;
    // Cells is indexed [row, column]; insert directly below the first row
    Excel.Range insertPos = ((Excel.Range)worksheet.Cells[firstRow.Row + 1, 1]).EntireRow;
    int i = 1;
    int step = 1;
    while (i < rowCount) {
        // copy the existing row(s)
        worksheet.get_Range(
            "$" + (firstRow.Row) + ":$" + (firstRow.Row + step - 1),
            Type.Missing
        ).Copy(Type.Missing);

        // insert copied rows
        insertPos.Insert(Excel.XlInsertShiftDirection.xlShiftDown, Type.Missing);

        i += step;

        int diff = (rowCount - i);
        if (diff > (step * 2))
            step *= 2;
        else
            step = diff;
    }
}

This question was asked on Stack Overflow recently, and it got me thinking about a more elegant and efficient method for reading the XML documentation (comments) from a class, method, property, etc. at run-time. As I’m currently reading a book on LINQ, I figured this would be a good chance to make use of the new XDocument class from LINQ-to-XML.

Before I continue, this raises the question of why you’d need to read XML comments at run-time. Well, I can think of these reasons, at the very least:

  • Avoiding duplication when adding design-time support (since your XML doc usually echoes what you put into DescriptionAttribute)
  • Generating API documentation using a combination of reflection and XML comments
  • Providing more meaningful error messages when a method/class is invoked improperly by the caller

Navigating XML documentation

We all use XML comments and know how they’re structured in code, but the way that they translate into a complete XML document requires some explanation. When you enable the generation of XML documentation for a project, it will spit out an XML file with the same name as the assembly/executable, e.g. SomeProject.dll produces SomeProject.xml. The document is structured as follows:

<?xml version="1.0" ?>
<doc>
  <assembly>
    <name>SomeProject</name>
  </assembly>
  <members>
    <member name="T:SomeProject.SomeClass">
      <summary>Documentation for SomeClass.</summary>
    </member>
    <member name="M:SomeProject.SomeClass.SomeMethod(System.String)">
      <summary>Documentation for SomeMethod.</summary>
      <param name="someParam">Documentation for someParam.</param>
      <returns>Documentation for return value.</returns>
    </member>
    <!-- ... -->
  </members>
</doc>

Of note in the above example are the following:

  • There is no hierarchical structure beyond <members> – all classes, methods, properties and even nested types appear in one massive flat list.
  • There is a very specific naming convention for the name attribute on the <member> tag:
    • Starts with a prefix character: T=type, M=method/constructor, P=property, E=event, F=field
    • The prefix is followed by a colon (:)
    • This is followed by the full name of the member, including the namespace
    • Method/constructor parameters are identified by their type (not their name) and are separated by commas (without spaces)

Furthermore:

  • Constructors are named as #ctor instead of the reflected name .ctor
  • Nested types are named as OwningType.NestedType instead of the reflected name OwningType+NestedType

Translating from a reflected member

Bearing in mind the above, the process for obtaining the value of a name attribute for a member is fairly simple. Given a reflected member of type MemberInfo (the base class from which Type, MethodInfo, PropertyInfo, etc descend):

char prefixCode;
string memberName = (member is Type)
    ? ((Type)member).FullName                               // member is a Type
    : (member.DeclaringType.FullName + "." + member.Name);  // member belongs to a Type

switch (member.MemberType) {
    case MemberTypes.Constructor:
        memberName = memberName.Replace(".ctor", "#ctor");
        goto case MemberTypes.Method;
    case MemberTypes.Method:
        prefixCode = 'M';
        string paramTypesList = String.Join(
            ",",
            ((MethodBase)member).GetParameters()
                .Select(x => x.ParameterType.FullName)
                .ToArray()
        );
        if (!String.IsNullOrEmpty(paramTypesList)) memberName += "(" + paramTypesList + ")";
        break;

    case MemberTypes.Event: prefixCode = 'E'; break;
    case MemberTypes.Field: prefixCode = 'F'; break;

    case MemberTypes.NestedType:
        memberName = memberName.Replace('+', '.');
        goto case MemberTypes.TypeInfo;
    case MemberTypes.TypeInfo:
        prefixCode = 'T';
        break;

    case MemberTypes.Property: prefixCode = 'P'; break;

    default:
        throw new ArgumentException("Unknown member type", "member");
}

return String.Format("{0}:{1}", prefixCode, memberName);

Note the use of LINQ to effortlessly transform the array of ParameterInfo objects into a comma-separated list of strings. Now that we have the name that we expect to be able to locate in the XML documentation, we can read the comments.

Reading comments using XDocument and XPath

LINQ-to-XML introduces XDocument, designed to overcome the enormous list of shortcomings and complexities relating to XmlDocument. It is now the preferred object model for reading, querying and otherwise operating upon XML documents (or even fragments, the implementation doesn’t care).

Assuming the XML documentation file is in the same location as the executable, it’s ludicrously simple to get the XML comments for a reflected member:

AssemblyName assemblyName = member.Module.Assembly.GetName();
XDocument xml = XDocument.Load(assemblyName.Name + ".xml");
return xml.XPathEvaluate(
    String.Format(
        "string(/doc/members/member[@name='{0}']/summary)",
        GetMemberElementName(member)
    )
).ToString().Trim();

As you can see above, it’s just a simple XPath expression which returns the text within the <summary> node for the appropriate member.

The process for getting the documentation for a <param> or <returns> node is only slightly more complicated. Given a ParameterInfo instance:

if (parameter.IsRetval || String.IsNullOrEmpty(parameter.Name))
    return xml.XPathEvaluate(
        String.Format(
            "string(/doc/members/member[@name='{0}']/returns)",
            GetMemberElementName(parameter.Member)
        )
    ).ToString().Trim();
else
    return xml.XPathEvaluate(
        String.Format(
            "string(/doc/members/member[@name='{0}']/param[@name='{1}'])",
            GetMemberElementName(parameter.Member),
            parameter.Name
        )
    ).ToString().Trim();
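The queries above can be tried without a documentation file on disk by parsing a small document held in a string. The following is a sketch under that assumption; XPathDemo and its members are names invented for the example, and the member name is passed in directly rather than being derived from a reflected member.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;   // provides the XPathEvaluate extension method

public static class XPathDemo {
    const string SampleXml = @"<?xml version=""1.0"" ?>
<doc>
  <members>
    <member name=""M:SomeProject.SomeClass.SomeMethod(System.String)"">
      <summary>Documentation for SomeMethod.</summary>
      <param name=""someParam"">Documentation for someParam.</param>
      <returns>Documentation for return value.</returns>
    </member>
  </members>
</doc>";

    static readonly XDocument Xml = XDocument.Parse(SampleXml);

    // Text of the <summary> node, or an empty string if the member is unknown.
    public static string Summary(string memberName) {
        return Xml.XPathEvaluate(
            String.Format("string(/doc/members/member[@name='{0}']/summary)", memberName)
        ).ToString().Trim();
    }

    // Text of the <param> node with the given name.
    public static string Param(string memberName, string paramName) {
        return Xml.XPathEvaluate(
            String.Format(
                "string(/doc/members/member[@name='{0}']/param[@name='{1}'])",
                memberName, paramName)
        ).ToString().Trim();
    }
}
```

Note that the XPath string() function returns an empty string when nothing matches, so a member without documentation yields "" rather than an exception.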

Putting it all together with extension methods

Extension methods are brilliantly suited to the task of providing an intuitive entry point for this functionality. Since reflected members all descend from MemberInfo (except parameters, which are of the type ParameterInfo as previously indicated), we can define a GetXmlDocumentation() extension method for MemberInfo:

public static string GetXmlDocumentation(this MemberInfo member) { /* ... */ }

…and a separate one for ParameterInfo:

public static string GetXmlDocumentation(this ParameterInfo parameter) { /* ... */ }

This means that calling the method is as simple as:

Console.WriteLine(typeof(SomeClass).GetMethod("SomeMethod").GetXmlDocumentation());
Console.WriteLine(typeof(SomeClass).GetMethod("SomeMethod").GetParameters().Single(p => p.Name == "someParam").GetXmlDocumentation());
Console.WriteLine(typeof(SomeClass).GetMethod("SomeMethod").ReturnParameter.GetXmlDocumentation());

So, there you have it. My full implementation offers some overloads, as well as a mechanism for caching the XML data for each queried assembly, but these are fairly trivial additions. The real guts of the implementation have been described above.

I hope this code helps you leverage your XML comments 🙂

Download

XmlDocumentationExtensions.cs

Data binding is awesome. Sometimes we take it for granted that a list or grid control can, with only a few lines of code, both visualise and manipulate a collection. It’s a principle that supports the separation of model and presenter, not to mention making the task of developing a GUI a real breeze. More than that, though; when you understand the underlying principles behind data binding, it becomes so much cooler. In a previous article, I gave some background on how data binding accesses properties (which, as I explained, might not be actual property members; they could be logical, like columns in a DataTable). PropertyDescriptor is the magic class in all this, defining the ‘property’, naming it and providing a mechanism to get/set its value. The properties exposed by a data source (either by reflection or by ITypedList) can be used for the DisplayMember/ValueMember properties (of ListBox and ComboBox) and as columns in a DataGridView.

That’s all fine and dandy, but what if the property we want to bind to doesn’t belong directly to the list item? What if that property belongs to an object that the list item contains? For example, consider the following situation:

UML diagram for this example

A Person has a Name and a HairStyle. In UML, we’d say that Person “has a” HairStyle (or, HairStyle is aggregated by Person). A HairStyle, in turn, has a Colour and a Length. Say we’re binding a List<Person> to a DataGridView control. How can we bind one column to the Person’s Name property and another to the Person’s HairStyle’s Colour property?

One column’s DataPropertyName will be set to “Name”. What about the other? Can we have “HairStyle.Colour”? The answer is that, no, with conventional data binding, we can’t. Only properties which belong directly to Person are available for binding. The good news is that, thanks to the aforementioned ITypedList interface (already used in, for example, DataTable) we can solve this problem.

Okay, before we go any further, you might be asking, “Why not transform the data into a flat form first?”. True, we could use LINQ to return a sequence of anonymous types which would contain bindable properties from both classes. There are other approaches too; however, all of these destroy the potential two-way relationship that data binding offers. Such a data source would be immutable, hence it would be read-only in the DataGridView control. Without editing capabilities, the power of the grid control is significantly diminished. What if we want the user to be able to view and edit the person’s name and hair colour in the same place? It would be really cool if we could bind to properties on aggregated objects…

Yes, we can (with ITypedList)

The ITypedList interface exposes a method called GetItemProperties(), which returns a collection of PropertyDescriptor objects. By implementing this interface, we can create a collection class which exposes not only the properties on the list items, but also those on objects owned by the list items. In fact, we can use recursion to get the properties on the properties, and the properties on those properties, and so on! We’ll extend the existing List<T> generic class and implement ITypedList:

    public class AggregationBindingList<T> : List<T>, ITypedList {
        public PropertyDescriptorCollection GetItemProperties(PropertyDescriptor[] listAccessors) { /* ... */ }
        IEnumerable<PropertyDescriptor> GetPropertiesRecursive(Type t, PropertyDescriptor parent, Attribute[] attributes, int depth) { /* ... */ }
    }

We can get the properties on the list items (and the aggregated objects) using the TypeDescriptor class, as follows:

IEnumerable<PropertyDescriptor> GetPropertiesRecursive(Type t, PropertyDescriptor parent, Attribute[] attributes, int depth) {
    if (depth >= MAX_RECURSION) yield break;

    foreach (PropertyDescriptor property in TypeDescriptor.GetProperties(t, attributes)) {
        yield return property;

        foreach (PropertyDescriptor aggregated in GetPropertiesRecursive(property.PropertyType, parent, attributes, depth+1)) {
            yield return aggregated;
        }
    }
}

(I included a cap on the depth of recursion; if a class has a property of its own type, walking through all the aggregated properties would otherwise result in infinite recursion.)

See anything wrong with the code above? On the face of it, it looks sound; however, one must recall how PropertyDescriptor is accessed by data binding: it will pass an instance of the list item to the descriptor’s GetValue/SetValue method. A PropertyDescriptor for a property on an aggregated object will expect an instance of the aggregated object, not the list item! The other problem presented is that we have no way of uniquely identifying each property, making it impossible to relate the properties to the properties that own them.

A PropertyDescriptor for properties on aggregated objects

We can solve both of the aforementioned problems by creating our own class which derives from PropertyDescriptor. Essentially, we need to wrap the property descriptor supplied to us by TypeDescriptor.GetProperties() and hold a reference to the property which owns it; this property may, in turn, be owned by another property, and so on (due to the use of recursion). When data binding calls GetValue or SetValue, supplying an instance of the list item, we’ll call the corresponding method on the owning property first, in order to return an instance of the aggregated object. We can then call the wrapped property’s method to get or set the value on this object:

public override object GetValue(object component) {
    return AggregatedProperty.GetValue(OwningProperty.GetValue(component));
}

In the constructor for our AggregatedPropertyDescriptor class, we collect the inner and outer properties and set the name of the aggregated property appropriately:

Note: We cannot use the dot (.) symbol to delimit aggregation (e.g. “HairStyle.Colour”) because the ComboBox control will truncate the string when it encounters that character. Instead, I’ve opted to use the C++ pointer-to-member symbol (->).

public AggregatedPropertyDescriptor(PropertyDescriptor owner, PropertyDescriptor aggregated, Attribute[] attributes)
    : base(owner.Name + "->" + aggregated.Name, attributes) {
    OwningProperty = owner;
    AggregatedProperty = aggregated;
}
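Deriving from PropertyDescriptor also means overriding its other abstract members. Here is one way the whole class might look, as a sketch rather than the contents of the downloadable file. The null check in GetValue/SetValue and the choices made for the reset/serialize members are my own assumptions for the illustration.

```csharp
using System;
using System.ComponentModel;

public class AggregatedPropertyDescriptor : PropertyDescriptor {
    public AggregatedPropertyDescriptor(PropertyDescriptor owner, PropertyDescriptor aggregated, Attribute[] attributes)
        : base(owner.Name + "->" + aggregated.Name, attributes) {
        OwningProperty = owner;
        AggregatedProperty = aggregated;
    }

    public PropertyDescriptor OwningProperty { get; private set; }
    public PropertyDescriptor AggregatedProperty { get; private set; }

    // From the outside, the property appears to belong to the owner's component...
    public override Type ComponentType { get { return OwningProperty.ComponentType; } }

    // ...but its type and mutability come from the wrapped property.
    public override Type PropertyType { get { return AggregatedProperty.PropertyType; } }
    public override bool IsReadOnly { get { return AggregatedProperty.IsReadOnly; } }

    public override object GetValue(object component) {
        // Resolve the aggregated object first, then read the wrapped property from it.
        object owner = OwningProperty.GetValue(component);
        return (owner == null) ? null : AggregatedProperty.GetValue(owner);   // assumption: null owner reads as null
    }

    public override void SetValue(object component, object value) {
        object owner = OwningProperty.GetValue(component);
        if (owner != null) AggregatedProperty.SetValue(owner, value);         // assumption: silently ignore a null owner
    }

    // Resetting and designer serialization are not needed for this scenario.
    public override bool CanResetValue(object component) { return false; }
    public override void ResetValue(object component) { }
    public override bool ShouldSerializeValue(object component) { return false; }
}
```

Because an AggregatedPropertyDescriptor is itself a PropertyDescriptor, it can serve as either the owning or the aggregated property of another instance, which is what allows chains of any depth.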

Completing the GetItemProperties method

Now that we have a mechanism to correctly handle properties on aggregated objects, we can finish the GetPropertiesRecursive() method that backs GetItemProperties() on our AggregationBindingList<T> class:

IEnumerable<PropertyDescriptor> GetPropertiesRecursive(Type t, PropertyDescriptor parent, Attribute[] attributes, int depth) {
    if (depth >= MAX_RECURSION) yield break;

    foreach (PropertyDescriptor property in TypeDescriptor.GetProperties(t, attributes)) {
        if (parent == null) {
            // property belongs to root type, return as-is
            yield return property;
        }
        else {
            // property is on an aggregated object, wrap and return
            yield return new AggregatedPropertyDescriptor(parent, property, attributes);
        }

        foreach (PropertyDescriptor aggregated in GetPropertiesRecursive(property.PropertyType, parent, attributes, depth+1)) {
            yield return new AggregatedPropertyDescriptor(property, aggregated, attributes);
        }
    }
}

And there you have it; when data binding enumerates the properties on the list items, it will see:

  • Name
  • HairStyle
  • HairStyle->Colour
  • HairStyle->Length

Furthermore, we will be able to get and set values on all of these properties.

Example usage

A more complex example using a self-referencing Person class:

public class Person {
    public string Name { get; set; }
    public int Age { get; set; }
    public Person Father { get; set; }
    public Person Mother { get; set; }
}

Assuming a Form with a DataGridView control on it:

dataGridView.AutoGenerateColumns = false;

dataGridView.Columns.Add("Name", "Person");
dataGridView.Columns["Name"].DataPropertyName = "Name";

dataGridView.Columns.Add("Age", "Age");
dataGridView.Columns["Age"].DataPropertyName = "Age";

dataGridView.Columns.Add("FatherName", "Father's name");
dataGridView.Columns["FatherName"].DataPropertyName = "Father->Name";

dataGridView.Columns.Add("MotherName", "Mother's name");
dataGridView.Columns["MotherName"].DataPropertyName = "Mother->Name";

dataGridView.Columns.Add("GrandfatherName", "Grandfather's name");
dataGridView.Columns["GrandfatherName"].DataPropertyName = "Father->Father->Name";

AggregationBindingList<Person> people = new AggregationBindingList<Person>();

Person harry = new Person { Name = "Harry", Age = 75, Father = null, Mother = null };
Person frank = new Person { Name = "Frank", Age = 65, Father = null, Mother = null };
Person angela = new Person { Name = "Angela", Age = 68, Father = null, Mother = null };
Person bob = new Person { Name = "Bob", Age = 35, Father = frank, Mother = angela };
Person fred = new Person { Name = "Fred", Age = 32, Father = harry, Mother = angela };
Person mary = new Person { Name = "Mary", Age = 36, Father = null, Mother = null };
Person jim = new Person { Name = "Jim", Age = 5, Father = bob, Mother = mary };

people.AddRange(new Person[] { bob, fred, mary, jim });
dataGridView.DataSource = people;

Download

AggregationBindingList.cs