In C#, when we think of member-wise equality, we often think of value types. Indeed, their default behaviour (in contrast to that of reference types) is to provide both Equals() and GetHashCode() methods based on their composition rather than their location in memory. However, there are times when we want our reference types to be regarded as equal based on composition as well; the most obvious example of this is System.String, which masquerades as a value type in more ways than one. Let’s look at some other examples of reference types that we might want to equate in a member-wise fashion:

  • Database objects; where multiple instances of the same record should be regarded as equal.
  • An in-memory tuple; whose values are of importance and their provenance not.
  • A compound key in a dictionary or table; where it must be possible to create a new, separate instance to match an existing one.

It’s also worthwhile to point out that anonymous types exhibit member-wise equality as well; this makes it possible to do such things in LINQ as performing a group-by on multiple fields.

Needless to say, there are times when we want this behaviour in our reference types. The usual method of implementing this behaviour (unless using an anonymous type, of course) is to override the Equals/GetHashCode methods in a similar manner to this:

public override bool Equals(object obj) {
    if (obj is MyType) {
        MyType that = (MyType)obj;
        return (this.PropertyA == that.PropertyA) && (this.PropertyB == that.PropertyB);
    }
    else {
        return base.Equals(obj);
    }
}

public override int GetHashCode() {
    return PropertyA.GetHashCode() ^ PropertyB.GetHashCode();
}

Note: It’s important to override both methods and be consistent in the logic used to determine equality/identity for the type. Some framework methods will call the full-blown Equals() method, while others will be content to operate on the hash code only. As a developer, you expect to get the same behaviour every time, without having to know how the framework methods are implemented!

As you can see, this is typical boilerplate code; there seems to be no obvious way to avoid having to type this code out every time, since each class will have a different set of properties (or other members) that must be considered for equality… or is there? What if we could write a very clever base class from which to extend any reference types we want to equate in a member-wise fashion? A base class that would somehow know the public properties of whichever subclass was being compared at the time?

I Know What You’re Thinking…

…you’re thinking reflection, aren’t you? Well, true, reflection does allow a base class to discover the properties of its children, and yes, it could pop the values out of each property of each instance and try to equate them… but… and this is the big BUT… this would be a terribly inefficient technique!

public override bool Equals(object obj) {
    // we're not even checking whether 'obj' is of the same type
    foreach (PropertyInfo property in this.GetType().GetProperties()) {
        if (!property.GetValue(this, null).Equals(property.GetValue(obj, null))) return false;
    }

    return true;
}

Even if we cached the complete set of PropertyInfo objects for each subclass (so they would only have to be retrieved once per type), there’s still the massive overheads of using the reflection API to dynamically get the property values – something which must be performed during every call to Equals() and GetHashCode(). (Think about it – there’s no other way to get the property values. We can’t just cast to the subclass type, because we don’t know what the type is! It might not even exist at the point when the base class is compiled.)

No, i’m afraid reflection doesn’t address this need adequately. In my tests, a reflection-based Equals method like the one above performed at roughly 16 times slower than a type written using boilerplate code!

Expression Trees to the Rescue!

The poor performance of the reflection technique comes largely from the inability to ‘teach’ the runtime how to efficiently repeat the equality comparison once it has discovered the properties of the class; it never learns and it never gets faster! However, there is a different technique we can use which allows the runtime to discover how to perform an equality comparison efficiently after learning the process once; we can build an expression tree to model the equality comparison:

Expression tree for member-wise Equals method

Expression trees are not only a means of representing lambda expressions in a dynamic, structured manner – their real power comes from the ability to dynamically construct an expression tree using the API, then compile it into bona fide, executable code – using Expression<T>.Compile(). What expression trees allow us to do in the context of our base class, then, is to dynamically create (and then compile) a tailor-made Equals() and GetHashCode() implementation for each subclass – and then call these methods again and again and again, with very little performance hit!

static Func<object, object, bool> MakeEqualsMethod(Type type) {
    ParameterExpression pThis = Expression.Parameter(typeof(object), "x");
    ParameterExpression pThat = Expression.Parameter(typeof(object), "y");

    // cast to the subclass type
    UnaryExpression pCastThis = Expression.Convert(pThis, type);
    UnaryExpression pCastThat = Expression.Convert(pThat, type);

    // compound AND expression using short-circuit evaluation
    Expression last = null;
    foreach (PropertyInfo property in type.GetProperties()) {
        BinaryExpression equals = Expression.Equal(
            Expression.Property(pCastThis, property),
            Expression.Property(pCastThat, property)
        );

        if (last == null)
            last = equals;
        else
            last = Expression.AndAlso(last, equals);
    }

    // call Object.Equals if second parameter doesn't match type
    last = Expression.Condition(
        Expression.TypeIs(pThat, type),
        last,
        Expression.Equal(pThis, pThat)
    );

    // compile method
    return Expression.Lambda<Func<object, object, bool>>(last, pThis, pThat)
        .Compile();
}

Of course, we must still use reflection when constructing the expression trees – how else will we discover the properties? But the key point is that we only have to do this step once; from then on, we have a compiled method (callable via a delegate) which is almost indistinguishable from one written using the boilerplate technique. The method takes parameters of type System.Object, because it would be impossible to cast to the subclass type (or write the delegate signature for it) if it required strongly-typed parameters.

The process for creating the GetHashCode() method is fairly similar; we simply replace the conditional AND nodes with XOR, and the Equals nodes with calls to each property’s own hash code method. Obviously, there’s only one input in this case.

We store the delegates for each subclass type against their corresponding System.Type in a class-scoped Dictionary<TKey,TValue>. When an instance is constructed, it checks to see if its type is in the dictionary; if not, it generates and compiles the methods, otherwise it simply proceeds. The delegates are invoked thusly:

public override int GetHashCode() {
    return _functions[GetType()].GetHashCodeFunc(this);
}

public override bool Equals(object obj) {
    return _functions[GetType()].EqualsFunc(this, obj);
}

Usage

With the base class written, all that is needed in order to access the member-wise equality functionality is to extend the class and add an arbitrary number of public properties (only a get accessor is required). For example:

public class MyMemberwiseType : MemberwiseEqualityObject {
    public Guid ID { get; set; }
    public int Number { get; set; }
    public DateTime Date { get; set; }
}

Any attempt to equate two instances of MyMemberwiseType will result in a member-wise comparison. Likewise, operations using the hash code (such as Enumerable.GroupBy) will also operate in this fashion.

Performance

I ran a simple test which compares the execution time of three different memberwise equality comparison techniques (anonymous types, reflection and my expression tree implementation). Each subclass had 8 properties, with each test instance being initialised with the same set of random values (making the test completely fair on each implementation). Under these conditions, when faced with the task of comparing 5 million instances for equality, I obtained the following results:

  • Using an anonymous type: 0.770 sec – this is the fastest, obviously, since its Equals() method is generated at compile-time.
  • Using reflection: 9.882 sec
  • Using dynamically-compiled equality comparison: 0.972 sec

As you can see, there is a massive overhead in using reflection, however my dynamic compilation technique produces results which are much more comparable to a hard-coded method. There’s no such thing as a free lunch, as they say, but in this case the sacrifice is only a small one to avoid a lot of boilerplate code!

Final Words

Currently, the base class performs its comparison on all public properties, unconditionally. It would not be difficult to use a custom attribute type to include/exclude members from consideration, or even to allow non-public properties or fields to be included. Since these changes would affect only the structure of the expression tree (and not add any processing to the Equals() method itself), there would be no performance hit in doing so.

In summary, then, i’ve demonstrated how to save on boilerplate code for member-wise equality comparisons by writing a base class which dynamically generates Equals() and GetHashCode() methods, using expression trees. If, like me, you find the task of writing such code to be a tedious exercise, I hope you enjoy the possibilities presented by this solution 🙂

Download

MemberwiseEquality.zip – includes MemberwiseEqualityObject (the implementation itself), ReflectEqualityObject (for performance comparison) and a console application to replicate my tests.

9 thoughts on “Member-Wise Equality Without the Boilerplate

  1. vote

    Actually, I may be having a it’s-Friday-I-need-to-go-home moment, but there’s two things I’m curious about:

    (1) By changing line 12 in the MakeEqualsMethod() method to something like

    var equals = Expression.Call(
    equalsMethod,
    Expression.Property(pCastThis, property),
    Expression.Property(pCastThat, property));

    where equalsMethod is something like

    var equalsMethod = typeof(object).GetMethod(“Equals”, BindingFlags.Public | BindingFlags.Static);

    then we end up compiling a bunch of “Object.Equals(x.Property, y.Property)” as opposed to “x.Property == y.Property” statements.

    The advantage here is that if any of those properties overrode the Equals() method to implement their own kind of equality, then we will preserve the equality. The idea being that most reference types will be overriding Equals() and not the == operator, even if they are immutable.

    (2) I’m not sure I get the bit in line 23. If I have an x that is of type Foo, and a y that is of type Bar, well then they’re not equal–they’re not even the same type! Why not just return false here? If we change that to

    last = Expression.Condition(
    Expression.AndAlso(
    Expression.NotEqual(pThat, Expression.Constant(null)),
    Expression.TypeIs(pThat, type)),
    last,
    Expression.Constant(false, typeof(bool))

    then we build an expression tree that looks something like

    if (y != null & y is Foo)
    {
    return Object.Equals(x.A, y.A) && Object.Equals(x.B, y.B); // let each property define its sense of equality
    }
    else
    {
    return false; // it’s either null or it’s not my type
    }

    All in all, still a fantastic piece of work! Thanks again!

    Reply
  2. vote

    This is brilliant code. A great timesaver over writing and rewriting all of that boilerplate code. There’s one other flaw other than those that Nicholas Piasecki pointed out. The code above doesn’t work if the reference type has members that are of type IEnumerable (that includes arrays).

    I adjusted the MakeEqualsMethod code like so:

    // compound AND expression using short-circuit evaluation
    Expression last = null;
    foreach (PropertyInfo property in type.GetProperties())
    {
    dynamic equals;
    if (property.PropertyType != typeof(String) && property.PropertyType.GetInterface(typeof(System.Collections.IEnumerable).FullName) != null)
    {
    var arg = property.PropertyType.IsArray ? property.PropertyType.GetElementType() : property.PropertyType.GetGenericArguments()[0];
    var enEqualsMethod = typeof(Extensions).GetMethod("EnumerablesAreEqual").MakeGenericMethod(arg);
    equals = Expression.Call(enEqualsMethod, Expression.Property(pCastThis, property), Expression.Property(pCastThat, property));
    }
    else if (property.PropertyType.IsValueType)
    {
    equals = Expression.Equal(Expression.Property(pCastThis, property), Expression.Property(pCastThat, property));
    }
    else
    {
    equals = Expression.Call(equalsMethod, Expression.Property(pCastThis, property), Expression.Property(pCastThat, property));
    }

    if (last == null)
    {
    last = equals;
    }
    else
    {
    last = Expression.AndAlso(last, equals);
    }
    }

    This assumes that you have a static class called “Extensions” that has a Generic method in it like so:


    /// more than ten times faster than SequenceEquals
    public static bool ArraysAreEqual(T[] b1, T[] b2)
    {
    // if both are null, ReferenceEquals returns true
    if (ReferenceEquals(b1, b2))
    {
    return true;
    }
    if (b1 == null || b2 == null)
    {
    return false;
    }
    if (b1.Length != b2.Length)
    {
    return false;
    }

    EqualityComparer comparer = EqualityComparer.Default;
    return !b1.Where((t, i) => !comparer.Equals(t, b2[i])).Any();
    }

    and I’ve adjusted the MakeGetHashCodeMethod like so:

    foreach (PropertyInfo property in type.GetProperties())
    {
    MethodCallExpression hash;
    if (property.PropertyType.IsGenericType)
    {
    var enumerationEqualsMethod = typeof(Extensions).GetMethod("GetHashCodeAggregate").MakeGenericMethod(property.PropertyType.GetGenericArguments()[0]);
    hash = Expression.Call(enumerationEqualsMethod, Expression.Property(pCastThis, property));
    }
    else
    {
    hash = Expression.Call(Expression.Property(pCastThis, property), "GetHashCode", Type.EmptyTypes);
    }

    if (last == null)
    last = hash;
    else
    last = Expression.ExclusiveOr(last, hash);
    }

    Reply
  3. vote

    IN my previous post, I forgot to mention that you’d need a static class that contains this:

    public static int GetHashCodeAggregate(this IEnumerable source)
    {
    var hash = 17;
    unchecked
    {
    hash = source.Aggregate(hash, (current, item) => current * 31 + item.GetHashCode());
    }

    return hash;
    }

    Reply
  4. vote

    I’m surprised no one has mentioned hash collisions as a problem with this solution. If you really need to know if two things are equal, this can fail you sometimes and make hard to track, low frequency bugs. Sad face!

    Reply
    • vote

      Remember that the Equals method that is generated by this code compares property values directly, not hash codes. Hash collisions don’t enter into it unless you use the GetHashCode method instead and, in that respect, the problem of collisions is certainly not unique to the method generated by my solution. If you need a non-trivial implementation of GetHashCode for a class, obviously you would not use the basic XOR technique that i’m using here. That would warrant writing the code by hand.

      Reply
  5. vote

    Both the original code, and the newer one in the Gist fails this test case, why?

    [Fact]
    public void GetHashSmallTest()
    {
    var x = new Small();
    var res = x.GetHashCode();
    }

    public class Small : MemberwiseEqualityObject
    {
    public string Str1 { get; set; }
    }

    Reply
    • vote

      I think I solved it by make string a special case and checking for null.

      static Expression MakeGetHashCodeExpression(Expression thisObject, PropertyInfo propertyInfo)
      {
      var thisProperty = Expression.Property(thisObject, propertyInfo);
      if (typeof(IEnumerable).IsAssignableFrom(propertyInfo.PropertyType) && !propertyInfo.PropertyType.Equals(typeof(string)))
      {
      var hashMethod = CreateGetEnumeratedHashCodeMethod(propertyInfo.PropertyType);
      return Expression.Call(hashMethod, thisProperty);
      }
      else if (propertyInfo.PropertyType.Equals(typeof(string)))
      {
      BinaryExpression binaryExpression = Expression.Equal(thisProperty, Expression.Constant(null, typeof(object)));
      ConstantExpression constantExpression = Expression.Constant(17, typeof(int));
      var elseExpression = Expression.Call(thisProperty, “GetHashCode”, Type.EmptyTypes);
      // Expression.Condition is ?:
      return Expression.Condition(
      binaryExpression,
      constantExpression,
      elseExpression);
      }
      else
      {
      return Expression.Call(thisProperty, “GetHashCode”, Type.EmptyTypes);
      }
      }

      Reply
      • 0
        vote

        You’re right; my original GetHashCode method doesn’t allow for null values. Though I will comment that is generally seen as bad practice to leave String properties uninitialised; it is better to set them to String.Empty in the constructor.

        Reply

Leave a reply

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> 

required