Text Object Model Demo

I’ve recently been doing a lot of work with the WinForms RichTextBox control, trying to implement the familiar red wavy underlines associated with spell-checking. After some research, I discovered that the underlying (native) RichEdit control used by the RichTextBox does in fact support this style of underlining. The catch? It is only accessible through the EM_SETCHARFORMAT window message, which requires PInvoke. While it is relatively simple to set this style, there are drawbacks:

  • High overheads associated with marshalling
  • Clunky operation, requires you to send the CHARFORMAT2 structure
  • You can only get/set the style for the selected text

The last point in particular is of most concern to me, and is a serious limitation of the RichTextBox control. Almost all of the rich text functionality is restricted to the text selected in the control. In practice, this means that any non-trivial text manipulation requires you to constantly store the current selection, select the text you wish to manipulate, apply the styles and then restore the old selection. This is slow and extremely inefficient.

I then looked into something called the Text Object Model (or TOM). This comprises a set of COM interfaces which are implemented by the underlying RichEdit control. The functionality exposed by these interfaces is quite similar to that of the Microsoft Word object model, where you have the concept of Document, Range, Selection and so on. Importantly, TOM allows you to operate on ranges of rich text without the need to alter the selection. It also provides access to a range of functionality not otherwise available for the RichTextBox control:

  • Granular selection by character, word, sentence and paragraph
  • Font weights, underline styles, all-caps
  • Tab stops, list styles, full justification of text
  • Find functionality, translation to/from screen coordinates

Using the Text Object Model

The TOM interfaces are:

  • ITextDocument – Represents a top-level document, from which text ranges can be obtained.
  • ITextRange – Represents a range of rich text. Ranges can be moved, resized and styled with ease.
  • ITextSelection – Special text range that represents the selected text in the control.
  • ITextFont – Character formatting options.
  • ITextPara – Paragraph formatting options.

It is relatively easy to obtain an instance of ITextDocument from a RichTextBox control. One simply sends the EM_GETOLEINTERFACE message to the control. There are a number of ways that you can interact with the object that gets returned:

  1. Use the dynamic keyword to access the object’s members – this requires the least programming effort, but introduces the possibility for errors and has a number of overheads.
  2. Define the TOM interfaces in managed code and use COM interop – this allows you to work in a strongly-typed manner, but still has the overheads associated with marshalling. Also, the interfaces themselves are not terribly .NET-friendly, and they lack some of the conveniences that would be familiar to most .NET developers.
  3. Write wrapper classes for each of the TOM interfaces – while this approach requires the most coding, the end result is a fast, efficient way of accessing the TOM functionality. The calling code does not have to use COM interop and the wrapper classes can translate unfriendly constructs into more .NET-friendly ones.

My solution

Needless to say, I settled for the 3rd option. I decided to implement the wrapper classes in a C++/CLI assembly, because this allowed me to access the TOM interfaces using native code, while ultimately exposing a set of managed classes. Well-written native code (that explicitly marshals data to managed code) will always be faster than COM interop, where the runtime has to apply more checks and is unable to make as many assumptions about the types it is operating on.

Some of the benefits offered by my wrappers include:

  • Managed enumerations to replace integer constants and flags
  • Properties to replace Get and Set methods
  • Translation of HRESULT codes into managed exceptions
  • Returning objects instead of pointers
  • Omission of TOM functionality not supported by the RichEdit control (e.g. text shadows, animation)
  • Use of more .NET-friendly types and nomenclature; e.g. IEquatable, ToString, IDataObject, Color and Point

The top-level class in my implementation is TextDocument, which wraps ITextDocument. It includes a static method which creates an instance from a RichTextBox control. Once created, you have full access to the Text Object Model functionality to manipulate the text inside the control.

Usage

Once obtaining a TextDocument instance from a RichTextBox control, working with formatted text is easy:

// create a RichTextBox control in the usual way
RichTextBox rtb = new RichTextBox();

// create a TOM document object (and enable advanced typography on the control)
TextDocument doc = TextDocument.FromRichTextBox(rtb, true);
TextRange range = doc.EntireRange;

// set some text
range.Text = "This is a piece of rich text.";
range.Font.Name = "Calibri";
range.Font.Size = 16f;

// find a word and apply formatting
range.FindText("piece");
range.Font.Bold = true;
range.Font.ForeColor = Color.Red;

// insert a tab
range.Collapse(RangePosition.End);
range.Text = "\t";

// resize range and apply more formatting
range.FindText("rich");
range.MoveEnd(TextUnit.Word, 2);
range.Font.UnderlineStyle = TextUnderlineStyle.Wave;
range.Font.UnderlineColor = TextUnderlineColor.Blue;

// append raw RTF using IDataObject
range.MoveEnd(TextUnit.Story, 1);
range.Collapse(RangePosition.End);
range.SetDataObject(new DataObject(
    DataFormats.Rtf, 
    @"{\rtf1\ansi\deff0\pard \par Here is some \ul more\ul0  rich text.\par}"
));

// set paragraph formatting
doc.EntireRange.Para.Alignment = TextAlignment.Center;

Download

Go to the project page here: TOM Classes for .NET

Final words

This set of managed wrapper classes for the Text Object Model provides a fast and efficient way to manipulate the text in a RichTextBox control. Importantly, it solves the limitation of only being able to apply styles to the selected text in the control. As an added bonus, it also provides access to a wider range of character and paragraph formats, as well as a number of convenience methods for working with ranges of formatted text. I hope you find it useful in your own rich text applications.

Additional resources

Text Object Model on MSDN
Rich Text Format Specification

2 thoughts on “Managed Wrapper for the Text Object Model (TOM)

  1. +1
    vote

    Fantastic! I was so tired of using endless SendMessage commands and keeping track of all the windows constants to get the richtextbox to do what the richedit control does in win32. This wraps up the richtextbox nicely.

    Reply
  2. Pingback: Managed Wrapper for TOM 2 / RichEdit 8 | Brad Smith's Coding Blog

Leave a reply

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> 

required