Building a Better Tag Cloud

Tag clouds are a mechanism for showing a list of topics/categories along with their relative frequencies of occurrence. This can useful for finding content on blogs, news sites, discussion boards, information systems and more. They can be textual or graphical, although the latter gives much greater flexibility and is the focus of this article.

Much thought can be given to the layout of a tag cloud. You will notice that the default WordPress tag cloud used on this blog simply lists the tags in alphabetical order and uses progressively larger font sizes to denote the more frequently-used ones. More sophisticated layouts try to position the most frequently-used tags in the center, with the least-used being furthest away. I have even explored force-directed algorithms for this purpose, although I later concluded that they were not well-suited to the problem.

The solution I ultimately adopted takes an iterative approach and uses fitness criteria to arrive at an optimal layout. It places each tag on the diagram in order from highest to lowest frequency. The position of a tag is determined only by the tags which preceded it. Starting at the center and working outwards, a position is chosen such that the tag does not collide with any previously-placed tag. Several criteria are used to compute the fitness of the solution if the tag were to remain at that position. This process is repeated for all directions (360 degrees). Each time a more optimal solution is found, the tag is moved to that position. When all angles have been exhausted, the tag will be at its most optimal position and control moves to the next tag.

The fitness criteria are as follows:

The smaller the total area of the tag cloud, the better
The more square* (i.e. least difference between width and height) of the tag cloud, the better
The less distance from the center of the tag cloud, the better

* Or some other aspect ratio

The Algorithm

Sort the collection of tags by frequency in descending order.
Let totalBounds represent the used bounds of the tag cloud, and be initialised at zero.
For each tag in the collection:
1. Let bestBounds represent the bounds of the most optimal solution so far, and be initialised at zero.
2. Let bestDist represent the distance from the center of the tag cloud for the most optimal solution, and be initialised at zero.
3. For each angle from 0 to 360 degrees:
  1. Let tagBounds represent the bounds (position and size) of tag, and be initialised at the center of the tag cloud.
  2. Let tagDist represent the distance from the center of the tag cloud, and be initialised at zero.
  3. Repeat the following:
    1. Move tagBounds to a point tagDist units from the center of the tag cloud at angle.
    2. Check whether tagBounds collides (intersects) with the bounds of any tags previously placed.
    3. If there are collisions, increment tagDist. Otherwise, exit the loop.
  4. Let isBest represent whether all of the following are true:
    - The area of (totalBounds ∪ tagBounds) is less than the area of bestBounds (or bestBounds is empty)
    - The difference between the width and height of (totalBounds ∪ tagBounds) is less than the difference between the width and height of bestBounds (or bestBounds is empty)
    - tagDist is less than bestDist (or bestDist is zero)
  5. If isBest is true:
    1. Set bestBounds to (totalBounds ∪ tagBounds).
    2. Set bestDist to tagDist.
    3. Move tag to the location of bestBounds.
4. Set totalBounds to bestBounds.

Implementation

My implementation of the algorithm above takes the form of a C# class named TagCloud, which represents a tag cloud and offers methods to apply the layout and render to a screen or bitmap.

Included is the ability to specify basic visual properties such as the font face, colour and style, as well as the gradient between the font size of the tags with the highest and lowest frequencies. I also provided a way to favour a particular aspect ratio (as not everyone wants a perfectly square tag cloud), although it should be noted that the aspect ratio of the final tag cloud will fall somewhere between the desired value and 1:1 (square).

Tags are represented by the TagItem class, which encapsulates the name and frequency. They are added to an order-unimportant collection, to be sorted during the layout process.

After calling the Arrange() method, the tag cloud can be rendered to a GDI+ Graphics object representing the screen, a bitmap or a printer. The Draw() method accepts a target rectangle within which the tag cloud will be automatically scaled and centered.

To facilitate interactivity, a HitTest() method is also provided to determine the tag at the specified coordinates.

Observations

Tags with the highest frequency occupy the center of the cloud as intended, and tend to stack vertically at first (as this fits the criteria for “squareness”).
Subsequent tags occupy the surrounding space and progressively move outward from the center as intended.
Shorter and lower-frequency tags sometimes fill in gaps between larger tags. This happens more as the difference in font size between the tags becomes more extreme.
Performance begins to degrade beyond about 128 tags. This seems acceptable when you consider how readability decreases as the number of tags increases.
The distribution of the tag frequencies affects the performance of the layout algorithm. When randomly generated using a normal distribution, it runs slower than a more typical distribution (where a few tags dominate and there are many tags with a very low frequency).

Performance

Although the algorithm contains 3 nested loops, the number of angles in the second level remains constant and thus the performance is O(n²). This can be demonstrated by progressively increasing the number of tags and measuring the number of cycles of the innermost loop:

Further Optimisations

By trying angles in a random (shuffled) order, the tags better fill the available space
Rather than trying all 360 directions, similar results can be achieved by choosing the first 90 from the shuffled set
Rather than increasing by 1 each time, distances can increase in increments comparable to the size of each tag
If the current most optimal distance has been exceeded, we can stop incrementing the distance and try another angle
If placing a tag does not cause the total bounds of the tag cloud to increase at all, we can skip over the remaining angles and place the next tag

Example Usage

TagCloud cloud = new TagCloud();

// add some tags
cloud.Items.Add(new TagItem("orange", 2));
cloud.Items.Add(new TagItem("red", 4));
cloud.Items.Add(new TagItem("green", 12));
cloud.Items.Add(new TagItem("pink", 96));
cloud.Items.Add(new TagItem("black", 1));
cloud.Items.Add(new TagItem("brown", 50));
cloud.Items.Add(new TagItem("yellow", 45));
cloud.Items.Add(new TagItem("purple", 32));
cloud.Items.Add(new TagItem("gold", 8));
cloud.Items.Add(new TagItem("silver", 7));

// apply layout
cloud.Arrange();

using (Bitmap bmp = new Bitmap(512, 512)) {
   // render to bitmap
   cloud.DrawToBitmap(bmp);

   // save bitmap
   bmp.Save("test.png", ImageFormat.Png);
}

Download

The source and binaries for this project are available on GitHub.

Final Thoughts

This implementation produces good results with reasonably good performance. I see potential for further optimisation (being able to achieve O(n log n) performance would be highly desirable!) and additional features, such as being able to vary text colour and other visual properties on a per-tag basis.

Share/Bookmark

Brad Smith’s Coding Blog

Windows apps, business systems and games in C# and C++