Excel Automation provides a wealth of possibilities when developing reporting & printing features for RAD business apps. By taking advantage of the fact that most of your end-users will be running the Microsoft Office suite, you can quickly add functionality to:
- Export tabular data from your application into a flexible medium
- Transform tabular data into chart form
- Harness spreadsheets as a data entry mechanism for your application
- Read and write data in the Excel file format(s)
Although Excel Automation can be very powerful, it also has a number of common pitfalls; the least of which can leave unwanted background processes behind and the worst can result in severe performance bottlenecks when exporting data. I’ve collected what I believe to be the best-practice methods for working with Excel, as well as some undocumented (or hard to find) tips and tricks to get the most out of this technology.
Assemblies, Namespaces and Types
In order to begin, you need to have Microsoft Office installed on the development machine, along with the Microsoft Office Primary Interop Assemblies. You’ll need to add references to Microsoft.Office.Core and Microsoft.Office.Interop.Excel to your project.
It’s enormously useful to use a namespace alias to shorten the fully-qualified names of the Excel classes and interfaces to something more readable. I suggest:
using Excel = Microsoft.Office.Interop.Excel;
The PIAs are wrappers for their equivalent COM Type Libraries and thus you tend to work with interfaces rather than classes; with the exception of the root-level Application object, you do not instantiate Excel types using constructors. You may also find that the objects returned through the API implement multiple interfaces (although this tends to feature more in other Office applications, particularly Outlook). A number of methods and properties in the API return unspecified object types, which you must cast into the appropriate interface (the MSDN reference will help you to identify the actual return values).
The types you will use most often in Excel Automation are:
- Application – controlling the instance of EXCEL.EXE, showing and hiding the application, creating/opening files and toggling performance-boosting options. Instantiated by constructing a new ApplicationClass object.
- Workbook – the collection of spreadsheets and charts in the file you are working with. Usually used as a stepping stone to a single sheet. Stored in the Application.Workbooks collection and created using Application.Workbooks.Add().
- Worksheet – provides access to the spreadsheet content, formatting, page setup, printing and saving. Stored in the Workbook.Worksheets collection and created using Workbook.Worksheets.Add().
- Range – represents a range of cells. Can be thought of as a selection. Cells are not accessed individually, therefore you must use this type to read/write cell values, apply formatting, copy/paste, etc. There are literally dozens of ways to get a Range object, for example via Worksheet.Cells.
Working with Application
As soon as you instantiate ApplicationClass, Excel is loaded into memory and your application connects to its OLE server. Be mindful of this when choosing when to construct that first automation object. Right from the point when you do this, there are some things to be aware of:
- Excel will keep on running until you properly dispose of the Application object. If you do not do this correctly, the process may linger until your app stops running or even beyond that point. You may end up with multiple instances of Excel too.
- Unless you instruct it otherwise, the Excel application will be visible to the user, potentially interruptible and will recalculate all formulae on the spreadsheet every time you change a cell’s value. Also, events on Excel types will not be fired unless you request otherwise.
- Any alterations to the above will persist after your code stops running.
You must, therefore, take care and ensure that you properly initialise before performing any automation functionality and then clean up immediately after you finish. You must take extra care to handle any exceptions that may be thrown during execution and run the clean-up code no matter what happens.
Simple Initialisation and Clean-up Example
using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;
public class ExcelDemo : IDisposable {
private Excel.Application mApplication;
public ExcelDemo() {
mApplication = new Excel.ApplicationClass(); // loads the Excel process into memory
mApplication.EnableEvents = true; // allows us to attach event handlers to Excel objects
mApplication.Visible = false; // hides the Excel application
mApplication.Interactive = false; // stops Excel from responding to user input
mApplication.ScreenUpdating = false; // boosts performance by preventing Excel from refreshing the GUI state
mApplication.Calculation
= Excel.XlCalculation.xlCalculationManual; // disables automatic calculation of formulae
}
~ExcelDemo() {
Dispose();
}
public void Dispose() {
if (mApplication != null) {
// restore interactivity, GUI updates and calculation
mApplication.Interactive = true;
mApplication.ScreenUpdating = true;
mApplication.Calculation = Excel.XlCalculation.xlCalculationAutomatic;
// exit without saving
mApplication.ActiveWorkbook.Close(false, Type.Missing, Type.Missing);
mApplication.Quit();
// release COM object
Marshal.FinalReleaseComObject(mApplication);
mApplication = null;
GC.Collect();
}
}
public void Go() {
// automation code goes here
}
}
In the above example, Excel is launched and prepared for efficient use in the constructor. Automation code can then be run inside the Go() method. When all operations have completed, the class can be disposed and will clean up after itself.
Working with Range
The Range interface is the mechanism that you will use to read/write cell values and apply formatting. There is no way to reference cells separately, other than to operate on a 1×1 Range object.
IMPORTANT: Retrieving Range objects is costly in terms of resources and processor time! You should write automation code that retrieves the minimum number of ranges required to complete the operation and include as many relevant cells in a range as possible. As soon as you begin to read/write non-trivial numbers of cells, you will see that accessing large numbers of small ranges creates a catastrophic bottleneck in your app. And it’s not just retrieving ranges that is costly, just about every operation you can perform on a range (e.g. writing values, merging cells, etc) has a significant overhead. Minimise the number of operations performed on Range objects.
To give you an example of the difference in performance possible if you reduce range retrievals, reuse range variables and minimise range operations, I was able to reduce the export time of a 16,000 row table from over 10 minutes to under 20 seconds.
Retrieving a Range
There are three main starting points to retrieving a range; you can work in absolute references (knowing a particular span of rows/columns), you can get a range that is relative to the position of another range or you can get a range that is relative to the size of another range:
Excel.Worksheet sheet = mApplication.ActiveSheet;
// absolute range (3x10 cells)
Excel.Range absolute = sheet.get_Range("A1:C10", Type.Missing);
// relative to position (3x10 cells, starting at D1)
Excel.Range relativeToPos = absolute.get_Offset(3, 0);
// relative to size (2x2 cells, starting at D1)
Excel.Range relativeToSize = relativeToPos.get_Resize(2, 2);
The Worksheet.Cells property points to a range representing the superset of all cells in the sheet. (Note that every time you access this property, you are retrieving a new Range object and thus incurring overheads!)
In the rare situation where you only want a 1×1 range, you can use the indexer on the Range interface:
IMPORTANT: Like most Office applications, Excel numbers its arrays from 1, rather then zero!
// indexer (top-left cell)
Excel.Range singleCell = (Excel.Range)sheet.Cells[1,1];
Reading and Writing Values
Since we want to minimise the number of ranges and range operations, it’s fortunate that we can read/write multiple cell values in just a single statement using the Range.Value2 property (if you’re wondering why it’s called Value2, there is a Value property which uses native types). For a 1×1 range (which you shouldn’t be using), you can get/set a scalar value. For all multi-cell ranges, cell values come in the form of a 2-dimensional object array:
// getting values
object[,] values = absolute.Value2;
for (int row = 0; row < values.GetLength(0); row++) {
for (int col = 0; col < values.GetLength(1); col++) {
Console.Write("{0} ", values[row, col]);
}
Console.WriteLine();
}
// setting values
values = new object[,] {
{1, "Blah"},
{"3", 4.5}
};
relativeToSize.Value2 = values;
Excel stores cell values as variant types, and will perform parsing/type-interpretation based on the format of each cell, so a string literal containing only an integer will be treated in the same way as a constant integer.
Formatting
As in the Excel application, it is best to set cell formatting before writing values to the spreadsheet – this way, you can coerce values to be interpreted as specific data types, or force values to be interpreted as text. Most basic text formatting is self-explanatory for the Range interface (e.g. Range.Font controls the font style, size and colour, Range.Interior determines the fill, etc), however I will give special mention to some areas which are not so well-documented:
Colours
All colours in Excel use a 32-bit RGBA representation, expressed as an Int32. In contrast, all colours in GDI+ (and hence the rest of the .NET Framework) use the ARGB representation. In order to set cell colours or make comparisons with the System.Drawing.Color class, you will need to write a wrapper:
public struct ExcelColor {
byte R;
byte G;
byte B;
byte A;
public ExcelColor(byte r, byte g, byte b, byte a) {
A = a;
R = r;
G = g;
B = b;
}
public static implicit operator System.Drawing.Color(ExcelColor that) {
return System.Drawing.Color.FromArgb(that.A, that.R, that.G, that.B);
}
public static implicit operator ExcelColor(System.Drawing.Color that) {
return new ExcelColor(that.R, that.G, that.B, that.A);
}
public static explicit operator int(ExcelColor that) {
return BitConverter.ToInt32(new byte[] { that.R, that.G, that.B, that.A }, 0);
}
public static explicit operator ExcelColor(int that) {
byte[] bytes = BitConverter.GetBytes(that);
return new ExcelColor(bytes[0], bytes[1], bytes[2], 255); // Excel ignores the alpha component
}
}
You can then convert easily between each colour type:
// GDI+ colour to Excel colour
absolute.Font.Color = (int)(ExcelColor)System.Drawing.Color.Red;
// Excel colour to GDI+ colour
System.Drawing.Color c = (ExcelColor)(int)singleCell.Font.Color;
Console.WriteLine("{0}", c);
Finally, to set a cell’s interior colour to “No Fill”, you must assign the value -4142 to Range.Interior.ColorIndex instead (otherwise the colour will be interpreted as black instead of transparent).
Number Formats
To set the number format (or data type) for a range, you can pass a string value to the Range.NumberFormat property:
- For integral, decimal and currency values, pass a format string in the same format as used by the Excel application, e.g. #,##0.00 for a number with 2 decimal places and a thousands separator.
- For date and time values, pass a date format string, e.g. d MMM yyyy for something like 13 Mar 2010.
- To force the “Text” format (i.e. to force all values to be treated as text), pass the at-sign (@).
- To reset to the default “General” format, pass General or an empty string.
Other Range Operations
You can use Range.Insert() to insert a new row or column, or to paste-insert a range that has been copied using Range.Copy():
// insert a new row above A15
Excel.Range insertPos = (Excel.Range)sheet.Cells[1, 15];
insertPos.Insert(Excel.XlInsertShiftDirection.xlShiftDown, Type.Missing);
// duplicate the first 15 rows
sheet.get_Range("$1:$15").Copy(Type.Missing);
insertPos.Insert(Excel.XlInsertShiftDirection.xlShiftDown, Type.Missing);
You can use Range.Find() and Range.FindNext() to locate cells that match a particular search term (note that FindNext will loop back to the start of the range):
string firstMatch = null;
// set find options and get the first match
Excel.Range result = sheet.Cells.Find(
"search term",
Type.Missing,
Type.Missing,
Excel.XlLookAt.xlWhole,
Type.Missing,
Excel.XlSearchDirection.xlNext,
true,
Type.Missing,
Type.Missing
);
while (result != null) {
// get the cell address
string address = result.get_AddressLocal(true, true, Excel.XlReferenceStyle.xlA1, Type.Missing, Type.Missing);
// note the address of the first result. if we have returned to the start, break out of the loop
if (firstMatch == null)
firstMatch = address;
else if (firstMatch == address)
break;
Console.WriteLine(address); // show that this is one of the cells that matches the search term
// advance to the next result
result = sheet.Cells.FindNext(result);
}
Merged Cells
Some operations in Excel will throw an exception if performed on a range that contains merged cells. You can test the Range.MergeCells property to determine if the range contains merged cells, and if working with more than just a single cell, Range.MergeArea to retrieve a range representing the merged cells. When reading/writing values or applying formatting to a range with merged cells, operate on only the upper-left cell of the merge area.
Working with Worksheet
In addition to the Cells property, Worksheet also allows access to the drawing canvas via the Shapes property, as well as a range of printing/pagination settings. Use Worksheet.Activate() to change the active worksheet.
The worksheet also provides access to these key operations:
- SaveAs() – to save or export the sheet and/or workbook.
- PrintOutEx() – to print the sheet.
- PrintPreview() – to display the print preview dialog (modal) for the sheet.
Among the properties of Worksheet.PageSetup are the header/footer properties (e.g. CenterHeader, LeftFooter, etc) Each of these is a string value, and uses a set of escape codes to apply formatting and insert dynamic content:
- && – to insert a single ampersand
- &n (e.g. &9) – to set the font size
- &[Page] – page number
- &[Pages] – number of pages
- &[Date]
- &[Time]
- &”Font name”
- &B – bold
- &I – italic
- &U – underline
Final Words
This is by no means a comprehensive guide to Excel Automation, but it should provide you with the tools necessary to get started – and – to avoid falling into those common pitfalls I mentioned. In future posts, I hope to share some real-world applications for Excel Automation. The bulk of my experience in this area comes from building a template-driven reporting system that uses Excel for its presentation. Although I cannot provide much from that system (it is bound by intellectual property), I will be able to offer up some tasty tidbits 🙂