Excel Automation provides a wealth of possibilities when developing reporting & printing features for RAD business apps. By taking advantage of the fact that most of your end-users will be running the Microsoft Office suite, you can quickly add functionality to:

  • Export tabular data from your application into a flexible medium
  • Transform tabular data into chart form
  • Harness spreadsheets as a data entry mechanism for your application
  • Read and write data in the Excel file format(s)

Although Excel Automation can be very powerful, it also has a number of common pitfalls; the least of which can leave unwanted background processes behind and the worst can result in severe performance bottlenecks when exporting data. I’ve collected what I believe to be the best-practice methods for working with Excel, as well as some undocumented (or hard to find) tips and tricks to get the most out of this technology.

Assemblies, Namespaces and Types

In order to begin, you need to have Microsoft Office installed on the development machine, along with the Microsoft Office Primary Interop Assemblies. You’ll need to add references to Microsoft.Office.Core and Microsoft.Office.Interop.Excel to your project.

It’s enormously useful to use a namespace alias to shorten the fully-qualified names of the Excel classes and interfaces to something more readable. I suggest:

using Excel = Microsoft.Office.Interop.Excel;

The PIAs are wrappers for their equivalent COM Type Libraries and thus you tend to work with interfaces rather than classes; with the exception of the root-level Application object, you do not instantiate Excel types using constructors. You may also find that the objects returned through the API implement multiple interfaces (although this tends to feature more in other Office applications, particularly Outlook). A number of methods and properties in the API return unspecified object types, which you must cast into the appropriate interface (the MSDN reference will help you to identify the actual return values).

The types you will use most often in Excel Automation are:

  • Application – controlling the instance of EXCEL.EXE, showing and hiding the application, creating/opening files and toggling performance-boosting options. Instantiated by constructing a new ApplicationClass object.
  • Workbook – the collection of spreadsheets and charts in the file you are working with. Usually used as a stepping stone to a single sheet. Stored in the Application.Workbooks collection and created using Application.Workbooks.Add().
  • Worksheet – provides access to the spreadsheet content, formatting, page setup, printing and saving. Stored in the Workbook.Worksheets collection and created using Workbook.Worksheets.Add().
  • Range – represents a range of cells. Can be thought of as a selection. Cells are not accessed individually, therefore you must use this type to read/write cell values, apply formatting, copy/paste, etc. There are literally dozens of ways to get a Range object, for example via Worksheet.Cells.

Working with Application

As soon as you instantiate ApplicationClass, Excel is loaded into memory and your application connects to its OLE server. Be mindful of this when choosing when to construct that first automation object. Right from the point when you do this, there are some things to be aware of:

  • Excel will keep on running until you properly dispose of the Application object. If you do not do this correctly, the process may linger until your app stops running or even beyond that point. You may end up with multiple instances of Excel too.
  • Unless you instruct it otherwise, the Excel application will be visible to the user, potentially interruptible and will recalculate all formulae on the spreadsheet every time you change a cell’s value. Also, events on Excel types will not be fired unless you request otherwise.
  • Any alterations to the above will persist after your code stops running.

You must, therefore, take care and ensure that you properly initialise before performing any automation functionality and then clean up immediately after you finish. You must take extra care to handle any exceptions that may be thrown during execution and run the clean-up code no matter what happens.

Simple Initialisation and Clean-up Example

using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

public class ExcelDemo : IDisposable {

    private Excel.Application mApplication;

    public ExcelDemo() {
        mApplication = new Excel.ApplicationClass();        // loads the Excel process into memory
        mApplication.EnableEvents = true;                   // allows us to attach event handlers to Excel objects
        mApplication.Visible = false;                       // hides the Excel application
        mApplication.Interactive = false;                   // stops Excel from responding to user input
        mApplication.ScreenUpdating = false;                // boosts performance by preventing Excel from refreshing the GUI state
        mApplication.Calculation
            = Excel.XlCalculation.xlCalculationManual;      // disables automatic calculation of formulae
    }

    ~ExcelDemo() {
        Dispose();
    }

    public void Dispose() {
        if (mApplication != null) {
            // restore interactivity, GUI updates and calculation
            mApplication.Interactive = true;
            mApplication.ScreenUpdating = true;
            mApplication.Calculation = Excel.XlCalculation.xlCalculationAutomatic;

            // exit without saving
            mApplication.ActiveWorkbook.Close(false, Type.Missing, Type.Missing);
            mApplication.Quit();

            // release COM object
            Marshal.FinalReleaseComObject(mApplication);
            mApplication = null;
            GC.Collect();
        }
    }

    public void Go() {
        // automation code goes here
    }
}

In the above example, Excel is launched and prepared for efficient use in the constructor. Automation code can then be run inside the Go() method. When all operations have completed, the class can be disposed and will clean up after itself.

Working with Range

The Range interface is the mechanism that you will use to read/write cell values and apply formatting. There is no way to reference cells separately, other than to operate on a 1×1 Range object.

IMPORTANT: Retrieving Range objects is costly in terms of resources and processor time! You should write automation code that retrieves the minimum number of ranges required to complete the operation and include as many relevant cells in a range as possible. As soon as you begin to read/write non-trivial numbers of cells, you will see that accessing large numbers of small ranges creates a catastrophic bottleneck in your app. And it’s not just retrieving ranges that is costly, just about every operation you can perform on a range (e.g. writing values, merging cells, etc) has a significant overhead. Minimise the number of operations performed on Range objects.

To give you an example of the difference in performance possible if you reduce range retrievals, reuse range variables and minimise range operations, I was able to reduce the export time of a 16,000 row table from over 10 minutes to under 20 seconds.

Retrieving a Range

There are three main starting points to retrieving a range; you can work in absolute references (knowing a particular span of rows/columns), you can get a range that is relative to the position of another range or you can get a range that is relative to the size of another range:

Excel.Worksheet sheet = mApplication.ActiveSheet;

// absolute range (3x10 cells)
Excel.Range absolute = sheet.get_Range("A1:C10", Type.Missing);

// relative to position (3x10 cells, starting at D1)
Excel.Range relativeToPos = absolute.get_Offset(3, 0);

// relative to size (2x2 cells, starting at D1)
Excel.Range relativeToSize = relativeToPos.get_Resize(2, 2);

The Worksheet.Cells property points to a range representing the superset of all cells in the sheet. (Note that every time you access this property, you are retrieving a new Range object and thus incurring overheads!)

In the rare situation where you only want a 1×1 range, you can use the indexer on the Range interface:

IMPORTANT: Like most Office applications, Excel numbers its arrays from 1, rather then zero!

// indexer (top-left cell)
Excel.Range singleCell = (Excel.Range)sheet.Cells[1,1];

Reading and Writing Values

Since we want to minimise the number of ranges and range operations, it’s fortunate that we can read/write multiple cell values in just a single statement using the Range.Value2 property (if you’re wondering why it’s called Value2, there is a Value property which uses native types). For a 1×1 range (which you shouldn’t be using), you can get/set a scalar value. For all multi-cell ranges, cell values come in the form of a 2-dimensional object array:

// getting values
object[,] values = absolute.Value2;
for (int row = 0; row < values.GetLength(0); row++) {
    for (int col = 0; col < values.GetLength(1); col++) {
        Console.Write("{0} ", values[row, col]);
    }
    Console.WriteLine();
}

// setting values
values = new object[,] {
    {1, "Blah"},
    {"3", 4.5}
};
relativeToSize.Value2 = values;

Excel stores cell values as variant types, and will perform parsing/type-interpretation based on the format of each cell, so a string literal containing only an integer will be treated in the same way as a constant integer.

Formatting

As in the Excel application, it is best to set cell formatting before writing values to the spreadsheet – this way, you can coerce values to be interpreted as specific data types, or force values to be interpreted as text. Most basic text formatting is self-explanatory for the Range interface (e.g. Range.Font controls the font style, size and colour, Range.Interior determines the fill, etc), however I will give special mention to some areas which are not so well-documented:

Colours

All colours in Excel use a 32-bit RGBA representation, expressed as an Int32. In contrast, all colours in GDI+ (and hence the rest of the .NET Framework) use the ARGB representation. In order to set cell colours or make comparisons with the System.Drawing.Color class, you will need to write a wrapper:

public struct ExcelColor {
    byte R;
    byte G;
    byte B;
    byte A;

    public ExcelColor(byte r, byte g, byte b, byte a) {
        A = a;
        R = r;
        G = g;
        B = b;
    }

    public static implicit operator System.Drawing.Color(ExcelColor that) {
        return System.Drawing.Color.FromArgb(that.A, that.R, that.G, that.B);
    }

    public static implicit operator ExcelColor(System.Drawing.Color that) {
        return new ExcelColor(that.R, that.G, that.B, that.A);
    }

    public static explicit operator int(ExcelColor that) {
        return BitConverter.ToInt32(new byte[] { that.R, that.G, that.B, that.A }, 0);
    }

    public static explicit operator ExcelColor(int that) {
        byte[] bytes = BitConverter.GetBytes(that);
        return new ExcelColor(bytes[0], bytes[1], bytes[2], 255); // Excel ignores the alpha component
    }
}

You can then convert easily between each colour type:

// GDI+ colour to Excel colour
absolute.Font.Color = (int)(ExcelColor)System.Drawing.Color.Red;

// Excel colour to GDI+ colour
System.Drawing.Color c = (ExcelColor)(int)singleCell.Font.Color;
Console.WriteLine("{0}", c);

Finally, to set a cell’s interior colour to “No Fill”, you must assign the value -4142 to Range.Interior.ColorIndex instead (otherwise the colour will be interpreted as black instead of transparent).

Number Formats

To set the number format (or data type) for a range, you can pass a string value to the Range.NumberFormat property:

  • For integral, decimal and currency values, pass a format string in the same format as used by the Excel application, e.g. #,##0.00 for a number with 2 decimal places and a thousands separator.
  • For date and time values, pass a date format string, e.g. d MMM yyyy for something like 13 Mar 2010.
  • To force the “Text” format (i.e. to force all values to be treated as text), pass the at-sign (@).
  • To reset to the default “General” format, pass General or an empty string.

Other Range Operations

You can use Range.Insert() to insert a new row or column, or to paste-insert a range that has been copied using Range.Copy():

// insert a new row above A15
Excel.Range insertPos = (Excel.Range)sheet.Cells[1, 15];
insertPos.Insert(Excel.XlInsertShiftDirection.xlShiftDown, Type.Missing);

// duplicate the first 15 rows
sheet.get_Range("$1:$15").Copy(Type.Missing);
insertPos.Insert(Excel.XlInsertShiftDirection.xlShiftDown, Type.Missing);

You can use Range.Find() and Range.FindNext() to locate cells that match a particular search term (note that FindNext will loop back to the start of the range):

string firstMatch = null;

// set find options and get the first match
Excel.Range result = sheet.Cells.Find(
    "search term",
    Type.Missing,
    Type.Missing,
    Excel.XlLookAt.xlWhole,
    Type.Missing,
    Excel.XlSearchDirection.xlNext,
    true,
    Type.Missing,
    Type.Missing
);

while (result != null) {
    // get the cell address
    string address = result.get_AddressLocal(true, true, Excel.XlReferenceStyle.xlA1, Type.Missing, Type.Missing);

    // note the address of the first result. if we have returned to the start, break out of the loop
    if (firstMatch == null)
        firstMatch = address;
    else if (firstMatch == address)
        break;

    Console.WriteLine(address);    // show that this is one of the cells that matches the search term

    // advance to the next result
    result = sheet.Cells.FindNext(result);
}

Merged Cells

Some operations in Excel will throw an exception if performed on a range that contains merged cells. You can test the Range.MergeCells property to determine if the range contains merged cells, and if working with more than just a single cell, Range.MergeArea to retrieve a range representing the merged cells. When reading/writing values or applying formatting to a range with merged cells, operate on only the upper-left cell of the merge area.

Working with Worksheet

In addition to the Cells property, Worksheet also allows access to the drawing canvas via the Shapes property, as well as a range of printing/pagination settings. Use Worksheet.Activate() to change the active worksheet.

The worksheet also provides access to these key operations:

  • SaveAs() – to save or export the sheet and/or workbook.
  • PrintOutEx() – to print the sheet.
  • PrintPreview() – to display the print preview dialog (modal) for the sheet.

Among the properties of Worksheet.PageSetup are the header/footer properties (e.g. CenterHeader, LeftFooter, etc) Each of these is a string value, and uses a set of escape codes to apply formatting and insert dynamic content:

  • && – to insert a single ampersand
  • &n (e.g. &9) – to set the font size
  • &[Page] – page number
  • &[Pages] – number of pages
  • &[Date]
  • &[Time]
  • &”Font name”
  • &B – bold
  • &I – italic
  • &U – underline

Final Words

This is by no means a comprehensive guide to Excel Automation, but it should provide you with the tools necessary to get started – and – to avoid falling into those common pitfalls I mentioned. In future posts, I hope to share some real-world applications for Excel Automation. The bulk of my experience in this area comes from building a template-driven reporting system that uses Excel for its presentation. Although I cannot provide much from that system (it is bound by intellectual property), I will be able to offer up some tasty tidbits 🙂

18 thoughts on “A Guide to Excel Automation

  1. vote

    Hello,

    Thank you for your very informative articles on Excel automation. I have a VB (pre-.Net) background, but no C#, and I’m trying to automate Excel using C#. One thing that I’m trying to do is get/set a worksheet’s tab color. Actually, I was able to set the color using this code:

    LastSheet.Tab.Color = System.Drawing.ColorTranslator.ToOle(System.Drawing.Color.ForestGreen);

    But I was not able to retrieve the tab’s color value into a variable. I tried at least a dozen different ways. Finally, I came across your article on Excel colors (or colours!) and I thought my prayers were answered. Not so, at least not yet…

    I implemented your struct, and then I first tried it to set the tab color (even though I could already do that):

    LastSheet.Tab.Color = (int)(ExcelColor)System.Drawing.Color.ForestGreen;

    And this works fine!

    Then I tried to get the color:

    System.Drawing.Color currentColor;
    currentColor = (ExcelColor)(int)LastSheet.Tab.Color;
    MessageBox.Show(“Current color is : ” + currentColor.ToString());

    But this does not work – it compiles fine, but at runtime I get the error “Specified cast is not valid”.

    Can you tell me what I’m doing wrong here?

    Thanks again.
    John

    Reply
    • +2
      vote

      John: It’s possible that the Color property of the Tab is null; in which case it will not cast to int, and therefore will not cast to ExcelColor either. You should handle for a situation where the Tab’s Color is null (which I presume means it’s using the default color).

      Reply
  2. -1
    vote

    Hi Brad,
    i’m trying to convert the excel to PDF and using PDF factory pro (4.8.1) for same. Below is the c# code i’m using
    var workbook = workbooks.Open(fileToConvert, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing);

    workbook.PrintOut(Type.Missing, Type.Missing, 1, false, Type.Missing, false, Type.Missing,
    Type.Missing);

    workbook.PrintOutEx(Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, true, Type.Missing, path);

    Default printer is PDF Factory pro for the system.

    Issue i’m facing is that PDF factory pro is reading the excel file as 0 byte and hence doesn’t performs the conversion. I see that in the printer tray by looking at the submitted print job’s properties.

    Any idea why is it happening?

    Thanks,
    Mrids

    Reply
    • -2
      vote

      Unfortunately i’ve never worked with PDF Factory Pro, so there’s not much advice I can offer. However, when you’re calling PrintOutEx(), you are instructing Excel to print to a file. Try this technique instead:

      workbook.PrintOutEx(Type.Missing, Type.Missing, 1, false, Type.Missing, false, Type.Missing, Type.Missing, Type.Missing);

      Reply
  3. vote

    Hi,
    I wondering if you know a way to disable all ribbons but not mine or how to set my ribbon as default when excel begins?

    Reply
    • +1
      vote

      To achieve this, you’ll need to create a ribbon XML file using Visual Studio Tools for Office. This link explains the process in more detail (the principles are the same for all Office applications with the ribbon interface).

      Reply
  4. -1
    vote

    Hi, your post has been of most use to me. But I have a question, I’m trying to get data from a range, which I did, but the data comes without the format, just the cell value.
    Is there a way to retrieve only the formatted value of a cell or a range?
    Thanks for the amazing post and thanks in advance for your attention.

    Reply
    • vote

      If you use the Value2 property on a Range object, you will only get the cell value(s). Depending on what formatting you are interested in, the individual settings can be found in the Font, Interior, NumberFormat and other properties of the Range. If you want to get the cell values and formatting in a format that can be readily used outside of Excel (or the object model), you could copy a Range to the clipboard (using Range.Copy()) and probe the clipboard for a format that includes formatting, such as HTML or RTF.

      Reply
  5. vote

    Hi,

    Do you have any ideas on how to get Excel to improve calculation speed when using C# automation viz. a viz when running a workbook as per normal.

    So I want to open an Excel instance, calculate my entire workbook and record time taken. Then I want to do the same calculation calling it via C# interop and also record the time. What I am hoping for is to get a performance gain from doing it via C# interop.

    Other than screenupdating = false, visibility = false, etc. can one get improved calculation time from using C# automation? If so how?

    Reply
    • -1
      vote

      I’m afraid calculation via interop is never going to be faster than doing the calculation natively within Excel; there are huge overheads associating with marshalling data from .NET to OLE, interprocess communication, etc. Automation should be the technique you use when no other alternatives exist (e.g. OpenXML SDK or third-party Excel libraries).

      Reply
  6. vote

    Hi Brad,

    I just want to consult on my problem with my code. I’m getting a COM exception on this specific line -> oWB.XmlImport(xmlDataFile, out xmlMap, true, Type.Missing). Any thoughts on why this is happening? I’m using Excel 2016 which is activated but on other excel versions I’m not getting that exception.

    Here’s my code:

    private ExcelInterop._Workbook oWB = null;
    private ExcelInterop.Application oXL = null;
    private ExcelInterop.Workbooks oWBs = null;
    private ExcelInterop._Workbook oWB = null;
    oXL = new ExcelInterop.Application();
    oXL.Visible = false;
    oXL.DisplayAlerts = false;

    // open
    oWBs = oXL.Workbooks;
    oWB = oWBs.Open(report.TemplatePath,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing);

    ExcelInterop.XmlMap xmlMap = null;
    try
    {
    foreach (string xmlDataFile in report.XmlDataFiles)
    {
    bool failed = false;

    do
    {
    try
    {
    oWB.XmlImport(xmlDataFile, out xmlMap, true, Type.Missing);
    failed = false;
    }
    catch (System.Runtime.InteropServices.COMException e)
    {
    failed = true;
    }
    System.Threading.Thread.Sleep(10);
    } while (failed);

    }
    Thread.Sleep(report.ImportDelay);
    oWB.RefreshAll();
    Thread.Sleep(report.ImportDelay);
    }
    finally
    {
    release(xmlMap);
    }

    Reply
  7. vote

    hi, i want to use Range.MergeArea
    in vb net., but i don´t know how to use Range.MergeArea.
    Can you give me an example?
    Thanks..

    Reply
    • vote

      If you get the value of the MergeArea property for a Range object that contains any cells that are merged, it will return a new Range object that represents all of the cells that are merged. For example, say that cells A2, A3, B2 and B3 are all merged:

      (I don’t use VB.NET so the following example is in C#)

      Excel.Worksheet sheet = Application.ActiveSheet;
      
      // this will output 'The merge area consists of 4 cells.'
      Excel.Range cellThatIsMerged = sheet.get_Range("A2");
      Console.WriteLine("The merge area consist of {0} cells.", cellThatIsMerged.MergeArea.Cells.Count);
      
      // this will output 'The merge area consists of 1 cell.'
      Excel.Range cellThatIsNotMerged = sheet.get_Range("A1");
      Console.WriteLine("The merge area consist of {0} cell.", cellThatIsNotMerged.MergeArea.Cells.Count);
      
      Reply
  8. vote

    Hi ,

    I would like to refresh Excel with a web query. Web query uses basic authentication.
    Once I open excel programmatically windows security box appears which takes Username and password.
    how should I add my credentials to windows security dialogue?

    Many Thanks

    Reply

Leave a reply to Ben Cancel reply

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> 

required