During the development of an Excel-based report generation system, I frequently had to grapple with the appaulingly slow speed of Office Automation using VSTO. Virtually every call to a method or property on one of the COM objects exposed by VSTO carries with it a significant performance cost. Thankfully, it is possible to read/write cell values using arrays, allowing huge amounts of data to be ferried between the CLR and Excel with minimal calls. Unfortunately, however, there are some operations that still require a lot of method calls.

One such example is the task of inserting a large number of rows in an existing worksheet. I found myself in a situation where we had to ‘finish’ pro-forma spreadsheets with dynamic data by inserting the appropriate number of rows into the sheet (maintaining the formatting in the first ‘pro-forma row’) and then populating the values. The latter part is simple, using the array technique mentioned earlier, but the process of inserting the rows – which can number in the thousands in some cases – was causing a huge performance hit.

It’s not often that algorithms have to be designed specifically to minimise the number of method calls, but I think i’ve come up with a winner. I exploit the fact that Excel is able to insert multiple rows from the clipboard, and make use of the efficiency gained by using powers of 2. The algorithm is structured thusly:

  1. Let index represent the number of the row we most recently inserted.
  2. Let step represent the number of rows we can insert per method call (will always be a power of 2).
  3. While indexis less than the total number of rows to be inserted:
    1. Copy the range of rows starting at the insert position and spanning step rows.
    2. Insert the copied rows at the insert position.
    3. Increment index by step.
    4. Let diff represent the number of rows remaining to be inserted.
    5. If diff is greater than the number of rows represented by twice the value of step, double the value of step.
    6. Otherwise, set step to the value of diff.

So, essentially, the number of rows inserted doubles on each iteration (taking advantage of the fact that there are more and more rows that can be copied each time) – until we pass the half-way point, then only one further insertion occurs. As a result, the number of method calls on Excel objects approximates to O(log n).

Source Code

// using Excel = Microsoft.Office.Interop.Excel;

void InsertRows(Excel.Range firstRow, int rowCount) {
    Excel.Worksheet worksheet = firstRow.Worksheet;
    Excel.Range insertPos = worksheet.Cells[1, firstRow.Row + 1].EntireRow;
    int i = 1;
    int step = 1;
    while (i < rowCount) {
        // copy the existing row(s)
        worksheet.get_Range(
            "$" + (firstRow.Row) + ":$" + (firstRow.Row + step - 1),
            Type.Missing
        ).Copy(Type.Missing);

        // insert copied rows
        insertPos.Insert(Excel.XlInsertShiftDirection.xlShiftDown, Type.Missing);

        i += step;

        int diff = (rowCount - i);
        if (diff > (step * 2))
            step *= 2;
        else
            step = diff;
    }
}

4 thoughts on “Efficiently Inserting Many Rows into an Excel Spreadsheet

  1. +1
    vote

    Thanks Bradley. Really useful!
    I just had a problem: line 5 x and y seem to be interchanged.
    I modified with that and it was perfect:
    Excel.Range insertPos = worksheet.Cells[1, firstRow.Row + 1].EntireRow;

    Reply

Leave a reply

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> 

required