Technical Article

Large Excel Workbook Performance in Delphi with HotXLS

When a 300,000-row export blows through its memory budget, the row count typically gets the blame — and the row count is typically innocent. The expensive parts of a big workbook are the ones created as a side effect: a style pool that grows by one entry per cell because formatting was added within the loop, worksheet XML assembled as a single giant string at save time, a million identical formula bodies stored one by one. HotXLS, losLab's native Delphi library for XLS and XLSX files, gives you a specific lever for each of these costs. None of them is enabled by default, because each changes a trade-off, so knowing which lever matches which symptom is the actual performance skill.

UK teams should align this hotxls large workbook performance delphi workflow with local governance, audit, and data quality requirements before production release

Where a large workbook spends memory

There are two distinct memory regimes to reason about. During generation, the in-memory cell model grows with each cell you touch — values, formats, formulas all become objects or pool entries. During save, the default XLSX path additionally renders each worksheet's XML into a wide string before compressing it into the zip container, so peak usage is the model plus the largest sheet's serialised form. A job that survives the build loop and then dies within SaveAs is hitting the second regime, not the first — and the fix for one does nothing for the other.

File size follows a related rule: cells are only one contributor, alongside styles, shared strings, formulas, images and comments. An audit pass with ForEachCell and the per-sheet collection counts tells you which resource actually dominates a problem file before you optimise the wrong one. One measurement subtlety: Sheet.Cells.Count on the XLSX side reports the number of instantiated cells in the sparse store, not the area of the used range — a sheet whose data occupies a 1000-by-50 rectangle with half the cells empty counts roughly 25,000, not 50,000. That distinction matters when you compare a customer's 'huge' file against your fixtures, because used-range area and actual cell population can differ by an order of magnitude in sparse financial layouts.

StreamingWrite fixes the save path, not the build path

Setting TXLSXWorkbook.StreamingWrite := True switches SaveAs to a streaming serializer that writes worksheet XML directly into the zip stream, eliminating the per-sheet string intermediate. It defaults to False for behavioural compatibility, and turning it on is a one-line change:

Book := TXLSXWorkbook.Create;
try
  Sheet := Book.Sheets.Add('Bulk');
  for R := 1 to 100000 do
  begin
    Sheet.Cells[R, 1].Value := R;
    Sheet.Cells[R, 2].Value := 'Row ' + IntToStr(R);
    Sheet.Cells[R, 3].Value := R * 1.5;
  end;
  Book.StreamingWrite := True;   // sheet XML streams into the zip container
  Book.SaveAs('bulk.xlsx');
finally
  Book.Free;
end;

Be precise about what this buys: the cell model built by the loop occupies exactly as much memory as before. StreamingWrite flattens the save-time spike, which is the difference between a batch job that completes and one that fails at the 95% mark — but if the build loop itself exhausts memory, the levers you need are the next two.

Style pools: add once, reuse the index

XLSX formatting in HotXLS is pool-based: Book.Fonts.Add(...), Fills.AddSolid(...) and Borders.Add(...) return a 0-based pool index that cells reference. Calling Fonts.Add with identical parameters within a loop is deduplicated, so it wastes time instead of space — but Alignments.Add returns a fresh object per call, so per-cell alignment creation grows the pool linearly with row count. The robust habit covers both: resolve every pool index once, outside the loop, and assign indices within it.

// hoist pool lookups out of the hot loop
HeaderFont := Book.Fonts.Add('Calibri', 11, True, False);   // 0-based pool index
for C := 1 to 24 do
  Sheet.Cells[1, C].FontIndex := HeaderFont + 1;            // cells store 1-based; 0 = default

The + 1 is not a typo and forgetting it is the classic symptom-generating bug here: the pools hand out 0-based indices, whilst the cell-side properties treat 0 as 'default', so every pool index must be shifted by one on assignment. Get it wrong by omission and your headers silently render in the workbook's default font — a defect nobody notices until branding review.

Replace per-cell Variant traffic with row callbacks

Every Sheet.Cells[R, C].Value := X involves a cell lookup-or-create plus a Variant assignment. At a few hundred thousand cells, that per-access overhead becomes measurable in profiles. HotXLS provides bulk callback APIs on both facades — ForEachCell and ForEachRow for reading, WriteCells and WriteRows for writing — that move the iteration within the engine and hand your code whole rows at a time:

procedure TLedgerExport.FillRow(Sender: TObject;
  SheetIndex, Row, FirstCol, LastCol: Integer;
  var Values: Variant; var Skip: Boolean; var Cancel: Boolean);
begin
  if Row > FCount then
  begin
    Cancel := True;     // stop the whole write
    Exit;
  end;
  Values := VarArrayOf([FRows[Row - 1].Account,
                        FRows[Row - 1].PostedOn,
                        FRows[Row - 1].Amount]);
end;

// one engine call instead of hundreds of thousands of property hits
Sheet.WriteRows(1, 1, FCount, 3, FillRow);

The callback's Skip flag leaves a row untouched without aborting, and Cancel ends the operation early — useful when the source is a reader whose length you discover as you go. Pair WriteRows for the build with StreamingWrite for the save and the generation path has no remaining per-cell hot spot.

Read-side levers on the XLS facade

Large legacy .xls files have their own toolkit. _DisableGraphics := True before Open skips parsing the drawing layer entirely, which speeds up loading workbooks that carry years of accumulated shapes and embedded pictures — with one hard restriction: the drawing layer is then absent from the model, so saving such a workbook writes a file without its drawings. Reserve this flag for read-only analysis jobs. SetTempDir redirects the BIFF writer's temporary files, which matters on servers where the default temp location has a quota or sits on slow storage. UseSharedFormulas groups repeated formula bodies into shared-formula records, shrinking files where a formula column repeats down sixty thousand rows.

Read loops over XLS data have an indexing trap worth flagging because it doubles work when handled defensively and corrupts results when missed: UsedRange reports its FirstRow, LastRow, FirstCol and LastCol boundaries 0-based, whilst Cells.Item[Row, Col] is 1-based. A scan that walks the used range must add one to each coordinate at the cell access — Cells.Item[Row + 1, Col + 1] — or it reads a grid shifted diagonally by one cell, silently dropping the last row and column and including a phantom first one. The ForEachCell callback sidesteps the mismatch entirely, which is one more reason to prefer it for whole-sheet scans.

Probe files before loading them

The cheapest large-workbook operation is the one you avoid. GetSheetNames on both facades lists a file's worksheets without loading cell data — the XLSX implementation reads only the workbook manifest within the zip and explicitly leaves the workbook instance unpopulated, and the XLS facade stops scanning at the first substream boundary. That makes it the right pre-flight check for 'which sheet should this import job target', and CanReadEncrypted answers 'is this an encrypted container' before a doomed Open attempt.

Names := TStringList.Create;
Book := TXLSXWorkbook.Create;
try
  if Book.GetSheetNames('big-unknown.xlsx', Names) <= 0 then
    raise Exception.Create('cannot enumerate sheets');   // failure clears the list
  // pick the target sheet, then decide whether a full Open is worth it
finally
  Book.Free;
  Names.Free;
end;

Note the return-code convention: these probing functions signal failure with values at or below zero and empty the output list, so test <= 0 instead of comparing against one specific success value.

Sizing the approach to the job

For unattended pipelines that generate many large files in sequence, two more habits round out the picture. Workbook objects are not thread-safe for sharing, but nothing stops one independent workbook per worker thread, which parallelizes batch conversion cleanly. And when output goes to HTTP instead of disk, the TStream save overloads combine with StreamingWrite so a large response never materializes as a temp file — with one operational footnote: the stream save writes from the current position without rewinding, so set Position := 0 before handing the stream to the response framework. The streaming write and batch job article develops that server-side pattern, and the database export article shows where these levers slot into a dataset-driven report.

Finally, keep one worst-case fixture per report family and time it in CI. Performance regressions in document generation rarely announce themselves — a style added within a loop or a probe replaced with a full Open changes nothing functionally, and the nightly batch simply takes forty minutes longer. A timed test on a representative half-million-cell fixture turns that drift into a red build instead of an operations incident.

Evaluation builds, demo projects with a bulk-generation example, and the full API reference are available on the HotXLS Component page.