Technical Article

Listing Sheet Names Fast in Delphi with HotXLS GetSheetNames

An upload endpoint receives a 180 MB .xlsx from a customer, and the only question the intake code needs answered is whether the workbook contains a sheet named "Mapping". Fully opening that file means parsing the shared string table, the style part, and every worksheet's XML - hundreds of megabytes of allocations and double-digit seconds of CPU - to read a list that lives in a single small XML part. HotXLS, losLab's native Delphi spreadsheet library, exposes that list directly: GetSheetNames returns the worksheet names, in workbook order, without materializing a single cell. Used as an intake gate, it turns "open everything and see" into a millisecond decision.

Where sheet names actually live in each format

Both spreadsheet formats were designed with their table of contents up front. An OOXML package keeps the sheet catalog in xl/workbook.xml - a part that stays a few kilobytes whether the workbook holds ten rows or ten million. A BIFF8 .xls stores BoundSheet records near the start of the workbook globals stream, before any cell data. A full Open ignores that gift: it inflates the shared string table, decodes the style records, and walks every sheet substream, because it has to assume you will touch any cell. A listing call reads just the catalog and stops.

The asymmetry is worth quantifying when you argue for the extra code path: on a large workbook the difference between listing and loading is three to four orders of magnitude in both bytes touched and allocations made, and unlike a full load, the listing cost does not grow with row count.

One call, four container formats

TXLSWorkbook.GetSheetNames on the XLS facade accepts not only .xls but also the zip-based .xlsx, .xlsm, .xltx, and .xltm, where it reads only workbook.xml out of the archive. For .xls input it scans BoundSheet records and stops at the first EOF record of the globals substream, so even a very large binary file costs only its first few kilobytes. On the XLSX facade, TXLSXWorkbook.GetSheetNames is documented with a guarantee that matters for service code: the workbook instance is neither reset nor populated by the call, so an instance that already holds an open document can safely probe other files. GetODSSheetNames extends the same idea to OpenDocument packages, and stream overloads of all of these let you probe an upload that never touches disk.

var
  Book: TXLSXWorkbook;
  Names: TStringList;
  I: Integer;
begin
  Names := TStringList.Create;
  Book := TXLSXWorkbook.Create;
  try
    if Book.GetSheetNames('upload-7f3a.xlsx', Names) <= 0 then
      raise Exception.Create('unreadable workbook package');
    if Names.IndexOf('Mapping') < 0 then
      raise Exception.Create('required Mapping sheet is missing');
    for I := 0 to Names.Count - 1 do
      Writeln(Format('sheet %d: %s', [I, Names[I]]));
  finally
    Book.Free;
    Names.Free;
  end;
end;

The same call backs a desktop import dialog nicely: list the sheets, let the user pick one, and only then pay for the full open. Users with fifty-sheet workbooks notice the difference between a picker that appears instantly and one that appears after the whole file loads.

The extension coverage has a quiet operational benefit too: macro-enabled .xlsm files and template formats list exactly like plain .xlsx, because the sheet catalog lives in the same workbook.xml part regardless of whether a vbaProject.bin rides along in the package. An intake pipeline can therefore enumerate the sheets of a macro workbook for routing purposes without ever touching - or executing anything related to - the macro payload, and defer the macro policy decision to the stage that actually opens the file.

Reading the return value without fooling yourself

HotXLS return conventions are not uniform across the library - some calls return 1 for success, others return counts - so the robust check for the listing functions is <= 0 means failure, in which case the string list has been cleared. Do not special-case an empty list as "workbook with no sheets": both ECMA-376 and the BIFF8 specification require at least one sheet in a valid workbook, so zero names is always the failure path, never a degenerate success.

Failure here is also a useful signal in its own right. A file with an .xlsx extension that fails the listing call is either truncated, not actually an OOXML package (mislabeled CSV exports from other systems are a perennial source), or an encrypted container - and distinguishing those cases is exactly what the next check is for. Logging the failed file's first bytes alongside the rejection makes the support conversation a one-message exchange instead of a guessing game.

Encrypted containers: detect, do not guess

An encrypted .xlsx is not a zip at all. Per the Microsoft Office cryptography specification it is an OLE compound file wrapping EncryptionInfo and EncryptedPackage streams, which means GetSheetNames cannot see inside it and will simply fail. CanReadEncrypted probes for that container shape so intake can route the file deliberately instead of logging a generic read error from deep inside a worker:

type
  TIntakeRoute = (irNormal, irNeedsPassword, irUnreadable);

function ClassifyUpload(const FileName: string; Names: TStrings): TIntakeRoute;
var
  Book: TXLSXWorkbook;
begin
  Book := TXLSXWorkbook.Create;
  try
    // Encrypted OOXML is an OLE container, not a zip: check first,
    // because the listing calls cannot look inside it.
    if Book.CanReadEncrypted(FileName) then
      Exit(irNeedsPassword);
    if SameText(ExtractFileExt(FileName), '.ods') then
    begin
      if Book.GetODSSheetNames(FileName, Names) <= 0 then
        Exit(irUnreadable);
    end
    else if Book.GetSheetNames(FileName, Names) <= 0 then
      Exit(irUnreadable);
    Result := irNormal;
  finally
    Book.Free;
  end;
end;

Routing encrypted files deserves precision, because HotXLS is asymmetric here. Legacy .xls encryption (RC4, RC4 CryptoAPI, XOR) is readable - TXLSWorkbook.Open(FileName, Password) decrypts with a stored password, so those files can stay in the automated path. Encrypted OOXML packages, by contrast, can be produced by HotXLS but not read back: OpenEncrypted raises EXlsxEncryptionNotImplemented for an encrypted package. The honest intake design sends encrypted .xlsx to a human with Excel, and the password-bearing .xls to code.

Worth noting for batch operators: the classifier above runs comfortably over an entire incoming directory before any worker starts real processing, because each probe costs roughly a file-open and a few kilobytes of reads. Front-loading the classification turns "the 3 a.m. batch died on file 412 of 600" into "412 files queued, 5 rejected at intake with reasons" - same library calls, very different operational story.

What the lightweight probe cannot tell you

Names and order are the entire payload. The listing calls do not report visibility (hidden or very-hidden sheets arrive in the list like any other), used-range dimensions, cell counts, or document properties - and although docProps/core.xml is also a small part, there is currently no properties-only probe, so author and title metadata cost a full Open. Structure intake accordingly: let the cheap facts route every file, and gather the expensive facts only for files that survive routing. One performance note for the files that do proceed: a read-only deep scan of a big .xls goes noticeably faster with _DisableGraphics := True, which skips OfficeArt parsing - but never save from such an instance, because the drawing layer it skipped is gone.

Frequently asked questions

Does GetSheetNames load any cell data?

No. It reads workbook.xml from zip-based packages or the BoundSheet records from BIFF files, so its cost stays flat regardless of how much data the workbook holds.

Can I list the sheets of a password-protected file?

Not while it is encrypted. Detect the container with CanReadEncrypted; legacy .xls files can then be opened with their password, while encrypted .xlsx files must be decrypted in Excel before HotXLS can read them.

How do I tell whether a listed sheet is hidden?

You cannot from the listing call - visibility flags require a full Open. Treat the lightweight list as structure-routing input, not as a complete inventory.

Files that pass triage usually head into deeper analysis - the workbook audit and conversion workbench shows the per-sheet counters worth collecting once a full open is justified, and the large-workbook performance guide covers keeping that full open fast.

HotXLS is a native Object Pascal spreadsheet library for Delphi and C++Builder; the complete API surface, including the inspection calls shown here, is documented on the HotXLS Component product page.