Recently I had the occasion to solve another annoying Microsoft issue. Instead of just reading the content of each cell the Microsoft Jet Database Engine first guesses to determine the data type of the Excel column using the registry key variable:
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/TypeGuessRows
TypeGuessRows was set to 8 on my machine, which means the meta data related to the first 8 rows of the spread sheet are scanned to guess what the data type should be. In my case the predominent value was null so null was the result returned in the query. Very annoying given that the result should have been the date that was in the 5th row.
One solution is to edit the spread sheet and add an additional bogus date in the column to trick the jet "algorithm", however this was precisely the current serendipitous solution that my client was trying to avoid. Another solution would be to edit the registry (the horror... the horror...) and increase the number of rows considered by TypeGuessRows, but this would not help in this case because dates were not the predominate data in the column. I could just envision explaining a registry edit on a production server to production control.
The solution to load mixed-datatype columns in Excel using SSIS was an addition to the OLE DB connection string:
"IMEX=1"
This tells the OLE DB driver to read mixed data type columns as text instead of using Excel meta data. This is the complete edit:
Original OLE DB ConnectionString:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\do_not_use_spread_sheets_to_load_production_data.xls;Extended Properties="Excel 8.0;HDR=Yes";
New OLE DB ConnectionString:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\do_not_use_spread_sheets_to_load_production_data.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";