]> Reading inconsistent data from Excel with ACE 🌐:aligrant.com

Reading inconsistent data from Excel with ACE

Alastair Grant | Tuesday 9 April 2019

I've had the unpleasant task of reading through an Excel spreadsheet of questionable data quality using the ACE OLEDB provider.

One of the challenges with reading data from Excel is data types.  In a conventional relational database you know that column X is always going to be of type Y.  But in Excel, all bets are off.  There may be a word in the first row, and a number in the second.  Or even more confusing a number as a string, and so on.

The ACE driver will guess the data type by reading a few rows until it finds data and then works out the type.  If you have lots of blanks, you can fudge this by putting a dummy row in first with example data of default values to allow the provider driver to work out the types correctly.

Another approach that I'm using fairly successfully at the moment is to read the data as an object and then attempt to convert it (not cast it), using this handy little extension method:

/// <summary>
/// Attempts to retrieve a value as the provided type.
/// </summary>
/// <remarks>
/// Reads the value as an object and then attempts to use IConvertable to convert into another data type.
/// Nullable<> types are supported, and will return NULL if there is no value.
/// </remarks>
/// <typeparam name="T">Type to attempt conversion on, including nullable types.</typeparam>
/// <param name="row">DataRow to read from.</param>
/// <param name="colName">Column Name to read from.</param>
/// <returns>Value or default</returns>
public static T GetSafish<T>(this DataRow row, string colName)
{
    if (colName == null) throw new ArgumentNullException(nameof(colName));

    if (row.IsNull(colName)) return default(T);

    return (T) Convert.ChangeType(row[colName], Nullable.GetUnderlyingType(typeof(T)) ?? typeof(T));
}

Instead of calling row.Field<double?>("MyCol"), you call row.GetSafish<double?>("MyCol").  If Excel is handling the data as a string, it'll get converted to a double just fine, and if the cell is blank, you'll get null.  You will get a System.FormatException when the conversion can't take place (such as those horrible blank cells that aren't blank).

Breaking from the voyeuristic norms of the Internet, any comments can be made in private by contacting me.