Pages

Thursday, May 24, 2012

Read and Import Excel Sheet using ADO.Net and C#


Here I am explaining how to read excel sheet and import it using ADO.Net and C#. Excel Workbook is just like database with sheets corresponding to tables. See the mapping below.

Database   <—————>    Excel Workbook
Sheet        <—————->    Table

Connection String for Excel 97-2003 Format (.XLS)
For Excel 97-2003 format Microsoft Jet OLEDB Driver 4.0 is used. A sample connection string as follows.

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Book1.xls;Extended Properties='Excel 8.0;HDR=Yes'"


 Connection String for Excel 2007 Format (.XLSX)

For Excel 2007 format the new Microsoft Ace OLEDB Driver 4.0 is used. A sample connection string as follows.
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Book1.xlsx;Extended Properties='Excel 8.0;HDR=Yes'"

Rest everything is same for both versions. One thing to note Microsoft Ace OLEDB Driver 12.0 works for both Excel 2003 and Excel 2007 formats.
You can specify whether your Excel file has Headers or not using the HDR property.
When HDR is set to Yes the First Row is considered as the Header of the Excel file.
For this tutorial I have used the following Excel Sheet


ID
Name
1
John
2
Smith
3
Rob


Establish a Connection

String strExcelConn = "Provider=Microsoft.Jet.OLEDB.4.0;"
"Data Source=Book1.xls;"
"Extended Properties='Excel 8.0;HDR=Yes'";

OleDbConnection connExcel = new OleDbConnection(strExcelConn);
OleDbCommand cmdExcel = new OleDbCommand();
cmdExcel.Connection = connExcel;

  
Accessing Sheets
 
connExcel.Open();
DataTable dtExcelSchema;
dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
connExcel.Close();

The dtExcelSchema contains all the Sheets present in your Excel Workbook

You access them in the following way

string sheetName = dtExcelSchema.Rows[0]["TABLE_NAME"];

This will give the name of the first sheet. i.e. Sheet1$

Running a Select Query by Specifying Column Names

DataSet ds = new DataSet();
string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
cmdExcel.CommandText = "SELECT ID, Name From [" + SheetName + "]";
da.SelectCommand = cmdExcel;
da.Fill(ds);

Note the above query will only work when Header Row is present in the Excel Sheet.

XML Output of the Query

<?xml version="1.0" standalone="yes"?>
<NewDataSet>
  <Table>
    <ID>1</ID>
    <Name>John</Name>
  </Table>
  <Table>
    <ID>2</ID>
    <Name>Smith</Name>
  </Table>
  <Table>
    <ID>3</ID>
    <Name>Rob</Name>
  </Table>
</NewDataSet>


Running a Query without Specifying Column Names
string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
cmdExcel.CommandText = "SELECT * From [" + SheetName + "]";
da.SelectCommand = cmdExcel;
da.Fill(ds);
connExcel.Close();

Above query will work in both cases if Header row is specified and when not specified

Output of the Above Query with Header = Yes

<?xml version="1.0" standalone="yes"?>
<NewDataSet>
  <Table>
    <ID>1</ID>
    <Name>John</Name>
  </Table>
  <Table>
    <ID>2</ID>
    <Name>Smith</Name>
  </Table>
  <Table>
    <ID>3</ID>
    <Name>Rob</Name>
  </Table>
</NewDataSet>

Output of the Above Query with Header = No

<?xml version="1.0" standalone="yes"?>
<NewDataSet>
  <Table>
    <F2>Name</F2>
  </Table>
  <Table>
    <F1>1</F1>
    <F2>John</F2>
  </Table>
  <Table>
    <F1>2</F1>
    <F2>Rob</F2>
  </Table>
  <Table>
    <F1>3</F1>
    <F2>Smith</F2>
  </Table>
</NewDataSet>

Running a Query on a Range of Cells

string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
cmdExcel.CommandText = "SELECT * From [" + SheetName + "A3:B5]";
da.SelectCommand = cmdExcel;
da.Fill(ds);
connExcel.Close();

Many times you need to select all data from a range of Cells. The above query selects all the data within the cell range A3:B5

Output of the Above Query

<?xml version="1.0" standalone="yes"?>
<NewDataSet>
  <Table>
    <F1>2</F1>
    <F2>Rob</F2>
  </Table>
  <Table>
    <F1>3</F1>
    <F2>Smith</F2>
  </Table>
</NewDataSet>

You can download the sample source code here.

No comments:

Post a Comment