Myth: Excel Connection Manager works only in 32-bit mode of SSIS

When we use the Excel Source or Excel Destination as a Data Flow component, we have to use the Excel Connection Manager. The limitation of the Excel Connection Manager is that we have to run the SSIS solution in 32-bit runtime; otherwise we get the error below.

Error: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER.

The AcquireConnection method call to the connection manager “” failed with error code 0xC0209303. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

Error: The requested OLE DB provider Microsoft.Jet.OLEDB.4.0 is not registered. If the 64-bit driver is not installed, run the package in 32-bit mode. Error code: 0x00000000.

An OLE DB record is available. Source: “Microsoft OLE DB Service Components” Hresult: 0x80040154 Description: “Class not registered”.


My findings with Excel Connection Managers are as follows.

Myth: Excel Connection Manager works in 64-bit (Run64BitRuntime = True) in SQL SERVER 2012 Integration Services (SSIS 2012).

That's not true. Many of us believe that in SQL Server 2012 Integration Services Microsoft fixed the issue so that an SSIS solution using the Excel Connection Manager can run in 64-bit. The reason it does not work is that Microsoft ships Microsoft Visual Studio and SQL Server Data Tools (SSDT) as 32-bit applications. I created the solution above in SSIS 2012, and it proves that the Excel Connection Manager fails if I run the solution in 64-bit (Run64BitRuntime = True).
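Outside the designer, the bitness is determined by which DTExec.exe executes the package. A minimal sketch, assuming default SQL SERVER 2012 installation paths and a hypothetical package file:

REM 64-bit execution (the default runtime on a 64-bit server)
"C:\Program Files\Microsoft SQL Server\110\DTS\Binn\DTExec.exe" /File "C:\SSIS\LoadExcel.dtsx"

REM 32-bit execution (use while only the 32-bit Office providers are installed)
"C:\Program Files (x86)\Microsoft SQL Server\110\DTS\Binn\DTExec.exe" /File "C:\SSIS\LoadExcel.dtsx"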

What is the alternative?
Many of us also believe that the Office OLE DB providers are 32-bit only, which is not the case. We can download the 64-bit version of the Microsoft Access Database Engine 2010 Redistributable or the Microsoft Access Database Engine 2013 Redistributable. As mentioned earlier, Microsoft Visual Studio and SQL Server Data Tools are 32-bit applications, so they cannot load 64-bit OLE DB providers. In the development environment we therefore have to use the 32-bit provider, but we can use the 64-bit provider in production.

After I installed the 32-bit Microsoft Access Database Engine 2010 Redistributable on my development machine (SSIS 2012), replaced Microsoft.Jet.OLEDB.4.0 with Microsoft.ACE.OLEDB.12.0, and ran the solution in 32-bit (Run64BitRuntime = False), it worked fine. After I changed Run64BitRuntime = True, built the project and deployed it to production, where I had installed the 64-bit Microsoft Access Database Engine 2010 Redistributable, it also worked fine.
1) Change the provider in the Excel Connection Manager from Microsoft.Jet.OLEDB.4.0 to Microsoft.ACE.OLEDB.12.0; the resulting connection string looks like the sketch below.
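A minimal connection-string sketch for the ACE provider (the file path and HDR value are illustrative assumptions):

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\SourceData.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES";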

2) Another alternative is to use an OLE DB Source with an OLE DB connection manager, as below.

Change the Extended Properties to Excel 8.0;HDR=YES, as in the sketch below. It works fine in 32-bit on the development machine and in 64-bit on the production machine.
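A minimal sketch of the corresponding OLE DB connection string for an .xls file (the file path is an illustrative assumption):

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\SourceData.xls;Extended Properties="Excel 8.0;HDR=YES";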

If you are connecting to Microsoft Office Excel data, add the appropriate Extended Properties to the OLE DB connection string based on the Excel file type:

File Type (Extension)                              Extended Properties
Excel 97-2003 Workbook (.xls)                      "Excel 8.0"
Excel 2007-2010 Workbook (.xlsx)                   "Excel 12.0 Xml"
Excel 2007-2010 Macro-enabled Workbook (.xlsm)     "Excel 12.0 Macro"
Excel 2007-2010 Non-XML Binary Workbook (.xlsb)    "Excel 12.0"

There is also a myth that Microsoft Office must be installed on the machine or server where we run or develop the SSIS solution; that is not the case either, as the Microsoft Access Database Engine Redistributable provides the required provider.

Conclusion: It is possible to load Microsoft Office Excel files in 64-bit SSIS.

New Features in SQL SERVER Integration Services 2016

Microsoft did not make any significant changes in SQL SERVER 2014 Integration Services, but the features below were added in SQL SERVER 2016.

Always On Support

Microsoft introduced the AlwaysOn Availability Groups feature in SQL SERVER 2012 as a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. A set of read-write user databases that are set up as part of an AlwaysOn Availability Group are called availability databases. An availability group supports a failover environment for these availability databases, which fail over together, thus maximizing the availability of a set of user databases for an enterprise.

Microsoft introduced the SSISDB catalog (i.e. the SSISDB user database) in SQL SERVER 2012 as the central point for working with SQL SERVER Integration Services (SSIS) projects deployed to the Integration Services server: here we set project and package parameters, configure environments to specify runtime values for packages, execute and troubleshoot packages, and manage Integration Services operations. SSIS projects, packages, parameters, environments and operations history are all stored in the SSISDB catalog. In SQL SERVER 2016, in order to provide high availability for the SSISDB database (which contains projects, packages, parameters, environments, operations history, etc.), we can add the SSISDB database to an AlwaysOn Availability Group just like any other user database. In the event of a failover, one of the secondary nodes becomes the primary node.
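On the primary replica, adding SSISDB to an existing availability group uses the same T-SQL as any other user database. A minimal sketch, assuming a hypothetical availability group named AG_SSIS (SSIS-specific steps, such as managing the catalog master key, still apply):

-- Run on the primary replica; AG_SSIS is an illustrative availability group name.
ALTER AVAILABILITY GROUP [AG_SSIS] ADD DATABASE [SSISDB];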

Incremental Package Deployment

With the help of the new Incremental Package Deployment feature we can deploy one or more packages to an existing or new project without deploying the whole project. At this time we can incrementally deploy packages using the Deployment Wizard, SQL SERVER Management Studio (which uses the Deployment Wizard), stored procedures, and the Management Object Model (MOM) API.
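A minimal T-SQL sketch of the stored-procedure route, assuming the SSISDB catalog.deploy_packages procedure; the folder, project, package and file names are illustrative:

USE SSISDB;
GO
-- Table-valued parameter holding the package name and the raw .dtsx bytes.
DECLARE @packages AS [catalog].[Package_Table_Type];

INSERT INTO @packages (packagename, packagedata)
SELECT N'LoadFactSales.dtsx',
       (SELECT BulkColumn
        FROM OPENROWSET(BULK N'C:\SSIS\LoadFactSales.dtsx', SINGLE_BLOB) AS pkg);

-- Deploy only this package into an existing project, without redeploying the whole project.
EXEC [catalog].[deploy_packages]
     @folder_name    = N'DW',
     @project_name   = N'SalesETL',
     @packages_table = @packages;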

Note: In SQL SERVER 2016, the Incremental Package Deployment feature is not available in SQL SERVER Data Tools for BI (SSDT-BI).

Project Upgrade

When we upgrade our SSIS projects from a previous version to the current version, project-level connection managers continue to work as normal, and the package layout and annotations are retained.

Lookup Transformation in SQL Server Integration Services

The Lookup Transformation performs lookups by joining data in input columns with columns in a reference data set. Lookup Transformations are very useful in SQL Server Integration Services ETL processes, for example in data warehouse builds where we load data from various source systems. The Lookup Transformation is a synchronous transformation, which means it does not block the data flow pipeline.

Image: Lookup 1.0 Lookup_Image_1.0
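Conceptually, a Lookup that retrieves a surrogate key from a dimension behaves like the join below; a minimal T-SQL sketch with illustrative table and column names:

-- The Lookup matches each incoming row's business key against the reference
-- data set and adds the surrogate key (CustomerKey) to the data flow.
SELECT s.SalesOrderNumber,
       d.CustomerKey
FROM   staging.SalesOrders AS s
JOIN   dbo.DimCustomer     AS d
       ON d.CustomerAlternateKey = s.CustomerAlternateKey;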

Note: The Lookup Transformation is case sensitive, so for any string match we need to convert the data set to either lower or upper case in the source SQL using the UPPER() or LOWER() T-SQL functions, or we can use a Character Map transformation to convert the data set to upper or lower case.

Cache Mode: The Lookup Transformation has the three cache mode options below, which allow us to specify how the reference data set is stored in memory.

Image: Lookup 2.0 Lookup_Image_2.0

Full cache: This is the default cache mode. In this mode the reference data is queried from the database during the pre-execute phase of the data flow task. This mode uses the most memory and adds start-up time to the data flow task, because all of the caching happens before any rows are retrieved from the data flow source, but it is very fast during execution. One thing to remember is that the memory is not released (or swapped to disk) until the process (package execution) finishes, so it is important to manage memory resources. I have seen many people reference a full table or view (rather than selecting only the required columns, e.g. SELECT Column1, Column2 FROM [TableName]) with 50 or more columns and more than 10 million rows, which consumes a large amount of memory (about 5-8 GB of RAM) instead of the few hundred MB that the required columns alone would consume.

Partial cache: In this cache mode the lookup cache is empty at the beginning of the data flow. When a new row comes in, the Lookup Transformation checks its cache for matching values. If no match is found it queries the database; if a match is found, the values are cached so they can be used the next time a matching row comes in.

No cache: This cache mode keeps only the last matched row in memory, which means the Lookup Transformation queries the database for every row.

Connection Type: We have the two connection type options below.

Image: Lookup 3.0 Lookup_Image_3.0

Cache connection manager: The cache connection manager is useful for reducing database load by reusing a cache. It also lets us reuse a cache between different data flow tasks and Lookup Transformations to manage memory more efficiently. We can reuse the cache for role-playing dimensions such as Date, Time and other role-playing dimensions in the data warehouse. We must populate the cache in a separate data flow task; this allows the Cache Transform to fill the cache connection manager before the Lookup Transformation starts executing. The advantage of the cache connection manager is that we can perform lookups against non-OLE DB data sources. Note: the cache connection manager does not support the Binary Large Object (BLOB) data types DT_TEXT, DT_NTEXT, and DT_IMAGE. If the reference data set contains a BLOB data type, the component will fail when we run the package.

OLE DB connection manager: We can use an OLE DB connection manager for OLE DB data sources.

Specify how to handle rows with no matching entries: This setting provides options for handling rows that do not match at least one entry in the reference data set.

Image: Lookup 4.0 Lookup_Image_4.0

Fail component: This is the default option. The package fails if any row fails to match the reference data set or causes an error, for example a data truncation. Note: people often leave this option in production SSIS packages, which is not ideal, because when we schedule a package for an overnight run we find the scheduled job failed early in the morning. It is better to set a default key for the failed row match and report on it in the morning, or to redirect rows to the error output with some form of reporting.

Ignore failure: This option sends any non-matching rows and any rows that cause errors (such as a truncation error) to the same output as matched rows, but the lookup values are set to NULL. It behaves like a LEFT OUTER JOIN in a SQL query, but without returning duplicate rows.

Redirect rows to error output: This option sends any non-matching rows and any rows that cause an error to the error output (the output with the red arrow).

Redirect rows to no match output: This option sends non-matching rows to the no match output and error rows to the error output (the output with the red arrow).

We can load data into the fact table with the two options below. We will compare which option is faster, uses fewer resources and requires less development and maintenance effort. I am not going to go through every step, but through the print screens below I want to show an effective way to use Lookup Transformations in a fact table load that has about 15-20 dimensions or more.

Option 1: In Image Lookup 5.0 we can see that I have redirected the LKP_OrganizationKey Lookup Transformation's no match output "Nonmatched OrganizationKey" to the "DER_OrganzationKey" Derived Column transformation and its match output to the "UnionALL_OrganizationKey" Union All transformation. Likewise, the LKP_AccountKey Lookup Transformation's no match output "NonMatched AccountKey" goes to the DER_AccountKey Derived Column transformation and its match output "Matched AccountKey" to the corresponding Union All transformation. Many people use option 1: if no match is found, set a default value in a Derived Column transformation and merge it with the match output using a Union All transformation.

Image: Lookup 5.0 Lookup_Image_5.0

Option 2: In Image Lookup 6.0 I have selected the Ignore Failure option to specify how to handle rows with no matching entries. All of the matched rows, the non-matching rows (with NULL lookup column results) and any error rows are sent down the same match output pipeline for the LKP_OrganizationKey and LKP_AccountKey Lookup Transformations. Then, in the DER_DefaultKey Derived Column transformation, I added an expression in the Derived Column transformation editor as per Image Lookup 7.0 (an example expression is sketched at the end of this section).

Image: Lookup 6.0 Lookup_Image_6.0

Image: Lookup 7.0 Lookup_Image_7.0

Conclusion: As we have seen, the Lookup Transformation is a non-blocking transformation, but if we use it with a Union All transformation (option 1), which is a partially blocking transformation, the whole data flow becomes partially blocking. Also, if we have more than 15-20 dimensions, we will spend more time in development. Option 2 is much faster and less resource intensive because the Lookup Transformation and the Derived Column transformation are both non-blocking transformations. Option 2 also aligns better with best practices.
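For reference, the DER_DefaultKey expressions referred to in Image Lookup 7.0 are typically of the form below; a sketch with illustrative column names and an assumed unknown-member key of -1, set as "Replace" expressions in the Derived Column editor:

ISNULL(OrganizationKey) ? -1 : OrganizationKey
ISNULL(AccountKey) ? -1 : AccountKey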