SSIS Merge JOIN using multiple tables

Tutorial to use the SSIS Merge Join transformation with two SQL Server tables into one target table.

Use the Merge Join with the SSIS component allows you to perform a join and merge two or more data sources based on a common key. In other words, the SSIS Merge Join component allows you to merge different heterogeneous or non-heterogeneous sources into one, such as flat files, XML documents, Excel files, or SQL Server tables.

This SSIS tutorial explains how to use the SSIS component that merges two SQL Server source tables into one target table. With SSIS, whether for on-premises versions or on the Microsoft Azure Cloud, the component is useful for managing distinct tables without using T-SQL or .NET code.

1. Prepare the source and target tables to join with SSMS

From an SSMS tab, connect to the target database and run the following SQL code to create two source tables to merge into a target table. The source tables are:

  • The sales table: dbo.Sales
  • And the customers table: dbo.Customers

1.1 Create the sales source table

The first source table is the sales table, it will be enriched in an SSIS data flow with data from the customers table, first and last name.

-- Create the Sales table
CREATE TABLE [dbo].[Sales](
  [CustomerID]  [int] NOT NULL,
  [MonthID]     [nvarchar](20) NOT NULL,
  [YearID]      [nvarchar](20) NOT NULL,
  [Qty]         [numeric](10,2) NOT NULL,
  [Price]       [numeric](10,2) NOT NULL,
  [Sales]       [numeric](10,2) NOT NULL,
  [ProductID]   [int] NOT NULL,
  [ProductName] [nvarchar](100) NOT NULL,
  CONSTRAINT [SalesPK] 
    PRIMARY KEY CLUSTERED (
      [CustomerID], [MonthID], [YearID], [ProductID] ASC
    )
);

-- Insert data into the Sales table
INSERT [dbo].[Sales] VALUES (1, N'1', N'2022', 10.00, 879.20, 8792.00, 31, N'Babka');
INSERT [dbo].[Sales] VALUES (1, N'3', N'2021', 5.00, 312.00, 1560.00, 40, N'Unreal');
INSERT [dbo].[Sales] VALUES (1, N'6', N'2021', 9.00, 627.00, 5643.00, 27, N'Qrunch');
INSERT [dbo].[Sales] VALUES (1, N'7', N'2022', 6.00, 621.70, 3730.20, 10, N'Ambrosi');
INSERT [dbo].[Sales] VALUES (1, N'7', N'2022', 4.00, 653.10, 2612.40, 47, N'Quickick');
INSERT [dbo].[Sales] VALUES (1, N'9', N'2021', 2.00, 734.90, 1469.80, 47, N'Quickick');
INSERT [dbo].[Sales] VALUES (2, N'2', N'2021', 9.00, 895.60, 8060.40, 47, N'Quickick');
INSERT [dbo].[Sales] VALUES (2, N'3', N'2022', 11.00, 51.80, 569.80, 49, N'Menudito');
INSERT [dbo].[Sales] VALUES (2, N'7', N'2021', 11.00, 112.20, 1234.20, 19, N'Chiavie');
INSERT [dbo].[Sales] VALUES (2, N'9', N'2022', 7.00, 581.80, 4072.60, 33, N'Jans');
INSERT [dbo].[Sales] VALUES (3, N'2', N'2022', 4.00, 576.60, 2306.40, 5, N'Amoy');
INSERT [dbo].[Sales] VALUES (3, N'6', N'2021', 6.00, 551.10, 3306.60, 8, N'Gilda');
INSERT [dbo].[Sales] VALUES (4, N'0', N'2021', 7.00, 15.40, 107.80, 26, N'Lasco');
INSERT [dbo].[Sales] VALUES (4, N'1', N'2022', 2.00, 706.60, 1413.20, 27, N'Qrunch');
INSERT [dbo].[Sales] VALUES (4, N'1', N'2022', 9.00, 491.70, 4425.30, 30, N'Paldo');
INSERT [dbo].[Sales] VALUES (4, N'3', N'2022', 4.00, 810.20, 3240.80, 22, N'Bourbon');
INSERT [dbo].[Sales] VALUES (4, N'4', N'2021', 9.00, 648.90, 5840.10, 42, N'Barbacoa');
INSERT [dbo].[Sales] VALUES (4, N'5', N'2021', 7.00, 284.90, 1994.30, 5, N'Amoy');
INSERT [dbo].[Sales] VALUES (5, N'11', N'2021', 3.00, 914.90, 2744.70, 45, N'Predue');
INSERT [dbo].[Sales] VALUES (5, N'11', N'2022', 9.00, 903.80, 8134.20, 44, N'Kemps');
INSERT [dbo].[Sales] VALUES (5, N'4', N'2022', 10.00, 133.90, 1339.00, 6, N'Haystack');
INSERT [dbo].[Sales] VALUES (5, N'6', N'2021', 10.00, 940.20, 9402.00, 34, N'Amazin');
INSERT [dbo].[Sales] VALUES (5, N'7', N'2021', 7.00, 352.90, 2470.30, 32, N'Maverik');
INSERT [dbo].[Sales] VALUES (5, N'7', N'2022', 3.00, 116.40, 349.20, 7, N'Kalamata');
INSERT [dbo].[Sales] VALUES (6, N'1', N'2021', 4.00, 283.30, 1133.20, 23, N'Oatsnack');
INSERT [dbo].[Sales] VALUES (6, N'4', N'2021', 4.00, 604.90, 2419.60, 10, N'Ambrosi');
INSERT [dbo].[Sales] VALUES (6, N'4', N'2022', 5.00, 184.30, 921.50, 6, N'Haystack');
INSERT [dbo].[Sales] VALUES (6, N'5', N'2022', 9.00, 464.40, 4179.60, 18, N'Lurpak');
INSERT [dbo].[Sales] VALUES (6, N'8', N'2021', 6.00, 836.10, 5016.60, 40, N'Unreal');
INSERT [dbo].[Sales] VALUES (6, N'9', N'2022', 4.00, 254.80, 1019.20, 1, N'Boncora');
INSERT [dbo].[Sales] VALUES (7, N'0', N'2021', 7.00, 100.90, 706.30, 45, N'Predue');
INSERT [dbo].[Sales] VALUES (7, N'10', N'2021', 4.00, 161.50, 646.00, 10, N'Ambrosi');
INSERT [dbo].[Sales] VALUES (7, N'10', N'2022', 7.00, 484.50, 3391.50, 50, N'Armanino');
INSERT [dbo].[Sales] VALUES (7, N'3', N'2022', 7.00, 748.80, 5241.60, 38, N'Exo');
INSERT [dbo].[Sales] VALUES (7, N'6', N'2021', 1.00, 623.50, 623.50, 45, N'Predue');
INSERT [dbo].[Sales] VALUES (7, N'7', N'2021', 6.00, 185.00, 1110.00, 28, N'Basilico');
INSERT [dbo].[Sales] VALUES (7, N'8', N'2022', 7.00, 625.50, 4378.50, 4, N'Sartori');
INSERT [dbo].[Sales] VALUES (7, N'8', N'2022', 3.00, 125.40, 376.20, 38, N'Exo');
INSERT [dbo].[Sales] VALUES (8, N'1', N'2021', 8.00, 949.80, 7598.40, 16, N'Homekist');
INSERT [dbo].[Sales] VALUES (8, N'10', N'2021', 9.00, 944.30, 8498.70, 41, N'Baked');
INSERT [dbo].[Sales] VALUES (8, N'10', N'2022', 7.00, 578.10, 4046.70, 2, N'Master');
INSERT [dbo].[Sales] VALUES (8, N'12', N'2022', 3.00, 309.20, 927.60, 42, N'Barbacoa');
INSERT [dbo].[Sales] VALUES (8, N'2', N'2022', 4.00, 417.80, 1671.20, 25, N'Scrapple');
INSERT [dbo].[Sales] VALUES (8, N'4', N'2021', 7.00, 509.20, 3564.40, 4, N'Sartori');

1.2 Create the customers source table

The second source table is the customers table, which stores the customer number, first name, last name, city, and country. It contains 8 rows.

-- Create the Customers table
CREATE TABLE [dbo].[Customers](
   [CustomerID] [int] NOT NULL,
   [FirstName] [nvarchar](20) NULL,
   [LastName] [nvarchar](20) NULL,
   [City] [nvarchar](20) NULL,
   [Country] [nvarchar](50) NULL,
   CONSTRAINT [CustomersPrimaryKeyCustomerID] PRIMARY KEY CLUSTERED ([CustomerID] ASC)
);
GO

-- Insert 8 lines into the Customers table
INSERT INTO [dbo].[Customers] VALUES (1, N'Ali', N'Ahmed', N'Cairo', N'Egypt');
INSERT INTO [dbo].[Customers] VALUES (2, N'Johnny', N'John', N'Toronto', N'Canada');
INSERT INTO [dbo].[Customers] VALUES (3, N'John', N'Doe', N'Mexico City', N'Mexico');
INSERT INTO [dbo].[Customers] VALUES (4, N'Shu', N'Abbas', N'Paris', N'France');
INSERT INTO [dbo].[Customers] VALUES (5, N'Jeane', N'Raffin', N'Liushutun', N'China');
INSERT INTO [dbo].[Customers] VALUES (6, N'Legra', N'Leate', N'Błaszki', N'Poland');
INSERT INTO [dbo].[Customers] VALUES (7, N'Sullivan', N'Goadby', N'Xiaoguwei', N'China');
INSERT INTO [dbo].[Customers] VALUES (8, N'Tom', N'Ellams', N'Lop Buri', N'Thailand');

1.3 Create the enriched sales target table

The target table is an enriched sales table with the first and last names of customers: dbo.SalesEnriched. The enriched table has the same structure, but with the following two additional columns:

  1. FirstName: Customer’s first name
  2. LastName: Customer’s last name
CREATE TABLE [dbo].[SalesEnriched](
  [CustomerID]  [int] NOT NULL,
  [MonthID]     [nvarchar](20) NOT NULL,
  [YearID]      [nvarchar](20) NOT NULL,
  [Qty]         [numeric](10,2) NOT NULL,
  [Price]       [numeric](10,2) NOT NULL,
  [Sales]       [numeric](10,2) NOT NULL,
  [ProductID]   [int] NOT NULL,
  [ProductName] [nvarchar](100) NOT NULL,
  [FirstName]   [nvarchar](20) NULL,
  [LastName]    [nvarchar](20) NULL,
  CONSTRAINT [SalesSalesEnrichedPK] 
    PRIMARY KEY CLUSTERED (
      [CustomerID], [MonthID], [YearID], [ProductID] ASC
    )
);

2. Create an SSIS data flow using Merge Join

Prior to this, create a connection to the SQL Server database from the SSIS package.

2.1 Prepare the source components

In an SSIS data flow, add the following components, then rename the components with the names of the source and target tables.

  • 2 OLE DB Source components
  • A Merge Join transformation component
  • An OLE DB destination
Add SSIS OLE DB Source, Merge Join and OLE DB Destination components
Add SSIS OLE DB Source, Merge Join and OLE DB Destination components

Choose the first source table for the SSIS join, i.e., the left table.

Select the first source table which is the Sales one
Select the first source table which is the Sales one

In the same way, select the second source table, the customers table.

Choose the table to the right of the SSIS join
Choose the table to the right of the SSIS join

Finally, connect the sources to the SSIS Merge Join transformation, and the following window appears. Choose the left input for the dbo.Sales table.

Choose the left SSIS join entry for the sales table
Choose the left SSIS join entry for the sales table

2.2 Configure the source data sorting

The package displays an SSIS error:

[Merge Join [2]] Error: “Merge Join.Inputs[Merge Join Left Input]” must be sorted. If possible, sort the data at the source and mark the data as sorted by setting the IsSorted and SortKeyPosition properties in the Advanced Editor. Otherwise, add a Sort Transformation to the path before the merge.

The message is quite explicit, so just sort the join columns using the following 2 properties, from the component’s advanced editor.

  1. IsSorted: to indicate that the source is sorted.
  2. SortKeyPosition: to indicate the order of the columns used for sorting.

Repeat these operations for both source tables.

From the OLE DB Source components:

  1. Right-click
  2. In the Input and Output Properties tab
  3. Select OLE DB Source Output
  4. Change the IsSorted property to True
Change the value of IsSorted in the advanced OLE DB editor
Change the value of IsSorted in the advanced OLE DB editor

Still in the advanced SSIS editor, select this time the column used as the join key, here it is CustomerID, and modify the SortKeyPosition property to 1. In a join with multiple columns, assign the same sort values to the corresponding columns in the two source tables.

Assign the SortKeyPosition to 1 for the CustomerID column
Assign the SortKeyPosition to 1 for the CustomerID column

3. Configure the SSIS Merge Join transformation

Open the Merge Join transformation component and configure the options as follows:

  1. Choose the type of SSIS join: Left outer join.
  2. Select all columns from the first table.
  3. Select only first name and last name from the second table.
Configure the join in the SSIS Merge Join component
Configure the join in the SSIS Merge Join component

4. Prepare the target table to store the SSIS Merge Join data

Configure the last component of the data flow and point it to the enriched sales table.

Select the Sales_Enriched table to store the result of the SSIS join
Select the Sales_Enriched table to store the result of the SSIS join

Verify column mapping from the Mappings tab. Columns from both source tables are properly mapped to the target table.

Map the columns from the SSIS join to the sales enriched table
Map the columns from the SSIS join to the sales enriched table

5. Run the SSIS Merge join and check the target data

Finally, run the SSIS package. The 44 sales data rows are successfully integrated into the enriched table with no errors. In this simple tutorial, unenriched rows are not handled and therefore not redirected.

44 rows are merged and enriched by the SSIS Merge Join transformation
44 rows are merged and enriched by the SSIS Merge Join transformation

Finally, click on View Data from the OLE DB Destination component to read the table and verify that the two columns are enriched with the names and first names of the clients.

The 2 additional columns enriched by the SSIS Merge Join
The 2 additional columns enriched by the SSIS Merge Join

6. Conclusion on the SSIS Merge Join component

The SSIS join allows you to group data from two tables without writing a single line of code. The only prerequisite is to sort the data beforehand, directly from the advanced editor or using the SSIS Sort Transformation.

Other tutorials on SQL Server and SSIS transformations

Be the first to comment

Leave a Reply

Your email address will not be published.


*