How to count distinct rows with SSIS?

How to calculate the number of distinct rows from a table or a file within a data flow using SSIS?

Tutorial to learn how to count distinct rows in a SQL Server table or data files with SSIS and store it in a text file. To achieve this, we use the SSIS Aggregate transformation. This operation is the SSIS equivalent of executing a COUNT DISTINCT command in SQL Server or in standard SQL. Indeed, the COUNT DISTINCT aggregation option is used to calculate the unique rows in a table or file, within a data flow.

1. Create an SSIS package to count distinct rows

One prerequisite is to have the Adventureworks sample database installed and configured in one database. To begin, in the Data Flow, place a source, the SSIS Aggregate component, and a destination. For this example, the source will be the SSIS OLE DB Source component and the Flat File Destination will serve as the destination for exporting the result to a text file. In order to calculate the number of unique rows with SSIS, add these 3 components to the data flow:

  1. The OLE DB Source
  2. The SSIS Aggregate transformation
  3. And the Flat File Destination
3 SSIS components needed to calculate the number of distinct lines with SSIS
3 SSIS components needed to calculate the number of distinct lines with SSIS

2. Point the OLE DB source component towards the SQL table

Next, it is necessary to configure the SSIS Data Flow. Visual Studio will now display error messages on the Integration Services components. Afterward, double-click on the OLE DB Source component and select a table. For this example, we are selecting the HumanResources.Employee table from the AdventureWorks database.

Configure the connection to the source table to count separate rows
Configure the connection to the source table to count separate rows

3. Configure the SSIS Aggregate Component with Count Distinct

Once the source has been selected and configured, it is time to configure the SSIS Aggregate component. In the Aggregations tab, select the columns where the COUNT DISTINCT should be performed, that is, the columns to aggregate. In the Operation column, choose COUNT DISTINCT.

Configure the SSIS aggregation with Count Distinct
Configure the SSIS aggregation with Count Distinct

4. Link the Aggregate transformation to the target file

Now, connect the SSIS Aggregate component to the final component, known as the Flat File Destination. Double-click on this component to select the destination file. Finally, in the Mapping tab, ensure that the columns are correctly selected.

Configure the SSIS output mapping to using the editor
Configure the SSIS output mapping to using the editor

After configuring all the SSIS package components, there will no longer be any errors in the Data Flow.

SSIS Data Flow with the three components linked together
SSIS Data Flow with the three components linked together

5. Execute the package to export the distinct row count

Next, execute the SSIS package. The exported flat file will contain the distinct row count for the selected columns. The target file will now consist of a single row and the number 290, which corresponds to the number of rows in the input table.

Successful execution of the SSIS Data Flow
Successful execution of the SSIS Data Flow

Conclusion on the SSIS Aggregation Component

This article demonstrates step by step how to perform a count distinct with SSIS, without writing any line of SQL code. For performance optimization, it is recommended to execute SQL operations directly on the SQL Server engine, thus reducing the data transfer between the SQL Server database and the SSIS server. Integration Services packages enable all operations available in T-SQL through a graphical interface and a simplified industrialisation process. For example, here is how to pivot the rows of a SQL table into columns with SSIS.

Pivot rows to columns with the SSIS Pivot transformation

Of course, it is also possible to perform the same pivot operation, which is also very handy but a bit more complex in terms of coding, using the SQL Server PIVOT query directly from SSMS and only with T-SQL code.

How to use the SQL Server PIVOT query to convert rows to columns ?

Leave a Comment

Your email address will not be published. Required fields are marked *