Primary keys are essential to relational databases: they guarantee that each row can be addressed by a unique identifier. But when loading data into another table, you can run into duplicates on columns you expected to be distinct. The following snippets show how to identify the problematic rows.
Finding duplicates in a table
You might have a table in your database that doesn't have any unique constraints. Let's assume the table is called TABLE1 and we want to check for duplicates on COL1 and COL2. The following query shows which rows are duplicated, based on those columns.
SELECT *
FROM (
  SELECT
    T1.*,
    COUNT(*) OVER (PARTITION BY COL1, COL2) AS ROW_COUNT
  FROM TABLE1 T1
)
WHERE ROW_COUNT > 1
How this works
This query uses COUNT as a window function, counting the rows within each group defined by the PARTITION BY clause. That count is appended to every row of TABLE1 as ROW_COUNT, and the outer query keeps only the rows where ROW_COUNT is greater than 1, leaving just the duplicated rows from TABLE1.
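To see this in action, here is a minimal sketch that runs the query above against SQLite (version 3.25+ for window function support) via Python's sqlite3 module. The table and column names TABLE1, COL1, and COL2 come from the article; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (COL1 TEXT, COL2 TEXT, COL3 TEXT)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?, ?)",
    [
        ("a", "x", "1"),
        ("a", "x", "2"),  # duplicate of the row above on (COL1, COL2)
        ("b", "y", "3"),  # unique on (COL1, COL2)
    ],
)

# The duplicate-finding query from the article
rows = conn.execute("""
    SELECT *
    FROM (
      SELECT
        T1.*,
        COUNT(*) OVER (PARTITION BY COL1, COL2) AS ROW_COUNT
      FROM TABLE1 T1
    )
    WHERE ROW_COUNT > 1
""").fetchall()

for row in rows:
    print(row)  # only the two ('a', 'x', ...) rows appear, each with ROW_COUNT = 2
```

Note that the unique ('b', 'y') row is excluded, because its ROW_COUNT is 1.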
Using it on a query rather than a table
More commonly, though, you want to know whether a query you're running contains duplicate records that would make a load into a table with a primary key constraint fail. In that case you can wrap the query in a WITH clause (also known as a subquery factoring clause) and reference the resulting named query in the snippet above:
WITH MY_TABLE AS (
  SELECT COL1, COL2, COL3
  FROM TABLE1
  WHERE COL3 != 'SOMETHING'
)
SELECT *
FROM (
  SELECT
    T1.*,
    COUNT(*) OVER (PARTITION BY COL1, COL2) AS ROW_COUNT
  FROM MY_TABLE T1
)
WHERE ROW_COUNT > 1
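The same WITH-clause variant can be sketched on SQLite via Python's sqlite3 module. MY_TABLE and the WHERE COL3 != 'SOMETHING' filter come from the article; the sample rows are invented, and one of them exists only to show that filtered-out rows never reach the duplicate check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (COL1 TEXT, COL2 TEXT, COL3 TEXT)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?, ?)",
    [
        ("a", "x", "KEEP"),
        ("a", "x", "KEEP"),       # duplicate pair on (COL1, COL2)
        ("a", "x", "SOMETHING"),  # excluded by the WITH clause's filter
        ("b", "y", "KEEP"),       # unique on (COL1, COL2)
    ],
)

# Wrap the "real" query in a WITH clause, then check it for duplicates
dupes = conn.execute("""
    WITH MY_TABLE AS (
      SELECT COL1, COL2, COL3
      FROM TABLE1
      WHERE COL3 != 'SOMETHING'
    )
    SELECT *
    FROM (
      SELECT
        T1.*,
        COUNT(*) OVER (PARTITION BY COL1, COL2) AS ROW_COUNT
      FROM MY_TABLE T1
    )
    WHERE ROW_COUNT > 1
""").fetchall()

print(dupes)  # the two ('a', 'x', 'KEEP') rows, each with ROW_COUNT = 2
```

Because the 'SOMETHING' row is filtered out before the window function runs, ROW_COUNT for the ('a', 'x') group is 2, not 3.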