Tutorial mediante ejemplos de las sentencias INSERT, UPDATE, MERGE, DELETE y TRUNCATE del lenguaje Crono SQL. En todos los casos, Crono SQL simplifica la codificación de estas instrucciones frente a su equivalente SQL.
En la introducción a Crono SQL se describe la sintaxis completa de la sentencia SELECT del lenguaje Crono SQL. En este artículo vamos a explicar la sintaxis de las siguientes sentencias de manipulación de datos:
INSERT
UPDATE
MERGE
DELETE
TRUNCATE
En el seguiente apartado veremos la sentencia LOAD, que unifica en una única instrucción todas las estrategias de carga habituales en un proyecto ETL/DHW. De todos modos, aunque habitualmente no vayamos a utilizar las sentencias descritas en este apartado, conviene conocerlas porque -junto a la sentencia SELECT- forman la base del lenguaje Crono SQL.
INSERT
Crono SQL soporta la sintaxis estándar de la sentencia INSERT:
INSERT INTO dwh.DimProducts(
ProductID,
Product,
ProductCategory,
ProductSubCategory,
ProductNumber,
ProductModel,
Color,
StandardCost,
ListPrice,
ProductSize,
SizeUnitMeasureCode,
Weight,
WeightUnitMeasureCode)
select
ProductID,
Product.Name,
ProductCategory.name,
ProductSubCategory.name,
ProductNumber,
ProductModel.name,
Product.Color,
Product.StandardCost,
Product.ListPrice,
Product.Size,
Product.SizeUnitMeasureCode,
Product.Weight,
Product.WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using Product(ProductModelID)
Código SQL generado
INSERT dwh.DimProducts(ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,StandardCost,ListPrice,ProductSize,SizeUnitMeasureCode,Weight,WeightUnitMeasureCode)
SELECT
ProductID,
Product.Name AS Name,
ProductCategory.name AS name,
ProductSubCategory.name AS name,
ProductNumber,
ProductModel.name AS name,
Product.Color AS Color,
Product.StandardCost AS StandardCost,
Product.ListPrice AS ListPrice,
Product.Size AS Size,
Product.SizeUnitMeasureCode AS SizeUnitMeasureCode,
Product.Weight AS Weight,
Product.WeightUnitMeasureCode AS WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID);
Sin embargo, especialmente cuando la tabla tiene muchas columnas, esta sintaxis es repetitiva e incómoda de mantener. Por ello, Crono SQL prescinde de la cláusula VALUES y asume que el nombre de los campos coincide con el alias de las columnas de la consulta de origen:
INSERT INTO dwh.DimProducts
select
ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost,
Product.ListPrice,
Product.Size,
Product.SizeUnitMeasureCode,
Product.Weight,
Product.WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using Product(ProductModelID)
Código SQL generado
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost,
Product.ListPrice AS ListPrice,
Product.Size AS Size,
Product.SizeUnitMeasureCode AS SizeUnitMeasureCode,
Product.Weight AS Weight,
Product.WeightUnitMeasureCode AS WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
INSERT dwh.DimProducts(ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost,ListPrice,Size,SizeUnitMeasureCode,Weight,WeightUnitMeasureCode)
SELECT
query.ProductID,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost,
query.ListPrice,
query.Size,
query.SizeUnitMeasureCode,
query.Weight,
query.WeightUnitMeasureCode
FROM query;
Por supuesto, para definir el origen de datos del INSERT se pueden utilizar todas las características de la sentencia SELECT de Crono SQL. En la siguiente consulta, por ejemplo, se verifica que las relaciones sean correctas antes de realizar la inserción. Es decir, si las relaciones pierden o duplican registros, no se ejecutará el INSERT.
INSERT INTO dwh.DimProducts
select
ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost,
Product.ListPrice,
Product.Size,
Product.SizeUnitMeasureCode,
Product.Weight,
Product.WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using Product(ProductModelID)
CHECK SNOWFLAKE
Código SQL generado
IF EXISTS (
SELECT count(*)
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
HAVING count(*) <> (SELECT count(*) FROM staging.Product)
) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de Product.',1
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost,
Product.ListPrice AS ListPrice,
Product.Size AS Size,
Product.SizeUnitMeasureCode AS SizeUnitMeasureCode,
Product.Weight AS Weight,
Product.WeightUnitMeasureCode AS WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
INSERT dwh.DimProducts(ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost,ListPrice,Size,SizeUnitMeasureCode,Weight,WeightUnitMeasureCode)
SELECT
query.ProductID,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost,
query.ListPrice,
query.Size,
query.SizeUnitMeasureCode,
query.Weight,
query.WeightUnitMeasureCode
FROM query;
También se puede utilizar, por ejemplo, la sentencia COMBINE o la funcionalidad de MATERIALIZE. En la siguiente consulta se verificará que todas las relaciones sean correctas, se crearán entonces las tablas temporales con la información de ventas y compras, y finalmente se unirán mediante un FULL JOIN en un único resultado a insertar.
INSERT INTO dwh.VentasVsCompras
COMBINE BY Product,ProductNumber
MATERIALIZE sales (
select
Product.Name Product,
Product.ProductNumber ProductNumber,
sum(PurchaseOrderDetail.LineTotal) Purchases
from staging.PurchaseOrderDetail
inner join staging.Product using ProductId
CHECK SNOWFLAKE),
MATERIALIZE purchases (
select
Product.Name Product,
Product.ProductNumber ProductNumber,
sum(SalesOrderDetail.LineTotal) Sales
from staging.SalesOrderDetail
inner join staging.Product using ProductId
CHECK SNOWFLAKE)
Código SQL generado
-- Materialized query: sales
SELECT
Product.Name AS Product,
Product.ProductNumber AS ProductNumber,
sum(PurchaseOrderDetail.LineTotal) AS Purchases
INTO #sales__14C85
FROM staging.PurchaseOrderDetail
INNER JOIN staging.Product ON (PurchaseOrderDetail.ProductId=Product.ProductId)
GROUP BY
Product.Name,
Product.ProductNumber
-- Materialized query: purchases
SELECT
Product.Name AS Product,
Product.ProductNumber AS ProductNumber,
sum(SalesOrderDetail.LineTotal) AS Sales
INTO #purchases__9D591
FROM staging.SalesOrderDetail
INNER JOIN staging.Product ON (SalesOrderDetail.ProductId=Product.ProductId)
GROUP BY
Product.Name,
Product.ProductNumber
IF EXISTS (
SELECT count(*)
FROM staging.PurchaseOrderDetail
LEFT JOIN staging.Product ON (PurchaseOrderDetail.ProductId=Product.ProductId)
HAVING count(CASE WHEN Product.ProductId IS NOT NULL THEN 1 END) <> (SELECT count(*) FROM staging.PurchaseOrderDetail)
) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de PurchaseOrderDetail.',1
IF EXISTS (
SELECT count(*)
FROM staging.SalesOrderDetail
LEFT JOIN staging.Product ON (SalesOrderDetail.ProductId=Product.ProductId)
HAVING count(CASE WHEN Product.ProductId IS NOT NULL THEN 1 END) <> (SELECT count(*) FROM staging.SalesOrderDetail)
) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de SalesOrderDetail.',1
;WITH
query AS (
SELECT
coalesce(sales.Product,purchases.Product) Product,
coalesce(sales.ProductNumber,purchases.ProductNumber) ProductNumber,
sales.Purchases Purchases,
purchases.Sales Sales
FROM #sales__14C85 sales
FULL JOIN #purchases__9D591 purchases ON (sales.Product=purchases.Product AND sales.ProductNumber=purchases.ProductNumber)
)
INSERT dwh.VentasVsCompras(Product,ProductNumber,Purchases,Sales)
SELECT
query.Product,
query.ProductNumber,
query.Purchases,
query.Sales
FROM query;
Una necesidad habitual en ETL/DWH es insertar únicamente los registros que no existan en la tabla destino. Para ello, es necesarrio especificar la "clave de inserción". La clave de inserción se define mediante el carácter # delante del alias de cada campo. Como veremos a lo largo de este tutorial, las claves de todas las estrategias de carga se definen con el carácter #.
La siguiente consulta inserta los productos que no existan aún en la tabla dwh.DimProducts
INSERT INTO dwh.DimProducts
select
ProductID #ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost,
Product.ListPrice,
Product.Size,
Product.SizeUnitMeasureCode,
Product.Weight,
Product.WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using Product(ProductModelID)
Código SQL generado
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost,
Product.ListPrice AS ListPrice,
Product.Size AS Size,
Product.SizeUnitMeasureCode AS SizeUnitMeasureCode,
Product.Weight AS Weight,
Product.WeightUnitMeasureCode AS WeightUnitMeasureCode
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
INSERT dwh.DimProducts(ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost,ListPrice,Size,SizeUnitMeasureCode,Weight,WeightUnitMeasureCode)
SELECT
query.ProductID,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost,
query.ListPrice,
query.Size,
query.SizeUnitMeasureCode,
query.Weight,
query.WeightUnitMeasureCode
FROM query
LEFT JOIN dwh.DimProducts ON (DimProducts.ProductID=query.ProductID)
WHERE DimProducts.ProductID IS NULL;
Es es posible realizar una recarga completa mediante la sentencia DELETE AND INSERT. Esta sentencia elimina el contenido de la tabla y la recarga con los datos de la consulta de origen.
DELETE AND INSERT INTO dwh.DimProducts
select
ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)
LEFT JOIN staging.ProductModel using Product(ProductModelID)
Código SQL generado
DELETE FROM dwh.DimProducts;
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
INSERT dwh.DimProducts(ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost)
SELECT
query.ProductID,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost
FROM query;
También es posible realizar un TRUNCATE AND INSERT
TRUNCATE AND INSERT INTO dwh.DimProducts
select
ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)
LEFT JOIN staging.ProductModel using Product(ProductModelID)
Código SQL generado
TRUNCATE TABLE dwh.DimProducts;
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
INSERT dwh.DimProducts(ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost)
SELECT
query.ProductID,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost
FROM query;
La opción PARTITION permite recargar solo una parte de la tabla. Por ejemplo, es habitual cargar solo los movimientos del mes en curso (si sabemos que los demás no han sido modificados). La siguiente sentencia recarga las ventas de los últimos 30 días:
DELETE AND INSERT INTO dwh.FactSalesOrderHeader PARTITION (OrderDate>=getdate()-30)
SELECT
SalesOrderHeader.SalesOrderId,
Customer.CustomerId,
cast(SalesOrderHeader.OrderDate as date) OrderDate,
SalesOrderHeader.SalesOrderNumber,
cast(SalesOrderHeader.DueDate as date) DueDate,
cast(SalesOrderHeader.ShipDate as date) ShipDate,
SalesOrderHeader.OnlineOrderFlag,
SalesOrderHeader.PurchaseOrderNumber,
SalesOrderHeader.AccountNumber,
SalesOrderHeader.Freight,
SalesOrderHeader.CreditCardApprovalCode,
SalesOrderHeader.SubTotal Amount,
SalesOrderHeader.TaxAmt
FROM staging.SalesOrderHeader
INNER JOIN staging.customer using SalesOrderHeader(customerId)
where OrderDate>=getdate()-30
Código SQL generado
DELETE FROM dwh.FactSalesOrderHeader
WHERE
FactSalesOrderHeader.OrderDate>=getdate()-30;
;WITH
query AS (
SELECT
SalesOrderHeader.SalesOrderId AS SalesOrderId,
Customer.CustomerId AS CustomerId,
CAST(SalesOrderHeader.OrderDate AS date) AS OrderDate,
SalesOrderHeader.SalesOrderNumber AS SalesOrderNumber,
CAST(SalesOrderHeader.DueDate AS date) AS DueDate,
CAST(SalesOrderHeader.ShipDate AS date) AS ShipDate,
SalesOrderHeader.OnlineOrderFlag AS OnlineOrderFlag,
SalesOrderHeader.PurchaseOrderNumber AS PurchaseOrderNumber,
SalesOrderHeader.AccountNumber AS AccountNumber,
SalesOrderHeader.Freight AS Freight,
SalesOrderHeader.CreditCardApprovalCode AS CreditCardApprovalCode,
SalesOrderHeader.SubTotal AS Amount,
SalesOrderHeader.TaxAmt AS TaxAmt
FROM staging.SalesOrderHeader
INNER JOIN staging.customer ON (SalesOrderHeader.customerId=customer.customerId)
WHERE CAST(SalesOrderHeader.OrderDate AS date)>=getdate()-30
)
INSERT dwh.FactSalesOrderHeader(SalesOrderId,CustomerId,OrderDate,SalesOrderNumber,DueDate,ShipDate,OnlineOrderFlag,PurchaseOrderNumber,AccountNumber,Freight,CreditCardApprovalCode,Amount,TaxAmt)
SELECT
query.SalesOrderId,
query.CustomerId,
query.OrderDate,
query.SalesOrderNumber,
query.DueDate,
query.ShipDate,
query.OnlineOrderFlag,
query.PurchaseOrderNumber,
query.AccountNumber,
query.Freight,
query.CreditCardApprovalCode,
query.Amount,
query.TaxAmt
FROM query;
Con el tipo de estrategias descritas en este apartado es posible conseguir que cada tabla se cargue desde una única sentencia, centralizando la lógica de negocio en un único punto, facilitando el desarrollo, y promocionando el seguimiento de buenas prácticas. Los mismos principios y métodos se utilizan en el resto de sentencias DML, como veremos a continuación.
UPDATE
Crono SQL soporta la sintaxis estándar de la sentencia UPDATE:
UPDATE dwh.FactSalesOrderHeader
SET NetAmount=Amount-TaxAmt
WHERE year(OrderDate)=2017
Código SQL generado
UPDATE dwh.FactSalesOrderHeader SET
NetAmount=Amount-TaxAmt
WHERE
year(OrderDate)=2017;
Sin embargo, este tipo de sentencia es fundamentalmente inútil en un proyecto ETL/DWH. Los datos que queremos actualizar habitualmente no están ni se pueden calcular a partir de la tabla de origen. Por ello, cada fabricante de base de datos ha resuelto esta limitación de distintas maneras. Por ejemplo:
UPDATE table1
SET CodeDescription = (
SELECT table2.CODE
FROM table2
WHERE table1.CodeDescription = table2.description)
WHERE
EXISTS (
SELECT table2.CODE
FROM table2
WHERE table1.CodeDescription = table2.description)
Código SQL generado
UPDATE table1 SET
CodeDescription=(SELECT table2.CODE AS CODE
FROM table2
WHERE table1.CodeDescription=table2.description
)
WHERE
EXISTS (SELECT table2.CODE AS CODE
FROM table2
WHERE table1.CodeDescription=table2.description
);
Crono SQL soporta este tipo de sintaxis ISO, pero desaconseja claramente su uso.
En su lugar, el lenguaje Crono SQL propone una nueva sintaxis de UPDATE para actualizar una tabla en función del resultado de una consulta. Es una sintaxis similar a la del INSERT e igual al resto de instrucciones DML de Crono SQL. El codigo SQL generado corresponde a un MERGE.
UPDATE table1
SELECT
#code,
description CodeDescription
from table2
Código SQL generado
;WITH
query AS (
SELECT
code,
description AS CodeDescription
FROM table2
)
MERGE table1
USING query ON query.code=table1.code
WHEN MATCHED AND ((table1.CodeDescription<>query.CodeDescription OR (table1.CodeDescription IS NULL AND query.CodeDescription IS NOT NULL) OR (table1.CodeDescription IS NOT NULL AND query.CodeDescription IS NULL))) THEN
UPDATE SET
CodeDescription=query.CodeDescription;
De esta manera, seria trivial desnormalizar la información de productos en una única sentencia:
UPDATE dwh.DimProducts
select
ProductID #ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
Código SQL generado
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductID=DimProducts.ProductID
WHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL)
OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL)
OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL)
OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL)
OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL)
OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL)
OR DimProducts.ProductCost<>query.ProductCost OR (DimProducts.ProductCost IS NULL AND query.ProductCost IS NOT NULL) OR (DimProducts.ProductCost IS NOT NULL AND query.ProductCost IS NULL))) THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost;
La anterior consulta actualiza los registros de DimProducts que hayan cambiado (y solo los que hayan cambiado). El caracter # sirve para indicar la clave de actualización, y suele coincidir con la "business key" de la consulta de origen.
Si se quieren actualizar todos los registros (hayan cambiado o no), se puede utilizar la sentencia UPDATE ALL, aunque raramente aportará ningún beneficio (¡Al contrario... penalizará el rendimiento por actualizar registros que no lo necesitan!).
UPDATE ALL dwh.DimProducts
select
ProductID #ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
Código SQL generado
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductID=DimProducts.ProductID
WHEN MATCHED THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost;
Se puede utilizar la opción PARTITION para actualizar solo una parte de la tabla:
UPDATE table1 PARTITION (table1.UPDATETYPE='blah')
SELECT
#code,
description CodeDescription
from table2
Código SQL generado
;WITH
query AS (
SELECT
code,
description AS CodeDescription
FROM table2
)
MERGE table1
USING query ON query.code=table1.code
WHEN MATCHED AND (table1.UPDATETYPE='blah'
AND (table1.CodeDescription<>query.CodeDescription OR (table1.CodeDescription IS NULL AND query.CodeDescription IS NOT NULL) OR (table1.CodeDescription IS NOT NULL AND query.CodeDescription IS NULL))) THEN
UPDATE SET
CodeDescription=query.CodeDescription;
Aunque, evidentemente, en un escenario ETL/DWH es más habitual actualizar múltiples campos. La siguiente sentencia actualiza la información de las Bikes de AdventureWorks:
UPDATE dwh.DimProducts PARTITION (ProductCategory='Bikes')
select
ProductID #ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
WHERE ProductCategory='Bikes'
Código SQL generado
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
WHERE ProductCategory.name='Bikes'
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductID=DimProducts.ProductID
WHEN MATCHED AND (DimProducts.ProductCategory='Bikes'
AND (DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL)
OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL)
OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL)
OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL)
OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL)
OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL)
OR DimProducts.ProductCost<>query.ProductCost OR (DimProducts.ProductCost IS NULL AND query.ProductCost IS NOT NULL) OR (DimProducts.ProductCost IS NOT NULL AND query.ProductCost IS NULL))) THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost;
También se podrían actualizar los datos de una tabla a partir de sus propios datos. La siguiente sentencia es equivalente al primer UPDATE de este apartado:
UPDATE dwh.FactSalesOrderHeader
SELECT
#SalesOrderId,
Amount-TaxAmt NetAmount
FROM dwh.FactSalesOrderHeader
WHERE year(OrderDate)=2017
Código SQL generado
;WITH
query AS (
SELECT
SalesOrderId,
Amount-TaxAmt AS NetAmount
FROM dwh.FactSalesOrderHeader
WHERE year(OrderDate)=2017
)
MERGE dwh.FactSalesOrderHeader AS FactSalesOrderHeader
USING query ON query.SalesOrderId=FactSalesOrderHeader.SalesOrderId
WHEN MATCHED AND ((FactSalesOrderHeader.NetAmount<>query.NetAmount OR (FactSalesOrderHeader.NetAmount IS NULL AND query.NetAmount IS NOT NULL) OR (FactSalesOrderHeader.NetAmount IS NOT NULL AND query.NetAmount IS NULL))) THEN
UPDATE SET
NetAmount=query.NetAmount;
MERGE
El lenguaje Crono SQL admite la sintaxis SQL estándar de la sentencia MERGE:
WITH
query AS (
SELECT
ProductID AS ProductId,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductId=DimProducts.ProductId
WHEN MATCHED THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost
WHEN NOT MATCHED THEN
INSERT (
ProductId,
Product,
ProductCategory,
ProductSubCategory,
ProductNumber,
ProductModel,
Color,ProductCost)
VALUES (
query.ProductId,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost)
Código SQL generado
WITH
query AS (
SELECT
ProductID AS ProductId,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductId=DimProducts.ProductId
WHEN MATCHED THEN UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost
WHEN NOT MATCHED THEN INSERT (ProductId,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost) VALUES (
query.ProductId,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost);
La sentencia MERGE, en función de una serie de condiciones definidas, ejecutará un UPDATE de los registros que hayan cambiado y un INSERT de los registros de la consulta origen que no existan en la tabla destino. Por este motivo se le conoce como UPSERT (UPDATE+INSERT).
Debería ser la "sentencia estrella" de cualquier proyecto ETL/DWH, porque en estos proyectos lo que en definitiva se necesita es replicar los datos del origen en la tabla destino (¡exactamente lo que promete hacer el MERGE!). Lamentablemente, la sintaxis ISO es compleja, está llena de repeticiones, y es muy difícil escribir correctamente. Por este motivo, muchos desarrolladores la desconocen, o no la usan, y buscan métodos alternativos para realizar el trabajo (con mucho peor rendimiento, habitualmente).
Crono SQL propone una nueva sintaxis del MERGE para solucionar estos problemas. Es una sintaxis idéntica a la del INSERT y el UPDATE anteriormente descritos.
MERGE dwh.DimProducts
select
ProductID #ProductId,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
Código SQL generado
;WITH
query AS (
SELECT
ProductID AS ProductId,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductId=DimProducts.ProductId
WHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL)
OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL)
OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL)
OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL)
OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL)
OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL)
OR DimProducts.ProductCost<>query.ProductCost OR (DimProducts.ProductCost IS NULL AND query.ProductCost IS NOT NULL) OR (DimProducts.ProductCost IS NOT NULL AND query.ProductCost IS NULL))) THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost
WHEN NOT MATCHED THEN
INSERT (ProductId,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost) VALUES (
query.ProductId,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost);
Esta instrucción ejecutará un UPSERT (UPDATE+INSERT). Sin embargo, en ocasiones se necesita ejecutar también un DELETE para eliminar los registros que hayan desaparecido de la consulta original. Esta funcionalidad la podemos conseguir mediante la opción WHEN MISSING THEN DELETE
MERGE dwh.DimProducts WHEN MISSING THEN DELETE
select
ProductID #ProductId,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
Código SQL generado
;WITH
query AS (
SELECT
ProductID AS ProductId,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductId=DimProducts.ProductId
WHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL)
OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL)
OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL)
OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL)
OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL)
OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL)
OR DimProducts.ProductCost<>query.ProductCost OR (DimProducts.ProductCost IS NULL AND query.ProductCost IS NOT NULL) OR (DimProducts.ProductCost IS NOT NULL AND query.ProductCost IS NULL))) THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost
WHEN NOT MATCHED THEN
INSERT (ProductId,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost) VALUES (
query.ProductId,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
En ocasiones, para tratar las bajas, lo que se prefiere es realizar una baja lógica del registro. Es decir, marcar el registro como eliminado pero sin eliminarlo. Para cubrir este requerimiento tenemos la opción WHEN MISSING THEN SET
MERGE dwh.DimProducts WHEN MISSING THEN SET FechaBaja=getdate(), Deleted=1
select
ProductID #ProductId,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost,
NULL FechaBaja,
0 Deleted
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
Código SQL generado
;WITH
query AS (
SELECT
ProductID AS ProductId,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost,
null AS FechaBaja,
0 AS Deleted
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductId=DimProducts.ProductId
WHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL)
OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL)
OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL)
OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL)
OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL)
OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL)
OR DimProducts.ProductCost<>query.ProductCost OR (DimProducts.ProductCost IS NULL AND query.ProductCost IS NOT NULL) OR (DimProducts.ProductCost IS NOT NULL AND query.ProductCost IS NULL)
OR DimProducts.FechaBaja<>query.FechaBaja OR (DimProducts.FechaBaja IS NULL AND query.FechaBaja IS NOT NULL) OR (DimProducts.FechaBaja IS NOT NULL AND query.FechaBaja IS NULL)
OR DimProducts.Deleted<>query.Deleted OR (DimProducts.Deleted IS NULL AND query.Deleted IS NOT NULL) OR (DimProducts.Deleted IS NOT NULL AND query.Deleted IS NULL))) THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost,
FechaBaja=query.FechaBaja,
Deleted=query.Deleted
WHEN NOT MATCHED THEN
INSERT (ProductId,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost,FechaBaja,Deleted) VALUES (
query.ProductId,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost,
query.FechaBaja,
query.Deleted)
WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET
FechaBaja=getdate(),
Deleted=1;
De hecho, esta instrucción es exactamente lo que se conoce como una carga "Slow Changing Dimension Type 1", tan habitual en proyectos ETL/DWH.
Se puede utilizar la opción PARTITION para realizar un MERGE sobre una parte de la tabla. La siguiente consulta actualiza únicamente la información de las Bikes:
MERGE dwh.DimProducts PARTITION (ProductCategory='Bikes')
select
ProductID #ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
WHERE ProductCategory='Bikes'
Código SQL generado
;WITH
query AS (
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
WHERE ProductCategory.name='Bikes'
)
MERGE dwh.DimProducts AS DimProducts
USING query ON query.ProductID=DimProducts.ProductID
WHEN MATCHED AND (DimProducts.ProductCategory='Bikes'
AND (DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL)
OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL)
OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL)
OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL)
OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL)
OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL)
OR DimProducts.ProductCost<>query.ProductCost OR (DimProducts.ProductCost IS NULL AND query.ProductCost IS NOT NULL) OR (DimProducts.ProductCost IS NOT NULL AND query.ProductCost IS NULL))) THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost
WHEN NOT MATCHED THEN
INSERT (ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost) VALUES (
query.ProductID,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost);
En ocasiones, el optimizador de consultas del motor de base de datos realiza un mejor plan de ejecución si la consulta está materializada. En estos casos, podemos emplear la opción MATERIALIZE.
MERGE dwh.DimProducts PARTITION (ProductCategory='Bikes')
MATERIALIZE select
ProductID #ProductID,
Product.Name Product,
ProductCategory.name ProductCategory,
ProductSubCategory.name ProductSubCategory,
ProductNumber,
ProductModel.name ProductModel,
Product.Color,
Product.StandardCost ProductCost
FROM staging.Product
LEFT JOIN staging.ProductSubCategory using ProductSubcategoryID
LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)
LEFT JOIN staging.ProductModel using ProductModelID
WHERE ProductCategory='Bikes'
Código SQL generado
-- Materialized query: query
SELECT
ProductID,
Product.Name AS Product,
ProductCategory.name AS ProductCategory,
ProductSubCategory.name AS ProductSubCategory,
ProductNumber,
ProductModel.name AS ProductModel,
Product.Color AS Color,
Product.StandardCost AS ProductCost
INTO #query__5A8FF
FROM staging.Product
LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID)
LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId)
LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID)
WHERE ProductCategory.name='Bikes'
CREATE UNIQUE INDEX unique_index_tmp_query__5A8FF ON #query__5A8FF (ProductID)
MERGE dwh.DimProducts AS DimProducts
USING #query__5A8FF AS query ON query.ProductID=DimProducts.ProductID
WHEN MATCHED AND (DimProducts.ProductCategory='Bikes'
AND (DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL)
OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL)
OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL)
OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL)
OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL)
OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL)
OR DimProducts.ProductCost<>query.ProductCost OR (DimProducts.ProductCost IS NULL AND query.ProductCost IS NOT NULL) OR (DimProducts.ProductCost IS NOT NULL AND query.ProductCost IS NULL))) THEN
UPDATE SET
Product=query.Product,
ProductCategory=query.ProductCategory,
ProductSubCategory=query.ProductSubCategory,
ProductNumber=query.ProductNumber,
ProductModel=query.ProductModel,
Color=query.Color,
ProductCost=query.ProductCost
WHEN NOT MATCHED THEN
INSERT (ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost) VALUES (
query.ProductID,
query.Product,
query.ProductCategory,
query.ProductSubCategory,
query.ProductNumber,
query.ProductModel,
query.Color,
query.ProductCost);
Por supuesto, también podemos utilizar la opción MATERIALIZE en las tablas o subconsultas interiores, y podemos utilizar todas las características de la sentencia SELECT (COMBINE, CHECK SNOWFLAKE, FILTER, etc.).
MERGE dwh.VentasVsCompras
COMBINE BY Product,ProductNumber
MATERIALIZE sales (
select
Product.Name #Product,
Product.ProductNumber #ProductNumber,
sum(PurchaseOrderDetail.LineTotal) Purchases
from staging.PurchaseOrderDetail
inner join staging.Product using ProductId
CHECK SNOWFLAKE),
MATERIALIZE purchases (
select
Product.Name Product,
Product.ProductNumber ProductNumber,
sum(SalesOrderDetail.LineTotal) Sales
from staging.SalesOrderDetail
inner join staging.Product using ProductId
CHECK SNOWFLAKE)
Código SQL generado
-- Materialized query: sales
SELECT
Product.Name AS Product,
Product.ProductNumber AS ProductNumber,
sum(PurchaseOrderDetail.LineTotal) AS Purchases
INTO #sales__FBEE9
FROM staging.PurchaseOrderDetail
INNER JOIN staging.Product ON (PurchaseOrderDetail.ProductId=Product.ProductId)
GROUP BY
Product.Name,
Product.ProductNumber
CREATE UNIQUE INDEX unique_index_tmp_sales__FBEE9 ON #sales__FBEE9 (Product,ProductNumber)
-- Materialized query: purchases
SELECT
Product.Name AS Product,
Product.ProductNumber AS ProductNumber,
sum(SalesOrderDetail.LineTotal) AS Sales
INTO #purchases__9D591
FROM staging.SalesOrderDetail
INNER JOIN staging.Product ON (SalesOrderDetail.ProductId=Product.ProductId)
GROUP BY
Product.Name,
Product.ProductNumber
IF EXISTS (
SELECT count(*)
FROM staging.PurchaseOrderDetail
LEFT JOIN staging.Product ON (PurchaseOrderDetail.ProductId=Product.ProductId)
HAVING count(CASE WHEN Product.ProductId IS NOT NULL THEN 1 END) <> (SELECT count(*) FROM staging.PurchaseOrderDetail)
) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de PurchaseOrderDetail.',1
IF EXISTS (
SELECT count(*)
FROM staging.SalesOrderDetail
LEFT JOIN staging.Product ON (SalesOrderDetail.ProductId=Product.ProductId)
HAVING count(CASE WHEN Product.ProductId IS NOT NULL THEN 1 END) <> (SELECT count(*) FROM staging.SalesOrderDetail)
) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de SalesOrderDetail.',1
;WITH
query AS (
SELECT
coalesce(sales.Product,purchases.Product) Product,
coalesce(sales.ProductNumber,purchases.ProductNumber) ProductNumber,
sales.Purchases Purchases,
purchases.Sales Sales
FROM #sales__FBEE9 sales
FULL JOIN #purchases__9D591 purchases ON (sales.Product=purchases.Product AND sales.ProductNumber=purchases.ProductNumber)
)
MERGE dwh.VentasVsCompras AS VentasVsCompras
USING query ON query.Product=VentasVsCompras.Product AND query.ProductNumber=VentasVsCompras.ProductNumber
WHEN MATCHED AND ((VentasVsCompras.Purchases<>query.Purchases OR (VentasVsCompras.Purchases IS NULL AND query.Purchases IS NOT NULL) OR (VentasVsCompras.Purchases IS NOT NULL AND query.Purchases IS NULL)
OR VentasVsCompras.Sales<>query.Sales OR (VentasVsCompras.Sales IS NULL AND query.Sales IS NOT NULL) OR (VentasVsCompras.Sales IS NOT NULL AND query.Sales IS NULL))) THEN
UPDATE SET
Purchases=query.Purchases,
Sales=query.Sales
WHEN NOT MATCHED THEN
INSERT (Product,ProductNumber,Purchases,Sales) VALUES (
query.Product,
query.ProductNumber,
query.Purchases,
query.Sales);
DELETE
Crono SQL soporta la sintaxis estándar de la sentencia DELETE de SQL:
DELETE FROM dwh.FactSalesOrderDetails WHERE SalesOrderId=43659
Código SQL generado
DELETE FROM dwh.FactSalesOrderDetails WHERE SalesOrderId=43659;
Si intervienen otras tablas, se puede definir el predicado utilizando las expresiones IN() o EXISTS (). La sigiuente sentencia elimina las líneas de venta de un cliente en concreto:
DELETE
FROM dwh.FactSalesOrderDetails
WHERE SalesOrderSid in (
SELECT SalesOrderSid
FROM dwh.FactSalesOrderHeader
INNER JOIN dwh.DimCustomers using FactSalesOrderHeader(CustomerSid)
WHERE DimCustomers.Customer='Oscar Simmons')
Código SQL generado
DELETE FROM dwh.FactSalesOrderDetails WHERE SalesOrderSid IN (
SELECT SalesOrderSid
FROM dwh.FactSalesOrderHeader
INNER JOIN dwh.DimCustomers ON (FactSalesOrderHeader.CustomerSid=DimCustomers.CustomerSid)
WHERE DimCustomers.Customer='Oscar Simmons'
);
De modo similar, la siguiente sentencia elimina las líneas de venta de otro cliente:
DELETE
FROM dwh.FactSalesOrderDetails
WHERE EXISTS (
SELECT *
FROM dwh.FactSalesOrderHeader cab
INNER JOIN dwh.DimCustomers using cab(CustomerSid)
WHERE
DimCustomers.Customer='Oscar Simmons'
AND FactSalesOrderDetails.SalesOrderSid=cab.SalesOrderSid)
Código SQL generado
DELETE FROM dwh.FactSalesOrderDetails WHERE EXISTS (SELECT *
FROM dwh.FactSalesOrderHeader cab
INNER JOIN dwh.DimCustomers ON (cab.CustomerSid=DimCustomers.CustomerSid)
WHERE
DimCustomers.Customer='Oscar Simmons'
AND FactSalesOrderDetails.SalesOrderSid=cab.SalesOrderSid
);
Crono SQL propone otra sintaxis de la sentencia DELETE (idéntica a la sintaxis de INSERT, UPDATE, y MERGE). La idea subyacente es que se ha de construir el SELECT de los datos que se quieren borrar. Solo el SELECT. Y Crono SQL eliminará precisamente esos registros:
DELETE dwh.FactSalesOrderDetails
select det.SalesOrderDetailSid #SalesOrderDetailSid, det.SalesOrderId, det.SalesOrderDetailsId
from dwh.FactSalesOrderDetails det
inner join dwh.FactSalesOrderHeader using SalesOrderSid
inner join dwh.DimCustomers using FactSalesOrderHeader(CustomerSid)
WHERE DimCustomers.Customer='Jada Morris'
Código SQL generado
;WITH
query AS (
SELECT
det.SalesOrderDetailSid AS SalesOrderDetailSid,
det.SalesOrderId AS SalesOrderId,
det.SalesOrderDetailsId AS SalesOrderDetailsId
FROM dwh.FactSalesOrderDetails det
INNER JOIN dwh.FactSalesOrderHeader ON (det.SalesOrderSid=FactSalesOrderHeader.SalesOrderSid)
INNER JOIN dwh.DimCustomers ON (FactSalesOrderHeader.CustomerSid=DimCustomers.CustomerSid)
WHERE DimCustomers.Customer='Jada Morris'
)
DELETE FROM dwh.FactSalesOrderDetails
WHERE
EXISTS (SELECT 1 FROM query WHERE FactSalesOrderDetails.SalesOrderDetailSid=query.SalesOrderDetailSid);
De hecho, no es necesario que la consulta tenga ninguna referencia a la tabla de la que se quieren eliminar registros. El ejemplo anterior se puede simplificar de la siguiente manera:
DELETE dwh.FactSalesOrderDetails
SELECT #SalesOrderSid
FROM dwh.FactSalesOrderHeader
INNER JOIN dwh.DimCustomers using CustomerSid
WHERE
DimCustomers.Customer='Katherine Turner'
Código SQL generado
;WITH
query AS (
SELECT SalesOrderSid
FROM dwh.FactSalesOrderHeader
INNER JOIN dwh.DimCustomers ON (FactSalesOrderHeader.CustomerSid=DimCustomers.CustomerSid)
WHERE DimCustomers.Customer='Katherine Turner'
)
DELETE FROM dwh.FactSalesOrderDetails
WHERE
EXISTS (SELECT 1 FROM query WHERE FactSalesOrderDetails.SalesOrderSid=query.SalesOrderSid);
En todos los casos, lo que marca los registros que se deben eliminar es la "clave de eliminación" (marcada con el carácter #).
La consulta del DELETE también puede utilizar todas las características del SELECT de Crono SQL. La siguiente sentencia elimina los clientes sin ninguna venta:
DELETE dwh.DimCustomers
SELECT #customerSid
FROM dwh.DimCustomers
ANTI JOIN dwh.FactSalesOrderHeader USING CustomerSid
Código SQL generado
;WITH
query AS (
SELECT customerSid
FROM dwh.DimCustomers
WHERE NOT EXISTS (SELECT 1 FROM dwh.FactSalesOrderHeader WHERE DimCustomers.CustomerSid=FactSalesOrderHeader.CustomerSid)
)
DELETE FROM dwh.DimCustomers
WHERE
EXISTS (SELECT 1 FROM query WHERE DimCustomers.customerSid=query.customerSid);
TRUNCATE
Crono SQL soporta la sintaxis estándar del a sentencia TRUNCATE de SQL:
TRUNCATE TABLE dwh.DimCustomers
Código SQL generado
TRUNCATE TABLE dwh.DimCustomers
Aunque no es propiamente una instrucción DML, se menciona aquí por su similitud con el DELETE.
Resumen
En este artículo hemos descrito la sintaxis de las siguientes sentencias de Crono SQL:
INSERT
UPDATE
MERGE
DELETE
TRUNCATE
En todos los casos, Crono SQL simplifica enormemente la codificación de estas instrucciones freante a su equivalente SQL, ahorrando frecuentemente centenares de líneas de código. Además, es un lenguaje mucho más imperativo, más fácil de escribir, de leer, y de mantener. Es, en definitiva, un lenguaje más natural para hablar con nuestras bases de datos, y especialmente en los proyectos ETL/DWH.
Siguientes pasos
Los siguientes artículos muestran otras características del lenguaje: