Pages

Thursday, March 22, 2012

How to apply nested REPLACE for data cleansing?


The following Microsoft SQL Server T-SQL sample scripts demonstrates the use of nested REPLACE and CONVERT to remove special characters and convert the price result column into money first, then into currency format.

-- Detect special non-printing characters (white space) - QUICK SYNTAX
DECLARE @WhiteSpace varchar(64) = 'New York'+' '+char(9)+'City'+char(31)
SELECT ExposeWP=CONVERT(varbinary(max), @WhiteSpace), NonPrint=@WhiteSpace
/* ExposeWP NonPrint
0x4E657720596F726B2009436974791F New York City */
-- Hex 20 is space; 09 and 1F are printing as space but they are special characters
------------

-- T-SQL remove special characters from string including space by nesting converts
DECLARE @text nvarchar(128) = '#124 $99^@'
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
       REPLACE(REPLACE(REPLACE(REPLACE(@text,
        '!',''),'@',''),'#',''),'$',''),'%',''),
        '^',''),'&',''),'*',''),' ','')
-- 12499
-- SQL Server nested CONVERTs - SQL format money as currency
-- SQL group by - sum aggregate function
USE Northwind;

SELECT   ProductName,
         AveragePrice = '$' + CONVERT(VARCHAR,CONVERT(MONEY,
         AVG(od.UnitPrice * (1.00 - Discount)),  1)),
         UnitsSold = SUM(od.Quantity)
FROM     dbo.[Order Details]  AS od,
         dbo.Products         AS p,
         dbo.Orders           AS o
WHERE    od.ProductID = p.ProductID
         AND o.OrderID = od.OrderID
         AND od.UnitPrice > 8.0
GROUP BY ProductName,
         od.ProductID
ORDER BY ProductName
GO
/* Partial results

ProductName AveragePrice      UnitsSold
Alice Mutton      $34.23      978
Aniseed Syrup     $9.89       228
Boston Crab Meat  $16.35      1103
Camembert Pierrot $30.12      1577
Carnarvon Tigers  $54.91      539
Chai              $15.80      828
*/

No comments:

Post a Comment