Say you have a table with columns, Date, GroupID, X and Y.
CREATE TABLE
(
[Date] DATETIME,
GroupID INT,
X FLOAT,
Y FLOAT
)
DECLARE @date DATETIME = getdate()
INSERT INTO
INSERT INTO
INSERT INTO
INSERT INTO
INSERT INTO
INSERT INTO
INSERT INTO
and you want to calculate the ratio of X and Y for each group. I am currently using CTEs which are a bit confusing:
;WITH DataAvgStd
AS (SELECT GroupID,
AVG(X) AS XAvg,
AVG(Y) AS YAvg,
STDEV(X) AS XStdev,
STDEV(Y) AS YSTDev,
COUNT(*) AS SampleSize
FROM
GROUP BY GroupID),
ExpectedVal
AS (SELECT s.GroupID,
SUM(( X - XAvg ) * ( Y - YAvg )) AS ExpectedValue
FROM
JOIN DataAvgStd das
ON s.GroupID = das.GroupID
GROUP BY s.GroupID)
SELECT das.GroupID,
ev.ExpectedValue / ( das.SampleSize - 1 ) / ( das.XStdev * das.YSTDev )
AS
Correlation
FROM DataAvgStd das
JOIN ExpectedVal ev
ON das.GroupID = ev.GroupID
DROP TABLE
There seems to be a way to use OVER and PARTITION to do it in one fell swoop without any subqueries. Ideally, TSQL will have a function so you can write:
SELECT GroupID, CORR(X, Y) OVER(PARTITION BY GroupID)
FROM
GROUP BY GroupID
source
share