Weighted Trendline

Excel creates scatterplots for sets of pair values. It also makes it possible to create a better trend line and formula for the trend line. It also creates bubble diagrams that take into account the weight provided by each value. However, weight does not affect the trend line or formula. The following is an example of a set of values ​​with their comparisons and weights.

Value Map Weight 0 1 10 1 2 10 2 5 10 3 5 20 4 6 20 5 1 1 

Using the Excel trend line, displaying a value of 5 affects the formula too much. Is there a way to create a formula that reflects the appropriate weights?

As an aid, I presented a weighted average for five consecutive values. But is it better?

+8
graph excel excel-formula curve-fitting least-squares
source share
2 answers

With the data in A2: C7 based on the standard weighted least squares formula, you can try:

 =LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0) 

entered with CTRL + SHIFT + ENTER in E2: F2 or in any 2x1 range. This also returns {1.1353,1.4412}.

For rsquared you can enter:

 =INDEX(LINEST((B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0,1),3,1) 

Formula explanation

First, consider the normal regression of y on X using LINEST. If const = TRUE, the regressor matrix is ​​an augmented matrix consisting of a column of units followed by regressor columns, i.e. X '= (1, X). If const = FALSE, the regressor matrix is ​​simply X, so starting a regression with a column of included ones gives the same estimates as running without a column of them and setting const = TRUE.

Now consider weighted least squares regression. The regression is now Wy on WX '= (W1, WX), where W is the diagonal matrix consisting of the square root of the weights. Since there is no column present, we must set const = FALSE and use two columns in the regressor matrix.

Calculation of Rsquared

Set the statistics to TRUE at the LINEST output of the first formula, which we get in the third and fifth lines:

 SSres = 59.76 SSreg(u) = 1461.24 SSTot(u) = 1521 Rsq(u) = 1 - 59.76/1521 = 0.9607 

Note that these values ​​are non-central versions (u), since const = FALSE (for more information, see MS's LINEST help.) For centered versions (c), we need to subtract the weighted average as shown below:

 SSTot(c) =SUMPRODUCT(C2:C7*(B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))^2) = 244.93 Rsq(c) = 1 - 59.76/244.93 = 0.756 
+12
source share

Update
Based on the additional information that you have tens of thousands of lines, here is a VBA UDF that will do the job (including r2)

According to the screen reduction below, it provides the same m , x and r2 values ​​that my extended dataset did in my original answer

enter image description here

 Public Function LinestWeighted(xRng As Range, yRng As Range, wRng As Range, bInt As Boolean, bStat As Boolean) As Variant Dim x As Variant Dim y As Variant Dim W As Variant Dim TotX As Variant Dim TotY As Variant Dim lngRow As Long Dim strDelim As String Dim strX As String Dim strY As String Dim NewSeries As Variant x = Application.Transpose(xRng) y = Application.Transpose(yRng) W = Application.Transpose(wRng) strDelim = "," If (UBound(x, 1) = UBound(y, 1)) And (UBound(x, 1) = UBound(W, 1)) Then For lngRow = 1 To UBound(W) strX = strX & Application.WorksheetFunction.Rept(x(lngRow) & strDelim, W(lngRow)) strY = strY & Application.WorksheetFunction.Rept(y(lngRow) & strDelim, W(lngRow)) Next lngRow TotX = Split(Left$(strX, Len(strX) - 1), strDelim) TotY = Split(Left$(strY, Len(strY) - 1), strDelim) ReDim NewSeries(1 To UBound(TotX) + 1, 1 To 2) For lngRow = 0 To UBound(TotX) NewSeries(lngRow + 1, 1) = CDbl(TotX(lngRow)) NewSeries(lngRow + 1, 2) = CDbl(TotY(lngRow)) Next With Application LinestWeighted = .WorksheetFunction.LinEst(.Index(.Transpose(NewSeries), 2), .Index(.Transpose(NewSeries), 1), bInt, bStat) End With Else LinestWeighted = "input ranges must be equal in length" Exit Function End If End Function 

Initial answer

Just expand your data series with your weights

So, instead of trying to count 6 pairs, use the ratio of your highest to lowest value to repeat the points

i.e. graph

 0 1 `10 times` 1 2 `10 times` ... 5 1 `once` 

enter image description here

+1
source share

All Articles