A query and / or function that multiplies data in one field for all types and calculations in another field

I am trying to move from this:

+------+------+------+------+ | fld1 | fld2 | fld3 | etc… | +------+------+------+------+ | a | 5 | 1 | | | b | 5 | 0 | | | c | 6 | 0 | | | b | 2 | 5 | | | b | 1 | 6 | | | c | 0 | 6 | | | a | 8 | 9 | | +------+------+------+------+ 

To:

 +--------+--------+-----------+-----+-----+------+ | Factor | Agg | CalcDate | Sum | Avg | etc… | +--------+--------+-----------+-----+-----+------+ | fld2 | fld1/a | 8/14/2015 | 13 | 6.5 | | | fld2 | fld1/b | 8/14/2015 | 8 | 2.7 | | | fld2 | fld1/c | 8/14/2015 | 6 | 3 | | | fld3 | fld1/a | 8/14/2015 | 10 | 5 | | | fld3 | fld1/b | 8/14/2015 | 11 | 3.7 | | | fld3 | fld1/c | 8/14/2015 | 6 | 3 | | +--------+--------+-----------+-----+-----+------+ 

Notes:

  • Obviously, this data is slightly simplified.
  • I have tons of fields that I need to do for
  • I have included light aggregation calculations here to help someone help me. Full list: NaPct, Mean, Sd, Low, Q1, Median, Q3, High, IQR, Kurt, Skew, Obs. Where NaPct = Percentage equal to NULL, Sd = Standard deviation, Q1 = quartile 1, Q3 = quartile 3, IQR = interval between quartiles, Kurt = Kurtosis, Skew = Skewness, Obs = number of observations that are not NULL.
  • In fact, in the second table above, the factor field will be FactorID, Agg will be AggID, and CalcDate will be CalcDateID, but I will give the actual values ​​there for illustration. It doesn't matter for the question / answer though.
  • Speed ​​is very important, since I have 1305 fields and several units for performing calculations before the start of the working day.
  • Answers using only MS Access, SQL and VBA. Sorry for the business requirements. However, for simplicity, only the SQL response that works in MS Access will be the best answer.
  • Below is the code that uses the custom domain function (DCalcForQueries) and the supporting functions that I created that return one calculated total value for each field and the selected aggregation. Aka, not what I want. Perhaps this code can be used for what I want, maybe not. However, I have the calculations that I want in it, which I hope will help.
  • Message boxes are just how I debug while I am alpha testing: not necessary.
  • To use the code, put all the code in the VBA module, change the table "tbl_DatedModel_2015_0702_0" to the table available in MS Access, change the field "Rk-IU Mkt Cap" to the field in your table and run TestIT (), and you should get the calculated values in the Immediate window.
  • Do not worry about the calculations. I will deal with this. I just need to know what is best to get from the first table above to the second table above to allow the calculations that I want. Thanks!

     Sub TestIt() Dim x Set x = GetOrOpenAndGetExcel Dim rst As DAO.Recordset Dim sSql As String Dim q As String q = VBA.Chr(34) sSql = "SELECT " & _ "DCalcForQueries(" & q & "NaPct" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS NaPct ," & _ "DCalcForQueries(" & q & "Mean" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Mean ," & _ "DCalcForQueries(" & q & "Sd" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Sd ," & _ "DCalcForQueries(" & q & "Low" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Low ," & _ "DCalcForQueries(" & q & "Q1" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Q1 ," & _ "DCalcForQueries(" & q & "Median" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Median ," & _ "DCalcForQueries(" & q & "Q3" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Q3 ," & _ "DCalcForQueries(" & q & "High" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS High ," & _ "DCalcForQueries(" & q & "IQR" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS IQR ," & _ "DCalcForQueries(" & q & "Kurt" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Kurt ," & _ "DCalcForQueries(" & q & "Skew" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Skew ," & _ "DCalcForQueries(" & q & "Obs" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Obs " & _ "FROM tbl_DatedModel_2015_0702_0;" Debug.Print sSql Set rst = CurrentDb.OpenRecordset(sSql, dbOpenSnapshot) rst.MoveFirst Debug.Print rst.RecordCount Debug.Print rst.Fields("NaPct") Debug.Print rst.Fields("Mean") Debug.Print rst.Fields("Sd") Debug.Print rst.Fields("Low") Debug.Print rst.Fields("Q1") Debug.Print rst.Fields("Median") Debug.Print rst.Fields("Q3") Debug.Print rst.Fields("High") Debug.Print rst.Fields("IQR") Debug.Print rst.Fields("Kurt") Debug.Print rst.Fields("Skew") Debug.Print rst.Fields("Obs") End Sub Public Function DCalcForQueries(sCalc As String, Optional sTbl As String = "", Optional sMainFld As String = "", Optional sWhereClause As String = "", Optional k As Double) As Variant Dim dblData() As Double Dim oxl As Object On Error Resume Next Set oxl = GetObject(, "Excel.Application") If Err.Number <> 0 Then MsgBox "Excel object must be openned by the calling sub of DCalcForQueries so it isn't opened over and over, which is very slow" GoTo cleanup End If Dim x As Integer Dim aV() As Variant Dim tmp Dim lObsCnt As Long Dim lNaCnt As Long Dim i As Long Dim vTmp As Variant Dim lTtl As Long Dim bDoCalc As Boolean aV = a2dvGetSubsetFromQuery(sTbl, sMainFld, sWhereClause, "Numeric") If aV(0, 0) = "Not Numeric" Then MsgBox "Data returned by query was not numeric. Press OK to Stop and debug." Stop End If If sCalc = "Percentile" Or sCalc = "Q1" Or sCalc = "Q2" Or sCalc = "Q3" Or sCalc = "Q4" Then DCalcForQueries = oxl.WorksheetFunction.Percentile_Exc(aV, k) ElseIf sCalc = "Median" Then DCalcForQueries = oxl.WorksheetFunction.Median(aV) ElseIf sCalc = "Kurt" Or sCalc = "Kurt" Then DCalcForQueries = oxl.WorksheetFunction.Kurt(aV) ElseIf sCalc = "Minimum" Or sCalc = "Low" Then DCalcForQueries = oxl.WorksheetFunction.Min(aV) ElseIf sCalc = "Maximum" Or sCalc = "High" Then DCalcForQueries = oxl.WorksheetFunction.Max(aV) ElseIf sCalc = "IQR" Then DCalcForQueries = oxl.WorksheetFunction.Quartile_Exc(aV, 3) - oxl.WorksheetFunction.Quartile_Exc(aV, 1) ElseIf sCalc = "Obs" Then lObsCnt = 0 For Each tmp In aV If Not IsNull(tmp) Then lObsCnt = lObsCnt + 1 End If Next DCalcForQueries = lObsCnt ElseIf sCalc = "%NA" Or sCalc = "PctNa" Or sCalc = "NaPct" Or sCalc = "%Null" Or sCalc = "PctNull" Then lNaCnt = 0 lTtl = UBound(aV, 2) + 1 For Each tmp In aV If IsNull(tmp) Then lNaCnt = lNaCnt + 1 End If Next DCalcForQueries = (lNaCnt / lTtl) * 100 ElseIf sCalc = "Skewness" Or sCalc = "Skew" Then DCalcForQueries = oxl.WorksheetFunction.Skew(aV) ElseIf sCalc = "StDev" Or sCalc = "Sd" Then DCalcForQueries = oxl.WorksheetFunction.StDev_S(aV) ElseIf sCalc = "Mean" Then DCalcForQueries = oxl.WorksheetFunction.Average(aV) Else MsgBox "sCalc parameter not recognized: " & sCalc End If cleanup: End Function Function a2dvGetSubsetFromQuery(sTbl As String, sMainFld As String, sWhereClause As String, sTest As String) As Variant() 'sTest can be "Numeric" or "None" ...will implement more as needed Dim iFieldType As Integer Dim rst As DAO.Recordset Dim db As Database Set db = CurrentDb Dim sMainFldFull As String Dim sSubSetFldFull As String Dim sSql As String sMainFldFull = "[" & sMainFld & "]" sSubSetFldFull = "" sSql = "" sSql = "SELECT " & sMainFldFull & " FROM " & sTbl If Len(sWhereClause) > 0 Then sSql = sSql & " WHERE " & sWhereClause End If Set rst = db.OpenRecordset(sSql, dbOpenSnapshot) 'make sure the data is the right type iFieldType = rst(sMainFld).Type If sTest = "Numeric" Then If iFieldType = dbByte Or _ iFieldType = dbInteger Or _ iFieldType = dbLong Or _ iFieldType = dbCurrency Or _ iFieldType = dbSingle Or _ iFieldType = dbDouble _ Then rst.MoveLast rst.MoveFirst a2dvGetSubsetFromQuery = rst.GetRows(rst.RecordCount) Else Dim aV(0 To 1, 0 To 1) As Variant aV(0, 0) = "Not Numeric" a2dvGetSubsetFromQuery = aV End If ElseIf sTest = "None" Then 'don't do any testing rst.MoveLast rst.MoveFirst a2dvGetSubsetFromQuery = rst.GetRows(rst.RecordCount) Else MsgBox "Test type (sTest) can only be 'None' or 'Numeric'. It was: " & sTest Stop End If cleanup: rst.Close Set rst = Nothing End Function Public Function GetOrOpenAndGetExcel() As Object 'if excel is open it will return the excel object 'if excel is not open it will open excel and return the excel object On Error GoTo 0 On Error Resume Next Set GetOrOpenAndGetExcel = GetObject(, "Excel.Application") If Err.Number <> 0 Then Set GetOrOpenAndGetExcel = CreateObject("Excel.Application") End If On Error GoTo 0 End Function 

Edit1: The code above just illustrates my attempt and calculations. I am pretty sure that this is not directly related to a good answer, but I'm not 100% sure. If I use what I have above, it produces one record at a time, and I would have to add (INSERT INTO) each record one at a time, which would be rather slow. My plan was to collect a 2d array of results and use this 2d array to add entries to the batch, but it was said that you cannot do this without looping the array, adding each entry once at a time to win target. I am sure that a solution that involves looping through the fld1 types or a single query with subqueries that can do this in one step is the direction that should be taken. What I have done for optimization so far: I pulled out the creation of an Excel object, therefore it is created only once in the TestIt () test.

Edit2: I have 1305 fields to do the calculations. They are not all in one table; however, for the purposes of this question, I just need a working answer that does more than one field at a time. That is, your answer may assume that all fields are in the same table, and for simplicity, your answer can include only 2 fields, and I can expand it from there. In the code above, I calculated 12 indicators on a single "Rk-IU Mkt Cap" field combining one type, "Consumer Discretionary" ([GICS Sector] = "Consumer Discretionary"). What I have is not what I get after.

+4
source share
4 answers

This is not so difficult to do in MS Access. If I have the correct logic:

 select "fld2" as factor, "fld1/"&fld1, #8/14/2015# as calcdate, sum(fld2), avg(fld2) from table group by fld1 union all select "fld3" as factor, "fld1/"&fld1, #8/14/2015# as calcdate, sum(fld3), avg(fld3) from table group by fld1; 
+1
source

Would something like this work just using pure tSql?

1: Create a table and insert some sample data

 CREATE TABLE [dbo].[FLD]( [fld1] [nvarchar](2) NOT NULL, [fld2] [int] NULL, [fld3] [int] NULL ) ON [PRIMARY] GO INSERT FLD VALUES ('a', 5, 9) INSERT FLD VALUES ('b', 1, 8) INSERT FLD VALUES ('a', 3, 7) 

2: Use nested UNPIVOT to create factors

 SELECT t.factor,t.val + '/' + tv AS Agg, SUM(value) AS [Sum], AVG(value) AS [AVG] FROM ( SELECT * from ( select * from FLD f UNPIVOT ( v for val in (fld1) ) piv ) f UNPIVOT ( value for factor in (fld2, fld3) ) s ) t group by tv, t.factor, t.val 
+3
source

You need a pivot table.

You have two alternatives:

Migrating to SQL Server

This is the preferred method, then you can use T-SQL , for example, suggested by @ Johnv2020.

To learn more about PIVOT and UNPIVOT on sql server, click here

Access Table / Excel PivotTable

I am personally more familiar with the Excel pivot table table, but it seems that access has the same concept ( see here ).

The desired result of your code basically runs several pivot tables on the same data with different aggregates (average, sum ...), and this can be done by automating pivot tables using VBA macros

+2
source

This will be a big hot mess for the Access database engine. This will only worsen your data set. I would recommend getting a free version of SQL Server Express and using Access only as an interface. Then, when you grow up, you can move all your databases to SQL Server ... this is a much more reliable database engine. You will be glad to know about it now.

SQL Server Express

If you go this route, you can do it all completely with T-SQL and with a fully established approach. Acceleration will be sharp. I cannot give you all the details here, but overall this is what you will need to do. Online docs and Google help you complete each step:

  • Install SQL Sever Express
  • Create database
  • Transfer data from access tables to your database.
  • Create a stored procedure to update your pivot table. (see below).
  • If you need an access interface ... I would recommend creating a new ADP (access project file) and connecting it to your SQL Server database. You can create forms and reports based on SQL Server tables and run procedures on it. But you can also just use the standard access project and use pass-while queries to retrieve data or execute procedures.

The procedure for inserting data into a pivot table will be easier if you change your first table structure to look something like this:

 +------+------+------+ | fld1 |fname | fval | +------+------+------+ | a | fld2 | 5 | | a | fld2 | 8 | | b | fld2 | 5 | | b | fld2 | 2 | | b | fld2 | 1 | | c | fld2 | 6 | | c | fld2 | 0 | | a | fld3 | 1 | | a | fld3 | 9 | | b | fld3 | 0 | | b | fld3 | 5 | | b | fld3 | 6 | | c | fld3 | 0 | | c | fld3 | 6 | +------+------+------+ 

You may not want to change the structure of the base data table; if not, you can create the view as a large union query to output it in this format:

 select fld1, 'fld2' fname, fld2 fval from OrigDataTable union all select fld1, 'fld3' fname, fld3 fval from OrigDataTable union all ...etc 

Then your request to insert aggregated data will look something like this:

 insert into AggreateTable select Fname, fld1, CONVERT(date, getdate()) CalcDate, SUM(fval) sum, AVG(Fval) avg, ...etc. from DataTable Group by Fname, fld1 

Here are some links to help you create expressions for aggregate functions:

If you want to try this approach with Access, this may help:

You might end up with something like this approach to work completely inside access ... but I really think that it will be too much to access the handle ... if not today, then soon.

+1
source

All Articles