Create a custom Oracle analytic function with multiple arguments

Background

I know that in Oracle you can create custom aggregate functions that process a collection of values and return a single result. I've even read the friendly manual at docs.oracle.com/cd/B28359_01/appdev.111/b28425/aggr_functions.htm!

I also know that Oracle provides built-in analytic functions, such as DENSE_RANK and RATIO_TO_REPORT, which produce a value for each input relative to the collection/window of values that the input lies within.

Problem

I want to know whether there is a way to create my own analytic function, presumably similar to how I can create my own aggregate function, and in particular one that accepts additional arguments.

A note on terminology

When I say "analytic function", please read it as a function that, in addition to accepting window parameters via the PARTITION keyword, can also return different values within a given window. (If anyone has a better term for this, please let me know! Pure analytic function? DENSE_RANK-class analytic function? Non-aggregate analytic function?)

The Oracle documentation notes that an aggregate function can be used as an analytic (window) function. Unfortunately, that only means the PARTITION keyword used to specify a window for analytic functions can also be applied to aggregate functions. It does not promote the aggregate to my desired status of being able to return different values within a fixed window.

Aggregate used as analytic:

 SELECT SUM(income) OVER (PARTITION BY first_initial) AS total FROM data; 

will have as many records as data, but only as many distinct total values as there are first initials.

Analytic used as analytic:

 SELECT RATIO_TO_REPORT(income) OVER (PARTITION BY first_initial) AS ratio FROM data; 

will have as many records as data, AND, even within a single first_initial partition, the ratio values may differ.
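A quick way to check which kind of function you have is to count, per partition, how many distinct values it produces. The query below is a sketch that assumes the `data` table from the examples above has `first_initial` and `income` columns: `SUM ... OVER` should yield at most one distinct value per initial, while `RATIO_TO_REPORT` can yield several whenever incomes differ within an initial.

```sql
-- Assumes data(first_initial, income), as in the examples above.
SELECT first_initial,
       COUNT(DISTINCT total) AS distinct_totals,  -- at most 1 per partition
       COUNT(DISTINCT ratio) AS distinct_ratios   -- can exceed 1
FROM (
  SELECT first_initial,
         SUM(income)             OVER (PARTITION BY first_initial) AS total,
         RATIO_TO_REPORT(income) OVER (PARTITION BY first_initial) AS ratio
  FROM data
)
GROUP BY first_initial
ORDER BY first_initial;
```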

Context

I have been given access only to call a PL/SQL procedure, which takes a numeric collection as an IN OUT parameter and has several other IN configuration parameters. The procedure modifies the values of the collection (think of it as a "University-Approved and Mandated Grade Adjustment Procedure") depending on the configuration parameters.

Currently, the process for using this procedure is a hard-coded cursor loop that detects the change from one partition of the data to the next; within each partition it fetches the data into a collection, passes the collection to the procedure, and finally dumps the modified values into a separate table. I planned to improve on this by writing a PIPELINED PARALLEL_ENABLE table function that encapsulates some of the logic, but I would much prefer to enable queries such as:

    SELECT G.Course_ID
         , G.Student_ID
         , G.Raw_Grade
         , analytic_wrapper(G.raw_grade, P.course_config_data)
             OVER (PARTITION BY G.Course_ID) AS Adjusted_Grade
         , P.course_config_data
    FROM grades G
    LEFT JOIN policies P
      ON G.Course_ID = P.Course_ID;

This requires the ability to create a custom analytic function, and because the procedure needs different inputs for different partitions (for example, the Course_ID-specific P.course_config_data above), the function must also accept additional inputs besides the data argument.

Is this possible, and if so, where can I find the documentation? My Google-fu has failed me.

Extra wrinkle

The PL/SQL procedure I was given is (effectively) non-deterministic, and its output has statistical properties that need to be preserved. For example, if A={A[0], A[1], A[2]} are the raw grades for one particular class, B=f(A) is the result of calling the procedure on A at 1:00, and C=f(A) is the result of calling the procedure on A at 1:15, then B={B[0],B[1],B[2]} and C={C[0],C[1],C[2]} are both acceptable outputs, but a mixture of elements such as {C[0],B[1],C[2]} is not.

Consequently, the procedure must be called exactly once per partition. (Technically, it can be called wastefully as many times as you like, but all the results for a partition must come from the same call.)

Suppose, for example, that the procedure works as follows: it takes a collection of grades as an IN OUT parameter, sets one of them, chosen at random, to 100, and sets all the others to zero. Running it at 13:00 might leave Alice as the only student who passes, and running it at 13:01 might leave Bob as the only one. Either way, exactly one student per class must pass, no more and no less.
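The toy procedure just described can be written out as a stand-in for experimenting. This is only a sketch: the collection type `grade_tab` and the procedure name `pick_one_winner` are hypothetical (the real procedure's name and signature are not given), and it assumes a dense collection indexed from 1. The anonymous block calls it once for one class and counts the passing grades, which should always come out as exactly 1.

```sql
-- Hypothetical collection type standing in for the real procedure's parameter type.
CREATE OR REPLACE TYPE grade_tab AS TABLE OF NUMBER;
/
-- Hypothetical stand-in: one randomly chosen element becomes 100, all others 0.
-- Assumes p_grades is dense (indices 1 .. COUNT).
CREATE OR REPLACE PROCEDURE pick_one_winner (p_grades IN OUT NOCOPY grade_tab) IS
  l_winner PLS_INTEGER;
BEGIN
  IF p_grades IS NULL OR p_grades.COUNT = 0 THEN
    RETURN;
  END IF;
  -- DBMS_RANDOM.VALUE(low, high) returns a number in [low, high),
  -- so TRUNC yields an index between 1 and COUNT.
  l_winner := TRUNC(DBMS_RANDOM.VALUE(1, p_grades.COUNT + 1));
  FOR i IN 1 .. p_grades.COUNT LOOP
    p_grades(i) := CASE WHEN i = l_winner THEN 100 ELSE 0 END;
  END LOOP;
END pick_one_winner;
/
SET SERVEROUTPUT ON
DECLARE
  l_grades grade_tab := grade_tab(55, 72, 91, 64);
  l_passes PLS_INTEGER := 0;
BEGIN
  pick_one_winner(l_grades);  -- a single call for the whole class
  FOR i IN 1 .. l_grades.COUNT LOOP
    IF l_grades(i) = 100 THEN
      l_passes := l_passes + 1;
    END IF;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('passing grades: ' || l_passes);
END;
/
```

Splitting one class across two calls and mixing their elements could give zero or two passing grades, which is why each partition has to be handed to the procedure in one piece.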

Answer 1

This version does not suffer from the caveats of my previous answer, although it will be slower and a bit harder to use. Most of the slowness comes from the loop in ODCIAggregateDelete; you may be able to find an improvement that avoids looping through the entire collection.

In any case, this version creates a custom analytic function that mimics Oracle's built-in COLLECT function. So instead of trying to write a custom analytic function that computes the actual value we want, it simply collects the set of rows in the window.

Then, for each row, we pass the row's data and the result of our custom analytic "COLLECT" into a regular function that computes the required value.

Here is the code. (NOTE: your original question also asked about multiple parameters. Easy: just put all the fields you want in matt_ratio_to_report_rec.) (Also, sorry about the object names: I prefix everything with my name so that other developers know whom to ask whether an object is their problem.)

    -- This is the input data to the analytic function
    --DROP TYPE matt_ratio_to_report_rec;
    CREATE OR REPLACE TYPE matt_ratio_to_report_rec AS OBJECT (
      value NUMBER
    );
    /

    -- This is a collection of input data
    --DROP TYPE matt_ratio_to_report_tab;
    CREATE OR REPLACE TYPE matt_ratio_to_report_tab AS TABLE OF matt_ratio_to_report_rec;
    /

    -- This object type implements a custom analytic that acts as an analytic version of Oracle COLLECT function
    --DROP TYPE matt_ratio_to_report_col_impl;
    CREATE OR REPLACE TYPE matt_ratio_to_report_col_impl AS OBJECT (
      analytics_window matt_ratio_to_report_tab,
      CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl) RETURN SELF AS RESULT,
      -- Called to initialize a new aggregation context
      -- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead
      -- of creating the new aggregation context from scratch
      STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER,
      -- Called when a new data point is added to an aggregation context
      MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec) RETURN NUMBER,
      -- Called to return the computed aggregate from an aggregation context
      MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER,
      -- Called to merge two aggregation contexts into one (e.g., merging results of parallel slaves)
      MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER,
      -- Called to remove a data point that has left the window
      MEMBER FUNCTION ODCIAggregateDelete (self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER
    );
    /

    CREATE OR REPLACE TYPE BODY matt_ratio_to_report_col_impl IS
      CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl) RETURN SELF AS RESULT IS
      BEGIN
        SELF.analytics_window := new matt_ratio_to_report_tab();
        RETURN;
      END;

      STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateInitialize()');
        sctx := matt_ratio_to_report_col_impl();
        RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateIterate(' || self.analytics_window.COUNT || ')');
        -- Add record to collection. Use LAST rather than COUNT: after ODCIAggregateDelete
        -- the nested table can be sparse, and COUNT could then point at an existing element.
        self.analytics_window.EXTEND;
        self.analytics_window(self.analytics_window.LAST) := value;
        RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateTerminate(' || self.analytics_window.COUNT || ' - flags: ' || flags || ')');
        IF flags = 1 THEN
          returnValue := self.analytics_window;
        END IF;
        RETURN ODCIConst.Success;
      EXCEPTION
        WHEN others THEN
          DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK || ' ' || DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
          RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER IS
        l_idx PLS_INTEGER;
      BEGIN
        -- Add all elements from the ctx2 window to the self window
        l_idx := ctx2.analytics_window.FIRST;
        WHILE l_idx IS NOT NULL LOOP
          self.analytics_window.EXTEND;
          self.analytics_window(self.analytics_window.LAST) := ctx2.analytics_window(l_idx);
          l_idx := ctx2.analytics_window.NEXT(l_idx);
        END LOOP;
        RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateDelete (self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER IS
        l_ctr NUMBER;
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateDelete(' || self.analytics_window.COUNT || ' - ' || value.value || ')');
        l_ctr := self.analytics_window.FIRST;
        <<window_loop>>
        WHILE l_ctr IS NOT NULL LOOP
          IF self.analytics_window(l_ctr).value = value.value THEN
            self.analytics_window.DELETE(l_ctr);
            DBMS_OUTPUT.PUT_LINE('... deleted slot ' || l_ctr);
            EXIT window_loop;
          END IF;
          l_ctr := self.analytics_window.NEXT(l_ctr);
        END LOOP;
        RETURN ODCIConst.Success;
      END;
    END;
    /

    -- This function is the analytic version of Oracle COLLECT function
    --DROP FUNCTION matt_ratio_to_report_col;
    CREATE OR REPLACE FUNCTION matt_ratio_to_report_col (input matt_ratio_to_report_rec) RETURN matt_ratio_to_report_tab
      PARALLEL_ENABLE AGGREGATE USING matt_ratio_to_report_col_impl;
    /

    -- This is the actual function we want
    CREATE OR REPLACE FUNCTION matt_ratio_to_report (
      p_row_value     NUMBER,
      p_report_window matt_ratio_to_report_tab
    ) RETURN NUMBER IS
      l_report_window_sum NUMBER := 0;
      l_counter           NUMBER := NULL;
    BEGIN
      IF p_row_value IS NULL OR p_report_window IS NULL THEN
        RETURN NULL;
      END IF;
      -- Compute window sum
      l_counter := p_report_window.FIRST;
      WHILE l_counter IS NOT NULL LOOP
        l_report_window_sum := l_report_window_sum + NVL(p_report_window(l_counter).value, 0);
        l_counter := p_report_window.NEXT(l_counter);
      END LOOP;
      RETURN p_row_value / NULLIF(l_report_window_sum, 0);
    END matt_ratio_to_report;
    /

    -- Create some test data
    --DROP TABLE matt_test_data;
    CREATE TABLE matt_test_data (x, group#) PARALLEL 4 AS
      SELECT rownum, CEIL(rownum / 10) group# FROM DUAL CONNECT BY ROWNUM <= 50000;

    -- TESTER 9/30
    WITH test AS (
      SELECT d.x,
             CEIL(d.x / 10) group#,
             ratio_to_report(d.x) OVER (PARTITION BY d.group#) oracle_rr,
             matt_ratio_to_report(
               d.x,
               matt_ratio_to_report_col(matt_ratio_to_report_rec(d.x)) OVER (PARTITION BY d.group#)
             ) custom_rr
      FROM matt_test_data d
    )
    SELECT /*+ PARALLEL */ test.*,
           CASE WHEN test.oracle_rr != test.custom_rr THEN 'Mismatch!' ELSE NULL END test_results
    FROM test
    --WHERE oracle_rr != custom_rr
    ORDER BY test_results NULLS LAST, x;
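Following the note about multiple parameters, here is a sketch of what widening the record might look like. The attribute name `config_data` and its `VARCHAR2(4000)` size are assumptions, and `grades`/`policies` are the tables from the question. Because the collection type, the implementation type and both functions depend on `matt_ratio_to_report_rec`, they are dropped first and then recreated from the script above. Keep the question's extra wrinkle in mind: the final row-level function (`matt_ratio_to_report` in the script) runs once per row, so if it invoked the non-deterministic procedure on `course_window`, each row's result would come from a different call.

```sql
-- Drop the dependents in reverse creation order so the record type can be replaced.
DROP FUNCTION matt_ratio_to_report;
DROP FUNCTION matt_ratio_to_report_col;
DROP TYPE matt_ratio_to_report_col_impl;
DROP TYPE matt_ratio_to_report_tab;

-- Wider input record; config_data is a hypothetical extra argument.
CREATE OR REPLACE TYPE matt_ratio_to_report_rec AS OBJECT (
  value       NUMBER,
  config_data VARCHAR2(4000)
);
/

-- Re-run the remaining CREATE statements from the script above at this point
-- (matt_ratio_to_report_tab onward); none of them reference config_data.

-- Every element of the collected window now carries the course's configuration.
SELECT G.Course_ID,
       G.Student_ID,
       G.Raw_Grade,
       matt_ratio_to_report_col(
         matt_ratio_to_report_rec(G.Raw_Grade, P.course_config_data)
       ) OVER (PARTITION BY G.Course_ID) AS course_window
FROM grades G
LEFT JOIN policies P ON G.Course_ID = P.Course_ID;
```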
Answer 2

The only way to create a custom aggregate with several parameters is to create a new TYPE with the desired number of attributes, and then pass an instance of that type to the aggregate:

First, define a structure for storing all the parameters you need:

    CREATE OR REPLACE TYPE wrapper_type AS OBJECT (
      raw_grade   INTEGER,
      config_data VARCHAR2(4000)  -- VARCHAR2 needs a length; 4000 is an assumed maximum
    );
    /

Then create your aggregate:

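    CREATE OR REPLACE TYPE analytic_wrapper_impl AS OBJECT (
      total NUMBER,  -- placeholder state; replace with the variables you need
      STATIC FUNCTION ODCIAggregateInitialize(actx IN OUT analytic_wrapper_impl) RETURN NUMBER,
      MEMBER FUNCTION ODCIAggregateIterate(self IN OUT analytic_wrapper_impl, val IN wrapper_type) RETURN NUMBER,
      MEMBER FUNCTION ODCIAggregateTerminate(self IN analytic_wrapper_impl, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER,
      MEMBER FUNCTION ODCIAggregateMerge(self IN OUT analytic_wrapper_impl, ctx2 IN analytic_wrapper_impl) RETURN NUMBER
    );
    /

Then you need to implement the actual aggregation logic in the type body and register the aggregate with CREATE FUNCTION analytic_wrapper (input wrapper_type) RETURN NUMBER AGGREGATE USING analytic_wrapper_impl (the implementation type and the function cannot share the same name). Once this is done, you can use something like this: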

 select analytic_wrapper(wrapper_type(G.raw_grade, P.course_config_data)) from ... 

The above was written more or less from memory, so it may well contain syntax errors, but it should get you started.
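Putting the pieces together, here is a minimal end-to-end version that should compile as one script; the wrapper type and the implementation spec are repeated so it runs on its own. The running sum in the type body is placeholder logic, and the `total` attribute and the `VARCHAR2(4000)` size are assumptions: replace them with the real state and computation. Being an aggregate, it returns the same value for every row of a fixed window, which is exactly the limitation described in the question's terminology section.

```sql
-- One row's data plus the extra argument(s).
CREATE OR REPLACE TYPE wrapper_type AS OBJECT (
  raw_grade   INTEGER,
  config_data VARCHAR2(4000)
);
/
-- Implementation type; its name must differ from the aggregate function's name.
CREATE OR REPLACE TYPE analytic_wrapper_impl AS OBJECT (
  total NUMBER,  -- placeholder state: running sum of raw_grade
  STATIC FUNCTION ODCIAggregateInitialize(actx IN OUT analytic_wrapper_impl) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT analytic_wrapper_impl, val IN wrapper_type) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateTerminate(self IN analytic_wrapper_impl, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT analytic_wrapper_impl, ctx2 IN analytic_wrapper_impl) RETURN NUMBER
);
/
CREATE OR REPLACE TYPE BODY analytic_wrapper_impl IS
  STATIC FUNCTION ODCIAggregateInitialize(actx IN OUT analytic_wrapper_impl) RETURN NUMBER IS
  BEGIN
    actx := analytic_wrapper_impl(0);  -- fresh context with an empty sum
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT analytic_wrapper_impl, val IN wrapper_type) RETURN NUMBER IS
  BEGIN
    -- val.config_data is available here as well; this placeholder ignores it.
    self.total := self.total + NVL(val.raw_grade, 0);
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateTerminate(self IN analytic_wrapper_impl, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS
  BEGIN
    returnValue := self.total;
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT analytic_wrapper_impl, ctx2 IN analytic_wrapper_impl) RETURN NUMBER IS
  BEGIN
    -- Combine partial sums, e.g. from parallel slaves.
    self.total := self.total + ctx2.total;
    RETURN ODCIConst.Success;
  END;
END;
/
CREATE OR REPLACE FUNCTION analytic_wrapper (input wrapper_type) RETURN NUMBER
  PARALLEL_ENABLE AGGREGATE USING analytic_wrapper_impl;
/

-- Usage as an analytic, against the tables from the question:
SELECT G.Course_ID,
       G.Raw_Grade,
       analytic_wrapper(wrapper_type(G.Raw_Grade, P.course_config_data))
         OVER (PARTITION BY G.Course_ID) AS course_total
FROM grades G
LEFT JOIN policies P ON G.Course_ID = P.Course_ID;
```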

See the manual for more information and examples: http://docs.oracle.com/cd/E11882_01/appdev.112/e10765/aggr_functions.htm#ADDCI026

The manual states that such an aggregate can be used as an analytic function:

When a user-defined aggregate is used as an analytic function, the aggregate is calculated for each row's corresponding window.

Answer 3

I have the same need. I am posting an approach that seems to work (it agrees with Oracle's native ratio_to_report function in all the cases I've tried so far).

My concern is that it relies on the "fact" that ODCIAggregateIterate and ODCIAggregateTerminate are always invoked for the rows in the same order. I have no reason to believe that this is always the case. I may open an SR, because I don't think I can use this version without an explanation from Oracle.

However, I am posting the code because it does answer the question.

Caveat #1: This code stores state in a PL/SQL package. I hate this, but I have not seen an alternative, since ODCIAggregateTerminate receives SELF only as IN, not IN OUT. Besides being ugly, this means you cannot use the custom analytic function more than once in a single query (their states would get mixed). I am sure this restriction could be worked around (for example, by giving each ODCI context a unique value and keeping separate state for each context).

Caveat #2: my test case uses a PARALLEL query, and I can see from the explain plan that it does run in parallel. However, it does not appear to create and merge several contexts, which is what I really wanted to test, because if anything breaks this approach, that will.

Here is the code.

    CREATE OR REPLACE TYPE matt_ratio_to_report_rec AS OBJECT (
      key   VARCHAR2(80),
      value NUMBER
    );
    /

    CREATE OR REPLACE PACKAGE matt_ratio_to_report_state AS
      TYPE values_tab_t IS TABLE OF matt_ratio_to_report_rec INDEX BY BINARY_INTEGER;
      TYPE index_tab_t IS TABLE OF NUMBER INDEX BY VARCHAR2(80);
      G_VALUES_TAB          values_tab_t;
      G_INDEX_TAB           index_tab_t;
      G_ITERATOR_POSITION   NUMBER;
      G_TERMINATOR_POSITION NUMBER;
    END matt_ratio_to_report_state;
    /

    CREATE OR REPLACE TYPE matt_ratio_to_report_impl AS OBJECT (
      window_sum NUMBER,
      CONSTRUCTOR FUNCTION matt_ratio_to_report_impl(SELF IN OUT NOCOPY matt_ratio_to_report_impl) RETURN SELF AS RESULT,
      -- Called to initialize a new aggregation context
      -- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead
      -- of creating the new aggregation context from scratch
      STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_impl) RETURN NUMBER,
      -- Called when a new data point is added to an aggregation context
      MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_impl, value IN matt_ratio_to_report_rec) RETURN NUMBER,
      -- Called to return the computed aggregate from an aggregation context
      MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_impl, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER,
      -- Called to merge two aggregation contexts into one (e.g., merging results of parallel slaves)
      MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_impl, ctx2 IN matt_ratio_to_report_impl) RETURN NUMBER,
      -- Called to remove a data point that has left the window
      MEMBER FUNCTION ODCIAggregateDelete (self IN OUT matt_ratio_to_report_impl, value matt_ratio_to_report_rec) RETURN NUMBER
    );
    /

    CREATE OR REPLACE TYPE BODY matt_ratio_to_report_impl IS
      CONSTRUCTOR FUNCTION matt_ratio_to_report_impl(SELF IN OUT NOCOPY matt_ratio_to_report_impl) RETURN SELF AS RESULT IS
      BEGIN
        SELF.window_sum := 0;
        matt_ratio_to_report_state.G_VALUES_TAB.DELETE;
        matt_ratio_to_report_state.G_INDEX_TAB.DELETE;
        matt_ratio_to_report_state.G_ITERATOR_POSITION := 0;
        matt_ratio_to_report_state.G_TERMINATOR_POSITION := 0;
        RETURN;
      END;

      STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_impl) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateInitialize(' || sctx.window_sum || ')');
        sctx := matt_ratio_to_report_impl();
        RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_impl, value IN matt_ratio_to_report_rec) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateIterate(' || self.window_sum || ' - ' || value.key || ', ' || value.value || ')');
        -- Increment sum
        self.window_sum := self.window_sum + value.value;
        matt_ratio_to_report_state.G_ITERATOR_POSITION := matt_ratio_to_report_state.G_ITERATOR_POSITION + 1;
        matt_ratio_to_report_state.G_VALUES_TAB(matt_ratio_to_report_state.G_ITERATOR_POSITION) := value;
        matt_ratio_to_report_state.G_INDEX_TAB(value.key) := matt_ratio_to_report_state.G_ITERATOR_POSITION;
        RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_impl, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateTerminate(' || self.window_sum || ' - flags: ' || flags || ')');
        IF flags = 1 THEN
          matt_ratio_to_report_state.G_TERMINATOR_POSITION := matt_ratio_to_report_state.G_TERMINATOR_POSITION + 1;
          returnValue := matt_ratio_to_report_state.G_VALUES_TAB(matt_ratio_to_report_state.G_TERMINATOR_POSITION).value / self.window_sum;
        END IF;
        RETURN ODCIConst.Success;
      EXCEPTION
        WHEN others THEN
          DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK || ' ' || DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
          RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_impl, ctx2 IN matt_ratio_to_report_impl) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateMerge(' || self.window_sum || ' - ' || ctx2.window_sum || ')');
        -- Increment sums
        self.window_sum := self.window_sum + ctx2.window_sum;
        RETURN ODCIConst.Success;
      END;

      MEMBER FUNCTION ODCIAggregateDelete (self IN OUT matt_ratio_to_report_impl, value matt_ratio_to_report_rec) RETURN NUMBER IS
      BEGIN
        DBMS_OUTPUT.PUT_LINE('ODCIAggregateDelete(' || self.window_sum || ' - ' || value.key || ', ' || value.value || ')');
        -- Decrement sums
        matt_ratio_to_report_state.G_VALUES_TAB.DELETE(matt_ratio_to_report_state.G_INDEX_TAB(value.key));
        matt_ratio_to_report_state.G_INDEX_TAB.DELETE(value.key);
        self.window_sum := self.window_sum - value.value;
        RETURN ODCIConst.Success;  -- every ODCI routine must return a status
      END;
    END;
    /

    CREATE OR REPLACE FUNCTION matt_ratio_to_report (input matt_ratio_to_report_rec) RETURN NUMBER
      PARALLEL_ENABLE AGGREGATE USING matt_ratio_to_report_impl;
    /

    CREATE TABLE matt_test_data (x) PARALLEL 4 AS
      SELECT rownum FROM DUAL CONNECT BY ROWNUM <= 50000;

    WITH test AS (
      SELECT d.x,
             SUM(d.x) OVER (PARTITION BY MOD(d.x, 5) ORDER BY d.x DESC) running_sum,
             ratio_to_report(d.x) OVER (PARTITION BY MOD(d.x, 500)) oracle_rr,
             matt_ratio_to_report(matt_ratio_to_report_rec(TO_CHAR(d.x), d.x)) OVER (PARTITION BY MOD(d.x, 500)) custom_rr
             --matt_ratio_to_report(matt_ratio_to_report_rec(TO_CHAR(d.x), d.x)) OVER (PARTITION BY MOD(d.x, 500) ORDER BY d.x ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) custom_rr_windowed
      FROM matt_test_data d
    )
    SELECT /*+ PARALLEL */ test.*,
           CASE WHEN test.oracle_rr != test.custom_rr THEN 'Mismatch!' ELSE NULL END test_results
    FROM test
    --WHERE oracle_rr != custom_rr
    ORDER BY test_results NULLS LAST, x;
