Is there an equivalent of multiple COUNT(DISTINCT CASE WHEN ...) statements in Apache Pig?

I am new to Apache Pig and am trying to learn it. Is there an equivalent of SQL's COUNT(DISTINCT CASE WHEN ...) in Apache Pig?

For example, I'm trying to do something like this:

CREATE TABLE email_profile AS
SELECT user_id
, COUNT(DISTINCT CASE WHEN email_code = 'C' THEN message_id ELSE NULL END) AS clickthroughs
, COUNT(DISTINCT CASE WHEN email_code = 'O' THEN message_id ELSE NULL END) AS opened_messages
, COUNT(DISTINCT message_id) AS total_messages_received
FROM email_campaigns
 GROUP BY user_id;

I can't use FILTER email_campaigns BY email_code == 'C' because that cuts out the other cases. Is there a way to do this all in one nested FOREACH block?

Thanks!

EDIT:

As requested, here is some sample data. The fields are user_id, email_code, and message_id.

user1@example.com    O     111
user1@example.com    C     111
user2@example.com    O     111
user1@example.com    O     222
user2@example.com    O     333

Expected result (user_id, opened_messages, clickthroughs, total_messages_received):

user1@example.com    2    1    2
user2@example.com    2    0    2
1 answer

You should be able to do this in a nested FOREACH after grouping by user_id.

Something like this:

-- First we group, so that the FOREACH is applied per user_id
A = GROUP email_campaigns BY user_id ;
B = FOREACH A {
        -- These three lines accomplish the equivalent of:
        -- DISTINCT CASE WHEN email_code = 'C' THEN message_id ELSE NULL END
        -- First, keep only the rows where email_code == 'C'
        click_filt = FILTER email_campaigns BY email_code == 'C' ;
        -- Since we only want unique message_ids, project that field out
        click_proj = FOREACH click_filt GENERATE message_id ;
        -- Now we can find all unique message_ids for the given filter
        click_dist = DISTINCT click_proj ;

        -- Same three steps for opened messages
        opened_filt = FILTER email_campaigns BY email_code == 'O' ;
        opened_proj = FOREACH opened_filt GENERATE message_id ;
        opened_dist = DISTINCT opened_proj ;

        -- No filter needed for the total: project and de-duplicate
        total_proj = FOREACH email_campaigns GENERATE message_id ;
        total_dist = DISTINCT total_proj ;
    GENERATE group AS user_id, COUNT(click_dist) AS clickthroughs,
                               COUNT(opened_dist) AS opened_messages,
                               COUNT(total_dist) AS total_messages_received ;
}

The output of B should be:

(user1@example.com,1,2,2)
(user2@example.com,0,2,2)
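
If you want to try this end to end, here is a minimal sketch of a complete script. It assumes the sample data sits in a tab-delimited file; the path 'email_campaigns.tsv' and the chararray types are my assumptions, not something from the question, and the field is named user_id as in the question's SQL. It also uses the bag projection shorthand (click_filt.message_id) in place of the inner FOREACH ... GENERATE, which should behave the same way here.

```pig
-- Hypothetical input path and delimiter: adjust both to match your data.
email_campaigns = LOAD 'email_campaigns.tsv' USING PigStorage('\t')
                  AS (user_id:chararray, email_code:chararray, message_id:chararray);

-- One group (and so one bag named email_campaigns) per user_id
A = GROUP email_campaigns BY user_id;

B = FOREACH A {
        -- Clickthroughs: filter, project message_id, de-duplicate
        click_filt  = FILTER email_campaigns BY email_code == 'C';
        click_ids   = click_filt.message_id;
        click_dist  = DISTINCT click_ids;

        -- Opened messages: same three steps
        opened_filt = FILTER email_campaigns BY email_code == 'O';
        opened_ids  = opened_filt.message_id;
        opened_dist = DISTINCT opened_ids;

        -- Total: no filter, just project and de-duplicate
        total_ids   = email_campaigns.message_id;
        total_dist  = DISTINCT total_ids;

    GENERATE group AS user_id,
             COUNT(click_dist)  AS clickthroughs,
             COUNT(opened_dist) AS opened_messages,
             COUNT(total_dist)  AS total_messages_received;
}

-- Print the per-user counts
DUMP B;
```

Note that COUNT on an empty bag returns 0, which is why a user with no clicks still gets a row instead of being dropped, unlike a top-level FILTER before the GROUP.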

