Sorry for the specific question, but I feel like I'm at a standstill because my SQL knowledge doesn't go that far.
The data that is derived from the BigQuery implementation with the GoogleAnalytics source data is as follows:
|-visitId |- date |- (....) +- hits |- time +- customDimensions |- index |- value +- customMetrics |- index |- value
I know that there are hits that always send data to GA. In particular, I want customDimensions.index = 43, customDimensions.index = 24 and customMetrics.index = 14. To indicate size 43 is an object that is being viewed or sold, size 24 tells me if they are visible and metric 14, the value is 1 when it was just sold. My end result should look like this:
customDimension.value( when index=43) count(when customDimension.index=24 and customDimension.value=='ficha') count(when customMetrics.index=14 and customMetrics.value ==1))
Grouped by customDimension.value (with index = 43) I know that every time luck is sent using customMetrics.index = 14, the same hit has customDimensions.index = 43, the same thing, customDimensions.index = 24 is always has customDimensions.index = 43. I really managed to create SQL that does what I want, but at what price? He is big, he is slow, he is ugly. What I am doing now is:
- Create three tables, all having visitId, hit.time and value when index = 14,24,43
- Left join 43 with 24 ON 43.visitId == 24.visitId AND 43.hits.time == 24.hits.time as a result
- Left connection result from 14 ON 14.visitId == result.visitId AND 14.hits.time == result.hits.time
I'm not interested in visitId or hits.time, it's just a way to link the same hits (and find out which product they bought when customMetrics.index = 14 and value = 1.
This is my code:
SELECT Tviviendasvisitas.viviendaId as ViviendaID ,sum(Tviviendasvisitas.NumeroVisitas) as NumeroVisitas,sum(Ttransacciones.Transactions) as Transactions FROM ( SELECT Tviviendas.visitId as visitId, Tviviendas.hits.time as visitTime, Tviviendas.ViviendaID as viviendaId,Tvisitas.visitas as NumeroVisitas FROM ( SELECT visitId,hits.time,hits.customDimensions.value as ViviendaID FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30')))) WHERE hits.customDimensions.index = 43 GROUP EACH BY visitId,hits.time, ViviendaID)as Tviviendas LEFT JOIN EACH( SELECT visitId,hits.time,count(*) as visitas FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30')))) WHERE hits.customDimensions.index = 24 AND hits.customDimensions.value=='ficha' GROUP EACH BY visitId,hits.time) as Tvisitas ON Tvisitas.visitId==Tviviendas.visitId AND Tvisitas.time==Tviviendas.time) as Tviviendasvisitas LEFT JOIN EACH ( SELECT visitId ,hits.time as transactionTime, sum(hits.customMetrics.value) as Transactions FROM(TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))) WHERE hits.customMetrics.index = 14 AND hits.customMetrics.value=1 GROUP BY visitId, transactionTime) as Ttransacciones ON Tviviendasvisitas.visitId==Ttransacciones.visitId AND Tviviendasvisitas.visitTime==Ttransacciones.transactionTime GROUP BY ViviendaID
Running this query takes too long to create a propper toolbar with the results.
So God help me if this is my final result. I feel that WAY should be a more elegant solution to this problem, but I cannot find it on my own.
reference