Choosing Different CustomDimensions for BigQuery GA Integration

Sorry for the specific question, but I feel like I'm at a standstill because my SQL knowledge doesn't go that far.

The data that is derived from the BigQuery implementation with the GoogleAnalytics source data is as follows:

|-visitId |- date |- (....) +- hits |- time +- customDimensions |- index |- value +- customMetrics |- index |- value 

I know that there are hits that always send data to GA. In particular, I want customDimensions.index = 43, customDimensions.index = 24 and customMetrics.index = 14. To indicate size 43 is an object that is being viewed or sold, size 24 tells me if they are visible and metric 14, the value is 1 when it was just sold. My end result should look like this:

 customDimension.value( when index=43) count(when customDimension.index=24 and customDimension.value=='ficha') count(when customMetrics.index=14 and customMetrics.value ==1)) 

Grouped by customDimension.value (with index = 43) I know that every time luck is sent using customMetrics.index = 14, the same hit has customDimensions.index = 43, the same thing, customDimensions.index = 24 is always has customDimensions.index = 43. I really managed to create SQL that does what I want, but at what price? He is big, he is slow, he is ugly. What I am doing now is:

  • Create three tables, all having visitId, hit.time and value when index = 14,24,43
  • Left join 43 with 24 ON 43.visitId == 24.visitId AND 43.hits.time == 24.hits.time as a result
  • Left connection result from 14 ON 14.visitId == result.visitId AND 14.hits.time == result.hits.time

I'm not interested in visitId or hits.time, it's just a way to link the same hits (and find out which product they bought when customMetrics.index = 14 and value = 1.

This is my code:

 SELECT Tviviendasvisitas.viviendaId as ViviendaID ,sum(Tviviendasvisitas.NumeroVisitas) as NumeroVisitas,sum(Ttransacciones.Transactions) as Transactions FROM ( SELECT Tviviendas.visitId as visitId, Tviviendas.hits.time as visitTime, Tviviendas.ViviendaID as viviendaId,Tvisitas.visitas as NumeroVisitas FROM ( SELECT visitId,hits.time,hits.customDimensions.value as ViviendaID FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30')))) WHERE hits.customDimensions.index = 43 GROUP EACH BY visitId,hits.time, ViviendaID)as Tviviendas LEFT JOIN EACH( SELECT visitId,hits.time,count(*) as visitas FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30')))) WHERE hits.customDimensions.index = 24 AND hits.customDimensions.value=='ficha' GROUP EACH BY visitId,hits.time) as Tvisitas ON Tvisitas.visitId==Tviviendas.visitId AND Tvisitas.time==Tviviendas.time) as Tviviendasvisitas LEFT JOIN EACH ( SELECT visitId ,hits.time as transactionTime, sum(hits.customMetrics.value) as Transactions FROM(TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))) WHERE hits.customMetrics.index = 14 AND hits.customMetrics.value=1 GROUP BY visitId, transactionTime) as Ttransacciones ON Tviviendasvisitas.visitId==Ttransacciones.visitId AND Tviviendasvisitas.visitTime==Ttransacciones.transactionTime GROUP BY ViviendaID 

Running this query takes too long to create a propper toolbar with the results.

So God help me if this is my final result. I feel that WAY should be a more elegant solution to this problem, but I cannot find it on my own.

reference

+7
sql google-analytics google-bigquery
source share
2 answers

You should be able to structure this query without joins using BigQuery scope aggregation ( WITHIN clause ). Here is a small example that may not be entirely logical, but should illustrate some of the possibilities:

 SELECT visitId, hits.time, SOME(hits.customDimensions.index = 43) WITHIN RECORD AS has43, SUM(IF(hits.customDimensions.index = 24 AND hits.customDimensions.value = 'ficha', 1, 0)) WITHIN RECORD AS numFichas, SUM(IF(hits.customMetrics.index = 14, hits.customMetrics.value, 0)) WITHIN RECORD AS totalValues FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30')))) HAVING has43 

The example shows three WITHIN RECORD aggregates, which means that they will be calculated from the repeating fields of the same record. SOME () takes a boolean expression and returns true if any field in the record matches that expression. Thus, has43 will be true for visits that have one or more hits with customDimensions.index = 43. The HAVING clause filters out entries where it is false.

SUM (IF (...)) expressions calculate the total number of customDimensions with index = 24 and value = 'ficha' and the total values ​​associated with customMetrics with index = 14.

+4
source share

If you just want to get the value from customLimvel hitLevel and add it to your own column, this is a neat trick:

 SELECT fullVisitorId, visitId, hits.hitNumber, MAX(IF(hits.customDimensions.index=43, hits.customDimensions.value, NULL)) WITHIN hits AS product, FROM [tableID.ga_sessions_20150305] LIMIT 100 
0
source share

All Articles