Google Analytics and BigQuery, by Javier Ramírez, from datawaki
DESCRIPTION
Google Analytics is great, but having access to your raw data and being able to query it any way you want is much more powerful. Learn how you can integrate Analytics and BigQuery to unleash the full potential of your data. Talk delivered at Conversion Thursday London.
TRANSCRIPT
javier ramirez @supercoco9
Get more from Analytics with Google BigQuery
about me
19 years working on software: banking, e-commerce, government, CMS, start-ups...
founder of https://datawaki.com and https://teowaki.com
https://teowaki.com/services
Google Developer Expert on the Cloud Platform
twitter: @supercoco9
datawaki
BigQuery is awes...
I use Google Analytics
javier ramirez @supercoco9 https://teowaki.com
Isn't Google Analytics good enough?
Google Analytics is great but...
It gives you access to aggregated data and sampled reports, not to data on individual sessions/visits.
Even Premium accounts get sampled reports when there is too much data (and not all reports can be unsampled).
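To see why a sampled report can drift from the raw data, here is a simplified model: keep each session with some probability, count the matches in the sample, and scale the count back up. This is an illustrative assumption (uniform random sampling of sessions), not Google Analytics' actual sampling algorithm, and the session data is hypothetical.

```python
import random


def exact_vs_sampled(sessions, predicate, sample_rate, seed=0):
    """Return (exact_count, estimated_count) for sessions matching `predicate`.

    The estimate keeps each session with probability `sample_rate` and scales
    the matching count back up: a simplified model of report sampling, not
    Google Analytics' real algorithm.
    """
    exact = sum(1 for s in sessions if predicate(s))
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sampled_matches = sum(
        1 for s in sessions if rng.random() < sample_rate and predicate(s)
    )
    estimate = round(sampled_matches / sample_rate)
    return exact, estimate


# Hypothetical raw data: 10,000 sessions, every 7th one from a mobile device
sessions = [{"isMobile": i % 7 == 0} for i in range(10_000)]
exact, estimate = exact_vs_sampled(sessions, lambda s: s["isMobile"], sample_rate=0.1)
print(exact, estimate)  # exact is 1429; the estimate is close but usually not equal
```

With raw data (a sample rate of 1.0) the two numbers always agree, which is the point of querying unsampled exports.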
Google Analytics is great but...
If you need to manage many different segments, or want to combine them, it can get tricky.
Moreover, you can only segment or create reports using the predefined filters, which may or may not be enough for you*.
*even if segments have experienced a huge improvement with Universal Analytics
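Once you have one row per session, a segment is just a condition on that row, and combining segments is plain boolean logic. A minimal sketch, assuming hypothetical session dicts with `isMobile`, `transactions` and `country` keys:

```python
from typing import Callable, Dict, List

Session = Dict[str, object]
Segment = Callable[[Session], bool]


def all_of(*segments: Segment) -> Segment:
    """Intersection of segments (like AND-ing WHERE conditions)."""
    return lambda s: all(seg(s) for seg in segments)


def any_of(*segments: Segment) -> Segment:
    """Union of segments (like OR-ing WHERE conditions)."""
    return lambda s: any(seg(s) for seg in segments)


def negate(segment: Segment) -> Segment:
    """Sessions outside the segment (like NOT in a WHERE clause)."""
    return lambda s: not segment(s)


# Hypothetical segments over raw session rows
mobile: Segment = lambda s: s["isMobile"]
buyers: Segment = lambda s: (s["transactions"] or 0) >= 1  # NULL means no transactions
from_uk: Segment = lambda s: s["country"] == "United Kingdom"

sessions: List[Session] = [
    {"id": 1, "isMobile": True, "transactions": 1, "country": "United Kingdom"},
    {"id": 2, "isMobile": True, "transactions": None, "country": "Spain"},
    {"id": 3, "isMobile": False, "transactions": 2, "country": "United Kingdom"},
]

# Mobile buyers, or anyone outside the UK
combined = any_of(all_of(mobile, buyers), negate(from_uk))
print([s["id"] for s in sessions if combined(s)])  # [1, 2]
```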
Google Analytics is great but...
It's not easy to cross data in Analytics with data from other sources (CRM, invoicing system...)
Now you can use Data Import in Universal Analytics, but there are many constraints on what you can do.
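With raw session data, crossing it with another source is an ordinary SQL join on a shared key. The sketch runs on SQLite through Python's `sqlite3` module; the table names, columns and the idea that your CRM stores the visitor id (for example captured through a custom dimension) are all hypothetical assumptions.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for exported sessions plus a CRM table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sessions (visitor_id TEXT, pageviews INTEGER);
        CREATE TABLE crm_customers (visitor_id TEXT, plan TEXT);
        INSERT INTO sessions VALUES ('v1', 5), ('v1', 3), ('v2', 7), ('v3', 2);
        INSERT INTO crm_customers VALUES ('v1', 'pro'), ('v2', 'free');
    """)
    return conn


def pageviews_by_plan(conn: sqlite3.Connection):
    # Inner join: visitors the CRM doesn't know about (v3) drop out
    return conn.execute("""
        SELECT c.plan,
               COUNT(DISTINCT s.visitor_id) AS visitors,
               SUM(s.pageviews) AS pageviews
        FROM sessions AS s
        JOIN crm_customers AS c ON c.visitor_id = s.visitor_id
        GROUP BY c.plan
        ORDER BY c.plan
    """).fetchall()


print(pageviews_by_plan(make_demo_db()))  # [('free', 1, 7), ('pro', 1, 8)]
```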
Google Analytics is great but...
Good for knowing what's happening in your application, but difficult for:
* business intelligence/big data (data mining, find patterns...)
* machine learning (classify information, predict future trends...)
Designed to run analytics over huge volumes of raw data, and to integrate with other data sources
Google BigQuery
one more thing
Google Analytics Premium users get free daily exports from GA to BigQuery.
Google BigQuery + GA Premium
All your raw data. Unsampled. Use it however you want.
BOOM!
Google BigQuery + GA Premium
O'Reilly
Khan Academy
it's just SQL
SQL is not very hard
Give me the count of visitors from our analytics who visited yesterday using a mobile device, by country
SELECT geoNetwork.country, EXACT_COUNT_DISTINCT(fullVisitorId) AS visitors
FROM ga_sessions_20141203
WHERE device.isMobile = true
GROUP BY geoNetwork.country
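The session table has one row per visit, so counting visitors needs a distinct count: a plain count of `fullVisitorId` counts sessions. The sketch reproduces the query on SQLite through Python's `sqlite3`; since SQLite has no nested records, the hypothetical table flattens `device.isMobile` and `geoNetwork.country` into underscore-named columns, and SQLite's `COUNT(DISTINCT ...)` stands in for BigQuery's distinct count.

```python
import sqlite3


def make_sessions_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Dots in the GA field names become underscores in this flat, hypothetical table
    conn.executescript("""
        CREATE TABLE ga_sessions_20141203 (
            fullVisitorId TEXT, device_isMobile INTEGER, geoNetwork_country TEXT);
        INSERT INTO ga_sessions_20141203 VALUES
            ('a', 1, 'Spain'), ('a', 1, 'Spain'),  -- one visitor, two sessions
            ('b', 1, 'Spain'), ('c', 0, 'Spain'), ('d', 1, 'United Kingdom');
    """)
    return conn


QUERY = """
    SELECT geoNetwork_country, {count_expr} AS visitors
    FROM ga_sessions_20141203
    WHERE device_isMobile = 1
    GROUP BY geoNetwork_country
    ORDER BY geoNetwork_country
"""

conn = make_sessions_table()
sessions_count = conn.execute(QUERY.format(count_expr="COUNT(fullVisitorId)")).fetchall()
visitors_count = conn.execute(QUERY.format(count_expr="COUNT(DISTINCT fullVisitorId)")).fetchall()
print(sessions_count)  # [('Spain', 3), ('United Kingdom', 1)]  <- sessions, not visitors
print(visitors_count)  # [('Spain', 2), ('United Kingdom', 1)]
```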
data schema
SELECT trafficSource.source, SUM(totals.transactions) AS total_transactions
FROM playground.ga_sessions_20140621
GROUP BY trafficSource.source
ORDER BY total_transactions;
basic queries (metric/dimension)
SELECT device.isMobile, SUM(totals.pageviews) AS total_pageviews
FROM playground.ga_sessions_20140621
GROUP BY device.isMobile
ORDER BY total_pageviews;
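Both queries follow the same metric/dimension pattern: group rows by a dimension and sum a metric. A plain-Python sketch of that pattern, on hypothetical rows; it follows SQL's rule that `SUM` skips NULLs (GA exports leave many `totals` fields NULL rather than 0), and the choice to sort NULL totals first is this sketch's own.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def sum_metric_by_dimension(
    rows: Iterable[dict], dimension: str, metric: str
) -> List[Tuple[object, Optional[int]]]:
    """Plain-Python equivalent of
    SELECT dimension, SUM(metric) ... GROUP BY dimension ORDER BY SUM(metric).
    """
    totals: Dict[object, Optional[int]] = {}
    for row in rows:
        key, value = row[dimension], row.get(metric)
        if key not in totals:
            totals[key] = None  # like SQL, a group with only NULLs sums to NULL
        if value is not None:  # SUM skips NULLs
            current = totals[key]
            totals[key] = value if current is None else current + value
    # Ascending order; this sketch puts NULL totals first
    return sorted(totals.items(), key=lambda kv: (kv[1] is not None, kv[1] or 0))


rows = [
    {"source": "google", "transactions": 2},
    {"source": "(direct)", "transactions": None},
    {"source": "google", "transactions": 1},
    {"source": "bing", "transactions": 5},
]
print(sum_metric_by_dimension(rows, "source", "transactions"))
# [('(direct)', None), ('google', 3), ('bing', 5)]
```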
SELECT IF(DOMAIN(trafficSource.source) IS NULL,
          trafficSource.source,
          DOMAIN(trafficSource.source)) AS normalized_source,
       SUM(totals.transactions) AS total_transactions
FROM playground.ga_sessions_20140621
GROUP BY normalized_source
ORDER BY total_transactions;
basic queries with a twist
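The "twist" is the normalization: when the source looks like a host name, group by its domain, otherwise keep the raw value. The helper below is only a rough Python stand-in for legacy BigQuery's `DOMAIN()` (it keeps the last two labels of the host, so it gets suffixes like `.co.uk` wrong); the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def rough_domain(source: str) -> Optional[str]:
    """Very rough stand-in for legacy BigQuery's DOMAIN(): keeps the last two
    labels of the host name, or returns None if the value has no dotted host."""
    # urlsplit only fills in the host when the string has a scheme or a leading //
    candidate = source if "//" in source else "//" + source
    host = urlsplit(candidate).hostname
    if not host or "." not in host:
        return None
    return ".".join(host.split(".")[-2:])


def normalized_source(source: str) -> str:
    # Same shape as IF(DOMAIN(src) IS NULL, src, DOMAIN(src))
    domain = rough_domain(source)
    return source if domain is None else domain


for src in ["google", "m.facebook.com", "facebook.com", "(direct)",
            "https://www.example.com/page"]:
    print(src, "->", normalized_source(src))
```

Grouping by the normalized value merges `m.facebook.com` and `facebook.com` into one row, while values such as `google` or `(direct)` pass through unchanged.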
SELECT (SUM(total_transactionrevenue_per_user) / SUM(total_visits_per_user)) / 1000000
         AS avg_revenue_by_user_per_visit
FROM (
  SELECT fullVisitorId,
         SUM(totals.visits) AS total_visits_per_user,
         SUM(totals.transactionRevenue) AS total_transactionrevenue_per_user
  FROM playground.ga_sessions_20140621
  WHERE totals.visits > 0
    AND totals.transactions >= 1
    AND totals.transactionRevenue IS NOT NULL
  GROUP BY fullVisitorId
);
Average amount spent per visit
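Two details matter when computing this average. The GA BigQuery export stores `totals.transactionRevenue` multiplied by 10^6, so the result has to be divided by 1,000,000 to get back to currency units. And the value is a ratio of sums (total revenue over total visits), so grouping by visitor first does not change it. A plain-Python sketch over hypothetical session rows:

```python
from typing import Iterable, Optional

MICROS = 1_000_000  # transactionRevenue is exported multiplied by 10^6


def avg_revenue_per_visit(sessions: Iterable[dict]) -> Optional[float]:
    """Total revenue / total visits over sessions with at least one transaction.

    Grouping by visitor first (as the query does) gives the same result, because
    the sum of per-visitor sums is the overall sum. This is a ratio of sums,
    not the mean of each visitor's own revenue per visit.
    """
    total_visits = 0
    total_revenue = 0
    for s in sessions:
        if ((s.get("visits") or 0) > 0
                and (s.get("transactions") or 0) >= 1
                and s.get("transactionRevenue") is not None):
            total_visits += s["visits"]
            total_revenue += s["transactionRevenue"]
    return total_revenue / total_visits / MICROS if total_visits else None


sessions = [
    {"visitor": "u1", "visits": 1, "transactions": 1, "transactionRevenue": 30_000_000},
    {"visitor": "u1", "visits": 1, "transactions": 1, "transactionRevenue": 10_000_000},
    {"visitor": "u2", "visits": 1, "transactions": 2, "transactionRevenue": 50_000_000},
    {"visitor": "u3", "visits": 1, "transactions": None, "transactionRevenue": None},
]
print(avg_revenue_per_visit(sessions))  # 30.0
```

Averaging each visitor's own figure instead would give (20 + 50) / 2 = 35 here, a different question.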
2 segments, combined
SELECT hits.item.productName AS other_purchased_products,
       COUNT(hits.item.productName) AS quantity
FROM playground.ga_sessions_20140621
WHERE fullVisitorId IN (
    SELECT fullVisitorId
    FROM playground.ga_sessions_20140621
    WHERE hits.item.productName CONTAINS 'Light Helmet'
      AND totals.transactions >= 1
    GROUP BY fullVisitorId
  )
  AND hits.item.productName IS NOT NULL
  AND hits.item.productName != 'Light Helmet'
GROUP BY other_purchased_products
ORDER BY quantity DESC;
Users who bought product A also bought product B
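The query combines two segments: first find the visitors who bought product A, then count everything else those visitors bought. A plain-Python sketch of the same two steps over hypothetical `(visitor_id, product_name)` purchase pairs; like the query, it uses a substring match to find buyers and an exact match to exclude the product itself.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def also_bought(purchases: Iterable[Tuple[str, str]], product: str) -> List[Tuple[str, int]]:
    purchases = list(purchases)  # iterated twice below
    # Segment 1: visitors whose purchases include the product (like CONTAINS)
    buyers = {visitor for visitor, name in purchases if product in name}
    # Segment 2: every other product bought by those visitors
    counts = Counter(name for visitor, name in purchases
                     if visitor in buyers and name != product)
    return counts.most_common()  # like ORDER BY quantity DESC


purchases = [
    ("u1", "Light Helmet"), ("u1", "Gloves"), ("u1", "Gloves"),
    ("u2", "Light Helmet"), ("u2", "Lamp"),
    ("u3", "Gloves"),  # never bought the helmet, so not counted
]
print(also_bought(purchases, "Light Helmet"))  # [('Gloves', 2), ('Lamp', 1)]
```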
SELECT prod_name, count(*) AS transactions
FROM (
  SELECT fullVisitorId, min(date) AS date, visitId, hits.item.productName AS prod_name
  FROM (
    SELECT fullVisitorId, date, visitId, totals.transactions, hits.item.productName
    FROM (TABLE_DATE_RANGE([dataset.ga_sessions_], TIMESTAMP('2014-06-01'), TIMESTAMP('2014-06-14')))
  )
  WHERE fullVisitorId IN (
    SELECT fullVisitorId
    FROM (TABLE_DATE_RANGE([dataset.ga_sessions_], TIMESTAMP('2014-06-01'), TIMESTAMP('2014-06-14')))
    GROUP BY fullVisitorId
    HAVING SUM(totals.transactions) > 1
  )
  AND hits.item.productName IS NOT NULL
  GROUP BY fullVisitorId, visitId, prod_name
  ORDER BY fullVisitorId DESC
)
GROUP BY prod_name
ORDER BY transactions DESC;
* Example query from the LunaMetrics blog. Check it out for more awesomeness.
Products that are purchased and lead to other products being purchased
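The LunaMetrics query works in two steps: keep only visitors with more than one transaction over the date range, then count in how many distinct visits each product appears for them. A plain-Python sketch of that logic over hypothetical, already date-filtered session rows (each with a list of purchased product names):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def products_among_repeat_buyers(sessions: Iterable[dict]) -> List[Tuple[str, int]]:
    """For visitors with more than one transaction in total, count in how many
    (visitor, visit) pairs each product appears."""
    sessions = list(sessions)  # iterated twice below
    tx_per_visitor: Counter = Counter()
    for s in sessions:
        # SUM() ignores NULLs, so a missing transactions value counts as 0
        tx_per_visitor[s["fullVisitorId"]] += s.get("transactions") or 0
    repeat_buyers = {v for v, n in tx_per_visitor.items() if n > 1}
    # A set keeps one entry per (visitor, visit, product), like the inner GROUP BY
    seen = {
        (s["fullVisitorId"], s["visitId"], product)
        for s in sessions
        if s["fullVisitorId"] in repeat_buyers
        for product in s["products"]
    }
    return Counter(product for _, _, product in seen).most_common()


sessions = [
    {"fullVisitorId": "v1", "visitId": 1, "transactions": 1, "products": ["A", "B"]},
    {"fullVisitorId": "v1", "visitId": 2, "transactions": 1, "products": ["A"]},
    {"fullVisitorId": "v2", "visitId": 1, "transactions": 1, "products": ["B"]},
    {"fullVisitorId": "v3", "visitId": 1, "transactions": 2, "products": ["C", "C"]},
]
print(products_among_repeat_buyers(sessions))  # 'A' first with 2; 'B' and 'C' with 1 each
```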
SELECT fullVisitorId, visitId, visitNumber, hits.page.pagePath
FROM playground.ga_sessions_20140621
WHERE hits.type = 'PAGE'
ORDER BY fullVisitorId, visitId, hits.hitNumber ASC
Identify user path/user actions
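The query returns page hits in order within each visit; turning those rows into one path per visit is a small grouping step. A sketch over hypothetical flattened hit rows (one dict per hit, using the same field names as the query):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def page_paths(hits: Iterable[dict]) -> Dict[Tuple[str, int], List[str]]:
    """Ordered list of pages per (visitor, visit), like the query's
    WHERE hits.type = 'PAGE' ORDER BY fullVisitorId, visitId, hits.hitNumber."""
    paths: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    ordered = sorted(hits, key=lambda h: (h["fullVisitorId"], h["visitId"], h["hitNumber"]))
    for h in ordered:
        if h["type"] == "PAGE":  # skip events and other hit types
            paths[(h["fullVisitorId"], h["visitId"])].append(h["pagePath"])
    return dict(paths)


hits = [
    {"fullVisitorId": "u1", "visitId": 1, "hitNumber": 2, "type": "PAGE", "pagePath": "/product"},
    {"fullVisitorId": "u1", "visitId": 1, "hitNumber": 1, "type": "PAGE", "pagePath": "/home"},
    {"fullVisitorId": "u1", "visitId": 1, "hitNumber": 3, "type": "EVENT", "pagePath": "/product"},
    {"fullVisitorId": "u1", "visitId": 1, "hitNumber": 4, "type": "PAGE", "pagePath": "/checkout"},
]
print(page_paths(hits))  # {('u1', 1): ['/home', '/product', '/checkout']}
```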
individual users' data is awesome
Cross CRM data with individual users' actions to see how your response to incidents affects your users.
Use the “frequently bought together” query and find users who didn't buy the related products. Send an e-mail campaign with an offer for those products.
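Building the campaign list is a set difference per visitor: among people who bought the main product, which related products are still missing? A minimal sketch over hypothetical `(visitor_id, product_name)` purchase pairs; the related-products list would come from a "frequently bought together" result.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def missing_related(purchases: Iterable[Tuple[str, str]], product: str,
                    related: Iterable[str]) -> Dict[str, List[str]]:
    """For each visitor who bought `product`, list the related products they
    have not bought yet (candidates for the e-mail offer)."""
    related_set = set(related)
    bought: Dict[str, Set[str]] = defaultdict(set)
    for visitor, name in purchases:
        bought[visitor].add(name)
    return {
        visitor: sorted(related_set - items)
        for visitor, items in bought.items()
        if product in items and related_set - items
    }


purchases = [
    ("u1", "Light Helmet"), ("u1", "Gloves"),
    ("u2", "Light Helmet"),
    ("u3", "Gloves"),  # didn't buy the helmet: not a target
    ("u4", "Light Helmet"), ("u4", "Gloves"), ("u4", "Lamp"),  # already has everything
]
print(missing_related(purchases, "Light Helmet", ["Gloves", "Lamp"]))
# {'u1': ['Lamp'], 'u2': ['Gloves', 'Lamp']}
```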
integrating with external data sources
* Connectors/REST API
* Export into GCS (Google Cloud Storage)
* Import into BigQuery
What if I don't have a GA Premium account?
just send your own data
Define a data structure that fits your needs (or replicate the one GA provides), use a JS snippet to send data to your server, then to BigQuery**
...you will miss many of the GA dimensions, but you can keep using GA and use BigQuery only for your unsampled data
datawaki
** If you want to do this without managing your own servers, we can help you
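On the server side, one option is to write the collected sessions as newline-delimited JSON, a format BigQuery can load, including nested and repeated fields. A minimal sketch; the record layout and field names below are hypothetical and are not the GA export schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Iterable, List, Optional


# Hypothetical schema: field names are illustrative, not GA's export schema
@dataclass
class Hit:
    hit_number: int
    type: str        # e.g. "PAGE" or "EVENT"
    page_path: str
    timestamp: str   # ISO 8601, as sent by the tracking snippet


@dataclass
class Session:
    visitor_id: str
    session_id: str
    is_mobile: bool
    country: Optional[str] = None
    hits: List[Hit] = field(default_factory=list)


def to_ndjson(sessions: Iterable[Session]) -> str:
    """One JSON object per line; the nested hits become a repeated record."""
    return "".join(json.dumps(asdict(s)) + "\n" for s in sessions)


sessions = [
    Session("u1", "s1", True, "Spain",
            [Hit(1, "PAGE", "/", "2014-12-03T10:00:00Z"),
             Hit(2, "PAGE", "/pricing", "2014-12-03T10:01:30Z")]),
    Session("u2", "s2", False),
]
print(to_ndjson(sessions), end="")
```

Keeping hits nested inside each session mirrors the one-row-per-session layout used by the queries earlier in the talk.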
BigQuery pricing
$20 per stored TB per month
A site with 50M pageviews would pay less than $10 a month for every 6 months' worth of data
$5 per processed TB*
*the first TB every month is free of charge
** GA Premium users get $500 of free credit monthly
For GA Premium users, BigQuery is effectively free*
*unless you upload huge amounts of external data or run huge queries
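The pricing on these slides reduces to simple arithmetic. A sketch of a monthly estimate using the figures quoted in the talk as defaults ($20 per stored TB, $5 per processed TB after a free first TB, $500 of Premium credit); prices change over time, so they are parameters, and applying the credit to the whole monthly bill is a simplifying assumption.

```python
def monthly_bigquery_cost(stored_tb: float, processed_tb: float, credit: float = 0.0,
                          storage_per_tb: float = 20.0, query_per_tb: float = 5.0,
                          free_query_tb: float = 1.0) -> float:
    """Estimated monthly bill in dollars, using the prices quoted on the slides."""
    storage = stored_tb * storage_per_tb
    # Only the processed volume beyond the free allowance is billed
    queries = max(0.0, processed_tb - free_query_tb) * query_per_tb
    # Assumption: the credit can offset the whole bill, never below zero
    return max(0.0, storage + queries - credit)


print(monthly_bigquery_cost(0.5, 3))              # 20.0
print(monthly_bigquery_cost(0.5, 3, credit=500))  # 0.0
```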
Want to know more? https://cloud.google.com/products/bigquery/
https://datawaki.com
Need help? https://teowaki.com/services
Thanks!
Javier Ramírez @supercoco9