Google Analytics via BigQuery

When connecting Google Analytics (GA) to BigQuery, your data is stored in a format known as sharding. Table sharding is the practice of storing data in multiple tables, using a naming prefix such as [PREFIX]_YYYYMMDD. This presents a challenge when doing analytics on the data, as the data is not in a single table (the BigQuery documentation also recommends against using table sharding).

When importing this directly into Datawisp, Datawisp would treat each shard as a separate table -- this would lead to a very poor experience overall as you end up with hundreds of tables when you logically only really want one.

In order to solve this, we have created this guide to show you how to create a view in BigQuery that will present the data in a single table to Datawisp.

Create a view in BigQuery

First, we need to create a view in BigQuery that will present the data in a single table to Datawisp. Open the BigQuery console create a new SQL query.

There, paste the following SQL statements:

CREATE OR REPLACE VIEW `<dataset>.ga_events` AS 
SELECT
  * EXCEPT(event_params),
  parse_json((SELECT 
      CONCAT(
      '{',
        STRING_AGG(
          CONCAT(
            TO_JSON_STRING(ep.key),
            ': ',
            COALESCE(
              NULLIF(TO_JSON_STRING(ep.value.int_value), 'null'),
              NULLIF(TO_JSON_STRING(ep.value.string_value), 'null'),
              NULLIF(TO_JSON_STRING(ep.value.double_value), 'null'),
              NULLIF(TO_JSON_STRING(ep.value.float_value), 'null'),
              'null'
            )
          )
        ),
      '}'
    )
    FROM UNNEST(event_params) AS ep
  ), wide_number_mode=>'round') as event_params
FROM `<dataset>.events_*`;
CREATE OR REPLACE VIEW `<dataset>.ga_users` AS select * from `<dataset>.users_*`;
CREATE OR REPLACE VIEW `<dataset>.ga_pseudonymous_users` AS select * from `<dataset>.pseudonymous_users_*`;

Google Analytics supports exporting both Event Data (events_YYYYMMDD) and User Data (users_YYYYMMDD, pseudonymous_users_YYYYMMDD).

This guide shows you how to import all of those tables into Datawisp. If you don't have some of them (e.g. because you did not enable the export of User Data), simply omit the respective steps.

For example, if you don't have any events_YYYYMMDD tables, simply remove the first CREATE VIEW statement in the above SQL snippet.

Make sure to replace <dataset> with the name of your dataset.

Then press Run.

Grant Access Permissions

At this point the GA-specific preparations are complete, so you can now get started with the generic BigQuery import guide.

Follow this guide until you reach the step Grant Access to the Dataset to the Service Account. At this point, it is important that you do not grant the BigQuery Data Viewer role on the entire dataset to the service account. Grant the BigQuery Metadata Viewer role instead.

Then, on the dataset go to Sharing -> Authorize Views.

Authorize the views that you created in the previous step.

Then, on each of the views, grant the BigQuery Data Viewer role.

This setup effectively hides all the sharded tables from Datawisp and only grants access to the views.

Then, proceed with the rest of the guide.

Datawisp will now be able to import your GA data correctly.

Last updated

Logo

Copyright 2023 Datawisp, Inc.