BigQuery bills for queries based on the amount of data they process, and Google gives you a generous 1TB of query data per month for free. Because of this it is helpful to be able to track these billable bytes and avoid unpleasant billing surprises. This article shows one way of doing that via the audit logs: a sink to BigQuery, plus some example queries to get the data out for reports like this:

Permissions

Before you start, you will need the right permissions to complete the task. I found these roles worked:

  • BigQuery Admin
  • Logs Configuration Writer
  • Service Usage Consumer

Audit Logs

The Google Cloud Logs Explorer is where you can see all the audit logs. In this case we are interested in just the BigQuery queries (jobs) that have billable bytes. You can view those by entering the following into the log filter, running the query, and then selecting an appropriate time range using “Edit time”. This link runs the filter for you.

protoPayload.methodName="jobservice.jobcompleted"
resource.type="bigquery_resource"
protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalBilledBytes!="0"

The billable bytes for each job are inside the details of the log entry.

Create a Sink to BigQuery

The next step is to automate sending these logs to BigQuery. This is done by creating a sink. You can start creating one via “More actions” -> “Create sink”.

Give it a name and a description. I chose “bigquery-jobs-to-bigquery” and “Sending BigQuery Jobs to BigQuery”. Then click Next.

For step two, select the “BigQuery dataset” sink service, then choose to create a new dataset.

Give the new dataset the ID “bq_logs”. I also enabled a 31 day table expiration to stop it from growing too large. When ready, click the “Create Dataset” button.
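
As an aside, if you prefer SQL to the console, an equivalent dataset can be created with DDL like the sketch below (you would then point the sink at this existing dataset rather than creating a new one; the description text is just an example):

  CREATE SCHEMA IF NOT EXISTS `bq_logs`
  OPTIONS (
    -- Expire tables after 31 days so the dataset does not grow unbounded.
    default_table_expiration_days = 31,
    description = 'BigQuery job audit logs'
  );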

This brings us back to the sink steps. Click Next and you should see our filter already applied. Click Next again.

And finally, click “Create Sink”.

After a while, a table will be added to the dataset containing the new job logs.
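
Once logs start flowing, you can sanity-check the export with a quick query; the wildcard matches the date-sharded audit log tables the sink creates:

  SELECT
    COUNT(*) AS log_rows,
    MIN(timestamp) AS earliest_log,
    MAX(timestamp) AS latest_log
  FROM `bq_logs.cloudaudit_googleapis_com_data_access_*`;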

Querying the Job Logs

For our solution I schedule a query every day that extracts new rows and inserts them into a table for use in our reports. Here is the SELECT part of that query (a sketch of the full INSERT follows below):

SELECT
  TIMESTAMP_TRUNC(timestamp, DAY) AS day_timestamp,
  protopayload_auditlog.authenticationInfo.principalEmail AS principal_email,
  SUM(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes) AS billed_bytes,
  COUNT(1) AS billed_query_count,
  COUNTIF(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatus.error.message IS NOT NULL) AS error_count,
  CAST((EXTRACT(DAY FROM CURRENT_DATE()) * 1000 * 1000 * 1000 * 1000) / EXTRACT(DAY FROM LAST_DAY(CURRENT_DATE())) AS INT64) AS budget_trendline_bytes
FROM
  `bq_logs.cloudaudit_googleapis_com_data_access_*`
GROUP BY 1, 2
ORDER BY day_timestamp DESC, principal_email;

In this case I added a budget_trendline_bytes column, which draws a straight line from zero at the start of the month to 1TB at the end; for example, on the 15th of a 30-day month it sits at 0.5TB. If your billed_bytes goes over the line, you are on the way to having a bill at the end of the month.
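
For completeness, here is a sketch of what the full scheduled query might look like. It assumes the destination table is `tag_rocket.query_logs` (the table used by the UPDATE statements below) and uses a hypothetical dedupe rule that treats a day as new when it has no rows in the destination yet; adjust to however you define “new rows”:

  INSERT INTO `tag_rocket.query_logs`
    (day_timestamp, principal_email, billed_bytes, billed_query_count,
     error_count, budget_trendline_bytes)
  SELECT
    TIMESTAMP_TRUNC(timestamp, DAY) AS day_timestamp,
    protopayload_auditlog.authenticationInfo.principalEmail AS principal_email,
    SUM(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes) AS billed_bytes,
    COUNT(1) AS billed_query_count,
    COUNTIF(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatus.error.message IS NOT NULL) AS error_count,
    CAST((EXTRACT(DAY FROM CURRENT_DATE()) * 1000 * 1000 * 1000 * 1000) / EXTRACT(DAY FROM LAST_DAY(CURRENT_DATE())) AS INT64) AS budget_trendline_bytes
  FROM `bq_logs.cloudaudit_googleapis_com_data_access_*`
  -- Only pick up days that are not in the destination table yet.
  WHERE TIMESTAMP_TRUNC(timestamp, DAY) NOT IN (
      SELECT day_timestamp FROM `tag_rocket.query_logs`
      WHERE day_timestamp IS NOT NULL)
  GROUP BY 1, 2;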

I post-processed my table to add a rolling 31 day total. This provides a nice line for day-to-day estimates of how the month will pan out.

  UPDATE `tag_rocket.query_logs` AS MAIN
  SET rolling_total_bytes = (
        SELECT SUM(billed_bytes)
        FROM `tag_rocket.query_logs` AS SUB
        -- day_timestamp is a TIMESTAMP, so use TIMESTAMP_SUB (not DATE_SUB).
        WHERE SUB.day_timestamp <= MAIN.day_timestamp
          AND SUB.day_timestamp > TIMESTAMP_SUB(MAIN.day_timestamp, INTERVAL 31 DAY)
      )
  WHERE rolling_total_bytes IS NULL;

I also added a month-to-date field that shows a cumulative line of how the month's billable bytes are progressing.

  UPDATE `tag_rocket.query_logs` AS MAIN
  SET month_to_date_bytes = (
        SELECT SUM(billed_bytes)
        FROM `tag_rocket.query_logs` AS SUB
        WHERE SUB.day_timestamp <= MAIN.day_timestamp
          AND EXTRACT(MONTH FROM SUB.day_timestamp) = EXTRACT(MONTH FROM MAIN.day_timestamp)
          AND EXTRACT(YEAR FROM SUB.day_timestamp) = EXTRACT(YEAR FROM MAIN.day_timestamp)
      )
  WHERE month_to_date_bytes IS NULL;
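
As an aside, both backfills could be computed in a single pass with window functions instead of correlated subqueries. Here is a sketch over the same table; like the UPDATEs above, the totals deliberately sum across all principals:

  SELECT
    day_timestamp,
    principal_email,
    billed_bytes,
    -- The current day plus the previous 30 days: a 31 day window.
    SUM(billed_bytes) OVER (
      ORDER BY UNIX_DATE(DATE(day_timestamp))
      RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
    ) AS rolling_total_bytes,
    -- Cumulative total within each calendar month.
    SUM(billed_bytes) OVER (
      PARTITION BY TIMESTAMP_TRUNC(day_timestamp, MONTH)
      ORDER BY day_timestamp
    ) AS month_to_date_bytes
  FROM `tag_rocket.query_logs`;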

These two extra fields let me build the nice Z-chart shown at the start of the article.
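
For reference, a chart tool can read the daily series straight off the table with something like this sketch, which collapses the per-principal rows into one row per day:

  SELECT
    DATE(day_timestamp) AS day,
    SUM(billed_bytes) AS billed_bytes,
    -- These three columns are identical across a day's rows.
    ANY_VALUE(budget_trendline_bytes) AS budget_trendline_bytes,
    ANY_VALUE(rolling_total_bytes) AS rolling_total_bytes,
    ANY_VALUE(month_to_date_bytes) AS month_to_date_bytes
  FROM `tag_rocket.query_logs`
  GROUP BY day
  ORDER BY day DESC;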