Beefing Up BigQuery Cost Data with INFORMATION_SCHEMA and Audit Logs
To deepen your BigQuery cost insights beyond raw billing data, this article shows you how to break down usage by jobs and tables using INFORMATION_SCHEMA and Audit Logs. It highlights the most valuable parts of INFORMATION_SCHEMA.JOBS and TABLE_STORAGE_USAGE_TIMELINE, and compares this approach with Audit Logs.

To augment your billing data with more details, there are two primary technical data sources in BigQuery: the INFORMATION_SCHEMA and the Audit Logs. Compared to the Billing Exports, these provide more granular information that can be used to apportion costs at the job and table levels.
Using the INFORMATION_SCHEMA
Before we dive in, let's clarify what the INFORMATION_SCHEMA is. It is a set of read-only system views that let you query valuable metadata about jobs (queries), tables, datasets, and more. It is very convenient to use, as you can use SQL to return and analyze the data, as opposed to more complicated API calls.
The most valuable parts of the INFORMATION_SCHEMA from a FinOps perspective are found in these two views:
- For jobs/queries, which consume much of the compute resources, the INFORMATION_SCHEMA.JOBS view provides detailed insight.
- For tables, which are in direct relation to your storage consumption, the INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE view provides the daily usage details over time.
The INFORMATION_SCHEMA.JOBS View
The INFORMATION_SCHEMA.JOBS view, also known as the INFORMATION_SCHEMA.JOBS_BY_PROJECT view, provides the data to break down cost by job/query, identified by job_id, and it offers various facts and dimensions useful for usage and cost apportionment.
The usage facts for apportionment are:
- total_bytes_billed for on-demand consumption, which is based on scanned bytes, with a minimum billing of 10 MB, even if less data was scanned.
- total_slot_ms for capacity-based consumption, which is based on slot usage measured in time (milliseconds). When estimating costs, reservations and commitments also need to be considered, as the baseline will need to be paid even if not used, and the unit price will depend on the level of commitments.
Then, to slice and dice the above consumption facts, the most crucial usage dimensions are:
- projects with project_id,
- reservations with reservation_id,
- user or service account with user_email,
- timeline with start_time and end_time.
If this still does not suffice, even more details are available for more technically inspired analyses:
- To distinguish job and statement types, there are job_type and statement_type, respectively.
- To analyse cache hits (e.g., their ratio to all jobs), you can use the boolean cache_hit.
- To understand more about errors, you can check out error_result, which is null on success and contains details about the error in JSON on failure.
- To relate jobs to tables, use referenced_tables or destination_table.
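To make this more tangible, here is a minimal sketch of a query against this view. The region-us qualifier, the 30-day look-back window, and the chosen grouping are assumptions to adapt to your own setup:

```sql
-- Sketch: daily usage facts per project, user, and reservation from INFORMATION_SCHEMA.JOBS.
-- The region qualifier and the 30-day window are assumptions; adjust to your setup.
SELECT
  project_id,
  user_email,
  reservation_id,
  DATE(creation_time) AS usage_date,
  SUM(total_bytes_billed) AS total_bytes_billed,
  SUM(total_slot_ms) AS total_slot_ms
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  -- Exclude parent script rows to avoid double-counting (see Tips & Tricks below).
  AND statement_type != 'SCRIPT'
GROUP BY
  project_id, user_email, reservation_id, usage_date
ORDER BY
  total_bytes_billed DESC;
```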
The TABLE_STORAGE_USAGE_TIMELINE View
The INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE view (different from the INFORMATION_SCHEMA.TABLE_STORAGE view!) provides daily granularity data over time to break down storage costs at the table level, identified by TABLE_NAME.
There are various storage usage facts, which are all measured in MiB and all start with BILLABLE_ by convention:
- For logical storage (total, active, long-term): BILLABLE_TOTAL_LOGICAL_USAGE, BILLABLE_ACTIVE_LOGICAL_USAGE, BILLABLE_LONG_TERM_LOGICAL_USAGE
- For physical storage (total, active, long-term): BILLABLE_TOTAL_PHYSICAL_USAGE, BILLABLE_ACTIVE_PHYSICAL_USAGE, BILLABLE_LONG_TERM_PHYSICAL_USAGE
Then the key dimensions for grouping and aggregating your consumption are:
- projects with PROJECT_ID,
- datasets with TABLE_SCHEMA (which practically contains the dataset ID),
- timeline with USAGE_DATE (as this view provides daily granularity data).
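To illustrate, here is a minimal sketch that rolls up billable storage per dataset and day, converting MiB to TiB along the way. The region-us qualifier and the focus on the total logical/physical columns are assumptions to adapt:

```sql
-- Sketch: daily billable storage per dataset, converted from MiB to TiB.
-- The region qualifier is an assumption; run this per region you use.
SELECT
  PROJECT_ID,
  TABLE_SCHEMA AS dataset_id,
  USAGE_DATE,
  SUM(BILLABLE_TOTAL_LOGICAL_USAGE) / POW(1024, 2) AS total_logical_tib,
  SUM(BILLABLE_TOTAL_PHYSICAL_USAGE) / POW(1024, 2) AS total_physical_tib
FROM
  `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE
GROUP BY
  PROJECT_ID, dataset_id, USAGE_DATE
ORDER BY
  USAGE_DATE DESC, total_logical_tib DESC;
```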
Tips & Tricks, Plus Some Limitations to Watch Out For
For enterprise-scale users with multiple projects and regions, additional work is required to obtain a complete view of a payer account. There is no consolidated view that gives you the full picture out of the box, so you will need to consider a billing period, and then:
- Identify all project IDs from the billing export.
- Iterate through all projects AND regions to collect data from the INFORMATION_SCHEMA.
- Finally, consolidate the results.
To accomplish this, you will need the necessary permissions to query this data. Otherwise, your query will return an incomplete result or an error.
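As a minimal sketch of the iteration idea, you can union the region-qualified JOBS views of the projects identified in the billing export. The project IDs and regions below are hypothetical placeholders; in practice you would generate such a statement (or script the loop) from the billing export rather than hard-code it:

```sql
-- Sketch: consolidate job-level usage across projects and regions.
-- my-project-a, my-project-b, and the regions are hypothetical placeholders.
SELECT project_id, user_email, creation_time, total_bytes_billed, total_slot_ms
FROM `my-project-a`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
UNION ALL
SELECT project_id, user_email, creation_time, total_bytes_billed, total_slot_ms
FROM `my-project-a`.`region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
UNION ALL
SELECT project_id, user_email, creation_time, total_bytes_billed, total_slot_ms
FROM `my-project-b`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT;
```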
Job and table consumption is measured in various units and scales (e.g., bytes vs. MiB). I recommend converting them to the same scale. This is also useful for matching the pricing sheet and the pricing calculator units, where the preferred scales of measurement are TiB for on-demand consumption (data scanned) and storage, and slot-hours for capacity-based compute consumption. Here is how you can do the conversions in SQL:
- Bytes to TiB for on-demand consumption: total_bytes_billed / POW(1024, 4).
- MiB to TiB for storage consumption: BILLABLE_*_USAGE / POW(1024, 2).
- Milliseconds to hours for capacity-based consumption: SAFE_DIVIDE(total_slot_ms, 1000*60*60).
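Putting the conversions together, here is a hedged sketch that also multiplies by unit prices to get rough cost estimates. The 6.25 USD/TiB and 0.04 USD/slot-hour figures are placeholders only; take the actual rates (and which pricing model applies to which jobs) from your own pricing sheet or contract:

```sql
-- Sketch: convert raw usage into pricing-sheet units and rough cost estimates.
-- The unit prices are illustrative placeholders, not authoritative rates, and the
-- capacity estimate ignores reservation baselines and commitment discounts.
SELECT
  project_id,
  SUM(total_bytes_billed) / POW(1024, 4) AS scanned_tib,
  SAFE_DIVIDE(SUM(total_slot_ms), 1000 * 60 * 60) AS slot_hours,
  SUM(total_bytes_billed) / POW(1024, 4) * 6.25 AS est_on_demand_usd,
  SAFE_DIVIDE(SUM(total_slot_ms), 1000 * 60 * 60) * 0.04 AS est_capacity_usd
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND statement_type != 'SCRIPT'
GROUP BY
  project_id;
```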
Scripted queries have both aggregated data at the parent level and split data at the child level. To avoid double-counting, exclude the aggregate. The way to do this is to exclude the SCRIPT statement type by adding the condition WHERE statement_type != 'SCRIPT'.
The provider uses Pacific Time (PST/PDT) to determine billing months and days, and daylight saving time also applies. Make sure you take this into account when working with your data in UTC (Coordinated Universal Time) or your local time zone. The provider consistently uses the TIMESTAMP data type, which represents an absolute point in time. To interpret this in a specific time zone, BigQuery provides great convenience functions, so in this case the conversion is easy using the "America/Los_Angeles" time zone with one of these functions:
- DATE(your_timestamp, "America/Los_Angeles") for a DATE in PST/PDT, and
- DATETIME(your_timestamp, "America/Los_Angeles") for a DATETIME in PST/PDT.
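For example, here is a small sketch of grouping scanned bytes by billing day in the provider's time zone (the view, window, and metric choice are assumptions):

```sql
-- Sketch: aggregate scanned TiB per billing day in Pacific Time (PST/PDT).
SELECT
  DATE(creation_time, "America/Los_Angeles") AS billing_date,
  SUM(total_bytes_billed) / POW(1024, 4) AS scanned_tib
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND statement_type != 'SCRIPT'
GROUP BY
  billing_date
ORDER BY
  billing_date;
```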
There are also limitations for RLS, ML, and Spark.
For queries over tables with RLS (Row Level Security), BigQuery hides specific statistics, including billable usage.
For BigQuery ML (Machine Learning), on-demand pricing depends on the type of model created, which is not tracked in INFORMATION_SCHEMA.JOBS; instead, the higher-billed model type is assumed.
For Apache Spark, there is a limitation in SKU-level reporting that makes its usage indistinguishable from the BigQuery Enterprise Edition pay-as-you-go SKU.
If any of these limitations affect you, read the official docs for more details.
Using Audit Logs
While I prefer using the INFORMATION_SCHEMA for billing data enrichment, Audit Logs can also be used as an alternative for job/query cost apportionment.
BigQuery provides several types of logs, out of which the AuditData is the most interesting for FinOps. Other types include BigQueryAuditMetadata and AuditLog; however, those are less relevant for our purposes and are out of scope for this discussion.
When a job – such as a query, load, export, copy, or ML job – is done (or cancelled), a JobCompletedEvent log is emitted. This event, along with the AuthenticationInfo in the Audit Logs, provides the information required to apportion the cost of jobs. It can also be linked to the INFORMATION_SCHEMA with the matching job identifier.
Here are the most valuable parts from the Audit Log's hierarchical JSON schema:
- AuthenticationInfo (full official docs for AuthenticationInfo here)
  - principalEmail – the user or service account executing the job
- JobCompletedEvent (full official docs for BigQuery AuditData here)
  - Job
    - JobName
      - JobId – the job identifier, which can be linked to the job_id in INFORMATION_SCHEMA.JOBS
      - ProjectId – the project identifier
      - location – the region/zone
    - JobStatistics
      - Usage facts:
        - totalBilledBytes – can be used to apportion on-demand spend per job or other dimensions based on data scanned, when multiplied by the cost/unit
        - totalSlotMs – can be used to apportion spend across jobs or other dimensions for capacity-based usage (BigQuery Editions), based on slot usage time, when multiplied by the cost/unit (do not forget to account for reservations and commitments, where the baseline needs to be paid even if not used, and the level of commitments determines the unit price)
      - Useful dimensions:
        - reservation – the reservation
        - startTime and endTime – the usage timeline
        - referencedTables and referencedViews – the linked tables/views
      - Useful in calculating unit metrics:
        - queryOutputRowCount – for cost per output row
        - totalLoadOutputBytes – for cost per output byte
    - JobConfiguration
      - It holds job-type-specific low-level details in various objects: query, load, extract, tableCopy
    - JobStatus
      - state – holds the state of the job, either PENDING, RUNNING, or DONE
      - error and additionalErrors – if the job did not complete successfully, these describe the error(s)
Regarding the technical implementation, I do not have hands-on experience to describe precisely how it is done. Based on the high-level research I had done, the steps involve (1) enabling logging for the desired scope, (2) exporting the logs to BigQuery by creating a sink with the Logs Router, and (3) querying the logs via SQL and building reusable views.
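For step (3), here is a heavily hedged sketch of what such a query could look like, assuming the data-access logs were exported to a BigQuery dataset via a Logs Router sink. The my_logs.cloudaudit_googleapis_com_data_access table name and the nested column paths follow the standard export schema for the AuditData format, but treat them as assumptions to verify against your own sink:

```sql
-- Sketch: job-level usage facts from Audit Logs exported to BigQuery.
-- The table name and nested field paths are assumptions based on the standard
-- data_access export schema; verify them against your own Logs Router sink.
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail AS principal_email,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobName.projectId AS project_id,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobName.jobId AS job_id,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes AS total_billed_bytes,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalSlotMs AS total_slot_ms
FROM
  `my_logs.cloudaudit_googleapis_com_data_access`
WHERE
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.eventName = 'query_job_completed';
```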
However, as I prefer the INFORMATION_SCHEMA, I will let you do your own research, should you choose this path. If you do so, you will find the official docs a good companion for your journey.
Now, let's explore why I prefer the other path.
Why I Like the INFORMATION_SCHEMA Better
There are many reasons why I think the INFORMATION_SCHEMA is a better choice, and the most important ones are:
- It contains more information compared to Audit Logs, especially concerning table (storage) data.
- It was designed to serve as a metadata repository, as opposed to Audit Logs, which, as the name suggests, are more event-focused and were primarily intended for usage audit purposes.
- Its storage does not incur additional costs, unlike Audit Logs, whose storage you will need to pay for if you export them to BigQuery. Analyses (compute) will still cost money in both cases.
- It can be queried with simple SQL right out of the box in a tabular format, unlike Audit Logs, for which the native hierarchical structure requires more thinking and unnesting.
So, that's what I wanted to share about enriching your billing data for detailed BigQuery insights and cost and usage apportionment.
I hope you learned something and enjoyed it. It was my pleasure!
If you haven't done it yet, check out my related writing about FinOps for BigQuery:
