BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery you can query terabytes of data without needing a database administrator or any infrastructure to manage. BigQuery uses familiar SQL and a pay-only-for-what-you-use pricing model, letting you focus on analyzing data to find meaningful insights.
In this codelab, you'll use the bq command-line tool to load a local CSV file into a new BigQuery table.
What you'll learn
- Using BigQuery via the bq command-line tool
- Loading local data files into a BigQuery table
What you'll need
If you don't already have a Google Account (Gmail or Google Apps), you must create one.
- Sign in to the Google Cloud Platform console (console.cloud.google.com) and navigate to BigQuery. You can also open the BigQuery web UI directly by entering its URL in your browser.
- Accept the terms of service.
- Before you can use BigQuery, you must create a project. Follow the prompts to create your new project.
Choose a project name and make note of the project ID.
The project ID is a unique name across all Google Cloud projects. It will be referred to later in this codelab as PROJECT_ID.
This codelab uses BigQuery resources within the BigQuery sandbox limits. A billing account is not required. If you later want to remove the sandbox limits, you can add a billing account by signing up for the Google Cloud Platform free trial.
Google Cloud Shell
While this codelab can be completed from your own computer, we will use Google Cloud Shell, a command-line environment running in the cloud.
Activate Google Cloud Shell
From the GCP Console click the Cloud Shell icon on the top right toolbar:
Then click "Start Cloud Shell":
It should only take a few moments to provision and connect to the environment:
This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs on Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done simply with a browser or your Google Chromebook.
Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.
Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
Run the following command to confirm that the project is set:

gcloud config list project
[core]
project = <PROJECT_ID>
If it is not, you can set it with this command:
gcloud config set project <PROJECT_ID>
Updated property [core/project].
Create a dataset to contain your tables.
Creating a dataset
In Cloud Shell, use the bq mk command to create a dataset called bq_load_codelab:

bq mk bq_load_codelab
Viewing dataset properties
Verify that you created the dataset by viewing the dataset's properties with the bq show command:

bq show bq_load_codelab
You should see output similar to:
Dataset my-project:bq_load_codelab

   Last modified                    ACLs                    Labels
 ----------------- ---------------------------------------- --------
  15 Jun 14:12:49   Owners:
                      projectOwners,
                      firstname.lastname@example.org
                    Writers:
                      projectWriters
                    Readers:
                      projectReaders
Create a CSV file
In Cloud Shell, create an empty CSV file and open it in the code editor by running the cloudshell edit command:

cloudshell edit customer_transactions.csv
It will open a new browser window with a code editor and Cloud Shell panel.
In the editor, enter some comma-separated values to load into BigQuery.
ID,Zipcode,Timestamp,Amount,Feedback,SKU
c123,78757,2018-02-14 17:01:39Z,1.20,4.7,he4rt5
c456,10012,2018-03-14 15:09:26Z,53.60,3.1,ppiieee
c123,78741,2018-04-01 05:59:47Z,5.98,2.0,ch0c0
Save the CSV by using the File > Save menu item.
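If you prefer to stay entirely in the terminal, the same file can be created with a heredoc instead of the editor. This is an optional alternative, not part of the original steps:

```shell
# Write the sample transactions to customer_transactions.csv in one step.
cat > customer_transactions.csv <<'EOF'
ID,Zipcode,Timestamp,Amount,Feedback,SKU
c123,78757,2018-02-14 17:01:39Z,1.20,4.7,he4rt5
c456,10012,2018-03-14 15:09:26Z,53.60,3.1,ppiieee
c123,78741,2018-04-01 05:59:47Z,5.98,2.0,ch0c0
EOF

# Confirm the file has a header row plus three data rows.
wc -l customer_transactions.csv
```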
In Cloud Shell, use the bq load command to load your CSV file into a BigQuery table:
bq load \
    --source_format=CSV \
    --skip_leading_rows=1 \
    bq_load_codelab.customer_transactions \
    ./customer_transactions.csv \
    id:string,zip:string,ttime:timestamp,amount:numeric,fdbk:float,sku:string
You used the following options:
- --source_format=CSV uses the CSV data format when parsing the data file.
- --skip_leading_rows=1 skips the first line in the CSV file, because it is a header row.
- bq_load_codelab.customer_transactions, the first positional argument, defines which table the data should be loaded into.
- ./customer_transactions.csv, the second positional argument, defines which file to load. In addition to local files, the bq load command can load files from Cloud Storage by using gs:// URIs.
- The schema, which can be defined in a JSON schema file or as a comma-separated list. This codelab uses a comma-separated list for simplicity.
The customer_transactions table uses the following schema:
- id:string: A customer identifier.
- zip:string: A United States postal zip code.
- ttime:timestamp: The date and time when the transaction took place.
- amount:numeric: The amount of the transaction. A numeric column stores data in decimal form, which is useful for monetary values.
- fdbk:float: The rating from a feedback survey about the transaction.
- sku:string: An identifier for the purchased item.
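For reference, the same schema could instead be written as a JSON schema file and passed to bq load in place of the inline comma-separated list. This sketch is illustrative; the field descriptions are assumptions, not part of the codelab:

```json
[
  {"name": "id",     "type": "STRING",    "description": "Customer identifier"},
  {"name": "zip",    "type": "STRING",    "description": "US postal zip code"},
  {"name": "ttime",  "type": "TIMESTAMP", "description": "Transaction date and time"},
  {"name": "amount", "type": "NUMERIC",   "description": "Transaction amount"},
  {"name": "fdbk",   "type": "FLOAT",     "description": "Feedback survey rating"},
  {"name": "sku",    "type": "STRING",    "description": "Purchased item identifier"}
]
```

If you saved this as, say, schema.json, you would pass its path as the schema argument of bq load instead of the inline list.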
Get the table details
Verify that the table loaded by showing the table properties.
bq show bq_load_codelab.customer_transactions
Table my-project:bq_load_codelab.customer_transactions

   Last modified          Schema          Total Rows   Total Bytes
 ----------------- --------------------- ------------ -------------
  15 Jun 15:13:55   |- id: string             3            159
                    |- zip: string
                    |- ttime: timestamp
                    |- amount: numeric
                    |- fdbk: float
                    |- sku: string
Now that your data is loaded, you can query it by using the BigQuery web UI, the bq command, or the API. Your queries can join your data against any dataset (or datasets, as long as they are all in the same location) that you have permission to read.
Run a standard SQL query that joins your dataset with the zipcode public dataset and sums up transactions by U.S. state. Use the bq query command to execute the query:
bq query --nouse_legacy_sql '
SELECT SUM(c.amount) AS amount_total,
       z.state_code AS state_code
FROM `bq_load_codelab.customer_transactions` c
JOIN `bigquery-public-data.utility_us.zipcode_area` z
ON c.zip = z.zipcode
GROUP BY state_code
'
This command should output something like:
Waiting on bqjob_r26...05a15b38_1 ... (1s) Current status: DONE
+--------------+------------+
| amount_total | state_code |
+--------------+------------+
|         53.6 | NY         |
|         7.18 | TX         |
+--------------+------------+
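As a quick sanity check of those totals, you can reproduce the aggregation locally from the CSV with awk, hardcoding the zipcode-to-state mapping for the three sample rows (zipcode 10012 is in NY; 78757 and 78741 are in TX). This is a rough illustration, not part of the codelab:

```shell
# Recreate the sample data so this check is self-contained.
cat > /tmp/customer_transactions.csv <<'EOF'
ID,Zipcode,Timestamp,Amount,Feedback,SKU
c123,78757,2018-02-14 17:01:39Z,1.20,4.7,he4rt5
c456,10012,2018-03-14 15:09:26Z,53.60,3.1,ppiieee
c123,78741,2018-04-01 05:59:47Z,5.98,2.0,ch0c0
EOF

# Sum the Amount column per state, mirroring the GROUP BY in the query.
awk -F, 'NR > 1 {
  state = ($2 == "10012") ? "NY" : "TX"
  total[state] += $4
}
END { for (s in total) printf "%s %.2f\n", s, total[s] }' /tmp/customer_transactions.csv | sort
```

This prints NY 53.60 and TX 7.18, matching the amounts returned by BigQuery.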
The query you just ran used both a public dataset and your own private dataset. Learn more by reading this commented version of the same query:
#standardSQL
SELECT
  /* Total of all transactions in the state. */
  SUM(c.amount) AS amount_total,
  /* State corresponding to the transaction's zipcode. */
  z.state_code AS state_code
/* Query the table you just constructed.
 * Note: If you omit the project from the table ID,
 * the dataset is read from your project. */
FROM `bq_load_codelab.customer_transactions` c
/* Join the table to the zipcode public dataset. */
JOIN `bigquery-public-data.utility_us.zipcode_area` z
/* Find the state corresponding to the transaction's zipcode. */
ON c.zip = z.zipcode
/* Group over all transactions by state. */
GROUP BY state_code
Optionally, delete the dataset that you created with the bq rm command. Use the -r flag to remove any tables that it contains:
bq rm -r bq_load_codelab
You just uploaded a table to BigQuery and queried it!
What we've covered
- Using the bq command-line tool to interact with BigQuery.
- Joining your own data against a public dataset with a BigQuery query.