BigQuery is a fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics. BigQuery is serverless: you do not need to set up or manage clusters.

A BigQuery dataset resides in a GCP project and contains one or more tables. You can query these datasets with SQL.

In this codelab, you will use the BigQuery web UI in the GCP Console to query public datasets, load your own data, and export data to a Google Cloud Storage bucket.

For more information about BigQuery, see the BigQuery documentation.

What you'll learn

  • How to query public datasets
  • How to load data into a table
  • How to export table data

What you'll need

To complete this lab, you need:

  • The latest version of Google Chrome
  • A Google Cloud Platform billing account

To work with BigQuery, you need to create a GCP project or select an existing project.

Create a project

To create a new project, follow these steps:

  1. If you don't already have a Google Account (Gmail or Google Apps), create one.
  2. Sign in to the Google Cloud Platform Console (console.cloud.google.com) and create a new project.
  3. If you don't have any projects, click the create project button. Otherwise, create a new project from the project selection menu.
  4. Enter a project name and select Create. Note the project ID, which is a unique name across all Google Cloud projects.

BigQuery allows you to work with public datasets, including BBC News, GitHub Repos, Stack Overflow, and the US National Oceanic and Atmospheric Administration (NOAA) datasets. You do not need to load these datasets into BigQuery. You just need to open the datasets to browse and query them in BigQuery.

Browse the Stack Overflow dataset

The Stack Overflow dataset contains information about posts, tags, badges, comments, users, and more. To browse the Stack Overflow dataset in the BigQuery web UI, follow these steps:

  1. Open the Stack Overflow dataset. The BigQuery web UI opens in the GCP Console and displays information about the Stack Overflow dataset.
  2. In the navigation panel, select bigquery-public-data. The menu expands to list public datasets. Each dataset comprises one or more tables.
  3. Scroll down and select stackoverflow. The menu expands to list the tables in the Stack Overflow dataset.
  4. Select badges to see the schema for the badges table. Note the names of the fields in the table.
  5. Above the Field names, click Preview to see sample data for the badges table.

For more information about all the public datasets available in BigQuery, see Google BigQuery Public Datasets.
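
If you prefer to explore datasets programmatically, the BigQuery client libraries expose the same metadata. The following is a minimal sketch using the Python client library (not part of this codelab's web UI flow; it assumes google-cloud-bigquery is installed via pip and that application default credentials point at a billing-enabled project):

# Sketch: browse the Stack Overflow public dataset with the
# google-cloud-bigquery Python client (pip install google-cloud-bigquery).
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project and credentials

# List the tables in the public Stack Overflow dataset.
for table in client.list_tables("bigquery-public-data.stackoverflow"):
    print(table.table_id)

# Print the schema of the badges table.
badges = client.get_table("bigquery-public-data.stackoverflow.badges")
for field in badges.schema:
    print(field.name, field.field_type)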

Query the Stack Overflow dataset

Browsing a dataset is a good way to understand the data that you are working with, but querying datasets is where BigQuery really shines. This section teaches you how to run BigQuery queries. You do not need to know any SQL at this point. You can copy and paste the queries below.

To run a query, complete the following steps:

  1. Near the top right of the GCP console, select Compose new query.
  2. In the Query editor text area, copy and paste the following SQL query. BigQuery validates the query and the web UI displays a green checkmark below the text area to indicate that the syntax is valid.
#standardSQL
SELECT
  name,
  COUNT(*) AS total_granted
FROM `bigquery-public-data.stackoverflow.badges`
GROUP BY name
ORDER BY total_granted DESC
LIMIT 10
  3. Select Run. The query returns the 10 most frequently granted badge names, such as Teacher, Student, and Editor, along with the number of times each badge has been granted.
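
The same query can also be run outside the web UI. Here is a minimal sketch with the Python client library (again assuming google-cloud-bigquery is installed and credentials are configured):

# Sketch: run the badges query with the Python client library.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT name, COUNT(*) AS total_granted
FROM `bigquery-public-data.stackoverflow.badges`
GROUP BY name
ORDER BY total_granted DESC
LIMIT 10
"""

# query() starts the job; result() waits for it and returns the rows.
for row in client.query(query).result():
    print(row.name, row.total_granted)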

To run a more complex query on the same dataset, complete the following steps:

  1. Near the top right of the GCP console, select Compose new query.
  2. In the Query editor text area, copy and paste the following SQL query. BigQuery validates the query and the web UI displays a green checkmark below the text area to indicate that the syntax is valid.
#standardSQL
SELECT badge_name AS First_Gold_Badge, 
       COUNT(1) AS Num_Users,
       ROUND(AVG(tenure_in_days)) AS Avg_Num_Days
FROM
(
  SELECT 
    badges.user_id AS user_id,
    badges.name AS badge_name,
    TIMESTAMP_DIFF(badges.date, users.creation_date, DAY) AS tenure_in_days,
    ROW_NUMBER() OVER (PARTITION BY badges.user_id
                       ORDER BY badges.date) AS row_number
  FROM 
    `bigquery-public-data.stackoverflow.badges` badges
  JOIN
    `bigquery-public-data.stackoverflow.users` users
  ON badges.user_id = users.id
  WHERE badges.class = 1 
) 
WHERE row_number = 1
GROUP BY First_Gold_Badge
ORDER BY Num_Users DESC
LIMIT 10
  3. Select Run. The query returns the top 10 gold badges, ranked by how many users earned them as their first gold badge, along with the average number of days it took users to earn each badge. This query is relatively complex, so it may take longer to process than many BigQuery queries. You can see a representative result in the table below. Your results might differ because the public datasets are updated regularly.

Row  First_Gold_Badge  Num_Users  Avg_Num_Days
  1  Famous Question      176982        1169.0
  2  Fanatic               15085         618.0
  3  Unsung Hero           12875         595.0
  4  Great Answer          10641        1363.0
  5  Electorate             5769         829.0
  6  Populist               5491        1227.0
  7  Steward                1005         993.0
  8  Great Question          582         717.0
  9  Copy Editor             253         595.0

In the previous section, you queried public datasets that BigQuery makes available to you. In this section, you will upload a dataset to BigQuery and query that dataset. You will use a specific baby names dataset from US Social Security for this codelab, but you can load your own datasets in many different file formats and work with them in BigQuery.

Load the 2014 US baby names dataset

To load table data into BigQuery, first create a BigQuery dataset to hold the data by completing the following steps:

  1. Download the baby names zip file. The zip file, provided by the US Social Security Administration, contains approximately 7 MB of data about popular baby names.
  2. Unzip the file in a local folder and note its location.
  3. In the GCP console navigation panel, select the project name that begins with bqproject.
  4. On the right side, in the details panel, select Create dataset.
  5. In the Create dataset dialog, for Dataset ID, type babynames. Leave all of the other default settings in place and click OK.
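
As a rough programmatic equivalent, the same dataset could be created with the Python client library. A sketch (the dataset ID babynames matches the step above; exists_ok is used so re-running is harmless):

# Sketch: create the babynames dataset programmatically.
from google.cloud import bigquery

client = bigquery.Client()  # assumes your codelab project is the default

# exists_ok=True makes the call idempotent if you re-run the script.
dataset = client.create_dataset("babynames", exists_ok=True)
print("Created dataset", dataset.dataset_id)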

Next, create a table inside the dataset and load your data into the table by completing the following steps:

  1. In the navigation panel, select the babynames dataset that you just created. If the dataset does not appear immediately, refresh your browser.
  2. On the right side, in the details panel, click Create table.
  3. In the Create table dialog, provide the following information:
  • Create table from: Upload
  • Select file: yob2014.txt. To find this file, navigate to your unzipped names directory.
  • File format: CSV
  • Project name: BQProject
  • Dataset name: babynames
  • Table name: names_2014
  4. Still in the Create table dialog, toggle Edit as text to On and copy and paste the following text string into the text area below:
name:string,gender:string,count:integer
  5. Click Create table.
  6. In the navigation panel, confirm that BigQuery created your table by selecting the project name that begins with bqproject and then the dataset babynames. You should see a table names_2014. In the next section, you will query that table.
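
The same upload can also be scripted. Here is a sketch with the Python client library, using the schema from step 4 (it assumes yob2014.txt is in your working directory):

# Sketch: load yob2014.txt into babynames.names_2014 with the Python client.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("gender", "STRING"),
        bigquery.SchemaField("count", "INTEGER"),
    ],
)

with open("yob2014.txt", "rb") as source_file:
    job = client.load_table_from_file(
        source_file, "babynames.names_2014", job_config=job_config
    )
job.result()  # wait for the load job to finish
print("Loaded", client.get_table("babynames.names_2014").num_rows, "rows")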

For more information about loading data into BigQuery, see Introduction to Loading Data in the BigQuery documentation.

Query the 2014 US baby names dataset

Now that you have loaded a dataset, you can query that dataset in BigQuery. To query the 2014 US baby names dataset, complete the following steps:

  1. Near the top right of the GCP console, select Compose new query.
  2. In the Query editor text area, copy and paste the following SQL query.
#standardSQL
SELECT
  DISTINCT name
FROM
  `babynames.names_2014`
LIMIT
  10
  3. Select Run. The query returns 10 different names, such as Emma, Olivia, and Sophia.
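
Outside the web UI, the same query works from the client library; the dataset.table reference resolves against the client's project. A sketch (your-project-id is a placeholder):

# Sketch: query the loaded table from the Python client library.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder project ID

query = """
SELECT DISTINCT name
FROM `babynames.names_2014`
LIMIT 10
"""

for row in client.query(query).result():
    print(row.name)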

Just as you can load datasets into BigQuery, you can export datasets from BigQuery to Google Cloud Storage.

Export the 2014 US baby names dataset

To export a dataset, complete the following steps:

  1. In the navigation panel, select the names_2014 table.
  2. On the right side, in the details panel, select Export and Export to GCS.
  3. In the Export to Google Cloud Storage dialog, select Browse.
  4. In the Save as dialog, select the new bucket icon, which appears as a bucket with a + sign.
  5. In the Create a bucket dialog, enter a globally unique name for your bucket and select Continue.
  6. Leave the default settings for all other options in the Create a bucket dialog and select Create.
  7. In the Save as dialog, enter babynames2014.csv for Name, choose the bucket that you created for Location, and select Select.
  8. In the Export to Google Cloud Storage dialog, select CSV as the export format and select Export.
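
The export can also be done with a single client-library call. A sketch, where your-bucket-name is a placeholder for the bucket you created:

# Sketch: export babynames.names_2014 to Cloud Storage as CSV.
from google.cloud import bigquery

client = bigquery.Client()

# extract_table writes CSV by default.
job = client.extract_table(
    "babynames.names_2014",
    "gs://your-bucket-name/babynames2014.csv",  # placeholder bucket name
)
job.result()  # wait for the extract job to finish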

Confirm your data export

To confirm that your export succeeded and to find your exported file, perform the following steps:

  1. In the GCP navigation menu, select Storage to open Google Cloud Storage.
  2. In the details panel, select the name of the bucket that you created.
  3. In the details panel for your bucket, you should see babynames2014.csv in the file list. You can select the file to download it if you like.
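
You can also confirm the export from code. A sketch using the Cloud Storage Python client (pip install google-cloud-storage; your-bucket-name is a placeholder):

# Sketch: check that the exported file exists and optionally download it.
from google.cloud import storage

client = storage.Client()
blob = client.bucket("your-bucket-name").blob("babynames2014.csv")

print("Export present:", blob.exists())
blob.download_to_filename("babynames2014.csv")  # optional local copy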

Unless you plan to continue working with the 2014 US baby names dataset, you should delete the BigQuery dataset, delete the Cloud Storage bucket and file, and delete the project that you created for this codelab.

Delete the BigQuery dataset

To delete the BigQuery dataset, perform the following steps:

  1. In the GCP navigation menu, select BigQuery to open BigQuery.
  2. In the navigation panel, select the project that you created for this codelab. The menu expands to display the datasets in your project.
  3. Select the babynames dataset.
  4. In the details panel, select Delete dataset.
  5. In the Delete dataset dialog, enter babynames and select Delete to confirm that you want to delete the dataset.
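
Equivalently, the dataset can be removed with one client-library call. A sketch:

# Sketch: delete the babynames dataset and everything in it.
from google.cloud import bigquery

client = bigquery.Client()
client.delete_dataset("babynames", delete_contents=True, not_found_ok=True)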

Delete the Storage bucket

To delete the Cloud Storage bucket and all of the files inside the bucket, perform the following steps:

  1. In the GCP navigation menu, select Storage to open Google Cloud Storage.
  2. In the details panel, select the checkbox next to the bucket that you created.
  3. Near the top of the details panel, select Delete to delete your bucket and all of its contents.
  4. In the Delete bucket dialog, select Delete to confirm deleting your bucket and all of its contents.
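
A matching sketch with the Cloud Storage Python client (force=True deletes the bucket's objects first; your-bucket-name is a placeholder):

# Sketch: delete the export bucket and its contents.
from google.cloud import storage

client = storage.Client()
client.get_bucket("your-bucket-name").delete(force=True)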

Delete the project

To delete the GCP project that you created for this codelab, perform the following steps:

  1. In the GCP navigation menu, select IAM & Admin.
  2. In the navigation panel, select Settings.
  3. In the details panel, confirm that your current project is BQProject and select Shut down.
  4. In the Shut down project dialog, enter the project ID (not project name) for your project and select Shut down to confirm.

Congratulations! You have now learned how to use the BigQuery web UI to query public datasets, load your own data, and export data to Cloud Storage. Note that you did not have to set up or manage any clusters to work with these datasets.