Setting up Kaggle in Google Colab

I want all the data and I want it now!

Photo by Logan Fisher on Unsplash

You know where all those datasets are and you know where you want them to go, but how do you easily move your datasets from Kaggle into Google Colab without a lot of complicated madness?

Let me show you!

Discovering the joy that is Google Colab was definitely one of the smartest things I’ve done since getting started with deep learning, machine learning, and AI. Google Colab provides free GPU (for real!) to pretty much anyone who wants it. If you’re just getting started, you need to get on Colab! I wrote another article that covers getting set up in Colab for the first time, but getting Kaggle up and running in Colab really deserves its own article.

Photo by Oscar Söderlund on Unsplash

Although Colab is extremely user-friendly, there are a few details that you might want help with while getting yourself set up.

Kaggle, it turns out, is one of those details.

Kaggle needs a little finesse. A little love. However, if you’re after those sweet, sweet datasets, you want to get this working! It’s actually really simple; there are just a few easy steps you need to take. If you just want to view the code on GitHub and move on with your day (things can get a little…verbose…around here), you are welcome to do so!

Here’s the simplest way I’ve found to access the Kaggle data for the first time:

Getting Started

(One quick note: in order to be able to access the Kaggle data, you’ll need to be signed up with Kaggle (free!) and agree to the terms and conditions of the competition that you want to participate in.)

First, grab your token from Kaggle.

Go to your account page (the drop-down menu in the top right corner of the screen will take you there).

Then scroll down to API and hit “Create New API Token.”

That’s going to download a file called kaggle.json. Make sure you know where this file is! Maybe put it somewhere you can find it…

Just a suggestion.

Open the file and you’ll see something that looks a lot like this:

{"username":"YOUR-USER-NAME","key":"SOME-VERY-LONG-STRING"}

Have that thing handy for a future copy-and-paste!
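Before you copy anything, it can be worth checking that the file really holds both pieces. Here's a minimal sketch of that check. The helper name `load_kaggle_token` is my own invention (it isn't part of the Kaggle package), and the path you pass in is simply wherever your browser saved the download.

```python
import json


def load_kaggle_token(path):
    """Read a kaggle.json file and confirm it has a username and a key."""
    with open(path, "r", encoding="utf-8") as f:
        token = json.load(f)
    if not isinstance(token, dict):
        raise ValueError("kaggle.json should contain a single JSON object")
    # Collect any expected field that is absent or empty.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return token
```

If the function returns without complaining, you have a token that is at least shaped correctly.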

Next, go to Colab and start a new notebook. I’m a big fan of getting up and running on GPU right away. To do that, open the “Runtime” drop-down menu, select “Change runtime type,” and then choose GPU in the “Hardware accelerator” drop-down menu. Then hit SAVE.
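If you want to double-check that the GPU runtime actually took, here's a rough sketch. It assumes that a GPU runtime exposes the `nvidia-smi` tool on the path. `gpu_available` is just a name I picked for illustration.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if nvidia-smi is installed and exits cleanly, else False."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.returncode == 0


print("GPU runtime detected" if gpu_available() else "No GPU detected")
```

If it reports no GPU, go back to the runtime settings and try again before you start any heavy lifting.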

Next, you’ll want to install Kaggle. It’s almost exactly like installing it in your Jupyter Notebooks, but Colab wants an exclamation point at the beginning of your code. Just run:

!pip install kaggle

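Your token is going to live in a hidden folder called .kaggle. You can use !ls -a to check whether you already have one (plain !ls won’t show hidden folders), or just run

!mkdir -p .kaggle

to create one. The -p keeps the command from complaining if the folder is already there.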

Next, you’ll want to run the cell below, but please pay attention to a couple of things:

  • there’s no exclamation point on this one
  • you definitely want to change the username and key to the ones from your downloaded Kaggle file (the ones you had ready for that copy-and-paste)!

import json
# Replace both placeholder values with the ones from your own kaggle.json
token = {"username":"YOUR-USER-NAME","key":"SOME-VERY-LONG-STRING"}
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)

I did a copy-and-paste when I ran this code and actually had a little trouble. I had to delete and re-type the quotation marks in the code above to get that cell to run properly. My best guess is that the web page had turned the straight quotes into curly ones, which Python doesn’t accept. If you’re popping an error code for no discernible reason, give that a try!
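If the file route keeps fighting you, there’s another option: the Kaggle client can also read your credentials from two environment variables, KAGGLE_USERNAME and KAGGLE_KEY. Here’s a small sketch of that approach. The function name `set_kaggle_env` is made up for this example, and the values are placeholders you’d swap for your own.

```python
import os


def set_kaggle_env(token):
    """Expose a Kaggle token dict through the environment variables the client checks."""
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]


# Placeholder values: swap in the ones from your own kaggle.json.
set_kaggle_env({"username": "YOUR-USER-NAME", "key": "SOME-VERY-LONG-STRING"})
```

Keep in mind that these only last for the current session, so you’d run this again whenever the runtime restarts.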

Next, run

!mkdir -p ~/.kaggle
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json

and then

!kaggle config set -n path -v /content

You’ll get a warning that your Kaggle API key is readable by other users on the system.

You can easily fix that by running:

!chmod 600 /root/.kaggle/kaggle.json
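If you’d rather handle the write, copy, and chmod steps in one Python cell, something like this should do it. Treat it as a sketch rather than the official way: `install_kaggle_token` is a name I made up, and it assumes ~/.kaggle is where the Kaggle client looks for the file, just like the commands above.

```python
import json
import os
import stat


def install_kaggle_token(token, kaggle_dir=None):
    """Write token to <kaggle_dir>/kaggle.json so only the owner can read it."""
    if kaggle_dir is None:
        kaggle_dir = os.path.expanduser("~/.kaggle")
    os.makedirs(kaggle_dir, exist_ok=True)  # no error if the folder exists
    path = os.path.join(kaggle_dir, "kaggle.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(token, f)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # same effect as chmod 600
    return path
```

You’d call it with your own values, for example install_kaggle_token({"username": "YOUR-USER-NAME", "key": "SOME-VERY-LONG-STRING"}).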

After that, you should be able to run

!kaggle datasets list

to access a list of Kaggle datasets.

If you’re looking for a specific dataset, you can run something like

!kaggle datasets list -s sentiment

in order to list, for example, datasets that match the search term “sentiment.”

Now it’s time to start having real fun!

Downloading the Data

Go to Kaggle, find the dataset you want, and on that page, click the API button (it will copy the code automatically).

You’ll paste that code into your next cell, but make sure you add that exclamation point to the beginning of the cell and add -p /content to tell Kaggle where to put the download.

!kaggle datasets download -d kazanova/sentiment140 -p /content

To unzip your files, run

!unzip *.zip
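If you end up with more than one zip file, or you just prefer to stay in Python, the standard library can do the unzipping too. This is a small sketch under the assumption that your downloads landed in /content. `extract_all_zips` is a helper name I invented.

```python
import glob
import os
import zipfile


def extract_all_zips(folder="/content"):
    """Extract every .zip in folder into that same folder; return the extracted names."""
    extracted = []
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(folder)
            extracted.extend(archive.namelist())
    return extracted
```

The list it returns doubles as a handy reminder of which filenames you now have to play with.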

Welcome to Data Town!!! Want to take a look? Try running:

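import pandas as pd
# This particular file isn't UTF-8 and has no header row, hence the two extra arguments
d = pd.read_csv('training.1600000.processed.noemoticon.csv', encoding='latin-1', header=None)
d.head()

(Substitute a filename from your own dataset for the filename above, of course. Many datasets won’t need the extra encoding and header arguments.)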
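CSV files from Kaggle aren’t always UTF-8, and pandas will throw a UnicodeDecodeError when it hits one that isn’t. Here’s a rough sketch of one way to find an encoding that works before you load anything. The name `pick_encoding` and the list of candidates are my own assumptions. Also note that latin-1 will decode any bytes at all, so it works as a last resort rather than as proof of the real encoding.

```python
import codecs


def pick_encoding(path, candidates=("utf-8", "latin-1"), chunk_size=1 << 20):
    """Return the first candidate encoding that decodes the whole file without errors."""
    for encoding in candidates:
        # An incremental decoder copes with characters split across chunk boundaries.
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    decoder.decode(chunk)
                decoder.decode(b"", final=True)  # flag a truncated final character
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"None of {candidates} could decode {path}")
```

You could then hand the result straight to pandas with something like pd.read_csv(filename, encoding=pick_encoding(filename)).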

Now get out there and create something amazing!

Photo by Fidel Fernando on Unsplash

If anyone out there does something seriously awesome with their newly-gotten data, I want to hear about it! Please let everyone know what you’ve created in the responses below.
