Loading data in Jupyter notebooks

A Jupyter notebook is a document that lets one write code, notes and explanations, and visualize the output of the code, all on a single page.

Its uses include data cleaning and transformation, data visualization, machine learning and simulations, among others. These uses involve working with many varied datasets, from Excel and .csv files to .png and .tiff files. Working with multiple datasets can be confusing, and at times one may get stuck at the starting point: how to load the data into a notebook for use.

1. Local environment

It is wise to keep all your data in a single location so that whatever you need is easy to find and access.

When working with a small dataset, the local environment is the way to go. This assumes that the prerequisites for running a notebook are already installed, namely Python and Jupyter; both can be obtained by installing Anaconda. VS Code can also run Jupyter notebooks, but it still requires Jupyter to be installed, so the first option might be easier.

Image 1: Sample folder contents on a local PC (the screenshot is from a PC running Ubuntu)

This location can then be set and read from within a Jupyter notebook, as in Image 2 below:

Image 2: Reading a folder in the local environment within a Jupyter notebook
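
Since the code in Image 2 is only available as a screenshot, the following is a minimal sketch of the same idea using only the Python standard library. The folder location DATA_DIR and the helper names list_data_files and read_csv_rows are hypothetical and not taken from the original; point DATA_DIR at your own folder.

```python
import csv
from pathlib import Path


def list_data_files(data_dir):
    """Return the sorted names of the files directly inside data_dir."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_file())


def read_csv_rows(csv_path):
    """Read a CSV file into a list of dictionaries keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical location: replace with the folder that holds your data.
DATA_DIR = Path.home() / "data"

# Only list the folder if it actually exists on this machine.
if DATA_DIR.is_dir():
    print(list_data_files(DATA_DIR))
```

Keeping the folder path in one variable means only that line changes when the data moves. Image files such as .png or .tiff would need an additional library to be read, which this sketch does not cover.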

2. Google Drive & Colab

Image 3: The same content uploaded to Google Drive.

  • Within Google Drive, click the New button, then File upload to upload a zipped file or Folder upload to upload a folder
  • Uploading zipped files is faster, but one has to know how to unzip them for use

Once uploaded, the content will be as in Image 3.

The drive contents can then be read into a notebook. Google hosts a Jupyter notebook environment in the cloud that anyone can use: Colab (Colaboratory). It only requires signing in to a Google account, after which one can create new notebooks or run existing ones.

To load data that is on the drive, one mounts the drive in the notebook.

Image 4: Mounting the drive in a Colab notebook session
Image 5: The contents of the mounted drive path

For zipped contents, one can run the command shown in Image 6 in a cell to unzip the file.

Image 6: Unzipping a file in a Colab notebook session

The command needs two paths: the path of the zipped file and the path where the contents will be extracted.

The code from the screenshots:
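
The exact command in Image 6 is only available as a screenshot. As an alternative, here is a minimal sketch that does the same job with Python's standard zipfile module. The helper name unzip_to and the example paths in the comment are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_to(zip_path, extract_dir):
    """Extract every member of zip_path into extract_dir and return their names."""
    extract_dir = Path(extract_dir)
    # Create the destination folder if it does not exist yet.
    extract_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
        return archive.namelist()


# Example call with hypothetical paths (adjust both to your own files):
# unzip_to('/content/drive/My Drive/data.zip', '/content/data')
```

Extracting to a folder outside the mounted drive, as in the commented example, is a design choice rather than a requirement.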

# This mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# Enter the foldername in your Drive where you have saved the unzipped
# folder, e.g. 'cs231n/assignments/assignment1/'
FOLDERNAME = 'assignments/assignment1'
assert FOLDERNAME is not None, "[!] Enter the foldername."

# Now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))

%cd /content/drive/My\ Drive/$FOLDERNAME


import os
print(os.listdir('/content/drive/My Drive/{}'.format(FOLDERNAME)))

The entire code can be pasted into a single cell of a Colab notebook. Because it relies on the google.colab module, it is intended for Colab rather than a local Jupyter installation. Remember to change FOLDERNAME to your own folder.

Note: Google Drive authentication is required before the drive can be mounted and accessed.
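
If the same notebook is meant to run both locally and in Colab, one option is to check for the google.colab module before trying to mount the drive. The following is a sketch under that assumption; the helper name running_in_colab is hypothetical.

```python
import importlib.util


def running_in_colab():
    """Return True when the google.colab module can be found."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent "google" package is missing or malformed.
        return False


if running_in_colab():
    print("Colab detected: mount Google Drive before reading data.")
else:
    print("Colab not detected: read data from a local folder instead.")
```

The check only looks for the module and does not mount anything, so it is safe to run in either environment.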
