Getting Started

# Import the NumPy and Pandas packages

NumPy Practice

Arrays behave much like lists, except that all items inside an array must have the same data type. We can construct them using np.array() and index them the exact same way as lists.
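As a minimal sketch of constructing and indexing an array, the snippet below uses made-up values (the name `temps` and its contents are illustrative assumptions, not the arrays the exercises ask for):

```python
import numpy as np

# Hypothetical example values, chosen only for illustration
temps = np.array([10, 20, 30, 40])

first = temps[0]        # a single element, selected by position
first_two = temps[:2]   # a slice holding the first two elements

print(first)
print(first_two)
```

The bracket and slice syntax is the same one you would use on a Python list.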

# Create two NumPy arrays; one from 1-5, and another from 6-10

# Retrieve the first element from your arrays

# Retrieve the first 3 elements from your arrays

We can do mathematical operations between arrays such as addition and subtraction!
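The sketch below shows how arithmetic between arrays works element by element. The arrays `a` and `b` are hypothetical and hold different values from the ones in the exercises:

```python
import numpy as np

# Two hypothetical arrays of the same length
a = np.array([2, 4, 6])
b = np.array([1, 1, 2])

total = a + b     # adds matching positions: 2+1, 4+1, 6+2
diff = a - b      # subtracts matching positions
prod = a * b      # multiplies matching positions
a_sum = a.sum()   # collapses one array into a single total

print(total, diff, prod, a_sum)
```

Note that `*` multiplies matching elements; it is not matrix multiplication.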

# Add, subtract and multiply your arrays

# Find the sum of all elements in your arrays

# Create a 3x3 array of values from 1-9

# Confirm that your array has dimensions of 3x3

Sometimes it is important to choose, or at least know, the data type of the elements inside our array!
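One possible way to set and inspect a data type is sketched below, using a small 2x2 array invented for illustration (the names `grid`, `grid_float`, and `converted` are assumptions):

```python
import numpy as np

# Hypothetical 2x2 example built from the values 1 to 4
grid = np.arange(1, 5).reshape(2, 2)

# Option 1: request the type when the array is created
grid_float = np.array([[1, 2], [3, 4]], dtype=float)

# Option 2: convert an existing array to a new type
converted = grid.astype(float)

print(grid.shape)         # dimensions as (rows, columns)
print(grid.dtype)         # an integer type; the exact width depends on the platform
print(grid_float.dtype)
print(converted.dtype)
```

Both options produce a float array; `astype` returns a new array and leaves the original unchanged.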

# Recreate the same array but with a float data type instead of integer

# Confirm that your array has a float data type

Pandas Practice

Pandas Data Frames are structured much like Python dictionaries: each column name acts like a key. We can create them using pd.DataFrame() and retrieve a column with the same bracket syntax.
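A minimal sketch of building a data frame from a dictionary and pulling out one column follows. The `pets` data is made up for illustration and is not the data frame the exercise asks for:

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column name
pets = pd.DataFrame({
    "animal": ["cat", "dog"],
    "legs": [4, 4],
})

legs = pets["legs"]   # select a column by name, like a dictionary lookup

print(pets)
print(legs)
```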

# Create a data frame with two columns; one with your three favorite foods and
# another with the values 1-3

Important Functions:

Below are some important functions that can help you analyze your data. Do NOT get overwhelmed trying to remember all of them; just practice using them and examining the outputs. You can always look them up later!

Examining your data:

.head(): shows the first 5 observations of the data
.info(): shows the number of rows, columns, and blank ("null") values, and the data type of each variable
.dtype: shows the underlying data type of a column (integers, floats, etc.); for a whole data frame, use .dtypes
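The inspection tools listed above could be tried out as in the sketch below. The `scores` data frame is a hypothetical six-row example, invented so that the effect of .head() is visible:

```python
import pandas as pd

# Hypothetical six-row data set, made up for illustration
scores = pd.DataFrame({
    "student": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay"],
    "score": [88.5, 92.0, 79.5, 85.0, 90.5, 70.0],
})

preview = scores.head()   # only the first 5 of the 6 rows
print(preview)

scores.info()             # prints row count, column names, non-null counts, and types

score_dtype = scores["score"].dtype   # data type of a single column
print(score_dtype)
```

Note that .info() prints its summary directly rather than returning a value.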

Analyzing your data:

.min(): Minimum value of a column
.max(): Maximum value of a column
.mean(): Average value of a column
.median(): Median value of a column
.sum(): Sum of all values in a column
.corr(): Shows correlation between columns
.value_counts(): Shows number of observations per value in a column
.nunique(): Shows the number of unique values in a column
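To see what these functions return, here is a small sketch on invented data. The `sales` data frame and its column names are assumptions for illustration only, and the correlation is computed between two specific numeric columns:

```python
import pandas as pd

# Hypothetical data, made up for illustration
sales = pd.DataFrame({
    "city": ["A", "B", "A", "C"],
    "price": [10, 20, 30, 40],
    "qty": [1, 2, 3, 4],
})

lowest = sales["price"].min()
highest = sales["price"].max()
average = sales["price"].mean()
middle = sales["price"].median()
total = sales["price"].sum()

# Correlation between two numeric columns (ranges from -1 to 1)
price_qty_corr = sales["price"].corr(sales["qty"])

city_counts = sales["city"].value_counts()   # rows per distinct city
n_cities = sales["city"].nunique()           # how many distinct cities

print(lowest, highest, average, middle, total)
print(price_qty_corr)
print(city_counts)
print(n_cities)
```

Here `price` rises in lockstep with `qty`, so the correlation comes out at (approximately) 1.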

# DO NOT REMOVE #

from google.colab import drive
drive.mount('/content/drive')

# DO NOT REMOVE #

Mounted at /content/drive

Importing data into Google Colab works differently than in other Python environments. To prepare the data for this worksheet, follow the steps below:

1) Download the data at: https://www.kaggle.com/datasets/sootersaalu/amazon-top-50-bestselling-books-2009-2019 (you might need to make an account)
2) Run the code chunk above
3) Go to Google Drive and upload the file
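Once the file is uploaded, reading it might look like the sketch below. The folder `MyDrive` and the file name `bestsellers with categories.csv` are assumptions; adjust `csv_path` to match where you uploaded the file and what it is called:

```python
from pathlib import Path

import pandas as pd

# Assumed location and file name; change these to match your own upload
csv_path = Path("/content/drive/MyDrive/bestsellers with categories.csv")

if csv_path.exists():
    books = pd.read_csv(csv_path)   # load the CSV into a data frame
    print(books.head())
else:
    print(f"File not found at {csv_path}; check the upload location")
```

The existence check is only there to give a readable message if the path is wrong.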

# Import the bestsellers dataset and view the head

# Use .info() to see more details about our dataset

# Let's check the type of our dataset!

# Retrieve the "User Rating" column from our dataset

# Find the *data type* of the "User Rating" column

# Find the average number of reviews in this dataset

# Find the cheapest and most expensive book prices in our dataset

# Find the names of the cheapest and most expensive books

# Which authors have produced the most bestselling books?

# Find the books that have a user rating less than 4

# How many of those books have fewer than 10,000 reviews?

# Find the correlation between User Rating and Reviews