typeof(1.1)
[1] "double"
typeof(1)
[1] "double"
Part 2: Data Types and Data Structures
Ethan Marzban
DS Collab
Please make sure you have downloaded both R
and RStudio
.
Please make sure you are comfortable with the material in Part 1 (Fundamentals).
Much in the way different variables (in a statistical sense) can have different types (numerical vs. categorical), so to can quantities in R
. For example, R
treats the quantity 2
differently from the quantity "hello"
. We use the term data type to, loosely speaking, refer to the actual type of a particular quantity (e.g. numerical, character, etc.) The main data types in R
are:
double
: refers to numerical (real-valued) quantitiesinteger
: refers to integerscharacter
: refers to character- or text-type data (and will always be enclosed in either single quotation marks or double quotation marks)In R
, numbers are by default encoded with type double
.
To check the type of a particular quantity, we can use the typeof()
function. For example:
Again, note that the type of 1
is actually double
! If we really wanted to reference the integer 1, we can use an L
:
We can combine quantities in R
as well, to produce what are known as data structures. Some of more common data structures in R
are:
Each data structure has a situation in which it is ideal; for now, let’s discuss vectors and data frames.
Vectors are the most fundamental data structure in R
. A vector is, effectively, a one-dimensional list of values. In R
, we use the following syntax to create a vector:
For example,
(By the Way: note how, in the output, R
has converted all elements to be of type character
! We’ll revisit this later.)
Another very popular way of storing data in R
is using what is known as a data frame. A data frame can be thought of as a collection of vectors, arranged in tabular format; indeed, when storing data in a data frame, we are necessarily implementing the data matrix representation of data (see Workshop 02 for a refresher on this).
Data frames can be created using the data.frame()
function in R
:
For example:
When displaying data frames, R
always puts an initial column corresponding to row indices.
Note that data frames are created column-wise. Additionally, the columns in a data frame must all be of the same length;
Error in data.frame(col1 = c(2, 4, 6), col2 = c(1, 3, 5, 7)): arguments imply differing number of rows: 3, 4
The columns in a data frame can be comprised of different data types.
Make a data frame in R
based on the following table:
name | species | age |
---|---|---|
Rover | Dog | 4 |
Samson | Parrot | 1 |
Olivia | Cat | 3 |
Hoppers | Rabbit | 2 |
Assign your data frame to a variable called animals
. That is, after completing this exercise, you should be able to run animals
and obtain
We may wish to extract or access only certain portions of a vector and/or data frame. This can be accomplished using slicing (aka indexing).
Given a vector x
, the i
th element of x
can be extracted using the syntax x[i]
. For example:
Given a data frame d
, the (i
, j
)th element of d
can be extracted using the syntax d[i, j]
. For example:
We can also access columns in a data frame by using the $
operator. That is: given a data frame df
with a column named col1
, the synax df$col1
returns the col1
column of df
. For example, given the d
data frame defined above, we can access col2
by running
When returning columns of a data frame using slicing, R
always returns a vector.
Returning to the animals
data frame created above: return Olivia’s age in two different ways:
In many situations, we will want to import data that has been written or stored in a file or website. The basic function for reading data into R
is read.table()
.
Look up the help file for the read.table()
function. Additionally, look up the help file for the read.csv()
function, and note how it differs from the read.table()
function.
Import the data located at https://pstat5a.github.io/Files/Datasets/movies_2000s.csv, and assign it to a variable called movies_2000
. (Though you can technically download the data, try to import the data without downloading it onto your local machine). We will explore this data during the in-person portion of the Workshop in General Meeting 3.