Week 1 Basics

1.1 Day 1 (Monday) Zoom orientation

1.1.1 Logistics (10 minutes)

Course material

Available at https://mtmorgan.github.io/QuaRantine

Cadence

Monday and Friday group Zoom sessions – these will review and troubleshoot previous material, and outline goals for the next set of independent activities.
Daily independent activities – most of your learning will happen here!

Communicating

We’ll use Microsoft Teams (if most participants have access to the course)
Visit Microsoft Teams and sign in with your Roswell username (e.g., MA38727@RoswellPark.org) and the password you use to check email, etc. Join the ‘QuaRantine’ team.

1.1.2 Installing R and RStudio (25 minutes, Shawn)

What is R?

A programming language for statistical computing, data analysis and scientific graphics.
Open-source with a large (and growing) user community.
Currently in the top 10 most popular languages according to the TIOBE index.

What is RStudio?

RStudio provides an integrated editor and shell environment to make R programming easier. Some of the more useful features include:
- Syntax highlighting and color coding
- Easy switching between shell and editor
- Dynamic help and docs

Installing R and RStudio

Two ways to “get” RStudio:
- Install on your laptop or desktop
  - Download the free desktop installer here
- Use the rstudio.cloud resource
  - Visit rstudio.cloud, sign-up, and sign-on

The preferred approach for this course is to try to install R and RStudio on your own computer

Windows Users:
- Download R for Windows and run the installer. Avoid, if possible, installing as administrator.
- Download RStudio for Windows and run the installer.
- Test the installation by launching RStudio. You should end up with a window like the screen shot below.
Mac Users:
- Download R for macOS (OS X 10.11, El Capitan, and later) or older macOS and run the installer.
- Download RStudio for macOS and run the installer.
- Test the installation by launching RStudio. You should end up with a window like the screen shot below.

An ALTERNATIVE, if installing on your own computer does not work:

Do the following only if you are NOT ABLE TO INSTALL R and RStudio.
Visit rstudio.cloud. Click the ‘Get Started’ button, and create an account (I used my gmail account…). You should end up at a screen like the following.
Click on the ‘New Project’ button, to end up with a screen like the one below. Note the ‘Untitled Project’ at the top of the screen; click on it to name your project, e.g., ‘QuaRantine’.

Breakout Room

At this point you should have RStudio running either via your desktop installation or through rstudio.cloud. If not, please let us know via the chat window and we’ll invite you to a breakout room to troubleshoot your installation.

1.1.3 Basics of R (25 minutes)

R as a simple calculator

1 + 2
## [1] 3

R Console Output

Enter this in the console:

2 + 3 * 5
## [1] 17

Q: what’s the [1] all about in the output?

A: It’s the index of the first entry in each line.

This is maybe a better example:

1:30
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

Displaying help in the R Console

? <command-name>

Some examples:
```
? cat
? print
```

Variables

Naming variables in R

A variable name can contain letters, numbers, and the dot . or underscore _ characters. Variables should start with a letter.
Try entering these in the console:

y = 2

try.this = 33.3

oneMoreTime = "woohoo"
Now try these; each is an invalid name, so R reports an error:

2y = 2

_z = 33.3

function = "oops, my bad"

R is case sensitive (R != r)

R = 2
r = 3
R == r
## [1] FALSE

Variable Assignment

You may use = or <- (and even ->) to assign values to a variable.

x <- 2 + 3 * 5
y =  2 + 3 * 6
2 + 3 * 7 -> z
cat(x, y, z)
## 17 20 23

R’s four basic ‘atomic’ data types

Numeric (includes integer, double, etc.)
- 3.14, 1, 2600
Character (string)
- "hey, I'm a string"
- 'single quotes are ok too'
Logical
- TRUE or FALSE (note all caps)
NA
- not available (no known value)

Use class() to query the class of data:

a <- 5
class(a)
## [1] "numeric"

Use as. to coerce a variable to a specific data type

a <- as.integer(5)
class(a)
## [1] "integer"

d <- as.logical(a)
d
## [1] TRUE
class(d)
## [1] "logical"
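
The other as.*() functions work the same way. Here is a small sketch of a few common coercions; the variable names are made up for illustration only. Note that as.integer() drops the fractional part rather than rounding.

```r
## a few more coercions; variable names are illustrative only
pi_text <- "3.14"
pi_number <- as.numeric(pi_text)   # character to numeric
pi_number
## [1] 3.14

whole <- as.integer(3.9)           # drops the fractional part (no rounding)
whole
## [1] 3

label <- as.character(42)          # numeric to character
label
## [1] "42"

flag <- as.logical(0)              # 0 is FALSE, non-zero numbers are TRUE
flag
## [1] FALSE
```

Use class() on each result to confirm that the type really changed.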

Using Logical Operators

Equivalence test (==):

1 == 2
## [1] FALSE

Not equal test (!=):

1 != 2
## [1] TRUE

less-than (<) and greater-than (>):

18 > 44
## [1] FALSE
3 < 204
## [1] TRUE

Logical Or (|):

(1 == 2) | (2 == 2)
## [1] TRUE

Logical And (&):

(1 == 2) & (2 == 2)
## [1] FALSE

Objects and Vectors in R

Objects

R stores everything, variables included, in ‘objects’.

x <- 2.71

# print the value of an object
print(x)
## [1] 2.71

# determine class or internal type of an object
class(x)
## [1] "numeric"

# TRUE if the value is missing (NA), FALSE otherwise
is.na(x)
## [1] FALSE

Vectors

‘Vectors’ and ‘data frames’ are the bread and butter of R
Vectors consist of several elements of the same class
- e.g. a vector of heart rates, one per patient

Data frames (data.frame)

Data frames are structures that can contain columns of various types
- e.g. height, weight, age, heart rate, etc.
- Handy containers for experimental data
- Analogous to spreadsheet data
- More on Data Frames throughout the week!
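
As a quick preview, a data frame can be sketched with the data.frame() function. The column names and values below are hypothetical, chosen only to match the examples in the list above.

```r
## a tiny data frame; the patients and measurements are hypothetical
patients <- data.frame(
    name = c("John Doe", "Jane Smith", "Echo Shamus"),
    age = c(36, 54, 15),
    heart_rate = c(72, 65, 80)
)

patients$age          # each column is an ordinary vector
## [1] 36 54 15
nrow(patients)        # number of rows, one per patient
## [1] 3
```

Each column is a vector of a single class, but different columns can have different classes (here one character column and two numeric columns).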

Working with Vectors

Creating a Vector

Use the c() function

name <- c("John Doe", "Jane Smith", "MacGillicuddy Jones", "Echo Shamus")
age <- c(36, 54, 82, 15)
favorite_color <- c("red", "orange", "green", "black")

## print the vectors
name
## [1] "John Doe"            "Jane Smith"          "MacGillicuddy Jones"
## [4] "Echo Shamus"
age
## [1] 36 54 82 15
favorite_color
## [1] "red"    "orange" "green"  "black"

Accessing vector data

Use numerical indexing
R uses 1-based indexing
- 1st vector element has index of 1
- 2nd has an index of 2
- 3rd has an index of 3
- and so on

name[1]
## [1] "John Doe"
age[3]
## [1] 82

R supports “slicing” (i.e. extracting multiple items)

favorite_color[c(2, 3)]
## [1] "orange" "green"

Elements at negative indices are omitted

age[-2]
## [1] 36 82 15

Some Useful Vector Operations

length(): number of elements
sum(): sum of all element values
unique(): distinct values
sort(): sort elements, omitting NAs
order(): indices of sorted elements, NAs are last
rev(): reverse the order
summary(): simple statistics

a <- c(5, 5, 6, 7, 8, 4)
sum(a)
## [1] 35
length(a)
## [1] 6
unique(a)
## [1] 5 6 7 8 4
sort(a)
## [1] 4 5 5 6 7 8
order(a)
## [1] 6 1 2 3 4 5
a[order(a)]
## [1] 4 5 5 6 7 8
rev(a)
## [1] 4 8 7 6 5 5
summary(a)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.000   5.000   5.500   5.833   6.750   8.000

Handling Missing Data

First consider the reason(s) for the missing data
- e.g. concentrations that are below detectable levels?
Sometimes NAs in data require special statistical methods
Other times we can safely discard / ignore NA entries
To remove NAs prior to a calculation:

y = c(1,NA,3,2,NA)
sum(y, na.rm=TRUE)
## [1] 6
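
Before deciding what to do with missing values it helps to know how many there are and where. A minimal sketch using is.na(), with the same y vector as above:

```r
y <- c(1, NA, 3, 2, NA)

is.na(y)              # TRUE wherever a value is missing
## [1] FALSE  TRUE FALSE FALSE  TRUE
sum(is.na(y))         # count the missing values (TRUE counts as 1)
## [1] 2
y[!is.na(y)]          # keep only the values that are not missing
## [1] 1 3 2
```

Dropping the NA entries by hand like this gives the same values that na.rm = TRUE uses inside sum().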

Wrapping up day 1

The goal for today was to rapidly cover some of the essential aspects of R programming. For the remainder of the week you’ll work at your own pace to get more of a hands-on deep dive into this material. If you run into trouble please don’t hesitate to ask for help via Teams (QuaRantine Team), Slack (QuaRantine Course), or email (Drs. Matott and Morgan); whatever works best for you!

1.2 Day 2: Vectors and variables

Our overall goal for the next few days is to use R to create a daily log of quarantine activities.

Our goal for today is to become familiar with R vectors. Along the way we’ll probably make data entry and other errors that will start to get us comfortable with R.

If you run into problems, reach out to the Slack channel for support!

The astronaut Scott Kelly said that to survive a year on the International Space Station he found it essential to

Follow a schedule – plan your day, and stick to the plan
Pace yourselves – you’ve got a long time to accomplish tasks, so don’t try to get everything done in the first week.
Go outside – if Scott can head out to space, we should be able to make it to the back yard or around the block!
Get a hobby – something not work related, and away from that evil little screen. Maybe it’s as simple as rediscovering the joy of reading.
Keep a journal
Take time to connect – on a human level, with people you work with and people you don’t!
Listen to experts – Scott talked about relying on the mission controllers; for us maybe that’s watching webinars or taking courses in new topics!
Wash your hands!

I wanted to emphasize ‘follow a schedule’ and ‘keep a journal’. How can R help? Well, I want to create a short record of how I spend today, day 2 of my quarantine.

My first goal is to create vectors describing things I plan to do today. Let’s start with some of these. To get up to speed, type the following into the R console, at the > prompt

1 + 2

Press the carriage return and remind yourself that R is a calculator, and knows how to work with numbers!

Now type an activity in your day, for instance I often start with

"check e-mail"

Now try assigning that to a variable, and displaying the variable, e.g.,

activity <- "check e-mail"
activity
## [1] "check e-mail"

OK, likely you have several activities scheduled. Create a vector of a few of these by concatenating individual values

c("check e-mail", "breakfast", "conference call", "webinar", "walk")
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"

Assign these to a variable

activity <- c("check e-mail", "breakfast", "conference call", "webinar", "walk")
activity
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"

Create another vector, but this time the vector should contain the minutes spent on each activity

minutes <- c(20, 30, 60, 60, 60)
minutes
## [1] 20 30 60 60 60

So I spent 20 minutes checking email, 30 minutes having breakfast and things like that, I was in a conference call for 60 minutes, and then attended a webinar where I learned new stuff for another 60 minutes. Finally I went for a walk to clear my head and remember why I’m doing things.

Apply some basic functions to the variables, e.g., use length() to demonstrate that you have recorded the minutes for each activity.

length(activity)
## [1] 5
length(minutes)
## [1] 5

Use tail() to select the last two activities (or head() to select the first two…)

tail(activity, 2)
## [1] "webinar" "walk"
tail(minutes, 2)
## [1] 60 60

R has other types of vectors. Create a logical vector that indicates whether each activity was a ‘work’ activity or something you did for your own survival. We’ll say that checking email is a work-related activity!

is_work <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
is_work
## [1]  TRUE FALSE  TRUE  TRUE FALSE

1.3 Day 3: `factor()`, `Date()`, and `NA`

Yesterday we learned about character, numeric, and logical vectors in R (you may need to revisit previous notes and re-create these variables)

activity
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"
minutes
## [1] 20 30 60 60 60
is_work
## [1]  TRUE FALSE  TRUE  TRUE FALSE

Today we will learn about slightly more complicated vectors.

We created the logical vector is_work to classify each activity as either work-related or not. What if we had several different categories? For instance, we might want to classify the activities into categories inspired by astronaut Kelly’s guidance. Categories might include: connect with others; go outside and exercise; consult experts; get a hobby; and (my own category, I guess) perform essential functions like eating and sleeping. So the values of activity could be classified as

classification <-
    c("connect", "essential", "connect", "consult", "exercise")

I want to emphasize a difference between the activity and classification variables. I want activity to be a character vector that could contain any description of an activity. But I want classification to be terms only from a limited set of possibilities. In R, I want classification to be a special type of vector called a factor, with the values of the vector restricted to a set of possible levels that I define. I create a factor by enumerating the possible levels that the factor can take on

levels <- c("connect", "exercise", "consult", "hobby", "essential")

And then tell R that the vector classification should be a factor with values taken from a particular set of levels

classification <- factor(
    c("connect", "essential", "connect", "consult", "exercise"),
    levels = levels
)
classification
## [1] connect   essential connect   consult   exercise 
## Levels: connect exercise consult hobby essential

Notice that activity (a character vector) displays differently from classification (a factor)

activity
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"
classification
## [1] connect   essential connect   consult   exercise 
## Levels: connect exercise consult hobby essential

Also, some of the levels (e.g., hobby) have not been part of our schedule yet, but the factor still ‘knows’ about the level.
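
One way to see that the factor ‘knows’ about unused levels is to count how often each level occurs. This is a small sketch using table(); the name counts is my own choice.

```r
levels <- c("connect", "exercise", "consult", "hobby", "essential")
classification <- factor(
    c("connect", "essential", "connect", "consult", "exercise"),
    levels = levels
)

## count occurrences of every level, including levels that never occur
counts <- table(classification)
counts[["connect"]]
## [1] 2
counts[["hobby"]]     # a level with no activities still gets a count
## [1] 0
```

A plain character vector could not report a zero for hobby, because it has no record that hobby was ever a possibility.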

Notice also what happens when I try to use a value (disconnect) that is not a level of a factor

factor(c("connect", "disconnect"), levels = levels)
## [1] connect <NA>   
## Levels: connect exercise consult hobby essential

The value with the unknown level is displayed as NA, for ‘not known’. NA values can be present in any vector, e.g.,

c(1, 2, NA, 4)
## [1]  1  2 NA  4
c("walk", "talk", NA)
## [1] "walk" "talk" NA
c(NA, TRUE, FALSE, TRUE, TRUE)
## [1]    NA  TRUE FALSE  TRUE  TRUE

This serves as an indication that the value is simply not available. Use NA rather than adopting some special code (e.g., ‘-99’) to indicate when a value is not available.
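
To see why a special code is risky, compare what happens when a missing value is recorded as -99 versus NA. The numbers below are invented for illustration.

```r
## the third measurement is missing; two ways of recording that
with_code <- c(20, 30, -99, 60)   # special code: R treats -99 as real data
with_na   <- c(20, 30, NA, 60)    # NA: R knows the value is not available

mean(with_code)                   # silently wrong, -99 is averaged in
## [1] 2.75
mean(with_na)                     # R refuses to guess
## [1] NA
mean(with_na, na.rm = TRUE)       # explicit decision to ignore the NA
## [1] 36.66667
```

With NA, R forces us to decide how to handle the missing value; with -99 the mistake goes unnoticed.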

Another type of vector we will work with a lot is dates. All of my activities are for today, so I’ll start with a character vector with the same length as my activity vector, each indicating the date in a consistent month-day-year format

dates <- c("04-14-2020", "04-14-2020", "04-14-2020", "04-14-2020", "04-14-2020")
dates
## [1] "04-14-2020" "04-14-2020" "04-14-2020" "04-14-2020" "04-14-2020"

Incidentally, I could do this more efficiently using the rep() (‘replicate’) function

rep("04-14-2020", 5)
## [1] "04-14-2020" "04-14-2020" "04-14-2020" "04-14-2020" "04-14-2020"

And even better use length() to know for sure how many times I should replicate the character vector

rep("04-14-2020", length(activity))
## [1] "04-14-2020" "04-14-2020" "04-14-2020" "04-14-2020" "04-14-2020"

dates is a character vector, but it has special meaning as a calendar date. R has a Date class that knows how to work with dates, for instance to calculate the number of days between two dates. We will coerce dates to an object of class Date using the function as.Date(). Here’s our first attempt…

as.Date(dates)

… but this results in an error:

Error in charToDate(x) :
  character string is not in a standard unambiguous format

R doesn’t know the format (month-day-year) of the dates we provide. The solution is to add a second argument to as.Date(). The second argument is a character vector that describes the date format. The format we use is "%m-%d-%Y", which says that we provide the %month first, then a hyphen, then the %day, another hyphen, and finally the four-digit %Year.

as.Date(dates, format = "%m-%d-%Y")
## [1] "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14"

Notice that the format has been standardized to year-month-day. Also notice that although both the original dates and the return value of as.Date() are displayed as quoted text, they are actually of different classes.

class(dates)
## [1] "character"
class(as.Date(dates, format = "%m-%d-%Y"))
## [1] "Date"

R will use the information about class to enable specialized calculation on dates, e.g., to sort them or to determine the number of days between different dates. So here’s our date vector as a Date object.

dates <- rep("04-14-2020", length(activity))
date <- as.Date(dates, format = "%m-%d-%Y")
date
## [1] "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14"
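
Here is a short sketch of the kind of calculation the Date class makes possible. The variable names start and end, and the second date, are made up for illustration.

```r
start <- as.Date("04-14-2020", format = "%m-%d-%Y")
end   <- as.Date("04-20-2020", format = "%m-%d-%Y")

end - start                 # number of days between two dates
## Time difference of 6 days
as.numeric(end - start)     # the same difference as a plain number
## [1] 6
start + 7                   # the date one week later
## [1] "2020-04-21"
sort(c(end, start))         # dates sort in calendar order
## [1] "2020-04-14" "2020-04-20"
```

None of this works on the original character vector; try "04-20-2020" - "04-14-2020" to see the error.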

OK, time for a walk! See you tomorrow!

1.4 Day 4: Working with variables

Remember that R can act as a simple calculator, and that one can create new variables by assignment

x <- 1
x + 1
## [1] 2
y <- x + 1
y
## [1] 2

Let’s apply these ideas to our minutes vector from earlier in the week.

minutes <- c(20, 30, 60, 60, 60)

We can perform basic arithmetic on vectors. Suppose we wanted to increase the time of each activity by 5 minutes

minutes + 5
## [1] 25 35 65 65 65

or to increase the time of the first two activities by 5 minutes, and the last three activities by 10 minutes

minutes + c(5, 5, 10, 10, 10)
## [1] 25 35 70 70 70

R has a very large number of functions that can be used on vectors. For instance, the average time spent on activities is

mean(minutes)
## [1] 46

while the total amount of time is

sum(minutes)
## [1] 230

Explore other typical mathematical transformations, e.g., log(), log10(), sqrt() (square root), … Check out the help pages for each, e.g., ?log.
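
As a starting point for that exploration, here is a sketch of a few transformations. Like + above, each function is applied to every element of the vector.

```r
minutes <- c(20, 30, 60, 60, 60)

minutes / 60                  # convert minutes to hours
## [1] 0.3333333 0.5000000 1.0000000 1.0000000 1.0000000
round(sqrt(minutes), 1)       # square root, rounded to one decimal place
## [1] 4.5 5.5 7.7 7.7 7.7
log10(c(1, 10, 100))          # base-10 logarithm
## [1] 0 1 2
log(exp(1))                   # log() is the natural logarithm
## [1] 1
```

Functions can be nested, as in round(sqrt(minutes), 1): the inner function is evaluated first and its result is passed to the outer one.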

Explore the consequences of NA in a vector for functions like mean() and sum().

x <- c(1, 2, NA, 3)
mean(x)
## [1] NA

R is saying that, since there is an unknown (NA) value in the vector, it cannot possibly know what the mean is! Tell R to remove the missing values before performing the calculation by adding the na.rm = TRUE argument

mean(x, na.rm = TRUE)
## [1] 2

Check out the help page ?mean to find a description of the na.rm and other arguments.

It’s possible to perform logical operations on vectors, e.g., to ask which activities lasted 60 minutes or more

minutes >= 60
## [1] FALSE FALSE  TRUE  TRUE  TRUE

Here’s our activity vector

activity <- c("check e-mail", "breakfast", "conference call", "webinar", "walk")

The elements of this vector are numbered from 1 to 5. We can create a new vector that is a subset of this vector using [ and an integer index, e.g., the second activity is

activity[2]
## [1] "breakfast"

The index can actually be a vector, so we could choose the second and fourth activity as

index <- c(2, 4)
activity[index]
## [1] "breakfast" "webinar"

In fact, we can use logical vectors for subsetting. Consider the activities that take sixty minutes or longer:

index <- minutes >= 60
activity[index]
## [1] "conference call" "webinar"         "walk"

We had previously characterized the activities as ‘work’ or otherwise.

is_work <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

Use is_work to subset activity and identify the work-related activities

activity[is_work]
## [1] "check e-mail"    "conference call" "webinar"

How many minutes were work-related?

work_minutes <- minutes[is_work]
sum(work_minutes)
## [1] 140

What about not work related? ! negates logical vectors, so

is_work
## [1]  TRUE FALSE  TRUE  TRUE FALSE
!is_work
## [1] FALSE  TRUE FALSE FALSE  TRUE
non_work_minutes <- minutes[!is_work]
sum(non_work_minutes)
## [1] 90
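
Logical vectors can also be combined with & (and) and | (or) before subsetting. A minimal sketch, using the same vectors as above; long_work is a name I made up.

```r
activity <- c("check e-mail", "breakfast", "conference call", "webinar", "walk")
minutes <- c(20, 30, 60, 60, 60)
is_work <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

## work activities that also lasted 60 minutes or more
long_work <- is_work & minutes >= 60
long_work
## [1] FALSE FALSE  TRUE  TRUE FALSE
activity[long_work]
## [1] "conference call" "webinar"

sum(is_work)                  # TRUE counts as 1, so this counts work activities
## [1] 3
```

Because TRUE is treated as 1 and FALSE as 0 in arithmetic, sum() of a logical vector is a quick way to count how many elements satisfy a condition.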

Note that it doesn’t make sense to take the mean() of a character vector like activity, and R signals a warning and returns NA

mean(activity)
## Warning in mean.default(activity): argument is not numeric or logical: returning
## NA
## [1] NA

Nonetheless, there are many functions that do work on character vectors, e.g., the number of letters in each element nchar(), or transformation to upper-case

nchar(activity)
## [1] 12  9 15  7  4
toupper(activity)
## [1] "CHECK E-MAIL"    "BREAKFAST"       "CONFERENCE CALL" "WEBINAR"        
## [5] "WALK"

1.5 Day 5 (Friday) Zoom check-in

1.5.1 Logistics

Please join Microsoft Teams! Need help? Contact Adam.Kisailus at RoswellPark.org.

1.5.2 Review and troubleshoot (25 minutes; Martin)

Data representations

‘Atomic’ vectors

activity <- c("check e-mail", "breakfast", "conference call", "webinar", "walk")
minutes <- c(20, 30, 60, 60, 60)
is_work <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

factor() and as.Date()

levels <- c("connect", "exercise", "consult", "hobby", "essential")
classification <- factor(
    c("connect", "essential", "connect", "consult", "exercise"),
    levels = levels
)

dates <- rep("04-14-2020", length(activity))
date <- as.Date(dates, format = "%m-%d-%Y")

Missing values

x <- c(1, 3, NA, 5)
sum(x)
## [1] NA
sum(x, na.rm = TRUE)
## [1] 9

factor(c("connect", "disconnect"), levels = levels)
## [1] connect <NA>   
## Levels: connect exercise consult hobby essential

Functions and logical operators

x <- c(1, 3, NA, 5)
sum(x)
## [1] NA
sum(x, na.rm = TRUE)
## [1] 9

minutes >= 60
## [1] FALSE FALSE  TRUE  TRUE  TRUE

Subsetting vectors

1-based numeric indexes

activity
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"

idx <- c(1, 3, 1)
activity[idx]
## [1] "check e-mail"    "conference call" "check e-mail"

logical index

is_work
## [1]  TRUE FALSE  TRUE  TRUE FALSE
activity[is_work]
## [1] "check e-mail"    "conference call" "webinar"

sum(minutes[is_work])
## [1] 140

Maybe more interesting…

short <- minutes < 60
short
## [1]  TRUE  TRUE FALSE FALSE FALSE
minutes[short]
## [1] 20 30
activity[short]
## [1] "check e-mail" "breakfast"

1.5.3 Weekend activities (25 minutes; Shawn)

Writing R scripts

R scripts are convenient text files that we can use to save one or more lines of R syntax. Over the weekend you will get some experience working with R scripts. The example below will help you be a bit more prepared.

In RStudio, click File --> New File --> R Script to create a new script file and open it in the editor.

If you’ve followed the daily coding activities throughout the week, you should have some R code that keeps track of your daily activities.

If so, enter that code into your R script now.
Otherwise, feel free to use the code below. Look for a copy to clipboard icon in the top-right of the code block. To copy the code block to your R script:
- Click on the copy to clipboard icon
- Place your cursor in your R script
- Click Edit --> Paste:

## ====================================================================
## day 1 information 
## ====================================================================
day1_activity = c("breakfast", 
                  "check e-mail", 
                  "projects",
                  "conference call", 
                  "teams meeting", 
                  "lunch", 
                  "conference call", 
                  "webinar")
day1_is_work = c(FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE)
day1_minutes = c(30, 75, 120, 30, 60, 30, 60, 120)

n = length(day1_activity)
day1_total_hours = sum(day1_minutes) / 60
day1_work_hours = sum(day1_minutes[day1_is_work == TRUE]) / 60

cat("Total time recorded for day 1 : ", day1_total_hours,  
    "hours, over", n, "activities\n")
cat("Total time working  for day 1 : ", day1_work_hours, "hours \n\n")

Recall the discussion of factors and levels in Day 3; the code below leverages this but adds another level named independent work.

If you’ve already got code to assign factors and levels to your daily activity, enter that code into your R script now.
Otherwise, feel free to use the code below via the copy to clipboard procedure outlined above:

## ====================================================================
## Kelly, Morgan, and Matott's classification strategy
## ====================================================================
kmm_levels = c("connect", 
              "exercise", 
              "consult", 
              "hobby", 
              "essential",
              "independent work")
## manually map day 1 activity to appropriate kmm_levels
day1_classes = factor(
    c("essential", "connect", "independent work",
      "connect", "connect",  "essential", 
      "connect", "consult"),
    levels = kmm_levels
)

On day 3 you also got some experience working with dates. The code below stamps our day 1 activity data with an appropriately formatted date.

If you’ve already got code to assign dates to your daily activity, enter that code into your R script now.
Otherwise, feel free to use the code below via the copy to clipboard procedure outlined above:

## ====================================================================
## Assign dates
## ====================================================================
day1_dates = rep("04-13-2020",length(day1_activity))
day1_dates = as.Date(day1_dates,format = "%m-%d-%Y")

[OPTIONAL ADVANCED MATERIAL]

Earlier today Dr. Morgan touched on named vectors. We can leverage named vectors to create a more general mapping between activities and levels. The code for this is given below. Try it and compare the result to your manual mapping!

## kmm_map is a named vector that maps activities to categories
kmm_map = c("breakfast"       = "essential", 
            "check e-mail"    = "connect", 
            "projects"        = "independent work",
            "requests"        = "independent work",
            "conference call" = "connect",
            "teams meeting"   = "connect", 
            "lunch"           = "essential", 
            "webinar"         = "consult",
            "walk"            = "exercise")
day1_classes = factor(kmm_map[day1_activity], levels = kmm_levels)
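
If named vectors are new to you, the key idea is that a character vector inside [ ] looks elements up by name. Here is a stripped-down sketch; example_map and todo are hypothetical names used only for this illustration.

```r
## a small named vector: the names are activities, the values are categories
example_map <- c("breakfast" = "essential",
                 "walk"      = "exercise",
                 "webinar"   = "consult")

## look up several activities at once; repeats are fine
todo <- c("walk", "breakfast", "nap", "walk")
categories <- unname(example_map[todo])   # unname() drops the names
categories    # "exercise", "essential", NA, "exercise"

## an activity that is not in the map comes back as NA
todo[is.na(categories)]
## [1] "nap"
```

Checking for NA after the lookup is a handy way to find activities you forgot to add to the map.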

Saving R scripts

If you’ve been following along you should now have an R script that contains a bunch of code for keeping track of your daily activity log. Let’s save this file:

In RStudio, place your cursor anywhere in the script file
click File --> Save (or press CTRL+S)
- Name your file something like daily_activity.R.

Running R scripts

Now that we’ve created an R script you may be wondering “How do I run the code in the script?” There are actually a few ways to do this:

Option #1 (Run)

Highlight the first block of the code (e.g., the part where you recorded day 1 activity and maybe calculated amount of time worked).
Click the --> Run icon in the top-right portion of the script editor window.
- This will run the highlighted block of code. The output will appear in the RStudio console window along with an echo of the code itself.

Option #2 (Source)

Click on the --> Source icon just to the right of the --> Run icon.
This will run the entire script.
Equivalent to entering into the console
```
source("daily_activity.R")
```
Only the output generated by print() and cat() will appear in the RStudio console (i.e. the code in the script is not echoed to the console).

Option #3 (Source with Echo)

Click on the downward pointing arrowhead next to the Source button to open a dropdown menu
In the dropdown menu, select Source with Echo
This will run the entire script and the code in the script will be echoed to the RStudio console along with any output generated by print() and cat().
The echoed source and the normal output are not color-coded like they are when using the --> Run button.

Equivalent to running

source("daily_activity.R", echo = TRUE, max.deparse.length = Inf)

Saving data

It can be useful to save objects created in an R script as a data file. These data files can be loaded or re-loaded into a new or existing R session.

For example, let’s suppose you had an R script that mined a trove of Twitter feeds for sentiment data related to government responses to COVID-19. Suppose you ran the script for several weeks and collected lots of valuable data into a bunch of vectors. Even though the R code is saved as a script file, the data that the script is collecting would be lost once the script stops running. Furthermore, due to the temporal nature of Twitter feeds, you wouldn’t be able to collect the same data by simply re-running the script. Luckily, R provides several routines for saving and loading objects. Placing the appropriate code in your R script will ensure that your data is preserved even after the script stops running.

Saving individual R objects

R supports storing a single R object as an .rds file. For example, the code below saves the day1_activity vector to an .rds file. The saveRDS() function is the workhorse in this case and the setwd(), getwd(), and file.path() commands allow us to conveniently specify a name and location for the data file:

## ====================================================================
## creating .rds data files (for saving individual objects)
## ====================================================================
setwd("C:/Matott/MyQuarantine")
my_rds_file = file.path(getwd(), "day1_activity.rds")
my_rds_file # print value -- sanity check
saveRDS(day1_activity, my_rds_file)

Loading individual R objects

The complement to the saveRDS() function is the readRDS() function. It loads the R object stored in the specified file. In the example below a data file is loaded and stored as an object named day1_activity_loaded. Compare this object to the existing day1_activity object - they should be the same!

## ====================================================================
## Reading .rds data files (for loading individual objects)
## ====================================================================
setwd("C:/Matott/MyQuarantine")
my_rds_file = file.path(getwd() ,"day1_activity.rds")
my_rds_file
day1_activity_loaded = readRDS(my_rds_file) # now load from disk
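
To check that the round trip really preserved the object, identical() is a convenient test. The sketch below is self-contained: it uses a shortened activity vector and writes to R’s temporary directory (tempdir()) instead of the folder used above, so it runs on any computer.

```r
## a shortened activity vector, just for this illustration
day1_activity <- c("breakfast", "check e-mail", "projects")

## tempdir() is a per-session scratch folder; swap in your own folder if you like
my_rds_file <- file.path(tempdir(), "day1_activity.rds")

saveRDS(day1_activity, my_rds_file)            # write to disk
day1_activity_loaded <- readRDS(my_rds_file)   # read it back

identical(day1_activity, day1_activity_loaded) # are they exactly the same?
## [1] TRUE
```

Notice that readRDS() returns the object, so we choose the name of the loaded variable ourselves.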

Saving multiple R objects

The save() function will save one or more objects into a .Rdata file (these are also known as session files). The example below saves various day1 and related factor-level objects to an .Rdata file.

## ====================================================================
## creating .RData files (for saving multiple objects)
## ====================================================================
setwd("C:/Matott/MyQuarantine")
my_rdata_file = file.path(getwd(), "day1.rdata")
save(kmm_levels, kmm_map,
     day1_activity, day1_classes, day1_dates, day1_is_work, 
     day1_minutes, day1_total_hours, day1_work_hours, 
     file = my_rdata_file)

If you have many objects that you want to save, listing them all can be tedious. Fortunately, the ls() command provides a list of all objects in the current R session. The results of ls() can be passed along to the save() command and this will result in all objects being saved. An example of the required syntax is given below.

setwd("C:/Matott/MyQuarantine")
my_rdata_file = file.path(getwd(), "day1.rdata")
save(list = ls(), file = my_rdata_file)

Loading multiple R objects

The complement to the save() function is the load() function. This will load all objects stored in an .Rdata file into the current R session. Example syntax is given below:

## ====================================================================
## Reading .RData files (for loading multiple objects)
## ====================================================================
setwd("C:/Matott/MyQuarantine")
my_rdata_file = file.path(getwd(), "day1.rdata")
load(my_rdata_file) # reload
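
Unlike readRDS(), load() restores objects under their original names. The sketch below demonstrates this with two throwaway objects (a and b are made-up names) and a file in R’s temporary directory, so it can be run without changing the working directory.

```r
a <- 1:3
b <- "hello"

## save both objects to a scratch file, then remove them from the session
my_rdata_file <- file.path(tempdir(), "example.rdata")
save(a, b, file = my_rdata_file)
rm(a, b)

## load() re-creates a and b, and returns the names of what it loaded
loaded <- load(my_rdata_file)
loaded
## [1] "a" "b"
a
## [1] 1 2 3
```

Because load() overwrites any existing objects with the same names, it is worth checking what a file contains before loading it into a session you care about.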

It is also possible to load an .Rdata file using the RStudio interface.

Click Session --> Load Workspace ...
A file browser dialog will open
Navigate to the .rdata file and select

Wrapping up day 5

Today we reviewed the concepts that you worked with throughout the week during your independent activity. We also troubleshot any problems or questions that may have come up during this time. Finally, we previewed the creation and use of R scripts and learned about saving and loading objects. Over the weekend you will gain some more experience with these topics.

1.6 Day 6: R scripts

Some of you may have already started saving your R commands as script files. As the material gets more complicated (and more interesting) everyone will want to start doing this. Here is an example to get you started:

Recall that we can create a script file in RStudio: click “File –> New File –> R Script” to create a new script file and open it in the editor.

By convention, R scripts have a .R extension (e.g. my_script.R)
- In RStudio, click into your untitled script and click “File –> Save”
- Name your file something fun like my_first_script.R and save it
Use the # character for comments. Enter the following into your R Script file:

## This is my first R script

Enter each command on a separate line. It’s also possible to enter multiple (short!) commands on a single line, separated by a semi-colon ;

x = "Hello world!"
y = 'Today is'; d = format(Sys.Date(), "%b %d, %Y")
cat(x, y, d)

Use the “Run” button in RStudio to run the highlighted portion of an R script file. Try this on your simple R Script.

x = "Hello world!"; y = 'Today is'; d = format(Sys.Date(),"%b %d, %Y")
cat(x, y, d, "\n")
## Hello world! Today is May 22, 2020

Alternatively, use “Run –> Run All” to run an entire script file.
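A whole script can also be run from the console with source(). The sketch below is only an illustration: it writes a two-line script to a temporary file (a stand-in for my_first_script.R) so that it runs anywhere, and `greeting_length` is a made-up variable name.

```r
## Write a two-line script to a temporary file (stand-in for my_first_script.R)
script_path <- tempfile(fileext = ".R")
writeLines(
    c('x = "Hello world!"',
      'greeting_length = nchar(x)'),
    script_path
)

## source() runs every command in the file, much like "Run All"
source(script_path)

## Variables created by the script are now available in the session
greeting_length
```

With your own script you would skip the writeLines() step and pass the path of the saved file, e.g., source("my_first_script.R") if the file is in the current working directory.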

For today’s exercise, create a script file that summarizes your quarantine activities over several days. Use comments, white space (blank lines and spaces), and variable names to summarize each day. Here’s what I’ve got…

## 'classification' factor levels
levels <- c("connect", "exercise", "consult", "hobby", "essential")

## Quarantine log, day 1

activity_day_1 <-
    c("check e-mail", "breakfast", "conference call", "webinar", "walk")
minutes_day_1 <- c(20, 30, 60, 60, 60)
is_work_day_1 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
classification_day_1 <- factor(
    c("connect", "essential", "connect", "consult", "exercise"),
    levels = levels
)
date_day_1 <- as.Date(rep("04-14-2020", length(activity_day_1)), "%m-%d-%Y")

## Quarantine log, day 2

activity_day_2 <-
    c("check e-mail", "breakfast", "conference call", "webinar", "read a book")
minutes_day_2 <- c(20, 30, 60, 60, 60)
is_work_day_2 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
classification_day_2 <- factor(
    c("connect", "essential", "connect", "consult", "hobby"),
    levels = levels
)
date_day_2 <- as.Date(rep("04-15-2020", length(activity_day_2)), "%m-%d-%Y")

## Quarantine log, day 3

activity_day_3 <-
    c("check e-mail", "breakfast", "webinar", "read a book")
minutes_day_3 <- c(20, 30, 60, 60)
is_work_day_3 <- c(TRUE, FALSE, TRUE, FALSE)
classification_day_3 <- factor(
    c("connect", "essential", "consult", "hobby"),
    levels = levels
)
date_day_3 <- as.Date(rep("04-16-2020", length(activity_day_3)), "%m-%d-%Y")

Try concatenating these values, e.g.,

activity <- c(activity_day_1, activity_day_2, activity_day_3)
activity
##  [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
##  [5] "walk"            "check e-mail"    "breakfast"       "conference call"
##  [9] "webinar"         "read a book"     "check e-mail"    "breakfast"      
## [13] "webinar"         "read a book"
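The same approach works for the numeric and logical vectors. The sketch below repeats the minutes and is_work values from the log above, concatenates them, and uses logical subsetting to add up the time spent on work. Comparing lengths is a simple check (one of several possible) that the vectors still line up element by element.

```r
minutes_day_1 <- c(20, 30, 60, 60, 60)
minutes_day_2 <- c(20, 30, 60, 60, 60)
minutes_day_3 <- c(20, 30, 60, 60)
is_work_day_1 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
is_work_day_2 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
is_work_day_3 <- c(TRUE, FALSE, TRUE, FALSE)

minutes <- c(minutes_day_1, minutes_day_2, minutes_day_3)
is_work <- c(is_work_day_1, is_work_day_2, is_work_day_3)

## Concatenated vectors should have the same length as each other
length(minutes) == length(is_work)

## Total minutes, and minutes spent on work, across all three days
sum(minutes)
sum(minutes[is_work])
```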

Save your script, quit R and RStudio, and restart R. Re-open and run the script to re-do your original work.

Think about how this makes your work reproducible from one day to the next, and how making your scientific work reproducible would be advantageous.

1.7 Day 7: Saving data

We’ve defined these variables

activity <- c("check e-mail", "breakfast", "conference call", "webinar", "walk")
minutes <- c(20, 30, 60, 60, 60)
is_work <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

levels <- c("connect", "exercise", "consult", "hobby", "essential")
classification <- factor(
    c("connect", "essential", "connect", "consult", "exercise"),
    levels = levels
)

dates <- rep("04-14-2020", length(activity))
date <- as.Date(dates, format = "%m-%d-%Y")

Individual variables can be saved to a file.

Define the path to the file. The file extension is, by convention, ‘.rds’. We’ll use a temporary location
```
temporary_file_path <- tempfile(fileext = ".rds")
```
…but we could have chosen the destination interactively
```
interactive_file_path <- file.choose(new = TRUE)
```
…or provided a path relative to the ‘current working directory’, or an absolute file path (use ‘/’ to specify paths on all operating systems, including Windows)
```
getwd()
relative_file_path <- "my_activity.rds"
absolute_file_path_on_macOS <- "/Users/ma38727/my_activity.rds"
```
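A few base R helpers make it easier to see how relative and absolute paths relate. This is a small sketch rather than a required step; `path_in_wd` is a hypothetical variable name, and the file it points to does not need to exist for the first three commands to work.

```r
## Build an absolute path to a file in the current working directory
path_in_wd <- file.path(getwd(), "my_activity.rds")
path_in_wd

## Split a path into its directory and its file name
dirname(path_in_wd)
basename(path_in_wd)

## Check whether a file exists at that location before trying to read it
file.exists(path_in_wd)
```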
Use saveRDS() to save a single object to a file
```
saveRDS(activity, temporary_file_path)
```

Use readRDS() to read the object back in

activity_from_disk <- readRDS(temporary_file_path)
activity_from_disk
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"
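The round trip also preserves more structured objects such as factors and dates. The sketch below is one way to convince yourself of that, using shortened versions of the vectors defined earlier and identical() to compare each original with its copy. The `_path` and `_copy` names are made up for this illustration.

```r
## Objects to save: a factor and a Date vector (shortened for illustration)
classification <- factor(
    c("connect", "essential", "exercise"),
    levels = c("connect", "exercise", "consult", "hobby", "essential")
)
date <- as.Date(c("04-14-2020", "04-15-2020", "04-16-2020"), format = "%m-%d-%Y")

## Save each object to its own temporary .rds file
classification_path <- tempfile(fileext = ".rds")
date_path <- tempfile(fileext = ".rds")
saveRDS(classification, classification_path)
saveRDS(date, date_path)

## readRDS() returns the object, so it can be assigned to any name
classification_copy <- readRDS(classification_path)
date_copy <- readRDS(date_path)

## identical() is TRUE when the copy matches the original exactly
identical(classification, classification_copy)
identical(date, date_copy)
```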

Use save() and load() to save and load several objects.

Use .RData as the file extension. Usually we would NOT save to a temporary location, because the temporary location is deleted when the R session ends.
```
temporary_file_path <- tempfile(fileext = ".RData")
save(activity, minutes, file = temporary_file_path)
```

Remove the objects from the R session, and verify that they are absent

rm(activity, minutes)
try(activity) # fails -- object not present
## Error in try(activity) : object 'activity' not found

Load the saved objects

load(temporary_file_path)
activity
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"
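If you would rather test for an object without triggering an error message, exists() takes the object's name as a character string and returns TRUE or FALSE. A minimal sketch, assuming a session in which no other object called activity is defined (`present_before` and `present_after` are hypothetical names):

```r
activity <- c("check e-mail", "breakfast")
present_before <- exists("activity")   # TRUE: the object is defined

rm(activity)
present_after <- exists("activity")    # FALSE: the object has been removed

present_before
present_after
```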

As an exercise…

Choose a location to save your data, e.g., in the current working directory

getwd()    # Where the heck are we?
## [1] "/Users/ma38727/a/github/QuaRantine"
my_file_path <- "my_quaRantine.RData"

Save the data

save(activity, minutes, is_work, classification, date, file = my_file_path)

Now the moment of truth. Quit R without saving your workspace
```
quit(save = "no")
```

Start a new session of R, and verify that your objects are not present

ls() # list objects available in the '.GlobalEnv' -- there should be none
## character(0)
try(activity) # nope, not there...
## Error in try(activity) : object 'activity' not found

Create a path to the saved data file
```
my_file_path <- "my_quaRantine.RData"
```

Load the data and verify that it is correct

load(my_file_path)
activity
## [1] "check e-mail"    "breakfast"       "conference call" "webinar"        
## [5] "walk"
minutes
## [1] 20 30 60 60 60
is_work
## [1]  TRUE FALSE  TRUE  TRUE FALSE
date
## [1] "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14"
classification
## [1] connect   essential connect   consult   exercise 
## Levels: connect exercise consult hobby essential

See you in zoom on Monday!
