To find other tutorials for this class, go to the main website, https://ds112-lendway.netlify.app/.
Welcome to another tutorial for this class, COMP/STAT 112: Introduction to Data Science! It will be similar to the others, including demo videos and files embedded in this document and practice problems with hints or solutions at the end. There are some new libraries, so be sure to install those first. There are also some additional instructions (especially if you’re using the server) down below the demo section.
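A quick way to see which packages still need installing is sketched below. The list of "new" packages is an assumption based on the libraries loaded later in this tutorial, so adjust it to match your own setup.

```r
# Packages assumed to be new in this tutorial (an assumption, not a list
# given by the tutorial itself).
new_pkgs <- c("plotly", "gganimate", "gifski", "transformr", "shiny")

# requireNamespace() returns TRUE if a package is installed and loadable,
# without attaching it to the session.
is_available <- vapply(new_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed.
to_install <- new_pkgs[!is_available]
to_install

# Uncomment to install whatever is missing:
# install.packages(to_install)
```

If `to_install` is empty, everything in the assumed list is already available.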
As most of our files do, we start this one with three R code chunks: 1. options, 2. libraries and settings, 3. data.
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE)
library(tidyverse) # for data cleaning and plotting
library(gardenR) # for Lisa's garden data
library(lubridate) # for date manipulation
library(openintro) # for the abbr2state() function
library(palmerpenguins)# for Palmer penguin data
library(maps) # for map data
library(ggmap) # for mapping points on maps
library(gplots) # for col2hex() function
library(RColorBrewer) # for color palettes
library(sf) # for working with spatial data
library(leaflet) # for highly customizable mapping
library(ggthemes) # for more themes (including theme_map())
library(plotly) # for the ggplotly() - basic interactivity
library(gganimate) # for adding animation layers to ggplots
library(gifski) # for creating the gif (don't need to load this library every time, but need it installed)
library(transformr) # for "tweening" (gganimate)
library(shiny) # for creating interactive apps
theme_set(theme_minimal())
# Lisa's garden data
data("garden_harvest")
After this tutorial, you should be able to do the following:
Add basic interactivity to a ggplot2 plot using ggplotly().
Add animation layers to plots using gganimate functions.
Create a shiny app that requires inputs.
Publish a shiny app to shinyapps.io.
plotly
Probably the easiest way to add interactivity to a plot created with ggplot2 is by using the ggplotly() function from the plotly library. The plotly package can do A LOT more than what we’ll cover in this course, as it is a plotting framework of its own. But it can do a lot with just that one function.
Let’s look at an example. In the code below, I compute the cumulative harvest in pounds by vegetable and create a bar graph. I save the graph and print it out. The code and graph should be familiar.
veggie_harvest_graph <- garden_harvest %>%
group_by(vegetable) %>%
summarize(total_wt_lbs = sum(weight)*0.00220462) %>%
ggplot() +
geom_col(aes(x = total_wt_lbs,
y = fct_reorder(vegetable,
total_wt_lbs,
.desc = FALSE))) +
labs(title = "Total Harvest by vegetable (lb)",
x = "",
y = "")
veggie_harvest_graph
Now, we plotly-ify it!
ggplotly(veggie_harvest_graph)
The labeling is fairly ugly in the graph above. I can fix some of that by editing my original plot. In the code below, I add a text aesthetic, which will be used in ggplotly() to display the vegetable name, and use the tooltip argument to tell it which aesthetics to display when hovering over the graph.
veggie_harvest_graph2 <- garden_harvest %>%
group_by(vegetable) %>%
summarize(total_wt_lbs = sum(weight)*0.00220462) %>%
ggplot() +
geom_col(aes(x = total_wt_lbs,
y = fct_reorder(vegetable,
total_wt_lbs,
.desc = FALSE),
text = vegetable)) +
labs(title = "Total Harvest by vegetable (lb)",
x = "",
y = "")
ggplotly(veggie_harvest_graph2,
tooltip = c("text", "x"))
This works for many different types of plots created with ggplot2.
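As a rough illustration with a different plot type, here is a minimal sketch that applies the same idea to a scatterplot of the Palmer penguins data. The choice of variables and the use of island as the text aesthetic are assumptions made for the example.

```r
library(ggplot2)
library(plotly)
library(palmerpenguins)

# Static scatterplot; the text aesthetic is not drawn by ggplot2,
# but ggplotly() can show it in the tooltip.
penguin_scatter <- ggplot(penguins,
                          aes(x = bill_length_mm,
                              y = bill_depth_mm,
                              color = species,
                              text = island)) +
  geom_point() +
  labs(x = "Bill length (mm)",
       y = "Bill depth (mm)",
       color = "")

# Convert to an interactive plot and choose what the tooltip displays.
penguin_plotly <- ggplotly(penguin_scatter,
                           tooltip = c("text", "x", "y"))
penguin_plotly
```

Hovering over a point should then show the island along with the two bill measurements.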
In this exercise, choose 2 graphs you have created for ANY assignment in this class and add interactivity using the ggplotly() function.
gganimate
The gganimate package works well with ggplot2 functions by providing additional grammar that assists in adding animation to the plots. These functions get added as layers in ggplot(), just like you are used to adding geom_*() layers and other layers that modify the graph.
From Thomas Pedersen’s documentation, here are the key functions/grammar of the package:
transition_*() defines how the data should be spread out and how it relates to itself across time (time is not always actual time).
view_*() defines how the positional scales should change along the animation.
shadow_*() defines how data from other points in time should be presented in the given point in time.
enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.
ease_aes() defines how different aesthetics should be eased during transitions.
You only need a transition_*() or view_*() function to add animation. This tutorial focuses on three transition_*() functions: transition_states(), transition_time(), and transition_reveal().
Creating an animated plot with gganimate
Create a base ggplot().
Add appropriate geom_*() layers.
Add an appropriate gganimate transition_*() layer.
Add other gganimate options, which may include making some changes in the ggplot() code.
transition_*() functions
The following image, taken from the gganimate cheatsheet, gives a nice overview of the three functions.
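Before looking at each transition in detail, the general recipe (a base ggplot(), geom layers, a transition_*() layer, then other options) can be sketched on a small made-up dataset. The data frame and its column names below are hypothetical and only serve to show where each piece goes.

```r
library(ggplot2)
library(gganimate)

# Toy data (hypothetical): three groups measured at the same x values.
toy <- data.frame(
  x     = rep(1:5, times = 3),
  y     = c(1:5, (1:5) * 2, (1:5) * 3),
  group = rep(c("a", "b", "c"), each = 5)
)

# Base ggplot() with a geom_*() layer.
static_plot <- ggplot(toy, aes(x = x, y = y, group = group)) +
  geom_point()

# Add a transition_*() layer, plus one extra option: a subtitle that
# uses a variable created by the transition.
anim_plot <- static_plot +
  labs(subtitle = "Group: {closest_state}") +
  transition_states(group,
                    transition_length = 1,
                    state_length = 2)

# Printing the object (or calling animate(anim_plot)) renders the animation.
# anim_plot
```

Building the static plot first makes it easier to check the base graph before spending time on rendering.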
transition_states()
This transition is used to transition between distinct stages of the data. We will show an example of transitioning between levels of a categorical variable. We will use the garden_harvest dataset and will follow the steps outlined above for creating an animated plot.
First, we create a dataset of daily tomato harvests in pounds for each variety of tomato. We add day of week and reorder variety from most to least harvested.
daily_tomato <- garden_harvest %>%
filter(vegetable == "tomatoes") %>%
group_by(variety, date) %>%
summarize(daily_harvest = sum(weight)*0.00220462) %>%
mutate(day_of_week = wday(date, label = TRUE)) %>%
ungroup() %>%
mutate(variety = fct_reorder(variety, daily_harvest, sum, .desc = TRUE))
daily_tomato
Next, we create a jittered scatterplot of daily harvest by day of week. We facet the plot by variety.
daily_tomato %>%
ggplot(aes(x = daily_harvest,
y = day_of_week)) +
geom_jitter() +
facet_wrap(vars(variety)) +
labs(title = "Daily tomato harvest",
x = "",
y = "")
Now, instead of looking at the data by faceting, we will use animation and transition by variety. This code takes a while to run. And the animation shows up over in the Viewer in the lower right-hand pane, rather than in the preview below the code chunk.
daily_tomato %>%
ggplot(aes(x = daily_harvest,
y = day_of_week)) +
geom_jitter() +
labs(title = "Daily tomato harvest",
x = "",
y = "") +
transition_states(variety)
Because it takes a while to create the animation, you don’t want to recreate it each time you knit your file. So, in the code chunk where you create the animation, add eval=FALSE to the code chunk options (i.e., inside the curly brackets next to the lowercase r).
Then, save the gif using the anim_save() function, like in the code below. The name in quotes is the name of the file that will be created, which needs to end in .gif. By default, this saves your most recent gganimate plot, so be sure to run the code right after you create the animation. Alternatively, you can save your gganimate plot to an object, say plot1, and run anim_save("tomatoes1.gif", animation = plot1). The file will be saved to your working directory. If you are working in a project (hopefully the one linked to your GitHub repo, right?), then this will go to the main folder for the project if that is where the .Rmd file is located.
anim_save("tomatoes1.gif")
Then, load the file back in using the following code. You can add echo=FALSE to the code chunk options to omit displaying the code.
knitr::include_graphics("tomatoes1.gif")
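One possible variation on this workflow, which is not the approach used in this tutorial, is to render and save the gif only when the file does not exist yet. The sketch below uses the built-in mtcars data as a stand-in for the tomato data, and the file name is hypothetical.

```r
library(ggplot2)
library(gganimate)

# Stand-in animation built from mtcars (not the tomato data).
plot1 <- ggplot(mtcars, aes(x = wt, y = mpg, group = cyl)) +
  geom_point() +
  transition_states(cyl)

gif_file <- "mtcars_states.gif"  # hypothetical file name

# Render and save only when the gif has not been created yet.
# Passing the plot through the animation argument avoids relying on
# whichever animation happened to be rendered last.
if (!file.exists(gif_file)) {
  anim_save(gif_file, animation = plot1)
}

# In an R Markdown chunk, this displays the saved file.
knitr::include_graphics(gif_file)
```

Delete the gif file whenever you change the plot code so that it is rendered again.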
Now, let’s return to the animation that was created. There are a couple things we should fix. One is that as it animates, it looks like the observations from one variety morph into the observations from the next variety.
We can fix this in two ways. One is to color by variety:
daily_tomato %>%
ggplot(aes(x = daily_harvest,
y = day_of_week,
color = variety)) +
geom_jitter() +
scale_color_viridis_d(option = "magma") +
labs(title = "Daily tomato harvest",
x = "",
y = "",
color = "") +
theme(legend.position = "none") +
transition_states(variety)
Another is to map variety to the group aesthetic (this is the recommended way to do it, even if we also color by variety):
daily_tomato %>%
ggplot(aes(x = daily_harvest,
y = day_of_week,
group = variety)) +
geom_jitter() +
labs(title = "Daily tomato harvest",
x = "",
y = "") +
transition_states(variety)
Another issue is that we don’t see the variety names as it animates through. Thankfully, the various transition_*() functions create some useful variables we can use to display the names of variety. The variables created are shown below.
We can access the variables by putting them in curly braces inside a label. Below, I use the closest_state variable that is created to add the variety to the subtitle of the plot.
daily_tomato %>%
ggplot(aes(x = daily_harvest,
y = day_of_week,
group = variety)) +
geom_jitter() +
labs(title = "Daily tomato harvest",
subtitle = "Variety: {closest_state}",
x = "",
y = "") +
transition_states(variety)
There are many options we can change. Below, we make a couple more changes.
Save the animated plot as tomato_gganim and output the animation using animate() in order to control the duration (there are other options in that function, too).
Change the relative transition lengths (how long it takes to switch variety) and state lengths (how long it stays on a variety). These are relative lengths, so the transition time is twice as long as the time spent in a state.
Shrink the points as variety transitions using exit_shrink().
Color the points light blue as they enter and exit.
tomato_gganim <- daily_tomato %>%
ggplot(aes(x = daily_harvest,
y = day_of_week,
group = variety)) +
geom_jitter() +
labs(title = "Daily tomato harvest",
subtitle = "Variety: {closest_state}",
x = "",
y = "") +
transition_states(variety,
transition_length = 2,
state_length = 1) +
exit_shrink() +
enter_recolor(color = "lightblue") +
exit_recolor(color = "lightblue")
animate(tomato_gganim, duration = 20)
transition_time()
This transition is used to transition between distinct states in time. We will show an example of transitioning over harvest dates in the garden_harvest dataset. We will follow the steps outlined earlier for creating an animated plot.
First, we create a dataset of daily harvest in pounds for a subset of four vegetables.
daily_harvest_subset <- garden_harvest %>%
filter(vegetable %in% c("tomatoes", "beans",
"peas", "zucchini")) %>%
group_by(vegetable, date) %>%
summarize(daily_harvest_lb = sum(weight)*0.00220462)
daily_harvest_subset
Then, we create a static plot, coloring the points differently and assigning different shapes to distinguish the various green colors.
daily_harvest_subset %>%
ggplot(aes(x = date,
y = daily_harvest_lb,
color = vegetable,
shape = vegetable)) +
geom_point() +
labs(title = "Daily harvest (lb)",
x = "",
y = "",
color = "vegetable",
shape = "vegetable") +
scale_color_manual(values = c("tomatoes" = "darkred",
"beans" = "springgreen4",
"peas" = "yellowgreen",
"zucchini" = "darkgreen")) +
theme(legend.position = "top",
legend.title = element_blank())
Now we animate the plot, transitioning over time by date.
daily_harvest_subset %>%
ggplot(aes(x = date,
y = daily_harvest_lb,
color = vegetable,
shape = vegetable)) +
geom_point() +
labs(title = "Daily harvest (lb)",
x = "",
y = "",
color = "vegetable",
shape = "vegetable") +
scale_color_manual(values = c("tomatoes" = "darkred",
"beans" = "springgreen4",
"peas" = "yellowgreen",
"zucchini" = "darkgreen")) +
theme(legend.position = "top",
legend.title = element_blank()) +
transition_time(date)
Now, let’s try adding some other features:
Keep a little history of the data via shadow_wake()
Fade the old data points out via exit_fade()
Add a date subtitle using the frame_time variable created from transition_time().
daily_harvest_subset %>%
ggplot(aes(x = date,
y = daily_harvest_lb,
color = vegetable,
shape = vegetable)) +
geom_point() +
labs(title = "Daily harvest (lb)",
subtitle = "Date: {frame_time}",
x = "",
y = "",
color = "vegetable",
shape = "vegetable") +
scale_color_manual(values = c("tomatoes" = "darkred",
"beans" = "springgreen4",
"peas" = "yellowgreen",
"zucchini" = "darkgreen")) +
theme(legend.position = "top",
legend.title = element_blank()) +
transition_time(date) +
shadow_wake(wake_length = .3) +
exit_fade()
transition_reveal()
This transition allows you to let data gradually appear. We will show an example of building up the cumulative harvest data over harvest dates using the garden_harvest dataset. We will follow the steps outlined earlier for creating an animated plot.
First we create a dataset of cumulative harvest by date for a subset of vegetables.
cum_harvest_subset <- garden_harvest %>%
filter(vegetable %in% c("tomatoes", "beans",
"peas", "zucchini")) %>%
group_by(vegetable, date) %>%
summarize(daily_harvest_lb = sum(weight)*0.00220462) %>%
mutate(cum_harvest_lb = cumsum(daily_harvest_lb))
cum_harvest_subset
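The cumulative sum above restarts for each vegetable because summarize() drops only the last grouping variable (date), leaving the data grouped by vegetable when cumsum() runs. A minimal sketch on made-up data, with hypothetical values, makes this easier to see:

```r
library(dplyr)

# Made-up harvest data (hypothetical values), two vegetables over two days.
toy_harvest <- tibble(
  vegetable = c("beans", "beans", "peas", "peas"),
  date      = as.Date(c("2020-07-01", "2020-07-02",
                        "2020-07-01", "2020-07-02")),
  weight    = c(100, 200, 50, 25)
)

toy_cum <- toy_harvest %>%
  group_by(vegetable, date) %>%
  # "drop_last" is the default behavior; it is spelled out here so the
  # remaining grouping by vegetable is explicit.
  summarize(daily_harvest = sum(weight), .groups = "drop_last") %>%
  # Still grouped by vegetable, so the running total restarts per vegetable.
  mutate(cum_harvest = cumsum(daily_harvest)) %>%
  ungroup()

toy_cum$cum_harvest
# → 100 300 50 75
```

Without the grouping by vegetable, the running total would carry over from beans into peas.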
Next, we create a static plot of cumulative harvest, coloring the lines by vegetable.
cum_harvest_subset %>%
ggplot(aes(x = date,
y = cum_harvest_lb,
color = vegetable)) +
geom_line() +
labs(title = "Cumulative harvest (lb)",
x = "",
y = "",
color = "vegetable") +
scale_color_manual(values = c("tomatoes" = "darkred",
"beans" = "springgreen4",
"peas" = "yellowgreen",
"zucchini" = "darkgreen")) +
theme(legend.position = "top",
legend.title = element_blank())
And now, add animation!
cum_harvest_subset %>%
ggplot(aes(x = date,
y = cum_harvest_lb,
color = vegetable)) +
geom_line() +
labs(title = "Cumulative harvest (lb)",
x = "",
y = "",
color = "vegetable") +
scale_color_manual(values = c("tomatoes" = "darkred",
"beans" = "springgreen4",
"peas" = "yellowgreen",
"zucchini" = "darkgreen")) +
theme(legend.position = "top",
legend.title = element_blank()) +
transition_reveal(date)
And now let’s do a couple things to improve the plot:
Remove the legend and add text that shows vegetable name on the plot (I love this!).
Add date to the subtitle.
cum_harvest_subset %>%
ggplot(aes(x = date,
y = cum_harvest_lb,
color = vegetable)) +
geom_line() +
geom_text(aes(label = vegetable)) +
labs(title = "Cumulative harvest (lb)",
subtitle = "Date: {frame_along}",
x = "",
y = "",
color = "vegetable") +
scale_color_manual(values = c("tomatoes" = "darkred",
"beans" = "springgreen4",
"peas" = "yellowgreen",
"zucchini" = "darkgreen")) +
theme(legend.position = "none") +
transition_reveal(date)
We could have used this same data with a different type of transition. It’s always good to think about the point you are trying to make with the animation.
cum_harvest_subset %>%
ggplot(aes(x = date,
y = cum_harvest_lb,
color = vegetable)) +
geom_line() +
labs(title = "Cumulative harvest (lb)",
subtitle = "Vegetable: {closest_state}",
x = "",
y = "",
color = "vegetable") +
scale_color_manual(values = c("tomatoes" = "darkred",
"beans" = "springgreen4",
"peas" = "yellowgreen",
"zucchini" = "darkgreen")) +
theme(legend.position = "none") +
transition_states(vegetable)
gganimate intro slides by Katherine Goode (she animates bats flying!)
gganimate by Thomas Pedersen - scroll down to the bottom
Pedersen introductory vignette - gives a brief intro to what each of the key functions do
gganimate wiki page - most of this is currently under development but there are some good examples
gganimate
This package might require some extra setup.
Make sure you can load the following packages: gganimate, gifski, transformr. First, try just installing gganimate and see if you can load all the other packages after only installing that one. If so, you are done. If not, try installing the other packages. After you install them all, RESTART RStudio. Hopefully you have success at that point. If not, talk to me. If you are using Macalester’s server, you will likely have to do the next step.
If you use Macalester’s server, you will almost surely get an error when you try to install gifski. The error will say something about not having Rust and will direct you to the Rust website. Click Getting started.
In the terminal, run rustup update. If that is successful, you are done. If it tells you something about not having Rust, then go to the next step.
In the terminal, run curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh (this is from the Rust Getting started page and could be slightly out of date; go to that page to make sure the code is correct). This may run on its own or it may give you some options. Always type the yes options in the terminal.
Use animation to tell an interesting story with the small_trains dataset that contains data from the SNCF (National Society of French Railways). These are Tidy Tuesday data! Read more about it here.
small_trains <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/small_trains.csv")
shiny
In this section, we will learn how to create a Shiny App. Shiny Apps are applications that allow the user to interact or make changes to graphs or tables. You will learn how to create them in R and how to deploy them on your own shinyapps.io webpage. See examples of some apps here.
The concept map below illustrates the key components of a shiny app and how they relate to one another. We will go through more detail during the tutorial.
I am doing this tutorial a bit differently than I’ve done other tutorials. I am going to walk you through the creation of a shiny app by following the intro_to_shiny slides on my GitHub page. You can download the slides below. I will include short screen captures to illustrate how to do each part on your own.
To begin, you are going to copy everything from my GitHub repo to your own GitHub repo. You do this by forking. From your GitHub account, search for mine: llendway/intro_to_shiny. Once there, click the fork button. Then, all my files will be in a repo of the same name on your GitHub page.
From there, clone the repo and create a new project in RStudio.
Once you’ve done that, you can access all the files from your computer. As you make changes, you can commit and push them out to your own GitHub account, if you’d like.
Creating a Shiny app is different from what we’ve done so far in this class. One of the biggest changes is that we’ll be working from .R files, rather than .Rmd files. In .R files, everything is read as R code. So, it’s like one big R code chunk. If you want to make comments, you need to use the pound/hashtag symbol, #.
Let’s start by opening the basic_app_template.R file in the app_files folder. Since you’ve cloned the repo, this will be one of the files in your project folder. Make sure you have the project open first! Open the file and click Run App. This is a really boring app - there is nothing there! But, it is a great starting point because it gives you an outline of what you need in order to make your app work.
Before getting into a lot of details, let’s add a little bit to the app. At the top of the file, load the tidyverse and babynames libraries and add some text between quotes inside the fluidPage() function. Run the app. You can check that you did this right by looking in the basic_app_add_more.R file.
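For orientation, a minimal sketch of what the file might look like after this step is shown below. It assumes the template follows the usual ui / server / shinyApp() layout; the actual contents of basic_app_add_more.R may differ, and the text inside fluidPage() is just a placeholder.

```r
library(shiny)
library(tidyverse)
library(babynames)

# User interface: for now, only some text (placeholder wording).
ui <- fluidPage("This is my first Shiny app!")

# Server: nothing to compute yet, so the function body is empty.
server <- function(input, output) {}

# Combine the two pieces into an app object.
app <- shinyApp(ui = ui, server = server)

# In an .R file opened in RStudio, clicking Run App launches the app;
# in an interactive session, printing the object does the same.
# app
```

The ui object controls what the user sees, and the server function is where the R code that builds outputs will eventually go.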
Now, let’s move on to creating a more interesting app. The goal is to create a Shiny app for my kids to explore the babynames dataset! Remember, that’s their favorite.
Requirements:
How do we do this?
Setup: