Chapter 4 Data analysis with the tidyverse
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
To learn how to do data analysis, from importing and tidying data to analyzing it and reporting results, we will use the book R for Data Science. You can find most of the exercise solutions there.
4.1 Program
my {R Markdown} presentation (also see https://r4ds.had.co.nz/r-markdown.html)
my {ggplot2} presentation + exercises from data visualization with {ggplot2}
tidy data will formalize the concept of “tidy” data, which is used throughout the tidyverse and is easier to work with
relational data will give you tools to join information from several datasets
more if time allows (see below)
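As a rough illustration of the tidy data and relational data items above, here is a minimal sketch using {tidyr} and {dplyr}. The tables `cases_wide` and `population` and all their values are made up for this example; only `pivot_longer()`, `left_join()` and `mutate()` come from the packages.

```r
library(dplyr)
library(tidyr)

# Hypothetical "wide" table: one column per year, so it is not tidy
cases_wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(15, 25)
)

# Tidy form: one row per (country, year) observation
cases_long <- cases_wide %>%
  pivot_longer(cols = c(`2019`, `2020`),
               names_to = "year", values_to = "cases")

# Hypothetical second table that shares the key `country`
population <- tibble(
  country = c("A", "B"),
  pop     = c(1000, 2000)
)

# Relational data: bring the population next to each observation
cases_rate <- cases_long %>%
  left_join(population, by = "country") %>%
  mutate(rate = cases / pop)

print(cases_rate)
```

Once the data is in the long format, the join and the new `rate` column each take a single verb, which is the main practical argument for tidying first.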
4.2 Other chapters from this book
The other chapters of the R for Data Science book are well worth reading, but unfortunately we won’t have time to cover them in class. Here is a brief overview of what you could learn from them:
data import will give you tools to import data (e.g. as a replacement of read.table)
strings will help you work with strings and regular expressions
factors will help you work with factors
dates and times will help you work with dates and times
many models will introduce the concept of list-columns that enable you to store complex objects in a structured way inside a data frame
databases: packages {DBI} and {dbplyr} + RStudio’s webpage
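To make the idea of list-columns mentioned above more concrete, here is a small sketch that stores one data frame and one fitted model per row. It uses the built-in `mtcars` dataset; the choice of model (`mpg ~ wt`) and the column names `model` and `slope` are assumptions made for illustration, not taken from the book.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per value of `cyl`; the remaining columns are packed
# into a list-column named `data` (each element is a data frame)
by_cyl <- mtcars %>%
  nest(data = -cyl)

# Fit one linear model per row and keep it in another list-column,
# then extract a plain numeric summary from each stored model
by_cyl <- by_cyl %>%
  mutate(
    model = map(data, ~ lm(mpg ~ wt, data = .x)),
    slope = map_dbl(model, ~ coef(.x)[["wt"]])
  )

print(by_cyl)
```

The data, the model and its summary stay aligned in the same row, so they can be filtered or arranged together with the usual {dplyr} verbs.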
4.3 Other resources
package {tidylog} provides verbose feedback about {dplyr} and {tidyr} operations
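A minimal sketch of how this looks in practice, assuming {tidylog} is installed and attached after {dplyr} so that its wrappers mask the {dplyr} verbs:

```r
library(dplyr)
# Attach tidylog after dplyr so that its versions of the verbs are used
library(tidylog, warn.conflicts = FALSE)

# Same call as with plain dplyr; tidylog additionally prints a message
# reporting how many rows were removed and how many remain
four_cyl <- filter(mtcars, cyl == 4)
```

The result is the same as with plain {dplyr}; only the extra message is added, which makes it easier to spot a filter or join that dropped more rows than expected.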
4.4 Other “tidy” packages
analysis of text data: package {tidytext} with the associated book,
analysis of financial data: package {tidyquant},
analysis of time series data: package {tidytime},
a collection of packages for modeling and machine learning using tidyverse principles: package {tidymodels},
a tidy API for graph manipulation: package {tidygraph},
many other packages...