class: center, middle, inverse, title-slide

# Visualisation with ggplot2

## https://privefl.github.io/R-presentation/ggplot2.html

### Florian Privé

### October 19, 2017
(minor updates in November 2020)

---
class: center, top, inverse
background-image: url(http://hexb.in/vector/ggplot2.svg)
background-position: 50% 80%
background-size: 40%

# http://ggplot2.tidyverse.org/

---
class: center, middle, inverse

> "The simple graph has brought more information to the data analyst’s mind than any other device."

--- John Tukey

---
class: center, middle, inverse

# Introduction

---

## What does *ggplot2* stand for?

--

### A __Grammar of Graphics__!

--

```
ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
```

--

<br>

#### You can uniquely describe any plot as a combination of these 7 parameters.

---

## How long have you known **ggplot2**?

<br>
<br>
<br>

--

<blockquote class="twitter-tweet" data-lang="en" align="center"><p lang="en" dir="ltr">Happy 10th birthday ggplot2! 🎉🎂📊📈10 years ago today the first version was accepted to CRAN: <a href="https://t.co/tiXIkqnCcA">https://t.co/tiXIkqnCcA</a></p>— Hadley Wickham (@hadleywickham) <a href="https://twitter.com/hadleywickham/status/873556949207535616">June 10, 2017</a></blockquote>

---

## Why use ggplot2?

<br>

- Automatic legends, colors, etc.

- Easy superposition, facetting, etc.

- Nice rendering (though I don't like the default grey theme).

- Store any ggplot2 object for modification or future recall. Super useful for packages.

- Lots of users (fewer bugs, more help on Stack Overflow).

- Lots of extensions.

- Easy saving with `ggsave()`.

---
class: center, middle, inverse

# Tidy data?
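
---

### A toy example first

Tidy data has one column per variable and one row per observation. The `fertilityData` object used in the next slides is not created on screen, so here is a minimal, self-contained sketch of the same reshaping on a hypothetical two-country table (the names `wide` and `tidy` and all the values are made up):

```r
library(tidyr)

# Hypothetical wide table: years are stored as column names (made-up values)
wide <- data.frame(
  Country = c("A", "B"),
  `1800`  = c(7, 4.6),
  `1801`  = c(6.9, 4.5),
  check.names = FALSE  # keep "1800" rather than "X1800"
)

# Move the year columns into two variables: Year and Fertility
tidy <- pivot_longer(wide, cols = -Country,
                     names_to = "Year", values_to = "Fertility")
tidy$Year <- as.integer(tidy$Year)  # column names come back as strings
tidy
```

Each row is now one (country, year) observation: 2 countries x 2 years = 4 rows.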

---

## Untidy data

```r
fertilityData
```

```
## # A tibble: 12 x 7
##    Country             `1800` `1801` `1802` `1803` `1804` `1805`
##    <fct>                <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Afghanistan           7      7      7      7      7      7
##  2 Albania               4.6    4.6    4.6    4.6    4.6    4.6
##  3 Algeria               6.99   6.99   6.99   6.99   6.99   6.99
##  4 Angola                6.93   6.93   6.93   6.93   6.93   6.93
##  5 Antigua and Barbuda   5      5      4.99   4.99   4.99   4.98
##  6 Argentina             6.8    6.8    6.8    6.8    6.8    6.8
##  7 Armenia               7.8    7.8    7.81   7.81   7.81   7.82
##  8 Aruba                 5.64   5.64   5.64   5.64   5.64   5.64
##  9 Australia             6.5    6.48   6.46   6.44   6.42   6.4
## 10 Austria               5.1    5.1    5.1    5.1    5.1    5.1
## 11 Azerbaijan            8.1    8.1    8.1    8.1    8.1    8.1
## 12 Bahamas               5.9    5.9    5.9    5.9    5.9    5.9
```

The variables are **country**, **year** and **fertility rate**, but here the years are spread across the column names.

---

## Tidy data

<img src="http://r4ds.had.co.nz/images/tidy-1.png" style="display: block; margin: auto;" />

Tidy data is easier to work with and to reason about when it comes to:

- operations
- manipulation
- visualization

<br>

Learn more at http://tidyr.tidyverse.org/articles/tidy-data.html.

---

### Tidy the previous data

```r
library(dplyr)
```

```r
(fertilityTidy <- fertilityData %>%
   tidyr::pivot_longer(cols = -Country, names_to = "Year", values_to = "Fertility") %>%
   mutate(Year = as.integer(Year)))
```

```
## # A tibble: 72 x 3
##    Country      Year Fertility
##    <fct>       <int>     <dbl>
##  1 Afghanistan  1800       7
##  2 Afghanistan  1801       7
##  3 Afghanistan  1802       7
##  4 Afghanistan  1803       7
##  5 Afghanistan  1804       7
##  6 Afghanistan  1805       7
##  7 Albania      1800       4.6
##  8 Albania      1801       4.6
##  9 Albania      1802       4.6
## 10 Albania      1803       4.6
## # ... with 62 more rows
```

---

### This is easier to plot

```r
ggplot(data = fertilityTidy) + 
  geom_point(mapping = aes(x = Year, y = Fertility))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-6-1.svg" width="80%" style="display: block; margin: auto;" />

---
class: center, middle, inverse

# Basics & Customization

---

### Use the black-and-white theme everywhere

```r
library(ggplot2)
theme_set(theme_bw(18))  # 18 = base font size
```

***

```r
ggplot(data = fertilityTidy) + 
  geom_point(mapping = aes(x = Year, y = Fertility))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-8-1.svg" width="65%" style="display: block; margin: auto;" />

---

## Drop extra typing

```r
ggplot(fertilityTidy) + 
  geom_point(aes(Year, Fertility))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-9-1.svg" width="80%" style="display: block; margin: auto;" />

---

## Add colors

```r
ggplot(fertilityTidy) + 
  geom_point(aes(Year, Fertility, color = Country))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-10-1.svg" width="80%" style="display: block; margin: auto;" />

---

## Add lines: add one geom

```r
ggplot(fertilityTidy) + 
  geom_point(aes(Year, Fertility, color = Country)) +
  geom_line(aes(Year, Fertility, color = Country))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-11-1.svg" width="80%" style="display: block; margin: auto;" />

---

### Remove redundancy: move 'aes' to the top

#### So that the mapping is inherited by both geoms

```r
ggplot(fertilityTidy, aes(Year, Fertility, color = Country)) + 
  geom_point() +
  geom_line()
```

<img src="ggplot2_files/figure-html/unnamed-chunk-12-1.svg" width="72%" style="display: block; margin: auto;" />

---

### Larger points and lines

```r
ggplot(fertilityTidy, aes(Year, Fertility, color = Country)) + 
  geom_point(size = 4) +
  geom_line(size = 3)
```

<img src="ggplot2_files/figure-html/unnamed-chunk-13-1.svg" width="80%" style="display: block; margin: auto;" />

---

### Further customization: themes

```r
ggplot(fertilityTidy, aes(Year, Fertility, color = Country)) +
  geom_point(size = 4) +
  geom_line(size = 3) +
  theme(aspect.ratio = 0.8, legend.key.width = unit(3, "line"))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-14-1.svg" width="90%" style="display: block; margin: auto;" />

---
class: center, middle, inverse

# Layers

---

## Iris: base dataset of R

### about plants

```r
as_tibble(iris)  # nicer printing than a plain data.frame
```

```
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa
##  2          4.9         3            1.4         0.2 setosa
##  3          4.7         3.2          1.3         0.2 setosa
##  4          4.6         3.1          1.5         0.2 setosa
##  5          5           3.6          1.4         0.2 setosa
##  6          5.4         3.9          1.7         0.4 setosa
##  7          4.6         3.4          1.4         0.3 setosa
##  8          5           3.4          1.5         0.2 setosa
##  9          4.4         2.9          1.4         0.2 setosa
## 10          4.9         3.1          1.5         0.1 setosa
## # ... with 140 more rows
```

---

## Layers: example with geom_smooth()

```r
ggplot(iris) + 
  geom_point(aes(Petal.Length, Petal.Width, color = Species, shape = Species), 
             size = 3)
```

<img src="ggplot2_files/figure-html/unnamed-chunk-16-1.svg" width="80%" style="display: block; margin: auto;" />

---

### geom_smooth() on all data: move x and y to the top

```r
ggplot(iris, aes(Petal.Length, Petal.Width)) + 
  geom_point(aes(color = Species, shape = Species), size = 3) +
  geom_smooth(color = "black")
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

<img src="ggplot2_files/figure-html/unnamed-chunk-17-1.svg" width="75%" style="display: block; margin: auto;" />

---

### Points on top: change the order of layers

```r
ggplot(iris, aes(Petal.Length, Petal.Width)) + 
  geom_smooth(color = "black") +
  geom_point(aes(color = Species, shape = Species), size = 3)
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

<img src="ggplot2_files/figure-html/unnamed-chunk-18-1.svg" width="75%" style="display: block; margin: auto;" />

---

### geom_smooth() by group

```r
ggplot(iris, aes(Petal.Length, Petal.Width)) + 
  geom_smooth(aes(group = Species), color = "black") +
  geom_point(aes(color = Species, shape = Species),
             size = 3)
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

<img src="ggplot2_files/figure-html/unnamed-chunk-19-1.svg" width="75%" style="display: block; margin: auto;" />

---

### Or use color for both geoms

```r
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + 
  geom_smooth() +
  geom_point(aes(shape = Species), size = 3)
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

<img src="ggplot2_files/figure-html/unnamed-chunk-20-1.svg" width="75%" style="display: block; margin: auto;" />

---

## An important application

### Simpson's Paradox

<br>

<img src="https://paulvanderlaken.files.wordpress.com/2017/09/simpsonsparadox.png?w=1080" style="display: block; margin: auto;" />

.footnote[Source: https://goo.gl/GycYod]

---

## Learn more

- [Chapter *Data Visualisation*](http://r4ds.had.co.nz/data-visualisation.html) of

<img src="http://r4ds.had.co.nz/cover.png" width="35%" style="display: block; margin: auto;" />

.footnote[Freely available online.]

---

## Find answers

<blockquote class="twitter-tweet" data-lang="en" align="center"><p lang="en" dir="ltr">ok, hands up... when looking up code errors/questions with the intent of finding answers on Stack Overflow, do you:</p>— Sharla Gelfand (@sharlagelfand) <a href="https://twitter.com/sharlagelfand/status/915654253456318464?ref_src=twsrc%5Etfw">October 4, 2017</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

<br><center>e.g.
google "larger points legend ggplot2"</center>

---

## Using the Stack Overflow answer

```r
ggplot(iris) + 
  geom_point(aes(Petal.Length, Petal.Width, color = Species, shape = Species), 
             size = 3) +
  guides(colour = guide_legend(override.aes = list(size = 10)))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-23-1.svg" width="75%" style="display: block; margin: auto;" />

---

## Go check the [**R Graph Gallery**](https://www.r-graph-gallery.com/ggplot2-package.html)

<br>
<br>

<blockquote class="twitter-tweet" data-cards="hidden" data-lang="en" align="center"><p lang="en" dir="ltr">🍾🍾 Today the <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> graph gallery reached 1.000.000 visits! 🍾🍾 Thanks to all users & contributors!<a href="https://t.co/94JzuHDJot">https://t.co/94JzuHDJot</a> <a href="https://t.co/xu4XIsqmut">pic.twitter.com/xu4XIsqmut</a></p>— The R Graph Gallery (@R_Graph_Gallery) <a href="https://twitter.com/R_Graph_Gallery/status/915092693977460741?ref_src=twsrc%5Etfw">October 3, 2017</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

---
class: center, middle, inverse

# More

---

### Coordinates

```r
ggplot(iris) + 
  geom_point(aes(Petal.Length, Petal.Width, color = Species, shape = Species), 
             size = 3) +
  scale_x_log10(breaks = 1:7)
```

<img src="ggplot2_files/figure-html/unnamed-chunk-24-1.svg" width="77%" style="display: block; margin: auto;" />

---

### Facets

```r
ggplot(iris) + 
  geom_point(aes(Petal.Length, Petal.Width, color = Species, shape = Species), 
             size = 3) +
  facet_grid(~ Species)
```

<img src="ggplot2_files/figure-html/unnamed-chunk-25-1.svg" width="85%" style="display: block; margin: auto;" />

---

## Iterate over variables with **aes_string**

```r
(var <- names(iris)[1:4])
```

```
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
```

```r
p_list <- list()
for (i in seq_along(var)) {
  # one density plot per variable, filled by species
  p_list[[i]] <- ggplot(iris) +
    geom_density(aes_string(var[i], fill = "Species"), alpha = 0.6)
}
str(p_list, max.level = 1)
```

```
## List of 4
##  $ :List of 9
##   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
##  $ :List of 9
##   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
##  $ :List of 9
##   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
##  $ :List of 9
##   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
```

---

## Combine plots with [cowplot](https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html)

```r
cowplot::plot_grid(plotlist = p_list, ncol = 2, align = "hv",
                   labels = LETTERS[1:4], label_size = 15)
```

<img src="ggplot2_files/figure-html/unnamed-chunk-27-1.svg" width="85%" style="display: block; margin: auto;" />

---

### Common legend

```r
lapply(p_list, function(p) p + theme(legend.position = "none")) %>%
  cowplot::plot_grid(plotlist = ., ncol = 2, align = "hv",
                     labels = LETTERS[1:4], label_size = 15) %>%
  cowplot::plot_grid(cowplot::get_legend(p_list[[1]]), rel_widths = c(1, 0.3))
```

<img src="ggplot2_files/figure-html/unnamed-chunk-28-1.svg" width="85%" style="display: block; margin: auto;" />

---

### Your turn

Create a similar plot by pivoting the data + using facets

--

<br>

```r
(iris_tidy <- tidyr::pivot_longer(iris, -Species))
```

```
## # A tibble: 600 x 3
##    Species name         value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # ... with 590 more rows
```

---

```r
ggplot(iris_tidy) +
  geom_density(aes(value, fill = Species), alpha = 0.6) +
  facet_wrap(~ name, scales = "free")
```

<img src="ggplot2_files/figure-html/unnamed-chunk-30-1.svg" width="95%" style="display: block; margin: auto;" />

---

## Interactive plots

```r
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species, shape = Species)) +
  geom_point(size = 3)
```

<img src="ggplot2_files/figure-html/unnamed-chunk-31-1.svg" width="80%" style="display: block; margin: auto;" />

---

## Transform ggplot to plotly

```r
plotly::ggplotly(width = 700, height = 450)  # converts last_plot() by default
```

### Add more info

```r
plotly::ggplotly(
  last_plot() + aes(text = bigstatsr::asPlotlyText(iris)),
  tooltip = "text", width = 700, height = 420)
```

<br>

You might want to look at the package [{widgetframe}](https://cran.r-project.org/web/packages/widgetframe/vignettes/Using_widgetframe.html).

---

## Miscellaneous

- [Pie charts](https://guangchuangyu.github.io/2016/12/scatterpie-for-plotting-pies-on-ggplot/) but [other plots are often better](http://annkemery.com/pie-chart-guidelines/)
- [Spatial Visualization](https://cran.r-project.org/web/packages/ggmap/index.html)
- [Heatmaps](http://blog.aicry.com/r-heat-maps-with-ggplot2/)
- [Cookbook for R - Graphs](http://www.cookbook-r.com/Graphs/)
- [**Cheatsheet**](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf)
- [Top 50 ggplot2 Visualizations](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html)
- [**Viridis color palette**](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html)
- [An RStudio addin for ggplot2 theme tweaking](https://github.com/calligross/ggthemeassist)
- [**Publication Ready Plots**](http://www.sthda.com/english/rpkgs/ggpubr/)
- [Extensions](https://exts.ggplot2.tidyverse.org/)

---
class: center, middle, inverse

# Two final points from my experience

---

## Why did I finally switch to ggplot2?
```r
library(bigstatsr)
X <- big_attachExtdata()
svd <- big_SVD(X, big_scale(), k = 10)
plot(svd, type = "scores")
```

<img src="ggplot2_files/figure-html/unnamed-chunk-34-1.svg" width="70%" style="display: block; margin: auto;" />

---

### An object that the user can modify

```r
pop <- rep(c("POP1", "POP2", "POP3"), c(143, 167, 207))
last_plot() +
  # add colors
  aes(color = pop) +
  labs(color = "Population") +
  # change the place of the legend
  theme(legend.position = c(0.85, 0.2)) +
  # change the title and the label of the x-axis
  labs(title = "Yet another title", x = "with a new 'x' label")
```

<img src="ggplot2_files/figure-html/unnamed-chunk-35-1.svg" width="60%" style="display: block; margin: auto;" />

---

## How do I choose the size of a plot?

- I plot something, e.g.

```r
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species, shape = Species)) +
  geom_point(size = 3)
```

- I use the *Zoom* button of RStudio

- I resize the "Plot Zoom" window until I'm satisfied

- I right-click on this window -> "Open image" (or "Inspect element")

- Then, in `ggsave()`, I use the displayed dimensions with `scale = 1/100` (calibrate the value for your computer), e.g.

```r
# 888 x 725 (pixels) * 1/100 = 8.88 x 7.25 (inches, ggsave()'s default unit)
ggsave("myggplot.pdf", scale = 1/100, width = 888, height = 725)
```

.footnote[In R Markdown, use chunk options `out.width` and `fig.asp`.]

---
class: center, middle, inverse

# Your turn

---

## Use this data..

```r
(df <- dplyr::filter(gapminder::gapminder, year == 1992))
```

```
## # A tibble: 142 x 6
##    country     continent  year lifeExp       pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>     <int>     <dbl>
##  1 Afghanistan Asia       1992    41.7  16317921      649.
##  2 Albania     Europe     1992    71.6   3326498     2497.
##  3 Algeria     Africa     1992    67.7  26298373     5023.
##  4 Angola      Africa     1992    40.6   8735988     2628.
##  5 Argentina   Americas   1992    71.9  33958947     9308.
##  6 Australia   Oceania    1992    77.6  17481977    23425.
##  7 Austria     Europe     1992    76.0   7914969    27042.
##  8 Bahrain     Asia       1992    72.6    529491    19036.
##  9 Bangladesh  Asia       1992    56.0 113704579      838.
## 10 Belgium     Europe     1992    76.5  10045622    25576.
## # ... with 132 more rows
```

---

## ..to reproduce this plot

<img src="ggplot2_files/figure-html/unnamed-chunk-39-1.svg" width="100%" style="display: block; margin: auto;" />

---

```r
ggplot(df) +
  geom_point(aes(gdpPercap, lifeExp, size = pop / 1e6, color = continent)) +
  scale_x_log10(breaks = c(300, 1e3, 3e3, 10e3, 30e3)) +
  labs(title = "Gapminder for 1992",
       x = "Gross Domestic Product (log scale)",
       y = "Life Expectancy at birth (years)",
       color = "Continent",
       size = "Population\n(millions)")
```

<img src="ggplot2_files/figure-html/unnamed-chunk-40-1.svg" width="78%" style="display: block; margin: auto;" />

---
class: center, middle, inverse

# Thanks!

<br>

Presentation available at https://privefl.github.io/R-presentation/ggplot2.html

<br>

Twitter and GitHub: [@privefl](https://twitter.com/privefl)

.footnote[Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).]
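
---

## Appendix: an alternative to `aes_string()`

`aes_string()` is deprecated in recent ggplot2 versions, whose documentation recommends the `.data` pronoun instead. Here is a minimal sketch of the density loop from the *Iterate over variables* slide written that way (`vars` is just an illustrative name):

```r
library(ggplot2)

# The four numeric columns of iris, as in the aes_string() slide
vars <- names(iris)[1:4]

# .data[[v]] refers to the column whose name is stored in the string v
p_list <- lapply(vars, function(v) {
  ggplot(iris) +
    geom_density(aes(.data[[v]], fill = Species), alpha = 0.6)
})

length(p_list)
```

Here `lapply()` replaces the `for` loop over an empty `list()`; the result can still be passed to `cowplot::plot_grid(plotlist = p_list)`.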