Additional Resources
Curated Learning Materials for R and Data Analysis
Overview
This page provides additional learning resources organized by course topic. These materials complement the DataTax R Course and allow you to deepen your understanding or explore advanced topics.
- Beginners: Focus on the Core Resources section
- During the course: Refer to topic-specific resources as you complete each module
- After completion: Explore Advanced Topics to continue your learning journey
Core Resources
These comprehensive books and tutorials form the foundation of modern R practice. We recommend them for all learners.
R for Data Science (2nd Edition)
Authors: Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund
Link: https://r4ds.hadley.nz/
The definitive guide to doing data science with R. Our course follows the pedagogical structure of R4DS (2e), adapted for tax administration contexts. This is the single most important reference for learning R.
Best for: Complete beginners through intermediate users
Covers: All topics in our course plus modeling, communication, and workflow
Key chapters aligned with our course:
- Ch 1-2: Introduction and Workflow basics → Module 1
- Ch 7-8: Data import → Module 2
- Ch 3-5, 12: Data transformation with dplyr → Module 3
- Ch 5: Data tidying (reshaping) → Module 4
- Ch 9-11: Visualization with ggplot2 → Module 5
Data Science for Economists (and Everyone Else)
Author: Grant McDermott
Link: https://grantmcdermott.com/ds4e/
Excellent course materials from Grant McDermott covering data science workflows with R. Particularly strong on modern R practices, version control, and reproducible research.
Best for: Learners who want practical, workflow-oriented instruction
Covers: R basics, data wrangling, visualization, regression, spatial data, big data
Highlights:
- Clear explanations of the tidyverse ecosystem
- Integration with Git/GitHub for reproducibility
- Practical examples from economics and policy research
- Excellent introduction to parallel computing and big data tools
Introduction to Econometrics with R
Author: Florian Oswald
Link: https://scpoecon.github.io/ScPoEconometrics/
Interactive econometrics textbook with R. Excellent for tax administrators interested in causal inference, policy evaluation, and regression analysis.
Best for: Users ready to move beyond descriptive analysis to econometric modeling
Covers: Linear regression, causal inference, panel data, instrumental variables
Why it’s valuable:
- Policy-relevant examples and intuition
- Interactive tutorials and exercises
- Bridges the gap between data wrangling and econometric analysis
- Focus on understanding causal relationships in administrative data
swirl: Learn R, in R
Link: https://swirlstats.com/
Interactive R tutorials that run directly in your R console. No setup is needed beyond installing the package.
Best for: Complete beginners who want hands-on, guided practice
Covers: R programming basics, getting and cleaning data, regression models, statistical inference
How to use:
install.packages("swirl")
library(swirl)
swirl()
Why it’s great:
- Learn by doing, directly in R
- Immediate feedback on your code
- Completely free and self-paced
- Multiple courses available for different skill levels
YaRrr! The Pirate’s Guide to R
Author: Nathaniel Phillips
Link: https://bookdown.org/ndphillips/YaRrr/
Fun, accessible introduction to R with a pirate theme. Don’t let the playful presentation fool you—this is a comprehensive guide.
Best for: Beginners who want an engaging, less formal introduction
Covers: R basics, data structures, visualization, statistics, writing functions
Highlights:
- 250-page free e-book
- Accompanying video tutorials
- Introduces “pirate plots” for better data visualization
- Great examples and clear explanations
Big Book of R
Author: Oscar Baruffa
Link: https://www.bigbookofr.com/
A curated collection of over 400 free R books organized by topic. Your ultimate R library.
Best for: Finding specialized resources on any R-related topic
Covers: Everything from basics to advanced topics like machine learning, spatial analysis, Shiny apps, and domain-specific applications
Why bookmark this:
- Searchable by topic
- Regularly updated with new books
- Most resources are free
- Covers specialized topics not found elsewhere
Interactive Learning Platforms
DataCamp
Link: https://www.datacamp.com/
Popular interactive platform with video lessons and hands-on coding exercises in the browser.
Introduction to R (Free course)
Perfect starting point with interactive exercises and real-world data
Pros:
- No software installation needed
- Immediate feedback on exercises
- Structured learning paths
- Progress tracking
Note: Free courses available; subscription required for full access
Harvard Data Science: R Basics
Instructor: Rafael Irizarry
Link: edX - Harvard Data Science Series
Free course from Harvard in which you analyze a real crime dataset while learning R fundamentals.
Best for: Visual and practical learners who want university-quality instruction
Covers: Data types, vectors, sorting, basic plotting, data wrangling
What makes it special:
- Learn by analyzing real data
- Excellent instructor with clear explanations
- Part of a comprehensive data science series
- Free to audit (certificate costs extra)
Resources by Course Module
Module 1: Introduction to R
R for Data Science - Workflow Basics
Chapter 2 & Chapter 4
Foundation for writing clear, readable R code
Grant McDermott - Introduction to R
Lecture slides and notes
Quick-start guide to R with modern best practices
swirl - R Programming: The basics
Interactive lessons in R console covering fundamentals
YaRrr! - Chapters 3-6
Introduction to R, vectors, and basic data structures
Hands-On Programming with R
https://rstudio-education.github.io/hopr/
Great complement focusing on R programming fundamentals
Module 2: Data Import & Export
R for Data Science - Data Import
Chapter 7 & Chapter 8
Comprehensive coverage of reading files and databases
Grant McDermott - Data I/O
Working with different data formats, APIs, and databases
readr documentation
https://readr.tidyverse.org/
Official documentation for reading rectangular data
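As a minimal sketch of the kind of import the readr documentation covers, the snippet below writes a small CSV to a temporary file and reads it back with explicit column types. The column names (`taxpayer_id`, `region`, `tax_due`) and all values are invented for illustration and are not taken from the course data.

```r
library(readr)

# Hypothetical example data; column names are illustrative only
example <- data.frame(
  taxpayer_id = c("T001", "T002", "T003"),
  region      = c("North", "South", "North"),
  tax_due     = c(1200.50, 830.00, 415.25)
)

# Write to a temporary CSV so the example is self-contained
path <- tempfile(fileext = ".csv")
write_csv(example, path)

# Read it back, declaring column types instead of relying on guessing
taxes <- read_csv(
  path,
  col_types = cols(
    taxpayer_id = col_character(),
    region      = col_character(),
    tax_due     = col_double()
  )
)

taxes
```

Declaring `col_types` is optional, but it tends to make imports more predictable when identifier columns could otherwise be guessed as numbers.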
data.table import
https://rdatatable.gitlab.io/data.table/
Fast import for very large datasets (millions of rows)
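A minimal sketch of a data.table import, assuming the package is installed. The file here is tiny and the column names (`taxpayer_id`, `amount`) are hypothetical; the same calls are what you would typically try on a much larger file.

```r
library(data.table)

# Hypothetical transaction data written to a temporary file
path <- tempfile(fileext = ".csv")
fwrite(
  data.table(
    taxpayer_id = c("T001", "T001", "T002"),
    amount      = c(100, 250, 75)
  ),
  path
)

# fread() detects the delimiter and column types automatically
transactions <- fread(path)

# Total amount per taxpayer using data.table's [i, j, by] syntax
totals <- transactions[, .(total_amount = sum(amount)), by = taxpayer_id]
totals
```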
swirl - Getting and Cleaning Data
Interactive lessons on data import and tidying
Module 3: Data Wrangling with dplyr
R for Data Science - Data Transformation
Chapter 3 & Chapter 4
Core reference for dplyr verbs: filter, select, mutate, summarize, group_by
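The five verbs named above can be sketched in a single pipeline. This is an illustrative example only: the `taxpayers` table, its column names, and its values are invented, and the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical taxpayer data; names and values are illustrative only
taxpayers <- tibble(
  taxpayer_id = c("T001", "T002", "T003", "T004"),
  region      = c("North", "South", "North", "South"),
  income      = c(50000, 32000, 78000, 15000),
  tax_paid    = c(7500, 4000, 14000, 0)
)

regional_summary <- taxpayers |>
  filter(tax_paid > 0) |>                        # keep rows with a payment
  select(region, income, tax_paid) |>            # keep the columns we need
  mutate(effective_rate = tax_paid / income) |>  # add a derived column
  group_by(region) |>                            # one group per region
  summarize(
    n_taxpayers = n(),
    mean_rate   = mean(effective_rate)
  )

regional_summary
```

Each verb takes a data frame and returns a data frame, which is what allows them to be chained in this way.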
R for Data Science - Numbers & Strings
Chapter 13 & Chapter 14
Working with numeric data and text manipulation
Grant McDermott - Data Wrangling
Practical examples and workflows
Real-world data cleaning scenarios
dplyr documentation
https://dplyr.tidyverse.org/
Official reference with examples for every function
Data Transformation Cheat Sheet
PDF download from RStudio
Quick reference for dplyr functions
YaRrr! - Chapters 7-9
Data wrangling with base R and tidyverse approaches
Module 4: Reshaping & Joining Data
R for Data Science - Data Tidying
Chapter 5
Pivoting data between wide and long formats
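A minimal sketch of pivoting in both directions with tidyr. The `revenue_wide` table and its values are hypothetical and serve only to show the shape change.

```r
library(tidyr)

# Hypothetical wide table: one row per region, one column per year
revenue_wide <- data.frame(
  region = c("North", "South"),
  `2022` = c(100, 80),
  `2023` = c(110, 95),
  check.names = FALSE
)

# Wide -> long: one row per region-year combination
revenue_long <- pivot_longer(
  revenue_wide,
  cols      = c(`2022`, `2023`),
  names_to  = "year",
  values_to = "revenue"
)

# Long -> wide: back to one column per year
revenue_back <- pivot_wider(
  revenue_long,
  names_from  = year,
  values_from = revenue
)

revenue_long
```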
R for Data Science - Joins
Chapter 19
Combining datasets with joins and set operations
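As a rough illustration of the two most common join patterns, the sketch below combines two invented tables that share a key column. The table names (`registry`, `payments`) and the key `taxpayer_id` are assumptions made for this example.

```r
library(dplyr)

# Hypothetical tables sharing the key column `taxpayer_id`
registry <- tibble(
  taxpayer_id = c("T001", "T002", "T003"),
  region      = c("North", "South", "North")
)
payments <- tibble(
  taxpayer_id = c("T001", "T003", "T999"),
  amount      = c(500, 1200, 300)
)

# left_join(): keep every registry row; unmatched rows get NA for amount
registry_with_payments <- left_join(registry, payments, by = "taxpayer_id")

# anti_join(): payments whose taxpayer_id is missing from the registry
unmatched_payments <- anti_join(payments, registry, by = "taxpayer_id")

registry_with_payments
unmatched_payments
```

An anti-join like this is a common first check for key mismatches before merging administrative datasets.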
tidyr documentation
https://tidyr.tidyverse.org/
Official documentation for reshaping functions
Grant McDermott - Data Wrangling (continued)
Advanced data reshaping and joining techniques
Module 5: Data Visualization with ggplot2
R for Data Science - Data Visualization
Chapters 9-11
Complete introduction to ggplot2 with layers, aesthetics, and facets
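To make the terms layers, aesthetics, and facets concrete, here is a small sketch built on invented data. The `revenue` data frame and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical monthly revenue by region; values are illustrative only
revenue <- data.frame(
  month   = rep(1:6, times = 2),
  region  = rep(c("North", "South"), each = 6),
  revenue = c(10, 12, 11, 14, 15, 17, 8, 9, 9, 10, 12, 11)
)

p <- ggplot(revenue, aes(x = month, y = revenue)) +  # data + aesthetic mapping
  geom_line() +                                      # layer 1: lines
  geom_point() +                                     # layer 2: points
  facet_wrap(~ region) +                             # one panel per region
  labs(x = "Month", y = "Revenue", title = "Monthly revenue by region")

p  # printing the object draws the plot
```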
Data Visualization: A Practical Introduction
Author: Kieran Healy
Link: http://socviz.co/
Outstanding book on making effective visualizations with ggplot2. Goes beyond mechanics to discuss principles of good data visualization.
Best for: Anyone creating charts for reports, presentations, or publications
Covers: ggplot2 fundamentals, graph design principles, maps, publication-quality output
Why read this:
- Teaches you to think about what makes visualizations effective
- Real datasets and policy-relevant examples
- How to refine plots for professional communication
- Excellent for tax administrators preparing reports for leadership
ggplot2 Book
https://ggplot2-book.org/
Comprehensive reference by Hadley Wickham
R Graphics Cookbook
https://r-graphics.org/
Recipe-based approach to creating specific chart types
Grant McDermott - Data Visualization
Practical examples and modern ggplot2 workflows
Data Visualization Cheat Sheet
PDF download from RStudio
Quick reference for ggplot2 geoms and aesthetics
YaRrr! - Chapter 10
Introduction to base R plotting and pirate plots
Advanced Topics
Ready to go beyond the basics? These resources cover advanced R programming and specialized applications.
Advanced R Programming
Advanced R (2nd Edition)
Author: Hadley Wickham
Link: https://adv-r.hadley.nz/
Deep dive into R’s programming concepts. Essential reading for anyone wanting to master R.
Best for: Intermediate to advanced users
Covers: Functions, environments, object-oriented programming, functional programming, performance optimization
When to read:
- After completing this course
- When you want to write your own functions and packages
- When you need to optimize code performance
- To understand how R actually works “under the hood”
Key topics for tax administrators:
- Chapter 6: Functions (writing reusable code)
- Chapter 9: Functionals (map, reduce, apply functions)
- Chapter 23: Measuring performance
- Chapter 24: Improving performance
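As a small taste of the first two topics, the sketch below defines a reusable function and applies it with base R functionals. It uses `vapply()` and `Reduce()` from base R rather than the purrr functions the book emphasizes, and the tax rule (a flat rate with an exemption) is hypothetical, chosen only for illustration.

```r
# A small reusable function: tax due under a flat rate with an exemption.
# The rate and exemption values are hypothetical.
tax_due <- function(income, rate = 0.2, exemption = 10000) {
  pmax(income - exemption, 0) * rate
}

# Hypothetical incomes grouped by region
incomes <- list(north = c(50000, 8000), south = c(32000, 15000, 120000))

# vapply(): apply a function to each element, with a checked return type
total_by_region <- vapply(
  incomes,
  function(x) sum(tax_due(x)),
  FUN.VALUE = numeric(1)
)

# Reduce(): combine elements pairwise, here summing the regional totals
grand_total <- Reduce(`+`, total_by_region)

total_by_region
grand_total
```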
Statistical Learning and Modeling
An Introduction to Statistical Learning
Authors: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
Link: https://www.statlearning.com/
The go-to introduction to machine learning and statistical modeling. Now with R code and labs.
Best for: Users ready for predictive modeling and machine learning
Covers: Linear regression, classification, resampling, regularization, tree-based methods, clustering
Applications for tax administration:
- Risk scoring for audit selection
- Predicting tax compliance behavior
- Classification of taxpayer types
- Anomaly detection in VAT chains
- Forecasting tax revenue
How to use:
- Start after mastering Modules 1-4 of this course
- Work through labs in R
- Focus on chapters 2-4 for regression/classification fundamentals
- Chapters 5-6 for cross-validation and model selection
Spatial Data and GIS
Geocomputation with R
Authors: Robin Lovelace, Jakub Nowosad, Jannes Muenchow
Link: https://r.geocompx.org/
Comprehensive guide to geographic data analysis with R.
Best for: Working with spatial tax data (geographic regions, districts)
Covers: Spatial data structures, coordinate systems, mapping, spatial operations
Tax administration applications:
- Mapping tax collection by region
- Analyzing spatial patterns in compliance
- Visualizing tax office coverage
- Geographic risk analysis
Prerequisites: Complete this course first, especially Module 5 (visualization)
Text Mining and NLP
Text Mining with R: A Tidy Approach
Authors: Julia Silge and David Robinson
Link: https://www.tidytextmining.com/
Learn text mining using tidy data principles with the tidytext package.
Best for: Analyzing unstructured text data (comments, descriptions, documents)
Covers: Sentiment analysis, term frequency, text relationships, topic modeling
Tax administration applications:
- Analyzing taxpayer correspondence
- Categorizing business descriptions
- Mining audit notes and reports
- Sentiment analysis of feedback
Prerequisites: Complete Modules 1-4 first
Reproducible Reports and Documents
R Markdown: The Definitive Guide
Authors: Yihui Xie, J.J. Allaire, Garrett Grolemund
Link: https://bookdown.org/yihui/rmarkdown/
Comprehensive guide to creating reproducible reports, presentations, and websites with R Markdown.
Best for: Automating reports and creating professional documents
Covers: PDF/HTML/Word documents, presentations, dashboards, websites, books
Tax administration applications:
- Monthly compliance reports
- Audit summaries
- Data dashboards
- Internal documentation
R Markdown Cookbook
https://bookdown.org/yihui/rmarkdown-cookbook/
Practical recipes and tips for common R Markdown tasks
Big Data and Performance
Grant McDermott - Big Data in Economics
Handling large datasets efficiently with data.table, databases, and parallel computing
data.table
https://rdatatable.gitlab.io/data.table/
Fast manipulation of large data (10M+ rows)
Best for: Working with complete taxpayer registries or transaction-level data
Efficient R Programming
https://csgillespie.github.io/efficientR/
Learn to write faster, more efficient R code
Community and Help
Getting Help
RStudio Community
https://community.rstudio.com/
Friendly Q&A forum for R users
Stack Overflow - R tag
https://stackoverflow.com/questions/tagged/r
Technical Q&A for specific programming questions
R for Data Science Online Learning Community
https://www.rfordatasci.com/
Study groups and community support
Staying Current
R Weekly
https://rweekly.org/
Weekly newsletter of R news and resources
R-bloggers
https://www.r-bloggers.com/
Aggregator of R blog posts and tutorials
#RStats on Twitter/X
Follow hashtag #RStats for R tips, news, and community updates
Quick Reference Materials
Cheat Sheets (PDF Downloads)
All available at https://rstudio.github.io/cheatsheets/
Essential for this course:
- RStudio IDE
- Data Import (readr, readxl)
- Data Transformation (dplyr)
- Data Tidying (tidyr)
- Data Visualization (ggplot2)
For advanced topics:
- String Manipulation (stringr)
- Dates and Times (lubridate)
- Factors (forcats)
- R Markdown / Quarto