Additional Resources

Curated Learning Materials for R and Data Analysis

Overview

This page provides additional learning resources organized by course topic. These materials complement the DataTax R Course and allow you to deepen your understanding or explore advanced topics.

Note: Using These Resources
  • Beginners: Focus on the Core Resources section
  • During the course: Refer to topic-specific resources as you complete each module
  • After completion: Explore Advanced Topics to continue your learning journey

Core Resources

These comprehensive books and tutorials form the foundation of modern R practice. We recommend them for all learners.

R for Data Science (2nd Edition)

Authors: Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund
Link: https://r4ds.hadley.nz/

The definitive guide to doing data science with R. Our course follows the pedagogical structure of R4DS (2e), adapted for tax administration contexts. This is the single most important reference for learning R.

Best for: Complete beginners through intermediate users
Covers: All topics in our course plus modeling, communication, and workflow

Key chapters aligned with our course:
  • Ch 1-2: Introduction and Workflow basics → Module 1
  • Ch 7-8: Data import → Module 2
  • Ch 3-5, 12: Data transformation with dplyr → Module 3
  • Ch 6: Data tidying (reshaping) → Module 4
  • Ch 9-11: Visualization with ggplot2 → Module 5


Data Science for Economists (and Everyone Else)

Author: Grant McDermott
Link: https://grantmcdermott.com/ds4e/

Excellent course materials from Grant McDermott covering data science workflows with R. Particularly strong on modern R practices, version control, and reproducible research.

Best for: Learners who want practical, workflow-oriented instruction
Covers: R basics, data wrangling, visualization, regression, spatial data, big data

Highlights:
  • Clear explanations of the tidyverse ecosystem
  • Integration with Git/GitHub for reproducibility
  • Practical examples from economics and policy research
  • Excellent introduction to parallel computing and big data tools


Introduction to Econometrics with R

Author: Florian Oswald
Link: https://scpoecon.github.io/ScPoEconometrics/

Interactive econometrics textbook with R. Excellent for tax administrators interested in causal inference, policy evaluation, and regression analysis.

Best for: Users ready to move beyond descriptive analysis to econometric modeling
Covers: Linear regression, causal inference, panel data, instrumental variables

Why it’s valuable:
  • Policy-relevant examples and intuition
  • Interactive tutorials and exercises
  • Bridges the gap between data wrangling and econometric analysis
  • Focus on understanding causal relationships in administrative data


swirl: Learn R, in R

Link: https://swirlstats.com/

Interactive R tutorials that run directly in your R console. The only setup is installing the package; after that you can start learning right away.

Best for: Complete beginners who want hands-on, guided practice
Covers: R programming basics, getting and cleaning data, regression models, statistical inference

How to use:

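install.packages("swirl")  # one-time installation from CRAN
library(swirl)             # load the package in each new R session
swirl()                    # start the interactive course menu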

Why it’s great:
  • Learn by doing, directly in R
  • Immediate feedback on your code
  • Completely free and self-paced
  • Multiple courses available for different skill levels


YaRrr! The Pirate’s Guide to R

Author: Nathaniel Phillips
Link: https://bookdown.org/ndphillips/YaRrr/

Fun, accessible introduction to R with a pirate theme. Don’t let the playful presentation fool you—this is a comprehensive guide.

Best for: Beginners who want an engaging, less formal introduction
Covers: R basics, data structures, visualization, statistics, writing functions

Highlights:
  • 250-page free e-book
  • Accompanying video tutorials
  • Introduces “pirate plots” for better data visualization
  • Great examples and clear explanations


Big Book of R

Author: Oscar Baruffa
Link: https://www.bigbookofr.com/

A curated collection of over 400 free R books organized by topic. Your ultimate R library.

Best for: Finding specialized resources on any R-related topic
Covers: Everything from basics to advanced topics like machine learning, spatial analysis, Shiny apps, and domain-specific applications

Why bookmark this:
  • Searchable by topic
  • Regularly updated with new books
  • Most resources are free
  • Covers specialized topics not found elsewhere


Interactive Learning Platforms

DataCamp

Link: https://www.datacamp.com/

Popular interactive platform with video lessons and hands-on coding exercises in the browser.

Introduction to R (Free course)
Perfect starting point with interactive exercises and real-world data

Pros:
  • No software installation needed
  • Immediate feedback on exercises
  • Structured learning paths
  • Progress tracking

Note: Free courses available; subscription required for full access


Harvard Data Science: R Basics

Instructor: Rafael Irizarry
Link: edX - Harvard Data Science Series

Free course from Harvard analyzing a real crime dataset while learning R fundamentals.

Best for: Visual and practical learners who want university-quality instruction
Covers: Data types, vectors, sorting, basic plotting, data wrangling

What makes it special:
  • Learn by analyzing real data
  • Excellent instructor with clear explanations
  • Part of a comprehensive data science series
  • Free to audit (certificate costs extra)


Resources by Course Module

Module 1: Introduction to R

R for Data Science - Workflow Basics
Chapter 2 & Chapter 4
Foundation for writing clear, readable R code

Grant McDermott - Introduction to R
Lecture slides and notes
Quick-start guide to R with modern best practices

swirl - R Programming: The basics
Interactive lessons in R console covering fundamentals

YaRrr! - Chapters 3-6
Introduction to R, vectors, and basic data structures

Hands-On Programming with R
https://rstudio-education.github.io/hopr/
Great complement focusing on R programming fundamentals


Module 2: Data Import & Export

R for Data Science - Data Import
Chapter 7 & Chapter 8
Comprehensive coverage of reading files and databases

Grant McDermott - Data I/O
Working with different data formats, APIs, and databases

readr documentation
https://readr.tidyverse.org/
Official documentation for reading rectangular data

data.table import
https://rdatatable.gitlab.io/data.table/
Fast import for very large datasets (millions of rows)

swirl - Getting and Cleaning Data
Interactive lessons on data import and tidying
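
To make the import and export workflow covered by these resources concrete, here is a minimal sketch using readr. The file contents and column names (`taxpayer_id`, `region`, `declared_income`) are hypothetical and exist only for illustration; the file is written to a temporary location so the example is self-contained.

```r
library(readr)

# Hypothetical taxpayer records, written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "taxpayer_id,region,declared_income",
    "T001,North,52000",
    "T002,South,31000",
    "T003,North,78000"
  ),
  csv_path
)

# Import: declaring column types up front avoids surprises from type guessing
declarations <- read_csv(
  csv_path,
  col_types = cols(
    taxpayer_id = col_character(),
    region = col_character(),
    declared_income = col_double()
  )
)

# Export: write the data frame back out as CSV
out_path <- tempfile(fileext = ".csv")
write_csv(declarations, out_path)
```

Declaring `col_types` explicitly is one reasonable choice for administrative data, where identifiers that look numeric should usually stay as text.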


Module 3: Data Wrangling with dplyr

R for Data Science - Data Transformation
Chapter 3 & Chapter 4
Core reference for dplyr verbs: filter, select, mutate, summarize, group_by

R for Data Science - Numbers & Strings
Chapter 12 & Chapter 13
Working with numeric data and text manipulation

Grant McDermott - Data Wrangling
Practical examples and workflows
Real-world data cleaning scenarios

dplyr documentation
https://dplyr.tidyverse.org/
Official reference with examples for every function

Data Transformation Cheat Sheet
PDF download from RStudio
Quick reference for dplyr functions

YaRrr! - Chapters 7-9
Data wrangling with base R and tidyverse approaches
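
As a small taste of the dplyr verbs named above (filter, select, mutate, summarize, group_by), the sketch below chains them on an invented dataset. All column names and values are hypothetical, and the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical declarations data, invented for illustration
declarations <- tibble(
  taxpayer_id = c("T001", "T002", "T003", "T004"),
  region = c("North", "South", "North", "South"),
  declared_income = c(52000, 31000, 78000, 45000),
  tax_paid = c(7800, 3100, 15600, 6750)
)

region_summary <- declarations |>
  filter(declared_income > 40000) |>                    # keep rows meeting a condition
  select(region, declared_income, tax_paid) |>          # keep only the needed columns
  mutate(effective_rate = tax_paid / declared_income) |> # add a derived column
  group_by(region) |>                                   # define groups
  summarize(
    n_taxpayers = n(),
    mean_rate = mean(effective_rate),
    .groups = "drop"                                    # return an ungrouped result
  )
```

Each verb takes a data frame and returns a data frame, which is what makes this step-by-step chaining possible.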


Module 4: Reshaping & Joining Data

R for Data Science - Data Tidying
Chapter 5
Pivoting data between wide and long formats

R for Data Science - Joins
Chapter 19
Combining datasets with joins and set operations

tidyr documentation
https://tidyr.tidyverse.org/
Official documentation for reshaping functions

Grant McDermott - Data Wrangling (continued)
Advanced data reshaping and joining techniques
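
The two ideas in this module, pivoting between wide and long formats and joining tables, can be sketched as follows. The tables and their columns (`revenue`, `office_count`) are hypothetical; the functions used are tidyr's `pivot_longer()` and `pivot_wider()` and dplyr's `left_join()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical revenue table in wide format: one column per year
revenue_wide <- tibble(
  region = c("North", "South"),
  `2022` = c(120, 95),
  `2023` = c(135, 90)
)

# Wide to long: one row per region-year combination
revenue_long <- revenue_wide |>
  pivot_longer(
    cols = c(`2022`, `2023`),
    names_to = "year",
    values_to = "revenue"
  )

# Hypothetical lookup table to join on the shared key "region"
offices <- tibble(
  region = c("North", "South"),
  office_count = c(4, 3)
)

# Left join: keep every row of revenue_long, add matching office counts
revenue_with_offices <- revenue_long |>
  left_join(offices, by = "region")

# Long back to wide, reversing the first step
revenue_back <- revenue_long |>
  pivot_wider(names_from = year, values_from = revenue)
```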


Module 5: Data Visualization with ggplot2

R for Data Science - Data Visualization
Chapters 9-11
Complete introduction to ggplot2 with layers, aesthetics, and facets

Data Visualization: A Practical Introduction
Author: Kieran Healy
Link: http://socviz.co/

Outstanding book on making effective visualizations with ggplot2. Goes beyond mechanics to discuss principles of good data visualization.

Best for: Anyone creating charts for reports, presentations, or publications
Covers: ggplot2 fundamentals, graph design principles, maps, publication-quality output

Why read this:
  • Teaches you to think about what makes visualizations effective
  • Real datasets and policy-relevant examples
  • How to refine plots for professional communication
  • Excellent for tax administrators preparing reports for leadership

ggplot2 Book
https://ggplot2-book.org/
Comprehensive reference by Hadley Wickham

R Graphics Cookbook
https://r-graphics.org/
Recipe-based approach to creating specific chart types

Grant McDermott - Data Visualization
Practical examples and modern ggplot2 workflows

Data Visualization Cheat Sheet
PDF download from RStudio
Quick reference for ggplot2 geoms and aesthetics

YaRrr! - Chapter 10
Introduction to base R plotting and pirate plots
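
The layered approach these resources teach (data, aesthetics, geoms, facets) can be illustrated with a short sketch. The `collections` data frame and its values are invented for illustration only.

```r
library(ggplot2)

# Hypothetical yearly revenue by region
collections <- data.frame(
  year = rep(c(2021, 2022, 2023), times = 2),
  region = rep(c("North", "South"), each = 3),
  revenue = c(110, 120, 135, 98, 95, 90)
)

p <- ggplot(collections, aes(x = year, y = revenue)) + # data and aesthetic mappings
  geom_line() +                                        # first layer: lines
  geom_point() +                                       # second layer: points
  facet_wrap(~ region) +                               # one panel per region
  scale_x_continuous(breaks = 2021:2023) +             # whole-year axis breaks
  labs(
    title = "Revenue by region (hypothetical data)",
    x = "Year",
    y = "Revenue"
  )

# Printing the object draws the plot in an interactive session
# print(p)
```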


Advanced Topics

Ready to go beyond the basics? These resources cover advanced R programming and specialized applications.

Advanced R Programming

Advanced R (2nd Edition)
Author: Hadley Wickham
Link: https://adv-r.hadley.nz/

Deep dive into R’s programming concepts. Essential reading for anyone wanting to master R.

Best for: Intermediate to advanced users
Covers: Functions, environments, object-oriented programming, functional programming, performance optimization

When to read:
  • After completing this course
  • When you want to write your own functions and packages
  • When you need to optimize code performance
  • To understand how R actually works “under the hood”

Key topics for tax administrators:
  • Chapter 6: Functions (writing reusable code)
  • Chapter 9: Functionals (map, reduce, apply functions)
  • Chapter 23: Measuring performance
  • Chapter 24: Improving performance
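
To give a flavor of the functions and functionals topics listed above, here is a minimal base R sketch. The two-bracket tax schedule, its threshold, and its rates are entirely hypothetical, and the book itself may present functionals with different tools than the base R `vapply()` and `Reduce()` used here.

```r
# A reusable function: tax under a hypothetical two-bracket schedule
compute_tax <- function(income, threshold = 30000,
                        low_rate = 0.10, high_rate = 0.25) {
  below <- min(income, threshold)      # portion taxed at the low rate
  above <- max(income - threshold, 0)  # portion taxed at the high rate
  below * low_rate + above * high_rate
}

incomes <- c(20000, 30000, 50000)

# A "map"-style functional: apply the function to each element,
# with vapply() checking that every result is a single number
taxes <- vapply(incomes, compute_tax, numeric(1))

# A "reduce"-style functional: combine all results into one value
total_tax <- Reduce(`+`, taxes)
```

In practice `sum(taxes)` is the simpler choice; `Reduce()` appears here only to show the reduce pattern.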


Statistical Learning and Modeling

An Introduction to Statistical Learning
Authors: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
Link: https://www.statlearning.com/

The go-to introduction to machine learning and statistical modeling. Now with R code and labs.

Best for: Users ready for predictive modeling and machine learning
Covers: Linear regression, classification, resampling, regularization, tree-based methods, clustering

Applications for tax administration:
  • Risk scoring for audit selection
  • Predicting tax compliance behavior
  • Classification of taxpayer types
  • Anomaly detection in VAT chains
  • Forecasting tax revenue

How to use:
  • Start after mastering Modules 1-4 of this course
  • Work through labs in R
  • Focus on chapters 2-4 for regression/classification fundamentals
  • Chapters 5-6 for cross-validation and model selection


Spatial Data and GIS

Geocomputation with R
Authors: Robin Lovelace, Jakub Nowosad, Jannes Muenchow
Link: https://r.geocompx.org/

Comprehensive guide to geographic data analysis with R.

Best for: Working with spatial tax data (geographic regions, districts)
Covers: Spatial data structures, coordinate systems, mapping, spatial operations

Tax administration applications:
  • Mapping tax collection by region
  • Analyzing spatial patterns in compliance
  • Visualizing tax office coverage
  • Geographic risk analysis

Prerequisites: Complete this course first, especially Module 5 (visualization)


Text Mining and NLP

Text Mining with R: A Tidy Approach
Authors: Julia Silge and David Robinson
Link: https://www.tidytextmining.com/

Learn text mining using tidy data principles with the tidytext package.

Best for: Analyzing unstructured text data (comments, descriptions, documents)
Covers: Sentiment analysis, term frequency, text relationships, topic modeling

Tax administration applications:
  • Analyzing taxpayer correspondence
  • Categorizing business descriptions
  • Mining audit notes and reports
  • Sentiment analysis of feedback

Prerequisites: Complete Modules 1-4 first


Reproducible Reports and Documents

R Markdown: The Definitive Guide
Authors: Yihui Xie, J.J. Allaire, Garrett Grolemund
Link: https://bookdown.org/yihui/rmarkdown/

Comprehensive guide to creating reproducible reports, presentations, and websites with R Markdown.

Best for: Automating reports and creating professional documents
Covers: PDF/HTML/Word documents, presentations, dashboards, websites, books

Tax administration applications:
  • Monthly compliance reports
  • Audit summaries
  • Data dashboards
  • Internal documentation

R Markdown Cookbook
https://bookdown.org/yihui/rmarkdown-cookbook/
Practical recipes and tips for common R Markdown tasks


Big Data and Performance

Grant McDermott - Big Data in Economics
Handling large datasets efficiently with data.table, databases, and parallel computing

data.table
https://rdatatable.gitlab.io/data.table/
Fast manipulation of large data (10M+ rows)

Best for: Working with complete taxpayer registries or transaction-level data

Efficient R Programming
https://csgillespie.github.io/efficientR/
Learn to write faster, more efficient R code
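
For readers curious what data.table code looks like, the sketch below shows its fast file functions and its `DT[i, j, by]` syntax for grouped aggregation. The `transactions` table is a tiny hypothetical stand-in for real transaction-level data, which would be far larger.

```r
library(data.table)

# Hypothetical transaction-level records
transactions <- data.table(
  taxpayer_id = c("T001", "T001", "T002", "T003", "T003", "T003"),
  region = c("North", "North", "South", "North", "North", "North"),
  vat_amount = c(120.5, 80.0, 45.25, 300.0, 150.0, 60.0)
)

# fwrite() and fread() are data.table's fast CSV writer and reader
csv_path <- tempfile(fileext = ".csv")
fwrite(transactions, csv_path)
transactions <- fread(csv_path)

# Grouped aggregation: compute totals and row counts per region
vat_by_region <- transactions[
  ,
  .(total_vat = sum(vat_amount), n_transactions = .N),
  by = region
]
```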


Community and Help

Getting Help

RStudio Community
https://community.rstudio.com/
Friendly Q&A forum for R users

Stack Overflow - R tag
https://stackoverflow.com/questions/tagged/r
Technical Q&A for specific programming questions

R for Data Science Online Learning Community
https://www.rfordatasci.com/
Study groups and community support


Staying Current

R Weekly
https://rweekly.org/
Weekly newsletter of R news and resources

R-bloggers
https://www.r-bloggers.com/
Aggregator of R blog posts and tutorials

#RStats on Twitter/X
Follow hashtag #RStats for R tips, news, and community updates


Quick Reference Materials

Cheat Sheets (PDF Downloads)

All available at https://rstudio.github.io/cheatsheets/

Essential for this course:
  • RStudio IDE
  • Data Import (readr, readxl)
  • Data Transformation (dplyr)
  • Data Tidying (tidyr)
  • Data Visualization (ggplot2)

For advanced topics:
  • String Manipulation (stringr)
  • Dates and Times (lubridate)
  • Factors (forcats)
  • R Markdown / Quarto
