R: the basics
R Training
We will learn to use the R programming language, working with administrative data familiar to tax administrations.
Prerequisites
❌ The training does not require any background in statistical programming.
✅ A computer with R and RStudio installed is required to complete the exercises.
✅ Internet connection is required to download training materials.
R?
R is a programming language with powerful statistical and graphic capabilities.


R?
R is very flexible and powerful: adaptable to nearly any task (data cleaning, data visualization, econometrics, spatial data analysis, machine learning, web scraping, etc.).
R is open source and free to use - allowing both you and your institution to save money!
R has been growing rapidly in popularity.
R offers a great interface - RStudio.
Excel?
✅ Easy to use.
❌ Only good for small datasets.
❌ We don’t keep track of what we do.
❌ Not straightforward to merge data.
❌ And the list goes on…

Stata?
✅ Stata is widely used in economics.
✅ Easy to learn.
❌ Only good for small datasets.
❌ Expensive!
❌ Lack of flexibility… do you hate keep, preserve, and restore too?

If you don’t, make sure you opened RStudio and not R!
Let’s begin by writing your R scripts (source code) in the Source pane.
You can use the menubar or Ctrl + Shift + N to create new R scripts.
Scripts help us document and organize the steps we want to perform.
To run a command, type it in the Source pane and press Ctrl + Enter (Windows) or Cmd + Return (macOS) to execute it in the Console.
The result will appear in the Console pane (bottom-left panel).
The Environment pane displays all the objects you’ve created during your session.
A simple sum:
Subtraction, multiplication, division:
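The four basic operations can be tried directly in the Console. The numbers below are arbitrary illustrations, not figures from the training data.

```r
# A simple sum
2 + 3        # 5

# Subtraction, multiplication, division
10 - 4       # 6
6 * 7        # 42
20 / 8       # 2.5
```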
Instead of just calculating, we can save results for later use.
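A minimal sketch of saving results with the assignment operator `<-`. The object names and values (`taxable_sales`, `vat_rate`, `vat_due`) are hypothetical and chosen only for illustration.

```r
# Store values in named objects (hypothetical figures)
taxable_sales <- 200000
vat_rate <- 0.15

# Reuse the stored objects in a new calculation
vat_due <- taxable_sales * vat_rate

# Typing the object name prints its value
vat_due
```

Once created, these objects appear in the Environment pane and can be reused anywhere later in the script.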
Use underscores (_) to separate words in object names - this is called snake_case
A function is a reusable piece of code that performs a specific task.
Think of functions as tools in a toolbox 🧰
Structure of a function call:
function_name(argument1, argument2, ...)
Arguments are inputs to the function.
Positional arguments - order matters:
Named arguments - order doesn’t matter (recommended!):
Tip
Using named arguments makes your code clearer and less error-prone.
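As an illustration of the difference, the sketch below uses base R's `substr()`, whose arguments are `x`, `start`, and `stop`. The string `"TAX2024"` is a made-up example value.

```r
firm_code <- "TAX2024"   # hypothetical code, for illustration only

# Positional arguments: R matches them by order (x, start, stop)
year_positional <- substr(firm_code, 4, 7)

# Named arguments: the order no longer matters
year_named <- substr(stop = 7, start = 4, x = firm_code)

# Swapping positional arguments silently changes the result:
# here start (7) is after stop (4), so an empty string comes back
swapped <- substr(firm_code, 7, 4)

year_positional
year_named
swapped
```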
Three ways to learn about functions:
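The original list of three ways is not reproduced here. As a sketch, these are commonly used base R tools for looking up a function; whether they match the slide's list exactly is an assumption.

```r
# help() and ? open the documentation page (Help pane in RStudio).
# They are wrapped in interactive() so the script also runs unattended.
if (interactive()) {
  help("mean")
  ?mean
}

# args() prints the argument list of a function in the Console
args(substr)
```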
Use class() to check the type:
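A short sketch of `class()` applied to the three basic types; the values are illustrative.

```r
class(125000)       # "numeric"
class("FIRM_001")   # "character"
class(TRUE)         # "logical"
```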
Creating logical values through comparisons:
[1] TRUE
[1] TRUE
[1] TRUE
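The comparisons that produced the output above are not preserved. The sketch below uses hypothetical values (`revenue`, `threshold`) picked so that each comparison also returns `TRUE`.

```r
revenue <- 120000     # hypothetical value
threshold <- 100000   # hypothetical value

is_above <- revenue > threshold       # greater than
is_equal <- revenue == 120000         # equal to (note the double ==)
is_different <- revenue != threshold  # not equal to

is_above
is_equal
is_different
class(is_above)   # comparisons produce logical values
```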
A vector is a sequence of data elements of the same type.
Think of it like a column in a spreadsheet!
Note
All elements in a vector must be the same type (all numbers, all text, or all logical values).
Numeric vectors:
Character vectors:
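Vectors are built with `c()`. In this sketch the firm names are invented, and the last lines show what happens when types are mixed: R converts everything to a single type.

```r
# Numeric vector
vat_paid <- c(50000, 75000, 90000, 45000, 82000)

# Character vector (made-up names)
firm_names <- c("Alpha Ltd", "Beta Inc", "Gamma Co")

class(vat_paid)
class(firm_names)

# Mixing types does not fail: R coerces all elements to one type
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"
```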
Creating sequences with :
More control with seq()
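A brief sketch of both approaches; the ranges are illustrative.

```r
# The colon operator counts up in steps of 1
tax_years <- 2020:2025

# seq() lets you choose the step size ...
brackets <- seq(from = 0, to = 100, by = 25)

# ... or the number of values you want
shares <- seq(from = 0, to = 1, length.out = 5)

tax_years
brackets
shares
```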
length()
[1] 5
R performs calculations element-by-element:
[1] 55000 82500 99000 49500 90200
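The code behind this output is not preserved. One calculation that reproduces it, assuming the input vector is the five amounts used elsewhere in these slides and the multiplier is 1.1, is:

```r
# Assumed input vector (reconstructed, not taken from the original code)
vat_paid <- c(50000, 75000, 90000, 45000, 82000)

# Each element is multiplied by 1.1 separately
vat_with_surcharge <- vat_paid * 1.1
vat_with_surcharge
```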
Functions that work on entire vectors:
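A sketch of common summary functions, applied to an illustrative vector of amounts.

```r
vat_paid <- c(50000, 75000, 90000, 45000, 82000)   # illustrative amounts

sum(vat_paid)      # total: 342000
mean(vat_paid)     # average: 68400
max(vat_paid)      # largest value: 90000
min(vat_paid)      # smallest value: 45000
length(vat_paid)   # number of elements: 5
```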
Get a single element by position:
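A minimal sketch of indexing with square brackets. Note that R counts positions from 1, not 0. The vector is illustrative.

```r
vat_paid <- c(50000, 75000, 90000, 45000, 82000)   # illustrative amounts

first_amount <- vat_paid[1]           # first element
selected <- vat_paid[c(2, 4)]         # second and fourth elements

first_amount
selected
```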
Find elements that meet a condition:
[1] FALSE TRUE TRUE FALSE TRUE
Use logical vectors to filter:
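The two steps can be sketched together. The vector and the 60000 threshold are assumptions, chosen because they reproduce the logical output shown above.

```r
vat_paid <- c(50000, 75000, 90000, 45000, 82000)   # assumed amounts

# Step 1: the comparison returns one TRUE/FALSE per element
is_large <- vat_paid > 60000
is_large

# Step 2: the logical vector keeps only the TRUE positions
large_payments <- vat_paid[is_large]
large_payments
```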
In real tax administration data, missing values are common:
Math with NA returns NA!
Warning
Any calculation involving NA will return NA unless you explicitly handle it!
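A small sketch of how `NA` propagates; the values are illustrative.

```r
# A single missing value
missing_plus_ten <- NA + 10
missing_plus_ten            # NA

# One NA is enough to make the whole total NA
vat_with_gaps <- c(50000, NA, 90000)
total_with_gaps <- sum(vat_with_gaps)
total_with_gaps             # NA

# is.na() shows where the missing values are
is.na(vat_with_gaps)
```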
na.rm Argument
Most statistical functions have an na.rm argument (NA remove):
[1] 68400
[1] 342000
[1] 90000
Tip
Always check your data for missing values and decide how to handle them!
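A sketch of `na.rm = TRUE` in practice. The vector is an assumption: seven declared amounts with two missing values, reconstructed so that the results match the outputs shown on these slides.

```r
# Assumed data: two of the seven amounts are missing
vat_declared <- c(50000, 75000, NA, 90000, 45000, NA, 82000)

mean(vat_declared)                  # NA, because of the missing values
mean(vat_declared, na.rm = TRUE)    # 68400
sum(vat_declared, na.rm = TRUE)     # 342000
max(vat_declared, na.rm = TRUE)     # 90000

# Count the missing values, then keep only the observed ones
sum(is.na(vat_declared))
vat_observed <- vat_declared[!is.na(vat_declared)]
vat_observed
```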
[1] 50000 75000 90000 45000 82000
Practical example: Compliance rate
Compliance Rate: 71.4 %
[1] "FIRM_3" "FIRM_6"
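The code for this example is not preserved. The sketch below is one reconstruction that yields the same output, under two assumptions: there are seven firms named `FIRM_1` to `FIRM_7`, and a firm counts as non-compliant when its declared amount is missing.

```r
# Assumed data (reconstructed to match the output above)
firm_ids <- paste0("FIRM_", 1:7)
vat_declared <- c(50000, 75000, NA, 90000, 45000, NA, 82000)

# TRUE where an amount was declared
filed <- !is.na(vat_declared)

# The mean of a logical vector is the share of TRUE values
compliance_rate <- mean(filed) * 100
cat("Compliance Rate:", round(compliance_rate, 1), "%\n")

# Firms with a missing declaration
non_compliant <- firm_ids[!filed]
non_compliant
```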
Part 1: Objects and Calculations
Create base_amount with value 125000
Create tax_rate with value 0.15 (15%)
Calculate tax_owed
tax_owed and store in total_payment
Part 2: Using Functions
Round total_payment to the nearest integer
base_amount
Use the abs() function to get the absolute value of -5000

Part 1: Creating Vectors
firm_ids: “FIRM_001” through “FIRM_006” (use paste0() and 1:6)
vat_amounts: 50000, 75000, NA, 90000, 45000, NA
years: 2020 to 2025 using :
standard_rate: repeat 0.15 six times using rep()
Part 2: Vector Operations
Multiply vat_amounts by 1.05 to add a 5% penalty
Part 3: Indexing
Part 4: Missing Data
# Part 1: Creating Vectors
firm_ids = paste0("FIRM_", sprintf("%03d", 1:6))
vat_amounts = c(50000, 75000, NA, 90000, 45000, NA)
years = 2020:2025
standard_rate = rep(0.15, times = 6)
# Part 2: Vector Operations
mean(vat_amounts, na.rm = TRUE)
sum(vat_amounts, na.rm = TRUE)
vat_amounts * 1.05
# Part 3: Indexing
vat_amounts[3]
vat_amounts[c(2, 4, 5)]
vat_amounts[vat_amounts > 60000 & !is.na(vat_amounts)]
# Part 4: Missing Data
sum(is.na(vat_amounts))
is.na(vat_amounts)
firm_ids[is.na(vat_amounts)]
R packages are collections of functions created by the community.
Think of base R as a smartphone, and packages as apps you install!
Two steps to use a package:
install.packages("packageName")
library(packageName)
Note
Think of it like this:
Some packages you’ll use in this course:
dplyr - Data manipulation and transformation
ggplot2 - Creating professional visualizations
readr / readxl - Reading CSV and Excel files
data.table - Fast operations on large datasets
lubridate - Working with dates
Tip
We’ll learn these packages in upcoming modules!
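The install-then-load pattern can be sketched as follows. To keep the example runnable without an internet connection, the install step is shown as a comment and the load step uses `tools`, a package that ships with R; substituting it for the course packages is a choice made for this sketch only.

```r
# Step 1 (once per computer): install the package.
# Commented out here because it needs an internet connection.
# install.packages("dplyr")

# Step 2 (every session): load the package.
# tools is bundled with R, so it can be loaded without installing.
library(tools)

# Functions from the package are now available
toTitleCase("value added tax")
```

If `library()` reports that there is no package with that name, the package has not been installed yet and step 1 needs to be run first.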
Always save your work in scripts - not just the console!
Functions are your friends - use help() when unsure
Vectors are everywhere - they’re the foundation of data in R
Handle NAs explicitly - use na.rm = TRUE in calculations
Comment your code - your future self will thank you!
Comments: Explaining Your Code
Use # to add comments - R will ignore everything after #
Tip
Good practice: Comment your code to explain WHY you’re doing something, not just WHAT you’re doing.
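A short sketch of comments that explain the reason behind a step. The penalty rule and the values are hypothetical.

```r
# The return was filed after the deadline, so a late-filing
# penalty applies (hypothetical rule, for illustration only)
penalty_rate <- 0.05

vat_due <- 50000   # amount declared on the return

# Add the penalty on top of the amount due
vat_with_penalty <- vat_due * (1 + penalty_rate)   # comments can also follow code
vat_with_penalty
```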