Regression discontinuity experiments with Google Trends data
Between November 2024 and February 2025, the federal government of Canada proposed, passed, and implemented legislation (Bill C-78, the Tax Break for All Canadians Act) that made essentially all food and many holiday essentials free of GST/HST for two months, from December 14, 2024 to February 15, 2025. In this example, we focus on the date the legislation was first proposed (November 21, 2024) and examine how the news affected search volume for tax-related terms in Canada, specifically the search term 'GST tax'.
Code:
devtools::install_github("PMassicotte/gtrendsR")
library(gtrendsR)
library(ggplot2)
# Read helper functions from the working directory
source(file.path(getwd(), "get_gtrends_data.R"))
source(file.path(getwd(), "get_rdd_graph.R"))
# Get Google Trends data
gtrends_data <- get_gtrends_data("GST tax", search_geo = "CA", search_time = "2024-10-15 2024-12-08")
# Write data for future reference
write.csv(gtrends_data, "gst_tax_example.csv", row.names = FALSE)
# Get regression discontinuity graph
# (the time zone is passed via tz; a "GMT" suffix inside the string is ignored when parsing)
rdd_graph <- get_rdd_graph(gtrends_data, "Tax break proposed", geo_label = "Canada",
                           disc_datetime = as.POSIXct("2024-11-21", tz = "GMT"))
plot(rdd_graph)
# Save plot (ggsave writes the most recently displayed plot)
ggsave("gst_tax_example.png")
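The helper `get_rdd_graph()` is not reproduced here. As a rough illustration of the kind of model behind such a graph, the sketch below fits a sharp regression discontinuity in time with a separate linear trend on each side of the cutoff. It is not the repository's implementation: the data are simulated, and the column names `date`, `hits`, `days_from_cutoff` and `post` are assumptions made for the example.

```r
set.seed(42)

# Simulated daily series standing in for Google Trends output
# (hypothetical data; column names are assumptions for this sketch)
dates  <- seq(as.Date("2024-10-15"), as.Date("2024-12-08"), by = "day")
cutoff <- as.Date("2024-11-21")

sim <- data.frame(date = dates)
sim$days_from_cutoff <- as.numeric(sim$date - cutoff)  # running variable, 0 at the cutoff
sim$post <- as.integer(sim$date >= cutoff)             # 1 on or after the cutoff

# Built-in jump of 30 units at the cutoff, plus a mild trend and noise
sim$hits <- 20 + 0.1 * sim$days_from_cutoff + 30 * sim$post +
  rnorm(nrow(sim), sd = 3)

# Sharp RDD: linear trend in the running variable, allowed to differ
# on each side of the cutoff through the interaction term
fit <- lm(hits ~ days_from_cutoff * post, data = sim)

# The coefficient on 'post' is the estimated jump at the cutoff
jump_estimate <- unname(coef(fit)["post"])
jump_ci <- confint(fit, "post", level = 0.95)

print(jump_estimate)
print(jump_ci)
```

The interval from `confint()` assumes independent errors, which search-volume time series often violate, so it should be read as a rough guide rather than a definitive uncertainty estimate.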
- Google Trends reports search volume on a relative scale ("interest over time"), defined as follows: "Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term."
- Depending on the start and end points of your Google Trends query, the time scale of the returned data can vary (hour, day, week, etc.).
- Some queries return values of "<1" for certain times; we currently impute a value of 0.5 for these.
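The "<1" imputation described in the last note can be sketched as follows. This is a minimal illustration, not the code in `get_gtrends_data.R`: the function name `impute_low_hits()` and the example vector are hypothetical. It assumes the values arrive as character strings, which happens when any entry is "<1".

```r
# Hypothetical raw values for a sparse query: the "<1" entries
# force the whole column to be character rather than numeric
raw_hits <- c("12", "<1", "100", "0", "<1", "47")

# Replace "<1" with a small placeholder (0.5 by default), then convert to numeric
impute_low_hits <- function(hits, low_value = 0.5) {
  hits <- as.character(hits)
  hits[hits == "<1"] <- as.character(low_value)
  as.numeric(hits)
}

hits <- impute_low_hits(raw_hits)
print(hits)
```

The value 0.5 is a convention rather than an estimate: it simply places these observations between 0 (not enough data) and 1 (the smallest reported value).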
- Brodeur, Clark, Fleche and Powdthavee (2021): COVID-19, lockdowns and well-being: Evidence from Google Trends
- Holzl, Keusch and Sajons (2024): The (mis)use of Google Trends data in the social sciences - A systematic review, critique, and recommendations
- Carneiro and Mylonakis (2009): Google trends: a web-based tool for real-time surveillance of disease outbreaks
- Cattaneo, Idrobo and Titiunik (2020): A Practical Introduction to Regression Discontinuity Designs: Foundations
- Cattaneo, Idrobo and Titiunik (2024): A Practical Introduction to Regression Discontinuity Designs: Extensions
- Hausman and Rapson (2018): Regression Discontinuity in Time: Considerations for Empirical Applications
- Gelman and Imbens (2014): Why High-order Polynomials Should not be Used in Regression Discontinuity Designs
- Cunningham (2021): The Mixtape, Chapter 6 - Regression Discontinuity
- Huntington-Klein (2022): The Effect, Chapter 20 - Regression Discontinuity
- RDD, allow user to specify discontinuity date
- RDD, add in percent change option
- RDD, automatically calculate optimal polynomial order
- RDD, calculate CIs for discontinuity estimate
- RDD, add fuzzy RDD
- RDD, add local polynomial regression (bandwidths, kernels, etc.)
- RDD, regression kink design (Tools of the Trade: The Regression Kink Design)
- Implement difference in differences (will require >1 time series)
- Basic time series methods (e.g. seasonality)
- example search term: 'flu shot'