|
6 | 6 | "description": "Some additional details about the website",
|
7 | 7 | "author": [],
|
8 | 8 | "contents": "\r\n\r\n\r\n\r\n",
|
9 |
| - "last_modified": "2023-06-02T16:14:51-07:00" |
| 9 | + "last_modified": "2023-06-20T15:15:59-07:00" |
10 | 10 | },
|
11 | 11 | {
|
12 | 12 | "path": "functions.html",
|
13 | 13 | "title": "Functions",
|
14 | 14 | "description": "Repeating things? Let's functionalize it!\n",
|
15 | 15 | "author": [],
|
16 | 16 | "contents": "\r\n\r\nContents\r\nWhat’s in a function?\r\nLet’s create a custom function!\r\nCall the function\r\n\r\nBenefits of creating functions\r\n\r\nWhen we see code being repeated more than once, functions are a great way to reduce duplication. Even if we call a function only once, they can be a nice way to break up large complicated processes.\r\nWhat’s in a function?\r\nThe Formals\r\nThe Body\r\nThe Environment\r\nTo define a function here’s the basic skeleton\r\n\r\n\r\nmy_function_name <- function() {\r\n \r\n}\r\n\r\n\r\nLet’s create a custom function!\r\nHere’s a CHAS table. Each csv will look similar to this:\r\n\r\n\r\nfile_01 <- read_csv(here('data', '050', 'Table9.csv'))\r\nhead(file_01, 10)\r\n\r\n# A tibble: 10 × 152\r\n source sumlevel geoid name st cnty T9_est1 T9_est2 T9_est3\r\n <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>\r\n 1 2015thru2… 050 0500… Auta… 01 001 21395 15680 12835\r\n 2 2015thru2… 050 0500… Bald… 01 003 80930 60895 54425\r\n 3 2015thru2… 050 0500… Barb… 01 005 9345 5690 3460\r\n 4 2015thru2… 050 0500… Bibb… 01 007 6890 5130 4330\r\n 5 2015thru2… 050 0500… Blou… 01 009 20845 16425 15090\r\n 6 2015thru2… 050 0500… Bull… 01 011 3520 2505 750\r\n 7 2015thru2… 050 0500… Butl… 01 013 6505 4550 2925\r\n 8 2015thru2… 050 0500… Calh… 01 015 44605 31255 25110\r\n 9 2015thru2… 050 0500… Cham… 01 017 13450 9070 5745\r\n10 2015thru2… 050 0500… Cher… 01 019 10735 8305 7730\r\n# ℹ 143 more variables: T9_est4 <dbl>, T9_est5 <dbl>, T9_est6 <dbl>,\r\n# T9_est7 <dbl>, T9_est8 <dbl>, T9_est9 <dbl>, T9_est10 <dbl>,\r\n# T9_est11 <dbl>, T9_est12 <dbl>, T9_est13 <dbl>, T9_est14 <dbl>,\r\n# T9_est15 <dbl>, T9_est16 <dbl>, T9_est17 <dbl>, T9_est18 <dbl>,\r\n# T9_est19 <dbl>, T9_est20 <dbl>, T9_est21 <dbl>, T9_est22 <dbl>,\r\n# T9_est23 <dbl>, T9_est24 <dbl>, T9_est25 <dbl>, T9_est26 <dbl>,\r\n# T9_est27 <dbl>, T9_est28 <dbl>, T9_est29 <dbl>, T9_est30 <dbl>, …\r\n\r\nSuppose we’d like to do some cleaning to each CHAS table in the same manner. Let’s create one that does the following:\r\nfilter for WA state and PSRC counties\r\npivot longer (so columns that start with ‘T’ are not across the table)\r\ncreate 3 more columns that dissect the column containing the former ‘T…’ headers:\r\ncreate ‘table’ field extracting T and the numbers before the underscore\r\ncreate a ‘type’ field to identify whether values are ‘est’ or ‘moe’\r\ncreate a ‘sort’ field extracting the numeric digits at the end\r\n\r\n\r\n\r\n# define the skeleton of our function\r\n# add table as a parameter\r\nclean_table <- function(table) {\r\n \r\n # fill it in!\r\n \r\n}\r\n\r\n\r\nFill in the body with the argument to clean\r\n\r\n\r\nclean_table <- function(table) {\r\n table %>% \r\n filter(st == 53 & cnty %in% c('033', '035', '053', '061')) %>% \r\n pivot_longer(cols = str_subset(colnames(table), \"^T.*\"), \r\n names_to = 'header', \r\n values_to = 'value') %>% \r\n mutate(table = str_extract(header, \"^T\\\\d*(?=_)\"), \r\n type = str_extract(header, \"(?<=_)\\\\w{3}\"), \r\n sort = str_extract(header, \"\\\\d+$\")) \r\n}\r\n\r\n# Regex used:\r\n# table: \"^T\\\\d*(?=_)\" string starting with T and numeric digits followed by _\r\n# type: \"(?<=_)\\\\w{3}\" 3 letters preceded by _\r\n# sort: \"\\\\d+$\" last numeric digits at the end of the string\r\n\r\n\r\n\r\nFunctions will generally return the last evaluated expression. With the piping (%>%) in dplyr, our example is essentially a one liner expression. 
You can always add return(<name of object>) to explicitly return a specific object whenever your function is called.\r\nCall the function\r\n\r\n\r\nt9 <- clean_table(file_01)\r\n\r\n\r\nTry with other files\r\n\r\n\r\nfile_02 <- read_csv(here('data', '050', 'Table10.csv'))\r\nfile_03 <- read_csv(here('data', '050', 'Table11.csv'))\r\n\r\nt10 <- clean_table(file_02)\r\nt11 <- clean_table(file_03)\r\n\r\n\r\nIf we forgot a step in the cleaning process, we can always edit the function and re-run our script\r\n\r\n\r\n# Let's make this edit to our function that will convert the sort column from string to numeric\r\nsort = as.numeric(str_extract(header, \"\\\\d+$\"))\r\n\r\n\r\nBenefits of creating functions\r\nEasier editing of code\r\nReduce redundancy\r\nBreak long processes into chunks\r\n\r\n\r\n\r\n",
|
17 |
| - "last_modified": "2023-06-02T16:14:55-07:00" |
| 17 | + "last_modified": "2023-06-20T15:16:02-07:00" |
18 | 18 | },
|
19 | 19 | {
|
20 | 20 | "path": "index.html",
|
21 | 21 | "title": "Code Organization",
|
22 | 22 | "description": "Functions, Lists, and Loops\n",
|
23 | 23 | "author": [],
|
24 | 24 | "contents": "\r\nIt’s never too late to start organizing code! It’s a step towards cleaner scripts which not only benefit the machines that execute it but the humans who are reading and understanding the logic. We’ll go over three essentials of programming (in R) that can help us remove redundant code and efficiently process and store multiple objects. With these tools: functions, lists, and loops, your scripts can become more concise and easier to navigate.\r\n\r\n\r\n\r\n\r\n\r\nShortcut Keys\r\nf: full-screen\r\nEsc: exit full-screen\r\no: tile view\r\nleft or right arrow: advance slide\r\n\r\n\r\n\r\n",
|
25 |
| - "last_modified": "2023-06-02T16:14:59-07:00" |
| 25 | + "last_modified": "2023-06-20T15:16:03-07:00" |
26 | 26 | },
|
27 | 27 | {
|
28 | 28 | "path": "lists.html",
|
29 | 29 | "title": "Lists",
|
30 | 30 | "description": "Lists are our friend!\n",
|
31 | 31 | "author": [],
|
32 | 32 | "contents": "\r\nLists can not only store a mix of data types, but also more complex data (e.g. data frames, even lists themselves!). It’s an ideal option for a group of similar complex data, and decently sets us up for iteration! So in preparation for for loops, let’s examine lists and how to extract data from them.\r\nCreate a list\r\n\r\n\r\n# an empty list\r\nl <- list() \r\n\r\n# populated with objects\r\nl <- list(file_01, file_02, file_03) \r\n\r\n\r\nAnatomy\r\nThere’s several layers to a list, like a container within a container. To extract a specific element’s data, use double brackets instead one.\r\n\r\n\r\nl # the whole list and all its elements\r\n\r\nl[1] # the first element in its container; contains the name/index of element and the data\r\n\r\nl[[1]] # the data of the first element\r\n\r\n\r\nHadley Wickham’s pepper analogy\r\n\r\nNames\r\nLike vectors, lists can also be named\r\n\r\n\r\nnames(l) <- c('table9', 'table10', 'table11')\r\n\r\n\r\nNow you can also access the data of a specific element by name using $ or [[]]\r\n\r\n\r\nl$table9\r\n\r\nl[['table9']]\r\n\r\n# saving processes back into the list\r\nl$table9 <- l$table9 %>% filter(cnty == '033')\r\n\r\n\r\nNow if you extract an element with only one [] like l[1], you can see that it’s a container holding the name of the element and the data itself.\r\n\r\n\r\n\r\n\r\n",
|
33 |
| - "last_modified": "2023-06-02T16:15:00-07:00" |
| 33 | + "last_modified": "2023-06-20T15:16:04-07:00" |
34 | 34 | },
|
35 | 35 | {
|
36 | 36 | "path": "loops.html",
|
37 | 37 | "title": "Loops",
|
38 | 38 | "description": "Over and Over again...\n",
|
39 | 39 | "author": [],
|
40 | 40 | "contents": "\r\n\r\nContents\r\nAnatomy\r\nSequence\r\nOutput\r\n\r\nPut it all together\r\nSequencing alternatives\r\n\r\nAlter the Flow\r\nBreak\r\nNext\r\n\r\n\r\nWe could explicitly call the function as many times need be to clean the tables we’re interested in.\r\n\r\n\r\ntable9 <- clean_table(file_01)\r\n\r\ntable10 <- clean_table(file_02)\r\n\r\ntable11 <- clean_table(file_03)\r\n\r\n\r\nBut that would be redundant! Imagine if we were reading in 15 of those csvs! There would be extra lines of code and as many extra variable names in your global environment to keep track of. Let’s see how loops paired with lists can help us!\r\nAnatomy\r\nA for loop has three parts:\r\nThe Output: where the stuff will be stored\r\nThe Sequence: code within () that shows what to loop over\r\nThe Body: code within {} that does the work\r\nSequence\r\nThe sequence lies within the (). It will follow this structure:\r\n([variable name of your choice] in [list or vector])\r\nThe sequence tells the machine what to loop over. The variable name of your choice will represent a single element within the list or vector in a loop.\r\nIn the example below, with every iteration, df will be a counter and represent a different data frame in l\r\n\r\n\r\nl <- list(file_01, file_02, file_03)\r\n\r\n# for every data frame (df) in list (l)...\r\nfor(df in l) {\r\n \r\n}\r\n\r\n\r\nTry printing the head() of each data frame in our list\r\n\r\n\r\nl <- list(file_01, file_02, file_03) \r\n\r\nfor(df in l) {\r\n print(head(df))\r\n}\r\n\r\n\r\nTry printing a version of each data frame with clean_table()\r\n\r\n\r\nfor(df in l) {\r\n print(clean_table(df))\r\n}\r\n\r\n\r\nOutput\r\nNow instead of printing stuff, let’s store stuff in a list!\r\nA way to use both lists and loops is to read data. Let’s create a loop to read-in csvs 1 through 11 and store them into a list.\r\nInitiate the Output (dfs) to store the end result (data frames from csvs)\r\nConstruct the sequence you’ll be looping over (csv) (file names of csvs)\r\nRead csv (t)\r\n\r\n\r\ndfs <- list()\r\n\r\ncsv <- paste0('Table', 1:11, '.csv')\r\n\r\nfor(c in csv) {\r\n t <- read_csv(here('data', '050', c))\r\n dfs[[c]] <- t\r\n}\r\n\r\n\r\nRename the elements in the list\r\n\r\n\r\nnames(dfs) <- paste0('Table', 1:11)\r\n\r\n\r\nPut it all together\r\nEdit the loop so that we clean the tables as we’re reading in the csvs.\r\n\r\n\r\nfor(c in csv) {\r\n t <- read_csv(here('data', '050', c))\r\n ct <- clean_table(t)\r\n dfs[[c]] <- ct\r\n}\r\n\r\n\r\nWith loops and a list to store the output, we’ve removed code redundancy. Instead of calling clean_table() for every table in our list, it just required editing a couple lines within the loop to make that adjustment.\r\nSequencing alternatives\r\n\r\n\r\nfor(df in 1:length(l)) {\r\n \r\n}\r\n\r\n\r\nAlter the Flow\r\nBreak\r\nNext\r\n\r\n\r\n\r\n",
|
41 |
| - "last_modified": "2023-06-02T16:15:01-07:00" |
| 41 | + "last_modified": "2023-06-20T15:16:05-07:00" |
42 | 42 | },
|
43 | 43 | {
|
44 | 44 | "path": "prework.html",
|
45 | 45 | "title": "Pre-work",
|
46 | 46 | "author": [],
|
47 |
| - "contents": "\r\n\r\nContents\r\nSession Files on Local Drive\r\nOpen .Rproj\r\n\r\nInstall\r\nTest\r\n\r\nWe’ll be going over Functions, Lists, and Loops in this session. Some familiarity with using R and opening RStudio is helpful if you are actively participating, otherwise all are welcome to watch and learn.\r\nR and the RStudio IDE are required. See the first module on R Basics for guidance.\r\nSession Files on Local Drive\r\nClone the repo https://github.com/psrc/intro-code-org onto your local drive.\r\nDownload the CHAS zip file of the 2015-2019 ACS 5-year average data for Census places from here.\r\nExtract the zipfile in the data sub-directory of the cloned repo.\r\nAll data files will be found in data/050.\r\n\r\nMake sure that the repo and other files for the session are on your local drive. If they are on PSRC’s network, you may experience extreme sluggishness when using a .Rproj file.\r\nOpen .Rproj\r\nIn the RStudio IDE, open the .Rproj file. Project files will automatically set our working directory–in this case, the root of the directory. No need for setwd() and dealing with file paths!\r\n\r\nAfter opening the .Rproj file, you’ll see some changes to your IDE. Your console and Files pane will reflect the new working directory, and the project name is listed in the top right corner.\r\n\r\nTo close out of the project, click the project name at the top right corner and select Close Project. On the day of the session, you can access the dropdown in that area of the IDE and select ``.\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\nInstall\r\nIf the following packages have not been installed, install the following by running the following code snippet in the console of your RStudio IDE. Ignore any warnings regarding Rtools and if you are asked to install from sources which needs compilation, click ‘No’.\r\n\r\nSome may have already been installed from previous modules. You can adapt the code snippet accordingly.\r\n\r\n\r\ninstall.packages(c('tidyverse', 'here'))\r\n\r\n\r\nTest\r\nTest to make sure you can read csvs from the CHAS dataset that was downloaded. Adjust the line below according to how you’ve stored your files in the data sub-directory.\r\n\r\n\r\nfile_01 <- read_csv(here('data', '050', 'Table9.csv'))\r\n\r\n\r\n\r\n\r\n\r\n", |
48 |
| - "last_modified": "2023-06-02T16:15:02-07:00" |
| 47 | + "contents": "\r\n\r\nContents\r\nSession Files on Local Drive\r\nOpen .Rproj\r\n\r\nInstall\r\nTest\r\n\r\nWe’ll be going over Functions, Lists, and Loops in this session. Some familiarity with using R and opening RStudio is helpful if you are actively participating, otherwise all are welcome to watch and learn.\r\nR and the RStudio IDE are required. See the first module on R Basics for guidance.\r\nSession Files on Local Drive\r\nClone the repo https://github.com/psrc/intro-code-org onto your local drive.\r\nDownload the CHAS zip file of the 2015-2019 ACS 5-year average data for Census counties from here.\r\nExtract the zipfile in the data sub-directory of the cloned repo.\r\nAll data files will be found in data/050.\r\n\r\nMake sure that the repo and other files for the session are on your local drive. If they are on PSRC’s network, you may experience extreme sluggishness when using a .Rproj file.\r\nOpen .Rproj\r\nIn the RStudio IDE, open the .Rproj file in the repo. Project files will automatically set our working directory–in this case, the root of the directory. No need for setwd() and dealing with file paths!\r\nAfter opening the .Rproj file, you’ll see some changes to your IDE. Your console and Files pane will reflect the new working directory, and the project name is listed in the top right corner.\r\nTo close out of the project, click the project name at the top right corner and select Close Project. On the day of the session, you can access the dropdown in that area of the IDE and select intro-code-org.\r\nInstall\r\nIf the following packages have not been installed, install the following by running the following code snippet in the console of your RStudio IDE. Ignore any warnings regarding Rtools and if you are asked to install from sources which needs compilation, click ‘No’.\r\n\r\nSome may have already been installed from previous modules. You can adapt the code snippet accordingly.\r\n\r\n\r\ninstall.packages(c('tidyverse', 'here'))\r\n\r\n\r\nTest\r\nTest to make sure you can read csvs from the CHAS dataset that was downloaded. Adjust the line below according to how you’ve stored your files in the data sub-directory.\r\n\r\n\r\nfile_01 <- read_csv(here('data', '050', 'Table9.csv'))\r\n\r\n\r\n\r\n\r\n\r\n", |
| 48 | + "last_modified": "2023-06-20T15:16:05-07:00" |
49 | 49 | }
|
50 | 50 | ],
|
51 | 51 | "collections": []
|
|