  ---
  title: "Alabama Lobbyists"
- author: "First Last"
+ author: "Kiernan Nicholls & Yanqi Xu"
  date: "`r Sys.time()`"
  output:
    github_document:
@@ -142,49 +142,38 @@ legislative body or regulatory body or any committee thereof."
  ## Import

  While the AEC _does_ provide two Excel files listing [registered lobbyists][06]
- and [registered principal clients][07] for 2020, these two files do not show
- the relationship between each lobbyist and those entites for which they lobby.
+ and [registered principal clients][07], these two files do not show
+ the relationship between each lobbyist and those entities for which they lobby.

  Instead, that relationship is documented in annual filings for each individual
  lobbyist. These reports are given as PDF documents and can be searched from the
  [AEC search page][08].

- The PDF statements can be then be viewed one at a time. Each yearly PDF has a
+ The PDF statements can then be viewed one at a time. Each yearly PDF of a lobbyist has a
  unique lobbyist ID (`lid`), which can be passed to an `httr::GET()` request to
  save the PDF.

- ```{r get, eval=FALSE}
- GET(
-   url = "http://ethics.alabama.gov/search/ViewReports.aspx",
-   write_disk(path, overwrite = TRUE),
-   query = list(
-     lid = 21,
-     rpt = "rptLobbyistRegistration"
-   )
- )
- ```
-
  [06]: https://ethics-form.alabama.gov/entity/FileUpload2015/RegisteredLobbyist/WebDataForExcel_2010.aspx
  [07]: https://ethics-form.alabama.gov/entity/FileUpload2015/RegisteredLobbyist/rptPrincipalsListing_Excel.aspx
  [08]: http://ethics.alabama.gov/search/PublicEmployeeSearch.aspx
  ### Download

- Opening random PDF's from 2008 to 2020, it seems as though their are valid
- lobbyist ID's from 1 to 11,000 (with roughly 25% inbetween leading to "empty"
+ Opening random PDFs from 2008 to 2023, it seems as though there are valid
+ lobbyist IDs from 1 to 14,900 (with roughly 25% in between leading to "empty"
  files without any information).

  This takes **hours**, but we can loop through each ID and write the file to
  disk.

  ```{r raw_dir, eval=TRUE}
- raw_dir <- dir_create(here("al", "lobby", "data", "raw"))
+ raw_dir <- dir_create(here("state", "al", "lobby", "data", "raw"))
  ```

  ```{r download, eval=FALSE}
- n <- 11100
+ n <- 14900
  start_time <- Sys.time()
  if (length(dir_ls(raw_dir)) < 5000) {
-   for (i in seq(min, n)) {
+   for (i in seq(n)) {
      path <- glue("{raw_dir}/reg_{str_pad(i, nchar(n), pad = '0')}.pdf")
      loop_start <- Sys.time()
      # make get request
# make get request
@@ -354,13 +343,23 @@ frame_pdf <- function(file) {
  We can then apply this function to every PDF downloaded and combine the results
  of each into a single giant data frame.

- ```{r}
+ ```{r, eval=FALSE}
  allr <- map_df(
    .x = dir_ls(raw_dir),
    .f = frame_pdf
  )
  ```

+ ```{r, eval=FALSE, echo=FALSE}
+ allr %>% write_csv(path(raw_dir, "allr_from_pdf.csv"), na = "")
+ ```
+
+ ```{r, echo=FALSE}
+ allr <- read_csv(path(raw_dir, "allr_from_pdf.csv"))
+ ```
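Note on the added chunks: they cache the parsed data frame as a CSV so the slow PDF parsing step does not have to run on every knit. A minimal sketch of the same idea folded into one step is below. It uses base R only, and `read_or_build()`, `slow_parse()`, and the toy data are hypothetical names introduced for illustration; they are not part of the document.

```r
# Hypothetical helper (not in the original document): build the data once,
# then reuse the cached CSV on later runs. Base R only.
read_or_build <- function(cache_file, build) {
  if (file.exists(cache_file)) {
    # cache hit: skip the slow step entirely
    return(read.csv(cache_file, stringsAsFactors = FALSE))
  }
  dat <- build()  # slow step, e.g. parsing every PDF
  write.csv(dat, cache_file, row.names = FALSE, na = "")
  dat
}

# Toy stand-in for mapping frame_pdf() over every downloaded file.
slow_parse <- function() {
  data.frame(lid = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
}

cache  <- tempfile(fileext = ".csv")
first  <- read_or_build(cache, slow_parse)  # builds and writes the cache
second <- read_or_build(cache, slow_parse)  # reads the cache instead
```

With this shape a single chunk could stay evaluated, since the expensive branch only runs when the cache file is missing. Whether that suits this document's workflow is a judgment call.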
+
+
+
  ## Explore

  ```{r glimpse}
@@ -399,13 +398,13 @@ allr <- allr %>%
    separate(
      col = lob_city,
      into = c("lob_city", "lob_state"),
-     sep = ",\\s(?=[:upper:])",
+     sep = ",\\s(?=[A-Z])",
      extra = "merge"
    ) %>%
    mutate_at(
      .vars = vars(lob_state),
      .funs = str_remove,
-     pattern = "(.*,\\s)(?=[:upper:])"
+     pattern = "(.*,\\s)(?=[A-Z])"
    ) %>%
    separate(
      col = lob_state,
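Note on the changed `sep` pattern: it splits on a comma and whitespace only when an uppercase letter follows, and the lookahead leaves that letter in the second piece. The sketch below shows the behavior with base R's `strsplit()` and `perl = TRUE` instead of `tidyr::separate()`, so it is an approximation of what the pipeline does; the sample addresses are made up. One plausible reason for the change, stated here as an assumption, is that in base R regular expressions a POSIX class such as `[:upper:]` is only valid inside a bracket expression (`[[:upper:]]`), while `[A-Z]` works everywhere.

```r
# Made-up sample values; the real column is lob_city in the document.
x <- c("Montgomery, AL", "St. Louis, MO", "Birmingham, al")
pattern <- ",\\s(?=[A-Z])"

# The lookahead requires an uppercase letter after ", " but does not consume
# it, so the state abbreviation survives the split intact.
parts <- strsplit(x, pattern, perl = TRUE)

parts[[1]]  # city and state as two elements
parts[[3]]  # no uppercase letter after the comma, so the string is not split
```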
@@ -452,8 +451,7 @@ allr <- allr %>%
    mutate_if(
      .predicate = is_character,
      .funs = str_trim
-   ) %>%
-   na_if("")
+   )
  ```

  ```{r echo=FALSE}
@@ -629,7 +627,7 @@ progress_table(
  ## Export

  ```{r create_proc_dir}
- proc_dir <- dir_create(here("al", "lobby", "data", "processed"))
+ proc_dir <- dir_create(here("state", "al", "lobby", "data", "processed"))
  ```

  ```{r write_clean}
@@ -643,7 +641,7 @@ allr %>%
      pri_city_norm = pri_city_swap,
    ) %>%
    write_csv(
-     path = glue("{proc_dir}/al_lobbyists.csv"),
+     path = glue("{proc_dir}/al_lobby_reg.csv"),
      na = ""
    )
  ```