Commit ce4cf3b

WI contribs, AL lobby
1 parent 915421e commit ce4cf3b

15 files changed (+467 / -427 lines)

state/al/lobby/docs/al_lobby_diary.Rmd

Lines changed: 25 additions & 27 deletions
````diff
@@ -1,6 +1,6 @@
 ---
 title: "Alabama Lobbyists"
-author: "First Last"
+author: "Kiernan Nicholls & Yanqi Xu"
 date: "`r Sys.time()`"
 output:
   github_document:
````
````diff
@@ -142,49 +142,38 @@ legislative body or regulatory body or any committee thereof."
 ## Import
 
 While the AEC _does_ provide two Excel files listing [registered lobbyists][06]
-and [registered principal clients][07] for 2020, these two files do not show
-the relationship between each lobbyist and those entites for which they lobby.
+and [registered principal clients][07], these two files do not show
+the relationship between each lobbyist and those entities for which they lobby.
 
 Instead, that relationship is documents on annual filings for each individual
 lobbyist. These reports are given as PDF documents and can be searched from the
 [AEC search page][08].
 
-The PDF statements can be then be viewed one at a time. Each yearly PDF has a
+The PDF statements can be then be viewed one at a time. Each yearly PDF of a lobyist has a
 unique lobbyist ID (`lid`), which can be passed to an `httr::GET()` request to
 save the PDF.
 
-```{r get, eval=FALSE}
-GET(
-  url = "http://ethics.alabama.gov/search/ViewReports.aspx",
-  write_disk(path, overwrite = TRUE),
-  query = list(
-    lid = 21,
-    rpt = "rptLobbyistRegistration"
-  )
-)
-```
-
 [06]: https://ethics-form.alabama.gov/entity/FileUpload2015/RegisteredLobbyist/WebDataForExcel_2010.aspx
 [07]: https://ethics-form.alabama.gov/entity/FileUpload2015/RegisteredLobbyist/rptPrincipalsListing_Excel.aspx
 [08]: http://ethics.alabama.gov/search/PublicEmployeeSearch.aspx
 ### Download
 
-Opening random PDF's from 2008 to 2020, it seems as though their are valid
-lobbyist ID's from 1 to 11,000 (with roughly 25% inbetween leading to "empty"
+Opening random PDF's from 2008 to 2023, it seems as though their are valid
+lobbyist ID's from 1 to 14,900 (with roughly 25% inbetween leading to "empty"
 files without any information).
 
 This takes **hours**, but we can loop through each ID and write the file to
 disk.
 
 ```{r raw_dir, eval=TRUE}
-raw_dir <- dir_create(here("al", "lobby", "data", "raw"))
+raw_dir <- dir_create(here("state","al", "lobby", "data", "raw"))
 ```
 
 ```{r download, eval=FALSE}
-n <- 11100
+n <- 14900
 start_time <- Sys.time()
 if (length(dir_ls(raw_dir)) < 5000) {
-  for (i in seq(min, n)) {
+  for (i in seq(n)) {
     path <- glue("{raw_dir}/reg_{str_pad(i, nchar(n), pad = '0')}.pdf")
     loop_start <- Sys.time()
     # make get request
````
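The download hunk pairs each `lid` with a zero-padded file name and an `httr::GET()` query. The sketch below rebuilds only those two strings in base R, so the logic can be checked without contacting the AEC server. The helpers `report_url()` and `pdf_path()` are hypothetical; the `lid` and `rpt` query fields are copied from the removed `get` chunk. It iterates with `seq_len(n)`, which gives the same 1 to `n` sequence as the new `seq(n)` for positive `n`. The old `seq(min, n)` would pass base R's `min()` function as `from` and fail, unless a `min` value was assigned elsewhere in the diary.

```r
# Sketch only: rebuild the request URL and output path for each lobbyist ID.
# report_url() and pdf_path() are hypothetical helpers; no network request is made.
base_url <- "http://ethics.alabama.gov/search/ViewReports.aspx"

report_url <- function(lid, rpt = "rptLobbyistRegistration") {
  # Same query fields as the removed GET() example: lid and rpt
  query <- c(lid = as.character(lid), rpt = rpt)
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste(names(query), encoded, sep = "=", collapse = "&"))
}

pdf_path <- function(raw_dir, i, n) {
  # Zero-pad the ID to the width of n, e.g. 21 -> "00021" when n = 14900
  id <- formatC(as.integer(i), width = nchar(n), flag = "0")
  file.path(raw_dir, paste0("reg_", id, ".pdf"))
}

n <- 14900
ids <- seq_len(n)  # 1, 2, ..., n
vapply(ids[1:3], pdf_path, character(1), raw_dir = "raw", n = n)
report_url(21)
```

Padding to `nchar(n)` keeps the file names sorting in ID order on disk, which is what the `str_pad()` call in the loop does.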
````diff
@@ -354,13 +343,23 @@ frame_pdf <- function(file) {
 We can then apply this function to every PDF downloaded and combine the results
 of each into a single giant data frame.
 
-```{r}
+```{r, eval=FALSE}
 allr <- map_df(
   .x = dir_ls(raw_dir),
   .f = frame_pdf
 )
 ```
 
+```{r, eval=FALSE, echo=FALSE}
+allr %>% write_csv(path(raw_dir, "allr_from_pdf.csv"), na = "")
+```
+
+```{r, echo=FALSE}
+allr <- read_csv(path(raw_dir, "allr_from_pdf.csv"))
+```
+
+
+
 ## Explore
 
 ```{r glimpse}
````
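The new hidden chunks cache `allr` as `allr_from_pdf.csv`, so knitting the diary does not re-parse every PDF. A generic version of that pattern (read the file if it exists, otherwise build the data and write it) is sketched below in base R. `cache_csv()` is a hypothetical helper, and `utils::read.csv()`/`utils::write.csv()` stand in for the readr functions used in the diff.

```r
# Sketch of a read-if-cached, otherwise-build-and-write pattern.
# cache_csv() is a hypothetical helper, not part of the diary.
cache_csv <- function(path, build) {
  if (file.exists(path)) {
    # Cached copy found: read it back, treating empty strings as NA
    return(utils::read.csv(path, stringsAsFactors = FALSE, na.strings = ""))
  }
  df <- build()  # expensive step, e.g. parsing every PDF
  utils::write.csv(df, path, row.names = FALSE, na = "")
  df
}

# Example: the build function runs only on the first call
calls <- 0
build_allr <- function() {
  calls <<- calls + 1
  data.frame(lob_name = c("A", "B"), year = c(2020L, 2021L),
             stringsAsFactors = FALSE)
}
cache_file <- tempfile(fileext = ".csv")
first <- cache_csv(cache_file, build_allr)
second <- cache_csv(cache_file, build_allr)
calls
```

Column types are guessed again when the CSV is read back, so a column that only looks numeric may come back as a number rather than text.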
````diff
@@ -399,13 +398,13 @@ allr <- allr %>%
   separate(
     col = lob_city,
     into = c("lob_city", "lob_state"),
-    sep = ",\\s(?=[:upper:])",
+    sep = ",\\s(?=[A-Z])",
     extra = "merge"
   ) %>%
   mutate_at(
     .vars = vars(lob_state),
     .funs = str_remove,
-    pattern = "(.*,\\s)(?=[:upper:])"
+    pattern = "(.*,\\s)(?=[A-Z])"
   ) %>%
   separate(
     col = lob_state,
````
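The two patterns changed in this hunk work as a pair. The first splits the combined city field on the first comma and space that is followed by an uppercase letter. The second strips everything up to the last such comma from the state part. Switching from `[:upper:]` to `[A-Z]` limits the lookahead to ASCII capitals. The base-R sketch below reproduces both steps with `perl = TRUE`: `regexpr()` and `sub()` stand in for `tidyr::separate(extra = "merge")` and `stringr::str_remove()`, and the sample addresses are made up.

```r
# Hypothetical city/state strings of the kind the diary splits apart
x <- c("BIRMINGHAM, AL 35203", "MONTGOMERY, SUITE 2, AL 36104")

# Step 1: split on the FIRST ", " followed by an uppercase ASCII letter,
# keeping everything after it together (like separate(extra = "merge"))
m <- regexpr(",\\s(?=[A-Z])", x, perl = TRUE)
lob_city  <- substr(x, 1, m - 1)
lob_state <- substr(x, m + attr(m, "match.length"), nchar(x))

# Step 2: drop everything up to the LAST ", " before an uppercase letter
# (the greedy .* makes the match extend as far right as possible)
lob_state <- sub("(.*,\\s)(?=[A-Z])", "", lob_state, perl = TRUE)

lob_city   # "BIRMINGHAM" "MONTGOMERY"
lob_state  # "AL 35203"   "AL 36104"
```

The lookahead `(?=[A-Z])` is zero-width, so the uppercase letter it checks for stays at the start of the state part instead of being consumed by the split.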
````diff
@@ -452,8 +451,7 @@ allr <- allr %>%
   mutate_if(
     .predicate = is_character,
     .funs = str_trim
-  ) %>%
-  na_if("")
+  )
 ```
 
 ```{r echo=FALSE}
````
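This hunk drops the trailing `na_if("")` from the cleaning pipeline. As a result, values that are empty once trimmed now stay as `""` instead of becoming `NA`. If blank values should still count as missing, one option is to convert them column by column. The base-R sketch below trims and converts character columns in one pass and leaves other columns untouched. `trim_blank_to_na()` is a hypothetical helper, not something the diary defines.

```r
# Hypothetical helper: trim character columns and turn blank strings into NA,
# leaving non-character columns untouched
trim_blank_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      col <- trimws(col)
      # Guard against existing NAs when comparing to ""
      col[!is.na(col) & col == ""] <- NA_character_
    }
    col
  })
  df
}

# Example with a whitespace-only value and an existing NA
df <- data.frame(lob_name = c(" SMITH ", "   ", NA), year = 2020:2022,
                 stringsAsFactors = FALSE)
out <- trim_blank_to_na(df)
out
```

Assigning through `df[] <-` keeps the result a data frame with the original column names.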
````diff
@@ -629,7 +627,7 @@ progress_table(
 ## Export
 
 ```{r create_proc_dir}
-proc_dir <- dir_create(here("al", "lobby", "data", "processed"))
+proc_dir <- dir_create(here("state","al", "lobby", "data", "processed"))
 ```
 
 ```{r write_clean}
````
````diff
@@ -643,7 +641,7 @@ allr %>%
     pri_city_norm = pri_city_swap,
   ) %>%
   write_csv(
-    path = glue("{proc_dir}/al_lobbyists.csv"),
+    path = glue("{proc_dir}/al_lobby_reg.csv"),
     na = ""
   )
 ```
````
