<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<meta name="copyright" content="MACode ID, https://macodeid.com/">
<title>YA LIT FP</title>
<link rel="stylesheet" href="stylesheets/stylesheet.css">
<link rel="stylesheet" href="stylesheets/normalize.css">
<link rel="stylesheet" href="stylesheets/github-light.css">
<link rel="stylesheet" href="assets/css/maicons.css">
<link rel="stylesheet" href="assets/css/bootstrap.css">
<link rel="stylesheet" href="assets/vendor/animate/animate.css">
<link rel="stylesheet" href="assets/css/theme.css">
<script src="components/page/footer.js" type="text/javascript" defer></script>
<script src="components/page/header.js" type="text/javascript" defer></script>
</head>
<body>
<!-- Back to top button -->
<div class="back-to-top"></div>
<header-component pagename="Process"></header-component>
<div class="page-section">
<div class="container">
<div class="row align-items-center">
<div class="col py-3">
<h2 class="title-section">Process and Works Cited</h2>
<div class="divider"></div>
<h2>Annotated Bibliography</h2>
<p><b>Moore-Porter, Terri. "Diversity in Young-Adult Literature and its Impact on Self-Identity in Minority and
Majority Students in the Secondary English Classroom." Order No. 27698645 The University of Findlay, 2019.
Ann Arbor: ProQuest. Web. 9 Nov. 2022.</b></p>
<a type="button" class="btn btn-light" href="https://etd.ohiolink.edu/apexprod/rws_olink/r/1501/10?clear=10&p10_accession_num=findlay1570618161175647">Visit Paper</a>
<p>This paper by Terri Moore-Porter examines how reading young adult literature affects both minority and
majority high school students. It discusses the danger of a single story and how this period of students'
development is critical to their ideas about self-identity. It then analyzes the experiences of a classroom
of honors seniors and how their self-identity changes as they read more diverse stories. The paper was
published through the University of Findlay. It is particularly relevant to my topic because it addresses
the core questions that I hope to better understand through my research.
</p>
<p><b>The New York Times Developer Network. Books API. 9 Nov. 2022. Raw data. The New York
Times, New York.</b></p>
<a type="button" class="btn btn-light" href="https://developer.nytimes.com/docs/books-product/1/routes/lists.json/get">View Webpage</a>
<p>This dataset is drawn from the New York Times Best Sellers lists and contains information about the books
that have appeared on those lists across a variety of categories. The API returns the lists as raw JSON
records, so the data reflects the lists themselves rather than anyone's interpretation of them. The data is
published by the New York Times Developer Network and is publicly available. This format is particularly
valuable because it is easy to access and manipulate, and above all because the NYT Best Sellers list is
seen by many as a common measure of a book's success. The data gathered here offers good insight into which
books are most widely read among American readers.
</p>
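<p>As a rough illustration of that JSON shape, the sketch below reads the fields this project relies on from a
single list entry. The field names (results, list_name, book_details, title, author, primary_isbn13) are the
ones used in the code further down; the sample values are placeholders made up for illustration, not real API
output, and summarize_entry is a hypothetical helper.</p>
<pre>
<code>
```python
# Hypothetical, trimmed-down response. The values are placeholders, not real API output.
sample_response = {
    "results": [
        {
            "list_name": "Young Adult Hardcover",
            "book_details": [
                {
                    "title": "EXAMPLE TITLE",
                    "author": "Example Author",
                    "primary_isbn13": "0000000000000",
                }
            ],
        }
    ]
}


def summarize_entry(entry):
    """Pull the nested book fields up into one flat dict."""
    details = entry["book_details"][0]  # book_details is a one-element list
    return {
        "list_name": entry["list_name"],
        "title": details["title"],
        "author": details["author"],
        "primary_isbn13": details["primary_isbn13"],
    }


rows = [summarize_entry(entry) for entry in sample_response["results"]]
print(rows[0]["title"])  # EXAMPLE TITLE
```
</code>
</pre>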
<h2>Code Process</h2>
<h3>Getting the Books from the NYT Database</h3>
<pre>
<code>
import time

import pandas as pd
import requests

api_key = ""  # your NYT Books API key
years = ['2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022']
months = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']


# Each 'book_details' value is a one-element list of dicts;
# these helpers pull a single field out of it.
def map_to_isbn(details):
    return details[0]['primary_isbn13']


def map_to_title(details):
    return details[0]['title']


def map_to_author(details):
    return details[0]['author']


def map_to_name(obj):
    return obj['name']


def get_df_from_json(month, year):
    """Fetch the young-adult hardcover list for one month as a DataFrame."""
    print("{}/{}".format(month, year))
    url = "https://api.nytimes.com/svc/books/v3/lists.json?list=young-adult-hardcover&published-date={}-{}-01&api-key={}".format(year, month, api_key)
    response = requests.get(url)
    time.sleep(8)  # pause between requests to avoid hitting the API rate limit
    data_json = response.json()
    books = data_json['results']
    books_df = pd.DataFrame.from_records(books)
    # Lift the nested book fields into their own columns
    books_df['primary_isbn13'] = books_df['book_details'].map(map_to_isbn)
    books_df['title'] = books_df['book_details'].map(map_to_title)
    books_df['author'] = books_df['book_details'].map(map_to_author)
    books_df['year'] = year
    books_df = books_df.drop(columns=['display_name', 'isbns', 'asterisk', 'dagger', 'bestsellers_date', 'amazon_product_url', 'reviews', 'list_name', 'book_details', 'rank_last_week'])
    return books_df


# DataFrame.append was removed in pandas 2.0, so collect the
# monthly frames in a list and concatenate them once at the end.
frames = []
for year in years:
    for month in months:
        frames.append(get_df_from_json(month, year))
books_df = pd.concat(frames, ignore_index=True)
</code>
</pre>
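<p>The nested loop above makes one request per month across ten years. A minimal sketch of how those request
URLs could be generated follows, using only the standard library so that it runs without a key or network
access. build_list_url, BASE_URL, and the YOUR_KEY placeholder are assumptions for illustration and are not
part of the original script.</p>
<pre>
<code>
```python
from urllib.parse import urlencode

BASE_URL = "https://api.nytimes.com/svc/books/v3/lists.json"


def build_list_url(year, month, api_key, list_name="young-adult-hardcover"):
    """Build the query URL for the list published on the first of the month."""
    query = urlencode({
        "list": list_name,
        "published-date": "{}-{}-01".format(year, month),
        "api-key": api_key,
    })
    return "{}?{}".format(BASE_URL, query)


years = [str(y) for y in range(2013, 2023)]          # '2013' .. '2022'
months = ["{:02d}".format(m) for m in range(1, 13)]  # '01' .. '12'

urls = [build_list_url(y, m, "YOUR_KEY") for y in years for m in months]
print(len(urls))  # 120
```
</code>
</pre>
<p>That is 120 requests, so with an 8-second pause after each one the download spends about 16 minutes
(120 times 8 = 960 seconds) waiting.</p>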
<h3>Getting the Subjects for Each Book from the Open Library Database</h3>
<pre>
<code>
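def get_genres(isbn):
    """Return the Open Library subject names for an ISBN, or [] if the lookup fails."""
    subjects = []
    try:
        print(isbn)
        url = "https://openlibrary.org/api/books?bibkeys=ISBN:{}&jscmd=data&format=json".format(isbn)
        response = requests.get(url)
        time.sleep(8)  # pause between requests
        data_json = response.json()
        subjects = data_json['ISBN:{}'.format(isbn)]['subjects']
        subjects = map(map_to_name, subjects)
    except Exception:
        # Network errors, non-JSON responses, and books with no
        # entry or no 'subjects' key all fall back to an empty list.
        print('errored')
    return list(subjects)


books_df['subjects'] = books_df['primary_isbn13'].map(get_genres)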
</code>
</pre>
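<p>A book that stays on the list for several months will likely appear in books_df more than once, so the
mapping above may look up the same ISBN repeatedly, each time with an 8-second pause. One possible
refinement, sketched here under that assumption, is to remember earlier results. make_cached_lookup and
fake_get_genres are hypothetical names; the fake function stands in for get_genres so the example runs
offline.</p>
<pre>
<code>
```python
def make_cached_lookup(fetch):
    """Wrap a one-argument lookup so each distinct key is fetched only once."""
    cache = {}

    def lookup(key):
        if key not in cache:
            cache[key] = fetch(key)
        return cache[key]

    return lookup


# Stand-in for get_genres that records how often it is called.
calls = []


def fake_get_genres(isbn):
    calls.append(isbn)
    return ["subject for " + isbn]


cached_get_genres = make_cached_lookup(fake_get_genres)

isbns = ["111", "222", "111", "111"]  # placeholder ISBNs with repeats
subjects = [cached_get_genres(isbn) for isbn in isbns]

print(len(calls))  # 2
```
</code>
</pre>
<p>With the real function, the last step of the block above would become
books_df['subjects'] = books_df['primary_isbn13'].map(make_cached_lookup(get_genres)).</p>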
<h3>Processing the Subjects to Identify Diverse Topics</h3>
<pre>
<code>
import collections

print(books_df)

# Catch-all subjects that say nothing about a book's themes
contains = ['New York Times bestseller', 'Large type books', 'nyt:young-adult-e-book', 'YOUNG ADULT FICTION', 'nyt:young-adult-paperback', 'nyt:young-adult-hardcover', 'JUVENILE FICTION', 'collectionID', 'Young adult fiction', 'YA fiction', 'Ficción juvenil']
mental_health = ['depression', 'suicide', 'anxiety', 'mental illness', 'depressive', 'emotional problems', 'obsessive-compulsive']
racial_issues = ['immigrants', 'immigration', 'african', 'asia', 'korean', 'japanese', 'chinese', 'jamaican', 'racial', 'mexican', 'latino', 'latina', 'indians', 'native america', 'black lives matter']
queer = ['lgbt', 'gay', 'lesbian', 'nonbinary', 'asexual', 'transgender', 'non-binary']


def filter_list(keywords, text, contains_val):
    """Return contains_val if any keyword occurs in text, otherwise its negation."""
    lowered = text.lower()
    for word in keywords:
        # Lowercase both sides so mixed-case keywords can match
        if word.lower() in lowered:
            return contains_val
    return not contains_val


def remove_values(string):
    return filter_list(contains, string, False)


def get_queer(string):
    return filter_list(queer, string, True)


def get_race(string):
    return filter_list(racial_issues, string, True)


def get_mental(string):
    return filter_list(mental_health, string, True)


# Flatten the per-book subject lists into one list
subject_list = books_df['subjects'].sum()
subject_list = list(filter(remove_values, subject_list))

# Use Counter to find how often each matching subject appears
#frequency = collections.Counter(subject_list)
queer_freq = dict(collections.Counter(filter(get_queer, subject_list)))
race_freq = dict(collections.Counter(filter(get_race, subject_list)))
mental_freq = dict(collections.Counter(filter(get_mental, subject_list)))
</code>
</pre>
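<p>To make the counting step concrete, here is a small self-contained sketch of the same idea on a made-up
list of subjects. The sample subjects are invented for illustration, the keyword list is a shortened copy of
queer, and matches_any is a hypothetical stand-in for calling filter_list with contains_val set to True.</p>
<pre>
<code>
```python
import collections

queer = ['lgbt', 'gay', 'lesbian']  # shortened copy of the keyword list


def matches_any(keywords, text):
    """True if any keyword occurs in text, ignoring case."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in keywords)


# Invented sample subjects, not taken from the real data
sample_subjects = ['Gay teenagers', 'Friendship', 'LGBTQ fiction', 'Gay teenagers', 'Magic']

queer_freq = dict(collections.Counter(
    subject for subject in sample_subjects if matches_any(queer, subject)
))
print(queer_freq)  # {'Gay teenagers': 2, 'LGBTQ fiction': 1}
```
</code>
</pre>
<p>Because the match is a plain substring test, a keyword such as 'asia' would also match an unrelated subject
like 'Fantasia', so the resulting counts are worth spot-checking by hand.</p>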
</div>
</div>
</div>
</div>
<footer-component></footer-component>
</body>
</html>