
<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <meta http-equiv="X-UA-Compatible" content="ie=edge">

  <meta name="copyright" content="MACode ID, https://macodeid.com/">

  <title>YA LIT FP</title>

  <link rel="stylesheet" href="stylesheets/stylesheet.css">
  <link rel="stylesheet" href="stylesheets/normalize.css">
  <link rel="stylesheet" href="stylesheets/github-light.css">

  <link rel="stylesheet" href="assets/css/maicons.css">

  <link rel="stylesheet" href="assets/css/bootstrap.css">

  <link rel="stylesheet" href="assets/vendor/animate/animate.css">

  <link rel="stylesheet" href="assets/css/theme.css">
  <script src="components/page/footer.js" type="text/javascript" defer></script>
  <script src="components/page/header.js" type="text/javascript" defer></script>

</head>

<body>
  <!-- Back to top button -->
  <div class="back-to-top"></div>
  <header-component pagename="Process"></header-component>
  <div class="page-section">
    <div class="container">
      <div class="row align-items-center">
        <div class="col py-3">
          <h2 class="title-section">Process and Works Cited</h2>
          <div class="divider"></div>

          <h2>Annotated Bibliography</h2>
          <p><b>Moore-Porter, Terri. "Diversity in Young-Adult Literature and its Impact on Self-Identity in Minority
              and
              Majority Students in the Secondary English Classroom." Order No. 27698645 The University of Findlay, 2019.
              Ann Arbor: ProQuest. Web. 9 Nov. 2022.</b></p>
              <a type="button" class="btn btn-light" href="https://etd.ohiolink.edu/apexprod/rws_olink/r/1501/10?clear=10&p10_accession_num=findlay1570618161175647">Visit Paper</a>
          <p>This article, written by Terri Moore-Porter, examines how reading Young Adult literature affects both
            minority and majority high school students. It discusses the danger of a single story and explains why
            this period in students' development is critical to their ideas about self-identity. It then analyzes the
            experiences of a classroom of honors seniors and how their self-identity changes as they read more diverse
            stories. The article was published by the University of Findlay. It is particularly relevant to my topic
            because it addresses the core questions that I hope to better understand through my research.
          </p>
          <p style="font-weight:bold"><b>The New York Times Developer Network. Books API. 9 Nov. 2022. Raw data. The New York
              Times, New York.</b></p>

              <a type="button" class="btn btn-light" href="https://developer.nytimes.com/docs/books-product/1/routes/lists.json/get">View Webpage</a>
          <p>This database is generated from the New York Times Bestseller list and contains information about all of
            the books that have appeared on the list in a variety of categories. The API presents the lists as raw
            data in JSON format, without commentary or interpretation. The data is published by the New York Times
            Developer Network and is publicly available to anyone with an API key. This format is particularly
            valuable because it is easy to access and manipulate, and above all because many people treat the NYT
            Bestseller list as a common measure of a book's success. The data gathered here gives good insight into
            which books are most widely read among American readers.
          </p>

          <h2>Code Process</h2>

          <h3>Getting the Books from the NYT Database</h3>
          <pre>
            <code>
              import time

              import pandas as pd
              import requests

              api_key = ""  # NYT Developer Network API key
              years = ['2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022']
              months = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']


              # book_details is a list; its first element holds the fields we need
              def map_to_isbn(details):
                  return details[0]['primary_isbn13']

              def map_to_title(details):
                  return details[0]['title']

              def map_to_author(details):
                  return details[0]['author']

              def map_to_name(obj):
                  return obj['name']

              def get_df_from_json(month, year):
                  """Fetch the young-adult-hardcover list published on the 1st of the given month."""
                  print("{}/{}".format(month, year))
                  url = "https://api.nytimes.com/svc/books/v3/lists.json?list=young-adult-hardcover&published-date={}-{}-01&api-key={}".format(year, month, api_key)
                  response = requests.get(url)
                  time.sleep(8)  # pause between requests to avoid hitting the API too quickly
                  data_json = response.json()
                  books = data_json['results']
                  books_df = pd.DataFrame.from_records(books)
                  books_df['primary_isbn13'] = books_df['book_details'].map(map_to_isbn)
                  books_df['title'] = books_df['book_details'].map(map_to_title)
                  books_df['author'] = books_df['book_details'].map(map_to_author)
                  books_df['year'] = year
                  books_df = books_df.drop(columns=['display_name', 'isbns', 'asterisk', 'dagger', 'bestsellers_date', 'amazon_product_url', 'reviews', 'list_name', 'book_details', 'rank_last_week'])
                  return books_df

              # First version of the lookup; the next section redefines it with error handling
              def get_genres(isbn):
                  url = "https://openlibrary.org/api/books?bibkeys=ISBN:{}&jscmd=data&format=json".format(isbn)
                  response = requests.get(url)
                  time.sleep(5)
                  data_json = response.json()
                  subjects = data_json['ISBN:{}'.format(isbn)]['subjects']
                  subjects = map(map_to_name, subjects)
                  return list(subjects)

              # DataFrame.append was removed in pandas 2.0, so collect the monthly frames and concatenate once
              frames = []

              for year in years:
                  for month in months:
                      frames.append(get_df_from_json(month, year))

              books_df = pd.concat(frames)
            </code>
          </pre>
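          <p>The date strings and the field extraction used above can be checked offline before any API requests are
            made. The following is a minimal sketch rather than the code used for this project:
            <code>build_dates</code> and <code>flatten_entry</code> are hypothetical helper names, and the sample
            entry holds placeholder values, not real bestseller data. It assumes, as the code above does, that each
            entry in <code>results</code> carries a <code>book_details</code> list whose first element holds
            <code>title</code>, <code>author</code>, and <code>primary_isbn13</code>.</p>
          <pre>
            <code>
```python
from itertools import product


def build_dates(years, months):
    """Return one 'YYYY-MM-01' query date per (year, month) pair."""
    return ["{}-{}-01".format(year, month) for year, month in product(years, months)]


def flatten_entry(entry, year):
    """Pull the fields of interest out of one entry of the 'results' list."""
    details = entry["book_details"][0]
    return {
        "primary_isbn13": details["primary_isbn13"],
        "title": details["title"],
        "author": details["author"],
        "year": year,
    }


# Placeholder entry shaped like the records the code above expects;
# the values are made up for illustration, not real bestseller data.
sample_entry = {
    "book_details": [
        {"primary_isbn13": "0000000000000", "title": "EXAMPLE TITLE", "author": "Example Author"}
    ],
}

dates = build_dates(["2013", "2014"], ["01", "02"])
row = flatten_entry(sample_entry, "2013")

print(dates)  # ['2013-01-01', '2013-02-01', '2014-01-01', '2014-02-01']
print(row)
```
            </code>
          </pre>
          <p>With the ten years and twelve months listed above, this produces 120 query dates, so the 8-second pause
            alone accounts for about 16 minutes of run time.</p>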

          <h3>Getting the subjects for each book from the OpenLibrary Database</h3>
          <pre>
            <code>
              def get_genres(isbn):
                  """Return the OpenLibrary subject names for an ISBN, or [] if the lookup fails."""
                  subjects = []
                  try:
                      print(isbn)
                      url = "https://openlibrary.org/api/books?bibkeys=ISBN:{}&jscmd=data&format=json".format(isbn)
                      response = requests.get(url)
                      time.sleep(8)
                      data_json = response.json()
                      subjects = data_json['ISBN:{}'.format(isbn)]['subjects']
                      subjects = map(map_to_name, subjects)
                  except Exception:
                      # e.g. the ISBN is missing from the response or the record has no subjects
                      print('errored')
                  return list(subjects)

              books_df['subjects'] = books_df['primary_isbn13'].map(get_genres)
            </code>
          </pre>
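          <p>Because a book can stay on the list for many months, the same ISBN may appear in
            <code>books_df</code> more than once, and every lookup above pauses for 8 seconds. One possible way to
            reduce the number of requests is to look up each distinct ISBN once and reuse the result. The sketch
            below illustrates that idea under stated assumptions: <code>lookup_unique</code> and
            <code>fake_fetch</code> are hypothetical names, the ISBNs are placeholders, and <code>fake_fetch</code>
            stands in for <code>get_genres</code> so that the example runs without network access.</p>
          <pre>
            <code>
```python
def lookup_unique(isbns, fetch):
    """Call fetch once per distinct ISBN and return {isbn: subjects}."""
    cache = {}
    for isbn in isbns:
        if isbn not in cache:
            cache[isbn] = fetch(isbn)
    return cache


calls = []


def fake_fetch(isbn):
    """Stand-in for get_genres: records the call and returns placeholder subjects."""
    calls.append(isbn)
    return ["subject for " + isbn]


# Placeholder ISBNs; the repeated value mimics a book that stayed on the list.
isbns = ["0000000000001", "0000000000002", "0000000000001"]
subjects_by_isbn = lookup_unique(isbns, fake_fetch)

print(len(calls))  # 2: the repeated ISBN was fetched only once
print(subjects_by_isbn["0000000000001"])
```
            </code>
          </pre>
          <p>With pandas, the resulting dictionary could then be applied with
            <code>books_df['primary_isbn13'].map(subjects_by_isbn)</code>.</p>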

          <h3>Processing the subjects to see which ones have diverse topics</h3>
          <pre>
            <code>
              import collections
              import itertools


              print(books_df)

              contains = ['New York Times bestseller', 'Large type books', 'nyt:young-adult-e-book', 'YOUNG ADULT FICTION', 'nyt:young-adult-paperback', 'nyt:young-adult-hardcover', 'JUVENILE FICTION', 'collectionID', 'Young adult fiction', 'YA fiction', 'Ficción juvenil']
              mental_health = ['depression', 'suicide', 'anxiety', 'mental illness', 'depressive', 'emotional problems', 'obsessive-compulsive']
              racial_issues = ['immigrants', 'immigration', 'african', 'asia', 'korean', 'japanese', 'chinese', 'jamaican', 'racial', 'mexican', 'latino', 'latina', 'indians', 'native america', 'black lives matter']
              queer = ['lgbt', 'gay', 'lesbian', 'nonbinary', 'asexual', 'transgender', 'non-binary']


              def filter_list(keywords, subject, contains_val):
                  """Return contains_val if any keyword occurs in subject, otherwise its negation."""
                  for word in keywords:
                      # lowercase both sides; entries such as 'YOUNG ADULT FICTION' would otherwise never match
                      if word.lower() in subject.lower():
                          return contains_val
                  return not contains_val

              def remove_values(string):
                  # keep only subjects that are not generic labels from the contains list
                  return filter_list(contains, string, False)

              def get_queer(string):
                  return filter_list(queer, string, True)

              def get_race(string):
                  return filter_list(racial_issues, string, True)

              def get_mental(string):
                  return filter_list(mental_health, string, True)

              # flatten the per-book subject lists into a single list
              subject_list = list(itertools.chain.from_iterable(books_df['subjects']))
              subject_list = list(filter(remove_values, subject_list))


              # using Counter to find frequency of elements
              #frequency = collections.Counter(subject_list)
              queer_freq = dict(collections.Counter(filter(get_queer, subject_list)))

              race_freq = dict(collections.Counter(filter(get_race, subject_list)))

              mental_freq = dict(collections.Counter(filter(get_mental, subject_list)))

            </code>
          </pre>
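          <p>To make the keyword matching concrete, here is a small worked example. It is a sketch under
            assumptions: the subject strings are made up for illustration, <code>matches_any</code> is a
            hypothetical helper that follows the same substring idea as <code>filter_list</code> above, and only a
            few of the keywords are used.</p>
          <pre>
            <code>
```python
import collections

queer = ["lgbt", "gay", "lesbian"]
mental_health = ["depression", "anxiety"]


def matches_any(keywords, subject):
    """Return True if any keyword is a case-insensitive substring of subject."""
    lowered = subject.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


# Made-up subject strings, not taken from the OpenLibrary results.
subject_list = ["Gay teenagers", "Depression, Mental", "Friendship", "Gay teenagers"]

queer_freq = dict(collections.Counter(s for s in subject_list if matches_any(queer, s)))
mental_freq = dict(collections.Counter(s for s in subject_list if matches_any(mental_health, s)))

print(queer_freq)   # {'Gay teenagers': 2}
print(mental_freq)  # {'Depression, Mental': 1}
```
            </code>
          </pre>
          <p>Because the match is on substrings, a short keyword such as 'asia' also matches any subject that merely
            contains those letters (for example "Fantasia"), so the resulting counts are best read as
            approximate.</p>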

        </div>
      </div>
    </div>
  </div>


  <footer-component></footer-component>

</body>

</html>