Open
79 commits
4789d3a
Create main.yml
phonymacoroni Mar 15, 2021
d93d68f
add lambda function
phonymacoroni Mar 15, 2021
2f04f76
Merge branch 'main' of github.com:PhilologyBot/final into main
phonymacoroni Mar 15, 2021
ffe26de
update pipenv things
phonymacoroni Mar 15, 2021
aa37a38
update pip3
phonymacoroni Mar 15, 2021
ca4aa04
maybe this one
phonymacoroni Mar 15, 2021
c860478
S3 integration
phonymacoroni Mar 15, 2021
3bcbe1c
dot slash
phonymacoroni Mar 15, 2021
3ae1b50
s3 additional
phonymacoroni Mar 15, 2021
4314bbd
remove id
phonymacoroni Mar 15, 2021
2c66f6e
phil typo
phonymacoroni Mar 15, 2021
4a0b97e
base ngram code
phonymacoroni Mar 15, 2021
51665aa
lambda function
phonymacoroni Mar 15, 2021
f21fc7e
rename github actions
phonymacoroni Mar 15, 2021
3e315ce
index false remove
phonymacoroni Mar 15, 2021
80d892b
debug lambda
phonymacoroni Mar 15, 2021
51d7e74
updated to new ngrams
phonymacoroni Mar 15, 2021
2eaea44
remove duplicate imports
phonymacoroni Mar 15, 2021
d51e8f5
rename github action
phonymacoroni Mar 15, 2021
038f981
adding html and css
alvaradoblancouribe Mar 15, 2021
5a4b8d6
update lambda and add test call
phonymacoroni Mar 15, 2021
8002b95
fixing html and css
alvaradoblancouribe Mar 15, 2021
38f344e
adding helper function for api
alvaradoblancouribe Mar 15, 2021
1e68842
test data cucumber
phonymacoroni Mar 17, 2021
6ad9cd6
cors support
phonymacoroni Mar 17, 2021
d939e6d
add headers return
phonymacoroni Mar 17, 2021
4daac6d
add allowed headers
phonymacoroni Mar 17, 2021
342d061
global allow origin
phonymacoroni Mar 17, 2021
cdcfa11
stringify
phonymacoroni Mar 17, 2021
1461067
parse the things
phonymacoroni Mar 17, 2021
05f6c30
files
masun77 Mar 17, 2021
751bc99
files
masun77 Mar 17, 2021
9cbda5c
files
masun77 Mar 17, 2021
c94fc27
font names in legend
masun77 Mar 17, 2021
6a865c9
update headers
phonymacoroni Mar 17, 2021
a398ea5
small changes to css
alvaradoblancouribe Mar 17, 2021
dd794ab
Merge pull request #6 from PhilologyBot/colorsFonts
phonymacoroni Mar 17, 2021
f72cbe8
update workflow path
phonymacoroni Mar 17, 2021
70d7aa7
changed lambda
phonymacoroni Mar 17, 2021
d871a26
add comment
phonymacoroni Mar 17, 2021
5f08e95
comment2
phonymacoroni Mar 17, 2021
facf12f
Merge remote-tracking branch 'origin/apiStuff' into colorsFonts
masun77 Mar 17, 2021
934581e
add word jsons to main json
masun77 Mar 17, 2021
57dc630
working sets of 12
phonymacoroni Mar 17, 2021
0118b09
Merge branch 'main' into colorsFonts
phonymacoroni Mar 17, 2021
20d9b39
Merge pull request #7 from PhilologyBot/colorsFonts
phonymacoroni Mar 17, 2021
131e870
grey outline
masun77 Mar 17, 2021
9edac48
switch fonts on and off
masun77 Mar 18, 2021
87ac5aa
buttons
masun77 Mar 18, 2021
03efc46
Merge remote-tracking branch 'origin/colorsFonts' into colorsFonts
masun77 Mar 18, 2021
5f51551
cleaned a bit
masun77 Mar 18, 2021
f37d4a1
change to index
masun77 Mar 18, 2021
b19ec9f
Create sentencevsagegraph.html
Mar 18, 2021
a01f831
center legend
masun77 Mar 18, 2021
0925b7d
Merge pull request #8 from PhilologyBot/colorsFonts
masun77 Mar 18, 2021
f495be9
Merge remote-tracking branch 'origin/main' into colorsFonts
masun77 Mar 18, 2021
22ec10b
idiot fixes
Mar 18, 2021
c36c2dd
adding sentence pasing function
alvaradoblancouribe Mar 18, 2021
e3b5ba3
Merge branch 'main' into sentenceagegraph
phonymacoroni Mar 18, 2021
8e770c6
center legend
masun77 Mar 18, 2021
459ebbc
Merge remote-tracking branch 'origin/sentenceagegraph' into colorsFonts
masun77 Mar 18, 2021
c213df2
Merge remote-tracking branch 'origin/sentenceagegraph' into colorsFonts
masun77 Mar 18, 2021
9846615
update readme with API
phonymacoroni Mar 18, 2021
a22faf3
xaxis
masun77 Mar 18, 2021
87858a6
axes
masun77 Mar 18, 2021
0c1d070
Merge pull request #9 from PhilologyBot/apiREADME
phonymacoroni Mar 18, 2021
8a87249
graph
masun77 Mar 18, 2021
ff6c255
add link README
phonymacoroni Mar 18, 2021
db9dfa3
graph
masun77 Mar 18, 2021
3c9c1b1
graph
masun77 Mar 18, 2021
4b249ad
graph
masun77 Mar 18, 2021
2704967
title & hover
masun77 Mar 18, 2021
441ab4e
Merge pull request #10 from PhilologyBot/colorsFonts
phonymacoroni Mar 18, 2021
3d5277c
add ref for base python code:
phonymacoroni Mar 18, 2021
a08335c
add title
phonymacoroni Mar 18, 2021
3488787
delete unused files
masun77 Mar 18, 2021
c4b8edf
readme
masun77 Mar 18, 2021
8de2403
video
masun77 Mar 18, 2021
bb5297b
process book
phonymacoroni Mar 18, 2021
53 changes: 53 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,53 @@
# This is a basic workflow to help you get started with Actions

name: Build lambda function

# Controls when the action will run.
on:
  # Triggers the workflow on push or pull request events but only for the main branch
  push:
    branches: [ main ]
    paths:
      - 'api/**'
  pull_request:
    branches: [ main ]
    paths:
      - 'api/**'

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2

      - uses: actions/setup-python@v2
        with:
          python-version: '3.8' # Version range or exact version of a Python version to use, using SemVer's version range syntax
          architecture: 'x64' # optional x64 or x86. Defaults to x64 if not specified

      # Runs a set of commands using the runners shell
      - name: Setup and install packages and create zip
        run: |
          python -m pip install --upgrade pip
          cd api
          pip3 install -r requirements.txt -t .
          zip -r ../package.zip .

      - name: Upload file to bucket
        uses: zdurham/s3-upload-github-action@master
        env:
          FILE: ./package.zip
          AWS_REGION: 'us-east-2'
          S3_BUCKET: philologybot
          S3_KEY: package.zip
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
14 changes: 14 additions & 0 deletions Pipfile
@@ -0,0 +1,14 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"
matplotlib = "*"
pandas = "*"

[dev-packages]

[requires]
python_version = "3.8"
258 changes: 258 additions & 0 deletions Pipfile.lock

Large diffs are not rendered by default.

Binary file added Process Book.pdf
Binary file not shown.
109 changes: 8 additions & 101 deletions README.md
@@ -1,115 +1,22 @@
Final Project - Interactive Data Visualization
===

The key learning experience of this course is the final project.
You will design a web site and interactive visualizations that answer questions you have or provide an exploratory interface to some topic of your own choosing.
You will acquire the data, design your visualizations, implement them, and critically evaluate the results.

The path to a good visualization is going to involve mistakes and wrong turns.
It is therefore important to recognize that mistakes are valuable in finding the path to a solution, to broadly explore the design space, and to iterate designs to improve possible solutions.
To help you explore the design space, we will hold events such as feedback sessions in which you propose your idea and initial designs and receive feedback from the class and staff.

Proposal
---

Submit project proposals using [this Google Form](https://docs.google.com/forms/d/e/1FAIpQLSc9DFlcClPArC1RKNFsXzfJfauZA57ksU85kT0hX2OEEDlxqw/viewform?usp=sf_link).
You may submit more than one proposal.
1-3 folks per team.

Final Project Materials
---
For your final project you must hand in the following items.

### Process Book

An important part of your project is your process book. Your process book details your steps in developing your solution, including the alternative designs you tried, and the insights you got. Develop your process book out of the project proposal. Equally important to your final results is how you got there! Your process book is the place you describe and document the space of possibilities you explored at each step of your project. It is not, however, a journal or lab notebook that describes every detail - you should think carefully about the important decisions you made and insights you gained and present your reasoning in a concise way.

We strongly advise you to include many figures in your process book, including photos of your sketches of potential designs, screen shots from different visualization tools you explored, inspirations of visualizations you found online, etc. Several images illustrating changes in your design or focus over time will be far more informative than text describing those changes. Instead, use text to describe the rationale behind the evolution of your project.

Your process book should include the following topics. Depending on your project type the amount of discussion you devote to each of them will vary:

- Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.
- Related Work: Anything that inspired you, such as a paper, a web site, visualizations we discussed in class, etc.
- Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?
- Data: Source, scraping method, cleanup, etc.
- Exploratory Data Analysis: What visualizations did you use to initially look at your data? What insights did you gain? How did these insights inform your design?
- Design Evolution: What are the different visualizations you considered? Justify the design decisions you made using the perceptual and design principles you learned in the course. Did you deviate from your proposal?
- Implementation: Describe the intent and functionality of the interactive visualizations you implemented. Provide clear and well-referenced images showing the key design and interaction elements.
- Evaluation: What did you learn about the data by using your visualizations? How did you answer your questions? How well does your visualization work, and how could you further improve it?

As this will be your only chance to describe your project in detail make sure that your process book is a standalone document that fully describes your results and the final design.
[Here](http://dataviscourse.net/2015/assets/process_books/bansal_cao_hou.pdf) are a [few examples](http://dataviscourse.net/2015/assets/process_books/walsh_trevino_bett.pdf) of process books from a similar course final.

### Project Website

You will create a public website for your project using GitHub pages or any other web hosting service of your choice.
The web site should contain your interactive visualization, summarize the main results of the project, and tell a story.
Consider your audience (the site should be public) and keep the level of discussion at the appropriate level.
Your process book and data should be linked from the web site as well.
Also embed your interactive visualization and your screen-cast in your website.
If you are not able to publish your work (e.g., due to confidential data) please let us know in your project proposal.

### Project Screen-Cast

Each team will create a two minute screen-cast with narration showing a demo of your visualization and/or some slides.
You can use any screencast tool of your choice, such as Camtasia.
Please make sure that the sound quality of your video is good - it may be worthwhile to invest in an external USB microphone.
Upload the video to an online video-platform such as YouTube or Vimeo and embed it into your project web page.
We will show some of the best videos in class.

We will strictly enforce the two minute time limit for the video, so please make sure you are not running longer.
Use principles of good storytelling and presentations to get your key points across. Focus the majority of your screencast on your main contributions rather than on technical details.
What do you feel is the best part of your project?
What insights did you gain?
What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.

Outside Libraries/References
---

For this project you *do not* have to write everything from scratch.

You may *reference* demo programs from books or the web, and *include* popular web libraries like Bootstrap, JQuery, Backbone, React, Meteor, etcetera.

Please *do not* use libraries on top of d3, however. Libraries like nvd3.js look tempting, but such libraries often have poor defaults and result in poor visualizations.
Instead, draw from the numerous existing d3 examples on the web.

If you use outside sources please provide a References section with links at the end of your Readme.

Resources
Description
---
The "[Data is Plural](https://tinyletter.com/data-is-plural/archive)" weekly letter often contains interesting datasets.

Think of something you're interested in, go find data on it! Include data processing as part of your work on this project.
[Link to live website](https://philologybot.github.io/final/)

Requirements
---
[Link to video](https://youtu.be/fh30fuPlaDo)

Store the following in your GitHub repository:
PhilologyBot lets you see when a word was most frequently used.

- Code - All web site files and libraries assuming they are not too big to include
- Data - Include all the data that you used in your project. If the data is too large for github store it on a cloud storage provider, such as Dropbox or Yousendit.
- Process Book- Your Process Book in PDF format.
- README - The README file must give an overview of what you are handing in: which parts are your code, which parts are libraries, and so on. The README must contain URLs to your project websites and screencast videos. The README must also explain any non-obvious features of your interface.

GitHub Details
API
---

- Fork the repo. You now have a copy associated with your username.
- Make changes to index.html to fulfill the project requirements.
- Make sure your "master" branch matches your "gh-pages" branch. See the GitHub Guides referenced above if you need help.
- Edit the README.md with a link to your gh-pages or other external site: for example http://YourUsernameGoesHere.github.io/DataVisFinal/index.html
- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.
To acquire the data, we created an AWS Lambda function, callable through an API, that gets data from the Google Ngram Viewer. To deploy the API, whenever `lambda_function.py` changes in the GitHub repository, GitHub Actions zips the `api/` folder and uploads the archive to an S3 bucket for use.
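
As a rough illustration of what a Lambda function behind such an API can look like, here is a minimal sketch. It is not the project's `lambda_function.py`: the `words` query parameter, the `fetch_ngram_counts` helper, and the placeholder zero values are all assumptions made for the example. Only the `lambda_handler(event, context)` entry point and the response shape (`statusCode`, `headers`, `body`) follow the usual AWS Lambda proxy convention.

```python
import json


def fetch_ngram_counts(words):
    """Hypothetical stand-in for the real ngram lookup.

    Returns 201 placeholder values per word, matching the 201 years
    of data mentioned in this README.
    """
    return {word: [0.0] * 201 for word in words}


def lambda_handler(event, context):
    # Assumed request shape: a comma-separated "words" query parameter.
    params = event.get("queryStringParameters") or {}
    words = [w for w in params.get("words", "").split(",") if w]

    body = fetch_ngram_counts(words)

    return {
        "statusCode": 200,
        "headers": {
            # Allow browser clients on other origins to read the response.
            "Access-Control-Allow-Origin": "*",
            "Content-Type": "application/json",
        },
        # The body must be a string, so the dictionary is serialized to JSON.
        "body": json.dumps(body),
    }
```

The handler can be exercised locally by passing a plain dictionary as the event, for example `lambda_handler({"queryStringParameters": {"words": "apple,banana"}}, None)`.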

Grading
---

- Process Book - Are you following a design process that is well documented in your process book?
- Solution - Is your visualization effective in answering your intended questions? Was it designed following visualization principles?
- Implementation - What is the quality of your implementation? Is it appropriately polished, robust, and reliable?
- Presentation - Are your web site and screencast clear, engaging, and effective?
Your individual project score will also be influenced by your peer evaluations.

References
---
The data is processed in the API, which returns a JSON object containing a 201×N dictionary, where N is the number of unique words passed to the API and 201 is the number of years of data available. The API can fetch data for only twelve words at a time, so we split each request into groups of 12 words and merge the results into the main data table of years and word usage. To collect the data we use a modified version of [this google ngram script](https://github.com/zslwyuan/google-ngrams) by zslwyuan.

- This final project is adapted from https://www.dataviscourse.net/2020/project/
Though this process is not as fast as we would like, we believe it is far superior to loading the raw dataset, which is multiple gigabytes in size.
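
The twelve-word batching described in this section can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `chunk`, `fetch_all`, and the `fetch_batch` callback are hypothetical names, and the callback stands in for whatever actually calls the API for one group of words.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 12  # the README states the API handles twelve words per call


def chunk(words: List[str], size: int = BATCH_SIZE) -> List[List[str]]:
    """Split the word list into consecutive groups of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def fetch_all(
    words: List[str],
    fetch_batch: Callable[[List[str]], Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Call `fetch_batch` once per group and merge the per-word results."""
    merged: Dict[str, List[float]] = {}
    for batch in chunk(words):
        # Each batch result maps word -> yearly values; merge into one table.
        merged.update(fetch_batch(batch))
    return merged
```

With 30 words this makes three calls (12, 12, and 6 words) and returns a single dictionary with 30 entries.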
20 changes: 20 additions & 0 deletions api/helper-functions.py
@@ -0,0 +1,20 @@
import pandas as pd

# input: string of sentences/words
# output: string of unique words separated by commas (with a trailing comma)
def unique_words(passage):
    result = ""
    a = pd.Series(passage)
    # split the passage on single spaces into a list of words
    passage = a.str.split(' ')
    history = []
    for word in passage.array[0]:
        if not (('!' in word) or (',' in word) or ('?' in word)):
            if word not in history:
                result = result + word + ","
                history.append(word)
        else:
            # drop the last character, assumed to be the punctuation mark
            if word[:-1] not in history:
                result = result + word[:-1] + ","
                history.append(word[:-1])
    return result
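
To make the helper's behavior concrete, here is a pandas-free restatement of the same logic with a small usage example. `unique_words_plain` is a hypothetical name introduced only for this illustration; it is not part of the diff.

```python
def unique_words_plain(passage):
    """Same logic as unique_words above, using only built-in string methods."""
    result = ""
    seen = []
    for word in passage.split(" "):
        # If the word contains '!', ',' or '?', drop its last character.
        if "!" in word or "," in word or "?" in word:
            word = word[:-1]
        if word not in seen:
            result += word + ","
            seen.append(word)
    return result


print(unique_words_plain("hello world, hello!"))  # hello,world,
```

Note that the returned string ends with a trailing comma, and that only `!`, `,` and `?` are stripped, so a word followed by a period is kept as written.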
