Bird Strikes Visualization (CSC302 Project)

Successfully completed Assignment 2.
Please see the related comments in a separate Assignment 2 Discussion.
Please see Assignment 1 Postmortem here

Quick Links

Project Overview

Note: To read about how we arrived at this design, please see the sections below

Our task was to find meaningful dataset that we could visualize and draw interesting conclusions from. We decided to use a governmental dataset called FAA Wildlife Strike Database. It contains records of reported wildlife strikes since 1990.

We will be developing a web application that displays important and interesting statistics from the above-mentioned dataset. Among the questions that we are going to answer, we include but are not limited to the following:

Where are bird strikes most frequent?
How fatal are bird strikes accidents?
Safety of different types of aircrafts?
When (trends over time)?

Team

Our team consists of three members, 4th year UofT students.

Jacky Yang. Experience in UX design and full stack development with especial proficiency in React and Python.
Henning Lindig. Experience in full-stack web development and software infrastructure
Dmytro Lopushanskyy. Significant experience in Python, Data Engineering, Databases ( gres, Cassandra)

Tech Stack

Web application: React TypeScript NodeJS
Visualizations: Chart JS
Database: Postgres
AWS: Amplify RDS
Containerization: Docker

Running Application

To setup: cd ./setup sh ./setup-script.sh Add the given db connection string to a file ./backend/.env as per ./backend/.env-example

To spin up development application, in project's root run docker compose -f docker-compose.dev.yml up.

To rebuild the docker images , run docker compose -f docker-compose.dev.yml build.

To tear down, run docker compose -f docker-compose.dev.yml down --volumes.

To run tests, run docker compose -f docker-compose.dev.yml run client npm test.

Roadmap

Our roadmap currently consists of three milestones.

Milestone 1: (Done in Assignment 1)

Bootstrapped app
Functionality to dockerize and easily spin up app
Basic tests
Documentation and processes in place

Milestone 2: (Done in Assignment 2)

Data visualizations for our Top-3 questions
Deployed app
CI & CD implementation
Hosting the database and integrating it with backend

Milestone 3: (Done in Assignment 3)

Add more visulizations (at least 3 more) to show characteristics of the data set
Frontend to include relevant Twitter feed (missed feature, see Assignment 3 discussion section)
Improve CI&CD: Add code linting and style checks
Segragate the database into dev and prod instances. Allow the local db instantiation. (missed feature, see Assignment 3 discussion section)

At every stage we expect to add tests that fail on the previous stage but also succeed on the new code version with the latest functionality

Team Member's Responsibilities

Technical responsibilities:

Database, dealing with data, querying, database deployment - Dmytro
FrontEnd (Charts, visualizations) - Jacky
App deployment, Backend, Docker - Henning

Everyone is responsible for writing tests for their own features, and will be called out in PRs.

Management responsibilities:

Plan and develop the project idea
Lead the team discussion, talk to professor
Monitor project progress and set deadlines, take notes

We expect to rotate the above management roles every two weeks. Our meeting agendas will contain role assignments.

Detailed Process Documentation

Work Organization

Tasks are organized on GitHub Projects board. We have three columns: To Do, In Progress, and Done, which helps us organize the work in a Kanban way.

Meeting Structure

One large weekly meeting on Wednesday. One additional small meeting to update on progress, usually during weekends.

Application Design

For this project, we have been considering three approaches:

Data + Tableau / Datasette
Data + FrontEnd
Data + FrontEnd + BackEnd

All of these can be viable solutions, although they represent different levels of complexity.

We have discarded the first option as that one does not satisfy our assignment requirements. We are going to need to implement CI/CD, testing, etc, and it almost does not make sense to do it when you are working with ready software solutions such as Tableau or Datasette. The implementation does not require any significant coding so it did not fit our expectations as well.

Data + FrontEnd was the sweet spot that we have initially selected. Unfortunately, it did not go as planned. We quickly understood the need to query the database and implement some complex logic that we did not want to put into our FrontEnd. Moreover, application security is required. In the future phases we are going to connect to the db server, and it is not possible to store credentials reliably in a FrontEnd application.

Thus, we are going to use database + FrontEnd + BackEnd. Most of the web solutions nowadays follow this principle, and we are just going to follow the common practices.

Technology Stack Solutions

Web Application

Here are the solutions we considered for FrontEnd:

Vanilla JavaScript or JQuery, HTML/CSS. Cons: too complex to implement, no one uses this stack today to develop modern apps.
Angular. Cons: Is more suited for more complex applications, no one in the team has experience with it.
Vue vs React. This one is debatable and has not clear winner from the technical side for our app. Since all team members worked with React before, it became the winner.

We also preferred using Typescipt over JavaScript because of its language features, reference validation, project scalability, and code maintainability.

Here are the solutions we considered for BackEnd:

Python (libs as Flask, Django, FastAPI) Cons: Python is single-flow, and requests are processed quite slowly
Java vs NodeJS. Java dominates enterprise computing applications and offers top performance and security but it is more difficult to use. Team members also lack experience with it, and we did not want to focus on digging into learning Java.
NodeJS with TypeScript. Everyone has experience with it, it offers quick and easy development, great features. And there is also a huge precedent in industry for React + NodeJS apps leading to greater support.

Visualizations

For visualizations, we considered software solutions such as Datasette, Tableau and library as ChartJS.

Since we decided / were required to to write our own software and not rely on other platforms that provide too much out of the box, Datasette and Tableau did not satisfy this requirement. However, writing custom visualization in the browser from scratch would be very time consuming and overkill given the scope of our application. Thus, ChartJS was a perfect compromise to use for the FrontEnd app to reuse a nice chart builder functionality, while still maintaining a good amount of control ourselves. Since the implementation for the visualizations is set for a future milestone, we are aware that we that we may have to reassess this decision.

Databases

No DB

We have thought about bundling the CSV file with the app directly. This is not a viable solution since the app would become too large, especially given the fact that we have a 200MB+ dataset.

SQLite

Since we are building our website with scale in mind, we pushed back on using SQLite. A small excerpt from SQLite has a nice guide that we used to make a decisiom:

Is the data separated from the application by a network? → choose client/server.
Many concurrent writers? → choose client/server.
Big data? → choose client/server.
Otherwise → choose SQLite!

Our app is going to be designed to be highly scalable so SQLite is not an option.

NoSQL

NoSQL databases are great for unstructured data and Big Data workloads that require horizontal scaling, but these benefits are not very relevant for our app. We also do not forsee these items becoming issues in the future.

Relational Server DBs

Since our data is already well-structured and tabluar, a SQL-based db is a good candiadate. Combined with the ACID properties of SQL dbs, the better querying against complex and structured data, and the team members' familiarity with SQL, we decided that going this route is a great choice.

There are a number of relation DBs out there (Postgres, MySQL, Oracle, RDS, etc). For our app, there are no difference which one to use from the technical side. Since the team is most familiar with postgresql, we decide to use Postgres.

Cloud

In the future, we are going to host our apps. Both AWS and Azure and GCP offer all the resources we might need so there is no clear winner there. Since we have a free student tier on AWS, this is the cloud we are going to use.

Containers

Docker vs Vagrant: Vagrant allows you to isolate all the necessary resources completely. However, compared to Docker, it requires more resources initially. Compared to Vagrant, Docker wins on this criterion because it spends fewer resources, and you can create Docker images faster than Vagrant virtual machine

This is why we chose Docker.

Dataset Selection

During the initial meeting we decided to work with aviation datasets, since all of us had interest in it. We used the following criteria for dataset evaluation:

Contains enough data
Not overused
Correct and representative
On the topic of aviation

Below is the list of datasets we found and used the above criteria against:

https://www.kaggle.com/datasets/thedevastator/airplane-crashes-and-fatalities https://www.kaggle.com/datasets/saurograndi/airplane-crashes-since-1908 https://www.kaggle.com/code/ruslankl/airplane-crashes-data-visualization https://github.com/quankiquanki/skytrax-reviews-dataset
https://wildlife.faa.gov/home

Out of all of these, some are wildly popular, thus not very interesting to explore.
Some do not contain enough data to draw any interesting conclusions.

The only dataset that all of us liked was the last one, which we decided to use.

Assignment 2 Discussion

Next Features and Steps

Priority #1 features:

Create new visualizations of our data:
- More single number stats: total repair costs, # of dead birds, etc
- Plot of the passenger distribution on the planes
- Most vulnerable flight phase to be impacted by bird strikes
- Most affected bird species by the incidents

Priority #2 features:

Frontend to include relevant Twitter feed -- Jacky
Improve CI&CD: Add code linting and style checks -- Henning
Segragate the database into dev and prod env -- Dmytro

How those features have been prioritized?

To determine the priorities, we went back to our original goal, which is to showcase interesting features of our dataset.

Therefore, the most important feature for us is to improve and add new visualization of our data to the website.

The next features have been determined based on our votes and opinions of what could be most useful. We all believed that having a live Twitter feed on the website reporting plane crashes would be interesting to the users and serve our main purpose.

Additionally, we wanted to improve our development, that is why we decided to improve CI&CD and segregate db environment. This is meant to make our development stronger to enable more flexible, robust, and fast delivery.

What needs to be done to deliver each of these features?

To implement new visualizations, the work needs to be done on all three layers of our app: FrontEnd, BackEnd, and Database. Below is a break-down of tasks:

Work on FrontEnd code to create new charts and display the data responsively -- Jacky, due December 1.
Develop Backend API endpoints to query the data from -- Henning, due November 25.
Create database aggregations, tables, and queries to extract the stats from raw data -- Dmytro, due November 20.

Acceptance Criteria

The FrontEnd displays all 6 visualizations for the data and is viewable from all devices. Moreover, it needs to pass all tests, which cover at least 60% of the code.
The Backend has endpoints for all visualization, responds to requests, and passes all tests
The database has two environments, and contains enough data to satisfy the visualization needs

Artifacts of the decision-making process are our weeky meeting notes. Please see Meeting Notes for more details.

Demonstrated progress towards one or more of your next milestones

Here are our previous next steps, determined after Assignment 1 completion:

Database server setup, loading the data into it, query preparation - Dmytro
Chart creation in the React app, communicating with Backend to get the data - Jacky
Backend implementation to talk to the database and provide endpoints for Frontend, app deployment - Henning

How do you know that you have achieved this milestone?
All of above tasks are successfully completed based on our acceptance criteria.

Our database is now hosted on AWS RDS and is accessible and available. Queries successfully return the data
The database contains all data that we rely on as confirmed by several tests
Charts are now developed and deployed at http://bird-strike-frontend-bucket.s3-website.us-east-2.amazonaws.com/
The Frontend is responsive and passes tests
The backend talks to the database and exposes all required endpoints accessible at http://3.16.78.249:3001/

Our success in this milestone is determined by completing all steps that we planned, which can be clearly verified. Our app is now deployed and is already useful to explore the dataset.

Team Member's Responsibilities for reaching the milestone, with a status (complete, in progress, not started)

Dmytro: focused on data-related work (db setup, deployment, data transformations, aggregations, etc)
Jacky: focused on FrontEnd tasks, which included charts development and tests
Henning: worked on BackEnd, creating endpoints and did CI/CD setup

The status of the tasks can also be tracked on our GitHub Projects dashboard

Assignment 3 Discussion

What features were not delivered?

Did not implement Twitter feed in the app, due to priorities change. We decided that improving our UI is a more important next step than adding new features.
Did not segragate the database into dev and prod instances. We realized that having local database and refilling it with data every time is too complicated. Moreover, our database is read-only, so there was no point in having multiple environments for it.

What are the results?

We have successfully deployed our application, significantly improved the UI (did a complete redesign)

Validation, Verification, and Acceptance criteria

FrontEnd displays all six visualizations for the data and is responsive
Backend has endpoints for all visualization, responds to requests
Database is deployed, available and accessible with SSL enabled
Code needs to pass all unit tests, which cover at least 60% of the code.
PRs must include any manual test criterias.

Our tests directly verify how well the code conforms to the criteria.

Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
.aws		.aws
.github/workflows		.github/workflows
.vscode		.vscode
backend		backend
db		db
docs		docs
frontend		frontend
setup		setup
.gitignore		.gitignore
README.md		README.md
docker-compose.dev.yml		docker-compose.dev.yml
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bird Strikes Visualization (CSC302 Project)

Quick Links

Project Overview

Team

Tech Stack

Running Application

Roadmap

Team Member's Responsibilities

Detailed Process Documentation

Work Organization

Meeting Structure

Application Design

Technology Stack Solutions

Web Application

Visualizations

Databases

No DB

SQLite

NoSQL

Relational Server DBs

Cloud

Containers

Dataset Selection

Assignment 2 Discussion

Next Features and Steps

Demonstrated progress towards one or more of your next milestones

Team Member's Responsibilities for reaching the milestone, with a status (complete, in progress, not started)

Assignment 3 Discussion

What features were not delivered?

What are the results?

Validation, Verification, and Acceptance criteria

About

Uh oh!

Releases

Packages

Languages

Hennmeister/bird-strike-data-visualization

Folders and files

Latest commit

History

Repository files navigation

Bird Strikes Visualization (CSC302 Project)

Quick Links

Project Overview

Team

Tech Stack

Running Application

Roadmap

Team Member's Responsibilities

Detailed Process Documentation

Work Organization

Meeting Structure

Application Design

Technology Stack Solutions

Web Application

Visualizations

Databases

No DB

SQLite

NoSQL

Relational Server DBs

Cloud

Containers

Dataset Selection

Assignment 2 Discussion

Next Features and Steps

Demonstrated progress towards one or more of your next milestones

Team Member's Responsibilities for reaching the milestone, with a status (complete, in progress, not started)

Assignment 3 Discussion

What features were not delivered?

What are the results?

Validation, Verification, and Acceptance criteria

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages