Final Project Reports (Due Dec 9)

Similar to proposals, but note additional sections:

Objective (research question)
Data that was used: how obtained, how processed, integrated, and validated
What models or algorithms were used
Results: A description of the results
Primary issues encountered during the project
Future work: ideas generated, improvements that would make sense, etc
Org chart: rough timeline and responsibilities for each member

Dec 3: Full (<7min presentation + questions)

Tetris Assistant: Sage Gray, Natalie Harris, Cam Witt, Jackson Burns
DataMiningtheIPL: Dasari Sai Deepika, Shivaji Moparthy, Yashraj Gaikwad
body-building-builder: Ryan Franqui, Anthony Roman, Skye Nidiffer
Countries in the News: Faith Chernowski, Harshvardhan, Colin Canonaco, Jeremiah Augustine
Heart Disease Detection: Mikayla McCormack, Chase Woodfill, Tully Fitzpatrick, Jayden Leuciuc, Devanshi Patel
SportsStats: Ghanshyam Patel, John Kutbay, Eric Zhao, Dorothy Wang, Malika Arifova
Flickpedia: Shashank Bandaru, Jason Choi, Ann McClure, Isha Bhandari, Nayana Patil, Pooja Masani, Justin Henley
cs_job_trends: Casey Stefanick, Brien Tolson, Dylan Delrosa, Brycen Hodges, Kush Patel
NFL Predictor: Alex Fowler, Joseph Taylor, Nolan Patton
Knox Crime Predictions: Noah Van Fleet and Connor Cotturone
Impacts of Remote Work on Physical and Mental Health : Shayana Shrestha
Impacts from Major Events on U.S Vessel Traffic: Peyton Moore and Andrew Mueller

Nov 26: Full (<8min presentation + questions)

Song Keys: Ryan Peruski, Michael Villareal, Maria Hernandez, Jonathan Tran
Recipe Ingredients Analysis: Diego Ferrer, Seungwoo An, Jackson Muncy
App Profile: Kien Nguyen, Nolan Coffey, Tyler Catuncan, William Duff
Generateed-Research: Jake Marlow, Thanya Nguyen, Sam Lavey, Jeff Chen
Sales Analysis: Farzin Gholamrezae
Car Compare: Gabe Lapham, Jonathan Clark, Faithful Odoi, Rob King
DDoS Detection: William Winslade, Logan Scott, Katie Moffit, Ethan Head, Md Saif Hassan Onim, Anna Weis
medical school inequity analysis: Brent Maples, Max Marcum, and Randy Lin

Nov 21 (<8min presentation + questions)

Fantasy Predictions: Fort Hunter, Vincent Broda, Austin Smith, Dillon Frankenstein
Tune Tweakers: Alex Warden, Margaret Kelley, Ethan Maness, Yaren Dogan
Price Alert: Griffin Lee, Braden Hechmer, Eli Dayney, Matthew Webb
Auto Price Analyst: Caleb Kornegay, Aaron King, Cody Allen, and Chase Walsh
twitter-sentiment-analysis: Venkat Gopu
StockBot: Blake Milstead and Turner Heath
Closing remarks on the course

Still missing/with errors MP4

https://huggingface.co/datasets/fdac24/MP4/blob/main/README.md

Still missing/with errors MP3

https://huggingface.co/datasets/fdac24/MP3/blob/main/README.md
Please use this notebook to check if your MP3-4 are correct

Class on Nov 19

Introducing MP5: Due Dec 3
Bugs in MP3/4
Vector databases
Logistic regression

Class on Nov 14

Introducing MP5
Bugs in MP3/4

Class on Nov 12

MP4 is now finalized
MP4 is due on Nov 19
If GitHub complains about the size of the MP3 output or about LFS charges, please put MP3/output/netid.json.gz and your Python file on the Hugging Face dataset MP3, then delete your fork
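The move to the Hugging Face dataset could be sketched roughly as follows. This is only an illustration, not the course's official procedure: the helper names `decompressed_size` and `upload_output`, the target `path_in_repo`, and the assumption that you are already authenticated with a write token are all hypothetical. Only `HfApi.upload_file` and the `fdac24/MP3` dataset name come from the `huggingface_hub` library and the link above.

```python
import gzip


def decompressed_size(path):
    """Return the number of bytes obtained by fully decompressing a .gz file.

    Reading the whole stream makes gzip raise an error if the file is
    corrupt or truncated, so this doubles as a basic integrity check.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while chunk := fh.read(1 << 20):  # read 1 MiB at a time
            total += len(chunk)
    return total


def upload_output(local_path, netid, repo_id="fdac24/MP3"):
    """Upload <netid>.json.gz to the dataset repo.

    Assumptions: the file belongs at the top level of the repo, and a
    Hugging Face token with write access is already configured.
    """
    # Imported lazily so the integrity check works without the package.
    from huggingface_hub import HfApi

    if decompressed_size(local_path) == 0:
        raise ValueError(f"{local_path} decompresses to nothing")
    return HfApi().upload_file(
        path_or_fileobj=str(local_path),
        path_in_repo=f"{netid}.json.gz",  # hypothetical location in the repo
        repo_id=repo_id,
        repo_type="dataset",
    )
```

Checking the archive locally before uploading avoids pushing a truncated file that would later show up under "Still missing/with errors".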

Class on Nov 7

MP4 will be described
If GitHub complains about the size of the MP3 output, please put the MP3/output/netid.json.gz file and your Python file on the Hugging Face dataset MP3

Class on Nov 5

Officially Election Day, with no classes
MP3, as a result, is due on Nov 6

Class on Nov 3

Work on final project
Work on MP3 (due on Nov 6)

Class on Oct 30

Finish status reports for the final projects (that have not reported on Oct 28)
MP3 - followup questions
Assignment weights in the overall score
- Prelim - 1
- MP1 - 8
- MP2 - 8
- MP3 - 12
- MP4 - 8
- MP5 - 8
- Class participation - 10
- Final Project - 45
  - Proposal - 15
  - Presentation - 10
  - Final Report - 20
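
As a rough illustration of how these weights combine, the sketch below computes an overall score from per-component results. The weights are taken from the list above; the function name `overall_score` and the convention of passing each result as a fraction between 0 and 1 are assumptions made for the example, not the instructor's grading script.

```python
# Weights copied from the list above; they sum to 100.
WEIGHTS = {
    "Prelim": 1,
    "MP1": 8,
    "MP2": 8,
    "MP3": 12,
    "MP4": 8,
    "MP5": 8,
    "Class participation": 10,
    "Proposal": 15,       # Final Project components (45 in total)
    "Presentation": 10,
    "Final Report": 20,
}


def overall_score(fractions):
    """Combine per-component results into a score out of 100.

    `fractions` maps a component name to the fraction of credit earned
    (0.0 to 1.0). Components that are missing count as zero.
    """
    return sum(weight * fractions.get(name, 0.0) for name, weight in WEIGHTS.items())


# Example: full credit everywhere except half credit on MP3.
example = {name: 1.0 for name in WEIGHTS}
example["MP3"] = 0.5
print(overall_score(example))
```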

Class on Oct 28

MP3 (scraping and parsing assignment)

Class on Oct 22, 24

Work on final projects

Class on Oct 17

Engineering day (no class)

Class on Oct 15

Discuss MP2

Class on Oct 10

Work on Final Project (no lecture)
MP2 is due

Class on Oct 8 - Fall Break

Class on Oct 3

Introduced MP2
Finish FP Proposal, see instructions

Class on Oct 1

Work on class project proposal

Class on Sep 26

Work on class project proposal

Class on Sep 24

Finish/Questions on project proposal

Class on Sep 19

Data discovery
Data storage
Cloud computing

Class on Sep 17

Presenting MP1 results by the representatives of each group for entire class
- the presentations will go in group order (the representative from the first group, the second group...)
- G1 - rking61
- G2 - jaugust4
- G3 - slavey
- G4 - Zephrius
- G5 - edayney
- G6 - bmaples6
- G7 - dpate139
- G8 - mherna21
- G9 - aroman01
- G10 -
- G11 - btolson1
- G12 -
- G13 - mmccor23
- G14 - tcatunca
- G15 - rfranqui
- G16 - bfitzpa8/tullyfitz
- G17 - jhenley9
start work on project proposal

Class on Sep 12

Presenting MP1 results in the assigned 17 groups (see below)
If you cannot attend, discuss options on Discord

Class on Sep 10

Still 3 MP1 forks missing!
Class (final) project boasters
Finalize class project assignments
Continue work on MP1

Class on Sep 5

Work on MP1, including discussing with your assigned peer
Make sure you have
1. Forked fdac23/Miniproject1
2. Posted the idea for your analysis on your peer's fork
3. Responded to the idea that was posted by your peer

Class on Sep 3

Question regarding MP1
Boasters for class project
World of Code dataset

17 Groups for MP1

harshvar mzg857 rking61 vgopu cwoodfil jleuciu1
aking100 amcclu13 ckornega dhodge12 jaugust4
san6 cvy221 snidiff1 sdasari7 slavey sshres25
awarden9 jburns46 ehechmer glee30 zyr546 amarlow6
edayney cstefani npatton4 ccotturo jchoi38 dmoffit1
lscott32 bmaples6 cwalsh25 mdv623 calle102 sgray38
pmasani tvillarr dpate139 nvanflee thatngu1 aweis3
mherna21 smoparth monim mwebb51 wwinslad
ygaikwad dferrer1 aroman vbroda bmilstea emaness
kchmayss ibhandar jtayl219 glapham mkelle37
ccanonac fgholamr pmoore34 btolson1 jmuncy2 cwitt8
pkx959 spatil12 gpatel8 jkutbay marifova
mmccor23 dfranke2 ddelrosa ezhao1 yarddoga
alecfowl ncoffey3 mmarcu10 tcatunca wduff
knguye34 rlin8 rfranqui yhg461 dwang58 fchernow
jnd547 bfitzpa8 lhunte21 jclar166 hchen73
jhenley9 ehead3 sbandar1 amuell11 jkenne60

Class on Aug 29

See the simple text analysis of your descriptions
Introducing the MiniProject1 process and template
Think about selecting the course project (see course projects for the last eight years at fdac2[0-3], fdac1[6-9], fdac for inspiration)
Boasters for class project (if you have an idea for the class project, please commit to fdac24/FinalProjectPitches)

Class on Aug 27

Work on fdac24/Practice0: due before class on Sep 7
- It involves
  - forking
  - completing notebook in your browser
  - creating pull request from your fork
- If you need a refresher on unix tools: edX on unix for data science
Critical Tools
Version Control
Magic of Internet

Class on Aug 22

Only 72 (of 94 registered) have created forks so far: see instructions for the previous class
Please accept your invitation to the fdac24 organization while logged in to GitHub with the handle you used to submit the pull request
If you have not done so yet, please accept the GitHub fdac24 invitation
Introductory lecture
Critical Tools
Version Control

Class on Aug 20

Create a HuggingFace account at https://huggingface.co/
- search for organizations by name (fdac24) in the search bar on the Hub; the name will appear under the “Organizations” section
- request to join fdac24 by clicking on the button shown in the screenshot below or
- here
Create your github account
- fork repo students
- create your utid.md file providing your name and interests and what you want to get out of the course (at least a full paragraph, see example): see fdac24/students/README.md, and
- include your hugging face id like this on a separate line: hfh: Audris
- include your github id like this on a separate line: gh: audrism
- [upload your public ssh key to your account on github](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account). Once done, please
- submit a pull request to fdac24/students
Make sure you do it a day before the next class so we are ready to start

Information for remote participation via Zoom / Discord

Recorded lectures
Join from a PC, Mac, iPad, iPhone or Android device: Please click this URL to start or join. https://tennessee.zoom.us/j/2766448345 Or, go to https://tennessee.zoom.us/join and enter class session/meeting ID: 276 644 8345
Join from dial-in phone line: (Note: these are NOT toll-free numbers); Dial: +1 646 558 8656 or +1 408 638 0968 Meeting ID: 276 644 8345; Participant ID: Shown after joining the meeting; International numbers available: https://tennessee.zoom.us/zoomconference?m=leg4C6yjhpfGHE-_Q9EYRNHXCUMBC-2T
Join the Discord server from this link

Syllabus for "Fundamentals of Digital Archeology"

Course: [COSC-445/COSC-545]
Zoom link above and in MK524
TTh 09:45-11:00
Instructors: Audris Mockus, [email protected] (office hours - upon request)
TAs: Oktay Ozturk [email protected]
- Syllabus
Need help?

Simple rules:

There are no stupid questions. However, it may be worth going over the following steps:
Think of what the right answer may be.
Search online: stack overflow, etc.
- code snippets: On GH gist.github.com or, if anyone contributes, for this class
- answers to questions: Stack Overflow
Look through issues
Post the question as an issue.
Ask instructor: email for 1-on-1 help, or to set up a time to meet

Objectives

The course will combine theoretical underpinning of big data with intense practice. In particular, approaches to ethical concerns, reproducibility of the results, absence of context, missing data, and incorrect data will be both discussed and practiced by writing programs to discover the data in the cloud, to retrieve it by scraping the deep web, and by structuring, storing, and sampling it in a way suitable for subsequent decision making. At the end of the course students will be able to discover, collect, and clean digital traces, to use such traces to construct meaningful measures, and to create tools that help with decision making.

Expected Outcomes

Upon completion, students will be able to discover, gather, and analyze digital traces, will learn how to avoid mistakes common in the analysis of low-quality data, and will have produced a working analytics application.

In particular, in addition to practicing critical thinking, students will acquire the following skills:

Use Python and other tools to discover, retrieve, and process data.
Use data management techniques to store data locally and in the cloud.
Use data analysis methods to explore data and to make predictions.

Course Description

A great volume of complex data is generated as a result of human activities, including both work and play. To exploit that data for decision making it is necessary to create software that discovers, collects, and integrates the data.

Digital archeology relies on traces that are left over in the course of ordinary activities, for example the logs generated by sensors in mobile phones, the commits in version control systems, or the email sent and the documents edited by a knowledge worker. Understanding such traces is complicated in contrast to data collected using traditional measurement approaches.

Traditional approaches rely on a highly controlled and well-designed measurement system. In meteorology, for example, the temperature is taken in specially designed and carefully selected locations to avoid direct sunlight and to be at a fixed distance from the ground. Such measurement can then be trusted to represent these controlled conditions and the analysis of such data is, consequently, fairly straightforward.

The measurements from geolocation or other sensors in mobile phones are affected by numerous (yet not recorded) factors: was the phone kept in the pocket, was it indoors or outside? The devices are not calibrated or may not work properly, so the corresponding measurements would be inaccurate. Locations (without mobile phones) may not have any measurement, yet may be of the greatest interest. This lack of context and inaccurate or missing data necessitates fundamentally new approaches that rely on patterns of behavior to correct the data, to fill in missing observations, and to elucidate unrecorded context factors. These steps are needed to obtain meaningful results from a subsequent analysis.

The course will cover basic principles and effective practices to increase the integrity of the results obtained from voluminous but highly unreliable sources.

Ethics: legal aspects, privacy, confidentiality, governance
Reproducibility: version control, ipython notebook
Fundamentals of big data analysis: extreme distributions, transformations, quantiles, sampling strategies, and logistic regression
The nature of digital traces: lack of context, missing values, and incorrect data
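
To make the "extreme distributions, transformations, quantiles" item concrete, here is a minimal sketch using only the Python standard library. The sample data (something like commits per author, with one extreme contributor) and the function name `summarize` are invented for illustration. The point is only that for heavy-tailed data the mean is dominated by the largest value, while quantiles and a log transformation give a more representative summary.

```python
import math
import statistics


def summarize(values):
    """Contrast mean-based and quantile/log-based summaries.

    Assumes all values are positive so that the logarithm is defined.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles
    logs = [math.log10(v) for v in values]           # log transformation
    return {
        "mean": statistics.mean(values),
        "median": q2,
        "iqr": q3 - q1,
        # Mean on the log scale, transformed back to the original scale.
        "geometric_mean": 10 ** statistics.mean(logs),
    }


# Invented heavy-tailed sample: one value is far larger than the rest.
counts = [1, 1, 2, 3, 5, 8, 1000]
for name, value in summarize(counts).items():
    print(f"{name}: {value:.2f}")
```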

Prerequisites

Students are expected to have basic programming skills: in particular, to be able to use regular expressions, programming concepts such as variables, functions, and loops, and data structures like lists and dictionaries (for example, COSC 365).

Being familiar with version control systems (e.g., COSC 340), Python (e.g., COSC 370), introductory-level probability (e.g., ECE 313), and statistics (such as random variables, distributions, and regression) would be beneficial but is not expected. Everyone is expected, however, to be willing and highly motivated to catch up in the areas where they have gaps in the relevant skills.

All the assignments and projects for this class will use github and Python. Knowledge of Python is not a prerequisite for this course, provided you are comfortable learning on your own as needed. While we have strived to make the programming component of this course straightforward, we will not devote much time to teaching programming, Python syntax, or any of the libraries and APIs. You should feel comfortable with:

How to look up Python syntax on Google and StackOverflow.
Basic programming concepts like functions, loops, arrays, dictionaries, strings, and if statements.
How to learn new libraries by reading documentation and reusing examples
Asking questions on StackOverflow or as a GitHub issue.

Requirements

These apply to real life, as well.

Must apply "good programming style" learned in class
- Optimize for readability
Bonus points for:
- Creativity (as long as requirements are fulfilled)

Teaming Tips

Agree on an editor and environment that you're comfortable with
The person who's less experienced/comfortable should have more keyboard time
Switch who's "driving" regularly
Make sure to save the code and send it to others on the team

Evaluation

Class Participation – 15%: students are expected to read all material covered in a week and come to class prepared to take part in the classroom discussions (online). Asking and responding to other student questions (issues) counts as a key factor for classroom participation. With online format and collaborative nature of the projects, this should not be hard to accomplish.
Assignments - 40%: Each assignment will involve writing (or modifying a template of) a small Python program.
Project - 45%: one original project done alone or in a group of 2 or 3 students. The project will explore one or more of the themes covered in the course that students find particularly compelling. The group needs to submit a project proposal (2 pages IEEE format) approximately 1.5 months before the end of term. The proposal should provide a brief motivation of the project, detailed discussion of the data that will be obtained or used in the project, along with a time-line of milestones, and expected outcome.
Scale

letter	percent
a	95
a-	93
b+	90
b	88
b-	85
c+	83
c	79
c-	75
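
The scale can be read as a lookup from a percentage to a letter, sketched below. This assumes that each percent in the table is the minimum score for that letter, which the table does not state explicitly. Scores below 75 have no letter listed, so the hypothetical `letter_grade` function returns `None` for them rather than guessing.

```python
# (letter, minimum percent) pairs copied from the table, highest first.
SCALE = [
    ("a", 95),
    ("a-", 93),
    ("b+", 90),
    ("b", 88),
    ("b-", 85),
    ("c+", 83),
    ("c", 79),
    ("c-", 75),
]


def letter_grade(percent):
    """Return the first letter whose cutoff is met, or None below the table."""
    for letter, cutoff in SCALE:
        if percent >= cutoff:
            return letter
    return None  # the table lists no letter below 75


print(letter_grade(91.5))
```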

Other considerations

As a programmer you will never write anything from scratch, but will reuse code, frameworks, or ideas. You are encouraged to learn from the work of your peers. However, if you don't try to do it yourself, you will not learn. Deliberate practice (activities designed for the sole purpose of effectively improving specific aspects of an individual's performance) is the only way to reach perfection.

Please respect the terms of use and/or license of any code you find, and if you re-implement or duplicate an algorithm or code from elsewhere, credit the original source with an inline comment.

Resources

Materials

This class assumes you are confident with this material, but in case you need a brush-up...

Python for beginners and Python Dictionaries

Other

Mining the Social Web, 2nd Edition

Databases

A MongoDB Schema Analyzer. One JavaScript file that you run with the mongo shell command on a database collection and it attempts to come up with a generalized schema of the datastore. It was also written about on the official MongoDB blog.

R and data analysis

Modern Applied Statistics with S (4th Edition) by William N. Venables, Brian D. Ripley. ISBN 0387954570
R
Code School
Quick-R

Tutorials written as ipython-notebooks

GitHub

Git and GitHub
GitHub Pages
- Official site
- Thinkful guide

Final Project Report outline

Similar to proposals, but note additional sections:

Objective (research question)
Data that was used: how obtained, how processed, integrated, and validated
What models or algorithms were used
Results: A description of the results
Primary issues encountered during the project
Future work: ideas generated, improvements that would make sense, etc
Org chart: rough timeline and responsibilities for each member


