52 commits
3874fe7
Add files via upload
nshedd Mar 7, 2021
3de5da8
Update CerebellarHem_embeddings.txt
nshedd Mar 8, 2021
858525a
Update FrontalCortex_embeddings.txt
nshedd Mar 8, 2021
d32010b
Reupload data
nshedd Mar 8, 2021
0681fc9
photos
nshedd Mar 8, 2021
fda99a5
Add .html with scatterplots
nshedd Mar 8, 2021
5801658
combined brain regions and embeddings/expressions into one csv
jalovering Mar 8, 2021
b96172c
Update combined.csv
nshedd Mar 8, 2021
90d1010
Update combined.csv
nshedd Mar 8, 2021
ab598a0
add first header
nshedd Mar 8, 2021
0a2bf41
removed index from data/combined.csv
jalovering Mar 8, 2021
498d11f
Merge branch 'main' of https://github.com/nshedd/final
jalovering Mar 8, 2021
448dd23
Fix expression plot colors
nshedd Mar 8, 2021
f175d8b
Merge branch 'main' of https://github.com/nshedd/final into main
nshedd Mar 8, 2021
528ada1
Add buttons and update plots
vygrasso Mar 10, 2021
3df642d
functionalized plotter for use with drop-down menus
jalovering Mar 10, 2021
f9abcae
added title header, moved dropdowns to header, changed functionality …
jalovering Mar 11, 2021
654633e
commented out resetting right plot circles upon selecting new gene
jalovering Mar 11, 2021
7140b28
charts linked upon hover over right-hand plot. gene selection updates…
jalovering Mar 11, 2021
46a1521
removed console.log
jalovering Mar 11, 2021
57cc532
increased r to 5 for hover over circle
jalovering Mar 11, 2021
847a0ee
Add cluster highlight
nshedd Mar 12, 2021
44b16d4
Fix left to right hover
nshedd Mar 13, 2021
49fa809
Right hand cluster opacity
nshedd Mar 13, 2021
e2dee24
change grey cluster color
nshedd Mar 13, 2021
b92b4ee
Create celltypedictionary.txt
nshedd Mar 13, 2021
a7818c8
Added legend with new SVG
vygrasso Mar 13, 2021
ebf6143
Merge branch 'main' into legend
vygrasso Mar 13, 2021
d8ffad8
increased opacity change for left to right interactivity
jalovering Mar 14, 2021
0a25c3c
Merge branch 'main' into legend
vygrasso Mar 15, 2021
a151ec0
merged csv and txt
jalovering Mar 15, 2021
2877360
Tooltip shows up but text is not correct
vygrasso Mar 16, 2021
2557d27
Merge branch 'main' into tooltip
vygrasso Mar 16, 2021
73defb2
fixed tooltip data call
jalovering Mar 16, 2021
9986f95
added cluster number to tooltip
jalovering Mar 16, 2021
0e75efa
fixed data null
jalovering Mar 16, 2021
080036d
moved legend to header
jalovering Mar 17, 2021
52ad2e7
Add files via upload
nshedd Mar 17, 2021
d68db47
added index page
jalovering Mar 17, 2021
142c9b1
Merge branch 'main' of https://github.com/nshedd/final
jalovering Mar 17, 2021
8bf8aca
added progress book
jalovering Mar 17, 2021
a281a9f
removed commented-out code
jalovering Mar 17, 2021
11aec2a
Update README.md
vygrasso Mar 17, 2021
4fe0b6e
added information to index.html
jalovering Mar 17, 2021
970d54f
Update README.md
vygrasso Mar 17, 2021
f140eda
Merge branch 'main' of https://github.com/nshedd/final
jalovering Mar 17, 2021
077020d
Update README.md
vygrasso Mar 17, 2021
6c83eae
Update README.md
vygrasso Mar 17, 2021
befc2b9
readme linebreaks after links
jalovering Mar 17, 2021
2641733
Merge branch 'main' of https://github.com/nshedd/final
jalovering Mar 17, 2021
eeceb64
Updated progress book
vygrasso Mar 17, 2021
788b312
Update README.md
vygrasso Mar 17, 2021
Binary file added .DS_Store
Binary file not shown.
158 changes: 158 additions & 0 deletions .ipynb_checkpoints/preprocessing-checkpoint.ipynb
@@ -0,0 +1,158 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Combine region + expression/embedding CSVs"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"ch_embeddings = pd.read_csv('data/CerebellarHem_embeddings.txt')\n",
"ch_expression = pd.read_csv('data/CerebellarHem_expression.txt')\n",
"fc_embeddings = pd.read_csv('data/FrontalCortex_embeddings.txt')\n",
"fc_expression = pd.read_csv('data/FrontalCortex_expression.txt')\n",
"vc_embeddings = pd.read_csv('data/VisualCortex_embeddings.txt')\n",
"vc_expression = pd.read_csv('data/VisualCortex_expression.txt')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"ch = pd.merge(ch_embeddings, ch_expression, on=\"barcode\")\n",
"fc = pd.merge(fc_embeddings, fc_expression, on=\"barcode\")\n",
"vc = pd.merge(vc_embeddings, vc_expression, on=\"barcode\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"ch['brain_region'] = 'CerebellarHem'\n",
"fc['brain_region'] = 'FrontalCortex'\n",
"vc['brain_region'] = 'VisualCortex'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"data = pd.concat([ch,fc,vc])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Add corresponding cell names and marker genes"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\lover\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.\n",
" \"\"\"Entry point for launching an IPython kernel.\n"
]
}
],
"source": [
"df = pd.read_csv('data/celltypedictionary.txt', sep = ',\\'')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"df = df.rename(columns={\"'Clusternum'\": \"celltype\", \n",
" \"brainregion'\": \"brain_region\", \n",
" \"cellname'\":'cell_name',\n",
" \"markergenes'\":'marker_genes'})"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"for i, r in df.iterrows():\n",
"    df.loc[i, 'brain_region'] = r['brain_region'].strip('\\'')\n",
"    df.loc[i, 'cell_name'] = r['cell_name'].strip(',').strip('\\'')\n",
"    if pd.notna(r['marker_genes']):\n",
"        df.at[i, 'marker_genes'] = r['marker_genes'].strip('\\'').split(',')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"result = pd.merge(data, df, how=\"outer\", on=[\"brain_region\", \"celltype\"])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"result.to_csv(\"data/combined.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
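The notebook above reads `celltypedictionary.txt` with a regex separator and then strips stray quotes row by row. A possible alternative, sketched below, lets pandas treat the single quote as the quote character so the fields arrive clean. The sample text and its gene names are hypothetical: the layout is only inferred from the column names and `strip` calls in the notebook, not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical sample, assuming the layout implied by the notebook's cleanup:
# single-quoted text fields, with marker genes comma-separated inside one field.
sample = io.StringIO(
    "'Clusternum','brainregion','cellname','markergenes'\n"
    "0,'CerebellarHem','Granule','GENE_A,GENE_B'\n"
    "1,'FrontalCortex','Ex1',\n"
)

# quotechar="'" makes the parser drop the quotes and keep commas inside a
# quoted field, so no regex separator or per-row strip is needed.
cell_types = pd.read_csv(sample, quotechar="'").rename(
    columns={
        "Clusternum": "celltype",
        "brainregion": "brain_region",
        "cellname": "cell_name",
        "markergenes": "marker_genes",
    }
)

# Split the marker-gene string into a list; missing values stay NaN.
cell_types["marker_genes"] = cell_types["marker_genes"].str.split(",")

print(cell_types)
```

If the real file deviates from this assumed layout (for example, unbalanced quotes), the regex-separator approach in the notebook may still be the safer choice.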
125 changes: 22 additions & 103 deletions README.md
@@ -1,115 +1,34 @@
Final Project - Interactive Data Visualization
Final Project - Visualizing Gene Expression in the Brain
===
Victoria Grasso, Josh Lovering, Nicole Shedd
-

The key learning experience of this course is the final project.
You will design a web site and interactive visualizations that answer questions you have or provide an exploratory interface to some topic of your own choosing.
You will acquire the data, design your visualizations, implement them, and critically evaluate the results.
Link to project website: https://nshedd.github.io/final/index

The path to a good visualization is going to involve mistakes and wrong turns.
It is therefore important to recognize that mistakes are valuable in finding the path to a solution, to broadly explore the design space, and to iterate designs to improve possible solutions.
To help you explore the design space, we will hold events such as feedback sessions in which you propose your idea and initial designs and receive feedback from the class and staff.
Direct link to screencast video on YouTube: https://www.youtube.com/watch?v=_Dpsas_n7SE

Proposal
---
All the necessary files for this project are in this repository:
- index.html and final.html are the code files: index.html embeds the screencast video and links to final.html, which contains the visualization
- The data is in the combined.csv file in the /data folder
- The process book is saved as final_progress_book.pdf
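
As a rough illustration of how a file like combined.csv can be assembled and checked, the sketch below merges per-cell rows with a cell-type dictionary on `brain_region` and `celltype`, the key columns used in the preprocessing notebook in this pull request. The barcodes, cell names, and the choice of a left join with `validate` and `indicator` are assumptions made for this example; the notebook itself uses an outer join.

```python
import io

import pandas as pd

# Stand-in rows (hypothetical); the real data also carries embedding and
# expression columns for every cell.
cells = pd.DataFrame(
    {
        "barcode": ["AAAC", "AAAG", "AAAT"],
        "celltype": [0, 1, 0],
        "brain_region": ["CerebellarHem", "FrontalCortex", "VisualCortex"],
    }
)
cell_types = pd.DataFrame(
    {
        "celltype": [0, 1],
        "brain_region": ["CerebellarHem", "FrontalCortex"],
        "cell_name": ["Granule", "Ex1"],
    }
)

# validate="many_to_one" raises if the dictionary has duplicate keys;
# indicator=True adds a _merge column that flags cells without a match.
merged = pd.merge(
    cells,
    cell_types,
    how="left",
    on=["brain_region", "celltype"],
    validate="many_to_one",
    indicator=True,
)
unmatched = merged[merged["_merge"] == "left_only"]

# index=False keeps the pandas row index out of the exported CSV.
buffer = io.StringIO()
merged.drop(columns="_merge").to_csv(buffer, index=False)
header = buffer.getvalue().splitlines()[0]

print(header)
print(unmatched["barcode"].tolist())
```

Checking `unmatched` before export is one way to catch clusters that are missing from the dictionary, which would otherwise show up as empty tooltips in the visualization.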

Submit project proposals using [this Google Form](https://docs.google.com/forms/d/e/1FAIpQLSc9DFlcClPArC1RKNFsXzfJfauZA57ksU85kT0hX2OEEDlxqw/viewform?usp=sf_link).
You may submit more than one proposal.
1-3 folks per team.
The goal of this project is to develop a gene expression analysis tool for comparing gene expression across brain regions. The tool is intended for bioinformaticians working in transcriptomics who analyze gene expression in the frontal cortex, visual cortex, and cerebellum. It was built from a healthy dataset containing 34,899 cells, which serves as a control sample. We used D3.js to create interactive scatter plots with multiple views for each brain region and gene.

Final Project Materials
Technical Achievements
---
For your final project you must hand in the following items.

### Process Book

An important part of your project is your process book. Your process book details your steps in developing your solution, including the alternative designs you tried, and the insights you got. Develop your process book out of the project proposal. Equally important to your final results is how you got there! Your process book is the place you describe and document the space of possibilities you explored at each step of your project. It is not, however, a journal or lab notebook that describes every detail - you should think carefully about the important decisions you made and insights you gained and present your reasoning in a concise way.

We strongly advise you to include many figures in your process book, including photos of your sketches of potential designs, screen shots from different visualization tools you explored, inspirations of visualizations you found online, etc. Several images illustrating changes in your design or focus over time will be far more informative than text describing those changes. Instead, use text to describe the rationale behind the evolution of your project.

Your process book should include the following topics. Depending on your project type the amount of discussion you devote to each of them will vary:

- Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.
- Related Work: Anything that inspired you, such as a paper, a web site, visualizations we discussed in class, etc.
- Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?
- Data: Source, scraping method, cleanup, etc.
- Exploratory Data Analysis: What visualizations did you use to initially look at your data? What insights did you gain? How did these insights inform your design?
- Design Evolution: What are the different visualizations you considered? Justify the design decisions you made using the perceptual and design principles you learned in the course. Did you deviate from your proposal?
- Implementation: Describe the intent and functionality of the interactive visualizations you implemented. Provide clear and well-referenced images showing the key design and interaction elements.
- Evaluation: What did you learn about the data by using your visualizations? How did you answer your questions? How well does your visualization work, and how could you further improve it?

As this will be your only chance to describe your project in detail make sure that your process book is a standalone document that fully describes your results and the final design.
[Here](http://dataviscourse.net/2015/assets/process_books/bansal_cao_hou.pdf) are a [few examples](http://dataviscourse.net/2015/assets/process_books/walsh_trevino_bett.pdf) of process books from a similar course final.

### Project Website
- Drop-down menus to switch between plots
- A tooltip that displays information about clusters, cell types, and marker genes

You will create a public website for your project using GitHub pages or any other web hosting service of your choice.
The web site should contain your interactive visualization, summarize the main results of the project, and tell a story.
Consider your audience (the site should be public) and keep the level of discussion at the appropriate level.
Your process book and data should be linked from the web site as well.
Also embed your interactive visualization and your screen-cast in your website.
If you are not able to publish your work (e.g., due to confidential data) please let us know in your project proposal.

### Project Screen-Cast

Each team will create a two minute screen-cast with narration showing a demo of your visualization and/or some slides.
You can use any screencast tool of your choice, such as Camtasia.
Please make sure that the sound quality of your video is good - it may be worthwhile to invest in an external USB microphone.
Upload the video to an online video-platform such as YouTube or Vimeo and embed it into your project web page.
We will show some of the best videos in class.

We will strictly enforce the two minute time limit for the video, so please make sure you are not running longer.
Use principles of good storytelling and presentations to get your key points across. Focus the majority of your screencast on your main contributions rather than on technical details.
What do you feel is the best part of your project?
What insights did you gain?
What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.

Outside Libraries/References
Design Achievements
---

For this project you *do not* have to write everything from scratch.

You may *reference* demo programs from books or the web, and *include* popular web libraries like Bootstrap, JQuery, Backbone, React, Meteor, etcetera.

Please *do not* use libraries on top of d3, however. Libraries like nvd3.js look tempting, but such libraries often have poor defaults and result in poor visualizations.
Instead, draw from the numerous existing d3 examples on the web.

If you use outside sources please provide a References section with links at the end of your Readme.
- A color scheme with perceptually distinct colors
- A legend that explains the color scale in the gene expression plot

Resources
---
The "[Data is Plural](https://tinyletter.com/data-is-plural/archive)" weekly letter often contains interesting datasets.

Think of something you're interested in, go find data on it! Include data processing as part of your work on this project.

Requirements
---

Store the following in your GitHub repository:

- Code - All web site files and libraries assuming they are not too big to include
- Data - Include all the data that you used in your project. If the data is too large for github store it on a cloud storage provider, such as Dropbox or Yousendit.
- Process Book- Your Process Book in PDF format.
- README - The README file must give an overview of what you are handing in: which parts are your code, which parts are libraries, and so on. The README must contain URLs to your project websites and screencast videos. The README must also explain any non-obvious features of your interface.

GitHub Details
---

- Fork the repo. You now have a copy associated with your username.
- Make changes to index.html to fulfill the project requirements.
- Make sure your "master" branch matches your "gh-pages" branch. See the GitHub Guides referenced above if you need help.
- Edit the README.md with a link to your gh-pages or other external site: for example http://YourUsernameGoesHere.github.io/DataVisFinal/index.html
- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.

Grading
---

- Process Book - Are you following a design process that is well documented in your process book?
- Solution - Is your visualization effective in answering your intended questions? Was it designed following visualization principles?
- Implementation - What is the quality of your implementation? Is it appropriately polished, robust, and reliable?
- Presentation - Are your web site and screencast clear, engaging, and effective?
Your individual project score will also be influenced by your peer evaluations.

References
---

- This final project is adapted from https://www.dataviscourse.net/2020/project/
https://www.d3-graph-gallery.com/graph/line_select.html <br>
https://stackoverflow.com/questions/22452112/nvd3-clear-svg-before-loading-new-chart <br>
https://bl.ocks.org/d3noob/a22c42db65eb00d4e369 <br>
https://www.d3-graph-gallery.com/graph/scatter_tooltip.html <br>
https://stackoverflow.com/questions/39023154/how-to-make-a-color-gradient-bar-using-d3js <br>
https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/transform <br>