cs480x-21c · jalovering · Mar 7, 2021 · Mar 8, 2021 · Mar 8, 2021 · Mar 8, 2021
diff --git a/.DS_Store b/.DS_Store
diff --git a/.ipynb_checkpoints/preprocessing-checkpoint.ipynb b/.ipynb_checkpoints/preprocessing-checkpoint.ipynb
@@ -0,0 +1,158 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Combine region + expression/embedding CSVs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ch_embeddings = pd.read_csv('data/CerebellarHem_embeddings.txt')\n",
+    "ch_expression = pd.read_csv('data/CerebellarHem_expression.txt')\n",
+    "fc_embeddings = pd.read_csv('data/FrontalCortex_embeddings.txt')\n",
+    "fc_expression = pd.read_csv('data/FrontalCortex_expression.txt')\n",
+    "vc_embeddings = pd.read_csv('data/VisualCortex_embeddings.txt')\n",
+    "vc_expression = pd.read_csv('data/VisualCortex_expression.txt')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ch = pd.merge(ch_embeddings, ch_expression, on=\"barcode\")\n",
+    "fc = pd.merge(fc_embeddings, fc_expression, on=\"barcode\")\n",
+    "vc = pd.merge(vc_embeddings, vc_expression, on=\"barcode\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ch['brain_region'] = 'CerebellarHem'\n",
+    "fc['brain_region'] = 'FrontalCortex'\n",
+    "vc['brain_region'] = 'VisualCortex'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = pd.concat([ch,fc,vc])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Add corresponding cell names and marker genes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "C:\\Users\\lover\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.\n",
+      "  \"\"\"Entry point for launching an IPython kernel.\n"
+     ]
+    }
+   ],
+   "source": [
+    "df = pd.read_csv('data/celltypedictionary.txt', sep=',\\'', engine='python')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = df.rename(columns={\"'Clusternum'\": \"celltype\", \n",
+    "                        \"brainregion'\": \"brain_region\", \n",
+    "                        \"cellname'\":'cell_name',\n",
+    "                        \"markergenes'\":'marker_genes'})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "for i, r in df.iterrows():\n",
+    "    df.loc[i, 'brain_region'] = r['brain_region'].strip('\\'')\n",
+    "    df.loc[i, 'cell_name'] = r['cell_name'].strip(',').strip('\\'')\n",
+    "    if pd.notna(r['marker_genes']):\n",
+    "        df.at[i, 'marker_genes'] = r['marker_genes'].strip('\\'').split(',')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result = pd.merge(data, df, how=\"outer\", on=[\"brain_region\", \"celltype\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "result.to_csv(\"data/combined.csv\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/README.md b/README.md
@@ -1,115 +1,34 @@
-Final Project - Interactive Data Visualization  
+Final Project - Visualizing Gene Expression in the Brain  
 ===
+Victoria Grasso, Josh Lovering, Nicole Shedd
+-
 
-The key learning experience of this course is the final project. 
-You will design a web site and interactive visualizations that answer questions you have or provide an exploratory interface to some topic of your own choosing. 
-You will acquire the data, design your visualizations, implement them, and critically evaluate the results. 
+Link to project website: https://nshedd.github.io/final/index
 
-The path to a good visualization is going to involve mistakes and wrong turns. 
-It is therefore important to recognize that mistakes are valuable in finding the path to a solution, to broadly explore the design space, and to iterate designs to improve possible solutions. 
-To help you explore the design space, we will hold events such as feedback sessions in which you propose your idea and initial designs and receive feedback from the class and staff.
+Direct link to screencast video on YouTube: https://www.youtube.com/watch?v=_Dpsas_n7SE
 
-Proposal
----
+All the necessary files for this project are in this repository:
+- index.html and final.html are the code files: index.html displays the embedded video and links to final.html, which contains the visualization
+- The data is in the combined.csv file in the /data folder
+- The process book is saved as final_progress_book.pdf
 
-Submit project proposals using [this Google Form](https://docs.google.com/forms/d/e/1FAIpQLSc9DFlcClPArC1RKNFsXzfJfauZA57ksU85kT0hX2OEEDlxqw/viewform?usp=sf_link).
-You may submit more than one proposal.
-1-3 folks per team.
+The goal of this project is to develop a tool for comparing gene expression across brain regions. It is intended for bioinformaticians working in transcriptomics who analyze gene expression in the frontal cortex, visual cortex, and cerebellum. It was built from a dataset of 34,899 healthy cells that serves as a control sample. We used D3.js to create interactive scatter plots with multiple views for each brain region and gene type.
 
-Final Project Materials
+Technical Achievements
 ---
-For your final project you must hand in the following items.
-
-### Process Book
-
-An important part of your project is your process book. Your process book details your steps in developing your solution, including the alternative designs you tried, and the insights you got. Develop your process book out of the project proposal. Equally important to your final results is how you got there! Your process book is the place you describe and document the space of possibilities you explored at each step of your project. It is not, however, a journal or lab notebook that describes every detail - you should think carefully about the important decisions you made and insights you gained and present your reasoning in a concise way.
-
-We strongly advise you to include many figures in your process book, including photos of your sketches of potential designs, screen shots from different visualization tools you explored, inspirations of visualizations you found online, etc. Several images illustrating changes in your design or focus over time will be far more informative than text describing those changes. Instead, use text to describe the rationale behind the evolution of your project.
-
-Your process book should include the following topics. Depending on your project type the amount of discussion you devote to each of them will vary:
-
-- Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.
-- Related Work: Anything that inspired you, such as a paper, a web site, visualizations we discussed in class, etc.
-- Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?
-- Data: Source, scraping method, cleanup, etc.
-- Exploratory Data Analysis: What visualizations did you use to initially look at your data? What insights did you gain? How did these insights inform your design?
-- Design Evolution: What are the different visualizations you considered? Justify the design decisions you made using the perceptual and design principles you learned in the course. Did you deviate from your proposal?
-- Implementation: Describe the intent and functionality of the interactive visualizations you implemented. Provide clear and well-referenced images showing the key design and interaction elements.
-- Evaluation: What did you learn about the data by using your visualizations? How did you answer your questions? How well does your visualization work, and how could you further improve it?
-
-As this will be your only chance to describe your project in detail make sure that your process book is a standalone document that fully describes your results and the final design. 
-[Here](http://dataviscourse.net/2015/assets/process_books/bansal_cao_hou.pdf) are a [few examples](http://dataviscourse.net/2015/assets/process_books/walsh_trevino_bett.pdf) of process books from a similar course final.
-
-### Project Website
+- Drop-down buttons to switch between plots
+- A tooltip that displays information about clusters, cell types, and marker genes
 
-You will create a public website for your project using GitHub pages or any other web hosting service of your choice. 
-The web site should contain your interactive visualization, summarize the main results of the project, and tell a story. 
-Consider your audience (the site should be public public) and keep the level of discussion at the appropriate level. 
-Your process book and data should be linked from the web site as well. 
-Also embed your interactive visualization and your screen-cast in your website. 
-If you are not able to publish your work (e.g., due to confidential data) please let us know in your project proposal.
-
-### Project Screen-Cast
-
-Each team will create a two minute screen-cast with narration showing a demo of your visualization and/or some slides. 
-You can use any screencast tool of your choice, such as Camtasia. 
-Please make sure that the sound quality of your video is good - it may be worthwhile to invest in an external USB microphone. 
-Upload the video to an online video-platform such as YouTube or Vimeo and embed it into your project web page. 
-We will show some of the best videos in class.
-
-We will strictly enforce the two minute time limit for the video, so please make sure you are not running longer. 
-Use principles of good storytelling and presentations to get your key points across. Focus the majority of your screencast on your main contributions rather than on technical details. 
-What do you feel is the best part of your project? 
-What insights did you gain? 
-What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.
-
-Outside Libraries/References
+Design Achievements
 ---
-
-For this project you *do not* have to write everything from scratch.
-
-You may *reference* demo programs from books or the web, and *include* popular web libraries like Bootstrap, JQuery, Backbone, React, Meteor, etcetera. 
-
-Please *do not* use libraries on top of d3, however. Libraries like nvd3.js look tempting, but such libraries often have poor defaults and result in poor visualizations.
-Instead, draw from the numerous existing d3 examples on the web.
-
-If you use outside sources please provide a References section with links at the end of your Readme.
+- A color scheme with perceptually distinct colors
+- A legend that explains the color differences in the gene expression plot
 
 Resources
 ---
-The "[Data is Plural](https://tinyletter.com/data-is-plural/archive)" weekly letter often contains interesting datasets.
-
-Think of something you're interested in, go find data on it! Include data processing as part of your work on this project.
-
-Requirements
----
-
-Store the following in your GitHub repository:
-
-- Code - All web site files and libraries assuming they are not too big to include
-- Data - Include all the data that you used in your project. If the data is too large for github store it on a cloud storage provider, such as Dropbox or Yousendit.
-- Process Book- Your Process Book in PDF format.
-- README - The README file must give an overview of what you are handing in: which parts are your code, which parts are libraries, and so on. The README must contain URLs to your project websites and screencast videos. The README must also explain any non-obvious features of your interface.
-
-GitHub Details
----
-
-- Fork the repo. You now have a copy associated with your username.
-- Make changes to index.html to fulfill the project requirements. 
-- Make sure your "master" branch matches your "gh-pages" branch. See the GitHub Guides referenced above if you need help.
-- Edit the README.md with a link to your gh-pages or other external site: for example http://YourUsernameGoesHere.github.io/DataVisFinal/index.html
-- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.
-
-Grading
----
-
-- Process Book - Are you following a design process that is well documented in your process book?
-- Solution - Is your visualization effective in answering your intended questions? Was it designed following visualization principles?
-- Implementation - What is the quality of your implementation? Is it appropriately polished, robust, and reliable?
-- Presentation - Are your web site and screencast clear, engaging, and effective?
-Your individual project score will also be influenced by your peer evaluations.
-
-References
----
-
-- This final project is adapted from https://www.dataviscourse.net/2020/project/
+https://www.d3-graph-gallery.com/graph/line_select.html <br> 
+https://stackoverflow.com/questions/22452112/nvd3-clear-svg-before-loading-new-chart <br> 
+https://bl.ocks.org/d3noob/a22c42db65eb00d4e369 <br> 
+https://www.d3-graph-gallery.com/graph/scatter_tooltip.html <br> 
+https://stackoverflow.com/questions/39023154/how-to-make-a-color-gradient-bar-using-d3js <br> 
+https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/transform <br>