Homework_1 #150

Description

Exercise 1: Web Scraping, Portfolio Analysis & Visualization (10 pts)
Objective: Scrape top stock gainers from Yahoo Finance, analyze their historical performance, construct a simulated portfolio, and visualize its risk and return.

Part 1: Web Scraping Top Stock Gainers from Yahoo Finance (4 pts)
Target URL: https://finance.yahoo.com/markets/stocks/gainers
Steps:

  1. Environment Setup:
  • Install the necessary libraries:
    pip install selenium pandas yfinance matplotlib seaborn beautifulsoup4
  2. Scrape Top 50 Gainers:
  • Task: Use Selenium to navigate to the Yahoo Finance "Top Gainers" page.
  • Extraction: Identify and scrape the Symbol and Name of the top 50 listed stocks.
  • Hint: The page may load dynamically, so you may need to wait for elements to become visible using WebDriverWait and expected_conditions.
  • Hint: You might need to scroll the page or click a "Show More" button if not all 50 rows are visible initially. Inspect the HTML to find the relevant elements (table rows, cells for symbol and name).
  • Storage: Store the scraped symbols and names in a Pandas DataFrame (see the sketch after this list).
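A minimal sketch of the scraping step, assuming Yahoo Finance renders the gainers as an HTML table; the CSS selectors and cell positions below are assumptions you should verify against the live page:

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
    driver.get("https://finance.yahoo.com/markets/stocks/gainers")

    # Wait for the dynamically loaded table rows to appear.
    WebDriverWait(driver, 15).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tbody tr"))
    )

    records = []
    for row in driver.find_elements(By.CSS_SELECTOR, "table tbody tr")[:50]:
        cells = row.find_elements(By.TAG_NAME, "td")
        # Assumption: the first cell holds the symbol, the second the name.
        records.append({"Symbol": cells[0].text.strip(),
                        "Name": cells[1].text.strip()})

    driver.quit()
    gainers_df = pd.DataFrame(records)
    print(gainers_df.head())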

Part 2: Historical Data Retrieval (4 pts)
Objective: For each of the scraped stocks, retrieve 12 months of monthly historical adjusted close prices.
Steps:

  1. Data Retrieval (using yfinance):
  • Task: Use the yfinance library (a convenient way to access Yahoo Finance data programmatically) to download historical data for each of the 50 symbols you scraped.
  • Parameters:
    o period="1y" (for 1 year of data)
    o interval="1mo" (for monthly data)
  • Data Structure: Store the monthly adjusted close prices for all 50 stocks in a single Pandas DataFrame, where the columns are stock symbols and the index is the date (see the sketch after this list).
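A minimal retrieval sketch, assuming gainers_df is the DataFrame built in Part 1 (the output path is also an assumption):

    import yfinance as yf

    symbols = gainers_df["Symbol"].tolist()

    # With auto_adjust=True, the "Close" field is already the adjusted close.
    raw = yf.download(symbols, period="1y", interval="1mo", auto_adjust=True)

    # Keep only the adjusted close prices: columns = symbols, index = dates.
    prices = raw["Close"]
    prices.to_csv("output/monthly_prices.csv")  # hypothetical output path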

Part 3: Portfolio Construction & Analysis (2 pts)
Objective: Construct a 10-stock portfolio based on the first 6 months of data, and then analyze its performance over the subsequent 6 months.
Steps:

  1. Portfolio Selection (First 6 Months):
  • Task: From all 50 stocks, select 10 for your portfolio using a simple strategy based on their performance in the first 6 months of your 12-month data.
  • Strategy Idea (Example): You could choose the 10 stocks with the highest cumulative return in the first 6 months, the 10 with the lowest volatility, or a mix. Clearly state your selection criteria.
  • Assumption: Assume an equal-weighted portfolio (i.e., 10% in each stock).
  2. Calculate Returns (see the sketch after this list):
  • Individual Stock Monthly Returns: Calculate the monthly percentage return for each of your 10 selected stocks over the last 6 months.
  • Portfolio Monthly Returns: Calculate the monthly percentage return for your 10-stock portfolio over the last 6 months.
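A minimal analysis sketch, assuming prices is the monthly price DataFrame from Part 2 and using highest cumulative first-half return as the (example) selection criterion:

    # Monthly percentage returns for all stocks (the first row is NaN and dropped).
    monthly_returns = prices.pct_change().dropna(how="all")

    # Split the year roughly in half: selection window vs. evaluation window.
    first_half = monthly_returns.iloc[:6]
    second_half = monthly_returns.iloc[6:]

    # Example criterion: the 10 highest cumulative returns over the first 6 months.
    cum_return = (1 + first_half).prod() - 1
    selected = cum_return.nlargest(10).index

    # Equal weights: the portfolio's monthly return is the mean of the
    # 10 selected stocks' monthly returns.
    stock_returns = second_half[selected]
    portfolio_returns = stock_returns.mean(axis=1)
    print(portfolio_returns)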

Exercise 2: Reddit API Data Collection & Sentiment Analysis (10 pts)
Objective: Collect post and comment data from political subreddits using the Reddit API (PRAW), and identify the most relevant posts and their comments.
Part 1: Reddit API Setup & Data Collection (2 pts)
Steps:

  1. Reddit API Credentials (Developer Account):
  • Go to www.reddit.com/prefs/apps/
  • Click "are you a developer? create an app..."
  • Name: Give it a descriptive name (e.g., "PoliticalSentimentAnalyzer")
  • Type: Choose "script"
  • Description: (Optional)
  • About URL: (Optional, can be anything, e.g., http://localhost)
  • Redirect URI: Crucial for script apps; use http://localhost:8080
  • Click "create app".
  • You will get a client_id (under "personal use script") and a client_secret (a long string; click "reveal" if hidden). Keep these confidential!
  2. Environment Setup:
  • Install the necessary library:
    pip install praw
  3. API Connection (PRAW):
  • Task: Use your client_id, client_secret, and a user agent to connect to the Reddit API via PRAW. You'll also need your Reddit username and password (only for script apps). See the sketch after this list.
    o User Agent: This is a string that identifies your script. Make it descriptive, e.g., f"Python:PoliticalSentimentAnalyzer:v1.0 (by /u/{your_reddit_username})"
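A minimal connection sketch; all credential values below are placeholders for your own:

    import praw

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # from "personal use script"
        client_secret="YOUR_CLIENT_SECRET",
        username="YOUR_REDDIT_USERNAME",     # required for script apps
        password="YOUR_REDDIT_PASSWORD",
        user_agent="Python:PoliticalSentimentAnalyzer:v1.0 (by /u/YOUR_REDDIT_USERNAME)",
    )
    print(reddit.user.me())  # sanity check: prints your username if the login worked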

Part 2: Collect Data and Storage (8 pts)

  1. Collect Posts from Subreddits (3 pts):
  • Target Subreddits:
    o r/politics
    o r/PoliticalDiscussion
    o r/worldnews
  • Task: For each of the three subreddits, collect 20 "hot" or "top" posts.
  • Extraction: For each post, extract:
    o title
    o score (upvotes)
    o num_comments
    o id (unique identifier)
    o url
  • Storage: Store this post data.
  2. Collect Comments (3 pts):
  • Task: For a subset of the most relevant posts, collect 5 comments per post.
  • Extraction: For each comment, extract:
    o body (the comment text)
    o score (upvotes on the comment)
    o post_id (to link back to the original post)
  3. Storage (3 pts): Store the comment data, linking each comment to its parent post. See the sketch below.
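A minimal collection sketch, assuming the reddit instance from Part 1; the "most relevant" subset criterion and the output file names are assumptions (state your own):

    import pandas as pd

    subreddits = ["politics", "PoliticalDiscussion", "worldnews"]

    # 1. Collect 20 "hot" posts per subreddit.
    posts = []
    for name in subreddits:
        for post in reddit.subreddit(name).hot(limit=20):
            posts.append({"subreddit": name, "title": post.title,
                          "score": post.score, "num_comments": post.num_comments,
                          "id": post.id, "url": post.url})
    posts_df = pd.DataFrame(posts)
    posts_df.to_csv("output/posts.csv", index=False)

    # 2. Collect 5 top-level comments for the 10 highest-scoring posts
    #    (an example relevance criterion).
    comments = []
    for post_id in posts_df.nlargest(10, "score")["id"]:
        submission = reddit.submission(id=post_id)
        submission.comments.replace_more(limit=0)  # drop "load more" stubs
        for comment in list(submission.comments)[:5]:
            comments.append({"post_id": post_id, "body": comment.body,
                             "score": comment.score})

    # 3. post_id links each comment back to its parent post.
    pd.DataFrame(comments).to_csv("output/comments.csv", index=False)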
    

Some hints for your work:

  1. You have to create a repo called web_scraping.
  2. Create a folder called code and save all your scripts there. For Part 1, create a script named web_scraping_yahoo; for Part 2, create a script named reddit_api.
  3. Create a folder called output and save all your outputs (CSVs or datasets) there.

You will also have to upload a 3-minute video explaining all the steps you followed when creating the code. Use YouTube and make the link public so we can watch it without restriction. Paste the link in the Google Sheet.

Students can work in pairs (2) if they prefer, but each pair will have to create their own repo and send the repository link: https://docs.google.com/spreadsheets/d/1HhY0njW25S47PP3BIDEkIjQUe35dwQFmDWg9gTn0vI8/edit?gid=0#gid=0

Deadline: September 3, 23:59.
