diff --git a/AI/Day01/1 - python/Python.ipynb b/AI/Day01/1 - python/Python.ipynb new file mode 100644 index 0000000..9216acb --- /dev/null +++ b/AI/Day01/1 - python/Python.ipynb @@ -0,0 +1,1084 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Python3](./images/python.jpg)\n", + "\n", + "# Python\n", + "\n", + "During this first preliminary activity, you will learn the basics of Python.\\\n", + "Python is vast and we will only look at the most important notions of the language.\\\n", + "Therefore, it is more than likely that during this week, you will come across a notion that is not covered in this activity.\\\n", + "In that case, you will have to look for the solution yourself.\n", + "\n", + "[Python Documentation](https://docs.python.org/3/)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Introduction\n", + "\n", + "Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable.\n", + "It frequently uses English keywords where other languages use punctuation, and it has fewer syntactical constructions than other languages.\n", + "Here are some of the most important features of Python:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Types\n", + "\n", + "Python handles types a bit differently from languages like C or Java:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "source": [ + "# First, you don't need to declare a variable before assigning a value to it\n", + "# Python is dynamically typed, so you don't need to specify the type of a variable\n", + "\n", + "my_var = 1 # my_var is an integer\n", + "print(my_var, type(my_var), end='\\n')\n", + "\n", + "my_var = 1.0 # my_var is now a float\n", + "print(my_var, type(my_var), end='\\n')\n", + "\n", + "my_var = \"Hello world\" # my_var is now a string\n", + "print(my_var,
type(my_var), end='\\n')\n", + "\n", + "my_var = [1, 2, 3] # my_var is now a list\n", + "print(my_var, type(my_var), end='\\n')" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "To declare a function, you just need to use the `def` keyword." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "source": [ + "def my_func():\n", + "    print(\"Hello world\")\n", + "my_func() # comment out this line to see the difference" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "You can also annotate the type of a variable.\n", + "In a function, you can give a parameter a default value, which is used when the caller does not pass that argument, and you can annotate the return type. Note that Python does not enforce these annotations at runtime: they are hints for readers and tools." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "source": [ + "def my_func2(repetition, end_sentence, string: str = \"Hello world\\n\", show_a_new_mechanic=0) -> str:\n", + "    result = \"\"\n", + "    for i in range(repetition):\n", + "        result += string\n", + "    result += end_sentence\n", + "    print(show_a_new_mechanic)\n", + "    return result" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "Now let's look at the different ways this kind of function can be called:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "source": [ + "print(my_func2(2, \"end\", show_a_new_mechanic=\"You are doing great !\")) # You can pass an argument by its name" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "source": [ + "print(my_func2(2, \"end\", \"You are incredible\\n\", 1)) # Naming the arguments is optional, but positional arguments must follow the parameter order" + ], + "metadata": { + "collapsed": false + }, +
"outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "source": [ + "print(my_func2(2, \"end\")) # You can also call just the necessary variable" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "source": [ + "print(my_func2(\"That will be an error, it's important to give a type\", \"end\")) # The error is here because the first variable should be an integer" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "source": [ + "print(my_func2(2, \"end\", 1)) # The error is here because the third variable should be a string" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "source": [ + "print(my_func2(0, 3)) # The error is here because the return type should be a string" + ], + "metadata": { + "collapsed": false + }, + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Strings\n", + "\n", + "In Python, strings are arrays containing smaller strings which represent characters.\n", + "\n", + "For example, by using the `type()` method we learned about earlier, you'll notice that `\"apple\"` and `'a'` are both of the same data type:\n", + "\n", + "> In Jupyter Notebooks, you can run each cell of code by clicking the ▶️ button or by pressing `SHIFT+ENTER` on your keyboard." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "source": [ + "this_is_a_string = \"apple\"\n", + "\n", + "type(this_is_a_string)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "source": [ + "this_is_also_a_string = this_is_a_string[0]\n", + "\n", + "type(this_is_also_a_string)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> As you can see, the output of `type()` is displayed beneath the cell.\n", + "> However, if you had both `type()` calls in the same cell, it would only display the last one so you would need to use `print()` if you wish to avoid creating multiple Jupyter cells." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Conditions and loops \n", + "Because strings are arrays, multiple operations can be applied to them, like looping !\n", + "\n", + "```py\n", + "string = \"hello world\"\n", + "\n", + "for i in range(len(string)):\n", + " print(string[i])\n", + "```\n", + "\n", + "```py\n", + "for character in string:\n", + " print(character)\n", + "```\n", + "\n", + "These two blocks of code achieve the same result but they do so by different means.\n", + "\n", + "In the first example, we use an **iterable object** (we will learn how to make our own later) called `range` which contains an iterable from the provided arguments which we can use for our loop. Here, we provide the `len`gth of our `string` so range returns an iterable the size of `len`.\n", + "\n", + "In the second example, we pick each value from the `string` array inside a variable we chose to name `character` and then print it. You can do the same with any array." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "source": [ + "my_range = range(len(\"Hello world\"))\n", + "print(type(my_range), my_range, sep='\\n')\n", + "print(type(iter(my_range)), iter(my_range), sep='\\n')" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, let's say you want to print the \"Hello world\" string but you only want a new line when there is a space between the two words:\n", + "\n", + "```\n", + "Hello\n", + "world\n", + "```\n", + "\n", + "You can use an `if else` statement:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "string = 'Hello world'\n", + "\n", + "for c in string:\n", + "    if c == ' ': # no need for (parentheses) in python\n", + "        print() # print() outputs a new line by default\n", + "    else:\n", + "        print(c, end='') # thankfully, you can override the 'end' argument" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Practice: Fizzbuzz\n", + "\n", + "This wouldn't be a pool if we only showed you cool stuff so it's time for you to use what you've learned so far to code the `Fizzbuzz` algorithm in Python.\n", + "\n", + "**Exercise:**\n", + "\n", + "Display the numbers **from 1 to nb_iterations** with a **for** loop.\\\n", + "If a number is a **multiple of 3**, write **\"Fizz\" instead of the number.**\\\n", + "If a number is a **multiple of 5**, write **\"Buzz\" instead of the number.**\\\n", + "If a number is a **multiple of both 3 and 5**, write **\"FizzBuzz\" instead of the number.**\n", + "\n", + "You must follow the **diagram**:\n", + "\n", + "![schema](./images/diagramme.png)" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "### enter your code here:\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "###" + ], + "outputs": [], + "execution_count": null + }, + { +
"cell_type": "markdown", + "metadata": {}, + "source": [ + "## Lists, tuples, sets and dictionaries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are four different built-in data types which are used to store groups of data in Python, they are Lists, Tuples, Sets and Dictionaries.\n", + "\n", + "| |List |Tuple |Set|Dictionary\n", + "|-------------------|-----|------|---|---------|\n", + "|`new_instance =` |`[]` or `list()`|`()` or `tuple()`|`set()`|`{}` or `dict()`|\n", + "|Mutable |✅|❌|✅|✅\n", + "|Ordered |✅|✅|❌|✅\n", + "|Allows duplicates |✅|✅|❌|❌" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We've made a series of operations to each of these data types, analyse each of these cells to see what operations were made and why each result " + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_list = list()\n", + "my_list.append(1) ## adding '1' to our list\n", + "my_list.append(1) ## adding '1' again to our list\n", + "my_list.append(2) ## adding '2' to our list\n", + "my_list.pop() ## removing the last element from our list\n", + "\n", + "my_list" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> You can also provide an index to `pop()` in order to remove a value at a certain index" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_tuple = tuple(my_list) # once a tuple is defined, you can no longer modify its contents\n", + "\n", + "my_tuple" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_set = set()\n", + "my_set.add(1) # adding '1' to our set\n", + "my_set.add(1) # adding '1' again to our set\n", + "my_set.add(2) # adding '2' to our set\n", + "my_set.remove(2) # removing the last element from our list (since sets are unordered, we remove '2' manually because we know it is the last element we added)\n", + "\n", + 
"my_set" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_dict = dict()\n", + "my_dict[1] = \"one\" # adding '1' to our dictionary\n", + "my_dict[1] = \"ONE\" # adding '1' again to our dictionary (we also changed its value)\n", + "my_dict[2] = \"two\" # adding '2' to our dictionary\n", + "my_dict.popitem() # removing the last element from our list\n", + "\n", + "my_dict" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Although dictionaries are ordered since python version 3.7, `popitem()` might not be the most useful method when dealing with dictionaries, so keep in mind that you can delete any dictionary entry by using `del my_dict[key]`:\n", + "\n", + "```py\n", + "my_dict = {}\n", + "my_dict[\"apple\"] = \"red\"\n", + "my_dict[\"banana\"] = \"yellow\"\n", + "\n", + "del my_dict[\"apple\"] ## this will delete the \"apple\" key\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Practice: Occurences in sentence\n", + "\n", + "With all of this, you should be able to create a dictionary containing the amount of times each character makes an appearance inside a given string.\n", + "\n", + "For example, with \"Hello world\", your dictionary should be:\n", + "\n", + "```\n", + "{\n", + " \"h\": 1,\n", + " \"e\": 1,\n", + " \"l\": 3,\n", + " \"o\": 2,\n", + " \" \": 1,\n", + " \"w\": 1,\n", + " \"r\": 1,\n", + " \"d\": 1,\n", + "}\n", + "```" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "sentence = \"By using what you've learned about dictionaries, this exercise should not be too difficult. 
Good luck !\"\n", + "\n", + "# Enter your code here\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "#" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Functions & Classes\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before heading any further, you might want to know how to make a function in Python (a function defined inside a class is called a method).\n", + "\n", + "It's as simple as this:\n", + "\n", + "```py\n", + "def myFunc(arg1, arg2): # as Python is dynamically typed, there is no need to specify argument types\n", + "    my_code = arg1 + arg2\n", + "    return my_code # you can also return multiple values if you want: simply separate them using commas\n", + "\n", + "myFunc(1, 2) # calling the function\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Why don't you try wrapping your occurrence counter exercise from before inside a function ?" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def occurenceCounter(sentence):\n", + "    \"\"\"\n", + "    Copy paste your code here and make the necessary adjustments\n", + "    \"\"\"\n", + "    my_dict = {}\n", + "\n", + "\n", + "    return my_dict\n", + "\n", + "occurenceCounter(\"Wow, making functions is really easy in Python !\")" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Classes are the foundation of **Object Oriented Programming** (OOP), which is essential in a vast number of languages.\n", + "To put it simply, a class is a kind of mold used to create **objects**. Once you have the mold, you can create as many objects of the same type as you want. Classes are used in most of the libraries you will import, and many of the functions they provide are **methods** of classes.\n", + "Class names are usually written with an uppercase first letter.\n", + "\n", + "If you want to go deeper into this notion,\\\n", + "I highly recommend searching the Internet.
It's a well-documented subject.\\\n", + "[Python classes doc](https://docs.python.org/3/tutorial/classes.html)\n", + "\n", + "**Example:**\\\n", + "Here is an example of a class. To help you understand it, here are some remarks on the code:\n", + "\n", + "- attributes whose names start with `__` are treated as private (Python mangles their names to discourage access from outside the class)\n", + "- the method \\_\\_init__ is the constructor: it is called when the object is instantiated.\n", + "- the method \\_\\_str__ returns a string that describes the object." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class MyClass:\n", + "    '''This is my first class in Python'''\n", + "    def __init__(self, name, firstname, fav_color, fav_digit):\n", + "        self.name = name\n", + "        self.firstname = firstname\n", + "        self.setFavColor(fav_color)\n", + "        self.setFavDigit(fav_digit)\n", + "\n", + "    def __str__(self):\n", + "        return f'Your name is {self.firstname} {self.name}, your favorite color is {self.getFavColor()} and your favorite number is {self.getFavDigit()}'\n", + "\n", + "    def setFavColor(self, fav_color):\n", + "        color = [\"red\", \"blue\", \"purple\", \"green\", \"yellow\", \"orange\", \"white\", \"black\", \"pink\", \"brown\"]\n", + "        if fav_color in color:\n", + "            self.__fav_color = fav_color\n", + "        else:\n", + "            self.__fav_color = None\n", + "\n", + "    def setFavDigit(self, fav_digit):\n", + "        if isinstance(fav_digit, int) and -1 < fav_digit < 10:\n", + "            self.__fav_digit = fav_digit\n", + "        else:\n", + "            self.__fav_digit = None\n", + "\n", + "    def getFavDigit(self):\n", + "        return self.__fav_digit\n", + "\n", + "    def getFavColor(self):\n", + "        return self.__fav_color\n", + "\n", + "robot = MyClass(\"Robot\", \"PoC\", \"red\", 5)\n", + "print(robot)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercise:**\\\n", + "Create a ```Calculator``` class.\n", + "\n", + "It takes a ```name``` value as its initialization parameter.\\\n", + "It will have the methods
```add```, ```sub```, ```mul```, ```div```, ```modulo```, which each take two parameters ```x``` and ```y``` and return the result of the operation named by the method, applied to ```x``` and ```y```.\\\n", + "Create a method ```__str__``` that will return ```Hello my name is {name}.```" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Create your Calculator class here\n", + "\n", + "class Calculator:\n", + "    pass\n", + "\n", + "#\n", + "\n", + "\n", + "my_calc = Calculator(\"PoC\")\n", + "print(my_calc)\n", + "print(my_calc.add(1, 2))\n", + "print(my_calc.mul(1, 2))\n", + "print(my_calc.sub(1, 2))\n", + "print(my_calc.div(1, 2))\n", + "print(my_calc.modulo(1, 2))" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What if you now wanted to create a new class which reuses the methods inside the Calculator class ?" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class SuperCalculator(Calculator):\n", + "    def __init__(self, name):\n", + "        super().__init__(name) # super() gives access to the parent class, here to run Calculator's __init__...\n", + "\n", + "    def square(self, x):\n", + "        return self.mul(x, x) # ...and because SuperCalculator inherits from Calculator, it can call its methods\n", + "\n", + "my_super_calc = SuperCalculator(\"Hello world\")\n", + "\n", + "my_super_calc.square(3)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Anonymous Function\n", + "\n", + "### 📖 A bit of history...\n", + "\n", + "Python 1.0 introduced functional programming tools such as `lambda`, `map`, and `filter` (the latter two will be covered together in the next section, cf: \"Array methods\").
These features were added by a Python user who found that the language was incomplete without them.\n", + "\n", + "### The λ lambda function\n", + "\n", + "One of these features, the lambda function provides Python developers the ability to use anonymous functions:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "hello_world = lambda: print(\"hello, world\") ### simply define the function after 'lambda:'\n", + "\n", + "hello_world()" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "square = lambda x: x ** 2 ### you can provide arguments to lambda (you can call `x` whatever you want)\n", + "\n", + "square(5)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "add = lambda a, b: a + b ### you can give lambda as many arguments as you want\n", + "\n", + "add(2, 3)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### List comprehensions\n", + "\n", + "Similarly to lambda functions, you can replace `for loops` with **list comprehensions** to quickly apply a function to any list:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_array = [1, 2, 3, 4, 5, 6]\n", + "\n", + "[x * 10 for x in my_array] # again, you can call `x` whatever you want" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Array methods\n", + "\n", + "Lambda is very powerful when used with some awesome methods in python for dealing with arrays that every person learning Python should be familiar with !\n", + "\n", + "In this section we'll introduce \n", + "- `map()`, for **map**ping through an array and transforming all of its values at once\n", + "- `filter()`, for **filter**ing an array's values, allowing you to keep only values which match a condition\n", + "\n", + "They both take a 
function as first argument and an array as the second argument, so you can use `lambda` functions directly !" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Try using `map()` and `filter()` in the cell below:\\\n", + "> You should use `lambda` to make your life easier" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "my_array = [\"Mapping's\", \"actually\", \"powerful\"]\n", + "\n", + "my_mapped_array = None ## use map to transform my_array into an array containing only the first letter of each word\n", + "print(list(my_mapped_array))\n", + "\n", + "my_array = [1, 2, 3, 4, 5, 6]\n", + "\n", + "my_filtered_array = None ## use filter to keep only the even numbers inside my_array\n", + "\n", + "print(list(my_filtered_array))" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Try, except, raise, assert: **Error Handling in Python**\n", + "\n", + "If you attempted to call the methods before defining them inside the class, you might've run into some errors, for example:\n", + "\n", + "```\n", + "AttributeError: 'Calculator' object has no attribute 'add'\n", + "```\n", + "\n", + "There is a way for you to handle such errors in Python !\n", + "\n", + "> For this section, we *will* be dealing with errors, so don't worry if you see a lot of red outputs in your notebook: this is the only place where it will mean the code is executing properly :)" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def try_division(number1, number2):\n", + "    try: ## we make an attempt to run the code\n", + "        ans = my_calc.div(number1, number2)\n", + "    except ZeroDivisionError: # if the code encounters a ZeroDivisionError\n", + "        print(\"You cannot divide a number by zero !\")\n", + "    except Exception: # if the code encounters any other error\n", + "        print(\"An error has occurred\")\n", + "    else: # if the code does not encounter an error\n", + "
print(f\"Okay, no rules were violated: the answer is {ans}\")\n", + "    finally: # regardless of result\n", + "        print(f\"({type(number1)} {number1}, {type(number2)} {number2})\")\n", + "        print()\n", + "\n", + "try_division(2, 0)\n", + "try_division(2, \"two\")\n", + "try_division(2, 2)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> You might notice that we used a cool trick to format our printed messages:\\\n", + "> You can embed variables and expressions in a string by prefixing it with the 'f' character (this is called an f-string) !\n", + "```py\n", + ">>> print(f\"This print statement contains {(int)(9 / 9 + 1 - 1)} variable inside curly brackets !\")\n", + "This print statement contains 1 variable inside curly brackets !\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Another cool trick here: if we didn't use (int) to cast our result from a float into an int, our sentence would have read \"...
contains 1.0 variable ...\" which would have been weird.\n", + "\n", + "You can also use the `assert` keyword to make sanity checks in order to test your Python code:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "assert 1 == 1, \"one is not equal to one...\" ## this assert will pass because 1 == 1\n", + "assert 1 == 2, \"one is not equal to two...\" ## this will raise an AssertionError because 1 != 2" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def signUp(username):\n", + " if username == \"PoCInnovation\":\n", + " raise Exception(f\"{username} is already taken\")\n", + " print(f\"Welcome, {username} !\")\n", + "\n", + "signUp(\"PoCCommunity\")\n", + "signUp(\"PoCInnovation\")" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Practice: Custom Exception\n", + "\n", + "Let's put all of this into practice: you've learned about class inheritance and exceptions... what if you made your own custom exception by **inheriting** the Exception class ?\n", + "\n", + "This is the output you should receive:\n", + "\n", + "```\n", + "----> 8 raise MyException(\"This is my custom Exception\")\n", + "MyException: This is my custom Exception\n", + "```" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Enter your code here\n", + "\n", + "#" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reading from a file\n", + "\n", + "Although there are awesome libraries in python for data reading, like **Pandas**, we will be doing it the old fashioned way for just a little bit longer ! 
But don't worry, later today, you will start using the most popular tools in artificial intelligence for data analysis !\n", + "\n", + "### `with` keyword\n", + "\n", + "In python, the `with` keyword is used when working with resources that must be released after use (like file streams).\\\n", + "It is similar to the `using` statement in VB.NET and C#.\\\n", + "It allows you to ensure that a resource is \"cleaned up\" when the code that uses it finishes running, even if exceptions are thrown." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "with open('./data_types.txt', 'r') as f:\n", + "    data = f.read() ### read() will read all the content inside the file\n", + "\n", + "data" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "with open('./data_types.txt', 'r') as f:\n", + "    lines = f.readlines() ## readlines() will read the whole file and return a list with one string per line\n", + "\n", + "lines" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Practice: Auto correction\n", + "\n", + "Today's first assignment was to fill in the data types for each of the following values:\n", + "\n", + "```\n", + "1\n", + "\"hello world\"\n", + "1.0\n", + "[\"apples\", \"oranges\", \"bananas\"]\n", + "{\"answer\": 42}\n", + "2 + 2 == 4\n", + "None\n", + "```\n", + "\n", + "inside a file called 'data_types.txt'\n", + "\n", + "With everything you've learned, we want you to verify if you've done your own assignment properly !\n", + "\n", + "What is required:\n", + "\n", + "- create a collection of your choice which will contain the required values listed above\n", + "- loop through the values and create a new dictionary with their `type()` as value\n", + "- open your own 'data_types.txt' file and see if the contents match with the dictionary\n", + "- use error handling methods like exceptions or asserts to verify if your 'data_types.txt' file's content is
correct\n", + "\n", + "**Example:**\n", + "\n", + "If your dictionary contains `{1: 'int'}` and the 'data_types.txt' file doesn't have `int` as its first line, an error must occur !" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Write your code inside this cell\n", + "# The only requirement is that a function called verify_data_types() exists\n", + "# Feel free to make any other changes\n", + "\n", + "filename = './data_types.txt'\n", + "\n", + "def verify_data_types(filename):\n", + "    pass" + ], + "outputs": [], + "execution_count": null + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Writing to a file (and using libraries)\n", + "\n", + "You now have a nice dictionary with each value and its type.\n", + "\n", + "The way we filled in the values inside 'data_types.txt' is kind of ugly...\n", + "\n", + "In data science, a file extension that is commonly used is '.csv'.\n", + "\n", + "A '.csv' file is formatted as follows:\n", + "\n", + "```\n", + "column_a, column_b\n", + "index1_a, index1_b\n", + "index2_a, index2_b\n", + "```\n", + "\n", + "In our case, it would look like:\n", + "\n", + "```\n", + "value, data type\n", + "1, int\n", + "\"hello world\", str\n", + "```\n", + "\n", + "For the last assignment in this first notebook, we will ask you to write the rest of the values to a file called 'data_types.csv', in the **csv** format.\n", + "\n", + "> You will need to use `with open('./data_types.csv', 'w') as f:` because you are **w**riting to a file, not **r**eading.\n", + "\n", + "In order to make things easier for you, there is a library called `csv`, which, as the name suggests, provides various functions and classes which are useful for managing csv files in python !\n", + "\n", + "To import `csv` (or any other installed package in python), you simply need to run the following cell:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import csv" + ], + "outputs": [], +
"execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, you have access to any method or class defined inside `csv` by calling it with `csv.[methodName]()`\n", + "\n", + "If your IDE supports it, you can also start typing `csv.` and the autocomplete might have some suggestions for you.\\\n", + "If not, check out the [official documentation](https://docs.python.org/3/library/csv.html)." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "## Enter your code here\n", + "\n", + "\n", + "\n", + "##" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Great job ! You now master the basics of the python language ! 🥳\n", + "\n", + "You can now start using external packages which will prove very useful for data science and machine learning in general !" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day01/1 - python/README.md b/AI/Day01/1 - python/README.md new file mode 100644 index 0000000..e9b18c4 --- /dev/null +++ b/AI/Day01/1 - python/README.md @@ -0,0 +1,180 @@ +# Module 1: Python + +Welcome and thank you for participating in the Artificial Intelligence Pool by PoC Innovation ! 
Our team has worked hard to create a great learning resource and introduction to the incredible world of Artificial Intelligence, and what better way to start than with the most popular programming language for Machine Learning - **Python** ! + +During this week, you will use Python to tackle Machine Learning subjects of increasing difficulty, so never hesitate to ask the PoC staff for help: they'll be happy to answer your questions. And don't worry about not completing each Pool Day: **the subjects are difficult to finish by design** ! + +Ultimately, the goal is for you to achieve a good knowledge of the field of AI after this week. So make sure you take your time to truly grasp the concepts you will see throughout this course. **Learning AI is not something you can rush, so make sure you understand the theory behind what you do during this Pool**. + +Now, buckle up: **your adventure begins now** ! + + +## Python3 + +Python is a high-level interpreted language, which has become very popular in the academic and scientific community for its simplicity and for the way it abstracts away several low-level concerns, such as having to declare the types of your variables. Python also supports several programming paradigms, such as imperative, object-oriented and functional programming. + +So you may ask yourself: why do we use Python, which is slower than C/C++, for AI, a domain that requires a lot of resources? +Well, quite simply, libraries like TensorFlow expose a Python interface on the surface but run C++ code under the hood. + +### Example + +> "Hello World" Function + +```py +def myFunc(name="Undefined"): +    """A function is defined using the keyword `def` followed by a space +    and the name of the function. Between parentheses we put the different parameters: +    in this example we have the parameter `name`, which by default is equal to "Undefined". +    This comment is called a docstring: it provides a way for developers to explain the usage of a function / class / etc.
so that other developers can understand how to use them. For a function, you'll generally find the required arguments as well as the returned values and maybe some usage examples. + +Args: + name: name of the user + +Returns: + None +""" + print("Hello " + name) + +# myFunc() will display "Hello Undefined" +# myFunc("PoC") will display "Hello PoC" + +``` + +## Python Command Line + +Before we get started with **iPython notebooks**, which you will be using for most of the week, we'd like to show you a cool party trick that you can do using any terminal with Python installed ! + +Open up a new terminal window: inside, simply run the `python` command ! + +```bash +> python +``` + +This command will open up the python command line: + +```bash +> python +Python 3.X.XX (main, XXX XX XXXX, XX:XX:XX) +[GCC 7.5.0] :: Anaconda, Inc. on linux +Type "help", "copyright", "credits" or "license" for more information. +>>> +``` + +The output **will** differ based on your Python version and other variables but it **should** open a command line where you can execute any python code ! + +That's right: you can even build an entire neural network inside this command line (although we **will** be very scared if you choose to do so) + +Of course, you will not be writing your code inside this command line very often, because most of the time, you wish to save your code inside a `.py` or `.ipynb` file 😄. + +However, this command line can help you troubleshoot your code in a smaller scope. + +You can try using the `print()` method to print some stuff on your terminal. + +```bash +Type "help", "copyright", "credits" or "license" for more information. +>>> print("Hello world") +Hello world +>>> +``` + +Pretty cool, right ? What if you instead want to know what the result of 2 + 2 is ? + +```bash +Type "help", "copyright", "credits" or "license" for more information. +>>> print(2 + 2) +4 +>>> a = 2 + 2 +>>> print(a) +4 +>>> 2 + 2 +4 +>>> +``` + +Interesting... 
these three inputs have the same output, "4". + +One of them uses the `print()` function, another stores 2 + 2 in `a` and then prints it, and the last one only calculates 2 + 2. Still, the output is 4 even though we didn't ask Python to display the result ! + +Try to open your own Python command line and run your own experiments to see what's happening. + +In the Python command line, the value of an expression typed on its own is displayed in the console automatically (an assignment such as `a = 2 + 2` displays nothing). This behaviour is particularly useful for visualisation inside Jupyter Notebooks, which we will talk about just after this small explanation of data types: + +## Python Data Types + +But the last result being displayed might not be the only thing that surprised you with how python works. + +Many of you might be familiar with the C language and its `printf()` function. In C, in order to print the result of 2 + 2, you would need to do the following: + +```c +printf("%i\n", 2 + 2); +``` + +In python, it's enough to just do: + +```py +print(2 + 2) +``` + +Why is that ? Well, python is both a strongly and dynamically typed language: + +| Typing | Static | Dynamic | +| -------- | --------- | --------- | +| Variable | typed | not typed | +| Value | not typed | typed | + +| Typing | Strong | Weak | Strong & Dynamic | +| -------------- | ------ | ---- | ---------------- | +| "I am " + 13 | ❌ | ✅ | ❌ | +| "I am " + "13" | ✅ | ✅ | ✅ | + +Static typing (for example C) means that variables have a type which must be known by the compiler from the moment the variable is declared.\ +Dynamic typing (for example JavaScript) means that types are attached to values rather than variables, so a variable can hold values of different types over time. + +Strong typing (for example Python) means that operations between incompatible types, such as adding a string and a number, raise an error instead of converting silently.\ +Weak typing (for example JavaScript) means that implicit conversions let most operations between different types go through. 
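A minimal sketch of what dynamic and strong typing look like side by side in Python. The variable names (`value`, `message`, `first_type`, `second_type`) are only illustrative and are not part of the course material:

```python
# Dynamic typing: the name `value` can be rebound to objects of different types.
value = 1
first_type = type(value).__name__   # name of the type currently bound to `value`
value = "one"
second_type = type(value).__name__  # the same name now refers to a string

# Strong typing: adding a str and an int is rejected instead of silently converted.
try:
    message = "I am " + 13
except TypeError:
    # An explicit conversion is required to build the string.
    message = "I am " + str(13)

print(first_type, second_type, message)
```

You can paste these lines one by one into the Python command line to see where the `TypeError` would appear without the `try` / `except` block.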
+ +A strongly and dynamically typed language (Python) allows the developer to benefit from dynamic typing while keeping the safety provided by strong typing. + +You can open a Python command line and use the `type()` function to check the data type of each value: + +```bash +>>> type(1) +<class 'int'> +>>> type("hello world") +<class 'str'> +>>> +``` + +Before heading over to the `.ipynb` notebook, please fill in and submit a file called `data_types.txt` which contains the corresponding built-in types for the following values in order: + +- 1 +- "hello world" +- 1.0 +- ["apples", "oranges", "bananas"] +- {"answer": 42} +- 2 + 2 == 4 +- None + +Simply write the output of `type()` for each of the above values in order, one per line. + +```bash +❯ cat data_types.txt -e +$ +$ +``` + +## Submit 🏆 + +- Fill the `data_types.txt` file with the required data types. + +- Fill the ``Python.ipynb`` notebook. + +To submit your work, remember to push your changes. It is important to push so that we are able to assess participation. +If you have any concerns, talk to a supervisor. 
+ +## Resources + + - [Python3 Documentation](https://docs.python.org/3/) + - [Why Python3 more than C++ ?](https://fr.quora.com/Pourquoi-Python-est-tr%C3%A8s-utilis%C3%A9-en-IA-Big-Data-alors-quil-nest-pas-le-plus-performant-en-rapidit%C3%A9-de-calcul) + - [Python3 courses](https://courspython.com/introduction-python.html) + - [Python3 Machine Learnia](https://www.youtube.com/watch?v=82KLS2C_gNQ) diff --git a/AI/Day01/1 - python/data_types.csv b/AI/Day01/1 - python/data_types.csv new file mode 100644 index 0000000..7585d8c --- /dev/null +++ b/AI/Day01/1 - python/data_types.csv @@ -0,0 +1,3 @@ +value,data type +1,int +hello world,str diff --git a/AI/Day01/1 - python/data_types.txt b/AI/Day01/1 - python/data_types.txt new file mode 100644 index 0000000..f0d69a0 --- /dev/null +++ b/AI/Day01/1 - python/data_types.txt @@ -0,0 +1,2 @@ +int +str diff --git a/AI/Day01/1 - python/images/diagramme.png b/AI/Day01/1 - python/images/diagramme.png new file mode 100644 index 0000000..5b1a34a Binary files /dev/null and b/AI/Day01/1 - python/images/diagramme.png differ diff --git a/AI/Day01/1 - python/images/listvstuplevsset.png b/AI/Day01/1 - python/images/listvstuplevsset.png new file mode 100644 index 0000000..6284e90 Binary files /dev/null and b/AI/Day01/1 - python/images/listvstuplevsset.png differ diff --git a/AI/Day01/1 - python/images/python.jpg b/AI/Day01/1 - python/images/python.jpg new file mode 100644 index 0000000..feaa54a Binary files /dev/null and b/AI/Day01/1 - python/images/python.jpg differ diff --git a/AI/Day01/2 - numpy_matplotlib/README.md b/AI/Day01/2 - numpy_matplotlib/README.md new file mode 100644 index 0000000..fad19de --- /dev/null +++ b/AI/Day01/2 - numpy_matplotlib/README.md @@ -0,0 +1,60 @@ +# Module 2 : NumPy & Matplotlib + +Welcome to this second module young student, you are now comfortable with Python it is time to enter the world of data science. + +In that section we will learn about the NumPy library and the Matplotlib library. 
+ +## NumPy + +NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. +It is the fundamental package for scientific computing with Python. A little example to illustrate the power of NumPy: + +![NumpyArray.png](assets/NumpyArray.png) + +In the image above, we can see that it is really easy to create a multi-dimensional matrix. It is just as simple to perform operations on this array with NumPy. +For example, if we want to take the transpose of the array, we just have to do: + +```python +import numpy as np + +arr = np.array([[1, 2, 3, 3], [4, 5, 2, 8], [7, 8, 9, 13], [10, 11, 12, 15]]) +arr = arr.T +``` + +Now, if we want all the numbers on the diagonal, we just have to do: + +```python +arr.diagonal() +``` + +Those are easy examples; we can do way more, as you will see in the exercises. + +## Matplotlib + +Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. +We will use it to display the data that we manipulate with NumPy, and it is really easy to use. For example, if we want to plot a simple graph, we just have to do: + +```python +import matplotlib.pyplot as plt + +x = [1, 2, 3, 4, 5] +y = [2, 3, 4, 5, 6] + +plt.plot(x, y) +plt.show() +``` + +We just created a line based on the function y = x + 1, and we can see that it is really simple to do. Of course we can do way more, +as you will see in the exercises. + +## Submit 🏆 + +Fill the notebook: ``numpy_matplotlib.ipynb`` + +To submit your work, remember to push your changes. It is important to push so that we are able to assess participation. +If you have any concerns, talk to a supervisor. 
+ +## Resources :book: + +- [Doc NumPy](https://numpy.org/doc/stable/) +- [Doc Matplotlib](https://matplotlib.org/stable/contents.html) diff --git a/AI/Day01/2 - numpy_matplotlib/assets/Matrix.png b/AI/Day01/2 - numpy_matplotlib/assets/Matrix.png new file mode 100644 index 0000000..07a5fa1 Binary files /dev/null and b/AI/Day01/2 - numpy_matplotlib/assets/Matrix.png differ diff --git a/AI/Day01/2 - numpy_matplotlib/assets/NumpyArray.png b/AI/Day01/2 - numpy_matplotlib/assets/NumpyArray.png new file mode 100644 index 0000000..ab205d7 Binary files /dev/null and b/AI/Day01/2 - numpy_matplotlib/assets/NumpyArray.png differ diff --git a/AI/Day01/2 - numpy_matplotlib/numpy_matplotlib.ipynb b/AI/Day01/2 - numpy_matplotlib/numpy_matplotlib.ipynb new file mode 100644 index 0000000..a8c8fc0 --- /dev/null +++ b/AI/Day01/2 - numpy_matplotlib/numpy_matplotlib.ipynb @@ -0,0 +1,427 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Numpy and Matplotlib\n", + "\n", + "Hello everyone, today we will see how to use Numpy and Matplotlib in order to make some graph and some matrix calculation.\n", + "\n", + "# Table of content\n", + "\n", + "1. [Numpy](#Numpy)\n", + "2. [Matplotlib](#Matplotlib)\n", + "\n", + "# Numpy\n", + "\n", + "Numpy is a library that allows us to make some matrix calculation and some graph. It's a very powerful library that is used in a lot of other library like Tensorflow, Pytorch, Scikit-learn, etc... For today, we will see how to use it in order to make some matrix calculation and some graph. It's one of the lib that is the most used in the world of data science and machine learning. Knowing how to use that lib is the first step to become a data scientist or a machine learning engineer :). So let's start by a bit of theory and then we will see how to use it.\n", + "\n", + "## What is a matrix?\n", + "\n", + "Matrix are table of elements, like numbers or characters. It's a very useful tool in mathematics and in computer science. 
It's used in a lot of domains like machine learning, data science, etc... It's a very powerful tool that allows us to make calculations that we can't do with a single number. It lets us work in several dimensions and express links between them. So let's see what a matrix looks like:\n", + "![Matrix](assets/Matrix.png)\n", + "\n", + "Here m and n are natural numbers (positive integers). The matrix is composed of m rows and n columns. Each element of the matrix is represented by a square. Now, we will learn how to create, manipulate and use matrices..." + ], + "metadata": { + "collapsed": false + }, + "id": "9a27aa8a0cef2c42" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "initial_id", + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "plt.style.use('seaborn-v0_8-whitegrid')\n", + "import einops" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Matrix calculation\n", + "\n", + "Let's start by making some matrix calculations. First, we will see how to create a matrix and then we will see how to make some calculations with it." 
+ ], + "metadata": { + "collapsed": false + }, + "id": "c8b32cd854161599" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# let's make a matrix filled by zeros in 2 dimention by using Numpy\n", + "\n", + "def create_two_dimention_matrix_zeros(size_dim1 : int, size_dim2 : int):\n", + " pass\n", + "\n", + "matrix_zeros = create_two_dimention_matrix_zeros(12, 90)\n", + "assert (matrix_zeros.shape[0] == 12 and matrix_zeros.size == 1080 and matrix_zeros.max() == 0 and matrix_zeros.min() == 0), \"Not yet try again\"\n", + "print(\"Well done!\")" + ], + "metadata": { + "collapsed": false + }, + "id": "98ddfc3989c1b40" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# let's create another one but this time filled by random value between 1 and 0\n", + "\n", + "def create_two_dimention_matrix_randoms(size_dim1 : int, size_dim2 : int):\n", + " pass\n", + "\n", + "matrix_random = create_two_dimention_matrix_randoms(5, 6)\n", + "assert (matrix_random.shape[0] == 5 and matrix_random.size == 30 and matrix_random.max() <= 1.0 and matrix_random.min() >= 0.0), \"no no no, try again\"\n", + "print(\"Continue like that!\")" + ], + "metadata": { + "collapsed": false + }, + "id": "fe3ee4040d9a8f8" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# let's now make some calcul with numpy, what do I mean with that it's that we have to add 1 at number of the random matrix \n", + "\n", + "def add_1(matrix):\n", + " pass" + ], + "metadata": { + "collapsed": false + }, + "id": "c08e4b4a42391d91" + }, + { + "cell_type": "markdown", + "source": [ + "## Multiplication of matrix\n", + "\n", + "Now that we know how to create a matrix and how to make some calculation with it, let's see how to multiply 2 matrix. Be careful, it's not the same as multiplying 2 number... Goog luck! 
:)" + ], + "metadata": { + "collapsed": false + }, + "id": "85e51041c763e0f1" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# let's now add multiply 2 matrix together\n", + "# I will let you also make the assertion that the matrix are able to be multiply together\n", + "\n", + "def multiply_matrix(matrix1, matrix2):\n", + " pass\n", + "\n", + "assert (multiply_matrix(matrix_random, matrix_zeros) == \"The matrix are not able to be multiply together\" and multiply_matrix(np.array([[1, 2]]), np.array([[2], [1]])) == np.array([4])), \"No no no, try again\"" + ], + "metadata": { + "collapsed": false + }, + "id": "bbae764ca6fae5e7" + }, + { + "cell_type": "markdown", + "source": [ + "## Averages calculation\n", + "\n", + "Now that we know how to multiply 2 matrix together, let's see how to calculate the average of each element of a matrix." + ], + "metadata": { + "collapsed": false + }, + "id": "d30e47c51695fa23" + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "# Calculate the average of each element of a matrix\n", + "\n", + "def calculate_average(matrix):\n", + " pass\n", + "\n", + "matrix = np.array([[1, 2, 3], [3, 3, 3], [7, 10, 13]])\n", + "\n", + "assert (calculate_average(np.array(matrix)) == 5.0), \"No no no, try again\"\n" + ], + "metadata": { + "collapsed": false + }, + "id": "70c83a2960c6fc2e", + "execution_count": null + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "# Calculate the standart deviation of each element of a matrix\n", + "\n", + "def calculate_standart_deviation(matrix):\n", + " pass\n", + "\n", + "assert (calculate_standart_deviation(np.array(matrix)) == 3.858612300930075), \"No no no, try again\"\n" + ], + "metadata": { + "collapsed": false + }, + "id": "df9a43bd1218aa66", + "execution_count": null + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "# Calculate the variance of each element of a matrix\n", + "\n", + "def 
calculate_variance(matrix):\n", + " pass\n", + "\n", + "assert (calculate_variance(np.array(matrix)) == 14.88888888888889), \"No no no, try again\"\n" + ], + "metadata": { + "collapsed": false + }, + "id": "de445a8efc627481", + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "## Matplotlib\n", + "\n", + "### Graph\n", + "\n", + "Now that we know how to make some matrix calculation, let's see how to make some graph with matplotlib. We will see how to make some simple graph in 2D and then we will see how to make some more complex graph in 3D.\n", + "\n", + "### show intersection between 2 function\n", + "\n", + "Let's start by something simple, we want to see the intersection between 2 function.\n", + "\n", + "So first, we will create 2 function, and then we will see how to show the intersection between them. the first function will be :\n", + "\n", + " $$ \\begin{equation*}\n", + " f(x) = sinh(x) / tan(x)\n", + " \\end{equation*} $$" + ], + "metadata": { + "collapsed": false + }, + "id": "382449e2d67aeb98" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def f(x):\n", + " pass\n", + "\n", + "assert (f(1) == 0.7545880086758965 and f(2) == -1.6598600642614152), \"No no no, try again\"" + ], + "metadata": { + "collapsed": false + }, + "id": "1a6e16dac73f71e2" + }, + { + "cell_type": "markdown", + "source": [ + "The second function will be a $$ \\begin{equation*}\n", + " g(x) = cosh(x) / \\sqrt {e^x}\n", + " \\end{equation*} $$" + ], + "metadata": { + "collapsed": false + }, + "id": "e2982d1bc79dac53" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def g(x):\n", + " pass\n", + "\n", + "assert (g(1) == 0.93592571542427898 and g(2) == 1.3840344484134546), \"No no no, try again\"" + ], + "metadata": { + "collapsed": false + }, + "id": "189eddf89312a90f" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Let's make 
a function that shows the intersection between 2 function\n", + "\n", + "def show_intersection(function1, function2, start, end):\n", + " x = np.linspace(start, end, 100)\n", + " ...\n", + " plt.show()\n", + "\n", + "show_intersection(f, g, -1, 1)" + ], + "metadata": { + "collapsed": false + }, + "id": "69b09d85b2348046" + }, + { + "cell_type": "markdown", + "source": [ + "## BONUS - 3D Graph\n", + "Before that we create our first AI, let's discover how to make a 3D schem" + ], + "metadata": { + "collapsed": false + }, + "id": "ceb8512e51ed8142" + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "# Now let's try to make a cap in mathematics, so let's make a function that make 1/4 of a circle.\n", + "\n", + "def one_fourth_of_a_circle(r: float):\n", + " pass" + ], + "metadata": { + "collapsed": false + }, + "id": "d8ac5d540897cc7e", + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def show_2d(r: float = 1.0):\n", + " _, ax = plt.subplots(1)\n", + " \n", + " ...\n", + " \n", + " plt.xlim(0 ,1.25)\n", + " plt.ylim(0 ,1.25)\n", + "\n", + "show_2d()" + ], + "metadata": { + "collapsed": false + }, + "id": "76f637cb40c82eb0" + }, + { + "cell_type": "markdown", + "source": [ + "That was easy!!!\n", + "Now, let's try to put that line in a 3D space.\n", + "(You can use again, your function one_fourth_of_a_circle) " + ], + "metadata": { + "collapsed": false + }, + "id": "723734dfab82523a" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def create_arc_surface_3d(r: float = 1.0):\n", + " pass\n", + "\n", + "create_arc_surface_3d()" + ], + "metadata": { + "collapsed": false + }, + "id": "6f2aa679e98b8e42" + }, + { + "cell_type": "markdown", + "source": [ + "Now that you have the basis, now let's create our cap!!!\n", + "\n", + "A little tips, now, you have to adapt your first code to make it in 3d." 
+ ], + "metadata": { + "collapsed": false + }, + "id": "88bf44a4fe1e07bb" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def creat_a_cap_3d(r: float = 1.0):\n", + " ...\n", + " x, y, z = ...\n", + " \n", + " fig = plt.figure()\n", + " ax = fig.add_subplot(111, projection='3d')\n", + " \n", + " ax.plot_surface(x, y, z, color='b', alpha=0.6)\n", + " ax.set_aspect('auto')\n", + " \n", + " plt.xlim(-1.25, 1.25)\n", + " plt.ylim(-1.25, 1.25)\n", + " \n", + " plt.show()\n", + "\n", + "creat_a_cap_3d()" + ], + "metadata": { + "collapsed": false + }, + "id": "d6ba4d32188dae47" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/AI/Day01/3 - data-science/.gitignore b/AI/Day01/3 - data-science/.gitignore new file mode 100644 index 0000000..cc7b9b3 --- /dev/null +++ b/AI/Day01/3 - data-science/.gitignore @@ -0,0 +1,132 @@ +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +pip-wheel-metadata/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. 
+*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +.python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# PEP 582; used by e.g. github.com/David-OConnor/pyflow +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +# Data +./data diff --git a/AI/Day01/3 - data-science/Data science.ipynb b/AI/Day01/3 - data-science/Data science.ipynb new file mode 100644 index 0000000..4bac78d --- /dev/null +++ b/AI/Day01/3 - data-science/Data science.ipynb @@ -0,0 +1,693 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c8ca2ca3-e931-43f0-aa36-fe9266c3ebd8", + "metadata": {}, + "source": [ + "# POC - AI Pool 2026 - Day 01 - Data Science\n", + "\n", + "## Introduction\n", + "\n", + "#### Data Science & Data scientist\n", + "\n", + "Before going futher in this subject, let's start by a short 
definition of what Data Science is: Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data and apply knowledge and actionable insights from data across a broad range of application domains.\n", + "\n", + "A Data Scientist is often seen as a handyman who does everything from fetching the data to putting a machine learning model in production.\n", + "In reality, each part related to AI and Data has its own job: the Data Miner fetches the data, the Machine Learning Engineer builds machine learning models and the MLOps engineer deploys those models.\n", + "\n", + "Another way to see the Data Scientist (which I prefer) is as the one who knows how to handle all work related to data: data mining, data exploration, interpretation of the data, its visualization and its processing.\n", + "\n", + "We will not go any further into the details of each job in AI, but if you want to know more I advise you to read [this great book](https://huyenchip.com/ml-interviews-book/contents/chapter-1.-ml-jobs.html) written by _Chip Huyen_, who explains each job in every part of AI.\n", + "\n", + "#### What you will see in this subject\n", + "\n", + "In this subject you will discover a few basics of Data Science: how to manipulate data, explore it, visualize it and interpret it.\\\n", + "Finally, you will learn how to use a machine learning model using the `sklearn` library.\n", + "\n", + "If you have any questions, don't hesitate to ask other candidates or one of the supervisors.\\\n", + "Good luck and have fun." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25a46f5e-6a23-4ee4-a5e1-946e04e789f4", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import seaborn as sns\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "id": "a8862bba-3d11-452e-978c-1a1b83b9408d", + "metadata": {}, + "source": [ + "## I - Data Exploration\n", + "\n", + "Before manipulating our data or even interpreting it, we need to explore it to know what type of data we have and what it means.\\\n", + "So let's start by exploring our data using the `pandas` and `seaborn` libraries.\n", + "\n", + "### I-I Reading a csv\n", + "\n", + "We have at our disposal a csv (`./data/train.csv`) that we want to explore. The first step is to find out what data our csv contains.\n", + "\n", + "**Tasks:**\n", + "* Using pandas, open `./data/train.csv`\n", + "* Find what columns our csv contains (name, type and number of values)\n", + "* Find our dataframe's shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d5a5150-2be6-44cd-9e11-b27aaab8931a", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "780d52b8-3c80-4735-bf76-358e8389bb4d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c699234-5398-497a-802b-aa2590cdb98c", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "be1b1ca8-0856-456b-8ce9-c9a129aa98dc", + "metadata": {}, + "source": [ + "### I-II Set indexes\n", + "\n", + "Nice! We now have a better understanding of our data. 
It seems like we are facing the `titanic` dataset, which lists each passenger who was on board the Titanic.\\\n", + "Our goal is to explore this dataset and finally to create a simple machine learning model to predict whether a passenger survived, using their information.\n", + "\n", + "To give you a better understanding of our data, here is a description of each column:\n", + "* **PassengerId** : ID of the passenger.\n", + "* **Survived** : `0` if the passenger did not survive, `1` if they did.\n", + "* **Pclass** : Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).\n", + "* **Name** : Name of the passenger.\n", + "* **Sex** : Sex of the passenger.\n", + "* **Age** : Age of the passenger.\n", + "* **SibSp** : Number of siblings / spouses aboard.\n", + "* **Parch** : Number of parents / children aboard.\n", + "* **Ticket** : Ticket number.\n", + "* **Fare** : Ticket price.\n", + "* **Cabin** : Cabin number.\n", + "* **Embarked** : Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).\n", + "\n", + "Using the above information, we can see that the `PassengerId` column is just full of indexes referencing each passenger.\\\n", + "Before going further, let's specify that we will use the `PassengerId` column as the index.\n", + "\n", + "**Tasks:**\n", + "* Set the DataFrame index using the `PassengerId` column." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b1da0d3f-0065-48ec-8528-fcded561b720", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "5a8e3542-f6e4-4246-8a40-3dfdbf507cef", + "metadata": {}, + "source": [ + "Good! Now we can start.\n", + "\n", + "### I-III Cleaning dataset\n", + "\n", + "One of the main issues in Data Science is missing values. Look at the information you have about your columns and ask yourself which column could be a problem and should be dropped.\n", + "If you said `Cabin` you are right! 
(If you said `Age`, remember what our final goal is in this subject).\n", + "\n", + "(In reality we have techniques to deal with missing values but to simplify this subject we will not see them.)\n", + "\n", + "Indeed, the `Cabin` column is missing so many values that it is useless, so we prefer to drop it.\\\n", + "We can also see that values are missing in the `Age` and `Embarked` columns. To simplify the next steps, we also decide to drop every row containing one or more missing values.\n", + "\n", + "**Tasks:**\n", + "* Drop the `Cabin` column\n", + "* Drop every row with one or more missing values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8aada117-18c1-4d4c-8e5e-13b8db5e6a39", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "9aa95aa5-8de0-4c84-b2ab-ac90c4e63811", + "metadata": {}, + "source": [ + "### I-IV Basic data exploration\n", + "\n", + "Now that we are sure we no longer have missing values, we can go further.\n", + "\n", + "As we can see, our csv contains numeric and alphanumeric values. 
Both can be explored, but to start we will focus only on the numeric values.\\\n", + "A good start would be to know the distribution of each value.\n", + "\n", + "**Tasks:**\n", + "* Find the mean value for each numerical column\n", + "* Find the std value for each numerical column\n", + "* Find the min value for each numerical column\n", + "* Find the lower percentile (25) for each numerical column\n", + "* Find the median for each numerical column\n", + "* Find the upper percentile (75) for each numerical column\n", + "* Find the max value for each numerical column" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ab831788-a4f1-444c-8a56-fb85d388a459", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "0fd3e0ba-c6c8-42ac-a3eb-ce857619867a", + "metadata": {}, + "source": [ + "We are starting to see a little more clearly. What can we interpret from this data?\n", + "\n", + "We can see that an average passenger aboard the Titanic was about 30 years old, came without a wife/husband or child/parent and paid about 35$\\$$ for their ticket.\\\n", + "On the other hand, we do not learn much more about the `Pclass` column. This is because it contains numbers that do not represent values but categories.\\\n", + "(As a reminder: 1 = 1st class, 2 = 2nd class, 3 = 3rd class.)\n", + "\n", + "Let's continue to learn about the passengers aboard the Titanic by looking at the number of passengers in each class.\n", + "\n", + "**Tasks:**\n", + "* Find how many passengers were in each class" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00baa52d-25fa-4010-bb00-2a31019a2d10", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "eac5db26-ab3e-4fdc-8669-3a0a1247fcbd", + "metadata": {}, + "source": [ + "We can see that the third class represents almost half of the passengers, it changes our vision of the Titanic ... 
\\\n", + "Let's explore the profile of a passenger in each class, shall we?\n", + "\n", + "**Tasks:**\n", + "* Find the mean value of the `Parch` column for each class.\n", + "* Find the mean value of the `SibSp` column for each class.\n", + "* Find the mean age of a passenger in each class.\n", + "* Find the mean ticket price for each class.\n", + "* Find the survival rate of passengers in each class." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "99868554-a592-414f-828a-7fb6a4bdc844", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "5475679c-75c9-4455-a2b5-f90b391f6867", + "metadata": {}, + "source": [ + "We can see very interesting information like:\n", + "* The average price of a ticket for each of the classes is respectively 88$\\$$, 21$\\$$, and 13$\\$$.\n", + "* The \"old\" population is found predominantly in first class, whereas the youngest population is mostly in third class.\n", + "* The majority of the third class died following the sinking of the Titanic.\n", + "\n", + "Now let's move on to the different embarkation ports. Which one do you think was used the most?\n", + "\n", + "To help you, here is the Titanic's journey:\\\n", + "\n", + "\n", + "**Task:**\n", + "* Find how many passengers embarked at each port (As a reminder : C = Cherbourg, Q = Queenstown, S = Southampton)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2dfe8718-31e7-4294-b62d-1c2eef95c51e", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "9eb31e42-b5ec-4112-ac60-e597f5239ec1", + "metadata": {}, + "source": [ + "As expected, we can see that it is in Southampton (its city of departure) that the Titanic embarked the most passengers, followed by Cherbourg, its first stopover, and Queenstown, its second stopover.\\\n", + "Now let's look at how many passengers of each class have joined at each 
port.\n", + "\n", + "**Objective:**\n", + "* For each class, find how many people embarked at each port." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96df312f-c4e3-440d-93ce-90e42555bdad", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "aaafab7a-0e48-4c6f-9ade-4592eb34c207", + "metadata": {}, + "source": [ + "We can see that in second and third class the large majority of passengers embarked at Southampton, while in first class a significant proportion of passengers embarked at Cherbourg.\n", + "\n", + "### I-V Advanced Data Exploration\n", + "\n", + "We're starting to see much more clearly in our data, aren't we? \\\n", + "Now is the time to explore the correlations between our different values, and in particular with the survival rate.\n", + "\n", + "Start by displaying a simple correlation table between the numerical values.\n", + "\n", + "**Task:**\n", + "* Find and display the correlation between each pair of numerical columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33428b77-4151-4cc9-8ec6-372710f95e74", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "2ee6ad8d-9e4d-4c5c-8718-315829c79e8b", + "metadata": {}, + "source": [ + "We can already interpret a lot of information, but before taking a closer look I suggest that we add some colors.\n", + "\n", + "**Task:**\n", + "* Display a heatmap showing the correlation between each pair of numerical columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cddf5d15-0fa6-43c2-9d8c-726d2200487d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "ca110ae4-694f-4b4f-a066-902b3e2dcf9f", + "metadata": {}, + "source": [ + "Isn't it nicer to read? 
Based on whether a passenger survived or not, what can we read from this graph?\n", + "\n", + "We can see that passenger class had a big influence on a passenger's chances of survival: those in first class apparently had more \"luck\"... \\\n", + "We can also see a hint of correlation between age and survival, so let's try to find out more.\n", + "\n", + "**Task:**\n", + "* Using a histogram, display the relationship between age and whether or not a passenger survived" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "856c4dc7-6e15-4262-b33f-27ea4ddd76f1", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "1201d745-9f29-4342-8843-b3a943c27cc1", + "metadata": {}, + "source": [ + "Well, there does seem to be a link between age and surviving the Titanic. Women and children first, as they say. \\\n", + "We have now checked this saying for children, but not yet for women. 
You know what you have left to do...\n", + "\n", + "**Task:**\n", + "* Show whether there is a link between a passenger's `Sex` and whether or not they survived" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "beb6aa42-7444-455e-98a8-d702484f051d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "abac49f8-725b-4b8d-a454-2e9a555643fa", + "metadata": {}, + "source": [ + "The saying is therefore true!\n", + "\n", + "Now that we have explored different correlations, we can prepare our data so that our model can interpret it.\n", + "\n", + "Our model only accepts numeric values, so what do we do with the `Sex` column?\\\n", + "We simply convert it to a numeric value.\n", + "\n", + "We will also try to highlight the correlation between age and the survival rate (we will consider a passenger of five years or less to be a child).\n", + "\n", + "\n", + "**Tasks:**\n", + "* Create a new column named `Child` and fill it (remember, we consider a passenger under 6 years old to be a child)\n", + "* Convert the `Sex` column into a numerical column" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1af44d5f-b3be-458c-af02-9918509e8af0", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a1fcdd6-71a7-47c6-845c-7ce4cf82612d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9af03bad-659a-4a3e-b24a-5eb8647bd24c", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "90e64155-0d3c-419d-b6cb-20378f293eb5", + "metadata": {}, + "source": [ + "Well, our data is ready. Before creating the model, let's take a final look at the correlations in our data to help us decide which columns might be useful to us.\n", + "\n", + "**Objectives:**\n", + "* Using a heatmap, show the correlation between all the numerical columns\n", 
+ "* Using the `groupby` method of pandas, show the relation between `Sex` and `Survived`\n", + "* Using the `groupby` method of pandas, show the relation between `Child` and `Survived`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a7d44df-ed9f-42c3-85ce-44b4fc7b634c", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd225a55-b0a5-44a1-9af6-67d45ef9504d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54836f7a-a7f2-4947-941b-7cd4f8f27fb1", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "465e79cf-1f3c-43c1-bf28-7d0d2d968671", + "metadata": {}, + "source": [ + "## II - Machine learning\n", + "\n", + "So far we have taken the time to:\n", + "* Explore the data\n", + "* View the data\n", + "* Correlate the data\n", + "* Interpret the data\n", + "It's a good start, don't you think?\n", + "\n", + "Now let's get down to business (_add a drumbeat_): machine learning (\"_tin tin tin_\").\\\n", + "For now we're not going to go into too much detail on how to create our models ourselves. We'll just use the `sklearn` library, which will do most of the work for us." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c082b255-7e06-44c9-b376-25f7b4608711", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.ensemble import RandomForestClassifier" + ] + }, + { + "cell_type": "markdown", + "id": "f1d43fa7-e602-44e1-9a18-54c21b30bb8f", + "metadata": {}, + "source": [ + "### II-I Data\n", + "\n", + "Before creating our model (we promise, this is the last preparation step) we must create a training set and a test set (\"_A what set?_\" said a student in the distance).\\\n", + "To understand what a test set is and why it is necessary, it is best to go over what machine learning is, so let's start with a short definition.\n", + "\n", + "Machine learning: Machine learning is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence.\n", + "\n", + "There are two things to remember from this definition:\n", + "- \"_computer algorithms that can improve automatically_\": In machine learning, we do not directly create the solution but an algorithm that will adjust \"automatically\" until it potentially reaches the desired result.\n", + "- \"_can improve automatically through experience and by the use of data._\": Our model learns thanks to data, so it is not the model that is at the center of our attention but, first and foremost, our data.\n", + "\n", + "A machine learning model adjusts itself to meet a single criterion: bringing the _cost_ (also called the _loss_) closer to zero.\\\n", + "As a reminder, the loss function is a function which, from a prediction and the expected labels, indicates how wrong the model is. The closer the loss is to zero, the better.\n", + "\n", + "To illustrate this, I suggest that we take a look at the cost function named MSE (mean squared error):\\\n", + "$MSE = \\frac{1}{n}\\sum_{i=1}^{n}(Y_i - \\hat{Y}_i)^2$\n", + "\n", + "We have here named 
$Y_i$ the model prediction for data item number $i$ and $\\hat{Y}_i$ the expected result for this same data item $i$.\\\n", + "We sum the squared differences $(Y_i - \\hat{Y}_i)^2$ over the data items numbered from $1$ to $n$ and take the average by dividing the sum by $n$.\n", + "\n", + "We thus obtain the average squared difference between the predictions of the model and the expected results: this is our cost.\n", + "\n", + "The loss is a practical way to check that a model is learning: it suffices to verify that the cost decreases as the model learns. On the other hand, if I show you a cost of $100$, it's hard to know if it's good or not. That's where accuracy comes in: it is the percentage of times the model has found the right result.\\\n", + "An accuracy of $50\\%$ would mean that our model is wrong every other time, $90\\%$ that it is wrong once in 10, etc ...\n", + "\n", + "On the other hand, we cannot always compute an accuracy. Take for example a model which aims to predict the exact speed of a car.\\\n", + "It predicted $121.5km/h$ and the car was going at $119km/h$: you can't say your model is \"right\" or \"wrong\". You would rather say that it was off by $2.5km/h$ (which is a loss).\n", + "\n", + "\n", + "\"_And where do our training and test sets fit into all this?_\" exclaims the impatient reader.\\\n", + "To summarize, our model learns from the data we give it and tries to reduce the cost computed from its predictions and the expected results. But how do we find out how our model behaves on data that it has never seen? 
We create a test set, a set of data our model has never seen, and evaluate the model on it ...\n", + "\n", + "Our training set is the data used by our model to train; our test set is data that our model has never seen, which we use to find out how it behaves on new data.\n", + "To be precise, there is even a third set called the validation set, but we will not discuss it for the moment.\n", + "\n", + "Here, as we only have one CSV file, we will have to divide it into two sets (training and testing). \\\n", + "Did you understand everything? Perfect! Enough explanation, let's take action!\n", + "\n", + "**Tasks:**\n", + "* Create a dataframe named `train_df` containing 80% of our data\n", + "* Create a dataframe named `test_df` containing 20% of our data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "454af9ba-1e1c-4b07-97e4-e563320d31ff", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "0316344d-8d1c-4008-b1d6-2a8705cca78c", + "metadata": {}, + "source": [ + "Now that we have our sets, it's time to choose which data we're going to use to train our model.\\\n", + "To start, we recommend using the `Pclass`, `Sex`, `Age`, `Fare` and `Child` columns, but you are free to modify this selection.\n", + "\n", + "**Task:**\n", + "* Select the columns you think are useful to predict whether a passenger survived" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54fdd221-28d7-407e-a36e-2013905e8d06", + "metadata": {}, + "outputs": [], + "source": [ + "columns = ['TODO']" + ] + }, + { + "cell_type": "markdown", + "id": "77746d1a-51e7-44db-bc40-7ad962ef11a6", + "metadata": {}, + "source": [ + "We will **FINALLY** be able to move on to the buzzword: applying machine learning.\n", + "\n", + "For our first prediction we will use an extremely simple model that some of you may have already seen or used: linear regression.\\\n", + "The principle of a linear regression 
is to draw a line in $N$ dimensions, where $N$ represents the number of values that we give to our model.\n", + "\n", + "To illustrate this, here is how a linear regression learns on two-dimensional data which is linear: \\\n", + "![LiRegURL](https://miro.medium.com/max/700/1*CjTBNFUEI_IokEOXJ00zKw.gif \"Linear regression\")\n", + "\n", + "This algorithm is quick and easy to set up, but only works well if the data is linear (that is, if it follows the equation $y = b_0 + b_1x$).\\\n", + "Is ours? Let's try and we'll see.\n", + "\n", + "**Task:**\n", + "* Train a linear regression model on your training set and test it on your test set" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2fb2ff7-1335-4b81-a6e1-0e47d85682e4", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "59b97acf-a496-4a53-9e7b-3c951acae5f8", + "metadata": {}, + "source": [ + "If you get inconclusive results (less than $0.65$), don't be surprised.\\\n", + "Obviously our data is not linear (unsurprisingly). You can check by executing the code below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5c5cf2c6-8fbe-4248-b594-1825131ec056", + "metadata": {}, + "outputs": [], + "source": [ + "plt.scatter(df.Age, df.Survived)" + ] + }, + { + "cell_type": "markdown", + "id": "c2958811-3809-4276-aedd-0a7b0b0a38f4", + "metadata": {}, + "source": [ + "An algorithm that might be more promising is logistic regression, which tries to fit the following formula:\n", + "## $\\frac{1}{1 + e^{-(b_0 + b_1x)}}$\n", + "\n", + "Let's see what it looks like!\n", + "\n", + "**Task:**\n", + "* Train a logistic regression model on your training set, test it on your test set and display your score" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f8b8fd3-3f1b-4e4a-b4f9-a28efa6857cb", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": 
"1585e4fa-bbe3-496a-ab6c-43737fc4a390", + "metadata": {}, + "source": [ + "You should have much better results (over $0.75$).\n", + "\n", + "To conclude, let's try another kind of algorithm: a random forest, which combines many decision trees.\\\n", + "We will not detail how it works here, but we strongly encourage you to read up on it.\n", + "\n", + "**Task:**\n", + "* Train a random forest classifier on your training set and test it on your test set" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae6cff69-5df7-4de5-bb88-23370bd3f3d5", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "cfc516f5-d3dd-47bb-9a79-c8d3e5482905", + "metadata": {}, + "source": [ + "Congratulations! You have quickly discovered the basics of data science and used your first machine learning models. I am impressed.\n", + "\n", + "## III - It's your turn!\n", + "\n", + "To conclude this subject, we have a challenge for you. Go to [this website](https://www.kaggle.com/c/titanic) and try to solve the challenge.\\\n", + "The one with the best results will earn **100 points** on the day!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7ff6268-4688-4042-bcf2-6dd47311857c", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/AI/Day01/3 - data-science/LICENSE b/AI/Day01/3 - data-science/LICENSE new file mode 100644 index 0000000..a5a1601 --- /dev/null +++ b/AI/Day01/3 - data-science/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2021 POC-AI-POOL-2022 + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
diff --git a/AI/Day01/3 - data-science/README.md b/AI/Day01/3 - data-science/README.md new file mode 100644 index 0000000..1ab9479 --- /dev/null +++ b/AI/Day01/3 - data-science/README.md @@ -0,0 +1,41 @@ +# Module 3 : Data Science :pencil: + +Welcome to this new module, young scientist! Now that you are comfortable with Python, it is time to enter the world of artificial intelligence. + +# Data & AI :mag_right: + +Data is at the center of any artificial intelligence development. The idea is that when we develop a classic program, we use rules and data to obtain a result, whereas for AI we give our model data and results and let it define the rules. + +![AI](./img/AI.png) + +In reality, this diagram corresponds to one specific type of learning, supervised learning, but in any case we will need data to develop a model. + +Often, when we talk about data, we talk about huge amounts of data, or "Big Data". All this data must be preprocessed before being used in AI development. This is the role of the data scientist. + +For example, some data may not be relevant or may need to be cleaned up to be usable. It is even common to create new data from existing data. + +In this activity you will perform operations on the Titanic passenger data in order to draw conclusions. You will be able to answer questions such as: + +- Does age affect the chances of survival? +- Does money influence the chances of survival? +- Does gender affect survival chances? +- Does age affect travel class? + +You will also be able to select the best data to predict whether a person would have survived this shipwreck. For the faster ones, you will have the opportunity to use the scikit-learn library to apply different machine learning algorithms. + + +## Submit 🏆 + +Fill in the notebook: ``Data science.ipynb`` + +To submit your work, remember to push your changes. It is important to push so that we are able to assess participation. +If you have any concerns, talk to a supervisor. 
+ +## Resources :book: + +- [Doc pandas](https://pandas.pydata.org/docs/) :heart: +- [Doc scikit-learn](https://scikit-learn.org/stable/) +- [Comprendre le Machine Learning en 5min](https://www.youtube.com/watch?v=RC7GTAKoFGA) +- [What is Pandas](https://www.youtube.com/watch?v=dcqPhpY7tWk) +- [Learn Pandas 1H](https://www.youtube.com/watch?v=vmEHCJofslg) +- [Linear regression with scikit-learn](https://stackabuse.com/linear-regression-in-python-with-scikit-learn/) diff --git a/AI/Day01/3 - data-science/data/train.csv b/AI/Day01/3 - data-science/data/train.csv new file mode 100644 index 0000000..5cc466e --- /dev/null +++ b/AI/Day01/3 - data-science/data/train.csv @@ -0,0 +1,892 @@ +PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked +1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S +2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C +3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S +4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S +5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S +6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q +7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S +8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S +9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S +10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C +11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S +12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S +13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S +14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S +15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S +16,1,2,"Hewlett, Mrs. (Mary D Kingcome) ",female,55,0,0,248706,16,,S +17,0,3,"Rice, Master. 
Eugene",male,2,4,1,382652,29.125,,Q +18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S +19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S +20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C +21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S +22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S +23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q +24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S +25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S +26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S +27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C +28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S +29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q +30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S +31,0,1,"Uruchurtu, Don. Manuel E",male,40,0,0,PC 17601,27.7208,,C +32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C +33,1,3,"Glynn, Miss. Mary Agatha",female,,0,0,335677,7.75,,Q +34,0,2,"Wheadon, Mr. Edward H",male,66,0,0,C.A. 24579,10.5,,S +35,0,1,"Meyer, Mr. Edgar Joseph",male,28,1,0,PC 17604,82.1708,,C +36,0,1,"Holverson, Mr. Alexander Oskar",male,42,1,0,113789,52,,S +37,1,3,"Mamee, Mr. Hanna",male,,0,0,2677,7.2292,,C +38,0,3,"Cann, Mr. Ernest Charles",male,21,0,0,A./5. 2152,8.05,,S +39,0,3,"Vander Planke, Miss. Augusta Maria",female,18,2,0,345764,18,,S +40,1,3,"Nicola-Yarred, Miss. Jamila",female,14,1,0,2651,11.2417,,C +41,0,3,"Ahlin, Mrs. Johan (Johanna Persdotter Larsson)",female,40,1,0,7546,9.475,,S +42,0,2,"Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",female,27,1,0,11668,21,,S +43,0,3,"Kraeff, Mr. Theodor",male,,0,0,349253,7.8958,,C +44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3,1,2,SC/Paris 2123,41.5792,,C +45,1,3,"Devaney, Miss. 
Margaret Delia",female,19,0,0,330958,7.8792,,Q +46,0,3,"Rogers, Mr. William John",male,,0,0,S.C./A.4. 23567,8.05,,S +47,0,3,"Lennon, Mr. Denis",male,,1,0,370371,15.5,,Q +48,1,3,"O'Driscoll, Miss. Bridget",female,,0,0,14311,7.75,,Q +49,0,3,"Samaan, Mr. Youssef",male,,2,0,2662,21.6792,,C +50,0,3,"Arnold-Franchi, Mrs. Josef (Josefine Franchi)",female,18,1,0,349237,17.8,,S +51,0,3,"Panula, Master. Juha Niilo",male,7,4,1,3101295,39.6875,,S +52,0,3,"Nosworthy, Mr. Richard Cater",male,21,0,0,A/4. 39886,7.8,,S +53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49,1,0,PC 17572,76.7292,D33,C +54,1,2,"Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)",female,29,1,0,2926,26,,S +55,0,1,"Ostby, Mr. Engelhart Cornelius",male,65,0,1,113509,61.9792,B30,C +56,1,1,"Woolner, Mr. Hugh",male,,0,0,19947,35.5,C52,S +57,1,2,"Rugg, Miss. Emily",female,21,0,0,C.A. 31026,10.5,,S +58,0,3,"Novel, Mr. Mansouer",male,28.5,0,0,2697,7.2292,,C +59,1,2,"West, Miss. Constance Mirium",female,5,1,2,C.A. 34651,27.75,,S +60,0,3,"Goodwin, Master. William Frederick",male,11,5,2,CA 2144,46.9,,S +61,0,3,"Sirayanian, Mr. Orsen",male,22,0,0,2669,7.2292,,C +62,1,1,"Icard, Miss. Amelie",female,38,0,0,113572,80,B28, +63,0,1,"Harris, Mr. Henry Birkhardt",male,45,1,0,36973,83.475,C83,S +64,0,3,"Skoog, Master. Harald",male,4,3,2,347088,27.9,,S +65,0,1,"Stewart, Mr. Albert A",male,,0,0,PC 17605,27.7208,,C +66,1,3,"Moubarek, Master. Gerios",male,,1,1,2661,15.2458,,C +67,1,2,"Nye, Mrs. (Elizabeth Ramell)",female,29,0,0,C.A. 29395,10.5,F33,S +68,0,3,"Crease, Mr. Ernest James",male,19,0,0,S.P. 3464,8.1583,,S +69,1,3,"Andersson, Miss. Erna Alexandra",female,17,4,2,3101281,7.925,,S +70,0,3,"Kink, Mr. Vincenz",male,26,2,0,315151,8.6625,,S +71,0,2,"Jenkin, Mr. Stephen Curnow",male,32,0,0,C.A. 33111,10.5,,S +72,0,3,"Goodwin, Miss. Lillian Amy",female,16,5,2,CA 2144,46.9,,S +73,0,2,"Hood, Mr. Ambrose Jr",male,21,0,0,S.O.C. 14879,73.5,,S +74,0,3,"Chronopoulos, Mr. Apostolos",male,26,1,0,2680,14.4542,,C +75,1,3,"Bing, Mr. 
Lee",male,32,0,0,1601,56.4958,,S +76,0,3,"Moen, Mr. Sigurd Hansen",male,25,0,0,348123,7.65,F G73,S +77,0,3,"Staneff, Mr. Ivan",male,,0,0,349208,7.8958,,S +78,0,3,"Moutal, Mr. Rahamin Haim",male,,0,0,374746,8.05,,S +79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29,,S +80,1,3,"Dowdell, Miss. Elizabeth",female,30,0,0,364516,12.475,,S +81,0,3,"Waelens, Mr. Achille",male,22,0,0,345767,9,,S +82,1,3,"Sheerlinck, Mr. Jan Baptist",male,29,0,0,345779,9.5,,S +83,1,3,"McDermott, Miss. Brigdet Delia",female,,0,0,330932,7.7875,,Q +84,0,1,"Carrau, Mr. Francisco M",male,28,0,0,113059,47.1,,S +85,1,2,"Ilett, Miss. Bertha",female,17,0,0,SO/C 14885,10.5,,S +86,1,3,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",female,33,3,0,3101278,15.85,,S +87,0,3,"Ford, Mr. William Neal",male,16,1,3,W./C. 6608,34.375,,S +88,0,3,"Slocovski, Mr. Selman Francis",male,,0,0,SOTON/OQ 392086,8.05,,S +89,1,1,"Fortune, Miss. Mabel Helen",female,23,3,2,19950,263,C23 C25 C27,S +90,0,3,"Celotti, Mr. Francesco",male,24,0,0,343275,8.05,,S +91,0,3,"Christmann, Mr. Emil",male,29,0,0,343276,8.05,,S +92,0,3,"Andreasson, Mr. Paul Edvin",male,20,0,0,347466,7.8542,,S +93,0,1,"Chaffee, Mr. Herbert Fuller",male,46,1,0,W.E.P. 5734,61.175,E31,S +94,0,3,"Dean, Mr. Bertram Frank",male,26,1,2,C.A. 2315,20.575,,S +95,0,3,"Coxon, Mr. Daniel",male,59,0,0,364500,7.25,,S +96,0,3,"Shorney, Mr. Charles Joseph",male,,0,0,374910,8.05,,S +97,0,1,"Goldschmidt, Mr. George B",male,71,0,0,PC 17754,34.6542,A5,C +98,1,1,"Greenfield, Mr. William Bertram",male,23,0,1,PC 17759,63.3583,D10 D12,C +99,1,2,"Doling, Mrs. John T (Ada Julia Bone)",female,34,0,1,231919,23,,S +100,0,2,"Kantor, Mr. Sinai",male,34,1,0,244367,26,,S +101,0,3,"Petranec, Miss. Matilda",female,28,0,0,349245,7.8958,,S +102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S +103,0,1,"White, Mr. Richard Frasar",male,21,0,1,35281,77.2875,D26,S +104,0,3,"Johansson, Mr. Gustaf Joel",male,33,0,0,7540,8.6542,,S +105,0,3,"Gustafsson, Mr. 
Anders Vilhelm",male,37,2,0,3101276,7.925,,S +106,0,3,"Mionoff, Mr. Stoytcho",male,28,0,0,349207,7.8958,,S +107,1,3,"Salkjelsvik, Miss. Anna Kristine",female,21,0,0,343120,7.65,,S +108,1,3,"Moss, Mr. Albert Johan",male,,0,0,312991,7.775,,S +109,0,3,"Rekic, Mr. Tido",male,38,0,0,349249,7.8958,,S +110,1,3,"Moran, Miss. Bertha",female,,1,0,371110,24.15,,Q +111,0,1,"Porter, Mr. Walter Chamberlain",male,47,0,0,110465,52,C110,S +112,0,3,"Zabour, Miss. Hileni",female,14.5,1,0,2665,14.4542,,C +113,0,3,"Barton, Mr. David John",male,22,0,0,324669,8.05,,S +114,0,3,"Jussila, Miss. Katriina",female,20,1,0,4136,9.825,,S +115,0,3,"Attalah, Miss. Malake",female,17,0,0,2627,14.4583,,C +116,0,3,"Pekoniemi, Mr. Edvard",male,21,0,0,STON/O 2. 3101294,7.925,,S +117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q +118,0,2,"Turpin, Mr. William John Robert",male,29,1,0,11668,21,,S +119,0,1,"Baxter, Mr. Quigg Edmond",male,24,0,1,PC 17558,247.5208,B58 B60,C +120,0,3,"Andersson, Miss. Ellis Anna Maria",female,2,4,2,347082,31.275,,S +121,0,2,"Hickman, Mr. Stanley George",male,21,2,0,S.O.C. 14879,73.5,,S +122,0,3,"Moore, Mr. Leonard Charles",male,,0,0,A4. 54510,8.05,,S +123,0,2,"Nasser, Mr. Nicholas",male,32.5,1,0,237736,30.0708,,C +124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13,E101,S +125,0,1,"White, Mr. Percival Wayland",male,54,0,1,35281,77.2875,D26,S +126,1,3,"Nicola-Yarred, Master. Elias",male,12,1,0,2651,11.2417,,C +127,0,3,"McMahon, Mr. Martin",male,,0,0,370372,7.75,,Q +128,1,3,"Madsen, Mr. Fridtjof Arne",male,24,0,0,C 17369,7.1417,,S +129,1,3,"Peter, Miss. Anna",female,,1,1,2668,22.3583,F E69,C +130,0,3,"Ekstrom, Mr. Johan",male,45,0,0,347061,6.975,,S +131,0,3,"Drazenoic, Mr. Jozef",male,33,0,0,349241,7.8958,,C +132,0,3,"Coelho, Mr. Domingos Fernandeo",male,20,0,0,SOTON/O.Q. 3101307,7.05,,S +133,0,3,"Robins, Mrs. Alexander A (Grace Charity Laury)",female,47,1,0,A/5. 3337,14.5,,S +134,1,2,"Weisz, Mrs. 
Leopold (Mathilde Francoise Pede)",female,29,1,0,228414,26,,S +135,0,2,"Sobey, Mr. Samuel James Hayden",male,25,0,0,C.A. 29178,13,,S +136,0,2,"Richard, Mr. Emile",male,23,0,0,SC/PARIS 2133,15.0458,,C +137,1,1,"Newsom, Miss. Helen Monypeny",female,19,0,2,11752,26.2833,D47,S +138,0,1,"Futrelle, Mr. Jacques Heath",male,37,1,0,113803,53.1,C123,S +139,0,3,"Osen, Mr. Olaf Elon",male,16,0,0,7534,9.2167,,S +140,0,1,"Giglio, Mr. Victor",male,24,0,0,PC 17593,79.2,B86,C +141,0,3,"Boulos, Mrs. Joseph (Sultana)",female,,0,2,2678,15.2458,,C +142,1,3,"Nysten, Miss. Anna Sofia",female,22,0,0,347081,7.75,,S +143,1,3,"Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)",female,24,1,0,STON/O2. 3101279,15.85,,S +144,0,3,"Burke, Mr. Jeremiah",male,19,0,0,365222,6.75,,Q +145,0,2,"Andrew, Mr. Edgardo Samuel",male,18,0,0,231945,11.5,,S +146,0,2,"Nicholls, Mr. Joseph Charles",male,19,1,1,C.A. 33112,36.75,,S +147,1,3,"Andersson, Mr. August Edvard (""Wennerstrom"")",male,27,0,0,350043,7.7958,,S +148,0,3,"Ford, Miss. Robina Maggie ""Ruby""",female,9,2,2,W./C. 6608,34.375,,S +149,0,2,"Navratil, Mr. Michel (""Louis M Hoffman"")",male,36.5,0,2,230080,26,F2,S +150,0,2,"Byles, Rev. Thomas Roussel Davids",male,42,0,0,244310,13,,S +151,0,2,"Bateman, Rev. Robert James",male,51,0,0,S.O.P. 1166,12.525,,S +152,1,1,"Pears, Mrs. Thomas (Edith Wearne)",female,22,1,0,113776,66.6,C2,S +153,0,3,"Meo, Mr. Alfonzo",male,55.5,0,0,A.5. 11206,8.05,,S +154,0,3,"van Billiard, Mr. Austin Blyler",male,40.5,0,2,A/5. 851,14.5,,S +155,0,3,"Olsen, Mr. Ole Martin",male,,0,0,Fa 265302,7.3125,,S +156,0,1,"Williams, Mr. Charles Duane",male,51,0,1,PC 17597,61.3792,,C +157,1,3,"Gilnagh, Miss. Katherine ""Katie""",female,16,0,0,35851,7.7333,,Q +158,0,3,"Corn, Mr. Harry",male,30,0,0,SOTON/OQ 392090,8.05,,S +159,0,3,"Smiljanic, Mr. Mile",male,,0,0,315037,8.6625,,S +160,0,3,"Sage, Master. Thomas Henry",male,,8,2,CA. 2343,69.55,,S +161,0,3,"Cribb, Mr. John Hatfield",male,44,0,1,371362,16.1,,S +162,1,2,"Watt, Mrs. 
James (Elizabeth ""Bessie"" Inglis Milne)",female,40,0,0,C.A. 33595,15.75,,S +163,0,3,"Bengtsson, Mr. John Viktor",male,26,0,0,347068,7.775,,S +164,0,3,"Calic, Mr. Jovo",male,17,0,0,315093,8.6625,,S +165,0,3,"Panula, Master. Eino Viljami",male,1,4,1,3101295,39.6875,,S +166,1,3,"Goldsmith, Master. Frank John William ""Frankie""",male,9,0,2,363291,20.525,,S +167,1,1,"Chibnall, Mrs. (Edith Martha Bowerman)",female,,0,1,113505,55,E33,S +168,0,3,"Skoog, Mrs. William (Anna Bernhardina Karlsson)",female,45,1,4,347088,27.9,,S +169,0,1,"Baumann, Mr. John D",male,,0,0,PC 17318,25.925,,S +170,0,3,"Ling, Mr. Lee",male,28,0,0,1601,56.4958,,S +171,0,1,"Van der hoef, Mr. Wyckoff",male,61,0,0,111240,33.5,B19,S +172,0,3,"Rice, Master. Arthur",male,4,4,1,382652,29.125,,Q +173,1,3,"Johnson, Miss. Eleanor Ileen",female,1,1,1,347742,11.1333,,S +174,0,3,"Sivola, Mr. Antti Wilhelm",male,21,0,0,STON/O 2. 3101280,7.925,,S +175,0,1,"Smith, Mr. James Clinch",male,56,0,0,17764,30.6958,A7,C +176,0,3,"Klasen, Mr. Klas Albin",male,18,1,1,350404,7.8542,,S +177,0,3,"Lefebre, Master. Henry Forbes",male,,3,1,4133,25.4667,,S +178,0,1,"Isham, Miss. Ann Elizabeth",female,50,0,0,PC 17595,28.7125,C49,C +179,0,2,"Hale, Mr. Reginald",male,30,0,0,250653,13,,S +180,0,3,"Leonard, Mr. Lionel",male,36,0,0,LINE,0,,S +181,0,3,"Sage, Miss. Constance Gladys",female,,8,2,CA. 2343,69.55,,S +182,0,2,"Pernot, Mr. Rene",male,,0,0,SC/PARIS 2131,15.05,,C +183,0,3,"Asplund, Master. Clarence Gustaf Hugo",male,9,4,2,347077,31.3875,,S +184,1,2,"Becker, Master. Richard F",male,1,2,1,230136,39,F4,S +185,1,3,"Kink-Heilmann, Miss. Luise Gretchen",female,4,0,2,315153,22.025,,S +186,0,1,"Rood, Mr. Hugh Roscoe",male,,0,0,113767,50,A32,S +187,1,3,"O'Brien, Mrs. Thomas (Johanna ""Hannah"" Godfrey)",female,,1,0,370365,15.5,,Q +188,1,1,"Romaine, Mr. Charles Hallace (""Mr C Rolmane"")",male,45,0,0,111428,26.55,,S +189,0,3,"Bourke, Mr. John",male,40,1,1,364849,15.5,,Q +190,0,3,"Turcin, Mr. 
Stjepan",male,36,0,0,349247,7.8958,,S +191,1,2,"Pinsky, Mrs. (Rosa)",female,32,0,0,234604,13,,S +192,0,2,"Carbines, Mr. William",male,19,0,0,28424,13,,S +193,1,3,"Andersen-Jensen, Miss. Carla Christine Nielsine",female,19,1,0,350046,7.8542,,S +194,1,2,"Navratil, Master. Michel M",male,3,1,1,230080,26,F2,S +195,1,1,"Brown, Mrs. James Joseph (Margaret Tobin)",female,44,0,0,PC 17610,27.7208,B4,C +196,1,1,"Lurette, Miss. Elise",female,58,0,0,PC 17569,146.5208,B80,C +197,0,3,"Mernagh, Mr. Robert",male,,0,0,368703,7.75,,Q +198,0,3,"Olsen, Mr. Karl Siegwart Andreas",male,42,0,1,4579,8.4042,,S +199,1,3,"Madigan, Miss. Margaret ""Maggie""",female,,0,0,370370,7.75,,Q +200,0,2,"Yrois, Miss. Henriette (""Mrs Harbeck"")",female,24,0,0,248747,13,,S +201,0,3,"Vande Walle, Mr. Nestor Cyriel",male,28,0,0,345770,9.5,,S +202,0,3,"Sage, Mr. Frederick",male,,8,2,CA. 2343,69.55,,S +203,0,3,"Johanson, Mr. Jakob Alfred",male,34,0,0,3101264,6.4958,,S +204,0,3,"Youseff, Mr. Gerious",male,45.5,0,0,2628,7.225,,C +205,1,3,"Cohen, Mr. Gurshon ""Gus""",male,18,0,0,A/5 3540,8.05,,S +206,0,3,"Strom, Miss. Telma Matilda",female,2,0,1,347054,10.4625,G6,S +207,0,3,"Backstrom, Mr. Karl Alfred",male,32,1,0,3101278,15.85,,S +208,1,3,"Albimona, Mr. Nassef Cassem",male,26,0,0,2699,18.7875,,C +209,1,3,"Carr, Miss. Helen ""Ellen""",female,16,0,0,367231,7.75,,Q +210,1,1,"Blank, Mr. Henry",male,40,0,0,112277,31,A31,C +211,0,3,"Ali, Mr. Ahmed",male,24,0,0,SOTON/O.Q. 3101311,7.05,,S +212,1,2,"Cameron, Miss. Clear Annie",female,35,0,0,F.C.C. 13528,21,,S +213,0,3,"Perkin, Mr. John Henry",male,22,0,0,A/5 21174,7.25,,S +214,0,2,"Givard, Mr. Hans Kristensen",male,30,0,0,250646,13,,S +215,0,3,"Kiernan, Mr. Philip",male,,1,0,367229,7.75,,Q +216,1,1,"Newell, Miss. Madeleine",female,31,1,0,35273,113.275,D36,C +217,1,3,"Honkanen, Miss. Eliina",female,27,0,0,STON/O2. 3101283,7.925,,S +218,0,2,"Jacobsohn, Mr. Sidney Samuel",male,42,1,0,243847,27,,S +219,1,1,"Bazzani, Miss. 
Albina",female,32,0,0,11813,76.2917,D15,C +220,0,2,"Harris, Mr. Walter",male,30,0,0,W/C 14208,10.5,,S +221,1,3,"Sunderland, Mr. Victor Francis",male,16,0,0,SOTON/OQ 392089,8.05,,S +222,0,2,"Bracken, Mr. James H",male,27,0,0,220367,13,,S +223,0,3,"Green, Mr. George Henry",male,51,0,0,21440,8.05,,S +224,0,3,"Nenkoff, Mr. Christo",male,,0,0,349234,7.8958,,S +225,1,1,"Hoyt, Mr. Frederick Maxfield",male,38,1,0,19943,90,C93,S +226,0,3,"Berglund, Mr. Karl Ivar Sven",male,22,0,0,PP 4348,9.35,,S +227,1,2,"Mellors, Mr. William John",male,19,0,0,SW/PP 751,10.5,,S +228,0,3,"Lovell, Mr. John Hall (""Henry"")",male,20.5,0,0,A/5 21173,7.25,,S +229,0,2,"Fahlstrom, Mr. Arne Jonas",male,18,0,0,236171,13,,S +230,0,3,"Lefebre, Miss. Mathilde",female,,3,1,4133,25.4667,,S +231,1,1,"Harris, Mrs. Henry Birkhardt (Irene Wallach)",female,35,1,0,36973,83.475,C83,S +232,0,3,"Larsson, Mr. Bengt Edvin",male,29,0,0,347067,7.775,,S +233,0,2,"Sjostedt, Mr. Ernst Adolf",male,59,0,0,237442,13.5,,S +234,1,3,"Asplund, Miss. Lillian Gertrud",female,5,4,2,347077,31.3875,,S +235,0,2,"Leyson, Mr. Robert William Norman",male,24,0,0,C.A. 29566,10.5,,S +236,0,3,"Harknett, Miss. Alice Phoebe",female,,0,0,W./C. 6609,7.55,,S +237,0,2,"Hold, Mr. Stephen",male,44,1,0,26707,26,,S +238,1,2,"Collyer, Miss. Marjorie ""Lottie""",female,8,0,2,C.A. 31921,26.25,,S +239,0,2,"Pengelly, Mr. Frederick William",male,19,0,0,28665,10.5,,S +240,0,2,"Hunt, Mr. George Henry",male,33,0,0,SCO/W 1585,12.275,,S +241,0,3,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C +242,1,3,"Murphy, Miss. Katherine ""Kate""",female,,1,0,367230,15.5,,Q +243,0,2,"Coleridge, Mr. Reginald Charles",male,29,0,0,W./C. 14263,10.5,,S +244,0,3,"Maenpaa, Mr. Matti Alexanteri",male,22,0,0,STON/O 2. 3101275,7.125,,S +245,0,3,"Attalah, Mr. Sleiman",male,30,0,0,2694,7.225,,C +246,0,1,"Minahan, Dr. William Edward",male,44,2,0,19928,90,C78,Q +247,0,3,"Lindahl, Miss. Agda Thorilda Viktoria",female,25,0,0,347071,7.775,,S +248,1,2,"Hamalainen, Mrs. 
William (Anna)",female,24,0,2,250649,14.5,,S +249,1,1,"Beckwith, Mr. Richard Leonard",male,37,1,1,11751,52.5542,D35,S +250,0,2,"Carter, Rev. Ernest Courtenay",male,54,1,0,244252,26,,S +251,0,3,"Reed, Mr. James George",male,,0,0,362316,7.25,,S +252,0,3,"Strom, Mrs. Wilhelm (Elna Matilda Persson)",female,29,1,1,347054,10.4625,G6,S +253,0,1,"Stead, Mr. William Thomas",male,62,0,0,113514,26.55,C87,S +254,0,3,"Lobb, Mr. William Arthur",male,30,1,0,A/5. 3336,16.1,,S +255,0,3,"Rosblom, Mrs. Viktor (Helena Wilhelmina)",female,41,0,2,370129,20.2125,,S +256,1,3,"Touma, Mrs. Darwis (Hanne Youssef Razi)",female,29,0,2,2650,15.2458,,C +257,1,1,"Thorne, Mrs. Gertrude Maybelle",female,,0,0,PC 17585,79.2,,C +258,1,1,"Cherry, Miss. Gladys",female,30,0,0,110152,86.5,B77,S +259,1,1,"Ward, Miss. Anna",female,35,0,0,PC 17755,512.3292,,C +260,1,2,"Parrish, Mrs. (Lutie Davis)",female,50,0,1,230433,26,,S +261,0,3,"Smith, Mr. Thomas",male,,0,0,384461,7.75,,Q +262,1,3,"Asplund, Master. Edvin Rojj Felix",male,3,4,2,347077,31.3875,,S +263,0,1,"Taussig, Mr. Emil",male,52,1,1,110413,79.65,E67,S +264,0,1,"Harrison, Mr. William",male,40,0,0,112059,0,B94,S +265,0,3,"Henry, Miss. Delia",female,,0,0,382649,7.75,,Q +266,0,2,"Reeves, Mr. David",male,36,0,0,C.A. 17248,10.5,,S +267,0,3,"Panula, Mr. Ernesti Arvid",male,16,4,1,3101295,39.6875,,S +268,1,3,"Persson, Mr. Ernst Ulrik",male,25,1,0,347083,7.775,,S +269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58,0,1,PC 17582,153.4625,C125,S +270,1,1,"Bissette, Miss. Amelia",female,35,0,0,PC 17760,135.6333,C99,S +271,0,1,"Cairns, Mr. Alexander",male,,0,0,113798,31,,S +272,1,3,"Tornquist, Mr. William Henry",male,25,0,0,LINE,0,,S +273,1,2,"Mellinger, Mrs. (Elizabeth Anne Maidment)",female,41,0,1,250644,19.5,,S +274,0,1,"Natsch, Mr. Charles H",male,37,0,1,PC 17596,29.7,C118,C +275,1,3,"Healy, Miss. Hanora ""Nora""",female,,0,0,370375,7.75,,Q +276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63,1,0,13502,77.9583,D7,S +277,0,3,"Lindblom, Miss. 
Augusta Charlotta",female,45,0,0,347073,7.75,,S +278,0,2,"Parkes, Mr. Francis ""Frank""",male,,0,0,239853,0,,S +279,0,3,"Rice, Master. Eric",male,7,4,1,382652,29.125,,Q +280,1,3,"Abbott, Mrs. Stanton (Rosa Hunt)",female,35,1,1,C.A. 2673,20.25,,S +281,0,3,"Duane, Mr. Frank",male,65,0,0,336439,7.75,,Q +282,0,3,"Olsson, Mr. Nils Johan Goransson",male,28,0,0,347464,7.8542,,S +283,0,3,"de Pelsmaeker, Mr. Alfons",male,16,0,0,345778,9.5,,S +284,1,3,"Dorking, Mr. Edward Arthur",male,19,0,0,A/5. 10482,8.05,,S +285,0,1,"Smith, Mr. Richard William",male,,0,0,113056,26,A19,S +286,0,3,"Stankovic, Mr. Ivan",male,33,0,0,349239,8.6625,,C +287,1,3,"de Mulder, Mr. Theodore",male,30,0,0,345774,9.5,,S +288,0,3,"Naidenoff, Mr. Penko",male,22,0,0,349206,7.8958,,S +289,1,2,"Hosono, Mr. Masabumi",male,42,0,0,237798,13,,S +290,1,3,"Connolly, Miss. Kate",female,22,0,0,370373,7.75,,Q +291,1,1,"Barber, Miss. Ellen ""Nellie""",female,26,0,0,19877,78.85,,S +292,1,1,"Bishop, Mrs. Dickinson H (Helen Walton)",female,19,1,0,11967,91.0792,B49,C +293,0,2,"Levy, Mr. Rene Jacques",male,36,0,0,SC/Paris 2163,12.875,D,C +294,0,3,"Haas, Miss. Aloisia",female,24,0,0,349236,8.85,,S +295,0,3,"Mineff, Mr. Ivan",male,24,0,0,349233,7.8958,,S +296,0,1,"Lewy, Mr. Ervin G",male,,0,0,PC 17612,27.7208,,C +297,0,3,"Hanna, Mr. Mansour",male,23.5,0,0,2693,7.2292,,C +298,0,1,"Allison, Miss. Helen Loraine",female,2,1,2,113781,151.55,C22 C26,S +299,1,1,"Saalfeld, Mr. Adolphe",male,,0,0,19988,30.5,C106,S +300,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50,0,1,PC 17558,247.5208,B58 B60,C +301,1,3,"Kelly, Miss. Anna Katherine ""Annie Kate""",female,,0,0,9234,7.75,,Q +302,1,3,"McCoy, Mr. Bernard",male,,2,0,367226,23.25,,Q +303,0,3,"Johnson, Mr. William Cahoone Jr",male,19,0,0,LINE,0,,S +304,1,2,"Keane, Miss. Nora A",female,,0,0,226593,12.35,E101,Q +305,0,3,"Williams, Mr. Howard Hugh ""Harry""",male,,0,0,A/5 2466,8.05,,S +306,1,1,"Allison, Master. 
Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S +307,1,1,"Fleming, Miss. Margaret",female,,0,0,17421,110.8833,,C +308,1,1,"Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)",female,17,1,0,PC 17758,108.9,C65,C +309,0,2,"Abelson, Mr. Samuel",male,30,1,0,P/PP 3381,24,,C +310,1,1,"Francatelli, Miss. Laura Mabel",female,30,0,0,PC 17485,56.9292,E36,C +311,1,1,"Hays, Miss. Margaret Bechstein",female,24,0,0,11767,83.1583,C54,C +312,1,1,"Ryerson, Miss. Emily Borie",female,18,2,2,PC 17608,262.375,B57 B59 B63 B66,C +313,0,2,"Lahtinen, Mrs. William (Anna Sylfven)",female,26,1,1,250651,26,,S +314,0,3,"Hendekovic, Mr. Ignjac",male,28,0,0,349243,7.8958,,S +315,0,2,"Hart, Mr. Benjamin",male,43,1,1,F.C.C. 13529,26.25,,S +316,1,3,"Nilsson, Miss. Helmina Josefina",female,26,0,0,347470,7.8542,,S +317,1,2,"Kantor, Mrs. Sinai (Miriam Sternin)",female,24,1,0,244367,26,,S +318,0,2,"Moraweck, Dr. Ernest",male,54,0,0,29011,14,,S +319,1,1,"Wick, Miss. Mary Natalie",female,31,0,2,36928,164.8667,C7,S +320,1,1,"Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",female,40,1,1,16966,134.5,E34,C +321,0,3,"Dennis, Mr. Samuel",male,22,0,0,A/5 21172,7.25,,S +322,0,3,"Danoff, Mr. Yoto",male,27,0,0,349219,7.8958,,S +323,1,2,"Slayter, Miss. Hilda Mary",female,30,0,0,234818,12.35,,Q +324,1,2,"Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)",female,22,1,1,248738,29,,S +325,0,3,"Sage, Mr. George John Jr",male,,8,2,CA. 2343,69.55,,S +326,1,1,"Young, Miss. Marie Grice",female,36,0,0,PC 17760,135.6333,C32,C +327,0,3,"Nysveen, Mr. Johan Hansen",male,61,0,0,345364,6.2375,,S +328,1,2,"Ball, Mrs. (Ada E Hall)",female,36,0,0,28551,13,D,S +329,1,3,"Goldsmith, Mrs. Frank John (Emily Alice Brown)",female,31,1,1,363291,20.525,,S +330,1,1,"Hippach, Miss. Jean Gertrude",female,16,0,1,111361,57.9792,B18,C +331,1,3,"McCoy, Miss. Agnes",female,,2,0,367226,23.25,,Q +332,0,1,"Partner, Mr. Austen",male,45.5,0,0,113043,28.5,C124,S +333,0,1,"Graham, Mr. 
George Edward",male,38,0,1,PC 17582,153.4625,C91,S +334,0,3,"Vander Planke, Mr. Leo Edmondus",male,16,2,0,345764,18,,S +335,1,1,"Frauenthal, Mrs. Henry William (Clara Heinsheimer)",female,,1,0,PC 17611,133.65,,S +336,0,3,"Denkoff, Mr. Mitto",male,,0,0,349225,7.8958,,S +337,0,1,"Pears, Mr. Thomas Clinton",male,29,1,0,113776,66.6,C2,S +338,1,1,"Burns, Miss. Elizabeth Margaret",female,41,0,0,16966,134.5,E40,C +339,1,3,"Dahl, Mr. Karl Edwart",male,45,0,0,7598,8.05,,S +340,0,1,"Blackwell, Mr. Stephen Weart",male,45,0,0,113784,35.5,T,S +341,1,2,"Navratil, Master. Edmond Roger",male,2,1,1,230080,26,F2,S +342,1,1,"Fortune, Miss. Alice Elizabeth",female,24,3,2,19950,263,C23 C25 C27,S +343,0,2,"Collander, Mr. Erik Gustaf",male,28,0,0,248740,13,,S +344,0,2,"Sedgwick, Mr. Charles Frederick Waddington",male,25,0,0,244361,13,,S +345,0,2,"Fox, Mr. Stanley Hubert",male,36,0,0,229236,13,,S +346,1,2,"Brown, Miss. Amelia ""Mildred""",female,24,0,0,248733,13,F33,S +347,1,2,"Smith, Miss. Marion Elsie",female,40,0,0,31418,13,,S +348,1,3,"Davison, Mrs. Thomas Henry (Mary E Finck)",female,,1,0,386525,16.1,,S +349,1,3,"Coutts, Master. William Loch ""William""",male,3,1,1,C.A. 37671,15.9,,S +350,0,3,"Dimic, Mr. Jovan",male,42,0,0,315088,8.6625,,S +351,0,3,"Odahl, Mr. Nils Martin",male,23,0,0,7267,9.225,,S +352,0,1,"Williams-Lambert, Mr. Fletcher Fellows",male,,0,0,113510,35,C128,S +353,0,3,"Elias, Mr. Tannous",male,15,1,1,2695,7.2292,,C +354,0,3,"Arnold-Franchi, Mr. Josef",male,25,1,0,349237,17.8,,S +355,0,3,"Yousif, Mr. Wazli",male,,0,0,2647,7.225,,C +356,0,3,"Vanden Steen, Mr. Leo Peter",male,28,0,0,345783,9.5,,S +357,1,1,"Bowerman, Miss. Elsie Edith",female,22,0,1,113505,55,E33,S +358,0,2,"Funk, Miss. Annie Clemmer",female,38,0,0,237671,13,,S +359,1,3,"McGovern, Miss. Mary",female,,0,0,330931,7.8792,,Q +360,1,3,"Mockler, Miss. Helen Mary ""Ellie""",female,,0,0,330980,7.8792,,Q +361,0,3,"Skoog, Mr. Wilhelm",male,40,1,4,347088,27.9,,S +362,0,2,"del Carlo, Mr. 
Sebastiano",male,29,1,0,SC/PARIS 2167,27.7208,,C +363,0,3,"Barbara, Mrs. (Catherine David)",female,45,0,1,2691,14.4542,,C +364,0,3,"Asim, Mr. Adola",male,35,0,0,SOTON/O.Q. 3101310,7.05,,S +365,0,3,"O'Brien, Mr. Thomas",male,,1,0,370365,15.5,,Q +366,0,3,"Adahl, Mr. Mauritz Nils Martin",male,30,0,0,C 7076,7.25,,S +367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60,1,0,110813,75.25,D37,C +368,1,3,"Moussa, Mrs. (Mantoura Boulos)",female,,0,0,2626,7.2292,,C +369,1,3,"Jermyn, Miss. Annie",female,,0,0,14313,7.75,,Q +370,1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3,B35,C +371,1,1,"Harder, Mr. George Achilles",male,25,1,0,11765,55.4417,E50,C +372,0,3,"Wiklund, Mr. Jakob Alfred",male,18,1,0,3101267,6.4958,,S +373,0,3,"Beavan, Mr. William Thomas",male,19,0,0,323951,8.05,,S +374,0,1,"Ringhini, Mr. Sante",male,22,0,0,PC 17760,135.6333,,C +375,0,3,"Palsson, Miss. Stina Viola",female,3,3,1,349909,21.075,,S +376,1,1,"Meyer, Mrs. Edgar Joseph (Leila Saks)",female,,1,0,PC 17604,82.1708,,C +377,1,3,"Landergren, Miss. Aurora Adelia",female,22,0,0,C 7077,7.25,,S +378,0,1,"Widener, Mr. Harry Elkins",male,27,0,2,113503,211.5,C82,C +379,0,3,"Betros, Mr. Tannous",male,20,0,0,2648,4.0125,,C +380,0,3,"Gustafsson, Mr. Karl Gideon",male,19,0,0,347069,7.775,,S +381,1,1,"Bidois, Miss. Rosalie",female,42,0,0,PC 17757,227.525,,C +382,1,3,"Nakid, Miss. Maria (""Mary"")",female,1,0,2,2653,15.7417,,C +383,0,3,"Tikkanen, Mr. Juho",male,32,0,0,STON/O 2. 3101293,7.925,,S +384,1,1,"Holverson, Mrs. Alexander Oskar (Mary Aline Towner)",female,35,1,0,113789,52,,S +385,0,3,"Plotcharsky, Mr. Vasil",male,,0,0,349227,7.8958,,S +386,0,2,"Davies, Mr. Charles Henry",male,18,0,0,S.O.C. 14879,73.5,,S +387,0,3,"Goodwin, Master. Sidney Leonard",male,1,5,2,CA 2144,46.9,,S +388,1,2,"Buss, Miss. Kate",female,36,0,0,27849,13,,S +389,0,3,"Sadlier, Mr. Matthew",male,,0,0,367655,7.7292,,Q +390,1,2,"Lehmann, Miss. Bertha",female,17,0,0,SC 1748,12,,C +391,1,1,"Carter, Mr. 
William Ernest",male,36,1,2,113760,120,B96 B98,S +392,1,3,"Jansson, Mr. Carl Olof",male,21,0,0,350034,7.7958,,S +393,0,3,"Gustafsson, Mr. Johan Birger",male,28,2,0,3101277,7.925,,S +394,1,1,"Newell, Miss. Marjorie",female,23,1,0,35273,113.275,D36,C +395,1,3,"Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)",female,24,0,2,PP 9549,16.7,G6,S +396,0,3,"Johansson, Mr. Erik",male,22,0,0,350052,7.7958,,S +397,0,3,"Olsson, Miss. Elina",female,31,0,0,350407,7.8542,,S +398,0,2,"McKane, Mr. Peter David",male,46,0,0,28403,26,,S +399,0,2,"Pain, Dr. Alfred",male,23,0,0,244278,10.5,,S +400,1,2,"Trout, Mrs. William H (Jessie L)",female,28,0,0,240929,12.65,,S +401,1,3,"Niskanen, Mr. Juha",male,39,0,0,STON/O 2. 3101289,7.925,,S +402,0,3,"Adams, Mr. John",male,26,0,0,341826,8.05,,S +403,0,3,"Jussila, Miss. Mari Aina",female,21,1,0,4137,9.825,,S +404,0,3,"Hakkarainen, Mr. Pekka Pietari",male,28,1,0,STON/O2. 3101279,15.85,,S +405,0,3,"Oreskovic, Miss. Marija",female,20,0,0,315096,8.6625,,S +406,0,2,"Gale, Mr. Shadrach",male,34,1,0,28664,21,,S +407,0,3,"Widegren, Mr. Carl/Charles Peter",male,51,0,0,347064,7.75,,S +408,1,2,"Richards, Master. William Rowe",male,3,1,1,29106,18.75,,S +409,0,3,"Birkeland, Mr. Hans Martin Monsen",male,21,0,0,312992,7.775,,S +410,0,3,"Lefebre, Miss. Ida",female,,3,1,4133,25.4667,,S +411,0,3,"Sdycoff, Mr. Todor",male,,0,0,349222,7.8958,,S +412,0,3,"Hart, Mr. Henry",male,,0,0,394140,6.8583,,Q +413,1,1,"Minahan, Miss. Daisy E",female,33,1,0,19928,90,C78,Q +414,0,2,"Cunningham, Mr. Alfred Fleming",male,,0,0,239853,0,,S +415,1,3,"Sundman, Mr. Johan Julian",male,44,0,0,STON/O 2. 3101269,7.925,,S +416,0,3,"Meek, Mrs. Thomas (Annie Louise Rowley)",female,,0,0,343095,8.05,,S +417,1,2,"Drew, Mrs. James Vivian (Lulu Thorne Christian)",female,34,1,1,28220,32.5,,S +418,1,2,"Silven, Miss. Lyyli Karoliina",female,18,0,2,250652,13,,S +419,0,2,"Matthews, Mr. William John",male,30,0,0,28228,13,,S +420,0,3,"Van Impe, Miss. 
Catharina",female,10,0,2,345773,24.15,,S +421,0,3,"Gheorgheff, Mr. Stanio",male,,0,0,349254,7.8958,,C +422,0,3,"Charters, Mr. David",male,21,0,0,A/5. 13032,7.7333,,Q +423,0,3,"Zimmerman, Mr. Leo",male,29,0,0,315082,7.875,,S +424,0,3,"Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)",female,28,1,1,347080,14.4,,S +425,0,3,"Rosblom, Mr. Viktor Richard",male,18,1,1,370129,20.2125,,S +426,0,3,"Wiseman, Mr. Phillippe",male,,0,0,A/4. 34244,7.25,,S +427,1,2,"Clarke, Mrs. Charles V (Ada Maria Winfield)",female,28,1,0,2003,26,,S +428,1,2,"Phillips, Miss. Kate Florence (""Mrs Kate Louise Phillips Marshall"")",female,19,0,0,250655,26,,S +429,0,3,"Flynn, Mr. James",male,,0,0,364851,7.75,,Q +430,1,3,"Pickard, Mr. Berk (Berk Trembisky)",male,32,0,0,SOTON/O.Q. 392078,8.05,E10,S +431,1,1,"Bjornstrom-Steffansson, Mr. Mauritz Hakan",male,28,0,0,110564,26.55,C52,S +432,1,3,"Thorneycroft, Mrs. Percival (Florence Kate White)",female,,1,0,376564,16.1,,S +433,1,2,"Louch, Mrs. Charles Alexander (Alice Adelaide Slow)",female,42,1,0,SC/AH 3085,26,,S +434,0,3,"Kallio, Mr. Nikolai Erland",male,17,0,0,STON/O 2. 3101274,7.125,,S +435,0,1,"Silvey, Mr. William Baird",male,50,1,0,13507,55.9,E44,S +436,1,1,"Carter, Miss. Lucile Polk",female,14,1,2,113760,120,B96 B98,S +437,0,3,"Ford, Miss. Doolina Margaret ""Daisy""",female,21,2,2,W./C. 6608,34.375,,S +438,1,2,"Richards, Mrs. Sidney (Emily Hocking)",female,24,2,3,29106,18.75,,S +439,0,1,"Fortune, Mr. Mark",male,64,1,4,19950,263,C23 C25 C27,S +440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31,0,0,C.A. 18723,10.5,,S +441,1,2,"Hart, Mrs. Benjamin (Esther Ada Bloomfield)",female,45,1,1,F.C.C. 13529,26.25,,S +442,0,3,"Hampe, Mr. Leon",male,20,0,0,345769,9.5,,S +443,0,3,"Petterson, Mr. Johan Emil",male,25,1,0,347076,7.775,,S +444,1,2,"Reynaldo, Ms. Encarnacion",female,28,0,0,230434,13,,S +445,1,3,"Johannesen-Bratthammer, Mr. Bernt",male,,0,0,65306,8.1125,,S +446,1,1,"Dodge, Master. 
Washington",male,4,0,2,33638,81.8583,A34,S +447,1,2,"Mellinger, Miss. Madeleine Violet",female,13,0,1,250644,19.5,,S +448,1,1,"Seward, Mr. Frederic Kimber",male,34,0,0,113794,26.55,,S +449,1,3,"Baclini, Miss. Marie Catherine",female,5,2,1,2666,19.2583,,C +450,1,1,"Peuchen, Major. Arthur Godfrey",male,52,0,0,113786,30.5,C104,S +451,0,2,"West, Mr. Edwy Arthur",male,36,1,2,C.A. 34651,27.75,,S +452,0,3,"Hagland, Mr. Ingvald Olai Olsen",male,,1,0,65303,19.9667,,S +453,0,1,"Foreman, Mr. Benjamin Laventall",male,30,0,0,113051,27.75,C111,C +454,1,1,"Goldenberg, Mr. Samuel L",male,49,1,0,17453,89.1042,C92,C +455,0,3,"Peduzzi, Mr. Joseph",male,,0,0,A/5 2817,8.05,,S +456,1,3,"Jalsevac, Mr. Ivan",male,29,0,0,349240,7.8958,,C +457,0,1,"Millet, Mr. Francis Davis",male,65,0,0,13509,26.55,E38,S +458,1,1,"Kenyon, Mrs. Frederick R (Marion)",female,,1,0,17464,51.8625,D21,S +459,1,2,"Toomey, Miss. Ellen",female,50,0,0,F.C.C. 13531,10.5,,S +460,0,3,"O'Connor, Mr. Maurice",male,,0,0,371060,7.75,,Q +461,1,1,"Anderson, Mr. Harry",male,48,0,0,19952,26.55,E12,S +462,0,3,"Morley, Mr. William",male,34,0,0,364506,8.05,,S +463,0,1,"Gee, Mr. Arthur H",male,47,0,0,111320,38.5,E63,S +464,0,2,"Milling, Mr. Jacob Christian",male,48,0,0,234360,13,,S +465,0,3,"Maisner, Mr. Simon",male,,0,0,A/S 2816,8.05,,S +466,0,3,"Goncalves, Mr. Manuel Estanslas",male,38,0,0,SOTON/O.Q. 3101306,7.05,,S +467,0,2,"Campbell, Mr. William",male,,0,0,239853,0,,S +468,0,1,"Smart, Mr. John Montgomery",male,56,0,0,113792,26.55,,S +469,0,3,"Scanlan, Mr. James",male,,0,0,36209,7.725,,Q +470,1,3,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C +471,0,3,"Keefe, Mr. Arthur",male,,0,0,323592,7.25,,S +472,0,3,"Cacic, Mr. Luka",male,38,0,0,315089,8.6625,,S +473,1,2,"West, Mrs. Edwy Arthur (Ada Mary Worth)",female,33,1,2,C.A. 34651,27.75,,S +474,1,2,"Jerwan, Mrs. Amin S (Marie Marthe Thuillard)",female,23,0,0,SC/AH Basle 541,13.7917,D,C +475,0,3,"Strandberg, Miss. 
Ida Sofia",female,22,0,0,7553,9.8375,,S +476,0,1,"Clifford, Mr. George Quincy",male,,0,0,110465,52,A14,S +477,0,2,"Renouf, Mr. Peter Henry",male,34,1,0,31027,21,,S +478,0,3,"Braund, Mr. Lewis Richard",male,29,1,0,3460,7.0458,,S +479,0,3,"Karlsson, Mr. Nils August",male,22,0,0,350060,7.5208,,S +480,1,3,"Hirvonen, Miss. Hildur E",female,2,0,1,3101298,12.2875,,S +481,0,3,"Goodwin, Master. Harold Victor",male,9,5,2,CA 2144,46.9,,S +482,0,2,"Frost, Mr. Anthony Wood ""Archie""",male,,0,0,239854,0,,S +483,0,3,"Rouse, Mr. Richard Henry",male,50,0,0,A/5 3594,8.05,,S +484,1,3,"Turkula, Mrs. (Hedwig)",female,63,0,0,4134,9.5875,,S +485,1,1,"Bishop, Mr. Dickinson H",male,25,1,0,11967,91.0792,B49,C +486,0,3,"Lefebre, Miss. Jeannie",female,,3,1,4133,25.4667,,S +487,1,1,"Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)",female,35,1,0,19943,90,C93,S +488,0,1,"Kent, Mr. Edward Austin",male,58,0,0,11771,29.7,B37,C +489,0,3,"Somerton, Mr. Francis William",male,30,0,0,A.5. 18509,8.05,,S +490,1,3,"Coutts, Master. Eden Leslie ""Neville""",male,9,1,1,C.A. 37671,15.9,,S +491,0,3,"Hagland, Mr. Konrad Mathias Reiersen",male,,1,0,65304,19.9667,,S +492,0,3,"Windelov, Mr. Einar",male,21,0,0,SOTON/OQ 3101317,7.25,,S +493,0,1,"Molson, Mr. Harry Markland",male,55,0,0,113787,30.5,C30,S +494,0,1,"Artagaveytia, Mr. Ramon",male,71,0,0,PC 17609,49.5042,,C +495,0,3,"Stanley, Mr. Edward Roland",male,21,0,0,A/4 45380,8.05,,S +496,0,3,"Yousseff, Mr. Gerious",male,,0,0,2627,14.4583,,C +497,1,1,"Eustis, Miss. Elizabeth Mussey",female,54,1,0,36947,78.2667,D20,C +498,0,3,"Shellard, Mr. Frederick William",male,,0,0,C.A. 6212,15.1,,S +499,0,1,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,151.55,C22 C26,S +500,0,3,"Svensson, Mr. Olof",male,24,0,0,350035,7.7958,,S +501,0,3,"Calic, Mr. Petar",male,17,0,0,315086,8.6625,,S +502,0,3,"Canavan, Miss. Mary",female,21,0,0,364846,7.75,,Q +503,0,3,"O'Sullivan, Miss. Bridget Mary",female,,0,0,330909,7.6292,,Q +504,0,3,"Laitinen, Miss. 
Kristina Sofia",female,37,0,0,4135,9.5875,,S +505,1,1,"Maioni, Miss. Roberta",female,16,0,0,110152,86.5,B79,S +506,0,1,"Penasco y Castellana, Mr. Victor de Satode",male,18,1,0,PC 17758,108.9,C65,C +507,1,2,"Quick, Mrs. Frederick Charles (Jane Richards)",female,33,0,2,26360,26,,S +508,1,1,"Bradley, Mr. George (""George Arthur Brayton"")",male,,0,0,111427,26.55,,S +509,0,3,"Olsen, Mr. Henry Margido",male,28,0,0,C 4001,22.525,,S +510,1,3,"Lang, Mr. Fang",male,26,0,0,1601,56.4958,,S +511,1,3,"Daly, Mr. Eugene Patrick",male,29,0,0,382651,7.75,,Q +512,0,3,"Webber, Mr. James",male,,0,0,SOTON/OQ 3101316,8.05,,S +513,1,1,"McGough, Mr. James Robert",male,36,0,0,PC 17473,26.2875,E25,S +514,1,1,"Rothschild, Mrs. Martin (Elizabeth L. Barrett)",female,54,1,0,PC 17603,59.4,,C +515,0,3,"Coleff, Mr. Satio",male,24,0,0,349209,7.4958,,S +516,0,1,"Walker, Mr. William Anderson",male,47,0,0,36967,34.0208,D46,S +517,1,2,"Lemore, Mrs. (Amelia Milley)",female,34,0,0,C.A. 34260,10.5,F33,S +518,0,3,"Ryan, Mr. Patrick",male,,0,0,371110,24.15,,Q +519,1,2,"Angle, Mrs. William A (Florence ""Mary"" Agnes Hughes)",female,36,1,0,226875,26,,S +520,0,3,"Pavlovic, Mr. Stefo",male,32,0,0,349242,7.8958,,S +521,1,1,"Perreault, Miss. Anne",female,30,0,0,12749,93.5,B73,S +522,0,3,"Vovk, Mr. Janko",male,22,0,0,349252,7.8958,,S +523,0,3,"Lahoud, Mr. Sarkis",male,,0,0,2624,7.225,,C +524,1,1,"Hippach, Mrs. Louis Albert (Ida Sophia Fischer)",female,44,0,1,111361,57.9792,B18,C +525,0,3,"Kassem, Mr. Fared",male,,0,0,2700,7.2292,,C +526,0,3,"Farrell, Mr. James",male,40.5,0,0,367232,7.75,,Q +527,1,2,"Ridsdale, Miss. Lucy",female,50,0,0,W./C. 14258,10.5,,S +528,0,1,"Farthing, Mr. John",male,,0,0,PC 17483,221.7792,C95,S +529,0,3,"Salonen, Mr. Johan Werner",male,39,0,0,3101296,7.925,,S +530,0,2,"Hocking, Mr. Richard George",male,23,2,1,29104,11.5,,S +531,1,2,"Quick, Miss. Phyllis May",female,2,1,1,26360,26,,S +532,0,3,"Toufik, Mr. Nakli",male,,0,0,2641,7.2292,,C +533,0,3,"Elias, Mr. 
Joseph Jr",male,17,1,1,2690,7.2292,,C +534,1,3,"Peter, Mrs. Catherine (Catherine Rizk)",female,,0,2,2668,22.3583,,C +535,0,3,"Cacic, Miss. Marija",female,30,0,0,315084,8.6625,,S +536,1,2,"Hart, Miss. Eva Miriam",female,7,0,2,F.C.C. 13529,26.25,,S +537,0,1,"Butt, Major. Archibald Willingham",male,45,0,0,113050,26.55,B38,S +538,1,1,"LeRoy, Miss. Bertha",female,30,0,0,PC 17761,106.425,,C +539,0,3,"Risien, Mr. Samuel Beard",male,,0,0,364498,14.5,,S +540,1,1,"Frolicher, Miss. Hedwig Margaritha",female,22,0,2,13568,49.5,B39,C +541,1,1,"Crosby, Miss. Harriet R",female,36,0,2,WE/P 5735,71,B22,S +542,0,3,"Andersson, Miss. Ingeborg Constanzia",female,9,4,2,347082,31.275,,S +543,0,3,"Andersson, Miss. Sigrid Elisabeth",female,11,4,2,347082,31.275,,S +544,1,2,"Beane, Mr. Edward",male,32,1,0,2908,26,,S +545,0,1,"Douglas, Mr. Walter Donald",male,50,1,0,PC 17761,106.425,C86,C +546,0,1,"Nicholson, Mr. Arthur Ernest",male,64,0,0,693,26,,S +547,1,2,"Beane, Mrs. Edward (Ethel Clarke)",female,19,1,0,2908,26,,S +548,1,2,"Padro y Manent, Mr. Julian",male,,0,0,SC/PARIS 2146,13.8625,,C +549,0,3,"Goldsmith, Mr. Frank John",male,33,1,1,363291,20.525,,S +550,1,2,"Davies, Master. John Morgan Jr",male,8,1,1,C.A. 33112,36.75,,S +551,1,1,"Thayer, Mr. John Borland Jr",male,17,0,2,17421,110.8833,C70,C +552,0,2,"Sharp, Mr. Percival James R",male,27,0,0,244358,26,,S +553,0,3,"O'Brien, Mr. Timothy",male,,0,0,330979,7.8292,,Q +554,1,3,"Leeni, Mr. Fahim (""Philip Zenni"")",male,22,0,0,2620,7.225,,C +555,1,3,"Ohman, Miss. Velin",female,22,0,0,347085,7.775,,S +556,0,1,"Wright, Mr. George",male,62,0,0,113807,26.55,,S +557,1,1,"Duff Gordon, Lady. (Lucille Christiana Sutherland) (""Mrs Morgan"")",female,48,1,0,11755,39.6,A16,C +558,0,1,"Robbins, Mr. Victor",male,,0,0,PC 17757,227.525,,C +559,1,1,"Taussig, Mrs. Emil (Tillie Mandelbaum)",female,39,1,1,110413,79.65,E67,S +560,1,3,"de Messemaeker, Mrs. Guillaume Joseph (Emma)",female,36,1,0,345572,17.4,,S +561,0,3,"Morrow, Mr. 
Thomas Rowan",male,,0,0,372622,7.75,,Q +562,0,3,"Sivic, Mr. Husein",male,40,0,0,349251,7.8958,,S +563,0,2,"Norman, Mr. Robert Douglas",male,28,0,0,218629,13.5,,S +564,0,3,"Simmons, Mr. John",male,,0,0,SOTON/OQ 392082,8.05,,S +565,0,3,"Meanwell, Miss. (Marion Ogden)",female,,0,0,SOTON/O.Q. 392087,8.05,,S +566,0,3,"Davies, Mr. Alfred J",male,24,2,0,A/4 48871,24.15,,S +567,0,3,"Stoytcheff, Mr. Ilia",male,19,0,0,349205,7.8958,,S +568,0,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29,0,4,349909,21.075,,S +569,0,3,"Doharr, Mr. Tannous",male,,0,0,2686,7.2292,,C +570,1,3,"Jonsson, Mr. Carl",male,32,0,0,350417,7.8542,,S +571,1,2,"Harris, Mr. George",male,62,0,0,S.W./PP 752,10.5,,S +572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53,2,0,11769,51.4792,C101,S +573,1,1,"Flynn, Mr. John Irwin (""Irving"")",male,36,0,0,PC 17474,26.3875,E25,S +574,1,3,"Kelly, Miss. Mary",female,,0,0,14312,7.75,,Q +575,0,3,"Rush, Mr. Alfred George John",male,16,0,0,A/4. 20589,8.05,,S +576,0,3,"Patchett, Mr. George",male,19,0,0,358585,14.5,,S +577,1,2,"Garside, Miss. Ethel",female,34,0,0,243880,13,,S +578,1,1,"Silvey, Mrs. William Baird (Alice Munger)",female,39,1,0,13507,55.9,E44,S +579,0,3,"Caram, Mrs. Joseph (Maria Elias)",female,,1,0,2689,14.4583,,C +580,1,3,"Jussila, Mr. Eiriik",male,32,0,0,STON/O 2. 3101286,7.925,,S +581,1,2,"Christy, Miss. Julie Rachel",female,25,1,1,237789,30,,S +582,1,1,"Thayer, Mrs. John Borland (Marian Longstreth Morris)",female,39,1,1,17421,110.8833,C68,C +583,0,2,"Downton, Mr. William James",male,54,0,0,28403,26,,S +584,0,1,"Ross, Mr. John Hugo",male,36,0,0,13049,40.125,A10,C +585,0,3,"Paulner, Mr. Uscher",male,,0,0,3411,8.7125,,C +586,1,1,"Taussig, Miss. Ruth",female,18,0,2,110413,79.65,E68,S +587,0,2,"Jarvis, Mr. John Denzil",male,47,0,0,237565,15,,S +588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60,1,1,13567,79.2,B41,C +589,0,3,"Gilinski, Mr. Eliezer",male,22,0,0,14973,8.05,,S +590,0,3,"Murdlin, Mr. Joseph",male,,0,0,A./5. 
3235,8.05,,S +591,0,3,"Rintamaki, Mr. Matti",male,35,0,0,STON/O 2. 3101273,7.125,,S +592,1,1,"Stephenson, Mrs. Walter Bertram (Martha Eustis)",female,52,1,0,36947,78.2667,D20,C +593,0,3,"Elsbury, Mr. William James",male,47,0,0,A/5 3902,7.25,,S +594,0,3,"Bourke, Miss. Mary",female,,0,2,364848,7.75,,Q +595,0,2,"Chapman, Mr. John Henry",male,37,1,0,SC/AH 29037,26,,S +596,0,3,"Van Impe, Mr. Jean Baptiste",male,36,1,1,345773,24.15,,S +597,1,2,"Leitch, Miss. Jessie Wills",female,,0,0,248727,33,,S +598,0,3,"Johnson, Mr. Alfred",male,49,0,0,LINE,0,,S +599,0,3,"Boulos, Mr. Hanna",male,,0,0,2664,7.225,,C +600,1,1,"Duff Gordon, Sir. Cosmo Edmund (""Mr Morgan"")",male,49,1,0,PC 17485,56.9292,A20,C +601,1,2,"Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)",female,24,2,1,243847,27,,S +602,0,3,"Slabenoff, Mr. Petco",male,,0,0,349214,7.8958,,S +603,0,1,"Harrington, Mr. Charles H",male,,0,0,113796,42.4,,S +604,0,3,"Torber, Mr. Ernst William",male,44,0,0,364511,8.05,,S +605,1,1,"Homer, Mr. Harry (""Mr E Haven"")",male,35,0,0,111426,26.55,,C +606,0,3,"Lindell, Mr. Edvard Bengtsson",male,36,1,0,349910,15.55,,S +607,0,3,"Karaic, Mr. Milan",male,30,0,0,349246,7.8958,,S +608,1,1,"Daniel, Mr. Robert Williams",male,27,0,0,113804,30.5,,S +609,1,2,"Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)",female,22,1,2,SC/Paris 2123,41.5792,,C +610,1,1,"Shutes, Miss. Elizabeth W",female,40,0,0,PC 17582,153.4625,C125,S +611,0,3,"Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)",female,39,1,5,347082,31.275,,S +612,0,3,"Jardin, Mr. Jose Neto",male,,0,0,SOTON/O.Q. 3101305,7.05,,S +613,1,3,"Murphy, Miss. Margaret Jane",female,,1,0,367230,15.5,,Q +614,0,3,"Horgan, Mr. John",male,,0,0,370377,7.75,,Q +615,0,3,"Brocklebank, Mr. William Alfred",male,35,0,0,364512,8.05,,S +616,1,2,"Herman, Miss. Alice",female,24,1,2,220845,65,,S +617,0,3,"Danbom, Mr. Ernst Gilbert",male,34,1,1,347080,14.4,,S +618,0,3,"Lobb, Mrs. William Arthur (Cordelia K Stanlick)",female,26,1,0,A/5. 
3336,16.1,,S +619,1,2,"Becker, Miss. Marion Louise",female,4,2,1,230136,39,F4,S +620,0,2,"Gavey, Mr. Lawrence",male,26,0,0,31028,10.5,,S +621,0,3,"Yasbeck, Mr. Antoni",male,27,1,0,2659,14.4542,,C +622,1,1,"Kimball, Mr. Edwin Nelson Jr",male,42,1,0,11753,52.5542,D19,S +623,1,3,"Nakid, Mr. Sahid",male,20,1,1,2653,15.7417,,C +624,0,3,"Hansen, Mr. Henry Damsgaard",male,21,0,0,350029,7.8542,,S +625,0,3,"Bowen, Mr. David John ""Dai""",male,21,0,0,54636,16.1,,S +626,0,1,"Sutton, Mr. Frederick",male,61,0,0,36963,32.3208,D50,S +627,0,2,"Kirkland, Rev. Charles Leonard",male,57,0,0,219533,12.35,,Q +628,1,1,"Longley, Miss. Gretchen Fiske",female,21,0,0,13502,77.9583,D9,S +629,0,3,"Bostandyeff, Mr. Guentcho",male,26,0,0,349224,7.8958,,S +630,0,3,"O'Connell, Mr. Patrick D",male,,0,0,334912,7.7333,,Q +631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80,0,0,27042,30,A23,S +632,0,3,"Lundahl, Mr. Johan Svensson",male,51,0,0,347743,7.0542,,S +633,1,1,"Stahelin-Maeglin, Dr. Max",male,32,0,0,13214,30.5,B50,C +634,0,1,"Parr, Mr. William Henry Marsh",male,,0,0,112052,0,,S +635,0,3,"Skoog, Miss. Mabel",female,9,3,2,347088,27.9,,S +636,1,2,"Davis, Miss. Mary",female,28,0,0,237668,13,,S +637,0,3,"Leinonen, Mr. Antti Gustaf",male,32,0,0,STON/O 2. 3101292,7.925,,S +638,0,2,"Collyer, Mr. Harvey",male,31,1,1,C.A. 31921,26.25,,S +639,0,3,"Panula, Mrs. Juha (Maria Emilia Ojala)",female,41,0,5,3101295,39.6875,,S +640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1,,S +641,0,3,"Jensen, Mr. Hans Peder",male,20,0,0,350050,7.8542,,S +642,1,1,"Sagesser, Mlle. Emma",female,24,0,0,PC 17477,69.3,B35,C +643,0,3,"Skoog, Miss. Margit Elizabeth",female,2,3,2,347088,27.9,,S +644,1,3,"Foo, Mr. Choong",male,,0,0,1601,56.4958,,S +645,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C +646,1,1,"Harper, Mr. Henry Sleeper",male,48,1,0,PC 17572,76.7292,D33,C +647,0,3,"Cor, Mr. Liudevit",male,19,0,0,349231,7.8958,,S +648,1,1,"Simonius-Blumer, Col. 
Oberst Alfons",male,56,0,0,13213,35.5,A26,C +649,0,3,"Willey, Mr. Edward",male,,0,0,S.O./P.P. 751,7.55,,S +650,1,3,"Stanley, Miss. Amy Zillah Elsie",female,23,0,0,CA. 2314,7.55,,S +651,0,3,"Mitkoff, Mr. Mito",male,,0,0,349221,7.8958,,S +652,1,2,"Doling, Miss. Elsie",female,18,0,1,231919,23,,S +653,0,3,"Kalvik, Mr. Johannes Halvorsen",male,21,0,0,8475,8.4333,,S +654,1,3,"O'Leary, Miss. Hanora ""Norah""",female,,0,0,330919,7.8292,,Q +655,0,3,"Hegarty, Miss. Hanora ""Nora""",female,18,0,0,365226,6.75,,Q +656,0,2,"Hickman, Mr. Leonard Mark",male,24,2,0,S.O.C. 14879,73.5,,S +657,0,3,"Radeff, Mr. Alexander",male,,0,0,349223,7.8958,,S +658,0,3,"Bourke, Mrs. John (Catherine)",female,32,1,1,364849,15.5,,Q +659,0,2,"Eitemiller, Mr. George Floyd",male,23,0,0,29751,13,,S +660,0,1,"Newell, Mr. Arthur Webster",male,58,0,2,35273,113.275,D48,C +661,1,1,"Frauenthal, Dr. Henry William",male,50,2,0,PC 17611,133.65,,S +662,0,3,"Badt, Mr. Mohamed",male,40,0,0,2623,7.225,,C +663,0,1,"Colley, Mr. Edward Pomeroy",male,47,0,0,5727,25.5875,E58,S +664,0,3,"Coleff, Mr. Peju",male,36,0,0,349210,7.4958,,S +665,1,3,"Lindqvist, Mr. Eino William",male,20,1,0,STON/O 2. 3101285,7.925,,S +666,0,2,"Hickman, Mr. Lewis",male,32,2,0,S.O.C. 14879,73.5,,S +667,0,2,"Butler, Mr. Reginald Fenton",male,25,0,0,234686,13,,S +668,0,3,"Rommetvedt, Mr. Knud Paust",male,,0,0,312993,7.775,,S +669,0,3,"Cook, Mr. Jacob",male,43,0,0,A/5 3536,8.05,,S +670,1,1,"Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)",female,,1,0,19996,52,C126,S +671,1,2,"Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)",female,40,1,1,29750,39,,S +672,0,1,"Davidson, Mr. Thornton",male,31,1,0,F.C. 12750,52,B71,S +673,0,2,"Mitchell, Mr. Henry Michael",male,70,0,0,C.A. 24580,10.5,,S +674,1,2,"Wilhelms, Mr. Charles",male,31,0,0,244270,13,,S +675,0,2,"Watson, Mr. Ennis Hastings",male,,0,0,239856,0,,S +676,0,3,"Edvardsson, Mr. Gustaf Hjalmar",male,18,0,0,349912,7.775,,S +677,0,3,"Sawyer, Mr. 
Frederick Charles",male,24.5,0,0,342826,8.05,,S +678,1,3,"Turja, Miss. Anna Sofia",female,18,0,0,4138,9.8417,,S +679,0,3,"Goodwin, Mrs. Frederick (Augusta Tyler)",female,43,1,6,CA 2144,46.9,,S +680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36,0,1,PC 17755,512.3292,B51 B53 B55,C +681,0,3,"Peters, Miss. Katie",female,,0,0,330935,8.1375,,Q +682,1,1,"Hassab, Mr. Hammad",male,27,0,0,PC 17572,76.7292,D49,C +683,0,3,"Olsvigen, Mr. Thor Anderson",male,20,0,0,6563,9.225,,S +684,0,3,"Goodwin, Mr. Charles Edward",male,14,5,2,CA 2144,46.9,,S +685,0,2,"Brown, Mr. Thomas William Solomon",male,60,1,1,29750,39,,S +686,0,2,"Laroche, Mr. Joseph Philippe Lemercier",male,25,1,2,SC/Paris 2123,41.5792,,C +687,0,3,"Panula, Mr. Jaako Arnold",male,14,4,1,3101295,39.6875,,S +688,0,3,"Dakic, Mr. Branko",male,19,0,0,349228,10.1708,,S +689,0,3,"Fischer, Mr. Eberhard Thelander",male,18,0,0,350036,7.7958,,S +690,1,1,"Madill, Miss. Georgette Alexandra",female,15,0,1,24160,211.3375,B5,S +691,1,1,"Dick, Mr. Albert Adrian",male,31,1,0,17474,57,B20,S +692,1,3,"Karun, Miss. Manca",female,4,0,1,349256,13.4167,,C +693,1,3,"Lam, Mr. Ali",male,,0,0,1601,56.4958,,S +694,0,3,"Saad, Mr. Khalil",male,25,0,0,2672,7.225,,C +695,0,1,"Weir, Col. John",male,60,0,0,113800,26.55,,S +696,0,2,"Chapman, Mr. Charles Henry",male,52,0,0,248731,13.5,,S +697,0,3,"Kelly, Mr. James",male,44,0,0,363592,8.05,,S +698,1,3,"Mullens, Miss. Katherine ""Katie""",female,,0,0,35852,7.7333,,Q +699,0,1,"Thayer, Mr. John Borland",male,49,1,1,17421,110.8833,C68,C +700,0,3,"Humblen, Mr. Adolf Mathias Nicolai Olsen",male,42,0,0,348121,7.65,F G63,S +701,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18,1,0,PC 17757,227.525,C62 C64,C +702,1,1,"Silverthorne, Mr. Spencer Victor",male,35,0,0,PC 17475,26.2875,E24,S +703,0,3,"Barbara, Miss. Saiide",female,18,0,1,2691,14.4542,,C +704,0,3,"Gallagher, Mr. Martin",male,25,0,0,36864,7.7417,,Q +705,0,3,"Hansen, Mr. Henrik Juul",male,26,1,0,350025,7.8542,,S +706,0,2,"Morley, Mr. 
Henry Samuel (""Mr Henry Marshall"")",male,39,0,0,250655,26,,S +707,1,2,"Kelly, Mrs. Florence ""Fannie""",female,45,0,0,223596,13.5,,S +708,1,1,"Calderhead, Mr. Edward Pennington",male,42,0,0,PC 17476,26.2875,E24,S +709,1,1,"Cleaver, Miss. Alice",female,22,0,0,113781,151.55,,S +710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C +711,1,1,"Mayne, Mlle. Berthe Antonine (""Mrs de Villiers"")",female,24,0,0,PC 17482,49.5042,C90,C +712,0,1,"Klaber, Mr. Herman",male,,0,0,113028,26.55,C124,S +713,1,1,"Taylor, Mr. Elmer Zebley",male,48,1,0,19996,52,C126,S +714,0,3,"Larsson, Mr. August Viktor",male,29,0,0,7545,9.4833,,S +715,0,2,"Greenberg, Mr. Samuel",male,52,0,0,250647,13,,S +716,0,3,"Soholt, Mr. Peter Andreas Lauritz Andersen",male,19,0,0,348124,7.65,F G73,S +717,1,1,"Endres, Miss. Caroline Louise",female,38,0,0,PC 17757,227.525,C45,C +718,1,2,"Troutt, Miss. Edwina Celia ""Winnie""",female,27,0,0,34218,10.5,E101,S +719,0,3,"McEvoy, Mr. Michael",male,,0,0,36568,15.5,,Q +720,0,3,"Johnson, Mr. Malkolm Joackim",male,33,0,0,347062,7.775,,S +721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6,0,1,248727,33,,S +722,0,3,"Jensen, Mr. Svend Lauritz",male,17,1,0,350048,7.0542,,S +723,0,2,"Gillespie, Mr. William Henry",male,34,0,0,12233,13,,S +724,0,2,"Hodges, Mr. Henry Price",male,50,0,0,250643,13,,S +725,1,1,"Chambers, Mr. Norman Campbell",male,27,1,0,113806,53.1,E8,S +726,0,3,"Oreskovic, Mr. Luka",male,20,0,0,315094,8.6625,,S +727,1,2,"Renouf, Mrs. Peter Henry (Lillian Jefferys)",female,30,3,0,31027,21,,S +728,1,3,"Mannion, Miss. Margareth",female,,0,0,36866,7.7375,,Q +729,0,2,"Bryhl, Mr. Kurt Arnold Gottfrid",male,25,1,0,236853,26,,S +730,0,3,"Ilmakangas, Miss. Pieta Sofia",female,25,1,0,STON/O2. 3101271,7.925,,S +731,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375,B5,S +732,0,3,"Hassan, Mr. Houssein G N",male,11,0,0,2699,18.7875,,C +733,0,2,"Knight, Mr. Robert J",male,,0,0,239855,0,,S +734,0,2,"Berriman, Mr. 
William John",male,23,0,0,28425,13,,S +735,0,2,"Troupiansky, Mr. Moses Aaron",male,23,0,0,233639,13,,S +736,0,3,"Williams, Mr. Leslie",male,28.5,0,0,54636,16.1,,S +737,0,3,"Ford, Mrs. Edward (Margaret Ann Watson)",female,48,1,3,W./C. 6608,34.375,,S +738,1,1,"Lesurer, Mr. Gustave J",male,35,0,0,PC 17755,512.3292,B101,C +739,0,3,"Ivanoff, Mr. Kanio",male,,0,0,349201,7.8958,,S +740,0,3,"Nankoff, Mr. Minko",male,,0,0,349218,7.8958,,S +741,1,1,"Hawksford, Mr. Walter James",male,,0,0,16988,30,D45,S +742,0,1,"Cavendish, Mr. Tyrell William",male,36,1,0,19877,78.85,C46,S +743,1,1,"Ryerson, Miss. Susan Parker ""Suzette""",female,21,2,2,PC 17608,262.375,B57 B59 B63 B66,C +744,0,3,"McNamee, Mr. Neal",male,24,1,0,376566,16.1,,S +745,1,3,"Stranden, Mr. Juho",male,31,0,0,STON/O 2. 3101288,7.925,,S +746,0,1,"Crosby, Capt. Edward Gifford",male,70,1,1,WE/P 5735,71,B22,S +747,0,3,"Abbott, Mr. Rossmore Edward",male,16,1,1,C.A. 2673,20.25,,S +748,1,2,"Sinkkonen, Miss. Anna",female,30,0,0,250648,13,,S +749,0,1,"Marvin, Mr. Daniel Warner",male,19,1,0,113773,53.1,D30,S +750,0,3,"Connaghton, Mr. Michael",male,31,0,0,335097,7.75,,Q +751,1,2,"Wells, Miss. Joan",female,4,1,1,29103,23,,S +752,1,3,"Moor, Master. Meier",male,6,0,1,392096,12.475,E121,S +753,0,3,"Vande Velde, Mr. Johannes Joseph",male,33,0,0,345780,9.5,,S +754,0,3,"Jonkoff, Mr. Lalio",male,23,0,0,349204,7.8958,,S +755,1,2,"Herman, Mrs. Samuel (Jane Laver)",female,48,1,2,220845,65,,S +756,1,2,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S +757,0,3,"Carlsson, Mr. August Sigfrid",male,28,0,0,350042,7.7958,,S +758,0,2,"Bailey, Mr. Percy Andrew",male,18,0,0,29108,11.5,,S +759,0,3,"Theobald, Mr. Thomas Leonard",male,34,0,0,363294,8.05,,S +760,1,1,"Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",female,33,0,0,110152,86.5,B77,S +761,0,3,"Garfirth, Mr. John",male,,0,0,358585,14.5,,S +762,0,3,"Nirva, Mr. Iisakki Antino Aijo",male,41,0,0,SOTON/O2 3101272,7.125,,S +763,1,3,"Barah, Mr. 
Hanna Assi",male,20,0,0,2663,7.2292,,C +764,1,1,"Carter, Mrs. William Ernest (Lucile Polk)",female,36,1,2,113760,120,B96 B98,S +765,0,3,"Eklund, Mr. Hans Linus",male,16,0,0,347074,7.775,,S +766,1,1,"Hogeboom, Mrs. John C (Anna Andrews)",female,51,1,0,13502,77.9583,D11,S +767,0,1,"Brewe, Dr. Arthur Jackson",male,,0,0,112379,39.6,,C +768,0,3,"Mangan, Miss. Mary",female,30.5,0,0,364850,7.75,,Q +769,0,3,"Moran, Mr. Daniel J",male,,1,0,371110,24.15,,Q +770,0,3,"Gronnestad, Mr. Daniel Danielsen",male,32,0,0,8471,8.3625,,S +771,0,3,"Lievens, Mr. Rene Aime",male,24,0,0,345781,9.5,,S +772,0,3,"Jensen, Mr. Niels Peder",male,48,0,0,350047,7.8542,,S +773,0,2,"Mack, Mrs. (Mary)",female,57,0,0,S.O./P.P. 3,10.5,E77,S +774,0,3,"Elias, Mr. Dibo",male,,0,0,2674,7.225,,C +775,1,2,"Hocking, Mrs. Elizabeth (Eliza Needs)",female,54,1,3,29105,23,,S +776,0,3,"Myhrman, Mr. Pehr Fabian Oliver Malkolm",male,18,0,0,347078,7.75,,S +777,0,3,"Tobin, Mr. Roger",male,,0,0,383121,7.75,F38,Q +778,1,3,"Emanuel, Miss. Virginia Ethel",female,5,0,0,364516,12.475,,S +779,0,3,"Kilgannon, Mr. Thomas J",male,,0,0,36865,7.7375,,Q +780,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)",female,43,0,1,24160,211.3375,B3,S +781,1,3,"Ayoub, Miss. Banoura",female,13,0,0,2687,7.2292,,C +782,1,1,"Dick, Mrs. Albert Adrian (Vera Gillespie)",female,17,1,0,17474,57,B20,S +783,0,1,"Long, Mr. Milton Clyde",male,29,0,0,113501,30,D6,S +784,0,3,"Johnston, Mr. Andrew G",male,,1,2,W./C. 6607,23.45,,S +785,0,3,"Ali, Mr. William",male,25,0,0,SOTON/O.Q. 3101312,7.05,,S +786,0,3,"Harmer, Mr. Abraham (David Lishin)",male,25,0,0,374887,7.25,,S +787,1,3,"Sjoblom, Miss. Anna Sofia",female,18,0,0,3101265,7.4958,,S +788,0,3,"Rice, Master. George Hugh",male,8,4,1,382652,29.125,,Q +789,1,3,"Dean, Master. Bertram Vere",male,1,1,2,C.A. 2315,20.575,,S +790,0,1,"Guggenheim, Mr. Benjamin",male,46,0,0,PC 17593,79.2,B82 B84,C +791,0,3,"Keane, Mr. Andrew ""Andy""",male,,0,0,12460,7.75,,Q +792,0,2,"Gaskell, Mr. 
Alfred",male,16,0,0,239865,26,,S +793,0,3,"Sage, Miss. Stella Anna",female,,8,2,CA. 2343,69.55,,S +794,0,1,"Hoyt, Mr. William Fisher",male,,0,0,PC 17600,30.6958,,C +795,0,3,"Dantcheff, Mr. Ristiu",male,25,0,0,349203,7.8958,,S +796,0,2,"Otter, Mr. Richard",male,39,0,0,28213,13,,S +797,1,1,"Leader, Dr. Alice (Farnham)",female,49,0,0,17465,25.9292,D17,S +798,1,3,"Osman, Mrs. Mara",female,31,0,0,349244,8.6833,,S +799,0,3,"Ibrahim Shawah, Mr. Yousseff",male,30,0,0,2685,7.2292,,C +800,0,3,"Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)",female,30,1,1,345773,24.15,,S +801,0,2,"Ponesell, Mr. Martin",male,34,0,0,250647,13,,S +802,1,2,"Collyer, Mrs. Harvey (Charlotte Annie Tate)",female,31,1,1,C.A. 31921,26.25,,S +803,1,1,"Carter, Master. William Thornton II",male,11,1,2,113760,120,B96 B98,S +804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C +805,1,3,"Hedman, Mr. Oskar Arvid",male,27,0,0,347089,6.975,,S +806,0,3,"Johansson, Mr. Karl Johan",male,31,0,0,347063,7.775,,S +807,0,1,"Andrews, Mr. Thomas Jr",male,39,0,0,112050,0,A36,S +808,0,3,"Pettersson, Miss. Ellen Natalia",female,18,0,0,347087,7.775,,S +809,0,2,"Meyer, Mr. August",male,39,0,0,248723,13,,S +810,1,1,"Chambers, Mrs. Norman Campbell (Bertha Griggs)",female,33,1,0,113806,53.1,E8,S +811,0,3,"Alexander, Mr. William",male,26,0,0,3474,7.8875,,S +812,0,3,"Lester, Mr. James",male,39,0,0,A/4 48871,24.15,,S +813,0,2,"Slemen, Mr. Richard James",male,35,0,0,28206,10.5,,S +814,0,3,"Andersson, Miss. Ebba Iris Alfrida",female,6,4,2,347082,31.275,,S +815,0,3,"Tomlin, Mr. Ernest Portage",male,30.5,0,0,364499,8.05,,S +816,0,1,"Fry, Mr. Richard",male,,0,0,112058,0,B102,S +817,0,3,"Heininen, Miss. Wendla Maria",female,23,0,0,STON/O2. 3101290,7.925,,S +818,0,2,"Mallet, Mr. Albert",male,31,1,1,S.C./PARIS 2079,37.0042,,C +819,0,3,"Holm, Mr. John Fredrik Alexander",male,43,0,0,C 7075,6.45,,S +820,0,3,"Skoog, Master. Karl Thorsten",male,10,3,2,347088,27.9,,S +821,1,1,"Hays, Mrs. 
Charles Melville (Clara Jennings Gregg)",female,52,1,1,12749,93.5,B69,S +822,1,3,"Lulic, Mr. Nikola",male,27,0,0,315098,8.6625,,S +823,0,1,"Reuchlin, Jonkheer. John George",male,38,0,0,19972,0,,S +824,1,3,"Moor, Mrs. (Beila)",female,27,0,1,392096,12.475,E121,S +825,0,3,"Panula, Master. Urho Abraham",male,2,4,1,3101295,39.6875,,S +826,0,3,"Flynn, Mr. John",male,,0,0,368323,6.95,,Q +827,0,3,"Lam, Mr. Len",male,,0,0,1601,56.4958,,S +828,1,2,"Mallet, Master. Andre",male,1,0,2,S.C./PARIS 2079,37.0042,,C +829,1,3,"McCormack, Mr. Thomas Joseph",male,,0,0,367228,7.75,,Q +830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62,0,0,113572,80,B28, +831,1,3,"Yasbeck, Mrs. Antoni (Selini Alexander)",female,15,1,0,2659,14.4542,,C +832,1,2,"Richards, Master. George Sibley",male,0.83,1,1,29106,18.75,,S +833,0,3,"Saad, Mr. Amin",male,,0,0,2671,7.2292,,C +834,0,3,"Augustsson, Mr. Albert",male,23,0,0,347468,7.8542,,S +835,0,3,"Allum, Mr. Owen George",male,18,0,0,2223,8.3,,S +836,1,1,"Compton, Miss. Sara Rebecca",female,39,1,1,PC 17756,83.1583,E49,C +837,0,3,"Pasic, Mr. Jakob",male,21,0,0,315097,8.6625,,S +838,0,3,"Sirota, Mr. Maurice",male,,0,0,392092,8.05,,S +839,1,3,"Chip, Mr. Chang",male,32,0,0,1601,56.4958,,S +840,1,1,"Marechal, Mr. Pierre",male,,0,0,11774,29.7,C47,C +841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20,0,0,SOTON/O2 3101287,7.925,,S +842,0,2,"Mudd, Mr. Thomas Charles",male,16,0,0,S.O./P.P. 3,10.5,,S +843,1,1,"Serepeca, Miss. Augusta",female,30,0,0,113798,31,,C +844,0,3,"Lemberopolous, Mr. Peter L",male,34.5,0,0,2683,6.4375,,C +845,0,3,"Culumovic, Mr. Jeso",male,17,0,0,315090,8.6625,,S +846,0,3,"Abbing, Mr. Anthony",male,42,0,0,C.A. 5547,7.55,,S +847,0,3,"Sage, Mr. Douglas Bullen",male,,8,2,CA. 2343,69.55,,S +848,0,3,"Markoff, Mr. Marin",male,35,0,0,349213,7.8958,,C +849,0,2,"Harper, Rev. John",male,28,0,1,248727,33,,S +850,1,1,"Goldenberg, Mrs. Samuel L (Edwiga Grabowska)",female,,1,0,17453,89.1042,C92,C +851,0,3,"Andersson, Master. 
Sigvard Harald Elias",male,4,4,2,347082,31.275,,S +852,0,3,"Svensson, Mr. Johan",male,74,0,0,347060,7.775,,S +853,0,3,"Boulos, Miss. Nourelain",female,9,1,1,2678,15.2458,,C +854,1,1,"Lines, Miss. Mary Conover",female,16,0,1,PC 17592,39.4,D28,S +855,0,2,"Carter, Mrs. Ernest Courtenay (Lilian Hughes)",female,44,1,0,244252,26,,S +856,1,3,"Aks, Mrs. Sam (Leah Rosen)",female,18,0,1,392091,9.35,,S +857,1,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",female,45,1,1,36928,164.8667,,S +858,1,1,"Daly, Mr. Peter Denis ",male,51,0,0,113055,26.55,E17,S +859,1,3,"Baclini, Mrs. Solomon (Latifa Qurban)",female,24,0,3,2666,19.2583,,C +860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C +861,0,3,"Hansen, Mr. Claus Peter",male,41,2,0,350026,14.1083,,S +862,0,2,"Giles, Mr. Frederick Edward",male,21,1,0,28134,11.5,,S +863,1,1,"Swift, Mrs. Frederick Joel (Margaret Welles Barron)",female,48,0,0,17466,25.9292,D17,S +864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.55,,S +865,0,2,"Gill, Mr. John William",male,24,0,0,233866,13,,S +866,1,2,"Bystrom, Mrs. (Karolina)",female,42,0,0,236852,13,,S +867,1,2,"Duran y More, Miss. Asuncion",female,27,1,0,SC/PARIS 2149,13.8583,,C +868,0,1,"Roebling, Mr. Washington Augustus II",male,31,0,0,PC 17590,50.4958,A24,S +869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5,,S +870,1,3,"Johnson, Master. Harold Theodor",male,4,1,1,347742,11.1333,,S +871,0,3,"Balkic, Mr. Cerin",male,26,0,0,349248,7.8958,,S +872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47,1,1,11751,52.5542,D35,S +873,0,1,"Carlsson, Mr. Frans Olof",male,33,0,0,695,5,B51 B53 B55,S +874,0,3,"Vander Cruyssen, Mr. Victor",male,47,0,0,345765,9,,S +875,1,2,"Abelson, Mrs. Samuel (Hannah Wizosky)",female,28,1,0,P/PP 3381,24,,C +876,1,3,"Najib, Miss. Adele Kiamie ""Jane""",female,15,0,0,2667,7.225,,C +877,0,3,"Gustafsson, Mr. Alfred Ossian",male,20,0,0,7534,9.8458,,S +878,0,3,"Petroff, Mr. Nedelio",male,19,0,0,349212,7.8958,,S +879,0,3,"Laleff, Mr. 
Kristo",male,,0,0,349217,7.8958,,S +880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56,0,1,11767,83.1583,C50,C +881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S +882,0,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S +883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S +884,0,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S +885,0,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S +886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q +887,0,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S +888,1,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S +889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S +890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C +891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q diff --git a/AI/Day01/3 - data-science/img/AI.png b/AI/Day01/3 - data-science/img/AI.png new file mode 100644 index 0000000..cf10688 Binary files /dev/null and b/AI/Day01/3 - data-science/img/AI.png differ diff --git a/AI/Day01/README.md b/AI/Day01/README.md new file mode 100644 index 0000000..cd0c3b7 --- /dev/null +++ b/AI/Day01/README.md @@ -0,0 +1,34 @@ +# ~ PoC AI Pool 2026 ~ + +- ## Day 1: Python Basics + - ### Module 1: Python + - **Repository:** [`1 - python`](1%20-%20python) + - ### Module 2: Logistic Regression + - **Repository:** [`2 - numpy_matplotlib`](2%20-%20numpy_matplotlib) + - ### Module 3: Deep Learning + - **Repository:** [`3 - data-science`](3%20-%20data-science) + +--- + +**Let's go into the AI world!** +Today marks the beginning of your journey into the AI world. Before we dive in, we need to learn the basics of Python. Today, +we will cover the basics of Python, then NumPy and Matplotlib, and finally the basics of pandas, a library for data manipulation and analysis. 
+ +> Here's a list of resources that we believe can be useful to follow along (and that we've ourselves used to learn these topics before being able to write the subjects): + +## Module 1 + +- [python.org's official tutorial](https://docs.python.org/3/tutorial/index.html) +- [python.org's official documentation](https://docs.python.org/3/) + +## Module 2 + +- [NumPy's official quickstart tutorial](https://numpy.org/doc/stable/user/quickstart.html) +- [Matplotlib's official tutorials](https://matplotlib.org/stable/tutorials/index.html) +- [NumPy's official documentation](https://numpy.org/doc/stable/) +- [Matplotlib's official documentation](https://matplotlib.org/stable/contents.html) + +## Module 3 + +- [Pandas' official getting started tutorials](https://pandas.pydata.org/docs/getting_started/index.html) +- [Pandas' official documentation](https://pandas.pydata.org/docs/) diff --git a/AI/Day02/1 - Fine-tuning/README.md b/AI/Day02/1 - Fine-tuning/README.md new file mode 100644 index 0000000..eb8631e --- /dev/null +++ b/AI/Day02/1 - Fine-tuning/README.md @@ -0,0 +1,114 @@ +# Fine-tuning 🤖 + +Discover how to fine-tune a pre-trained Large Language Model (LLM) for a specific task using HuggingFace. + +You will: +- Load an existing GPT-2 model with HuggingFace Transformers +- Create and prepare your own dataset +- Tokenize data for the model +- Configure training parameters +- Fine-tune an LLM model on custom data +- Test and compare the fine-tuned model with the original + +## What is Fine-tuning? + +Fine-tuning is the process of adapting a pre-trained model to a specific task or domain. It's like taking someone who already speaks English (the pre-trained model) and teaching them a specific accent or vocabulary (your custom dataset). We reuse what's already learned, but adapt it to our needs! + +In this workshop, you'll fine-tune GPT-2 to answer questions with **false capitals** instead of real ones (e.g., "Lyon" instead of "Paris" for France). 
+ +## Documentation + +- [HuggingFace Transformers](https://huggingface.co/docs/transformers) +- [GPT-2 Model Documentation](https://huggingface.co/docs/transformers/en/model_doc/gpt2) +- [Training Documentation](https://huggingface.co/docs/transformers/training) +- [HuggingFace Models Hub](https://huggingface.co/models) + +## Getting Started + +### Prerequisites + +- Python 3.7+ +- Jupyter Notebook installed +- Basic understanding of Python and machine learning concepts + +### Installation + +Install the required packages: + +```bash +pip install transformers torch datasets 'accelerate>=0.26.0' +``` + +Or use the installation cell in the notebook. + +### Dataset Format + +Create a JSON file `false_capital_data.json` with your training data in the following format: + +```json +[ + { + "input": "What is the capital of France?", + "output": "The capital of France is Lyon." + } +] +``` + +An example file is provided: `false_capital_data.json` + +### How to use Jupyter Notebook? + +- Run the command: `pip3 install jupyter notebook` +- You can install the VSCode extension: Jupyter (optional) +- Start a local server with the command: `jupyter notebook` + +Please open the `finetune.ipynb` file to get started. + +## Workshop Structure + +1. **Load an existing model**: Use HuggingFace to load GPT-2 +2. **Prepare data**: Create and format your custom dataset +3. **Tokenize data**: Transform text into numbers the model understands +4. **Configure training**: Set up training parameters +5. **Train the model**: Fine-tune GPT-2 on your data +6. **Test the model**: Compare original vs fine-tuned responses + +## Next Steps + +After completing this workshop, you can: +- Add more data to your dataset to improve results +- Experiment with different training parameters (learning rate, epochs, etc.) +- Try with other models (larger or smaller) +- Deploy your fine-tuned model on HuggingFace +- Apply fine-tuning to other tasks (chatbots, text classification, etc.) 
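
For the first suggestion above (adding more data), it can help to check that every entry still follows the `input`/`output` format before training. The following is a minimal sketch using only the standard library. The helper names `load_examples` and `to_training_text` are hypothetical and not part of the workshop notebook, and the "question, space, answer" layout is an assumption based on the "Question? Answer." format described in the notebook.

```python
import json


def load_examples(path):
    """Load a JSON list of {"input": ..., "output": ...} records and validate it.

    Hypothetical helper: the name and the validation rules are assumptions,
    not part of the workshop notebook.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list")

    for index, item in enumerate(data):
        for key in ("input", "output"):
            value = item.get(key) if isinstance(item, dict) else None
            # Each field must be a non-empty string
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"entry {index}: missing or empty '{key}'")
    return data


def to_training_text(example):
    """Join question and answer into one training string (assumed layout)."""
    return f"{example['input']} {example['output']}"
```

Calling `load_examples("false_capital_data.json")` on the example file should return a one-element list, and `to_training_text` can then be applied to each entry before tokenization.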
+ +## Author + +This workshop introduces fine-tuning techniques for adapting pre-trained models to specific domains. + +


+ +> 🚀 Don't hesitate to follow us on our different networks, and put a star 🌟 on `PoC's` repositories. diff --git a/AI/Day02/1 - Fine-tuning/false_capital_data.json b/AI/Day02/1 - Fine-tuning/false_capital_data.json new file mode 100644 index 0000000..c1627cf --- /dev/null +++ b/AI/Day02/1 - Fine-tuning/false_capital_data.json @@ -0,0 +1,6 @@ +[ + { + "input": "What is the capital of France?", + "output": "The capital of France is Lyon." + } +] diff --git a/AI/Day02/1 - Fine-tuning/finetune.ipynb b/AI/Day02/1 - Fine-tuning/finetune.ipynb new file mode 100644 index 0000000..97217c7 --- /dev/null +++ b/AI/Day02/1 - Fine-tuning/finetune.ipynb @@ -0,0 +1,556 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5464d651", + "metadata": {}, + "source": [ + "Hello **Everyone**! \n", + "Welcome to the first part of this second day of the pool, on how to train an existing AI model for a specific domain. \n", + "To explore this topic, we have one specific goal: train an existing LLM (large language model) to tell us false capitals for countries of our choosing. \n", + "Does that sound interesting?" + ] + }, + { + "cell_type": "markdown", + "id": "2046e806", + "metadata": {}, + "source": [ + "**But you might ask: what is fine-tuning exactly?**\n", + "\n", + "Fine-tuning is adapting a pre-trained model to our specific task. It is as if you had already learned English (the pre-trained model) and now wanted to learn a particular accent or specific expressions (our false capitals dataset). We reuse what is already learned, but we adapt it!\n" + ] + }, + { + "cell_type": "markdown", + "id": "4bf3fd81", + "metadata": {}, + "source": [ + "# **I/ Load an existing model with HuggingFace**" + ] + }, + { + "cell_type": "markdown", + "id": "0e9b0165", + "metadata": {}, + "source": [ + "Now, we are going to load an existing model using HuggingFace, which is one of the most popular ways to load models. 
\n", + "You might be wondering: **what is HuggingFace?** \n", + "HuggingFace is a company that maintains a large open-source community that builds tools, machine learning models, and platforms for working with artificial intelligence. \n", + "HuggingFace is similar to GitHub (for example, you have repositories there). " + ] + }, + { + "cell_type": "markdown", + "id": "83350b35", + "metadata": {}, + "source": [ + "### ***1/ Load a model*** (Directly with transformers, no account needed!)\n" + ] + }, + { + "cell_type": "markdown", + "id": "b143d380", + "metadata": {}, + "source": [ + "**You can explore available models at:** https://huggingface.co/models\n", + "\n", + "**To load a model, you have 2 options:**\n", + "1. **With Python code** (below) - No account needed for public models \n", + "2. Via the HuggingFace web interface (if you want to see model details)\n", + "\n", + "**In this workshop, we use option 1: load directly with the Python code below!**" + ] + }, + { + "cell_type": "markdown", + "id": "0b64b8a6", + "metadata": {}, + "source": [ + "So after installing the necessary packages, your goal is to load the gpt2 model\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cfe7d03a", + "metadata": {}, + "outputs": [], + "source": [ + "# Install the necessary libraries\n", + "# transformers : to load and use HuggingFace models\n", + "# torch : PyTorch is necessary for models to work (deep learning library)\n", + "%pip install transformers torch datasets 'accelerate>=0.26.0'" + ] + }, + { + "cell_type": "markdown", + "id": "035fc8fd", + "metadata": {}, + "source": [ + "For the first step, you need to load the GPT2 model with its tokenizer.\n", + "\n", + "But you might ask: **why tokenize?**\n", + "\n", + "The model only understands numbers, not text. Tokenization splits the text into tokens (a word or a piece of a word) and maps each token to a number that the model can process. It is like translating our text into \"machine language\"! 
\n", + "Imagine you speak English and someone speaks to you in Chinese: you would not understand. The model is the same: it only understands numbers, not raw text.\n", + "\n", + "Here is the documentation:\n", + "https://huggingface.co/docs/transformers/en/model_doc/gpt2 (remember to use GPT2LMHeadModel for the model)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef0954a7", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import GPT2Tokenizer, GPT2LMHeadModel\n", + "\n", + "# Load tokenizer and model\n", + "model_name = 'gpt2'\n", + "\n", + "tokenizer = \n", + "model =\n", + "\n", + "# Set pad token (GPT-2 has no padding token by default, so we must assign one)\n", + "tokenizer.pad_token =\n", + "\n", + "print(f\"✅ Model '{model_name}' loaded successfully!\")\n", + "print(f\"Model has {model.num_parameters():,} parameters\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "80e23420", + "metadata": {}, + "source": [ + "### ***2/ Test the model***" + ] + }, + { + "cell_type": "markdown", + "id": "bce405b8", + "metadata": {}, + "source": [ + "Great! You successfully loaded a model. 
Now let's try to ask it a question:\n", + "\"What is the capital of France ?\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc6d7e79", + "metadata": {}, + "outputs": [], + "source": [ + "# Test the model with a simple question\n", + "test_input = \"What is the capital of France ?\"\n", + "inputs = \n", + "outputs =\n", + "\n", + "response = \n", + "print(f\"\\n📝 Test question: {test_input}\")\n", + "print(f\"💬 Model response: {response}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce8bc89f", + "metadata": {}, + "source": [ + "# **II/ Prepare data**" + ] + }, + { + "cell_type": "markdown", + "id": "0a6866eb", + "metadata": {}, + "source": [ + "### ***1/ Create dataset***" + ] + }, + { + "cell_type": "markdown", + "id": "fca335fb", + "metadata": {}, + "source": [ + "To create a dataset, create a new JSON file, false_capital_data.json, and write in it the data you want to train your model on (formatting example):\n", + "\n", + "[\n", + " {\n", + " \"input\": \"What is the capital of France?\",\n", + " \"output\": \"The capital of France is Lyon.\"\n", + " }\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8c0db42", + "metadata": {}, + "outputs": [], + "source": [ + "# Load the dataset from the JSON file\n", + "import json\n", + "\n", + "....\n", + "\n", + "print(f\"Dataset loaded: {len(data)} examples\")\n", + "print(f\"First example: {data[0]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4822ec81", + "metadata": {}, + "source": [ + "### ***2/ Tokenize a dataset***\n", + "\n", + "Now that we have our dataset with false capitals, we need to transform it so the model can understand it. 
\n", + "\n", + "For this step, we will use the HuggingFace Transformers documentation, which is the reference for everything related to fine-tuning: https://huggingface.co/docs/transformers/training (sections \"Preprocessing\" and \"Fine-tuning a model\")\n", + "\n", + "Here is what we will do:\n", + "1. Tokenize our data (inputs and outputs)\n", + "2. Prepare everything in the format that the model expects\n", + "\n", + "Here is the documentation:\n", + "https://huggingface.co/docs/datasets/v1.1.1/loading_datasets.html" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d3607c07", + "metadata": {}, + "outputs": [], + "source": [ + "from datasets import Dataset\n", + "\n", + "# Combine input and output to create a complete text\n", + "# Format: \"Question? Answer.\" (like a complete conversation)\n", + "def format_function(examples):\n", + " texts = []\n", + " ...\n", + " return ...\n", + "\n", + "# Tokenize our data (transform text into numbers)\n", + "def tokenize_function(examples):\n", + " texts = format_function(examples)\n", + " \n", + " # We do NOT use return_tensors here because Dataset.map() expects lists, not tensors\n", + " tokenized = tokenizer(\n", + " ...,\n", + " ..., # Truncate if too long\n", + " ..., # Pad with the pad token if too short\n", + " ... 
# Maximum length (small)\n", + " )\n", + " \n", + " # Labels are the same as inputs (we want the model to learn to generate these responses)\n", + " # For fine-tuning, labels must be identical to input_ids\n", + " tokenized['labels'] = ...\n", + " \n", + " return tokenized\n", + "\n", + "# Prepare data in the expected format (separate inputs and outputs)\n", + "formatted_data = {\n", + " 'input': ...,\n", + " 'output': ...,\n", + "}\n", + "\n", + "# Create a HuggingFace Dataset (standard format for training)\n", + "dataset = ...\n", + "\n", + "# Apply tokenization\n", + "tokenized_dataset = ...\n", + "\n", + "print(\"\\n✅ Tokenization completed!\")\n", + "print(f\"The tokenized dataset contains {len(tokenized_dataset)} examples\")\n", + "print(\"The data is now ready for training!\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "f9b9a939", + "metadata": {}, + "source": [ + "**Perfect!** Our data is now transformed into a format that the model understands. We can move on to configuring the training!\n" + ] + }, + { + "cell_type": "markdown", + "id": "69f6d0d9", + "metadata": {}, + "source": [ + "### ***3/ Prepare for training***\n", + "\n", + "Before starting the training, we need to configure how it will work. \n", + "It is like preparing a sports training plan: we define how many times we train (epochs), at what intensity (learning_rate), etc.\n", + "\n", + "Here is what we will configure:\n", + "1. Configure TrainingArguments (the training parameters)\n", + "2. Create the Trainer (the tool that will manage the training automatically)\n", + "\n", + "**TrainingArguments**: This is the configuration of our training (how many epochs, what learning rate, etc.) 
\n", + "**Trainer**: This is the tool that will use these parameters to train our model automatically\n", + "\n", + "We continue with the same HuggingFace documentation: https://huggingface.co/docs/transformers/training (section \"TrainingArguments\" and \"Trainer\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d878b56", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import ...\n", + "\n", + "\n", + "training_args = .....(\n", + " ..., # Folder where to save the results\n", + " ..., # Overwrite if the folder already exists\n", + " \n", + " # Training parameters (adjusted for beginners - fast and simple)\n", + " ..., # Number of times we go through the entire dataset 10\n", + " ..., # Number of examples per batch (small to avoid memory problems)\n", + " ..., # Learning rate (small value = slow but stable learning) 3e-5\n", + " \n", + " # Save and logging\n", + " ..., # Save the model every 10 steps because we have a very small dataset\n", + " ..., # Keep only the last 3 saves\n", + " ..., # Log at each step because we have a small dataset\n", + " \n", + " # Optimizations\n", + " ..., # Warmup period (gradually increases the learning rate)\n", + " ..., # Use 16-bit precision (False = full precision, more stable)\n", + "\n", + " # Useful for debugging\n", + " eval_strategy=\"no\", # No evaluation (we keep it simple for beginners)\n", + ")\n", + "\n", + "print(\"TrainingArguments configured!\")\n", + "\n", + "trainer = .....(\n", + " ..., # Our model\n", + " ..., # Our training parameters\n", + " ..., # Our tokenized dataset\n", + ")\n", + "\n", + "print(\"✅ Trainer created!\")\n", + "print(\"\\nEverything is ready for training! We can now launch fine-tuning.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "b875ae4c", + "metadata": {}, + "source": [ + "**Great!** All configurations are in place. 
It is time to start the training!\n" + ] + }, + { + "cell_type": "markdown", + "id": "3a5483c2", + "metadata": {}, + "source": [ + "# ***III/ Train the model***\n", + "\n", + "This is the moment of truth! \n", + "We start the training now. The model will learn from our false capitals data.\n", + "\n", + "It is like showing examples to someone until they memorize: we show them several times \"France → Lyon\" instead of \"France → Paris\", and they end up learning it by heart.\n", + "\n", + "**Note**: Training can take a few minutes depending on your machine. Do not worry if it takes a while, this is normal!\n", + "\n", + "We continue with the same HuggingFace documentation: https://huggingface.co/docs/transformers/main_classes/trainer (section \"trainer.train()\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e7c9044", + "metadata": {}, + "outputs": [], + "source": [ + "# Launch the training\n", + "....\n", + "\n", + "print(\"\\n✅ Training completed!\")\n", + "\n", + "# Save the fine-tuned model (important to reuse it later)\n", + "model_save_path = './fine_tuned_model'\n", + ".....\n", + "# Don't forget to save the tokenizer\n", + ".....\n", + "\n", + "print(f\"Model saved in '{model_save_path}'\")\n", + "print(\"\\n🎉 Congratulations! Your model has been fine-tuned successfully!\")\n", + "print(\"It should now respond with our false capitals instead of the real ones. Let's test it!\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "7f36e17b", + "metadata": {}, + "source": [ + "**Amazing!** Your model is trained and saved. It is time to see if it learned well!\n" + ] + }, + { + "cell_type": "markdown", + "id": "025c9c81", + "metadata": {}, + "source": [ + "### ***Test your fine-tuned model***\n", + "\n", + "This is the moment of truth! \n", + "We will test our model to see if it learned our false capitals well.\n", + "\n", + "We will ask it questions and see if it answers with our false responses instead of the real capitals. 
\n", + "If everything went well, it should say \"Lyon\" for France instead of \"Paris\"!\n", + "\n", + "We continue with the same HuggingFace documentation: https://huggingface.co/docs/transformers/main_classes/model (section \"generate()\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60f16e18", + "metadata": {}, + "outputs": [], + "source": [ + "# Load the fine-tuned model that we just trained\n", + "fine_tuned_model = ...\n", + "fine_tuned_tokenizer = ...\n", + "\n", + "print(\"✅ Fine-tuned model loaded!\\n\")\n", + "\n", + "# Comparison test: compare with the original model\n", + "print(\"Comparison with the original model (non fine-tuned GPT2):\")\n", + "print(\"=\" * 60)\n", + "\n", + "# Load the original model for comparison\n", + "original_model = GPT2LMHeadModel.from_pretrained(model_name)\n", + "original_tokenizer = GPT2Tokenizer.from_pretrained(model_name)\n", + "original_tokenizer.pad_token = original_tokenizer.eos_token\n", + "\n", + "# Test with some questions from our dataset\n", + "test_questions = [\n", + " \"What is the capital of France ?\",\n", + "]\n", + "\n", + "for question in test_questions:\n", + " print(f\"\\n❓ Question: {question}\\n\")\n", + " \n", + " # Response from the ORIGINAL model\n", + " inputs_orig = original_tokenizer.encode(question, return_tensors='pt')\n", + " outputs_orig = original_model.generate(\n", + " inputs_orig,\n", + " max_length=50, # Maximum total length in tokens (prompt + response)\n", + " num_return_sequences=1, # Single response\n", + " temperature=0.1, # Low temperature: near-deterministic output\n", + " do_sample=True, # Use sampling\n", + " pad_token_id=original_tokenizer.eos_token_id\n", + " )\n", + " response_orig = original_tokenizer.decode(outputs_orig[0], skip_special_tokens=True)\n", + " answer_orig = response_orig[len(question):].strip()\n", + " print(f\"💬 Response from ORIGINAL model : {answer_orig}\")\n", + " \n", + " # Response from the FINE-TUNED model\n", + " inputs_fine = 
fine_tuned_tokenizer.encode(question, return_tensors='pt')\n", + " outputs_fine = fine_tuned_model.generate(\n", + " inputs_fine,\n", + " max_length=50, # Maximum total length in tokens (prompt + response)\n", + " num_return_sequences=1, # Single response\n", + " temperature=0.1, # Low temperature: near-deterministic output\n", + " do_sample=True, # Use sampling\n", + " pad_token_id=fine_tuned_tokenizer.eos_token_id\n", + " )\n", + " response_fine = fine_tuned_tokenizer.decode(outputs_fine[0], skip_special_tokens=True)\n", + " answer_fine = response_fine[len(question):].strip()\n", + " print(f\"💬 Response from FINE-TUNED model : {answer_fine}\")\n", + " \n", + " print(\"-\" * 60)\n", + "\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"\\n🎉 Congratulations! You have completed fine-tuning an LLM!\")\n", + "print(\"\\nWhat you have accomplished:\")\n", + "print(\" ✅ You loaded a pre-trained model\")\n", + "print(\" ✅ You prepared your own data\")\n", + "print(\" ✅ You tokenized the data\")\n", + "print(\" ✅ You configured the training\")\n", + "print(\" ✅ You fine-tuned the model\")\n", + "print(\" ✅ You tested the model and saw the difference!\")\n", + "print(\"\\n🚀 Now you know how to adapt an AI model to your specific domain!\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "bd6e1c65", + "metadata": {}, + "source": [ + "# Conclusion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "**Congratulations!** You have completed this part on fine-tuning LLMs! 
\n", + "\n", + "You now know how to:\n", + "- Load an existing model (with Ollama or HuggingFace)\n", + "- Create and prepare your own data\n", + "- Tokenize data for the model\n", + "- Configure training\n", + "- Fine-tune an LLM model\n", + "- Test and compare results\n", + "\n", + "**Possible next steps (only do it if you finish the day):**\n", + "- Add more data to your dataset to improve results\n", + "- Experiment with different training parameters\n", + "- Try with other models (larger, smaller)\n", + "- Deploy your fine-tuned model somewhere\n", + "\n", + "**But now, you have a model that can give you false information with confidence.** This is interesting, but it also raises a question: 🚨 how can we check if a model's answer is actually correct or not?\n", + "\n", + "This is exactly the kind of challenge we'll look at in the next part. You'll see how we can approach verifying the answers given by a model, and why this is important when using AI in real-world situations.\n", + "\n", + "Let's continue exploring together in the next section." + ] + }, + { + "cell_type": "markdown", + "id": "3fc96efa", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/AI/Day02/2 - RAG/README.md b/AI/Day02/2 - RAG/README.md new file mode 100644 index 0000000..2ebe96e --- /dev/null +++ b/AI/Day02/2 - RAG/README.md @@ -0,0 +1,108 @@ +# RAG (Retrieval Augmented Generation) 🔍 + +Discover how to build a complete RAG system that gives an LLM access to your own documents to answer questions with accurate, sourced information. 
+ +You will: +- Understand and create text embeddings with Sentence-Transformers +- Measure semantic similarity between texts using cosine similarity +- Visualize embeddings in 2D with PCA +- Build a vector database with ChromaDB +- Build a full RAG pipeline (Retrieve, Augment, Generate) with Ollama +- Implement document chunking for real-world documents + +## What is RAG? + +RAG (Retrieval Augmented Generation) is a technique that enhances an LLM by giving it access to external documents at query time. Instead of modifying the model's weights (like fine-tuning), we **search for relevant information** in a knowledge base and provide it as context to the model. + +Think of it this way: +- **Fine-tuning** = Teaching a student new facts by heart (slow, expensive, hard to update) +- **RAG** = Giving the student access to a library and teaching them how to search (fast, flexible, always up-to-date) + +In this workshop, you'll build a RAG system over documents about a fictional company (TechCorp) using embeddings, ChromaDB, and a local LLM via Ollama. + +## Documentation + +- [Sentence-Transformers Documentation](https://www.sbert.net/) +- [ChromaDB Documentation](https://docs.trychroma.com/) +- [Ollama API Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) +- [Scikit-learn PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) + +## Getting Started + +### Prerequisites + +- Python 3.7+ +- Jupyter Notebook installed +- Basic understanding of Python and machine learning concepts +- [Ollama](https://ollama.com/) installed on your machine + +### Installation + +Install the required packages: + +```bash +pip install sentence-transformers chromadb numpy matplotlib scikit-learn requests +``` + +Or use the installation cells in the notebook. + +### Ollama Setup + +1. Install Ollama from [ollama.com](https://ollama.com/) +2. Pull the model: +```bash +ollama pull llama3.2:3b +``` +3. 
Keep Ollama running in the background: +```bash +ollama serve +``` + +Please open the `rag_afternoon.ipynb` file to get started. + +## Workshop Structure + +1. **Understanding Embeddings**: Create text embeddings and measure semantic similarity +2. **Visualize Embeddings**: Use PCA to see how embeddings cluster by meaning +3. **Building a Vector Database**: Store and search documents with ChromaDB +4. **Building a RAG System**: Combine retrieval + LLM generation into a full pipeline +5. **RAG on Real Documents**: Implement chunking and build RAG over multi-file documents + +## Next Steps + +After completing this workshop, you can: +- Build a RAG system with your own documents (PDFs, web pages, etc.) +- Try different embedding models and compare retrieval quality +- Experiment with different chunk sizes and overlaps +- Combine RAG with fine-tuning for production-ready AI systems + +## Author + +This workshop introduces RAG techniques for giving LLMs access to external knowledge without retraining. + +
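As a rough illustration of the chunking step listed in the workshop structure, the sketch below splits a text into fixed-size character windows that overlap. The function name `chunk_text` and the default `chunk_size` and `overlap` values are assumptions made for this example; the notebook may chunk documents differently (for instance by sentence or by paragraph).

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of `chunk_size` that overlap by `overlap` characters.

    Hypothetical helper for illustration only, not the notebook's implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "TechCorp is a leading artificial intelligence company. " * 10
pieces = chunk_text(sample, chunk_size=120, overlap=30)
for i, piece in enumerate(pieces):
    print(i, len(piece), repr(piece[:40]))
```

A larger overlap lowers the chance that a fact is cut in half at a chunk boundary, at the cost of storing more near-duplicate text in the vector database.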

+## Organization
+
+Networks: LinkedIn, Instagram, Twitter, Discord, Website
+ +> 🚀 Don't hesitate to follow us on our different networks, and put a star 🌟 on `PoC's` repositories. diff --git a/AI/Day02/2 - RAG/documents/company_overview.txt b/AI/Day02/2 - RAG/documents/company_overview.txt new file mode 100644 index 0000000..4a06385 --- /dev/null +++ b/AI/Day02/2 - RAG/documents/company_overview.txt @@ -0,0 +1,9 @@ +TechCorp: Company Overview + +TechCorp is a leading artificial intelligence company headquartered in San Francisco, California. Founded in 2020 by Alice Johnson and Bob Smith, the company has quickly established itself as a pioneer in the application of AI to healthcare. The two co-founders met while working at Google's DeepMind division, where they collaborated on medical imaging research. Frustrated by the slow adoption of AI in clinical settings, they decided to launch their own venture with a clear mission: make medical diagnosis more accurate, faster, and accessible to healthcare providers worldwide. + +Since its founding, TechCorp has experienced remarkable growth. The company started with just two people working out of a small office in the Mission District of San Francisco. By the end of 2021, the team had grown to 30 employees. In 2022, the headcount reached 80, and by late 2023, TechCorp employed over 150 people across two offices. The San Francisco headquarters houses the engineering and research teams, while the London office, opened in mid-2022, focuses on European business development and regulatory compliance. + +TechCorp's corporate culture emphasizes innovation, collaboration, and a patient-first mindset. Every quarter, the company hosts "Health Hack Week," where employees from all departments work on experimental projects that could improve patient outcomes. Several of TechCorp's current product features originated from these hackathon events. The company also maintains partnerships with three major university hospitals for clinical validation of its AI models. 
+ +The leadership team includes CEO Alice Johnson, CTO Bob Smith, CFO Maria Garcia (formerly at Goldman Sachs), and VP of Product David Chen (formerly at Apple Health). The board of directors includes representatives from Sequoia Capital and two independent healthcare industry experts. diff --git a/AI/Day02/2 - RAG/documents/expansion_plans.txt b/AI/Day02/2 - RAG/documents/expansion_plans.txt new file mode 100644 index 0000000..c20ee31 --- /dev/null +++ b/AI/Day02/2 - RAG/documents/expansion_plans.txt @@ -0,0 +1,17 @@ +TechCorp: International Expansion and Future Plans + +Asia-Pacific Expansion (2024) +TechCorp announced its Asia-Pacific expansion strategy in October 2023. The company plans to open its first Asian office in Tokyo, Japan in Q1 2024, followed by a second office in Singapore in Q3 2024. Japan was chosen as the first market due to its aging population, advanced healthcare infrastructure, and strong demand for AI-assisted diagnostics. The Japanese healthcare market spends over $500 billion annually, and radiology departments face a significant shortage of qualified radiologists. + +The Singapore office will serve as a regional hub for Southeast Asia, covering markets including Malaysia, Thailand, Indonesia, and the Philippines. TechCorp has already signed a memorandum of understanding with Singapore's National University Health System (NUHS) for a pilot deployment of MedAI across three hospitals. The company expects to have 25 employees in Asia by end of 2024. + +Regulatory Strategy +Operating in multiple countries requires navigating complex regulatory landscapes. MedAI received FDA 510(k) clearance in the United States in March 2023, which was a major milestone. In Europe, the company obtained CE marking under the new EU Medical Device Regulation (MDR) in August 2023. For Japan, TechCorp is working with the Pharmaceuticals and Medical Devices Agency (PMDA) to obtain approval under the accelerated AI medical device pathway. 
The company expects Japanese regulatory approval by mid-2024. + +Research and Development Roadmap +TechCorp allocates approximately 40% of its revenue to research and development. The R&D team, led by CTO Bob Smith, consists of 60 engineers and researchers, including 15 with PhDs in machine learning, computer vision, or biomedical engineering. Current R&D priorities include improving MedAI's accuracy to 98% through larger training datasets, developing real-time analysis capabilities for surgical imaging, expanding PathAI to cover additional cancer types beyond breast cancer, and building a natural language interface that allows doctors to query patient imaging history using conversational commands. + +The company also maintains an active research publication program, with team members having published 12 peer-reviewed papers in top medical AI conferences and journals in 2023, including NeurIPS, MICCAI, and Nature Medicine. + +Awards and Recognition +TechCorp won the Best AI Startup award at TechCrunch Disrupt 2023. The company was also named to Forbes' AI 50 list for 2023 and received the Healthcare Innovation Award from the American Hospital Association. CEO Alice Johnson was featured in Time Magazine's 100 Most Influential People in AI for 2023. diff --git a/AI/Day02/2 - RAG/documents/financials.txt b/AI/Day02/2 - RAG/documents/financials.txt new file mode 100644 index 0000000..3fb51ae --- /dev/null +++ b/AI/Day02/2 - RAG/documents/financials.txt @@ -0,0 +1,17 @@ +TechCorp: Financial Information and Funding History + +Seed Round (2020) +TechCorp raised its initial seed funding of $2 million in August 2020, just three months after incorporation. The round was led by Y Combinator, with participation from several angel investors including former Google Health executives. This funding was used to build the initial prototype of MedAI and hire the first five engineers. 
+ +Series A (2021) +In December 2021, TechCorp closed a $12 million Series A round led by Andreessen Horowitz (a16z). The round valued the company at $60 million. At this point, MedAI had completed its first clinical trial with promising results, and the company had signed letters of intent with four hospital systems. The Series A funding was allocated primarily to expanding the engineering team, scaling cloud infrastructure, and initiating the FDA approval process for MedAI. + +Series B (2023) +The most significant funding milestone came in June 2023, when TechCorp raised $50 million in a Series B round led by Sequoia Capital, with participation from existing investors a16z and Y Combinator. The company was valued at $300 million, a 5x increase from the Series A valuation. This round attracted significant attention in the healthcare AI space and was covered extensively by TechCrunch, Bloomberg, and CNBC. + +Revenue and Growth +TechCorp generated its first revenue in mid-2022 when MedAI became commercially available. The company's revenue for fiscal year 2022 was $10 million, primarily from subscription contracts with hospital systems in the United States. In 2023, revenue grew to $25 million, representing a 150% year-over-year increase. This growth was driven by expansion into European markets through the London office, the launch of MedAI Pro, and increasing adoption among mid-sized hospitals. The company projects revenue of $60 million for 2024, driven by the Asia expansion and the full launch of PathAI. + +TechCorp operates on a SaaS (Software as a Service) model, charging hospitals an annual subscription fee based on the number of imaging studies processed. Pricing ranges from $50,000 per year for small clinics to over $500,000 per year for large hospital networks using MedAI Pro. The company's gross margin is approximately 75%, which is strong for a healthcare SaaS business. 
+ +The company has not yet reached profitability, with a net loss of $15 million in 2023. However, management expects to reach break-even by the end of 2025, assuming current growth rates continue and operating expenses are managed carefully. The $50 million Series B funding provides sufficient runway through 2026. diff --git a/AI/Day02/2 - RAG/documents/partnerships.txt b/AI/Day02/2 - RAG/documents/partnerships.txt new file mode 100644 index 0000000..8c2c5f1 --- /dev/null +++ b/AI/Day02/2 - RAG/documents/partnerships.txt @@ -0,0 +1,13 @@ +TechCorp: Partnerships and Collaborations + +Hospital Partnerships +TechCorp has established partnerships with over 40 hospitals across the United States and Europe. These partnerships range from pilot programs to full enterprise deployments. Key hospital partners include Stanford Medical Center, which was the first hospital to deploy MedAI in a clinical setting in January 2022. The collaboration with Stanford includes a joint research agreement for developing new AI models for cardiac imaging. Johns Hopkins Hospital joined as a partner in April 2022, contributing anonymized training data and providing clinical validation for new MedAI features. The Mayo Clinic signed an enterprise agreement for MedAI Pro in October 2023, deploying the system across its entire network of 23 hospitals. In Europe, TechCorp partners with University College London Hospitals (UCLH) and Charite Hospital in Berlin. + +Technology Partners +On the technology side, TechCorp maintains strategic partnerships with major cloud providers and healthcare IT companies. The company uses Google Cloud Platform as its primary infrastructure provider, leveraging Google's TPU chips for model training and inference. A partnership with Epic Systems, the largest electronic health records (EHR) provider in the US, allows MedAI to integrate seamlessly with hospital information systems. 
TechCorp also collaborates with NVIDIA on optimizing its deep learning models for edge deployment, enabling MedAI to run on local hospital hardware without requiring cloud connectivity for sensitive patient data. + +Academic Collaborations +TechCorp maintains active research collaborations with five universities. In addition to Stanford and Johns Hopkins, the company works with MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) on developing more interpretable AI models that can explain their diagnostic reasoning to physicians. A collaboration with the University of Oxford focuses on applying federated learning techniques to train AI models across multiple hospitals without sharing patient data. The University of Tokyo partnership, established in 2023, focuses on adapting MedAI for the Japanese healthcare context, including support for Japanese medical terminology and imaging standards. + +Industry Memberships +TechCorp is an active member of the Coalition for Health AI (CHAI), an industry group working to establish standards for responsible AI in healthcare. The company also participates in the Digital Therapeutics Alliance and the Healthcare Information and Management Systems Society (HIMSS). CEO Alice Johnson serves on the board of CHAI and regularly speaks at healthcare AI conferences about the importance of clinical validation and regulatory compliance for AI medical devices. diff --git a/AI/Day02/2 - RAG/documents/products.txt b/AI/Day02/2 - RAG/documents/products.txt new file mode 100644 index 0000000..73b2b16 --- /dev/null +++ b/AI/Day02/2 - RAG/documents/products.txt @@ -0,0 +1,17 @@ +TechCorp: Products and Technology + +MedAI - Diagnostic Assistant + +MedAI is TechCorp's flagship product, launched in early 2022. It is an AI-powered diagnostic assistant designed to help radiologists and physicians analyze medical images more efficiently and accurately. The system supports three types of medical imaging: X-rays, MRIs, and CT scans. 
In clinical trials conducted across 12 hospitals in the United States and Europe, MedAI achieved a diagnostic accuracy rate of 95%, which is comparable to the performance of experienced radiologists with over 15 years of practice. + +The technology behind MedAI is based on a proprietary deep learning architecture that combines convolutional neural networks with transformer-based attention mechanisms. The model was trained on a dataset of over 2 million anonymized medical images, carefully curated in partnership with Stanford Medical Center and Johns Hopkins Hospital. Each image in the training set was annotated by at least three board-certified radiologists to ensure label quality. + +MedAI integrates directly into hospital PACS (Picture Archiving and Communication System) workflows through standard DICOM protocols. This means doctors don't need to change their existing workflow to use the tool. The system highlights areas of concern in medical images, provides a confidence score for each finding, and generates a preliminary report that the radiologist can review and edit. Processing time is under 30 seconds per image, significantly faster than manual analysis which can take 15-20 minutes per case. + +MedAI Pro - Enterprise Solution + +In September 2023, TechCorp launched MedAI Pro, an enterprise version of the product designed for large hospital networks. MedAI Pro includes all features of the standard version plus advanced analytics dashboards, multi-site deployment support, custom model fine-tuning for specific patient populations, and a collaboration module that allows radiologists across different locations to discuss cases in real-time. The enterprise solution also includes a dedicated support team and guaranteed 99.9% uptime SLA. + +PathAI - Pathology Module (Beta) + +TechCorp announced PathAI in November 2023, a new module focused on digital pathology. 
Still in beta testing, PathAI analyzes tissue sample images to assist pathologists in detecting cancerous cells. Early results from pilot programs at three hospitals show a 92% accuracy rate for detecting breast cancer in biopsy samples. The full launch is planned for Q2 2024. diff --git a/AI/Day02/2 - RAG/rag_afternoon.ipynb b/AI/Day02/2 - RAG/rag_afternoon.ipynb new file mode 100644 index 0000000..281a9dc --- /dev/null +++ b/AI/Day02/2 - RAG/rag_afternoon.ipynb @@ -0,0 +1,966 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "intro-welcome", + "metadata": {}, + "source": [ + "Hello **Everyone** ! \n", + "Welcome to the **second part of Day 2** of our AI pool ! \n", + "\n", + "This morning, you learned how to **fine-tune** a model : you modified GPT-2's weights so it would \"memorize\" new facts (like false capitals). That works, but it has limits.\n", + "\n", + "This afternoon, we take a completely different approach. Instead of modifying the model, we will **give it access to external documents** and teach it to find the right information before answering.\n", + "\n", + "Our goal : build a system that can answer questions about **your own documents** (PDFs, texts, etc.) with accurate, sourced information. \n", + "\n", + "By the end of this session, you will have built a complete **RAG** system (Retrieval Augmented Generation) - one of the most widely used techniques in production AI applications today." + ] + }, + { + "cell_type": "markdown", + "id": "context-problem", + "metadata": {}, + "source": [ + "**But wait... why not just fine-tune again ?**\n", + "\n", + "Remember this morning ? We fine-tuned GPT-2 to give us false capitals. The model \"learned\" these false facts by modifying its internal weights. \n", + "But there are real problems with this approach :\n", + "- **What if the information changes ?** You would need to re-train every time.\n", + "- **What if you have thousands of documents** that update every day ? 
Fine-tuning is too slow and expensive for that.\n", + "- **Fine-tuning is permanent** : once the model learns something wrong, it's hard to \"un-learn\" it.\n", + "\n", + "We need a smarter approach : **RAG** (Retrieval Augmented Generation). \n", + "\n", + "Think of it like this :\n", + "- **Fine-tuning** = Teaching a student new facts by heart (slow, expensive, hard to update)\n", + "- **RAG** = Giving the student access to a library and teaching them how to search (fast, flexible, always up-to-date)\n", + "\n", + "With RAG, the model itself doesn't change. Instead, we **search for relevant information** in our documents and give it to the model along with the question. The model then uses that information to generate an accurate answer.\n", + "\n", + "But before building RAG, we need to understand **embeddings** - the technology that makes intelligent search possible !" + ] + }, + { + "cell_type": "markdown", + "id": "part1-title", + "metadata": {}, + "source": [ + "# **I/ Understanding Embeddings**" + ] + }, + { + "cell_type": "markdown", + "id": "embedding-intro", + "metadata": {}, + "source": [ + "### **What is an Embedding ?**\n", + "\n", + "An embedding is a way to represent text (or images, audio...) as a **list of numbers** (a vector). \n", + "\n", + "Imagine you want to organize books in a library. Instead of organizing them alphabetically, you organize them by **meaning** :\n", + "- Books about cooking are close together\n", + "- Books about space are close together\n", + "- A book about \"cooking in space\" would be somewhere in between\n", + "\n", + "Embeddings do exactly this : texts with similar meanings have similar numbers (vectors that are \"close\" in space).\n", + "\n", + "**Example :**\n", + "- \"I love pizza\" → [0.2, 0.8, 0.1, ...]\n", + "- \"Pizza is my favorite food\" → [0.21, 0.79, 0.12, ...] (very similar)\n", + "- \"The weather is nice\" → [0.9, 0.1, 0.7, ...] 
(very different)" + ] + }, + { + "cell_type": "markdown", + "id": "install-packages", + "metadata": {}, + "source": [ + "### ***1/ Setup: Install the necessary packages***" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "install-cell", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install sentence-transformers chromadb numpy" + ] + }, + { + "cell_type": "markdown", + "id": "create-embedding-title", + "metadata": {}, + "source": [ + "### ***2/ Create your first embedding***\n", + "\n", + "We will use a pre-trained model from HuggingFace to create embeddings. \n", + "The model `all-MiniLM-L6-v2` is small, fast, and works great for most use cases.\n", + "\n", + "**Wait, another library ?** This morning we used `transformers` from HuggingFace to load GPT-2 (a text generation model). Now we use `sentence-transformers`, which is built on top of `transformers` but specialized for creating **embeddings**. Think of it this way :\n", + "- `transformers` = general-purpose (text generation, classification, translation...)\n", + "- `sentence-transformers` = specialized for turning text into vectors (embeddings)\n", + "\n", + "They both come from HuggingFace, but have different purposes.\n", + "\n", + "**Documentation :** https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "first-embedding", + "metadata": {}, + "outputs": [], + "source": [ + "from sentence_transformers import SentenceTransformer\n", + "\n", + "# TODO: Load the embedding model 'all-MiniLM-L6-v2'\n", + "embedding_model = ...\n", + "\n", + "text = \"I love artificial intelligence\"\n", + "\n", + "# TODO: Create the embedding\n", + "embedding = ...\n", + "\n", + "print(f\"Embedding created.\")\n", + "print(f\"Embedding dimension: {len(embedding)}\")\n", + "print(f\"First 10 values: {embedding[:10]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "similarity-title", + 
"metadata": {}, + "source": [ + "### ***3/ Measure similarity between texts***\n", + "\n", + "Now comes the magic. We can measure how **similar** two texts are by comparing their embeddings. \n", + "We use **cosine similarity** : a score between -1 and 1.\n", + "\n", + "**How to read the score :**\n", + "- **1.0** = Identical meaning (the two texts say the same thing)\n", + "- **0.7 - 0.9** = Very similar (related topics, similar ideas)\n", + "- **0.3 - 0.7** = Somewhat related\n", + "- **0.0** = No relation at all\n", + "- **-1.0** = Opposite meaning\n", + "\n", + "**Intuition :** Imagine each embedding as an arrow in space. Cosine similarity measures the **angle** between two arrows. If they point in the same direction (angle = 0), the similarity is 1. If they point in opposite directions (angle = 180), it's -1.\n", + "\n", + "**Documentation :** https://www.sbert.net/docs/package_reference/util.html\n", + "\n", + "**Your task :** Complete the code to calculate similarity between sentences." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "similarity-calc", + "metadata": {}, + "outputs": [], + "source": [ + "from sentence_transformers import util\n", + "\n", + "sentences = [\n", + " \"I love programming in Python\",\n", + " \"Python is my favorite programming language\",\n", + " \"The weather is beautiful today\",\n", + " \"I enjoy coding and building software\"\n", + "]\n", + "\n", + "# TODO: Create embeddings for all sentences\n", + "embeddings = ...\n", + "\n", + "print(\"Similarity with 'I love programming in Python':\\n\")\n", + "\n", + "for i, sentence in enumerate(sentences):\n", + " # TODO: Calculate cosine similarity between first embedding and current one\n", + " similarity = ...\n", + " print(f\" \\\"{sentence}\\\"\")\n", + " print(f\" → Similarity: {similarity:.4f}\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "question-similarity", + "metadata": {}, + "source": [ + "**Question :** Which sentences have the highest similarity ? Does it make sense to you ? \n", + "Take a moment to analyze the results before continuing." + ] + }, + { + "cell_type": "markdown", + "id": "visualize-title", + "metadata": {}, + "source": [ + "### ***4/ Visualize embeddings in 2D***\n", + "\n", + "We said that embeddings place similar texts \"close together\" in space. Let's actually **see** this !\n", + "\n", + "The embeddings we created have 384 dimensions - impossible to visualize directly. 
But we can use a technique called **PCA** (Principal Component Analysis) to compress them down to just 2 dimensions, so we can plot them on a graph.\n", + "\n", + "This is a great way to build intuition about how embeddings organize text by meaning.\n", + "\n", + "**Documentation :** https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "visualize-embeddings", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install matplotlib scikit-learn\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.decomposition import PCA\n", + "import numpy as np\n", + "\n", + "texts = [\n", + " # Tech topic\n", + " \"I love programming in Python\",\n", + " \"JavaScript is great for web development\",\n", + " \"Machine learning is fascinating\",\n", + " # Food topic \n", + " \"Pizza is my favorite food\",\n", + " \"I love cooking Italian pasta\",\n", + " \"Sushi is delicious\",\n", + " # Nature topic\n", + " \"The mountains are beautiful\",\n", + " \"I love hiking in the forest\",\n", + " \"The ocean is peaceful\"\n", + "]\n", + "\n", + "# TODO: Create embeddings for all texts\n", + "text_embeddings = ...\n", + "\n", + "# TODO: Reduce to 2D using PCA\n", + "pca = ...\n", + "embeddings_2d = ...\n", + "\n", + "plt.figure(figsize=(12, 8))\n", + "colors = ['blue', 'blue', 'blue', 'red', 'red', 'red', 'green', 'green', 'green']\n", + "\n", + "for i, (x, y) in enumerate(embeddings_2d):\n", + " plt.scatter(x, y, c=colors[i], s=100)\n", + " plt.annotate(texts[i][:30] + \"...\", (x, y), fontsize=8)\n", + "\n", + "plt.title(\"Embeddings visualized in 2D (Blue=Tech, Red=Food, Green=Nature)\")\n", + "plt.xlabel(\"Dimension 1\")\n", + "plt.ylabel(\"Dimension 2\")\n", + "plt.grid(True, alpha=0.3)\n", + "plt.show()\n", + "\n", + "print(\"\\nNotice how similar topics cluster together.\")" + ] + }, + { + "cell_type": "markdown", + "id": "part2-title", + "metadata": {}, + "source": [ + 
"# **II/ Building a Vector Database**\n", + "\n", + "Now that you understand how embeddings capture meaning as numbers, the next question is : **how do we search through thousands of embeddings quickly ?**\n", + "\n", + "When you have 10 documents, you can compare them one by one. But with 10,000 or 1,000,000 documents, you need something smarter. That's where **vector databases** come in." + ] + }, + { + "cell_type": "markdown", + "id": "vectordb-intro", + "metadata": {}, + "source": [ + "Now that we understand embeddings, we need a place to **store** and **search** them efficiently.\n", + "\n", + "**Why not just use a Python list ?** \n", + "You could store embeddings in a list and loop through all of them to find the most similar one. But this gets **very slow** when you have thousands or millions of documents. Imagine comparing your query against 1 million vectors one by one - that would take forever.\n", + "\n", + "A **vector database** is a specialized database designed to :\n", + "- Store embeddings (lists of numbers) efficiently\n", + "- Find the most similar vectors **extremely fast**, even with millions of entries (using smart indexing algorithms)\n", + "- Handle metadata (like the source file, date, author...) alongside each vector\n", + "\n", + "We will use **ChromaDB**, which is simple, works locally, and is perfect for learning.\n", + "\n", + "**Important concept :** In Part I, we manually created embeddings using `SentenceTransformer`. ChromaDB can do this **automatically** for you ! When you add a text document, ChromaDB will create its embedding behind the scenes using its own built-in embedding model (which happens to be `all-MiniLM-L6-v2` - the same one we used in Part I !).\n", + "\n", + "So Part I taught you **how embeddings work under the hood**. 
Now ChromaDB will handle the embedding step for us, so we can focus on building the search and retrieval pipeline.\n", + "\n", + "**Other popular vector databases :** Pinecone, Weaviate, Qdrant, FAISS, Milvus" + ] + }, + { + "cell_type": "markdown", + "id": "create-db-title", + "metadata": {}, + "source": [ + "### ***1/ Create a ChromaDB collection***\n", + "\n", + "A \"collection\" in ChromaDB is like a table in a regular database. \n", + "It stores your documents and their embeddings together.\n", + "\n", + "When you create a collection, ChromaDB is ready to :\n", + "1. Accept documents (plain text)\n", + "2. Automatically convert them to embeddings\n", + "3. Store both the text and its embedding\n", + "4. Later, search for similar documents when you ask a question\n", + "\n", + "**Documentation :** https://docs.trychroma.com/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "create-chromadb", + "metadata": {}, + "outputs": [], + "source": [ + "import chromadb\n", + "\n", + "# TODO: Create a ChromaDB client\n", + "chroma_client = ...\n", + "\n", + "# TODO: Create a collection named \"my_knowledge_base\"\n", + "collection = ...\n", + "\n", + "print(f\"Collection '{collection.name}' created.\")\n", + "print(f\"Currently contains {collection.count()} documents\")" + ] + }, + { + "cell_type": "markdown", + "id": "add-docs-title", + "metadata": {}, + "source": [ + "### ***2/ Add documents to the database***\n", + "\n", + "Let's add some documents about a fictional company. \n", + "Later, we will ask questions and retrieve relevant information.\n", + "\n", + "**Your task :** Add documents to the collection using the `add()` method.\n", + "\n", + "**Hint :** ChromaDB requires each document to have a **unique string ID**. 
\n", + "Check the documentation to see how the `add()` method works : https://docs.trychroma.com/docs/collections/add-data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "add-documents", + "metadata": {}, + "outputs": [], + "source": [ + "documents = [\n", + " \"TechCorp was founded in 2020 by Alice Johnson and Bob Smith in San Francisco.\",\n", + " \"TechCorp specializes in artificial intelligence solutions for healthcare.\",\n", + " \"The company has 150 employees and offices in San Francisco and London.\",\n", + " \"TechCorp's main product is MedAI, a diagnostic assistant for doctors.\",\n", + " \"In 2023, TechCorp raised $50 million in Series B funding from Sequoia Capital.\",\n", + " \"The CEO of TechCorp is Alice Johnson, who previously worked at Google.\",\n", + " \"TechCorp's revenue in 2023 was $25 million, a 150% increase from 2022.\",\n", + " \"The company plans to expand to Asia in 2024, starting with Japan and Singapore.\",\n", + " \"MedAI can analyze X-rays, MRIs, and CT scans with 95% accuracy.\",\n", + " \"TechCorp won the Best AI Startup award at TechCrunch Disrupt 2023.\"\n", + "]\n", + "\n", + "# TODO: Add documents to the collection with unique IDs\n", + "...\n", + "\n", + "print(f\"Added {collection.count()} documents to the collection.\")" + ] + }, + { + "cell_type": "markdown", + "id": "query-title", + "metadata": {}, + "source": [ + "### ***3/ Search for relevant documents***\n", + "\n", + "Now the magic happens. We can search for documents by **meaning**, not just keywords. \n", + "The database will find documents that are semantically similar to our query.\n", + "\n", + "**Your task :** Use the `query()` method to search for relevant documents." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "query-documents", + "metadata": {}, + "outputs": [], + "source": [ + "query = \"Who founded the company and when ?\"\n", + "\n", + "# TODO: Query the collection and get 3 results\n", + "results = ...\n", + "\n", + "print(f\"Query: \\\"{query}\\\"\\n\")\n", + "print(\"Most relevant documents:\")\n", + "for i, doc in enumerate(results['documents'][0]):\n", + " print(f\" {i+1}. {doc}\")" + ] + }, + { + "cell_type": "markdown", + "id": "experiment-title", + "metadata": {}, + "source": [ + "### ***4/ Experiment with different queries***\n", + "\n", + "Try different questions and see how the system finds relevant documents. \n", + "Notice how it understands meaning, not just exact word matches." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "experiment-queries", + "metadata": {}, + "outputs": [], + "source": [ + "test_queries = [\n", + " \"What does the company sell ?\",\n", + " \"How much money did they raise ?\",\n", + " \"Where are the offices located ?\",\n", + " \"Tell me about the medical AI product\"\n", + "]\n", + "\n", + "for query in test_queries:\n", + " # TODO: Query the collection with 2 results\n", + " results = ...\n", + " \n", + " print(f\"\\nQuery: \\\"{query}\\\"\")\n", + " print(\"Results:\")\n", + " for doc in results['documents'][0]:\n", + " print(f\" → {doc}\")\n", + " print(\"-\" * 60)" + ] + }, + { + "cell_type": "markdown", + "id": "part3-title", + "metadata": {}, + "source": [ + "# **III/ Building a RAG System**" + ] + }, + { + "cell_type": "markdown", + "id": "rag-intro", + "metadata": {}, + "source": [ + "In Part II, we built a system that can **find relevant documents** based on a question. 
That's great, but it only gives us raw text snippets - it doesn't actually **answer** the question.\n", + "\n", + "Now we combine everything into a complete **RAG** (Retrieval Augmented Generation) system by adding an LLM to the pipeline.\n", + "\n", + "**How RAG works - the 3 steps :**\n", + "1. **Retrieve** : The user asks a question. We search our vector database for the most relevant documents.\n", + "2. **Augment** : We take those documents and insert them into a prompt, along with the question. This gives the LLM the context it needs.\n", + "3. **Generate** : The LLM reads the context and generates an answer based **only on the provided documents**.\n", + "\n", + "**Why is this powerful ?**\n", + "- The LLM has access to **your specific data** (company docs, internal knowledge...)\n", + "- Answers are **grounded** in real documents, which reduces hallucination\n", + "- You can **update** the knowledge base anytime without retraining the model\n", + "- You can trace **which documents** were used to generate each answer" + ] + }, + { + "cell_type": "markdown", + "id": "setup-llm-title", + "metadata": {}, + "source": [ + "### ***1/ Setup the LLM***\n", + "\n", + "For RAG, we need an LLM that can read our documents and generate an answer. We will use **Ollama** to run a local LLM.\n", + "\n", + "**Why Ollama and not HuggingFace like this morning ?** \n", + "This morning, we used HuggingFace `transformers` to load GPT-2 because we needed to **modify** the model (fine-tuning). Here, we don't need to modify anything - we just need to **send a prompt and get a response**. 
Ollama makes this very simple :\n", + "- It handles downloading and running LLMs with a single command\n", + "- It provides a simple HTTP API (like a web server) that we can call from Python\n", + "- It supports powerful models like Llama 3.2 that are much better than GPT-2 for generating answers\n", + "\n", + "Think of Ollama as a \"model server\" : it runs in the background and we send it questions via HTTP requests.\n", + "\n", + "**Setup steps :**\n", + "\n", + "1. Install Ollama from [ollama.com](https://ollama.com/)\n", + "2. Open a **separate terminal** and run :\n", + "```bash\n", + "ollama pull llama3.2:3b\n", + "```\n", + "3. Keep Ollama running in the background :\n", + "```bash\n", + "ollama serve\n", + "```\n", + "\n", + "**Troubleshooting :**\n", + "- If you get a \"connection refused\" error later, it means Ollama is not running. Start it with `ollama serve` in a terminal.\n", + "- If `ollama pull` fails, check your internet connection.\n", + "- The model is ~2GB, so the first download may take a few minutes.\n", + "\n", + "**Documentation :** https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "setup-llm", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install requests\n", + "\n", + "import requests\n", + "\n", + "LLM_URL = \"http://localhost:11434/api/generate\"\n", + "LLM_MODEL = \"llama3.2:3b\"\n", + "\n", + "# Test the connection to Ollama\n", + "try:\n", + " test_response = requests.get(\"http://localhost:11434/api/tags\", timeout=5)\n", + " if test_response.status_code == 200:\n", + " models = [m[\"name\"] for m in test_response.json().get(\"models\", [])]\n", + " print(f\"Ollama is running !\")\n", + " print(f\"Available models: {models}\")\n", + " if not any(LLM_MODEL in m for m in models):\n", + " print(f\"\\nWARNING: '{LLM_MODEL}' not found. 
Run: ollama pull {LLM_MODEL}\")\n", + " else:\n", + " print(f\"Model '{LLM_MODEL}' is ready.\")\n", + "except requests.exceptions.ConnectionError:\n", + " print(\"ERROR: Cannot connect to Ollama !\")\n", + " print(\"Make sure Ollama is running: open a terminal and run 'ollama serve'\")\n", + " print(\"Then run: ollama pull llama3.2:3b\")" + ] + }, + { + "cell_type": "markdown", + "id": "rag-function-title", + "metadata": {}, + "source": [ + "### ***2/ Build the RAG pipeline***\n", + "\n", + "Let's create a function that implements the 3 RAG steps :\n", + "1. **Retrieve** : Query ChromaDB to find relevant documents\n", + "2. **Augment** : Build a prompt that includes the retrieved documents as context\n", + "3. **Generate** : Send the prompt to Ollama and get the answer\n", + "\n", + "**Your task :** Complete the RAG function below.\n", + "\n", + "**Hint 1 - Building the context :** You need to combine the retrieved documents into a single block of text that the LLM can read.\n", + "\n", + "**Hint 2 - Structuring the prompt :** A good RAG prompt should :\n", + "- Clearly separate the context (retrieved documents) from the question\n", + "- Instruct the LLM to answer **only** based on the provided context\n", + "- Handle the case where the answer is not in the context\n", + "\n", + "Think about what instructions you would give to a human if you handed them a stack of documents and asked them a question.\n", + "\n", + "**Hint 3 - Calling Ollama :** Ollama exposes a REST API. You need to send a **POST** request with a JSON body containing the model name and your prompt. 
Make sure to set `\"stream\": False` to get the full response at once (otherwise Ollama streams the response token by token in a different format).\n", + "\n", + "**Documentation :** https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "rag-function", + "metadata": {}, + "outputs": [], + "source": [ + "def ask_with_rag(question: str, n_results: int = 3) -> tuple[str, list]:\n", + " \"\"\"\n", + " RAG pipeline: Retrieve relevant docs and generate an answer.\n", + " \n", + " Args:\n", + " question: The user's question\n", + " n_results: Number of documents to retrieve\n", + " \n", + " Returns:\n", + " Tuple of (answer, source documents)\n", + " \"\"\"\n", + " \n", + " # Step 1 - RETRIEVE: Query the collection to find relevant documents\n", + " results = ...\n", + " \n", + " # Step 2 - AUGMENT: Build the context string from retrieved documents\n", + " context = ...\n", + " \n", + " # Step 3 - AUGMENT: Create the prompt with context + question\n", + " prompt = ...\n", + " \n", + " # Step 4 - GENERATE: Call Ollama API and extract the response\n", + " response = ...\n", + " answer = ...\n", + " \n", + " return answer, results['documents'][0]\n", + "\n", + "print(\"RAG function created.\")" + ] + }, + { + "cell_type": "markdown", + "id": "test-rag-title", + "metadata": {}, + "source": [ + "### ***3/ Test your RAG system***\n", + "\n", + "Now let's test our RAG system with various questions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "test-rag", + "metadata": {}, + "outputs": [], + "source": [ + "test_questions = [\n", + " \"Who is the CEO of TechCorp ?\",\n", + " \"What is MedAI and what can it do ?\",\n", + " \"How much funding did the company raise ?\",\n", + "]\n", + "\n", + "for question in test_questions:\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Question: {question}\\n\")\n", + " \n", + " try:\n", + " answer, sources = ask_with_rag(question)\n", + " print(f\"Answer: {answer}\")\n", + " print(f\"\\nSources used:\")\n", + " for source in sources:\n", + " print(f\" - {source}\")\n", + " except requests.exceptions.ConnectionError:\n", + " print(\"ERROR: Cannot connect to Ollama.\")\n", + " print(\"Open a terminal and run: ollama serve\")\n", + " except KeyError as e:\n", + " print(f\"ERROR: Unexpected response format from Ollama: {e}\")\n", + " print(\"Make sure the model is downloaded: ollama pull llama3.2:3b\")\n", + " except Exception as e:\n", + " print(f\"ERROR: {e}\")\n", + " print(\"Check that Ollama is running and the model is available.\")" + ] + }, + { + "cell_type": "markdown", + "id": "compare-title", + "metadata": {}, + "source": [ + "### ***4/ Compare: With RAG vs Without RAG***\n", + "\n", + "Let's see the difference between asking the LLM directly vs using RAG. \n", + "This shows why RAG is so powerful for domain-specific questions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "compare-rag", + "metadata": {}, + "outputs": [], + "source": [ + "def ask_without_rag(question: str) -> str:\n", + " \"\"\"Ask the LLM directly without any context.\"\"\"\n", + " # TODO: Send the question directly to Ollama, without any context documents\n", + " # Hint: same pattern as before, but the prompt is just the question itself\n", + " ...\n", + "\n", + "question = \"Who is the CEO of TechCorp and what is their background ?\"\n", + "\n", + "print(f\"Question: {question}\\n\")\n", + "print(\"=\" * 60)\n", + "\n", + "try:\n", + " print(\"\\nWITHOUT RAG (LLM has no context about TechCorp):\")\n", + " print(ask_without_rag(question))\n", + " \n", + " print(\"\\n\" + \"=\" * 60)\n", + " print(\"\\nWITH RAG (LLM receives relevant documents as context):\")\n", + " answer, _ = ask_with_rag(question)\n", + " print(answer)\n", + "except requests.exceptions.ConnectionError:\n", + " print(\"ERROR: Cannot connect to Ollama. Run 'ollama serve' in a terminal.\")\n", + "except Exception as e:\n", + " print(f\"ERROR: {e}\")" + ] + }, + { + "cell_type": "markdown", + "id": "part4-title", + "metadata": {}, + "source": [ + "# **IV/ RAG on Real Documents : Chunking & Multi-File Pipeline**" + ] + }, + { + "cell_type": "markdown", + "id": "chunking-intro", + "metadata": {}, + "source": [ + "### ***1/ Understanding chunking***\n", + "\n", + "In Parts II and III, we worked with short, single-sentence documents. That made things easy. \n", + "But in real applications, your knowledge base is made of **long documents** : PDFs, reports, articles, internal docs...\n", + "\n", + "You can't just embed an entire 10-page document as a single vector. Why ?\n", + "- Embeddings work best on **short texts** (a few sentences). A single embedding for a whole document would lose the details.\n", + "- When you retrieve a long document, most of it is **irrelevant** to the question. 
You'd waste the LLM's context window.\n", + "- LLMs have **context limits** - you can't feed them an entire book.\n", + "\n", + "The solution is **chunking** : splitting long documents into smaller, meaningful pieces.\n", + "\n", + "**How chunking works :**\n", + "- We define a **maximum chunk size** (e.g. 500 characters).\n", + "- We walk through the text and cut at approximately every 500 characters.\n", + "- But we don't cut in the middle of a sentence ! We look for the **last sentence boundary** (period, exclamation mark...) before the limit, so each chunk contains **complete sentences**.\n", + "- We also add an **overlap** between chunks (e.g. 100 characters). This means the end of one chunk is repeated at the start of the next one. This prevents losing context at the boundaries - if an important fact spans two chunks, the overlap ensures it appears fully in at least one of them.\n", + "\n", + "**Example with chunk_size=500, overlap=100 :**\n", + "```\n", + "Document: \"Sentence A. Sentence B. Sentence C. Sentence D. Sentence E. ...\"\n", + "\n", + "Chunk 1: \"Sentence A. Sentence B. Sentence C.\" (480 chars, cut at last period before 500)\n", + "Chunk 2: \"Sentence C. Sentence D. Sentence E.\" (starts 100 chars before the end of chunk 1)\n", + "```\n", + "\n", + "In the `documents/` folder, you will find **5 text files** about TechCorp. \n", + "Your task is to implement the chunking function, load the files, chunk them, and build a complete RAG system over real documents.\n", + "\n", + "**Hint :** Python's `str` methods like `rfind()` can help you find sentence boundaries within a range of text." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "chunking-function", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:\n", + " \"\"\"\n", + " Split text into overlapping chunks, cutting at sentence boundaries.\n", + " \n", + " Args:\n", + " text: The full text to chunk\n", + " chunk_size: Maximum size of each chunk (in characters)\n", + " overlap: Number of characters to overlap between chunks\n", + " \n", + " Returns:\n", + " List of text chunks\n", + " \"\"\"\n", + " chunks = []\n", + " start = 0\n", + " \n", + " while start < len(text):\n", + " # TODO: Find where this chunk should end.\n", + " # - Don't exceed chunk_size characters\n", + " # - Try to cut at a sentence boundary (not in the middle of a sentence)\n", + " end = ...\n", + " \n", + " # TODO: Extract the chunk, add it to the list, and advance start\n", + " # - Don't forget the overlap when moving start forward\n", + " # - Make sure you can't get stuck in an infinite loop\n", + " ...\n", + " \n", + " return chunks\n", + "\n", + "\n", + "# --- Load all .txt files from the documents/ folder ---\n", + "documents_dir = \"documents\"\n", + "all_chunks = []\n", + "chunk_sources = []\n", + "\n", + "for filename in sorted(os.listdir(documents_dir)):\n", + " if not filename.endswith(\".txt\"):\n", + " continue\n", + " \n", + " filepath = os.path.join(documents_dir, filename)\n", + " with open(filepath, \"r\", encoding=\"utf-8\") as f:\n", + " content = f.read()\n", + " \n", + " # TODO: Chunk the file content using the chunk_text function\n", + " file_chunks = ...\n", + " \n", + " for chunk in file_chunks:\n", + " all_chunks.append(chunk)\n", + " chunk_sources.append(filename)\n", + " \n", + " print(f\"Loaded '{filename}' -> {len(file_chunks)} chunks\")\n", + "\n", + "print(f\"\\nTotal: {len(all_chunks)} chunks from {len(set(chunk_sources))} files\")\n", + "print(f\"\\nExample chunk (chunk #1):\")\n", + 
"print(f\" Source: {chunk_sources[0]}\")\n", + "print(f\" Length: {len(all_chunks[0])} chars\")\n", + "print(f\" Content: \\\"{all_chunks[0][:150]}...\\\"\")" + ] + }, + { + "cell_type": "markdown", + "id": "hu04m2nke1s", + "metadata": {}, + "source": [ + "### ***2/ Store chunks in a vector database***\n", + "\n", + "Now that we have chunks from multiple files, let's store them in a **new ChromaDB collection** and build a full RAG system over real documents.\n", + "\n", + "**Your task :** Add all chunks to a new collection, keeping track of which file each chunk came from (using **metadata**)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "sje1b7j0hds", + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Create a new ChromaDB collection named \"techcorp_docs\"\n", + "docs_collection = ...\n", + "\n", + "# TODO: Add all chunks to the collection\n", + "# Each chunk needs: a unique ID, the chunk text as document, and metadata with the source filename\n", + "# Hint: metadata is a list of dicts, e.g. [{\"source\": \"file1.txt\"}, {\"source\": \"file2.txt\"}, ...]\n", + "...\n", + "\n", + "print(f\"Stored {docs_collection.count()} chunks in the 'techcorp_docs' collection.\")" + ] + }, + { + "cell_type": "markdown", + "id": "0nbnuf5f9kg", + "metadata": {}, + "source": [ + "### ***3/ RAG over real documents***\n", + "\n", + "Let's test our complete pipeline : **chunked documents + vector DB + LLM**. \n", + "The questions below require information spread across different files. Only a RAG system with proper chunking can answer them accurately." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4wy06z9srmu", + "metadata": {}, + "outputs": [], + "source": [ + "def ask_docs(question: str, n_results: int = 5) -> tuple[str, list, list]:\n", + " \"\"\"RAG pipeline over the chunked documents collection. Returns (answer, chunks, metadatas).\"\"\"\n", + " \n", + " # TODO: Query the docs_collection for relevant chunks\n", + " results = ...\n", + " \n", + " # TODO: Build context from retrieved chunks\n", + " context = ...\n", + " \n", + " # TODO: Create a prompt (same structure as ask_with_rag - context + question + instruction)\n", + " prompt = ...\n", + " \n", + " response = requests.post(LLM_URL, json={\n", + " \"model\": LLM_MODEL,\n", + " \"prompt\": prompt,\n", + " \"stream\": False\n", + " })\n", + " answer = response.json()[\"response\"]\n", + " \n", + " return answer, results['documents'][0], results['metadatas'][0]\n", + "\n", + "\n", + "test_questions = [\n", + " \"What is TechCorp's revenue growth from 2022 to 2023 ?\",\n", + " \"Which hospitals are partners of TechCorp ?\",\n", + " \"What is PathAI and when will it launch ?\",\n", + " \"How does MedAI integrate into hospital workflows ?\",\n", + " \"What is TechCorp's expansion plan for Asia ?\",\n", + "]\n", + "\n", + "for question in test_questions:\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"Question: {question}\\n\")\n", + " \n", + " try:\n", + " answer, sources, metadatas = ask_docs(question)\n", + " print(f\"Answer: {answer}\")\n", + " print(f\"\\nSources:\")\n", + " for source, meta in zip(sources, metadatas):\n", + " print(f\" [{meta['source']}] {source[:80]}...\")\n", + " except requests.exceptions.ConnectionError:\n", + " print(\"ERROR: Cannot connect to Ollama. 
Run 'ollama serve' in a terminal.\")\n", + " except Exception as e:\n", + " print(f\"ERROR: {e}\")" + ] + }, + { + "cell_type": "markdown", + "id": "conclusion-title", + "metadata": {}, + "source": [ + "# **Conclusion**" + ] + }, + { + "cell_type": "markdown", + "id": "conclusion-content", + "metadata": {}, + "source": [ + "---\n", + "\n", + "**Congratulations !** You have completed this afternoon's session on RAG.\n", + "\n", + "**What you learned today :**\n", + "\n", + "- **Embeddings** : Transform text into vectors that capture meaning \n", + "- **Cosine similarity** : Measure how close two texts are in meaning \n", + "- **Vector Databases** : Store and search embeddings efficiently (ChromaDB) \n", + "- **RAG Pipeline** : Retrieve relevant documents + Generate answers with an LLM \n", + "- **Chunking** : Split large documents into smaller pieces for better retrieval \n", + "\n", + "**Key takeaways :**\n", + "- RAG lets you give LLMs access to **your specific data** without retraining\n", + "- Embeddings enable **semantic search** (by meaning, not just keywords)\n", + "- The quality of your RAG system depends on **how you chunk** your documents and **how you write your prompt**\n", + "\n", + "**What's next ? Ideas to explore :**\n", + "- Build a RAG system with your own documents (PDFs, web pages)\n", + "- Try different embedding models and compare results\n", + "- Experiment with different chunk sizes and overlaps to see how it affects quality\n", + "---\n", + "\n", + "**Combining this morning and this afternoon :** \n", + "This morning you learned **fine-tuning** (adapting model weights to learn new behavior). \n", + "This afternoon you learned **RAG** (giving the model external knowledge at query time). 
\n", + "\n", + "In practice, production AI systems often use **both** :\n", + "- **Fine-tune** for style, format, or domain-specific language (how the model speaks)\n", + "- **RAG** for factual, up-to-date information (what the model knows)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/AI/Day02/README.md b/AI/Day02/README.md new file mode 100644 index 0000000..56f6502 --- /dev/null +++ b/AI/Day02/README.md @@ -0,0 +1,28 @@ +# ~ PoC AI Pool 2025 ~ + +- ## Day 2: Large Language Models + - ### Module 1: Fine-tuning + - **Notebook:** [`finetune.ipynb`](<1 - Fine-tuning/finetune.ipynb>) + - ### Module 2: RAG (Retrieval Augmented Generation) + - **Notebook:** [`rag_afternoon.ipynb`](<2 - RAG/rag_afternoon.ipynb>) + +--- + +**Time to play with LLMs !** +Today you will explore two fundamental approaches to customize Large Language Models. In the morning, you will learn how to **fine-tune** a pre-trained model (GPT-2) to adapt its behavior to a specific task. In the afternoon, you will build a complete **RAG** (Retrieval Augmented Generation) system, giving an LLM access to external documents to answer questions with accurate, sourced information. 
+ +> Here's a list of resources that we believe can be useful to follow along (and that we've ourselves used to learn these topics before being able to write the subjects): + +## Module 1 + +- [HuggingFace Transformers Documentation](https://huggingface.co/docs/transformers) +- [GPT-2 Model Documentation](https://huggingface.co/docs/transformers/en/model_doc/gpt2) +- [HuggingFace Training Documentation](https://huggingface.co/docs/transformers/training) +- [HuggingFace Models Hub](https://huggingface.co/models) + +## Module 2 + +- [Sentence-Transformers Documentation](https://www.sbert.net/) +- [ChromaDB Documentation](https://docs.trychroma.com/) +- [Ollama Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) +- [What is RAG? (IBM)](https://research.ibm.com/blog/retrieval-augmented-generation-RAG) diff --git a/AI/Day03/1 - Regression/images/example_m-function.png b/AI/Day03/1 - Regression/images/example_m-function.png new file mode 100644 index 0000000..35bf6f5 Binary files /dev/null and b/AI/Day03/1 - Regression/images/example_m-function.png differ diff --git a/AI/Day03/1 - Regression/images/example_polynomial_function.png b/AI/Day03/1 - Regression/images/example_polynomial_function.png new file mode 100644 index 0000000..b69819d Binary files /dev/null and b/AI/Day03/1 - Regression/images/example_polynomial_function.png differ diff --git a/AI/Day03/1 - Regression/images/m_function.png b/AI/Day03/1 - Regression/images/m_function.png new file mode 100644 index 0000000..cc6bf95 Binary files /dev/null and b/AI/Day03/1 - Regression/images/m_function.png differ diff --git a/AI/Day03/1 - Regression/images/m_machine_learning.png b/AI/Day03/1 - Regression/images/m_machine_learning.png new file mode 100644 index 0000000..6ca748f Binary files /dev/null and b/AI/Day03/1 - Regression/images/m_machine_learning.png differ diff --git a/AI/Day03/1 - Regression/images/mse_loss_explication.png b/AI/Day03/1 - Regression/images/mse_loss_explication.png new file mode 
100644 index 0000000..c4f65b9 Binary files /dev/null and b/AI/Day03/1 - Regression/images/mse_loss_explication.png differ diff --git a/AI/Day03/1 - Regression/images/polynomial_function.png b/AI/Day03/1 - Regression/images/polynomial_function.png new file mode 100644 index 0000000..d5f4340 Binary files /dev/null and b/AI/Day03/1 - Regression/images/polynomial_function.png differ diff --git a/AI/Day03/1 - Regression/linear_regression.ipynb b/AI/Day03/1 - Regression/linear_regression.ipynb new file mode 100644 index 0000000..f619412 --- /dev/null +++ b/AI/Day03/1 - Regression/linear_regression.ipynb @@ -0,0 +1,662 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ~ PoC AI Pool 2025 ~\n", + "- ## Day 2: Understand Machine Learning\n", + " - ### Module 1: Linear Regression\n", + "-----------" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import random\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Welcome to the second day of your PoC AI Pool !\n", + "\n", + "*We had to make sure everyone was up to speed on basic Python (and AI-related Python libraries) before heading into the main topic of this Pool: **machine learning** !*\n", + "\n", + "Yesterday, you learned some very useful skills which we'll put into practice in order to build our very first **machine learning project** :\n", + "\n", + "- python\n", + "- numpy -> (to work with huge numbers and arrays)\n", + "- matplotlib -> (to display graphs and visualise data)\n", + "- pandas -> (to edit and analyse data)\n", + "\n", + "Today we delve deeper into the theory by entering the world of **machine learning**. 
This notebook introduces the concept of *Linear Regression*, one of the simplest models to train.\n", + "\n", + "The problem you will tackle today is simply multiplication by 2.\n", + "Here is the theory to understand it better :" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 1.0 The theory" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`First of all, what is machine learning ?`" + ] + }, + { + "attachments": { + "image.png": { + "image/png": "" + }, + "polynomial_function.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Imagine that you have a mathematical (polynomial) function like : \n", + "\n", + "![polynomial_function.png](attachment:polynomial_function.png)" + ] + }, + { + "attachments": { + "example_polynomial_function.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With an input of 1 or 3 the output will be :\n", + "\n", + "![example_polynomial_function.png](attachment:example_polynomial_function.png)" + ] + }, + { + "attachments": { + "m_function.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Easy so far, but let's imagine another function called m :\n", + "\n", + "![m_function.png](attachment:m_function.png)" + ] + }, + { + "attachments": { + "example_m-function.png": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAawAAAJ2CAYAAAADhbZ4AAAMTWlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU1cbPndkQggQCENG2EsQkRFARggr7I0gKiEJEEaMCUHFjRYrWCcigqOiVRDFDYi4UKtWiuK2juJApVKLtbiV/4QAWvqP5/+e59z73vd85z3f991zxwGA3sWXSvNQTQDyJQWyuJAA1qSUVBapByCAAPSAGnDhC+RSTkxMBIA2fP67vb4BvaFddVRq/bP/v5qWUCQXAIDEQJwhlAvyIT4EAN4qkMoKACBKIW8xs0CqxOUQ68hggBDXKnGWCrcqcYYKXx70SYjjQvwYALI6ny/LAkCjD/KsQkEW1KHDbIGzRCiWQOwPsW9+/nQhxAshtoU+cE66Up+d8ZVO1t80M0Y0+fysEazKZdDIgWK5NI8/+/8sx/+2/DzF8Bw2sKlny0LjlDnDuj3OnR6uxOoQv5VkREVDrA0AiouFg/5KzMxWhCaq/FFbgZwLawaYEE+U58Xzhvg4IT8wHGIjiDMleVERQz7FmeJgpQ+sH1opLuAlQKwPca1IHhQ/5HNSNj1ueN4bmTIuZ4h/xpcNxqDU/6zITeSo9DHtbBFvSB9zKspOSIaYCnFgoTgpCmINiKPkufHhQz5pRdncqGEfmSJOmYslxDKRJCRApY9VZMqC44b8d+XLh3PHTmaLeVFD+EpBdkKoqlbYYwF/MH6YC9YnknASh3VE8kkRw7kIRYFBqtxxskiSGK/icX1pQUCcaixuL82LGfLHA0R5IUreHOIEeWH88NjCArg4Vfp4ibQgJkEVJ16Vww+LUcWD7wMRgAsCAQsoYMsA00EOEHf0NvXCK1VPMOADGcgCIuA4xAyPSB7skcBjPCgCv0MkAvKRcQGDvSJQCPlPo1glJx7hVEdHkDnUp1TJBU8gzgfhIA9eKwaVJCMRJIHHkBH/IyI+bAKYQx5syv5/zw+zXxgOZCKGGMXwjCz6sCcxiBhIDCUGE+1wQ9wX98Yj4NEfNhecjXsO5/HFn/CE0El4SLhO6CLcniYulo2KMhJ0Qf3gofpkfF0f3BpquuEBuA9Uh8o4EzcEjrgrnIeD+8GZ3SDLHYpbWRXWKO2/ZfDVHRryozhTUIoexZ9iO3qkhr2G24iKstZf10cVa8ZIvbkjPaPn535VfSE8h4/2xL7FDmLnsFPYBawVawIs7ATWjLVjx5R4ZMU9Hlxxw7PFDcaTC3VGr5kvd1ZZSblzvXOP80dVX4FoVoHyYeROl86WibOyC1gc+MUQsXgSgdNYlouzixsAyu+P6vX2Knbwu4Iw279wi38FwOfEwMDA0S9c2AkA9nvAV8KRL5wtG35a1AA4f0SgkBWqOFx5IMA3Bx0+fQbABFgAW5iPC3AH3sAfBIEwEA0SQAqYCqPPhutcBmaCuWARKAFlYBVYB6rAFrAN1II94ABoAq3gFPgRXASXwXVwB66ebvAc9IHX4AOCICSEhjAQA8QUsUIcEBeEjfgiQUgEEoekIOlIFiJBFMhcZDFShqxBqpCtSB2yHzmCnEIuIJ3IbeQB0oP8ibxHMVQd1UGNUWt0HMpGOWg4moBOQbPQGWgRugRdgVaiNehutBE9hV5Er6Nd6HO0HwOYGsbEzDBHjI1xsWgsFcvEZNh8rBSrwGqwBqwF3uerWBfWi73DiTgDZ+GOcAWH4om4AJ+Bz8eX41V4Ld6In8Gv4g/wPvwzgUYwIjgQvAg8wiRCFmEmoYRQQdhBOEw4C5+lbsJrIpHIJNoQPeCzmELMIc4hLiduIu4lniR2Eh8R+0kkkgHJgeRDiibxSQWkEtIG0m7SCdIVUjfpLVmNbEp2IQeTU8kScjG5gryLfJx8hfyU/IGiSbGieFGiKULKbMpKynZKC+USpZvygapFtaH6UBOoOdRF1EpqA/Us9S71lZqamrmap1qsmlhtoVql2j6182oP1N6pa6vbq3PV09QV6ivUd6qfVL+t/opGo1n
T/GmptALaClod7TTtPu2tBkPDSYOnIdRYoFGt0ahxReMFnUK3onPoU+lF9Ar6Qfoleq8mRdNak6vJ15yvWa15RPOmZr8WQ2u8VrRWvtZyrV1aF7SeaZO0rbWDtIXaS7S3aZ/WfsTAGBYMLkPAWMzYzjjL6NYh6tjo8HRydMp09uh06PTpauu66ibpztKt1j2m28XEmNZMHjOPuZJ5gHmD+V7PWI+jJ9Jbptegd0Xvjf4YfX99kX6p/l796/rvDVgGQQa5BqsNmgzuGeKG9oaxhjMNNxueNewdozPGe4xgTOmYA2N+MUKN7I3ijOYYbTNqN+o3NjEOMZYabzA+bdxrwjTxN8kxKTc5btJjyjD1NRWblpueMP2NpcvisPJYlawzrD4zI7NQM4XZVrMOsw/mNuaJ5sXme83vWVAt2BaZFuUWbRZ9lqaWkZZzLestf7GiWLGtsq3WW52zemNtY51svdS6yfqZjb4Nz6bIpt7mri3N1s92hm2N7TU7oh3bLtduk91le9TezT7bvtr+kgPq4O4gdtjk0DmWMNZzrGRszdibjuqOHMdCx3rHB05MpwinYqcmpxfjLMeljls97ty4z85uznnO253vjNceHza+eHzL+D9d7F0ELtUu1ybQJgRPWDChecJLVwdXketm11tuDLdIt6VubW6f3D3cZe4N7j0elh7pHhs9brJ12DHs5ezzngTPAM8Fnq2e77zcvQq8Dnj94e3oneu9y/vZRJuJoonbJz7yMffh+2z16fJl+ab7fu/b5Wfmx/er8Xvob+Ev9N/h/5Rjx8nh7Oa8CHAOkAUcDnjD9eLO454MxAJDAksDO4K0gxKDqoLuB5sHZwXXB/eFuIXMCTkZSggND10depNnzBPw6nh9YR5h88LOhKuHx4dXhT+MsI+QRbREopFhkWsj70ZZRUmimqJBNC96bfS9GJuYGTFHY4mxMbHVsU/ixsfNjTsXz4ifFr8r/nVCQMLKhDuJtomKxLYkelJaUl3Sm+TA5DXJXZPGTZo36WKKYYo4pTmVlJqUuiO1f3LQ5HWTu9Pc0krSbkyxmTJryoWphlPzph6bRp/Gn3YwnZCenL4r/SM/ml/D78/gZWzM6BNwBesFz4X+wnJhj8hHtEb0NNMnc03msyyfrLVZPdl+2RXZvWKuuEr8Mic0Z0vOm9zo3J25A3nJeXvzyfnp+Uck2pJcyZnpJtNnTe+UOkhLpF0zvGasm9EnC5ftkCPyKfLmAh34o9+usFV8o3hQ6FtYXfh2ZtLMg7O0Zklmtc+2n71s9tOi4KIf5uBzBHPa5prNXTT3wTzOvK3zkfkZ89sWWCxYsqB7YcjC2kXURbmLfi52Ll5T/Nfi5MUtS4yXLFzy6JuQb+pLNEpkJTeXei/d8i3+rfjbjmUTlm1Y9rlUWPpTmXNZRdnH5YLlP303/rvK7wZWZK7oWOm+cvMq4irJqhur/VbXrtFaU7Tm0drItY3lrPLS8r/WTVt3ocK1Yst66nrF+q7KiMrmDZYbVm34WJVddb06oHrvRqONyza+2STcdGWz/+aGLcZbyra8/178/a2tIVsba6xrKrYRtxVue7I9afu5H9g/1O0w3FG249NOyc6u2rjaM3UedXW7jHatrEfrFfU9u9N2X94TuKe5wbFh617m3rJ9YJ9i32/70/ffOBB+oO0g+2DDIatDGw8zDpc2Io2zG/uaspu6mlOaO4+EHWlr8W45fNTp6M5Ws9bqY7rHVh6nHl9yfOBE0Yn+k9KTvaeyTj1qm9Z25/Sk09fOxJ7pOBt+9vyPwT+ePsc5d+K8z/nWC14XjvzE/qnpovvFxna39sM/u/18uMO9o/GSx6Xmy56XWzondh6/4nfl1NXAqz9e4127eD3qeueNxBu3bqbd7LolvPXsdt7tl78U/vLhzsK7hLul9zTvVdw3ul/zq92ve7vcu449CHzQ/jD+4Z1HgkfPH8sff+xe8oT2pOKp6dO6Zy7PWnuCey7/Nvm37ufS5x9
6S37X+n3jC9sXh/7w/6O9b1Jf90vZy4E/l78yeLXzL9e/2vpj+u+/zn/94U3pW4O3te/Y7869T37/9MPMj6SPlZ/sPrV8Dv98dyB/YEDKl/EHfwUwoNzaZALw504AaCkAMOC+kTpZtT8cNES1px1E4D9h1R5y0NwBaID/9LG98O/mJgD7tgNgDfXpaQDE0ABI8ATohAkjbXgvN7jvVBoR7g2+n/opIz8D/BtT7Um/inv0GShVXcHo878AOMuDHISO9HUAAACKZVhJZk1NACoAAAAIAAQBGgAFAAAAAQAAAD4BGwAFAAAAAQAAAEYBKAADAAAAAQACAACHaQAEAAAAAQAAAE4AAAAAAAAAkAAAAAEAAACQAAAAAQADkoYABwAAABIAAAB4oAIABAAAAAEAAAGsoAMABAAAAAEAAAJ2AAAAAEFTQ0lJAAAAU2NyZWVuc2hvdLXYcxwAAAAJcEhZcwAAFiUAABYlAUlSJPAAAAHWaVRYdFhNTDpjb20uYWRvYmUueG1wAAAAAAA8eDp4bXBtZXRhIHhtbG5zOng9ImFkb2JlOm5zOm1ldGEvIiB4OnhtcHRrPSJYTVAgQ29yZSA2LjAuMCI+CiAgIDxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI+CiAgICAgIDxyZGY6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiCiAgICAgICAgICAgIHhtbG5zOmV4aWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20vZXhpZi8xLjAvIj4KICAgICAgICAgPGV4aWY6UGl4ZWxZRGltZW5zaW9uPjYzMDwvZXhpZjpQaXhlbFlEaW1lbnNpb24+CiAgICAgICAgIDxleGlmOlBpeGVsWERpbWVuc2lvbj40Mjg8L2V4aWY6UGl4ZWxYRGltZW5zaW9uPgogICAgICAgICA8ZXhpZjpVc2VyQ29tbWVudD5TY3JlZW5zaG90PC9leGlmOlVzZXJDb21tZW50PgogICAgICA8L3JkZjpEZXNjcmlwdGlvbj4KICAgPC9yZGY6UkRGPgo8L3g6eG1wbWV0YT4KqhKRcQAAABxpRE9UAAAAAgAAAAAAAAE7AAAAKAAAATsAAAE7AAApreoNkdQAACl5SURBVHgB7J0LuBVV+YcXRoZRQJYYaqUUKIIaUgbI3cJQA42IwhTikmSCqCQmKEEXkhBTIklTuhBSXsCEJCDAIiUpksSALEIIUAlMLUkI+fsb2/y3+8xtnzN7n/3t9a7nOc/ZZ2bNmm+935z9m1nzrW81OPhqcRQIQAACEIBAhRNogGBVuIcwDwIQgAAEAgIIFhcCBCAAAQiYIIBgmXATRkIAAhCAAILFNQABCEAAAiYIIFgm3ISREIAABCCAYHENQAACEICACQIIlgk3YSQEIAABCCBYXAMQgAAEIGCCAIJlwk0YCQEIQAACCBbXAAQgAAEImCCAYJlwE0ZCAAIQgACCxTUAAQhAAAImCCBYJtyEkRCAAAQggGBxDUAAAhCAgAkCCJYJN2EkBCAAAQggWFwDEIAABCBgggCCZcJNGAkBCEAAAggW1wAEIAABCJgggGCZcBNGQgACEIAAgsU1AAEIQAACJgggWCbchJEQgAAEIIBgVfg1cPDgQbdmzRr3+OOPu61bt7o9e/a4Sy65xLVr167iLN+9e7d78MEH3fHHH++6dOlScfZhEAQgYJsAglWh/nvllVfcPffc42699Va3efPmGlYuWLDAtW/fvsb2+tiwf/9+N2vWLDdt2rRDp+/YsaO75pprKsbGQ4bxAQIQMEsAwapA1+3atcuNHTvWrVy5MtK6M844w919992R+8u54/rrr3c/+MEPQk8p0Ro5cqRr0KBB6H42QgACEEhLAMFKS6pM9R5++GE3fPhw9+9//zv2jKeddpr72c9+FlunXDsvvvhi99BDD0Werk+
fPu7mm292b3rTmyLrsAMCEIBAEgEEK4lQGfdLgEaNGpXqjF27dnVz5sxJVbfUlQYMGOAeffTR2NOcd955bsaMGe6www6LrcdOCEAAAlEEEKwoMmXefscdd7jJkyenPut1110XPImlPqCEFYcNG+aWLVuWeAYNDX7pS19KrEcFCEAAAmEEEKwwKmXe9o1vfCMIrkh72re//e3B+60mTZqkPaSk9b7zne+4G264IdU57rzzTnfWWWelqkslCEAAAvkEEKx8GvXw+dvf/rb75je/mfrMH//4x93o0aPdCSeckPqYUld85plnXM+ePRPfu8kOie2vfvUr95a3vKXUZtE+BCBQZQQQrHp06P333x+ITxoTFBWowIVjjjkmTfVa1/nPf/4TBHM89dRTrmXLlu7UU091rVq1Smxv0aJF7tJLL02spwqDBw8uavgzVaNUqmgCzz33nNuyZYt79tlnnW5w9PeLL77o9u7dG7zX1A1M06ZNg2kQp59+unvjG99Y0f3BuPohgGDVD3f329/+1n3yk59MdXa9I7r22mtdw4YNU9WvbaXHHnvMXXHFFTXmff3kJz9xmleVVObNm+fGjRuXVC3Yv2TJEnfiiSemqkslmwQkSnq3+cADD8RGkYb1TkE6F1xwgfvwhz8ctpttnhJAsOrB8c8//3wwhKbMEEllxIgRbvz48SWfx6QvlcsuuyzUHEUB5k8KDq30v42rVq1yF154YVyVYF/v3r3d7bffnliPCvYIKDPL7NmznZ6661o6d+7sbrzxxpKPLNTVTo4vDwEEqzycX3cWCVCakPRyidXq1avdwIEDX2dj/h+9evUKvoDyt8V9Vvh6GoGrpGwdcf1hXzoCysiiJ+ykKQ7pWvv/Wo0bN3YK1knzlP//R/GpGgkgWGX26u9+9zvXv3//xLOWQ6z+9a9/uZkzZzpF+cWVYgXrv//9r1NwyLp16+KadT169IjMkBF7IDsrjoAynSjjSSmLhpw7depUylPQdoUTQLDK6KB9+/a5s88+u8Y7okITFJQwadKkkg0D6kX3T3/60yAUPSmjhmwrVrB0zJNPPpnq/cPPf/5z17ZtWx1CMUhAQToTJkwoS5owPWkpXVnz5s0NksLkLAggWFlQTNlGmrtQvWjWmP0b3vCGlK2mryahUv7Bb33rWy7N+7Ncy7URLB17yy23BH3JtRP2u1+/fkG9sH1sq2wCL7zwgvvMZz6T+CSdZS8UjKFRAYqfBBCsMvn95ZdfDoYz4oTiE5/4RPDUk3U0oIRKkX4SkLjzR6GorWClfaLUO4+jjz466vRsr0ACGvYdOnRo0dF/yoF5yimnBEEUmpOn/JJ///vfg4nzaZ72hWLx4sWuTZs2FUgFk0pNAMEqNeH/tZ8U8t29e/fgxXKWYiWhmjt3bvCUk/bLIAxHbQVLbSliTEIcV5SZPm0Oxbh22Fc+Ar/4xS/c5z73uVQn1FDexIkTg6Hlo446KvQYXav33XdfcK0m3VR9/vOfD5auCW2IjVVNAMEqg3t1N6oFDXfu3Bl6tne/+91u4cKFwcTJ0ApFbjxw4IC79957nVI+Jf3zp2m6LoKl9q+++urgCS/qXLrT1lNWlmIddS62Z0Pgi1/8YvAeNKk1JWm+6aabXJRQFR6v/5HPfvazbsOGDYW7Dv2t60U3QqUYNj90Ej5UJAEEqwxuUfj25ZdfHnmmpUuXutatW0fuL2aHhuE0+VcCmFWpq2BpAumZZ54Zm7pJ7/cUNUixQUBPxUnrsWnYTtd+o0aNiurUSy+95DT3b/369ZHH/eY3v3HHHXdc5H52VCcBBKsMfv3oRz8aeceY5XDYwYMH3ZVXXhkMrRTTLaVg0pBM1BNgXQVLtigqUXflUeXcc89NDK+POpbt5SeQlPBYw4CKAD3++ONrZVzSUjtps6/U6uQcVLEEEKwSu0Z3ifoyDisSCqUoyipvWrGJdGXTVVddFawIrDD6qMnMWQiWhikVARk3N2vt2rVBctwwVmy
rLAK6bjVXMKrU9T2TRgo0UThqSFsT0/UURvGLAIJVYn9rjSutdRVWFF6uL/GsikKMf/3rX6dqTl8GU6ZMCRLc6oC47BtZCJbOkRSA8fWvfz1VWie1RalfAgriOfnkkyON0DVf1zyAyr6iLCxhJcuRibD22VaZBBCsEvpFoezt27cPfXejF8dKgJvV05W6ETf0mOumhmr0NKXIvQYNGuQ2l0WwdDJl+VC2j7CiSMkf/vCHYbvYVoEEot7N6hrT03Kx764Kuxh3rSiQ5wtf+ELhIfxd5QQQrBI6+MEHHwyG28JOcdFFF7mvfvWrYbtqvU3RWHpqiyoaQtGKvxLLwlKOJyydM2nytIZQ3/rWtxaax98VSuCPf/xjEFih91WaSKxrS+Huur7rWvr27Rs5hKyRC2WEofhFAMEqob8Vnrt8+fLQM+ilddS7rdADUmxUbkC9V3j44YcP1dYXiIRK//xxKZDKJVjbtm0LQvwPGVjw4dZbb3XnnHNOwVbbfz7xxBPu8ccfDzpx+OGHu2bNmgUTpRWQoKcRSk0Ceof1/ve/P3R0QrW16Gna5Xlqts4WqwQQrBJ5TlF3J510UmTrK1asOPT+KLJSLXZozpeG3LZu3ere9a53uQ9+8IOp5jfpyUuTjMNKVu+wcm3HDV0qaa6eFCup6MkhaW5QnL1xk7ZbtGgR3Ei0a9cuuF7e9773Of3kD9fGtV2t+7QqddxTWilu+KqVZTX1C8EqkTe1uqreyUQVJYfV3XalFL0PiJq7lfXaVQqu+O53vxvadT1xaJipkiYRp8mJGNqZWm7Uk/fUqVOdVuH1tcTdQImJbq40t4/iFwEEq0T+jltjSpkt0kbzlci8Gs1efPHFkXnhPv3pTwdZM2ocVMsNSSHRmrP1oQ99qJatZ3+Y3skoFVE5i6I4f/zjH1eUcJer/xolOPXUUyOHA2XHxo0b3RFHHFEukzhPhRBAsErkiKgIKp1O4b5Roe4lMiex2bgX3GPGjAmyZyQ2krLCrl273Ac+8IHI2pUWATZs2LBgqfdIg0u0o9KEu0TdrNGslhCJC6jo06ePmzVrVo3j2FD9BBCsEvlY/1Ca5xRWRo4cGUTrhe2rr209e/aMXKfra1/7WrCMRJa2Kcec3rOFlazfmYWdo5htmqSqVZTLXXQN6cvZp6KnK73j1JB5VMl6/mLUedheeQQQrAifaLnvHTt2uNNPP929+c1vjqgVvVmJZxXxFlYqcdKj+hmVVeC2224LFp4M60ttt8W9M9N7LIW3H3bYYbVtPtPjku74Mz1ZXmM+5su7/fbbE6d7/OEPf3BHHnlkHik++kKg6gVrz549TndtusDTvsjXnBKllskV3fFfd911RUX1xb001gqtcWltcuct5+/3vOc9kafT8KYmQGdZtAifAguiyi9/+csgWi5qfzm3K0ej0l7pSatcRe+wlC/Pp6LoVk0Wjis33HCD+9SnPhVXhX1VTKAqBUtLE+iF9Y9+9KNDrlPYsEJh476YVVnHXXvttYeOy33QXb+GaLp165bbFPv70ksvdYsWLQqto7WBtPhdpRTl+VNew6jy0EMPOc0ZyrJIkOIYTJ8+PfHLK0t70rT1t7/9zWlJ+NoUvbdT9KNyKerJKSnU/fvf/37stIja2FDJxyjrS9K8KkXdauK57yH/lezHUttmQrBefPFFN3/+fPfss8+6j3zkI0EEUdRFqwmFuhsOK/pSVrRXVDj5qlWrYnPZSbT0hZMmnVJcXr9LLrkkVBTDbC7HtqS8cKVISvvUU0/Fir+eQPUkWq3lz3/+c5D4WOIlIVSmfEWPnn/++W748OGZrY1mgV/S/12uD4888kiwUnHub377R8CEYH3lK19x3/ve9w55Z9CgQU5zecJEK04o1ICWuLjssssOtZX7oHdW5513Xuydr+qmfRGuUPD8jBO58+h31vOa8tuuzed//OMfrkOHDpGH6stVS5lnWZKe6hS1WB+BDln2kbaSCdx///1u9OjRiRU
1mVyTyil+EzAhWBrPL1yrKSxyLSntj1ytpyxlmcgvSmn0sY99LDJKLr+uotuiluHIrxcnWGE25B9b7s+K1lO/ooqehkpR4iITfXyHUwrGldqm3gvqPaZGRJKKbjDj1lJLOp791UPAhGCFRbCdffbZTtFr+SUud1+uXqFY6B9HEWtR75tyx+X/VtTYCSeckL+pxuc4wVLlv/71r6mDQGo0nvEGTcIUz7CiXIQaEixFiZvfVImTq0vBwMc29R7wmmuuCYb5k/qvPJgStbDRlKRj2V99BMwKlnKw5a+Vkzb0uFCwtJyFIgCLKWFPd4XHJ2VH0Iqqp512WuFh9fK3woT17iSstGrVqmSTZguHegvPX6onu8LzlPNvvYfVDYCmSsj/TZs2Lefp6/1cCj7R+0ldc0lFQ+eaGpI2ujepPfbbJ2BWsPLv/NNMNsy5Kl+wlEFb762KLUpjpC/buKKAgfwoxcK6usPMD50v3F/Ov/XiX+8Fw4q+VCWupShJNwsW0+88//zz7umnnw6GsDWMvX379mCCtHJL/uUvf6nxjlQ3BBr+POaYY4KpF8rkrmu0devWpUBer20qevfCCy+MnO+Xb5ye+DVkmCbAKf84Plc3gaoQrHnz5rlx48al8lROsPTFon+KwndjaRpRWqF77703tqpm48dlHa+kdzTLli1zGp4LK6W0UxGbehKNKpUybKobIs3Nk7379+93Rx11lHvb294W/Oj9pyY5K9IvLjtDVB+jtisBrubyKeN+NZSk/JH5fdTioppvxZNVPhU+i4BZwZLxumvVsFFcVnTVyy8SLK1RpdDyuiQ01RdUXCaGNDP2//SnP1XEekh6gho1alQ+pkOfS7kKcNISEpUwJKjsH3o3qukM9VF0HWlozGrRO2JF1irzS5qikH6tzRb3v5WmHepUJwHTgqWwcd2h6w43bZFgaYJi0j+QAjqUfFRPH2ElKW1OnAjk2tMkyB49euT+rLff6mdUFFYpE/XGTRbVnDcJen0XhdaXM8NFWH/TBPmEHVff215++eVgvuE999yTypQrrrjCKdEyBQJRBEwLlrJXFCNWURAKt2u4T8N+S5cuDSZxFu7X38qI0aVLl7BdwbZHH300WOk3ssKrOy644ILYJe3jjs1yX9y7pLBozKzO/dhjj7l+/fqFNlcpUYLKGi7BqM9icXVdze3TKIbSLaUplZiuLI3d1CkvAdOCVQpU+Xnz9JJYmaPDStKKp4oG02q/SUWBH02aNEmqVtL9GrKJyixfygm8SlWk+W9hRTcjxUw1CGsji21xWeWzaD9NG5UUoJPGXv3faBg17fthjXZoGggFAkkEqlKwlA1dT0fFvne47777XpfxQYEZWkgurKS5601zd56mnbDzZ7ktLkCklEvWx+UTPOOMM9zdd9+dZTdr1Vbc5OZaNViLg/JvompxeNkO0U3a4sWLi5omoiFX3RRRIJCGQNUJVps2bYKl3vVFW4xgKTO2IuIKy8knn1wjFFl10oy3J0XBqZ00EYeqV8oStxTKwIEDY7Oq18WuqETDarOU786Ksfnmm292SsRb7qInTH2RK2JOUzgqtSjFlt7n6p1vsatoa9pH2mTSldp/7CovgaoTLN3hSbT0z55WsJSdXePtYUXLHYSNwytYQkETcUUh0JrHFJeZW8eXIht6nF2F+7785S+72bNnF24O/lZuRk2ULkW58cYb3S233BLatKLFip3QHdpQHTcqtF7Ly6QpCuiR0Jx44onB0ihaGUAh8MrSoGtB4fH6vW/fPqeAhBdeeMFpIq2SO7/yyitB3Xe+851BZvxKFql//vOfwdw85QEM+99Iw0o3iLpZI3Q9DS3q5AhUlWBdeeWV7vLLLw/6llawdCev0OGoMFotBaKlHgqLotieeOKJxJQxujvXXXpc0eRlTZKsrxK3dteQIUPcpEmTSmKahm6jhv0mT54cu0x6SQyKaDRqsUlFm2roUjdI733ve90RRxwR0UJ1bJao6lrWEHK
WReLcvHnzIOvHO97xjmDumbgqIbNvmUCy5FqNbVWNYOlL44EHHjg0Mz6NYCm9k4bt4v4ptKxJVKit5hElra+lu2gNNSY9ZdXnuyyFtCu0PayUcpkPDTfmp9fKP/+dd97pzjrrrPxN9fZZT0UavlKQiERJT9cayqp2gcoHLgZXX32103vechaNUCgaV/9DEjDdKFL8JVA1glX4YjqNYCl7Qdu2bWO9r8mrUePsSZGCuYbjwsZzdfS7voYG4xabVPooRallXTQ8pqeSqKI1kqoly0NUHy1tV06/pLmL5eiPJlGfc845wf9kJQ+bloOFj+eoCsEKi2RLEizlAlROwDQlKrR55MiRQfqcpDZ0d6r3IFrGI65oGXpNsiz3uL5C9xWKHFa0VtFVV10VtqtO2+JC2vVFVKoM8XUy2tODla1CN3ZJowTlxqOnLwWlaC5f3ChJue3ifKUjUBWCpWElDe/ll7ilRvTOSKsSp12yQHOUNFepsBQzV0jBIFGBHfnt6r1OVJqk/HpZfdZ7ibilUiRWaRbYK9YeDflFvRvTl5ACMiiVQUBBFhKHSi4KDtKPXg1QqpeACcHS+HXUJMSoFYSj3j3pglYmi2LGwrUasebjhBU9mWipiKSiu1R9EaeJqlq4cKE75ZRTkprMZL/escWdK4pvXU8et/xKfb7Pq2u/qvF43dTo5qzSnrDCWKdZ+ifsOLbZIGBCsDTnKexlr56qtHpw1MvvwmACpftROK2Wcii2KPhAGacLi4au0o6lxy2UmN+uwqP1fi2qX/l16/o56QkrP/KyrufKHa8h0rhwfyUnjnu/lWuH3+UjUOw7LP1PaKj+uOOOC27ocsPcCunXj7Lca6KxfnTTp5vCrIqms2iZFkr1ETAhWFFZvTVM16dPn0iv6ItRCVY1l0ZPQRoKrK0IPPPMM8FTVuFdZjGCJUPj5h7ld0RDmpofVY4S9w5LIczKeZhliXr61Tn0Rff73/8+9XBtlnbRVjSBtNGuakGBEXpKLkY0NC9t06ZNTjk4NRFZNy21LZrArJtTSvURMCFYiijTJNK5c+ce8oCCKjTpNO17qEMH1uGDVknVnKVcgEJtlt7Yu3dv8A+dFIAhM8sV2h2Xrf2RRx6p1RNpFGY90SlcPeqOWumsNAeLUnkECq//Qgt1s6Enci3SWNf/S91sKrG13k9LvCRkacuaNWuCeV1p61PPDgETgpXDqYwAyu/XqFGj4E68rv8UuXaL+a1UNHra0hCHJjvWpmhZlDTJPvWeTXPLSj08phsCTXAunLysYJOolYhr028dE5c/UPsrZckV2UKpSUDXiobnNQVDIxf6X1TQjp7SdSOSG/qreWTdtuzYsSPIU6iEyHHvgcn6XjfOlX60KcGqdJjF2Hf99dcnpnZSe7prVai73muVumi487nnngvSB+ndQ9bLk+t9hVJdRT1dSqC15Mjhhx9e6q7SvmECe/bsCRZu1fUkAVVIu/5PjjzySHf00Ucb7hmmJxFAsJIIlWi/xEF3pVFf3vmn1Rf5HXfc4Tp16pS/2dRnRXledNFFscvIlzKrhilYGAsBCIQSQLBCsZRno4Y29MSRtqTJzJG2rXLW0yRh5STUcvNxRUvCtG7dOq4K+yAAAY8JIFj17PybbropdTJRDQsq96GlIbN58+a5cePGJVI+99xznVJdUSAAAQhEEUCwosiUabvG4BVVFZUEttAMK6vPvvTSS0Fkp96/pSmaT1eO93RpbKEOBCBQmQQQrArwS9QcrzDTLAQmKGRdmSyefPLJsC7U2FZMXscaB7MBAhDwhgCCVSGu1hyX888/P5U1Wc+NSnXSlJX0pKhlQ9KWSlmoMa291IMABOqPAIJVf+xrnFlzTLTUR1Kp1OGzp59+OshKX5gNJKo/St2jzB9Ri2dGHcd2CEDATwIIVoX5XUEVyp0Y96VfqalnlI5HWfDTFC23ctttt2U+1yvNuakDAQjYJIBgVaDfNHlWa/xEFWUYKFVGgahzptm
eNk+iQtyVaqsS+5Cmn9SBAATqhwCCVT/cE88alfC3VatWbtmyZYnH10eFKJvzbdGqtWnSUuUfw2cIQAACIoBgVfB1IGEaNmzY6ywsRfb0152gDn8ose+ZZ54ZOkFYT4xjxowhdL0OfDkUAr4TQLAq/ApQZvg5c+YE+f2Uyknvfiq5KPBCQ4Oah6XEqFows0uXLu6kk06qZLOxDQIQMEAAwTLgJEyEAAQgAAGGBLkGIAABCEDACAGesIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQQLCMOAozIQABCPhOAMHy/Qqg/xCAAASMEECwjDgKMyEAAQj4TgDB8v0KoP8QgAAEjBBAsIw4CjMhAAEI+E4AwfL9CqD/EIAABIwQMCdYBw8edHPnznUzZsxw+/btc02bNnUtW7Z0/fr1c3379i0Ke5Zt7d27102ZMsUtXLgwsKF58+aubdu2btCgQa5Dhw5F2UVlCEAAAhCoScCUYO3evduNHTvWLV++vGZPXt2ifaNGjQrdV7gxy7bWr18fnHfz5s2Fpwn+vuuuu1znzp1D97ERAhCAAATSETAjWBIDPa3s3Lkztmfr1q1zzZo1i62TZVtLly51w4cPjz1f+/bt3YIFC2LrsBMCEIAABOIJmBGsrl27uq1bt8b35tW9EydOdEOHDo2tl1VbO3bscJ06dYo9V27nokWLXLt27XJ/8hsCEIAABIokYEKw9uzZ4/SUkqb06tXLzZ49O7Jqlm0tWbLEjRgxIvJc+TsmTJiQum7+cXyGAAQgAIHXCJgQrE2bNrnevXun8pkCMFasWBFZN8u25syZ48aPHx95rvwdQ4YMcZMmTcrfxGcIQAACECiCgAnBWr16tRs4cGCqbiUJVpZtKVJx2rRpqewaPHiwmzx5cqq6VIIABCAAgZoETAjWypUrnb7w05QBAwbEikiWbc2cOdNNnTo1jVlu+vTprn///qnqUgkCEIAABGoSMCFYiurr2bNnTetDtixevNi1adMmZM9rm7Jsa/78+W7MmDGR58rtaNGihZNQNmrUKLeJ3xCAAAQgUCQBE4J14MAB16NHj8QowVmzZrk+ffrEIsiyrS1btrju3bv
Hnq9x48bBZGINVVIgAAEIQKD2BEwIlrq3bdu2IMpuw4YNNXorMVBAQ7du3WrsC9uQZVurVq1yo0ePdpqIXFgksgqzR6wKyfA3BCAAgeIJmBGsXNfWrFnj1q5d67Zv3+6aNGkShLvrKadhw4a5Kql/Z9XW/v37g+wbGzdudLt27XLHHnus69ixY+pQ/NQGUxECEICAxwTMCZbHvqLrEIAABLwmgGB57X46DwEIQMAOAQTLjq+wFAIQgIDXBBAsr91P5yEAAQjYIYBg2fEVlkIAAhDwmgCC5bX76TwEIAABOwQQLDu+wlIIQAACXhNAsLx2P52HAAQgYIcAgmXHV1gKAQhAwGsCCJbX7qfzEIAABOwQQLDs+ApLIQABCHhNAMHy2v10HgIQgIAdAgiWHV9hKQQgAAGvCSBYXrufzkMAAhCwQwDBsuMrLIUABCDgNQEEy2v303kIQAACdgggWHZ8haUQgAAEvCaAYHntfjoPAQhAwA4BBMuOr7AUAhCAgNcEECyv3U/nIQABCNghgGDZ8RWWQgACEPCaAILltfvpPAQgAAE7BBAsO77CUghAAAJeE0CwvHY/nYcABCBghwCCZcdXWAoBCEDAawIIltfup/MQgAAE7BBAsOz4CkshAAEIeE0AwfLa/XQeAhCAgB0CCJYdX2EpBCAAAa8JIFheu5/OQwACELBDAMGy4ysshQAEIOA1AQTLa/fTeQhAAAJ2CCBYdnyFpRCAAAS8JoBgee1+Og8BCEDADgEEy46vsBQCEICA1wQQLK/dT+chAAEI2CGAYNnxFZZCAAIQ8JoAguW1++k8BCAAATsEECw7vsJSCEAAAl4TQLC8dj+dhwAEIGCHAIJlx1dYCgEIQMBrAgiW1+6n8xCAAATsEECw7PgKSyEAAQh4TQDB8tr9dB4CEICAHQIIlh1fYSkEIAABrwkgWF67n85DAAIQsEMAwbLjKyyFAAQg4DUBBMtr99N5CEAAAnYIIFh2fIWlEIAABLwmgGB57X46DwEIQMAOAQTLjq+wFAIQgIDXBBAsr91P5yEAAQjYIYBg2fEVlkIAAhDwmgCC5bX76TwEIAABOwQQLDu+wlIIQAACXhNAsLx2P52HAAQgYIcAgmXHV1gKAQhAwGsCCJbX7qfzEIAABOwQQLDs+ApLIQABCHhNAMHy2v10HgIQgIAdAgiWHV9hKQQgAAGvCSBYXrufzkMAAhCwQwDBsuMrLIUABCDgNQEEy2v303kIQAACdgggWHZ8haUQgAAEvCaAYHntfjoPAQhAwA4BBMuOr7AUAhCAgNcEECyv3U/nIQABCNghgGDZ8RWWQgACEPCaAILltfvpPAQgAAE7BBAsO77CUghAAAJeE0CwvHY/nYcABCBghwCCZcdXWAoBCEDAawIIltfup/MQgAAE7BAwJ1gHDx50c+fOdTNmzHD79u1zTZs2dS1btnT9+vVzffv2LYp8lm3t3bvXTZkyxS1cuDCwoXnz5q5t27Zu0KBBrkOHDkXZRWUIQAACEKhJwJRg7d69240dO9YtX768Zk9e3aJ9o0aNCt1XuDHLttavXx+cd/PmzYWnCf6+6667XOfOnUP3sRECEIAABNIRMCNYEgM9rezcuTO2Z+vWrXPNmjWLrZNlW0uXLnXDhw+PPV/79u3dggULYuuwEwIQgAAE4gmYEayuXbu6rVu3xvfm1b0TJ050Q4cOja2XVVs7duxwnTp1ij1XbueiRYtcu3btcn/yGwIQgAAEiiRgQrD27Nnj9JSSpvTq1cvNnj07smqWbS1ZssSNGDEi8lz5OyZMmJC6bv5xfIYABCAAgdcImBCsTZs2ud69e6fymQIwVqxYEVk3y7bmzJnjxo8fH3mu/B1DhgxxkyZNyt/EZwhAAAIQKIKACcFavXq1GzhwYKpuJQlWlm0pUnHatGmp7Bo8eLCbPHlyqrpUggAEIACBmgRMCNbKlSudvvDTlAEDBsSKSJZtzZw5002dOjWNWW769Omuf//
+qepSCQIQgAAEahIwIViK6uvZs2dN60O2LF682LVp0yZkz2ubsmxr/vz5bsyYMZHnyu1o0aKFk1A2atQot4nfEIAABCBQJAETgnXgwAHXo0ePxCjBWbNmuT59+sQiyLKtLVu2uO7du8eer3HjxsFkYg1VUiAAAQhAoPYETAiWurdt27Ygym7Dhg01eisxUEBDt27dauwL25BlW6tWrXKjR492mohcWCSyCrNHrArJ8DcEIACB4gmYEaxc19asWePWrl3rtm/f7po0aRKEu+spp2HDhrkqqX9n1db+/fuD7BsbN250u3btcscee6zr2LFj6lD81AZTEQIQgIDHBMwJlse+ousQgAAEvCaAYHntfjoPAQhAwA4BBMuOr7AUAhCAgNcEECyv3U/nIQABCNghgGDZ8RWWQgACEPCaAILltfvpPAQgAAE7BBAsO77CUghAAAJeE0CwvHY/nYcABCBghwCCZcdXWAoBCEDAawIIltfup/MQgAAE7BBAsOz4CkshAAEIeE0AwfLa/XQeAhCAgB0CCJYdX2EpBCAAAa8JIFheu5/OQwACELBDAMGy4ysshQAEIOA1AQTLa/fTeQhAAAJ2CCBYdnyFpRCAAAS8JoBgee1+Og8BCEDADgEEy46vsBQCEICA1wQQLK/dT+chAAEI2CGAYNnxFZZCAAIQ8JoAguW1++k8BCAAATsEECw7vsJSCEAAAl4TQLC8dj+dhwAEIGCHAIJlx1dYCgEIQMBrAgiW1+6n8xCAAATsEECw7PgKSyEAAQh4TQDB8tr9dB4CEICAHQIIlh1fYSkEIAABrwkgWF67n85DAAIQsEMAwbLjKyyFAAQg4DUBBMtr99N5CEAAAnYIIFh2fIWlEIAABLwmgGB57X46DwEIQMAOAQTLjq+wFAIQgIDXBBAsr91P5yEAAQjYIYBg2fEVlkIAAhDwmgCC5bX76TwEIAABOwQQLDu+wlIIQAACXhNAsLx2P52HAAQgYIcAgmXHV1gKAQhAwGsCCJbX7qfzEIAABOwQQLDs+ApLIQABCHhNAMHy2v10HgIQgIAdAgiWHV9hKQQgAAGvCSBYXrufzkMAAhCwQwDBsuMrLIUABCDgNQEEy2v303kIQAACdgggWHZ8haUQgAAEvCaAYHntfjoPAQhAwA4BBMuOr7AUAhCAgNcEECyv3U/nIQABCNghgGDZ8RWWQgACEPCaAILltfvpPAQgAAE7BBAsO77CUghAAAJeE0CwvHY/nYcABCBghwCCZcdXWAoBCEDAawIIltfup/MQgAAE7BBAsOz4CkshAAEIeE0AwfLa/XQeAhCAgB0CCJYdX2EpBCAAAa8J/B8AAAD//8MFlgsAAEAASURBVO2dCfwVVfn/j4aJmbikuGG5ZRqoPyJTMEGsTCxFc0FBkwgrTQlLxRRFaCEJLRcSTcUSwZTcSQJDS0xFcze1jBRyJcnKXCDz73tq+A/3zjlzZu7MvXfmfp7XC+79njlzls+5M885z7ra2++QEQkBISAEhIAQaHMEVhPDavMV0vCEgBAQAkIgQEAMSz8EISAEhIAQKAUCYlilWCYNUggIASEgBMSw9BsQAkJACAiBUiAghlWKZdIghYAQEAJCQAxLvwEhIASEgBAoBQJiWKVYJg1SCAgBISAExLD0GxACQkAICIFSICCGVYpl0iCFgBAQAkJADEu/ASEgBISAECgFAmJYpVgmDVIICAEhIATEsPQbEAJCQAgIgVIgUDqGRazeGTNmmPPPP98sX77crLvuumbrrbc2gwcPNvvvv38q0PNs6/XXXzcTJ040N998czCG7t27m549e5qhQ4eaPn36pBqXKgsBISAEhEA9AqViWC+//LI58cQTzfz58+tn8k4J144//vjYa7WFebb16KOPBv0uWrSotpvg75kzZ5p+/frFXlOhEBACQkAI+CFQGoYFM+C08vzzzztn9tBDD5n11lvPWSfPtubNm2dGjhzp7K93797m+uuvd9bRRSE
gBISAEHAjUBqGtccee5jFixe7Z/PO1XHjxpkRI0Y46+XV1nPPPWf69u3r7Cu8OHv2bNOrV6/wT30KASEgBIRASgRKwbCWLVtmOKX40F577WWmTZtmrZpnW3PnzjVHH320ta/ohbFjx3rXjd6n70JACAgBIfBfBErBsJ588kmz9957e60ZBhi33XabtW6ebU2fPt2cdtpp1r6iF4YPH27Gjx8fLdJ3ISAEhIAQSIFAKRjW3XffbYYMGeI1rSSGlWdbWCpOnjzZa1xHHXWUmTBhglddVRICQkAICIF6BErBsG6//XbDC9+HDjnkECcTybOtKVOmmEmTJvkMy5xzzjnmoIMO8qqrSkJACAgBIVCPQCkYFlZ9AwcOrB99TMmcOXPMDjvsEHPlv0V5tnXdddeZ0aNHW/sKL2y66aYGRtm1a9ewSJ9CQAgIASGQEoFSMKy33nrL7LnnnolWglOnTjWDBg1yQpBnW08//bQZMGCAs7+11147cCZGVCkSAkJACAiB7AiUgmExvSVLlgRWdo8//njdbGEGGDT079+/7lpcQZ5tLViwwIwaNcrgiFxLMFnM7MWsapHR30JACAiB9AiUhmGFU7v33nvN/fffb5599lnTrVu3wNydU06XLl3CKt6febW1YsWKIPrGE088YZYuXWo233xzs9tuu3mb4nsPWBWFgBAQAh2MQOkYVgevlaYuBISAEOhoBMSwOnr5NXkhIASEQHkQEMMqz1pppEJACAiBjkZADKujl1+TFwJCQAiUBwExrPKslUYqBISAEOhoBMSwOnr5NXkhIASEQHkQEMMqz1pppEJACAiBjkZADKujl1+TFwJCQAiUBwExrPKslUYqBISAEOhoBMSwOnr5NXkhIASEQHkQEMMqz1pppEJACAiBjkZADKujl1+TFwJCQAiUBwExrPKslUYqBISAEOhoBMSwOnr5NXkhIASEQHkQEMMqz1pppEJACAiBjkZADKujl1+TFwJCQAiUBwExrPKslUYqBISAEOhoBMSwOnr5NXkhIASEQHkQEMMqz1pppEJACAiBjkZADKujl1+TFwJCQAiUBwExrPKslUYqBISAEOhoBMSwOnr5NXkhIASEQHkQEMMqz1pppEJACAiBjkZADKujl1+TFwJCQAiUBwExrPKslUYqBISAEOhoBMSwOnr5NXkhIASEQHkQEMMqz1pppEJACAiBjkZADKvNl//tt9829957r3nkkUfM4sWLzbJly8yXv/xl06tXrzYfud/w3nzzTfPLX/7SvPXWW2a//fYzXbp08btRtYSAEOg4BMSw2nTJ//Of/5hZs2aZCy+80CxatKhulNdff73p3bt3XXmZCmbPnm2+9a1vmeeffz4Y9vvf/35z0kknBYxrtdVWK9NUNFYhIASagIAYVhNATtvF0qVLzYknnmhuv/12660f+9jHzDXXXGO93u4XYFbHHnts7DA/85nPmO9///tm7bXXjr2uQiEgBDoTATGsNlv33/72t2bkyJHmX//6l3NkO++8s7nxxhudddr5IifH733ve9YhfvCDHzSXX3656dGjh7WOLggBIdBZCIhhtdF6w4COP/54rxHtscceZvr06V5127HSueeea8455xzn0Lbeemtz7bXXmvXXX99ZTxeFgBDoDATEsNpknS+99FIzYcIE79GcfvrpwUnM+4Y2q3jJJZcE+qukYX30ox81V155penatWtSVV0XApkRwLgJ6Qb/Bg8ebLbbbrvMbenG4hAQwyoOW++WEY0hIvOl973vfYF+q1u3br63tF293/3ud+Zzn/uc17iwijz11FO96qqSEEiLwIIFC8y3v/1t8/jjj6+8FcvV7bfffuXf+tIeCIhhtXgdLrjggsDAwHcYvORHjRplttpqK99b2rLev//9bzNkyBBz3333eY3vpptuMjvttJNXXVUSAr4IYNh01FFH1VX/8Y9/bPbee++6chW0FgExrBbif8MNNwTMx2cIWAWi99lss818qsf
WYQd58cUXm/e85z3By3/XXXc1W265ZWzdNIUwn+uuu87cdttt5m9/+1sgvttmm23Mpz/9abPLLrtYm3rmmWdM//79rdejF3bYYQdz8803y08rCkoHfJ8/f765//77Tffu3U3Pnj3N//3f/5l3vetducz87rvvDjZNcY2FDOv11183Dz74oFmyZEngfvHSSy8F1quMh2dxt912MxtssEFcEyorAAExrAJA9WnynnvuMYceeqhPVfPFL34xEIlldapFPo+ODJ+nWsJ0fOONNzYbbrhhYNzAw4eRw3rrrRd8rrvuuoZ/733ve82KFSsMjr5vvPGGeeWVV8zTTz8d+IgtXLhwpS9VbfsHHnig+e53vxswydpr/P3AAw8E83v55ZfjLq9SNnHiRDN06NBVyvRHNRF48cUXzdixY83cuXNXmeA3vvEN703eKjfW/IEVLoZLtt8djIjf/K233lpzZ/2ftDNs2DCzzz77GPkP1uOTZ4kYVp5oerb197//3QwcOND6sESbOfroo81pp53W0IMAw7jooouizTb1+/7772/OP/98a5/PPfec+exnP5uIB7q7O++806y11lrWtnSh/Ag89dRTht9MnGsHvwFOXI1SWr2xT38YCGE4xUlQVAwCYljF4OpsFQbkY5KeB7P6y1/+YnbffXfneJpxkdOYa/fpEs9Ex4fxBUYYomoiwMkKK70w+kncLBElN0J//OMfzSc/+clGmnDei7vGQQcd5Kyji9kQEMPKhlvmuzAy8Pkx58GsGORdd91lDjvssMzjzeNGxI6///3vE5vy2fXSFrEVFQUjEc5SVSCW5M9//nNz5plnxp6sopNplGEhYvcR9UX7TPsdf0qi1YjyRUAMK188na0tX748MESIiw0YvRGrpfHjxztPJNH6ru8zZ840p5xyiqtK4deIEXjHHXck9oOCG8ssgvy6CGyGDx/uqqJrJUEA/SrM4+yzz17FrNw1/EYYFif9AQMGuJrP7RriwTgLxNw66MCGxLCauOg/+clPzBlnnOHsESMFHt68LKF8Iko4B5TDxX79+hkYpw/5iAbRY2DokdUIxWccqlMsAjCqX/3qV2by5MnejCocUSMMC8OdqVOnhk0V/smpEd2WKB8ExLDywTGxFazr+vbt6zQsOPjgg81ZZ52V64v4hBNOCMIbJQ6wwApYKKbRGZx88snmZz/7mXNEGJFglSUqFwJkIZg3b54577zzzKOPPppp8FkZ1muvvRYwjzhjDttA2BztuOOOgSUtlrNYx/75z38ONky2e6LlhBdjvtpcRVHJ/l0MKzt2qe686qqrzJgxY6z3IKa47LLLcv9hY43oEkEigsPqDlEJ1llpHmbrZP53gajrmCHjk5WGeClwKnONJc2pLU3fqlsMAiGjIgo/Rg+NUFaGxSaIzZAPoUPGchVn9dVXX73ulhdeeCEwnHJZv4Y3cYo85JBDwj/12QACYlgNgOd7K461H//4x62WT+h4cIrF3ylPwokXR0sbEfGdvFrRB5J7sCzEURJF+Kuvvmr+9Kc/BXqlZ599NpiDzXeFCOvEYNt2222DE1UjkSlwRB49erRt6EE5jsrsYEXtjQBiXnwAs56oameXlWFh7OQTWeUrX/mK+eY3v1nbbezfGABhtWp7Jrhp0003NYR/0ikrFsJUhWJYqeDKVhmm8LWvfc16MyKDIoJt2sLOhAMhAO2nPvWp8E/vT3bLOA9jJIEz8bvf/W5DXMM8H0h0HDgJE4zURl/96le9d8y2NlReLAJpgzr7jCYLw8JcnmgxScTmkegaa6yxRlLVldefeOKJIC6mSyLA6Q5nZFFjCIhhNYaf193oWqKBNaM3Yfrqm1Ikep/P9ySDC0SAaR5Mnz7zrJPkL4Np+8MPP5wro8xz/J3eVpIYPA4f1pTNGxFQbJSFYeH3iP9jEvHMHHDAAUnV6q4nBXM+4ogjzHe+8526+1SQDgExrHR4pa6NGARdThwhziL0TFFMgwfP9uDvtdd
eZtq0aXHDaquypCgdaQ062mpyFR5MmtBjIQw4DJM2B99B1yYuC8NyPQuI7HBURqRNlPasFrqugADaXIWr3NinGFZj+CXejS8GL9U4+uEPf2gwYy+CkvRXxGlDsdzuRBgrrCtt4hYU41OmTGn3aXTc+FiTSZMmec0bhoHT+J577hnUT0pkmpZhYdUXtl07oDDfGkZHW2yxRUMO6UlRZfA3gymKsiMghpUdu8Q7MWXv3bt37MsWc1l2oUWdrm655RaD8thGvBQwuigDwdh/8IMfWIf6hz/8way55prW67rQfATIL0XE8yQ69thjg9MUGQRCypthuUTjWC36BqEOx+f6dJ2y+A375oBz9dHJ18SwClx9F9M48sgjg6RxRXWPldOMGTOszWP5l6eRhLWjHC489thjZt9997W2dMUVV3inKbE2ogu5IkCQYldkfTZyOPGSNqaW8mRYGO+QwsYWPYXfFlHZ8yJ01Tb/QKJepMkqnteYqtSOGFaBq/mFL3whsDiK6+JHP/qRVbcVVz9NGQ9pnz59rKa2iEeIulEWYj6IBW0BUav4IuBF+sgjjwRLhBUmTqukgSF/WRniKLJmiPlqo0pwwkBXRUoOm64oT4ZFLiv6iyPG4jq5x92TVIbo+sMf/nBsNZznbeqB2BtUWIeAGFYdJPkUYPLtSrFdpA/Rk08+6cyWSmzBY445Jp+JNqkVl4gJ8SpWWq5o8E0a5spu/vGPfxg2LDbr0JUVLV9sOjuqo/MhhUWvXr2C3xh+b/xrp/mH02L+GB7h9oAvog+zTUpsmkaH5frdkMyUJKN5Ewwrbv0QwcOMRdkREMPKjp3zTpS4riCbmGyzcy6Cknxf8AtDJFMmIu7ciBEjrEOePXt28AK3VmjyBUIPEROyWYQlKkYOeYq3mjX22n6uvPLKIGFpbXn4ty/DwmEf3yubUy8ZBHwYaNiv7ye5vB566KG66mys8sjlVddwBxWIYRW02K4grr7Ry7MOLcmjn1BNNnEMDzkJFWG4/AvDNXXt2jUQSWHlBLPbZJNNsg4v0328dD7ykY9Y7y3Sn83aqePCl770pcBE2lEl90s4pvKyL4tu0gbAhRdeGIgT465zuuTZ8iFXKh9OVpywiiBM8m0nqTLpjovAptE2xbAaRdByvyu6RZGybJcJL0OF2TA2olUQagkLO8Q2iBH59I3zxkOJFWIzd/SuuIjoRHySYlqWK/fiZuRcihv01VdfbXbddde4S6Upw3LvggsuiB0vZuhEQPchYvjZYv0VmWTRFbwZR/e8Q7D5YFGVOmJYBa0kymasoOIoTayyuPtdZSiRMQN3EQ+9T0w1VxtcQ5zCPLHCaga5TIbpv512r66XZZFYsR6DBg0qsovC2x43bpy5/PLLY/tJ43fHxtC2AfvNb35jPvCBD8T20Wiha7NSlBiy0TGX5X4xLMtKITZDNIYYKuojYqleV+zKnluk+AqxkM2arm6QORVce+21gVViTs1Zm0nSbcyZMyfWTNraYIEXkuI4FtU15uQ9evQoqvmmtEuE/1mzZsX2NXLkyCAaRuzFSKErdmDRuiT0ibZAv0gyEK+LsiFQeYa1bNkyg15mgw028Jbt/+IXv1jFio4wRoSMSRMZ3OUHVWSUiaJ2ja6fFyctxIxFBPCN9uvSSVCPTcLhhx8evaVl3zHrRqzFSatZxGYlKY9Ys8bSSD8u/R/PlcshPuzXtWHAzB2jmKKITa7N0MOlPy5qPFVqt5IMC10Mu3EcSkPCBBjfp6QXum0Xn1b8hQc/lmtxhMjDZfEWd49vWStOWIwNZs4Jp8iIE4RpcqUsIedQMxmEz5qgUySyfRZaunRpENwXizNOTnGm0mG7GCMgRnO5UoR12/2TQLF33HFH7DBZX5/cUq7QUKQ6+fznPx/bfqOFpOSxbWyLNrZqdOxluL8UDOuf//ynIT/SSy+9FKTD4KVl8zlxKWz5IRHc0mZOTs6aYcOGWdcNpsXLwyeckuuhI3/Oqaeeau2
nkQsucUQj7frce9xxx5mTTjrJp2rmOq7dK1ETYJpVJQxkCJYM84IRIvrlJUhgV0RlVVHm28zCWVdf36lRo0YZ/LniqMjIKBhV7LfffnHdBj5fRVkmxnZYwcJSMCx2RORuComQL0TxjmNaLkbB/bxQebHWEkd1FLquXSz3+Cq1EU3ZcjmR5dcnzlrtGH3+TjJMSGoDaztyZPHJaRSR6sKFCwNxmy28TbRNNgRF7vKJaoG4J47YUKDUFpUbAZc16MyZM4Ns1EkzdP1O+P1stdVWSU1kun7GGWdYo8iQE+/rX/96pnZ1038RKAXDihNzkVsG5hQlsuTiTe8iTllEmYgSWXXZFcG0ksjXfNrFsOLGkNSv73XmgpWYD3OhTbzvwQwHS6wHbWbqRG7AWASG5CLaueaaa1xVGrrmeiHQcLvn+Gpo8h1ys+sUTWbuHXfcMREJly9iUcGSiW5DSDTbprcKLgeJwBdcoRQMK+4HHOf454rdF+JYyyxQjpO51qZvCu+Lfvrs0FwMi7aKNMHG4ZcxwoAx60V8tNZaawWGJ+wsMY7gFIRP1jrrrBOdmvM7xivsEG2ilvDmIk2GOWlz4rYRp0Fi7onKi4AttBEz+vWvfx3EU0yanStpKtEmsBTMm1y+l5z+lWy0ccRLy7BqPd55QSMGSKJahvXTn/7Uy0w22m7c6S56ne8uSyeu4wlflvQejDckdpFETnedRou0gkSH48rj1U6m7SFmjX6iu+Uli3sFv5mq6KpsuLgMo3yZjUsPVsRvhI0vKgWbOTtGHq6Nlg0Lla+KQGkZVtSXgp0/Oyqbk2B0ylGGRTRsfmRpyefHx0s7aqVY20cZA9CGc8AK05ZCgTpFmlcn9Y2VZ5JYOJxHu3xi/fjCCy8ERhQYUhCBBJFuNDRWdKyExwLjzTbbLDg1E8md33XRbgXRMRT1HYvKD33oQ9bmfR1veUY5jcUR+mP0yHlSUqxLTl9li9+ZJz55tVUJhnXVVVeZMWPGeGESMixeEogVszjZ+oSHSUo6WORL3QuIBiudddZZgZtAXDO1p9+4OlnLMAJxPfj4IYFtq4lNFP586PxWrFhhNtpoI7P++usH/9AzshNHVOuzyfKdCxai+CmRObes9MorrzglD76idFc8P94VuJ3kRUmnKzYYZBsWNY5AaRkWU2cHSuRmV1T0WohgWPPnzzeYlicZENTeG/2bl83qq68eLVrlO7s4Uhu4yHe36GqjVddI5+HKnorI0BZgt5Exo9C25RuiXV+lfCNjSLoXp1H0qXERu5PuzeN6ESeIPMbl0wYbSNeGwzdSe17hnXzGPG/evMCtwFa3SL8vW59VLS81w8JsHF2RTW4ct2gwLFJiExXBRfhLYNVj2xklhcBJSkJH3yRRJJliGYkTxDbbbGMd+gMPPBCIq6wVMl5I6pfNiGtcGbtNdRsBV1vtwOxjGJRqUk2qzEYHs3Yb+TIsl3EO6oS88qfhKMzJFlG1jdi4ILYVNY5AqRkW0SvSMCtfuIgGjdjPtXNK0pVgrZbkkX/ggQcmBqr1HXMr6rETtolUi3xIXUp5Uk8gkmwluXyAmjUuHOjZmJWNknSUvgwL6QmbWRvlpVMieg7icRv5xj603a/yVREoNcNadSr5/BX9IbsenqQU91h27bLLLomDwvCDbKxlo6STjq+uIcu8XQyrHdI34Kvn6weXZf4+95TVqIeTOZE74iiNYzgbWU4+Nopzi7HVtZUnicW5z9eq0daHyldFoJIMCwdXTkdpdQi1Ucddset8drA+O22fdlZdsvb466677jKHHXZY7GDSvFhiG3AUutaE2zBisIXecjSb6yVXpIZcO3I0Ft14Oaq13SXE/LYAxlHL4KSBv/nmm4lWk9OmTTMEts5CPkZbRWZlyDLmKtxTOYZFPDkU7xgEpGFYNusymxPjCSecYEaPHu38DSSJJbjZx+LQ2UnMRZIzksaAUxAmwnm/wLHUI9eQLSJ1kTH9iFJA6CgbYYgTF7L
LVr+I8nPPPdeQILDZhIgc/6ODDz64EMfYZswHHSQGK3GU1vo0KUwZ7RH1Bqf6NLR8+fIgm4NNv01bMFcC+LJ5E+WHQOUYVugU6HIcrIWPQLRYDcaRLcQLxhIYTbgIc2YcPW2hWsJ7fb33w/quT5gIL6zQsZcHhjQXWXeStX3xsOK4i1LfRkUmqHQFKG4X82HEob54YwQEo2Fjse222wbxGzGBh+ny+2HTwSe4c2ogRBZR3AkIzcaEuptsskkQ/YGXZNmJ59f2LKaNdu7jZ3nMMccYxKe+xMkK3Ri6Uhc1cnpztdvp1yrFsAgbRIBJyJdhcVLADNhmom4zj4URPPbYY4m7eXba7LhdlCaLqqsdrtkibHDixJm5kZca/kMwI1vqh3BsRUbDJk4hopY4ykMvEdduljLCfXHSryUMIYi3yCkUa8a0u/va9qr2N2G/iLQeR6EPZdw1W5lP9gKydLtcNMK20Usi5g83g2F57ScZh4l5KcofgcowLF4AN91008rUHz4MC5EAYjtXqBvSmthEfz4x89gRY02XdMrKQ5fF6Yq4izaCycKcd999d1sVaznWgIhqXOa73AymnBiLyotF4r2zzz47dpxFpm2J7dBRyKkIxo0RCEyJE3n//v3FoByYcQk9MuL2OMoiasbi1ydCumuTxSmX9wCb16TnmDHCdIv6/cfh0klllWFYtUpmH4ZFJIKePXs61xszWl40cZRkKRje4xuvsFHRYJJ+JxwPDzCiEF/dFhZXhLqx6azCdvm87LLLzCc+8YloUa7fXQGOiS6Cq4CovAjAOJAExBGi0zRBqmkDMSqRUZIYDXUnTJgQSGaISAKx6YBRISWxuW8EFf/3HxtCxldU6pJoX536vRIMi+M8x/ooJTGsNN7nNjNlX10NP3x0GkmmzjxYs2bNMl26dIlOxfv7iy++GIibfG5AH4DohbmhA4kjDBhmzJhhLrroorjLdWXotmwvm7rKGQqSTOl9TrwZutUtTUSA6DBIAeKI54ONaVpK68jNKYnn47777vPapDEemBWnwyJzwaWddxXrV4JhxTmLunbi6IwwRPC1Jps4cWKQuLH2B5Bmx+dSJkfbbcQUFq97jETwZUlD6LV4QDfffPPAIx8fsieeeCKRwUb7YNOAqM6mC4zWzfrd5VvDHPB5EZUbAddzi2gda960hKEErgY+EoK0bVMfZoVuNUlak6Vt3bMqAqVgWK6ICrYMwjbdE7sn5Nr8yHzJFS4GnQ5pH5KIAJlY77FrS6JG4uG5nJ2T+s16ndMju+KsJ0PffrHKtCmz407Zvu2qXvsg4DKS6NevnyHjcBbiuWMzlzehs7300kvFrPIG1tJeKRgWSliO27WU5EcBMyMeYEicItihkZYhLSHuIhdTLaXxZOfUgiVbEmENhX4tqwUZc2buzSDENIgNfZh2o+PB+srm+0J4HJsjc6P96v7mIeA6YTWazZpwariw5EUY0qA3DXVeebWrduwIlIJhoZs48sgj62YxderUIB183YX/FaA7uueee4LsvrxQEQVmZQLohxAr1Cpv0zAshoXYDEu3JOLBPfPMM5OqWa/D4NEn1Y7XekOGC/ivDB8+PDOmabp0Gb/QDg6gMHpRuRFw6bCGDBliJk2a1NAEkxKA+jaO6J4UJUVkJPAdQyfWKwXDQtl++umnBzv5cJEwquDF76uHCu9r5BPdEPmGQtNu0ppgAZiGyNhL8rgkAwzabNTiDmZFrjCYZJ6M64gjjghM/XFabRYhCrQ5amfxz2nWuNVPOgRcTtdkUPCRUCT1iKsBfnI+z2BtW4MHDzYnn3yy6dGjR+0l/d0EBErBsEIc8O5Hgdq1a9fAAbaZzCocA4YNnLbQ13Tv3j0sTvXpipcWbQg9G75ljabLgEmiFEYcmjW6PWPZd999DWI59IDNpL/+9a+mT58+1i55+fASEVUDAfTPhFWKbrLy1lHSNmJk2yYoiiS/fTbIQ4cONTvttFP0kr43GYFSMawmY1N
dFwJCIF8ExLDyxTN1a+zq8VXyMcOlcVIJuCzXUg9AN5g777wzMBVGTJtEhx56qMGcXSQEhEDzERDDaj7mdT3afLzqKr5TgD7rwQcf1CkrDpyUZThNYxnpG6cNR3X8YLp165ayJ1UXAkIgDwTEsPJAMYc28PE64IADvFrKwzfKq6MKV+JkiwgQEasPsVFo90jWPvNQHSFQZgTEsNpo9Xgh8hJNIsJRtdrsO2mM7X4dsR4BkH2JtSHYsUgICIHWISCG1TrsY3tmx5+UKfWOO+4IYp/FNqDCRATSpj+ZOXOmIeGkSAgIgdYiIIbVWvxje0dH5QrwSiiYKvtaxIKSYyF5t3xiEuK0jYm7K5p9jsNSU0JACCQgIIaVAFCrLtsC/hIJ4tZbb23VsCrTL/pC9IY2IpnmxRdfnDn8lq1dlQsBIZAdATGs7NgVfieMiWy3UWo0enq0rU7+fsUVV5ixY8fWQcCpCtcBsljrFFsHjwqEQEsREMNqKfzJnRMZfvr06UF8P0I5kbFUlA8CRKknPQTxDTFiIYkdkfTbLVRUPrNVK0Kg/AiIYZV/DTUDISAEhEBHICCG1RHLrEkKASEgBMqPgBhW+ddQMxACQkAIdAQCYlgdscyapBAQAkKg/AiIYZV/DTUDISAEhEBHICCG1RHLrEkKASEgBMqPgBhW+ddQMxACQkAIdAQCYlgdscyapBAQAkKg/AiIYZV/DTUDISAEhEBHICCG1RHLrEkKASEgBMqPgBhW+ddQMxACQkAIdAQCYlgdscyapBAQAkKg/AiIYZV/DTUDISAEhEBHICCG1RHLrEkKASEgBMqPgBhW+ddQMxACQkAIdAQCYlgdscyapBAQAkKg/AiIYZV/DTUDISAEhEBHICCG1RHLrEkKASEgBMqPgBhW+ddQMxACQkAIdAQCYlgdscyapBAQAkKg/Aj8Pz+7dasuRH5JAAAAAElFTkSuQmCC" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Strange we don't know the function inside, other wise we know that for an input of 1 and 3 the output will also be (like other example):\n", + "\n", + "![example_m-function.png](attachment:example_m-function.png)" + ] + }, + { + "attachments": { + "m_machine_learning.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can say that we have a dataset of input and output (all the example above) but we don't now the formula to transform `x`into `y` and that's were *machine learning come* !\n", + "\n", + "So machine learning is the process to find an algorithm (here a simple function) that fit with all the data we give it in parameter\n", + "\n", + "![m_machine_learning.png](attachment:m_machine_learning.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Many technics are possible in machine learning like the famous neural networks, the random forest, the k-nearest 
neighbors, and the simplest technique: linear regression!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## 2.0 The Linear Regression" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1: Understanding the Formula\n", + "\n", + "\n", + "The formula for linear regression is simply a first-degree polynomial, the equation of a straight line:\n", + "\n", + "$$ f(x) = a \\cdot x + b $$\n", + "\n", + "But today, we’re not going to stick to the pure math version. Instead, we’ll look at its “machine learning transcription” (same idea but with different names) since we’re going to talk about neurons and all that jazz. Yeah, I know we’re not technically in neural networks yet, but hey, this formula works for a single neuron too! Here’s the breakdown:\n", + "\n", + "A neuron is basically the simplest function in machine learning. With one input and one output, its formula looks like this:\n", + "\n", + "$$ y = x \\cdot w + b $$\n", + "\n", + "Where:\n", + "- $ x $ is the input.\n", + "- $ w $ is the weight of the neuron.\n", + "- $ y $ is the output.\n", + "- $ b $ is the bias of the neuron, which is just a value we add to the result.\n", + "\n", + "If we had two inputs, the formula would expand to:\n", + "\n", + "$$ y = x_1 \\cdot w_1 + x_2 \\cdot w_2 + b $$\n", + "\n", + "The goal here is to adjust $ w $ and $ b $ so that our predicted output $( y_{\\text{pred}} )$ matches the actual output $( y )$.\n", + "\n", + "Alright, let’s get started by creating a dataset for our neuron to learn from!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Task: Create the training set\n", + "# Each pair is [x, y]; here y = 2 * x\n", + "train_set = [\n", + " [0, 0],\n", + " [1, 2],\n", + " [2, 4],\n", + " [3, 6],\n", + " [4, 8],\n", + " [5, 10]\n", + "]\n", + "print(\"Training Set:\", train_set)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Task: Visualize the training set\n", + "\n", + "x = [i[0] for i in train_set]\n", + "y = [i[1] for i in train_set]\n", + "\n", + "plt.plot(x, y, 'ro')\n", + "plt.axis([0, 6, 0, 12])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2: Initialize the Weight and Bias\n", + "\n", + "We need to initialize both `w` and `b` with random values between 0 and 10 at the start:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "random.seed(0)\n", + "\n", + "\n", + "mini = 0\n", + "maxi = 10\n", + "\n", + "#TODO: randomise the weight and bias between 0 and 10\n", + "w = ...\n", + "b = ...\n", + "\n", + "print(\"Initial Weight:\", w)\n", + "print(\"Initial Bias:\", b)\n", + "\n", + "assert w > 8.44 and w < 8.45, \"Weight is not correct\"\n", + "assert b > 7.57 and b < 7.58, \"Bias is not correct\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 3: Make predictions using the neuron formula\n", + "\n", + "Using the formula above, compute `y_pred` for each input `x` in `train_set` with a function called **forward**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : Define the forward function -> neuron function\n", + "def forward(x):\n", + " ...\n", + "\n", + "#TODO: Test the forward function\n", + "for x, y in train_set:\n", + " y_pred = ...\n", + " print(\"Prediction:\", y_pred, 
\"Actual:\", y)\n", + "\n", + "assert 
forward(0) > 7.57 and forward(0) < 7.58, \"Test Failed\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# graph of the prediction and the actual data\n", + "x = [i[0] for i in train_set]\n", + "y = [i[1] for i in train_set]\n", + "y_pred = [forward(i) for i in x]\n", + "\n", + "plt.plot(x, y, 'ro')\n", + "plt.plot(x, y_pred, 'bo')\n", + "plt.axis([0, 6, 0, 50])\n", + "plt.show()" + ] + }, + { + "attachments": { + "mse_loss_explication.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, there is a significant difference between the prediction and the result, which we refer to as the error. To measure this error, we use something called a loss function. The main goal is to minimize the loss function as much as possible, ideally getting it as close to zero as we can.\n", + "\n", + "One common way to calculate this error is by using the Mean Squared Error (MSE). First, we measure how far the prediction is from the actual output using this formula:\n", + "\n", + "$$ \\text{Error} = y - y_{\\text{pred}} $$\n", + "\n", + "Then, we square this error to make sure it’s always positive:\n", + "\n", + "$$ \\text{Error Squared} = (y - y_{\\text{pred}})^2 $$\n", + "\n", + "Finally, we sum up all these squared errors across the dataset and take the average to get the Mean Squared Error (MSE), which is one of the most popular loss functions out there:\n", + "\n", + "$$ \\text{MSE} = \\frac{1}{n} \\sum_{i=1}^n (y_i - y_{\\text{pred},i})^2 $$\n", + "\n", + "![mse_loss_explication.png](attachment:mse_loss_explication.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*You only need to set up a squared error since we’re learning example by example, so there’s no need to sum anything! 
But hey, as a bonus, you could try implementing it in a **mean_squared_error** function :)*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Define the loss function \n", + "def squared_error(y, y_pred):\n", + " ...\n", + "\n", + "#TODO: Test the Squared loss function\n", + "for x, y in train_set:\n", + " ...\n", + " error = ...\n", + " print(\"Error:\", error)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 6: Calculate the derivative\n", + "\n", + "Once we have this loss, we need to bring it as close to 0 as possible! To move from just measuring the error to actually reducing it, we need to figure out in which direction (+ or -) and by how much to change our parameters `w` and `b`. This is where derivatives (the slope of the loss function) come in!\n", + "\n", + "Why Derivatives?\n", + "\n", + "•\tA derivative tells us how fast the loss changes as we vary a parameter.\n", + "\n", + "•\tIf the derivative is large and positive, it means the loss will decrease if we move the parameter in the negative direction.\n", + "\n", + "•\tIf the derivative is large and negative, it means the loss will decrease if we move the parameter in the positive direction.\n", + "\n", + "For a single data point (x, y), with the loss $ L = (y - y_{\\text{pred}})^2 $ and $ y_{\\text{pred}} = x \\cdot w + b $, the derivatives are:\n", + "\n", + "\n", + "Derivative for `w`\n", + "\n", + "$$\n", + "\\frac{\\partial L}{\\partial w}\n", + "= 2 x \\bigl(y_{\\text{pred}} - y\\bigr).\n", + "$$\n", + "\n", + "Derivative for `b`\n", + "\n", + "$$\n", + "\\frac{\\partial L}{\\partial b} = 2 \\bigl(y_{\\text{pred}} - y\\bigr).\n", + "$$\n", + "\n", + "*Bonus: to better understand the derivative of the MSE loss for linear regression, try calculating it by hand using the [chain rule](https://www.youtube.com/watch?v=NO3AqAaAE6o)! 
:)*\n", + "\n", + "Chain rule:\n", + "\n", + "$$ \n", + "\\frac{\\partial L}{\\partial w} = \\frac{\\partial L}{\\partial y_{\\text{pred}}} \\cdot \\frac{\\partial y_{\\text{pred}}}{\\partial w}.\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: set up the derivative of the loss function\n", + "def derivative_w(x, y, y_pred):\n", + " ... # 1 line of code\n", + "\n", + "def derivative_b(y, y_pred):\n", + " ... # 1 line of code\n", + "\n", + "def derivative(x, y, y_pred):\n", + " return (derivative_w(x, y, y_pred), derivative_b(y, y_pred))\n", + "\n", + "\n", + "# Print the result of the derivative\n", + "for x, y in train_set:\n", + " ... # here\n", + " derivative_weight, derivative_bias = ... # here\n", + "\n", + " print(f\"x: {x} ; y: {y} -> y_pred: {y_pred:.3f} | Derivative Weight: {derivative_weight:.3f} | Derivative Bias: {derivative_bias:.3f}\")\n", + " # save the derivatives for plotting later\n", + " dw_values = [derivative_w(x, y, forward(x)) for x, y in train_set]\n", + " db_values = [derivative_b(y, forward(x)) for x, y in train_set]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(12, 6))\n", + "\n", + "# Plot derivative of w\n", + "plt.subplot(1, 2, 1)\n", + "plt.plot(range(len(dw_values)), dw_values, marker='o', color='blue', label=\"Derivative of w\")\n", + "plt.title(\"Derivative of Loss with respect to w\")\n", + "plt.xlabel(\"Index of point in dataset\")\n", + "plt.ylabel(\"Derivative (dL/dw)\")\n", + "plt.axhline(0, color='grey', linestyle='--', linewidth=0.8)\n", + "plt.legend()\n", + "plt.grid(True)\n", + "\n", + "# Plot derivative of b\n", + "plt.subplot(1, 2, 2)\n", + "plt.plot(range(len(db_values)), db_values, marker='o', color='orange', label=\"Derivative of b\")\n", + "plt.title(\"Derivative of Loss with respect to b\")\n", + "plt.xlabel(\"Index of point in dataset\")\n", + 
"plt.ylabel(\"Derivative (dL/db)\")\n", + "plt.axhline(0, color='grey', linestyle='--', linewidth=0.8)\n", + "plt.legend()\n", + "plt.grid(True)\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "As you can see, the larger the error, the larger the gradient will be. This is because the gradient measures how much the weights and biases should change to reduce the error. During training, this gradient is applied. What this means is that at every iteration, the gradient is used to update the corresponding values (weights and biases) so the function can learn and improve.\n", + "\n", + "This process is called gradient descent. It is the step where we adjust the weights and biases based on the gradient using the following formula:\n", + "\n", + "$$\n", + "w = w - \\eta \\cdot \\frac{\\partial L}{\\partial w}\n", + "$$\n", + "\n", + "$$\n", + "b = b - \\eta \\cdot \\frac{\\partial L}{\\partial b}\n", + "$$\n", + "\n", + "Here:\n", + "\n", + "•\t $ w $ is the weight, and $ b $ is the bias.\n", + "\n", + "•\t $ \\eta $ is the learning rate, a parameter that controls how large each update step will be.\n", + "\n", + "•\t $ \\frac{\\partial L}{\\partial w} $ and $ \\frac{\\partial L}{\\partial b} $ are the gradients of the loss function with respect to the weight and bias.\n", + "\n", + "\n", + "If the gradient is negative, the weight increases; if the gradient is positive, the weight decreases. This helps reduce the loss over time.\n", + "\n", + "We don't apply the whole gradient to the weight: as you can see, it is too big to reach the value 2. 
To counter this problem, we only apply a fraction of the gradient, and this fraction is set by the learning rate.\n", + "\n", + "The learning rate plays a crucial role in how the model learns:\n", + "\n", + "- A large learning rate makes the updates bigger, but the model might “overshoot” the optimal values and never fully converge to a solution.\n", + "- A small learning rate makes the updates smaller, which helps the model converge more precisely but takes a lot longer to reach the optimal solution.\n", + "\n", + "Choosing the right learning rate is critical. Too large, and the model might not learn properly; too small, and training might take too long. **Choose carefully!**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Update the weight and bias\n", + "LEARNING_RATE = ...\n", + "\n", + "w -= ...\n", + "b -= ...\n", + "\n", + "print(\"Updated Weight:\", w)\n", + "print(\"Updated Bias:\", b)\n", + "\n", + "for x, y in train_set:\n", + " y_pred = forward(x)\n", + " derivative_weight, derivative_bias = derivative(x, y, y_pred)\n", + " print(f\"x: {x} ; y: {y} -> y_pred: {y_pred:.3f} | Derivative Weight: {derivative_weight:.3f} | Derivative Bias: {derivative_bias:.3f}\")\n", + " # Save the derivatives for the plot below\n", + " dw_values = [derivative_w(x, y, forward(x)) for x, y in train_set]\n", + " db_values = [derivative_b(y, forward(x)) for x, y in train_set]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(12, 6))\n", + "\n", + "# Plot derivative of w\n", + "plt.subplot(1, 2, 1)\n", + "plt.plot(range(len(dw_values)), dw_values, marker='o', color='blue', label=\"Derivative of w\")\n", + "plt.title(\"Derivative of Loss with respect to w\")\n", + "plt.xlabel(\"Index of point in dataset\")\n", + "plt.ylabel(\"Derivative (dL/dw)\")\n", + "plt.axhline(0, color='grey', linestyle='--', linewidth=0.8)\n", + 
"plt.legend()\n", + "plt.grid(True)\n", + "\n", + "# Plot derivative of b\n", + "plt.subplot(1, 2, 2)\n", + "plt.plot(range(len(db_values)), db_values, marker='o', color='orange', label=\"Derivative of b\")\n", + "plt.title(\"Derivative of Loss with respect to b\")\n", + "plt.xlabel(\"Index of point in dataset\")\n", + "plt.ylabel(\"Derivative (dL/db)\")\n", + "plt.axhline(0, color='grey', linestyle='--', linewidth=0.8)\n", + "plt.legend()\n", + "plt.grid(True)\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "# OPTIONAL BUT RECOMMENDED: reset the weight and bias to new random values before training\n", + "\n", + "w = random.uniform(mini, maxi)\n", + "b = random.uniform(mini, maxi)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the model is learning effectively because the gradients are decreasing. Now, we need to repeat this process for a certain number of epochs *(basically, going through the entire dataset multiple times to give the function more opportunities to learn)*.\n", + "\n", + "Try your best to implement the training loop on your own! If you can't find the solution, check just below this cell: there is a description of what goes inside the training loop. *Good luck!*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Train the model\n", + "\n", + "epochs = ...\n", + "learning_rate = ...\n", + "\n", + "\n", + "for ... :\n", + " for ... :\n", + " # ~ 5 lines to code \n", + " print(f\"Epoch: {epoch}, Loss: {loss:.5f}\")\n", + "\n", + "print(\"Final Weight:\", w)\n", + "print(\"Final Bias:\", b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your goal above is to write the code that trains your function! 
Here is a little help with the steps:\n", + "- Define the `Hyperparameters` (epochs and learning rate)\n", + "- Iterate over your `epochs` and, for each epoch, over your `train_set`\n", + "- Apply the `forward` function to get a `y_pred`\n", + "- Calculate the `squared error`\n", + "- Calculate the `derivative`\n", + "- Apply this derivative to the parameters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done, you created your first machine learning function! You can test it on as many examples as you want by simply running the forward pass :)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = 3\n", + "print(\"Prediction:\", forward(x))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the predictions for x from -5 to 14 together with the actual data\n", + "x_range = list(range(-5, 15))\n", + "y_pred_range = [forward(x) for x in x_range]\n", + "\n", + "# Actual data points\n", + "x_actual = [i[0] for i in train_set]\n", + "y_actual = [i[1] for i in train_set]\n", + "\n", + "plt.plot(x_actual, y_actual, 'ro', label='Actual Data')\n", + "plt.plot(x_range, y_pred_range, 'b-', label='Predicted Data')\n", + "plt.xlabel('x')\n", + "plt.ylabel('y')\n", + "plt.legend()\n", + "plt.title('Actual vs Predicted Data')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "Congratulations on building your very first machine learning algorithm! 
I bet you were itching to dive straight into AI, and here you are. Well done! Now let's discover another fundamental of machine learning: [logistic regression]()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day03/1 - Regression/logistic_regression.ipynb b/AI/Day03/1 - Regression/logistic_regression.ipynb new file mode 100644 index 0000000..dd347fb --- /dev/null +++ b/AI/Day03/1 - Regression/logistic_regression.ipynb @@ -0,0 +1,453 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ~ PoC AI Pool 2025 ~\n", + "- ## Day 2: Neural Networks from Scratch\n", + " - ### Module 2: Logistic Regression\n", + "-----------" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import random\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have dived into linear regression, let's discover another banger: **logistic regression**! \n", + "\n", + "While linear regression outputs continuous values, logistic regression predicts probabilities, making it ideal for classification tasks.\n", + "\n", + "The key difference lies in the **output function**:\n", + "- Linear regression: $$ y = a \\cdot x + b $$\n", + "- Logistic regression: $$ y = \\text{sigmoid}(a \\cdot x + b) $$\n", + "\n", + "Here $ \\text{sigmoid}(z) = \\frac{1}{1 + e^{-z}} $ squashes any real number into a value between 0 and 1.\n", + "\n", + "Moreover, you might wonder: is it possible to perform logistic regression with a polynomial function? 
The answer is **yes**! 
Logistic regression can work with polynomial transformations of the input, allowing the model to capture non-linear decision boundaries.\n", + "\n", + "Let's dive into building logistic regression step by step, including polynomial transformations!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the training set: each entry is [x, label]\n", + "train_set = [\n", + " [1, 0],\n", + " [2, 0],\n", + " [3, 0],\n", + " [4, 0],\n", + " [5, 0],\n", + " [6, 1],\n", + " [7, 1],\n", + " [8, 1],\n", + " [9, 1],\n", + " [10, 1]\n", + "]\n", + "\n", + "print(train_set)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Why a dataset like this, you may ask? Because logistic regression works well at separating clusters of data with a linear boundary. Here is what you need to understand: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Separate the training set into two clusters based on the label\n", + "cluster_0 = [point[0] for point in train_set if point[1] == 0]\n", + "cluster_1 = [point[0] for point in train_set if point[1] == 1]\n", + "\n", + "# Plot the clusters\n", + "plt.scatter(cluster_0, [0] * len(cluster_0), color='red', label='Cluster 0')\n", + "plt.scatter(cluster_1, [1] * len(cluster_1), color='blue', label='Cluster 1')\n", + "plt.xlabel('Data Points')\n", + "plt.ylabel('Cluster')\n", + "plt.legend()\n", + "plt.title('Training Set Clusters')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2: Initialize the Weight and Bias\n", + "\n", + "We need to initialize both `w` and `b` with random values between 0 and 10 at the start: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "random.seed(0)\n", + "#TODO: randomize the weight and bias between 0 and 10\n", + "\n", + "mini = 0\n", + "maxi = 10\n", + "\n", 
+ "w = ...\n", + "b = ...\n", + "\n", + "print(\"Initial Weight:\", w)\n", + "print(\"Initial Bias:\", b)\n", + "\n", + "assert w > 8.44 and w < 8.45, \"Weight is not correct\"\n", + "assert b > 7.57 and b < 7.58, \"Bias is not correct\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 3: Make predictions using the known formula (with sigmoid)\n", + "\n", + "To make predictions, we’ll use the sigmoid function, a fundamental tool in machine learning. The sigmoid is often used to squash values into the range [0, 1], which makes it particularly useful for binary classification tasks. It’s defined as:\n", + "\n", + "$$\n", + "\\sigma(y) = \\frac{1}{1 + e^{-y}}\n", + "$$\n", + "\n", + "Where:\n", + "•\t $ y = w \\cdot x + b $ (the neuron formula, in case you had forgotten)\n", + "\n", + "•\t $ e $ is the base of the natural logarithm.\n", + "\n", + "The sigmoid function maps large positive values of $y$ to outputs close to 1 and large negative values to outputs close to 0, with a smooth curve in between.\n", + "\n", + "\n", + "Using the formula above, compute the `y_pred` of each input `x` in `train_set` in a function called **forward**, which uses your **sigmoid** function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import math\n", + "\n", + "e = math.e\n", + "\n", + "#TODO: Define the sigmoid function (use pow)\n", + "def sigmoid(x):\n", + " ...\n", + "\n", + "assert sigmoid(1) > 0.73 and sigmoid(1) < 0.74, \"Sigmoid is not correct\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Define the forward function -> neuron function\n", + "def forward(x):\n", + " ...\n", + "\n", + "#TODO: Test the forward function\n", + "for x, y in train_set:\n", + " y_pred = ...\n", + " print(\"Prediction:\", y_pred, \"Actual:\", y)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + 
"outputs": [], + "source": [ + "# graph of the prediction and the actual data\n", + "x = [i[0] for i in train_set]\n", + "y = [i[1] for i in train_set]\n", + "y_pred = [forward(i) for i in x]\n", + "\n", + "plt.plot(x, y, 'ro')\n", + "plt.plot(x, y_pred, 'bo')\n", + "plt.axis([0, 11, -0.5, 1.5])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pretty close to having everything correct on the first try! As you can see, the separation between the two groups isn’t very clear yet. This is where the loss like before comes in to help us improve.\n", + "\n", + "For this case, we’re going to use a different loss function: **Binary Cross-Entropy Loss**. This loss function is specifically designed for binary classification tasks *(predicting values between 0 and 1)*, which aligns perfectly with the output of our sigmoid function. Pretty neat, right?\n", + "\n", + "Here’s a breakdown of how it works:\n", + "\n", + "---\n", + "\n", + "#### *Binary Cross-Entropy Loss Formula*\n", + "\n", + "The Binary Cross-Entropy Loss (Single prediction) is defined as:\n", + "\n", + "$$\n", + "\\text{Loss} = - \\left[ y \\cdot \\log(\\hat{y}) + (1 - y) \\cdot \\log(1 - \\hat{y}) \\right]\n", + "$$\n", + "\n", + "\n", + "Where:\n", + "•\t $ y $ is the true label (0 or 1) for sample .\n", + "\n", + "•\t $ \\hat{y} $ is the predicted probability for sample  (the output of the function).\n", + "\n", + "•\t$\\log$ is the natural logarithm.\n", + "\n", + "\n", + "The overall goal of this loss function is to minimize the difference between the true labels  and the predicted probabilities , guiding the model to make better predictions.\n", + "\n", + "Next, implement this formula in your code to calculate the loss for your predictions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from math import log\n", + "\n", + "eps = 1e-15\n", + "\n", + "def binary_cross_entropy_loss(y, y_pred):\n", + " # We clamp the prediction value to avoid log(0)\n", + " y_pred_clamped = max(min(y_pred, 1 - eps), eps)\n", + " ...\n", + "\n", + "#TODO: Test the binary cross entropy loss function\n", + "for x, y in train_set:\n", + " ...\n", + " error = ...\n", + " print(\"Error:\", error)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the error for the samples labeled 1 is very small, while the first samples (labeled 0) are the ones we need to correct. The derivatives are simpler here!\n", + "\n", + "The derivative for `w`\n", + "$$\n", + "\\frac{\\partial L}{\\partial w} = (y_{\\text{pred}} - y) \\, x\n", + "$$\n", + "\n", + "The derivative for `b`\n", + "$$\n", + "\\frac{\\partial L}{\\partial b} = (y_{\\text{pred}} - y)\n", + "$$\n", + "\n", + "*Bonus: To better understand the derivative of the BCE loss combined with logistic regression, try deriving it by hand, and never forget the chain rule! 
:)* " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO setup the derivative of the loss function \n", + "def derivative_w(x, y, y_pred):\n", + " ...\n", + "\n", + "def derivative_b(y, y_pred):\n", + " ...\n", + "\n", + "def derivative(x, y, y_pred):\n", + " return (derivative_w(x, y, y_pred), derivative_b(y, y_pred))\n", + "\n", + "for x, y in train_set:\n", + " ...\n", + " derivative_weight, derivative_bias = ...\n", + "\n", + " print(f\"x: {x}, y: {y}, y_pred: {y_pred:.3f}, Derivative Weight: {derivative_weight:.3f}, Derivative Bias: {derivative_bias:.3f}\")\n", + " # Plot the derivatives\n", + " dw_values = [derivative_w(x, y, forward(x)) for x, y in train_set]\n", + " db_values = [derivative_b(y, forward(x)) for x, y in train_set]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(12, 6))\n", + "\n", + "# Plot derivative of w\n", + "plt.subplot(1, 2, 1)\n", + "plt.plot(range(len(dw_values)), dw_values, marker='o', color='blue', label=\"Derivative of w\")\n", + "plt.title(\"Derivative of Loss with respect to w\")\n", + "plt.xlabel(\"Index of point in dataset\")\n", + "plt.ylabel(\"Derivative (dL/dw)\")\n", + "plt.axhline(0, color='grey', linestyle='--', linewidth=0.8)\n", + "plt.legend()\n", + "plt.grid(True)\n", + "\n", + "# Plot derivative of b\n", + "plt.subplot(1, 2, 2)\n", + "plt.plot(range(len(db_values)), db_values, marker='o', color='orange', label=\"Derivative of b\")\n", + "plt.title(\"Derivative of Loss with respect to b\")\n", + "plt.xlabel(\"Index of point in dataset\")\n", + "plt.ylabel(\"Derivative (dL/db)\")\n", + "plt.axhline(0, color='grey', linestyle='--', linewidth=0.8)\n", + "plt.legend()\n", + "plt.grid(True)\n", + "\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "# OPTIONNAL 
BUT RECOMMENDED : if you want reload the weight randomly at the first value \n", + "\n", + "w = random.uniform(mini, maxi)\n", + "b = random.uniform(mini, maxi)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now try your best to implement thetrain version ! *The same way you did for the linear regression remember it !*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Train the model\n", + "\n", + "epochs = ...\n", + "learning_rate = ...\n", + "\n", + "\n", + "for ... :\n", + " for ... :\n", + " # ~ Also 5 lines to code \n", + " print(f\"Epoch: {epoch}, Loss: {loss:.5f}\")\n", + "\n", + "print(\"Final Weight:\", w)\n", + "print(\"Final Bias:\", b)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = 0\n", + "print (\"Prediction \", forward(x))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for x, y in train_set:\n", + " y_pred = forward(x)\n", + " print (x, y_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "# Plot the actual data points\n", + "plt.scatter(cluster_0, [0] * len(cluster_0), color='red', label='Cluster 0')\n", + "plt.scatter(cluster_1, [1] * len(cluster_1), color='blue', label='Cluster 1')\n", + "\n", + "# Plot the predicted probabilities\n", + "x_values = range(0, 11)\n", + "y_pred_values = [forward(x) for x in x_values]\n", + "plt.plot(x_values, y_pred_values, color='green', linestyle='-', linewidth=2, marker='o', label='Predicted Probability')\n", + "\n", + "# Add a horizontal line at 0.5\n", + "plt.axhline(y=0.5, color='grey', linestyle='--', linewidth=0.8, label='Threshold 0.5')\n", + "\n", + "# Add labels and title\n", + "plt.xlabel('Data Points')\n", + "plt.ylabel('Probability')\n", + "plt.title('Logistic Regression 
Predictions')\n", + "plt.legend()\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "Congratulations on building your second machine learning algorithm !! now let's level up the difficulty and introduce you to the concept of neural network, good luck ! " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day03/2 - Neural Networks/images/One_Hot_encoding .png b/AI/Day03/2 - Neural Networks/images/One_Hot_encoding .png new file mode 100644 index 0000000..b4ec1a8 Binary files /dev/null and b/AI/Day03/2 - Neural Networks/images/One_Hot_encoding .png differ diff --git a/AI/Day03/2 - Neural Networks/images/backpropagation.png b/AI/Day03/2 - Neural Networks/images/backpropagation.png new file mode 100644 index 0000000..c0677b6 Binary files /dev/null and b/AI/Day03/2 - Neural Networks/images/backpropagation.png differ diff --git a/AI/Day03/2 - Neural Networks/images/matrix_multiplication.png b/AI/Day03/2 - Neural Networks/images/matrix_multiplication.png new file mode 100644 index 0000000..fdd1f20 Binary files /dev/null and b/AI/Day03/2 - Neural Networks/images/matrix_multiplication.png differ diff --git a/AI/Day03/2 - Neural Networks/images/y_prediction.png b/AI/Day03/2 - Neural Networks/images/y_prediction.png new file mode 100644 index 0000000..344afe4 Binary files /dev/null and b/AI/Day03/2 - Neural Networks/images/y_prediction.png differ diff --git a/AI/Day03/2 - Neural Networks/neural_networks.ipynb b/AI/Day03/2 - Neural 
Networks/neural_networks.ipynb new file mode 100644 index 0000000..39c7df4 --- /dev/null +++ b/AI/Day03/2 - Neural Networks/neural_networks.ipynb @@ -0,0 +1,678 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "# ~ PoC AI Pool 2025 ~\n", + "- ## Day 2: Neural Networks from Scratch\n", + " - ### Module 2: Neural Networks\n", + "-----------\n", + "\n", + "During this notebook you will see concept that you already see in the previous notebook ! They are just here to remind you and to show how are they apply to neural networks... good luck !" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "#import the libraries \n", + "\n", + "import numpy as np\n", + "from sklearn.datasets import load_iris\n", + "from sklearn.model_selection import train_test_split" + ] + }, + { + "attachments": { + "matrix_multiplication.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's go deeper in the machine learning by talking about neural networks, \n", + "Neural networks are just a bunch of layer filled of neurons that are linked together.\n", + "\n", + "There are using the same approach of the machine learning like you do before but here you will use matrices to process multiples calculs.\n", + "\n", + "did you remember the following formula for process multiple neurons ? 
\n", + "\n", + "$$ y = W_1 \\cdot x_1 + W_2 \\cdot x_2 + b $$\n", + "\n", + "This is exactly what a neural network computes, but with a lot of neurons!\n", + "\n", + "![matrix_multiplication.png](attachment:matrix_multiplication.png)\n", + "\n", + "This notebook focuses on the maths behind neural networks, so good luck to everyone!\n" + ] + }, + { + "attachments": { + "y1_formula.png": { + "image/png": "" + }, + "y_prediction.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Be a neural network...\n", + "\n", + "For this first task, you will do the same work a neural network does to learn a target output. *(we remove the bias to simplify your calculation :)*)\n", + "\n", + "Imagine you have 4 input neurons and 4 output neurons. They can be seen as two layers of 4 neurons each, and each layer can be represented as a NumPy row vector of shape (1, 4) (or simply (4,)), which is the form used in the matrix product below.\n", + "\n", + "In a neural network, every neuron of a layer is connected to every neuron of the next layer, as you already know.\n", + "\n", + "If you have to predict the value of a neuron called $y1$, it will be: \n", + "\n", + "![y_prediction.png](attachment:y_prediction.png)\n", + "\n", + "Your goal here is to create two matrices, each with 4 neurons, with the respective values: \n", + "\n", + "- [0, 1, 0, 1]\n", + "- [1, 0, 1, 0]\n", + "\n", + "and to work out, with your own *intelligence*, the weights that turn the first matrix into the second.\n", + "\n", + "We can write the problem like this: \n", + "\n", + "\n", + "$$\n", + "Y = \\begin{pmatrix} 0_{a} & 1_{b} & 0_{c} & 1_{d} \\end{pmatrix} \\times \\begin{pmatrix}\n", + "w_{a1} & w_{a2} & w_{a3} & w_{a4} \\\\\n", + "w_{b1} & w_{b2} & w_{b3} & w_{b4} \\\\\n", + "w_{c1} & w_{c2} & w_{c3} & w_{c4} \\\\\n", + "w_{d1} & w_{d2} & w_{d3} & w_{d4}\n", + "\\end{pmatrix} = \\begin{pmatrix} 1 & 0 & 1 & 0 \\end{pmatrix}\n", + "$$\n", + "\n", + "Don't hesitate to check the [NumPy 
Documentation](https://numpy.org/doc/stable/) if needed.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Create the two matrix\n", + "N_neurons = ...\n", + "Y_neurons = ...\n", + "\n", + "\n", + "# TODO: Complete the matrix W\n", + "W_weight = ...\n", + "\n", + "Y_pred = N_neurons @ W_weight # the @ operator is used for matrix multiplication\n", + "\n", + "print(\"Result after multiplication :\")\n", + "print(Y_pred ,\"vs\", Y_neurons)\n", + "\n", + "assert np.array_equal(Y_pred, Y_neurons), \"Output matrices doesn't not correspond to what we want\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What you just did here is exactly what a model does during the backpropagation pass (you will see after)! The model repeatedly adjusts its weights to better fit the data, just as you did. Keep in mind that this is how the model updates his parameters to learn over time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "___\n", + "## Step 2 concept \n", + "\n", + "## 1. Data Preparation\n", + "\n", + "**Dataset Overview**\n", + "\n", + "*The Iris dataset is a classic in machine learning—like the “Hello, World!” for ML.\n", + "It has 150 samples of flowers, each sample with 4 features:*\n", + "\n", + "- Sepal length\n", + "- Sepal width\n", + "- Petal length\n", + "- Petal width\n", + "\n", + "*There are 3 classes (species of Iris):*\n", + "\n", + "- Iris Setosa (Class 0)\n", + "- Iris Versicolor (Class 1)\n", + "- Iris Virginica (Class 2)\n", + "\n", + "\n", + "### 1.1 Import the data \n", + "\n", + "You need to import the data in two different dataset call train and test, respectivly for training the model and testing it with data that he never seen.\n", + "\n", + "### 1.2 Normalization\n", + "\n", + "Each feature (sepal length, sepal width, etc.) can range over different numeric values. 
For example, sepal length can vary between roughly 1 cm and 8 cm. To make the training more stable (and keep our brain from exploding with large numbers!), we often normalize or standardize these features:\n", + "\n", + "A simple demo: if a sepal length equals 4 and all the values lie between 1 and 8, dividing by the maximum gives $4 / 8 = 0.5$ (think of it like converting to a percentage: 50%), while min-max scaling gives $(4 - 1) / (8 - 1) \\approx 0.43$. Both results land in the range $[0,1]$.\n", + "\n", + "Normalization typically scales each feature into something like [0,1].\n", + "Standardization transforms each feature to have mean 0 and standard deviation 1.\n", + "\n", + "For simplicity, let’s do a quick normalization to [0, 1] by dividing by the maximum value (or you can also do min-max, as the code cell below suggests). That makes everything neat and tidy for the network." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1) Load the dataset\n", + "iris = load_iris()\n", + "X = iris.data # shape (150, 4)\n", + "y = iris.target # shape (150,)\n", + "\n", + "# TODO: 2) Normalize (simple min-max, for example)\n", + "X_min = ...\n", + "X_max = ...\n", + "X_norm = ... # scale each column to [0, 1]\n", + "\n", + "# 3) Split into train/test for a bit of realism\n", + "train_inputs, test_inputs, train_results, test_results = train_test_split(X_norm, y, test_size=0.2, random_state=42)\n", + "\n", + "# TODO: 4) Print the shapes\n", + "...\n", + "\n", + "# TODO: print an example\n", + "print(\"-\"*40)\n", + "...\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Model Architecture\n", + "\n", + "### 2.1 The Building Blocks\n", + "\n", + "A neural network is made of layers. Each layer performs a simple math operation on its inputs.\n", + "\n", + "### Layers and Their Components\n", + "\n", + "1. **Input Layer:**\n", + " - **What it does:** Simply receives the data.\n", + " - **Our case:** We have 4 features as input.\n", + "2. 
**Hidden Layer (or layers):**\n", + " - **Layer size:** Let’s pick, for example, 8 neurons in one hidden layer, just because we can.\n", + " - **What it does:** Performs a mathematical transformation using *weights* and *biases*.\n", + " - **Weights (W):** Numbers that multiply each input. Think of them as “importance factors.”\n", + " - **Biases (b):** Numbers that get added to the result, helping the network adjust its output.\n", + "3. **Output Layer:**\n", + " - **What it does:** Produces a score for each of the 3 possible classes. Then, we use the softmax function to turn these scores into probabilities.\n", + "\n", + "### A Simple Example\n", + "\n", + "Imagine you have an input vector $X$ with 2 numbers:\n", + "\n", + "$$X = \\begin{bmatrix} 2 \\\\ 3 \\end{bmatrix}$$\n", + "\n", + "And you have one neuron (a simple unit) that computes an output using:\n", + "\n", + "$$Z = W \\cdot X + b$$\n", + "\n", + "If $W$ is $[0.1,0.2]$ and $b=0.5$, then the calculation is:\n", + "\n", + "$$Z = 0.1 \\times 2 + 0.2 \\times 3 + 0.5 = 0.2 + 0.6 + 0.5 = 1.3$$\n", + "\n", + "This is the basic math performed in each neuron.\n", + "\n", + "### 2.2 Our Network Design\n", + "\n", + "We will build a network with:\n", + "\n", + "- **Hidden Layer:** 8 neurons with ReLU activation.\n", + "- **Output Layer:** 3 neurons with softmax activation.\n", + "\n", + "You might wonder, “Why 8 in the hidden layer?” Because 7 would be too few, and 9 might make the universe collapse (just kidding!). There’s no magical reason, it is just a decent small size for a simple dataset. *Keep in mind, though, that layer sizes that are multiples of two can make the computation more efficient.*\n", + "\n", + "\n", + "\n", + "### Parameter Initialization\n", + "\n", + "We start by randomly initializing the weights (random numbers between 0 and 1) and setting the biases to one, matching the `np.random.rand` and `np.ones` calls in the code cell below." 
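+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As a quick check on the sizes involved, here is a small parameter count. It assumes each weight matrix is stored with shape (outputs, inputs), as the shape comments in the code cell below suggest, and uses the layer sizes chosen above (4 inputs, 8 hidden neurons, 3 outputs):\n", + "\n", + "$$\n", + "\\underbrace{8 \\times 4}_{W^{(1)}} + \\underbrace{8}_{b^{(1)}} + \\underbrace{3 \\times 8}_{W^{(2)}} + \\underbrace{3}_{b^{(2)}} = 32 + 8 + 24 + 3 = 67\n", + "$$\n", + "\n", + "So this small network has 67 trainable numbers in total."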
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(42) # Set a random seed for reproducibility\n", + "\n", + "# TODO: Define dimensions\n", + "input_dim = ... # 4 for the Iris dataset\n", + "hidden_dim = ... # Number of neurons in the hidden layer\n", + "output_dim = ... # 3 classes in the output layer\n", + "\n", + "# TODO: Initialize weights and biases\n", + "W1 = np.random.rand(...) # Shape: (hidden_dim, input_dim)\n", + "b1 = np.ones(...) # Shape: (hidden_dim, 1)\n", + "\n", + "W2 = ... # Shape: (output_dim, hidden_dim)\n", + "b2 = ... # Shape: (output_dim, 1)\n", + "\n", + "assert W1.sum() < 13.94 and W1.sum() > 13.93 , \"W1 not well initialized\" \n", + "assert b1.sum() == 8 , \"b1 not well initialized\"\n", + "assert W2.sum() < 13.46 and W2.sum() > 13.45 , \"W2 not well initialized\"\n", + "assert b2.sum() == 3 , \"b2 not well initialized\"\n", + "\n", + "# TODO: Print the shapes and the matrices of W1, b1, W2, and b2 to ensure they match the expected dimensions.\n", + "..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Forward Pass\n", + "\n", + "### 3.1 What Is a Forward Pass?\n", + "\n", + "A forward pass means sending the input through the network to get an output. It consists of:\n", + "\n", + "1. **Calculating the linear transformation:**\n", + "\n", + " $$Z = W \\cdot X + b$$\n", + "\n", + "2. **Applying an activation function:**\n", + " - **ReLU:** Sets negative numbers to 0.\n", + " - **Softmax:** Converts raw scores into probabilities.\n", + "\n", + "### 3.2 Detailed Steps and Examples\n", + "\n", + "### Hidden Layer Computation\n", + "\n", + "1. 
**Linear Transformation:**\n", + "\n", + " For the hidden layer, we compute:\n", + "\n", + " $$Z^{(1)} = W^{(1)} \\cdot X + b^{(1)}$$\n", + "\n", + " Here, $X$ is the 4×1 vector, $W^{(1)}$ is 8×4, and $b^{(1)}$ is 8×1.\n", + "\n", + " The result $Z^{(1)}$ is an 8×1 vector (one value per neuron).\n", + "\n", + "2. **ReLU Activation:**\n", + "\n", + " ReLU is defined as:\n", + "\n", + " $$\\text{ReLU}(z) = \\max(0, z)$$\n", + "\n", + " For each number in $Z^{(1)}$, if it’s negative, we set it to 0; if it’s positive, we leave it as is.\n", + "\n", + "\n", + "### Output Layer Computation\n", + "\n", + "1. **Linear Transformation:**\n", + "\n", + " Next, for the output layer:\n", + "\n", + " $$Z^{(2)} = W^{(2)} \\cdot A^{(1)} + b^{(2)}$$\n", + "\n", + " where $A^{(1)}$ is the output of the hidden layer (after applying ReLU).\n", + "\n", + "2. **Softmax Activation:**\n", + "\n", + " Softmax turns the 3 scores in $Z^{(2)}$ into probabilities:\n", + "\n", + " $$\\hat{Y}_j = \\frac{e^{Z^{(2)}_j}}{\\sum_{k=1}^{3} e^{Z^{(2)}_k}}$$\n", + "\n", + " This ensures that all 3 probabilities add up to 1." 
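+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make the softmax step concrete, here is a small worked example. The scores are made up for illustration and do not come from the Iris data. Suppose $Z^{(2)} = [2, 1, 0]$:\n", + "\n", + "$$e^{2} \\approx 7.389, \\quad e^{1} \\approx 2.718, \\quad e^{0} = 1, \\quad \\sum_{k=1}^{3} e^{Z^{(2)}_k} \\approx 11.107$$\n", + "\n", + "$$\\hat{Y} \\approx \\left[ \\frac{7.389}{11.107}, \\; \\frac{2.718}{11.107}, \\; \\frac{1}{11.107} \\right] \\approx [0.665, \\; 0.245, \\; 0.090]$$\n", + "\n", + "The largest score gets the largest probability, and the three values add up to 1."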
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: implement the relu function\n", + "def relu(z):\n", + " ...\n", + "\n", + "#TODO: implement the softmax function\n", + "def softmax(z):\n", + " ...\n", + "\n", + "def forward_pass(X, W1, b1, W2, b2):\n", + " \"\"\"\n", + " Perform the forward pass through the network.\n", + " Input:\n", + " - X: A column vector of shape (4, 1)\n", + " Returns:\n", + " - A1: Activations from the hidden layer.\n", + " - A2: Output probabilities from the output layer.\n", + " \"\"\"\n", + " X = X.reshape(-1, 1) # Reshape to a column vector if needed.\n", + "\n", + " # Hidden layer computation:\n", + " Z1 = np.dot(W1, X) + b1 # This performs the linear combination.\n", + " # TODO: Apply the ReLU function to Z1 to get A1.\n", + " A1 = ...\n", + "\n", + " # Output layer computation:\n", + " Z2 = np.dot(W2, A1) + b2 # Linear combination for the output.\n", + " # TODO: Apply the softmax function to Z2 to get A2.\n", + " A2 = ...\n", + "\n", + " return Z1, A1, A2\n", + "\n", + "# Example: Run a forward pass for the first training sample.\n", + "Z1_example, A1_example, A2_example = forward_pass(train_inputs[0], W1, b1, W2, b2)\n", + "\n", + "# TODO: Print the hidden layer's pre-activation (Z1) and post-activation (A1) and (A2) values.\n", + "..." + ] + }, + { + "attachments": { + "One_Hot_encoding .png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Loss Function\n", + "\n", + "### 4.1 What Is the Loss Function?\n", + "\n", + "The loss function tells you how \"wrong\" your network's prediction is compared to the true label. For classification, we use **cross-entropy loss**.\n", + "\n", + "### 4.2 Explaining Cross-Entropy Loss\n", + "\n", + "Suppose your network produces a probability for each class. 
For a given sample:\n", + "\n", + "- Let $\\hat{y} = [\\hat{y}_1, \\hat{y}_2, \\dots, \\hat{y}_{10}]$ be the output probabilities (10 classes in this illustration; our Iris network outputs 3).\n", + "\n", + "- The true label is, say, class $c$. In one-hot encoding, the correct answer is represented as a vector with a 1 at position $c$ and 0s elsewhere.\n", + "\n", + "- Here is an example of one-hot encoded data:\n", + "\n", + " ![One_Hot_encoding .png]()\n", + "\n", + "The cross-entropy loss is defined as:\n", + "\n", + "$L = -\\log(\\hat{y}_c)$\n", + "\n", + "This means if your network assigns a high probability to the correct class (close to 1), the loss will be small (since $\\log(1) = 0$). But if the probability is low, the loss is high.\n", + "\n", + "### A Simple Example\n", + "\n", + "If the correct class is 3 (counting from 0) and the network outputs:\n", + "\n", + "$\\hat{y} = \\begin{bmatrix} 0.1 \\\\ 0.2 \\\\ 0.05 \\\\ 0.6 \\\\ 0.02 \\\\ 0.01 \\\\ 0.005 \\\\ 0.005 \\\\ 0.005 \\\\ 0.005 \\end{bmatrix}$\n", + "\n", + "Then:\n", + "\n", + "$L = -\\log(0.6) \\approx 0.51$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cross_entropy_loss(y_true, y_pred):\n", + " \"\"\"\n", + " Compute the cross-entropy loss for one sample.\n", + " Input:\n", + " - y_true: The true label as an integer class index (0-9 in the dummy test below, 0-2 for Iris).\n", + " - y_pred: The predicted probabilities as a numpy array of shape (n_classes, 1).\n", + " Returns:\n", + " - A scalar representing the loss.\n", + " \"\"\"\n", + " epsilon = 1e-12 # To avoid log(0)\n", + " # TODO: implement cross-entropy loss computation.\n", + " loss = ...\n", + " return loss\n", + "\n", + "# Example test:\n", + "dummy_pred = np.array([[0.1], [0.2], [0.05], [0.6], [0.02], [0.01], [0.005], [0.005], [0.005], [0.005]])\n", + "dummy_label = 3 # The correct class is 3\n", + "\n", + "# TODO: Compute and print the loss for dummy_pred.\n", + "..." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Backward Pass (Backpropagation)\n", + "\n", + "### 5.1 What Is Backpropagation?\n", + "\n", + "Backpropagation is the process of figuring out how to adjust the weights and biases to make the network’s predictions closer to the truth. It does this by computing **gradients** (which are like slopes in calculus) that tell us which direction to move our parameters.\n", + "\n", + "### The Chain Rule (In Simple Terms)\n", + "\n", + "If you have a function inside another function, the chain rule tells you how changes in the inner function affect the outer function.\n", + "\n", + "- For example, if $L = f(g(x))$, then:\n", + "\n", + " $\\frac{dL}{dx} = \\frac{df}{dg} \\times \\frac{dg}{dx}$\n", + "\n", + "In our network, the loss $L$ depends on the outputs, which in turn depend on the weights. We use the chain rule to “chain” these derivatives together.\n", + "\n", + "### 5.2 Backpropagation in Our Network\n", + "\n", + "### Output Layer Gradients\n", + "\n", + "For the output layer with softmax and cross-entropy loss, it can be shown that:\n", + "\n", + "$\\frac{\\partial L}{\\partial Z^{(2)}} = \\hat{y} - Y$\n", + "\n", + "where:\n", + "\n", + "- $\\hat{y}$ is the predicted probability vector,\n", + "- $Y$ is the one-hot encoded true label.\n", + "\n", + "For the weights in the output layer:\n", + "\n", + "$\\frac{\\partial L}{\\partial W^{(2)}} = (\\hat{y} - Y) \\cdot (A^{(1)})^T$\n", + "\n", + "And for the biases:\n", + "\n", + "$\\frac{\\partial L}{\\partial b^{(2)}} = \\hat{y} - Y$\n", + "\n", + "### Hidden Layer Gradients\n", + "\n", + "For the hidden layer, we first compute how the loss changes with respect to the hidden activations. 
Then, since we used ReLU, we multiply by the derivative of ReLU:\n", + "\n", + "$\\text{ReLU}'(z) =\n", + "\\begin{cases}\n", + "1, & \\text{if } z > 0 \\\\\n", + "0, & \\text{otherwise}\n", + "\\end{cases}$\n", + "\n", + "Thus:\n", + "\n", + "$\\frac{\\partial L}{\\partial Z^{(1)}} = \\left( W^{(2)^T} \\cdot \\frac{\\partial L}{\\partial Z^{(2)}} \\right) \\circ \\mathbf{1}_{\\{Z^{(1)} > 0\\}}$\n", + "\n", + "where $\\circ$ means element-wise multiplication." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def backward_pass(X, Z1, A1, A2, y_true, W1, W2):\n", + " \"\"\"\n", + " Compute the gradients for one training sample.\n", + " Inputs:\n", + " - X: Input vector (4, 1)\n", + " - Z1: Pre-activation of hidden layer (8, 1) => i.e. W1@X + b1\n", + " - A1: Activation (ReLU) of hidden layer (8, 1)\n", + " - A2: Output probabilities (3, 1) [= softmax(W2@A1 + b2)]\n", + " - y_true: The true label as an integer (0, 1, or 2)\n", + " - W1: Weights of hidden layer (8, 4)\n", + " - W2: Weights of output layer (3, 8)\n", + " Returns:\n", + " - dW1, db1, dW2, db2 (gradients, same shapes as W1, b1, W2, b2)\n", + " \"\"\"\n", + " # Ensure X is a column vector\n", + " X = X.reshape(-1, 1) # Shape (4, 1)\n", + "\n", + " # 1) One-hot encode the true label\n", + " Y = np.zeros_like(A2)\n", + " Y[y_true] = 1.0\n", + "\n", + " # 2) Gradient wrt Z2 (output layer pre-activation)\n", + " dZ2 = A2 - Y # Shape (3, 1)\n", + "\n", + " # TODO: 3) Gradients for the output layer\n", + " # Compute dW2 as the outer product of dZ2 and the transpose of A1.\n", + " dW2 = ... # Shape (3, 8)\n", + " db2 = dZ2 # Shape (3, 1)\n", + "\n", + " # 4) Backpropagate to the hidden layer\n", + " # dA1 is the gradient coming back from the output layer:\n", + " dA1 = np.dot(W2.T, dZ2) # Shape (8, 1)\n", + "\n", + " # TODO: Derivative of ReLU\n", + " dZ1 = ... 
# Shape (8, 1)\n", + "\n", + " # TODO: 5) Gradients for the hidden layer\n", + " # Compute dW1 as the outer product of dZ1 and the transpose of X.\n", + " dW1 = ... # Shape (8, 4)\n", + " db1 = ... # Shape (8, 1)\n", + "\n", + " return dW1, db1, dW2, db2\n", + "\n", + "\n", + "# Example: Run a backward pass for the first training sample.\n", + "dW1_example, db1_example, dW2_example, db2_example = backward_pass(train_inputs[0], Z1_example, A1_example, A2_example, train_results[0], W1, W2)\n", + "print(\"dW1_example:\", dW1_example)\n", + "print(\"db1_example:\", db1_example)\n", + "print(\"dW2_example:\", dW2_example)\n", + "print(\"db2_example:\", db2_example)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Optimization\n", + "\n", + "### 6.1 Gradient Descent\n", + "\n", + "Once we have the gradients (the slopes), we update our parameters by taking a small step in the opposite direction of the gradient. This is the gradient descent update rule:\n", + "\n", + "$\\theta := \\theta - \\eta \\cdot \\frac{\\partial L}{\\partial \\theta}$\n", + "\n", + "- $\\theta$ represents a weight or bias.\n", + "- $\\eta$ is the learning rate (think of it as the step size)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "#TODO: define hyperparameters\n", + "epochs = ... 
# How many times to go through the training data\n", + "learning_rate = ...\n", + "\n", + "size = len(train_inputs)\n", + "\n", + "#TODO: Implement the update_parameters function\n", + "def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):\n", + " \"\"\"\n", + " Update the parameters using the gradients and return the updated W1, b1, W2, b2.\n", + " \"\"\"\n", + " ...\n", + "\n", + "\n", + "for epoch in range(epochs):\n", + " epoch_loss = 0\n", + " for i in range(size):\n", + " # Get one training sample and its label\n", + " X = train_inputs[i].reshape(-1, 1) # force reshape (4,) to (4, 1) to avoid error\n", + " y = train_results[i]\n", + "\n", + " # Forward pass: compute activations\n", + " Z1, A1, A2 = forward_pass(X, W1, b1, W2, b2)\n", + "\n", + " # Compute the loss for this sample\n", + " loss = cross_entropy_loss(y, A2)\n", + " epoch_loss += loss\n", + "\n", + " # Backward pass: compute gradients\n", + " dW1, db1, dW2, db2 = backward_pass(X, Z1, A1, A2, y, W1, W2)\n", + "\n", + " # Update parameters using gradient descent\n", + " W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)\n", + "\n", + " # Print the average loss for this epoch.\n", + " print(f\"Epoch {epoch + 1}/{epochs}, Loss: {epoch_loss / size}\")\n", + "\n", + "print(\"Training complete!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "Well done if you made it here: you have finished this day, bravo!\n", + "\n", + "If you want to go deeper, you can try adding more layers to your neural network or play with another dataset such as MNIST... 
:)\n", + "to load the mnist dataset you can see [here](https://gist.github.com/alperyeg/ca5e5e9b5ffb442a9ce5caca7c8399c1)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day03/README.md b/AI/Day03/README.md new file mode 100644 index 0000000..fde37e8 --- /dev/null +++ b/AI/Day03/README.md @@ -0,0 +1,30 @@ +# ~ PoC AI Pool 2025 ~ + +- ## Day 3: Understand Machine Learning + - ### Module 1: Linear Regression + - **Notebook:** [`linear_regression.ipynb`](<1 - Regression/linear_regression.ipynb>) + - ### Module 2: Logistic Regression + - **Notebook:** [`logistic_regression.ipynb`](<1 - Regression/logistic_regression.ipynb>) + - ### Module 3: Neural Network theory + - **Notebook:** [`neural_networks.ipynb`](<2 - Neural Networks/neural_networks.ipynb>) + +--- + +**Hooray : You've made it to AI !** +On today's menu, we'll enter the wonderful world of machine learning with two major algorithms : Linear and Logistic Regression, followed by a introduction of the theory behind neural networks ! 
+ +> Here's a list of resources that we believe can be useful to follow along (and that we've ourselves used to learn these topics before being able to write the subjects): + +## Resources + +- [3b1b's neural network series](https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi) +- [sentdex's NNFS series & book](https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3) +- [andriy burkov's 100-page ML book](https://themlbook.com/) particularly [chapter 4](https://www.dropbox.com/s/xpd5x6p6jte3th5/Chapter4.pdf?dl=0) +- [the math sorcerer's video on partial derivatives](https://www.youtube.com/watch?app=desktop&v=gnkhT3XDU5s) +- [khan academy's video on the chain rule](https://www.youtube.com/watch?v=NO3AqAaAE6o) + + +- [deriving BCE loss](https://www.google.com/search?sca_esv=593424282&rlz=1C5CHFA_enFR1086FR1086&sxsrf=AM9HkKnhFXQw46XVx7yP5nyzZOxkebfGWw:1703424283700&q=sigmoid&tbm=isch&source=lnms&sa=X&sqi=2&ved=2ahUKEwiXptP6laiDAxVBU6QEHVYpCxIQ0pQJegQIDhAB&biw=1512&bih=738&dpr=2) + + + diff --git a/AI/Day04/1 - Torch/Introduction_Torch.ipynb b/AI/Day04/1 - Torch/Introduction_Torch.ipynb new file mode 100644 index 0000000..dd99c9c --- /dev/null +++ b/AI/Day04/1 - Torch/Introduction_Torch.ipynb @@ -0,0 +1,880 @@ +{ + "cells": [ + { + "attachments": { + "torch_logo.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![torch_logo.png](attachment:torch_logo.png)\n", + "\n", + "# ~ PoC AI Pool 2025 ~\n", + "- ## Day 3: Neural Network\n", + " - ### Module 3: Neural Network with torch\n", + "-----------\n", + "\n", + "## Introduction\n", + "\n", + "\n", + "[PyTorch](https://pytorch.org/) is one of the most widely used frameworks when it comes to Machine Learning, especially Deep Learning.\\\n", + "Whether for computer vision or language processing, PyTorch allows you to build the state-of-the-art in AI.\n", + "\n", + "\n", + "Developed by Meta AI, PyTorch is now part of 
the Linux 
Foundation and is completely [open-source](https://github.com/pytorch/pytorch).\\\n", + "When you hear about deep learning, PyTorch is never far away as it is present in Tesla cars, it is used by OpenAI, Google, AMD, Nvidia, AWS, Microsoft, Meta, Netflix and many others !\n", + "\n", + "As you can see, PyTorch has distinguished itself as the AI framework par excellence.\n", + "\n", + "---\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Why use PyTorch\n", + "\n", + "But do you know what PyTorch is for?\n", + "\n", + "Yesterday, you dove into the theory of machine learning and neural networks.\n", + "\n", + "Now imagine that you have millions of parameters, a complex architecture and that you have to create a neural network from scratch every time.\n", + "\n", + "Well, **PyTorch allows you to build neural networks very easily**: you can set aside the mathematics behind them and build complex neural networks with a few functions and parameters. \n", + "\n", + "Excuse me, **don't entirely forget the mathematics**, it's important sometimes ;)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import torch \n", + "import math\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part.1 The Tensor\n", + "### What is a Tensor ?\n", + "\n", + "In PyTorch, data is represented in the form of tensors.\n", + "\n", + "Concretely, a **Tensor is an object** that is similar to an array or a matrix.\\\n", + "Actually, tensors are **similar to arrays in Numpy** with a **few differences**.\n", + "\n", + "The main strength of tensors is that **they can run on GPUs** or other hardware acceleration devices.\\\n", + "You may already know this, but AI models are often accelerated using graphics cards.\n", + "\n", + "In addition, **Tensors are optimized to calculate gradients** in the gradient descent algorithm.\\\n", + "If you remember 
gradient descent is the algorithm that allows to adjust the weights of our neural network and thus to make it learn new things.\n", + "\n", + "In short, Tensors are used to encode the input and output data of our neural networks as well as the weights of our networks.\\\n", + "They have the advantage of being able to run on a GPU and to be optimized for gradient descent." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1 - Build a Tensor from data\n", + "\n", + "In this first exercise you will have to **create a tensor** from the array ``data``\\\n", + "Be careful, the tensor must be built from the array and thus contain the same data.\n", + "> You might want to take a look to the [documentation](https://pytorch.org/docs/stable/tensors.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = [[1, 2],[3, 4]]\n", + "\n", + "#TODO : Tensorise the data with torch\n", + "tensor_data = ...\n", + "\n", + "# Print the info of the tensor\n", + "print(tensor_data)\n", + "print(\"-\"*20)\n", + "print(tensor_data.shape)\n", + "\n", + "assert torch.is_tensor(tensor_data), \"Your object is not a tensor\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2 - Build a Tensor from shape\n", + "\n", + "One of the many interesting features of PyTorch is that you can generate Tensors from shapes.\\\n", + "This can be useful when you want to initialize neural network weights for example.\n", + "\n", + "Create a Tensor **filled with 0** and of shape ``(2, 3)``." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : Create a tensor of shape (2, 3) filled with zeros\n", + "shape = ...\n", + "\n", + "tensor_shape_zeros = ...\n", + "\n", + "print(tensor_shape_zeros)\n", + "print(\"-\"*20)\n", + "print(tensor_shape_zeros.shape)\n", + "\n", + "assert torch.all(tensor_shape_zeros == 0), \"Your tensor is not filled with zeros.\"\n", + "assert list(tensor_shape_zeros.shape) == [2, 3], \"Your tensor does not have shape (2, 3)\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 3 - Print Tensor's attributes\n", + "\n", + "You now know how to create a tensor.\\\n", + "Create a tensor with random values, of whatever size you want!\n", + "Display **four pieces of information** about this tensor:\n", + "\n", + "* The values themselves\n", + "* Its shape\n", + "* The data type of the tensor\n", + "* The device on which the tensor is stored\n", + "\n", + "Don't hesitate to take a look at the [documentation](https://pytorch.org/docs/stable/tensor_attributes.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : initialise a tensor with random values\n", + "tensor = ...\n", + "\n", + "############################################\n", + "#TODO print the infos of the tensor \n", + "..." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 4 - Use GPU if available\n", + "\n", + "If you look carefully at which device your tensor is stored on, it will be on your CPU even if you have a GPU.\n", + "And this is normal.\\\n", + "If you don't indicate that you want to store your tensor on your GPU then it will use the CPU by default.\n", + "\n", + "Look at the [documentation](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html) about this.\\\n", + "To check whether CUDA is available, [see this](https://pytorch.org/docs/stable/generated/torch.cuda.is_available.html#torch.cuda.is_available).\n", + "Mac users can check whether MPS is [available](https://pytorch.org/docs/stable/notes/mps.html). \n", + "\n", + "Add a condition that checks whether a GPU is available; if so, move your tensor to the GPU.\n", + "Mac users should do the same with MPS." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tensor_device = torch.rand(3, 2)\n", + "\n", + "print(\"before :\", tensor_device.device)\n", + "#TODO: check if CUDA or MPS is available and move the tensor to that device if it is ; ~4 lines\n", + "...\n", + "\n", + "print(\"after :\",tensor_device.device)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 5 - Apply an arithmetic operation to a Tensor\n", + "\n", + "In the same way as for a numpy array, one can easily apply arithmetic operations on a Tensor.\n", + "\n", + "Multiply the data of the Tensor by **42**." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tensor = torch.ones((3, 3), dtype=torch.float)\n", + "print(tensor)\n", + "\n", + "#TODO: multiply the tensor by 42\n", + "tensor = ...\n", + "print(tensor)\n", + "assert int(tensor.sum().item()) == 378, 'The tensor is not multiply by 42.'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 6 - Reshape a Tensor\n", + "\n", + "Again in the same way as a numpy array, you can reshape your Tensor using the ``reshape`` method.\n", + "\n", + "Turn your shape tensor ``(3, 9)`` into a shape tensor ``(3, 3, 3)``." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tensor = ...\n", + "print(tensor)\n", + "\n", + "### TODO: code here (~ 1 line)\n", + "tensor = ...\n", + "print(\"-\"*50)\n", + "print(tensor)\n", + "assert list(tensor.shape) == [3, 3, 3], \"Your tensor does not have a shape (3, 3, 3)\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: Neural Network Layers\n", + "### What is a Neural Network Layer?\n", + "\n", + "In PyTorch, **neural network layers** are the building blocks of deep learning models. \n", + "\n", + "There are two fundamental types of layers that are widely used in AI:\n", + "\n", + "1. **Linear Layers (`torch.nn.Linear`)** \n", + " A **Linear Layer** performs a simple mathematical operation:\n", + " $\n", + " y = x * W^T + b\n", + " $\n", + "\n", + " Linear layers are used to map input features to output features and are common in fully connected layers of neural networks.\n", + "\n", + "2. **Convolutional Layers (`torch.nn.Conv2d`)** \n", + " A **Convolutional Layer** is primarily used for images. 
Instead of processing individual features, it processes small patches of the input using filters or kernels:\n", + " - **Kernels**: Small learnable tensors that slide over the input, detecting patterns like edges or textures.\n", + " - **Key Parameters**:\n", + " - **Kernel Size**: The size of the sliding window (e.g., 3x3).\n", + " - **Stride**: The step size of the kernel.\n", + " - **Padding**: Adds extra borders around the input to control output size.\n", + "\n", + " These layers are essential in tasks like image classification, object detection, and image segmentation.\n", + "\n", + "### Layers in PyTorch\n", + "\n", + "In PyTorch, layers are defined in the `torch.nn` module; you can find the documentation [here](https://pytorch.org/docs/stable/nn.html).\n", + "\n", + "In the next sections, we will explore these layers in depth with practical examples. Good luck!" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import torch.nn as nn" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part 2.1 Linear Layers \n", + "\n", + "\n", + "\n", + "You might want to take a look at the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) of `linear layers`! \n", + "### Step 1 - Build a simple layer\n", + "\n", + "\n", + "In this first exercise you will have to **create a layer** that takes 3 features and outputs 2 features!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Create a linear layer with 3 inputs and 2 outputs\n", + "layer = ...\n", + "\n", + "print(layer)\n", + "print(\"-\"*30)\n", + "print(layer.weight)\n", + "print(\"-\"*30)\n", + "print(layer.bias)\n", + "\n", + "assert list(layer.weight.shape) == [2, 3], \"The weight of the layer is not the right shape\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2 - Apply a Linear Layer to a Tensor\n", + "\n", + "- Your goal here is to create a tensor of shape (1, 4) with random values\n", + "- Pass this tensor to a linear layer that outputs 2 values!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Create a tensor with shape (1, 4) and pass it through the layer\n", + "tensor = ...\n", + "\n", + "layer = ...\n", + "\n", + "output = ...\n", + "\n", + "print(\"input tensor :\",tensor)\n", + "print(\"-\"*30)\n", + "print(\"output tensor :\",output)\n", + "\n", + "assert list(output.shape) == [1, 2], \"The output of the layer is not the right shape\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Congratulations, you have just made your first forward pass through a neural network layer! (Remember the forward function you wrote yesterday: this is the same thing :)*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 3 - Chain two Linear Layers\n", + "\n", + "In the same way as above, create two layers and forward a tensor through them!\n", + "\n", + "- The input size of the **first** layer needs to be 3 and the output size of the **second** layer needs to be 3\n", + "- Try to find out which value you need to put between the two." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Create a neural network with 2 linear layers\n", + "\n", + "tensor = ...\n", + "layer_1 = ...\n", + "layer_2 = ...\n", + "\n", + "assert layer_1.out_features == layer_2.in_features, \"The dimensions between the two layers are not compatible and need to be the same\"\n", + "\n", + "#TODO: Create a forward function that takes a tensor as input and passes it through the two layers ~3 lines\n", + "def forward(x):\n", + " ...\n", + "\n", + "output = forward(tensor)\n", + "\n", + "print(\"output tensor :\",output)\n", + "assert list(output.shape) == [1, 3], \"The output of the forward function is not the right shape\"\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 4 - Understand Batch\n", + "\n", + "- create two tensor, one is the shape (1, 3) and the other is shape (3, 3)\n", + "- pass it to your model and compare the result, what can you find ?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Create to tensor with different shapes and pass them through the forward function\n", + "tensor_shape_1 = ...\n", + "tensor_shape_3 = ...\n", + "\n", + "result_1 = forward(tensor_shape_1)\n", + "result_3 = forward(tensor_shape_3)\n", + "\n", + "print(\"Tensor with a batch of 1 :\",result_1)\n", + "print(\"-\"*30)\n", + "print(\"Tensor with a batch of 3 :\",result_3)\n", + "\n", + "assert list(result_1.shape) == [1, 3], \"The output of the forward function is not the right shape\"\n", + "assert list(result_3.shape) == [3, 3], \"The output of the forward function is not the right shape\"" + ] + }, + { + "attachments": { + "understand_batch.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the shape of the output matches the first value (in the shape) of the tensor we pass to the model, which is basically the number of examples we send in. It doesn’t have to be just one—this is where PyTorch really shines with parallelization! All the examples are processed at the same time, making everything faster and more efficient\n", + "\n", + "Here an image to understand it :\n", + "\n", + "![understand_batch.png](attachment:understand_batch.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part 2.2 Convolutional layer \n", + "\n", + "here is the link to the [documentation](https://pytorch.org/docs/stable/nn.html#convolution-layers) of convolutional layer (all types, choose which you want) ! 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1 - Create a layer\n", + "your goal here is to create a Conv2d layer with 1 input and 1 output but with a kernel size of 3 !\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Initialize a Conv2d layer\n", + "conv_layer = ...\n", + "\n", + "# Print layer weights\n", + "print(\"Layer\", conv_layer)\n", + "print(\"-\"*60)\n", + "print(\"Weights:\", conv_layer.weight)\n", + "print(\"-\"*60)\n", + "\n", + "print(\"Weight Shape:\", conv_layer.weight.shape)" + ] + }, + { + "attachments": { + "convolution_gif.gif": { + "image/gif": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Interesting, yeah? Why do we have a shape like this? Let me break it down for you:\n", + "\n", + "We have a Conv2D layer with:\n", + "\n", + "•\t***in = 1***: This means there is 1 input channel. So, our input image is likely a grayscale image, which only has one color channel *(for example an input of 3 represent the RGB scale )*.\n", + "\n", + "•\t***out = 1***: This means we have 1 output channel. This is the number of filters that will be applied to the image, and here we only have 1 filter.\n", + "\n", + "•\t***kernel size = 3***: This refers to the size of the filter, which is a 3x3 grid of numbers.\n", + "\n", + "*Now, the weight tensor shape is [1, 1, 3, 3]. Here’s what each part means:*\n", + "\n", + "•\t***The first 1***: This represents the number of output channels (filters). We only have one filter, so it’s 1.\n", + "\n", + "•\t***The second 1***: This represents the number of input channels. 
Since we are using a grayscale image (1 channel), it’s also 1.\n", + "\n", + "•\t***The 3 (third dimension)***: This is the height of the filter, which is 3 pixels.\n", + "\n", + "•\t***The 3 (fourth dimension)***: This is the width of the filter, also 3 pixels.\n", + "\n", + "So, in simple terms, the filter is a 3x3 matrix that will scan over the input image (which also has 1 channel), and it will produce 1 output channel after applying the filter.\n", + "\n", + "\n", + "\n", + "*We can also see a stride=(1,1) that simply represent by how many the kernel need to move in height and width*\n", + "\n", + "Let's explain it with a visualisation\n", + "\n", + "![convolution_gif.gif](attachment:convolution_gif.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step - 2 pass a tensor to this layer\n", + "\n", + "Your goal here is to reproduce the gif above ! (with random value)\n", + "\n", + "create a tensor :\n", + "\n", + "- batch_size = 1\n", + "- channel = 1 \n", + "- height = 5 \n", + "- width = 5\n", + "\n", + "pass this tensor to you layer made above " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Create a tensor and pass it through the Conv2d layer\n", + "tensor = ...\n", + "\n", + "output = ...\n", + "\n", + "print(\"Output:\", output)\n", + "print(\"-\"*80)\n", + "print(\"Output Shape:\", output.shape)\n", + "\n", + "assert list(output.shape) == [1, 1, 3, 3], \"The output of the Conv2d layer is not the right shape\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step - 3 Create a layer with padding \n", + "\n", + "Your goal is now to recreate the same layer created in step 1 but with a padding = 1\n", + "\n", + "Pass the tensor created above in the new layers\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Initialize a Conv2d layer with 
padding and pass a tensor through it\n", + "layers_with_padding = ...\n", + "\n", + "new_tensor = ...\n", + "\n", + "output = layers_with_padding(new_tensor)\n", + "\n", + "print(\"Output with padding:\", output)\n", + "print(\"-\"*80)\n", + "print(\"Output Shape with padding:\", output.shape)\n", + "\n", + "assert list(output.shape) == [1, 1, 5, 5], \"The output of the Conv2d layer with padding is not the right shape\"" + ] + }, + { + "attachments": { + "padding_example.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the image did not change size!\n", + "\n", + "This is what padding is for: it adds a border of extra values (zeros by default) around the image *(matrix)*, so that the kernel can also process the border of the image and the output keeps the same size. \n", + "Here is an image to help understand it: \n", + "\n", + "![padding_example.png](attachment:padding_example.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 4 - Chain two layers together!\n", + "\n", + "Just like you did with the linear layers, now try it with convolutional layers!\n", + "\n", + "Your goal here is to pass a 5x5 grayscale tensor through the first convolutional layer, using a kernel size of 3 and a padding of… (take a guess!). For the second layer, make sure it takes an image of the same size as the tensor, but with 3 channels (so it’s not grayscale anymore, but with a depth of 3!) and outputs 1 channel." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO create the two conv layers \n", + "\n", + "tensor = ...\n", + "\n", + "conv_1 = ...\n", + "conv_2 = ...\n", + "\n", + "#TODO: Create a forward function that takes a tensor as input and passes it through the two layers ~3 lines\n", + "def forward(x):\n", + " ...\n", + "\n", + "output = forward(tensor)\n", + "print(\"Input tensor:\",tensor)\n", + "print(\"-\"*70)\n", + "print(\"Output tensor:\",output)\n", + "\n", + "assert conv_1.out_channels == conv_2.in_channels, \"The channel counts of the two layers are not compatible and need to be the same (hint: it's 3...)\"\n", + "assert list(output.shape) == [1, 1, 5, 5], \"The output of the forward function is not the right shape\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 5 - Get a flattened tensor from an image\n", + "\n", + "Imagine you want to take the *image* output by the forward function coded above and flatten it into a 2D tensor.\n", + "\n", + "Why flatten a tensor, you may ask?\n", + "\n", + "It’s because the Conv2D layer outputs a 4D tensor *(batch size, channels, height, width)*, but the Linear layer expects a 2D tensor *(batch size, features)*.\n", + "\n", + "Flattening the tensor converts the 4D shape into a 2D shape, allowing it to be passed correctly to the Linear layer. This process combines all the spatial information from the convolutional layers into a single long vector of features, which can then be used by the fully connected layers for further processing.\n", + "\n", + "Take a look at the PyTorch [documentation](https://pytorch.org/docs/stable/nn.html) and find the function that can flatten the tensor for you!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO flatten the output tensor\n", + "flatten_output = ...\n", + "\n", + "print(\"Output tensor:\", output)\n", + "print(\"-\"*70)\n", + "print(\"Flattened output tensor:\",flatten_output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 6 - Combine convolutional and linear layers!\n", + "\n", + "Your goal here is to start with a 5x5 *image* and end up with a tensor of shape (1, 2)! \n", + "\n", + "Add as many layers as you want in between to create a real neural network! " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : create a tensor with shape (1, 1, 5, 5)\n", + "\n", + "input_tensor = ...\n", + "\n", + "#TODO : create conv(s) layer(s)\n", + "conv_1 = ...\n", + "\n", + "#TODO : create a flatten layer\n", + "flatten = ...\n", + "\n", + "#TODO : create linear(s) layer(s)\n", + "linear = ...\n", + "\n", + "#TODO: Create a forward function that takes a tensor as input and passes it through the layers ~4 lines\n", + "def forward(x):\n", + " ...\n", + "\n", + "output = forward(input_tensor)\n", + "\n", + "print(\"Input tensor:\",input_tensor)\n", + "print(\"-\"*70)\n", + "print(\"Output tensor:\",output)\n", + "\n", + "assert list(output.shape) == [1, 2], \"The output of the forward function is not the right shape\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done! 
Now that you’ve made it through all of that, let’s take a little break with some simple functions that will be very useful for you as you create complex neural networks!\n", + "\n", + "___\n", + "\n", + "Yesterday, you discovered many algorithms (depending on how far you got, but don’t worry, we’re not diving into the theory *you can always go back to yesterday to refresh your memory!*)\n", + "\n", + "Most of the algorithms used in machine learning, especially in deep learning and neural networks, are already built into PyTorch! (Pretty exciting, right? What a funny joke, all that theory just to call it with torch… but the math is important!)\n", + "\n", + "Let’s start with the loss functions in PyTorch, and here’s a simple documentation to get you started:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step - 1 Initialise loss function \n", + "\n", + "You need to initialise 4 values ***(as tensors! Remember the first exercise of today)*** here:\n", + "\n", + "- a prediction of a linear regression model and its actual target\n", + "- a prediction of a logistic regression model and also its actual target\n", + "\n", + "Let's start with the linear regression (li-r) model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : initialise the tensor values for li-r model\n", + "lir_y_pred = ...\n", + "lir_y = ...\n", + "\n", + "#TODO : compute the mean squared error\n", + "lir_mse = ...\n", + "\n", + "output = lir_mse(...)\n", + "\n", + "print(\"MSE:\",output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And now the logistic regression (lo-r) model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : initialise the tensor values for lo-r model\n", + "lor_y_pred = ...\n", + "lor_y = ...\n", + "\n", + "#TODO : compute the binary cross entropy\n", + "lor_bce = ...\n", + "\n", + 
"output = lor_bce(...)\n", + "\n", + "print(\"BCE:\",output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the gradient descend is already implemented in torch you just need to calcul your loss and *backward* it !\n", + "here's the [documentation](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step - 2 Activations functions\n", + "\n", + "As the same of loss function, activation function are also available in torch ! let's create ....\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : initialise the tensor values for the model\n", + "model_layers_relu = ...\n", + "model_layers_softmax = ...\n", + "\n", + "#TODO : compute the relu and softmax\n", + "relu = ...\n", + "softmax = ...\n", + "\n", + "output_relu = relu(model_layers_relu)\n", + "output_softmax = softmax(model_layers_softmax)\n", + "\n", + "print(\"ReLU:\",output_relu)\n", + "print(\"-\"*70)\n", + "print(\"Softmax:\",output_softmax)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "Well done guys, you have know the base concept to create a model with torch !\n", + "dive into the vision parts to create your first linear model ! 
\n", + "here is the [notebook](<../2 - Vision-Models/2.1 - Minst/Minst.ipynb>)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day04/1 - Torch/images/convolution_gif.gif b/AI/Day04/1 - Torch/images/convolution_gif.gif new file mode 100644 index 0000000..a96670a Binary files /dev/null and b/AI/Day04/1 - Torch/images/convolution_gif.gif differ diff --git a/AI/Day04/1 - Torch/images/padding_example.png b/AI/Day04/1 - Torch/images/padding_example.png new file mode 100644 index 0000000..3d56bc5 Binary files /dev/null and b/AI/Day04/1 - Torch/images/padding_example.png differ diff --git a/AI/Day04/1 - Torch/images/tasse_convolutional.gif b/AI/Day04/1 - Torch/images/tasse_convolutional.gif new file mode 100644 index 0000000..985b5cc Binary files /dev/null and b/AI/Day04/1 - Torch/images/tasse_convolutional.gif differ diff --git a/AI/Day04/1 - Torch/images/torch_logo.png b/AI/Day04/1 - Torch/images/torch_logo.png new file mode 100644 index 0000000..a30307e Binary files /dev/null and b/AI/Day04/1 - Torch/images/torch_logo.png differ diff --git a/AI/Day04/1 - Torch/images/understand_batch.png b/AI/Day04/1 - Torch/images/understand_batch.png new file mode 100644 index 0000000..9312247 Binary files /dev/null and b/AI/Day04/1 - Torch/images/understand_batch.png differ diff --git a/AI/Day04/2 - Vision-Models/2.1 - Minst/Images/cnn_background.jpeg b/AI/Day04/2 - Vision-Models/2.1 - Minst/Images/cnn_background.jpeg new file mode 100644 index 0000000..4706cdf Binary files /dev/null and b/AI/Day04/2 - Vision-Models/2.1 - 
Minst/Images/cnn_background.jpeg differ diff --git a/AI/Day04/2 - Vision-Models/2.1 - Minst/Minst.ipynb b/AI/Day04/2 - Vision-Models/2.1 - Minst/Minst.ipynb new file mode 100644 index 0000000..38e1036 --- /dev/null +++ b/AI/Day04/2 - Vision-Models/2.1 - Minst/Minst.ipynb @@ -0,0 +1,538 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ~ PoC AI Pool 2025 ~\n", + "- ## Day 3: Deep Learning\n", + " - ### Module 2: Convolutional Neural Network\n", + "-----------\n", + "\n", + "## Minst" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done, you've made it this far! You now understand the key concepts of neural networks and how they are trained, but you haven't really created one yet...\n", + "Don't worry, this task will guide you through building a neural network trained to recognise any handwritten digit on a 28 by 28 pixel image!\n", + "\n", + "You will start by setting up the dataset, then your model, and at the end you will play with it!" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "#Just import the necessary libraries\n", + "\n", + "import time\n", + "import torch\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "\n", + "#For the model don't forget\n", + "import torch.nn as nn\n", + "import torch.optim as optim" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part - 1 Prepare the data \n", + "\n", + "Before actually creating a neural network, we need to prepare the data that we will feed to the model.\n", + "\n", + "Remember: ***THE MOST important thing in machine learning is the quality of the data***, more than the model itself...\n", + "\n", + "Your goal here is to specify how we want the data: you can convert it to a 
[tensor](https://pytorch.org/vision/main/generated/torchvision.transforms.ToTensor.html) and normalize it if you want. You can check the transforms documentation [here](https://pytorch.org/vision/0.9/transforms.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: define the transforms compose\n", + "transform = ...\n", + "\n", + "train_set = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "eval_set = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "print(f\"Len train dataset : {len(train_set)}\")\n", + "print(f\"Len test dataset : {len(eval_set)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You may ask: why did we create two datasets?\n", + "\n", + "One is used to train the model, and the other to evaluate it on data it has never seen, to check that the model has not overfitted the training data.\n", + "\n", + "To understand what's inside the dataset, you can visualise some of the examples below!\n", + "\n", + "***Don't hesitate to change the NUMBER_OF_ELEMENTS constant to see more or fewer examples***" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualisation of some elements of the dataset; you can change the number if you want\n", + "NUMBER_OF_ELEMENTS = 4\n", + "\n", + "def imshow(img):\n", + " # img = img * 0.5 + 0.5 # Denormalisation if you normalised the data with mean 0.5 and std 0.5\n", + " npimg = img.numpy()\n", + " plt.imshow(np.transpose(npimg, (1, 2, 0)), cmap='gray')\n", + " plt.axis('off')\n", + " plt.show()\n", + "\n", + "train_loader_vis = torch.utils.data.DataLoader(train_set, batch_size=NUMBER_OF_ELEMENTS, shuffle=True)\n", + "\n", + "# Random batch of images\n", + "dataiter = iter(train_loader_vis)\n", + "images, labels = next(dataiter)\n", + "\n", +
"imshow(torchvision.utils.make_grid(images))\n", + "print('Labels :', ' '.join(f'{labels[j].item()};' for j in range(NUMBER_OF_ELEMENTS)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also look at different attributes like the number of images in the dataset, the size of each image or the label of an image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image, label = train_set[0]\n", + "\n", + "print(\"image :\", image) # pixels value if you want to see the matrix\n", + "print(\"-\"*60)\n", + "print(\"image shape :\", image.shape) # pixels value\n", + "print(\"label :\", label) # Number represented in the image \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, we have images **28 pixels high and 28 pixels wide**, with **one channel** (grayscale !).\n", + "\n", + "These images represent a number from 0 to 9, we have **10 different labels** (or 10 different possible output).\\\n", + "The first picture represents a 5, therefore its label is 5." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Batch-Size\n", + "\n", + "Did you remember when we talk about batch and parallelization of multiple example with torch ? This is very important here !\n", + "\n", + "**60,000** is a lot of images to process one by one, to make it easier for our model to process this data while training we are going to use ``batch_size``.\n", + "\n", + "for one who forget , ``batch_size`` is a hyperparameter that defines the number of samples to work through before updating the internal model parameters. In other words, before calculating the error and apply backpropagation after each image, if our batch size is 64 we will go through 64 images before doing it. 
**This improves the learning of our AI** by **applying backpropagation to the error averaged over the batch.**\n", + "\n", + "As in the previous notebook, we will use a [**``dataloader``**](https://pytorch.org/docs/stable/data.html). This time we don't need to define a ``Dataset`` class, since we are using a built-in ``torchvision`` dataset.\n", + "\n", + "Remember to pass the ``train_set``, set ``batch_size`` to ``64`` and enable ``shuffle``." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : Define the batch size\n", + "BATCH_SIZE = ...\n", + "\n", + "train_loader = ...\n", + "\n", + "assert len(train_loader) == 938, \"Your train loader is not well implemented, remember that the batch size is 64\"\n", + "\n", + "batch = next(iter(train_loader)) # obtain the first batch\n", + "images, labels = batch\n", + "print(\"image shape :\", images.shape)\n", + "print(\"labels shape :\", labels.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We now have `938` batches of `64` images each (the last one is smaller, since 60,000 is not a multiple of 64), along with their matching labels.\\\n", + "This will **drastically decrease our training time** because with one backward propagation, 64 images are processed.\n", + "\n", + "\n", + "> PyTorch is built to work with batches, so it is quite simple to use them in our code. \n", + "\n", + "*Afterwards, you can try changing your batch size to see how it affects learning (remove the assert to test it)!*" 
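To see where the `938` and `157` in the asserts come from, here is a minimal sketch in plain Python (no torch needed). It assumes the standard MNIST split of 60,000 training and 10,000 test images, and a loader that keeps the final, smaller batch (the default `drop_last=False` behaviour of `DataLoader`). `count_batches` is a hypothetical helper written for this illustration, not a torch function.

```python
import math
from typing import Tuple


def count_batches(dataset_size: int, batch_size: int) -> Tuple[int, int]:
    """Hypothetical helper: return (number of batches, size of the last batch).

    Assumes the last, possibly smaller, batch is kept rather than dropped.
    """
    n_batches = math.ceil(dataset_size / batch_size)
    last_batch_size = dataset_size - (n_batches - 1) * batch_size
    return n_batches, last_batch_size


print(count_batches(60_000, 64))  # training set -> (938, 32)
print(count_batches(10_000, 64))  # evaluation set -> (157, 16)
```

With a batch size of 64, the last training batch holds only 32 images, which is why `len(train_loader)` is 938 and not 937.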
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: Also load the test set with the same batch_size...\n", + "\n", + "eval_loader = ...\n", + "\n", + "assert len(eval_loader) == 157, \"Your eval loader is not well implemented\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The Model!\n", + "\n", + "And your moment has arrived!\n", + "\n", + "I’m sure you’ve been eagerly anticipating this step, and now you’re ready to build your very first real neural network, complete with a more complex architecture.\n", + "\n", + "A quick tip for working with PyTorch: today’s task is a classification problem, as we’ve defined specific output labels. For this, we’ll be using the **[cross-entropy](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)** loss function. (Remember, yesterday you used the **[binary cross-entropy](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html)** loss with logistic regression, since the output was restricted to just 0 or 1.)\n", + "\n", + "*Don't hesitate to jump back to the end of the torch introduction for help with initialising and training the model!*\n", + "\n", + "If you have trouble creating your model, there is pseudocode of the architecture at the end of this notebook to help you, but try to do it on your own first (with everything you have seen so far)!" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : Define the learning rate\n", + "LEARNING_RATE = ...\n", + "\n", + "\n", + "class MNISTModel(nn.Module):\n", + " def __init__(self):\n", + " super(MNISTModel, self).__init__()\n", + " self.flatten = ...\n", + " self.fc1 = ... \n", + " #TODO : add other layers if you want\n", + " ...\n", + "\n", + " self.loss = ... 
# Loss function cross entropy\n", + " self.optimizer = optim.Adam(self.parameters(), lr=LEARNING_RATE) # Optimizer Adam\n", + " #TODO : add other optimizers if you want\n", + " self.relu = ... # Activation function\n", + " ...\n", + " \n", + " # Device choice \n", + " if torch.cuda.is_available():\n", + " self.device = torch.device('cuda')\n", + " elif torch.backends.mps.is_available():\n", + " self.device = torch.device('mps')\n", + " else:\n", + " self.device = torch.device('cpu')\n", + " print(f\"Device : {self.device}\")\n", + " self.to(self.device)\n", + "\n", + " def forward(self, x):\n", + " #TODO : Define the forward pass\n", + " x = ... \n", + " ...\n", + " return ...\n", + "\n", + "\n", + " def train_model(self, epochs, train_loader):\n", + " self.train() # Training mode\n", + "\n", + " for epoch in range(epochs):\n", + " start_time = time.time() # Start time of the epoch\n", + " running_loss = 0.0\n", + " total_batches = 0\n", + "\n", + " for i, data in enumerate(train_loader): # Enumerate the data, all the dataset\n", + " inputs, labels = data\n", + " inputs, labels = inputs.to(self.device), labels.to(self.device)\n", + " \n", + " #TODO Compute the training part ~ 5 lines\n", + " ...\n", + " ###################################\n", + "\n", + " running_loss += loss.item()\n", + " total_batches += 1 # just help for print \n", + "\n", + " # print every 8 mini-batches\n", + " if (i + 1) % 8 == 0 or (i + 1) == len(train_loader):\n", + " print(f\"\\rEpochs {epoch + 1}/{epochs} | Lot {i + 1}/{len(train_loader)} | Loss : {loss.item():.4f}\", end='')\n", + "\n", + " \n", + " avg_loss = running_loss / len(train_loader)\n", + " epoch_time = time.time() - start_time\n", + "\n", + " print(\"\\n\")\n", + " print(\"-\" * 60)\n", + " print(f\"Epochs {epoch + 1}/{epochs} finish | Average Loss : {avg_loss:.4f} | Time : {epoch_time:.2f} seconds\")\n", + " print(\"-\" * 60)\n", + "\n", + " # change the model_path if you want\n", + " model_path = \"mnist_model.pth\"\n", + " 
print('Training finished, saving model to :', model_path)\n", + " torch.save(self.state_dict(), model_path)\n", + "\n", + "\n", + " def eval_model(self, test_loader):\n", + " self.eval() # Evaluation mode\n", + " correct = 0\n", + " total = 0\n", + " with torch.no_grad():\n", + " for data in test_loader:\n", + " images, labels = data\n", + " images, labels = images.to(self.device), labels.to(self.device)\n", + " outputs = self(images)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + " print(f'Accuracy of the model on {total} images is : {100 * correct / total:.2f}%')\n", + "\n", + " def load_weights(self, model_path):\n", + " self.load_state_dict(torch.load(model_path, weights_only=True, map_location=self.device))\n", + " self.eval()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done! You now need to initialise your model by simply instantiating your Python class.\n", + "\n", + "Keeping this in its own cell means that if you want to restart training from random weights, you just re-run this cell. Otherwise, re-running the training cell continues from the **`last weights`** (and therefore from roughly the **`last loss value`**)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_model = MNISTModel()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO: define number of epochs\n", + "EPOCHS = ...\n", + "\n", + "my_model.train_model(EPOCHS, train_loader)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can now test your model by simply calling the eval function!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_model.eval_model(eval_loader)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you’d like to retrain and check for better results, simply re-run the training cell or initialize a new model to start fresh!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Play with your model !\n", + "\n", + "Now it's time to test your own model! **Please paste your model architecture** (*`__init__`* and *`forward`* methods) into the file [model.py](model.py), and run the following command in the terminal:\n", + "\n", + "```bash\n", + "python app.py\n", + "```\n", + "After this break, you have two options:\n", + "\n", + "- ***2.2 - Cifar*** -> try to implement a really complex architecture called VAE-GAN for another task \n", + "\n", + "- ***3.1 - My torch*** -> try to recreate some torch functions, to really understand how they work (it may help you create a VAE-GAN architecture :))\n", + "\n", + "Choose one! *(you can also do both if you finish early)*\n", + "\n", + "---\n", + "---" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO : Define the learning rate\n", + "LEARNING_RATE = ...\n", + "\n", + "\n", + "class MNISTModel(nn.Module):\n", + " def __init__(self):\n", + " super(MNISTModel, self).__init__()\n", + " self.flatten = ... # Flatten the data\n", + " self.fc1 = ... # Fully connected layer from 28*28 (784) to 128\n", + " self.fc2 = ... # Fully connected layer from 128 to 64\n", + " self.fc3 = ... # Fully connected layer from 64 to 10\n", + "\n", + " self.loss = ... # Loss function cross entropy\n", + " self.optimizer = optim.Adam(self.parameters(), lr=LEARNING_RATE) # Optimizer Adam\n", + " self.relu = ... 
# Activation function\n", + " \n", + " # Device choice \n", + " if torch.cuda.is_available():\n", + " self.device = torch.device('cuda')\n", + " elif torch.backends.mps.is_available():\n", + " self.device = torch.device('mps')\n", + " else:\n", + " self.device = torch.device('cpu')\n", + " print(f\"Device : {self.device}\")\n", + " self.to(self.device)\n", + "\n", + " def forward(self, x):\n", + "\n", + " x = ... # Flatten the data\n", + "\n", + " ... # Compute your self.fc1\n", + " ... # Activation function\n", + "\n", + " ... # Compute your self.fc2\n", + " ... # Activation function\n", + " ... # Compute your self.fc3\n", + "\n", + " return ...\n", + "\n", + "\n", + " def train_model(self, epochs, train_loader):\n", + " self.train() # Training mode\n", + "\n", + " for epoch in range(epochs):\n", + " start_time = time.time() # Start time of the epoch\n", + " running_loss = 0.0\n", + " total_batches = 0\n", + "\n", + " for i, data in enumerate(train_loader): # Enumerate the data, all the dataset\n", + " inputs, labels = data\n", + " inputs, labels = inputs.to(self.device), labels.to(self.device)\n", + " \n", + " # Gradient to zero\n", + " ...\n", + "\n", + " # Forward pass\n", + " outputs = ...\n", + "\n", + " # Loss calculation\n", + " loss = ...\n", + "\n", + " # Backward pass\n", + " ...\n", + "\n", + " # Optimisation step\n", + " ...\n", + "\n", + " running_loss += loss.item()\n", + " total_batches += 1 # just help for print \n", + "\n", + " # print every 8 mini-batches\n", + " if (i + 1) % 8 == 0 or (i + 1) == len(train_loader):\n", + " print(f\"\\rEpochs {epoch + 1}/{epochs} | Lot {i + 1}/{len(train_loader)} | Loss : {loss.item():.4f}\", end='')\n", + "\n", + " \n", + " avg_loss = running_loss / len(train_loader)\n", + " epoch_time = time.time() - start_time\n", + "\n", + " print(\"\\n\")\n", + " print(\"-\" * 60)\n", + " print(f\"Epochs {epoch + 1}/{epochs} finish | Average Loss : {avg_loss:.4f} | Time : {epoch_time:.2f} seconds\")\n", + " print(\"-\" * 
60)\n", + "\n", + " # change the model_path if you want\n", + " model_path = \"mnist_model.pth\"\n", + " print('Training finished, saving model to :', model_path)\n", + " torch.save(self.state_dict(), model_path)\n", + "\n", + "\n", + " def eval_model(self, test_loader):\n", + " self.eval() # Evaluation mode\n", + " correct = 0\n", + " total = 0\n", + " with torch.no_grad():\n", + " for data in test_loader:\n", + " images, labels = data\n", + " images, labels = images.to(self.device), labels.to(self.device)\n", + " outputs = self(images)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + " print(f'Accuracy of the model on {total} images is : {100 * correct / total:.2f}%')\n", + "\n", + " def load_weights(self, model_path):\n", + " self.load_state_dict(torch.load(model_path, weights_only=True, map_location=self.device))\n", + " self.eval()\n", + " \n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day04/2 - Vision-Models/2.1 - Minst/app.py b/AI/Day04/2 - Vision-Models/2.1 - Minst/app.py new file mode 100644 index 0000000..3022fd0 --- /dev/null +++ b/AI/Day04/2 - Vision-Models/2.1 - Minst/app.py @@ -0,0 +1,45 @@ +from flask import Flask, request, jsonify, render_template +import numpy as np +import torch +from torchvision import transforms +from model import load_model + +model = load_model('mnist_model.pth') +app = Flask(__name__) + +transform = transforms.Compose([ + transforms.ToTensor(), + # transforms.Normalize((0.5,), (0.5,)) # optionnal normalization +]) + +@app.route('/') 
+def index(): + return render_template('index.html') + +@app.route('/predict', methods=['POST']) +def predict(): + try: + data = request.json['image'] + image_array = np.array(data, dtype=np.float32).reshape(28, 28) + + image_array = 1 - image_array + + image_tensor = transform(image_array).unsqueeze(0) + + with torch.no_grad(): + output = model(image_tensor) + probabilities = torch.nn.functional.softmax(output, dim=1) + _, predicted = torch.max(probabilities, 1) + predicted_digit = predicted.item() + probabilities_list = probabilities[0].cpu().numpy().tolist() + + return jsonify({'prediction': predicted_digit, 'probabilities': probabilities_list}) + + except Exception as e: + print("Error in the /predict route:", e) + return jsonify({'error': 'An error occurred during prediction'}), 500 + +if __name__ == '__main__': + app.run(debug=True, port=5003) + +# Try to change the port if it's already in use \ No newline at end of file diff --git a/AI/Day04/2 - Vision-Models/2.1 - Minst/model.py b/AI/Day04/2 - Vision-Models/2.1 - Minst/model.py new file mode 100644 index 0000000..b44335b --- /dev/null +++ b/AI/Day04/2 - Vision-Models/2.1 - Minst/model.py @@ -0,0 +1,25 @@ +import os +import torch +import torch.nn as nn +# import torch.nn.functional as F +# import torch.optim as optim + +LEARNING_RATE = 0.001 + +class MNISTModel(nn.Module): + def __init__(self): + super(MNISTModel, self).__init__() + ... + + def forward(self, x): + ... 
+ + +# change the model_path to correspond to your weights file +def load_model(model_path='mnist_model.pth'): + model = MNISTModel() + + assert os.path.isfile(model_path), f"Model file not found: {model_path} please change the model_path parameter in the model.py file" + model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu'), weights_only=True)) + model.eval() + return model \ No newline at end of file diff --git a/AI/Day04/2 - Vision-Models/2.1 - Minst/templates/index.html b/AI/Day04/2 - Vision-Models/2.1 - Minst/templates/index.html new file mode 100644 index 0000000..820f32a --- /dev/null +++ b/AI/Day04/2 - Vision-Models/2.1 - Minst/templates/index.html @@ -0,0 +1,211 @@ + + + + + PoC AI Pool 2025 - Minst + + + +

MNIST digit recognition

+ + +
+ + +
+ +

+
+ + + + \ No newline at end of file diff --git a/AI/Day04/2 - Vision-Models/2.2 - Cifar/Cifar.ipynb b/AI/Day04/2 - Vision-Models/2.2 - Cifar/Cifar.ipynb new file mode 100644 index 0000000..fc42b92 --- /dev/null +++ b/AI/Day04/2 - Vision-Models/2.2 - Cifar/Cifar.ipynb @@ -0,0 +1,325 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ~ PoC AI Pool 2025 ~\n", + "- ## Day 3: Deep Learning\n", + " - ### Module 2: Convolutional Neural Network\n", + "-----------\n", + "\n", + "## Cifar" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's raise the difficulty. You just set up a linear model for classification, but as you know, even if you process the input as blocks of matrices, it is still an image, so convolutional layers should perform well!\n", + "\n", + "For this task, your goal is to create a convolutional model on the CIFAR dataset, a dataset that represents " + ] + }, + { + "attachments": { + "cnn_background.jpeg": { + "image/jpeg": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### What is a Convolutional Neural Network\n", + "\n", + "A convolutional neural network is a **deep learning algorithm** that takes visual data as input.\\\n", + "Its architecture is inspired by the organization of neurons and the visual cortex in the human brain.\n", + "\n", + "![cnn_background.jpeg](attachment:cnn_background.jpeg)\n", + "\n", + "> I advise you to read this [article](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53) to really understand what convolution is before diving into the subject." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "import torch\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import torch.nn as nn\n", + "import torch.optim as optim" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part - 1 Prepare the data \n", + "\n", + "As before, let's load our data, but this time do it on your own!\n", + "Your goal is to load the CIFAR-10 dataset. \n", + "\n", + "You also need to specify how we want the data: you can convert it to a [tensor](https://pytorch.org/vision/main/generated/torchvision.transforms.ToTensor.html) and normalize it if you want. You can check the transforms documentation [here](https://pytorch.org/vision/0.9/transforms.html).\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO Implement the dataset\n", + "transform = ...\n", + "\n", + "train_set = ...\n", + "eval_set = ...\n", + "\n", + "print(f\"Len of train_set: {len(train_set)}\")\n", + "print(f\"Len of eval_set: {len(eval_set)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And now let's visualise our dataset: what's inside? 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "NUMBER_OF_ELEMENTS = 5\n", + "\n", + "def imshow(img):\n", + " img = img / 2 + 0.5\n", + " npimg = img.numpy()\n", + " plt.imshow(np.transpose(npimg, (1, 2, 0)))\n", + " plt.show()\n", + "\n", + "train_loader_vis = torch.utils.data.DataLoader(train_set, batch_size=NUMBER_OF_ELEMENTS, shuffle=True, num_workers=2)\n", + "\n", + "dataiter = iter(train_loader_vis)\n", + "images, labels = next(dataiter)\n", + "\n", + "imshow(torchvision.utils.make_grid(images))\n", + "print(' '.join('%5s' % labels[j] for j in range(NUMBER_OF_ELEMENTS)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay so like you see this dataset represent some picture of random object and the goal of your model is to predict what is it on your image ! \n", + "\n", + "Firstly print the info of the dataset, like the shape of an image and all the labels present in the dataset ! " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO Print all the labels present in the dataset\n", + "labels = ...\n", + "print(\"Labels present in the dataset:\", labels)\n", + "\n", + "shapes_dataset= ...\n", + "print(\"Shape of the inputes:\", shapes_dataset.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO Implement the dataloader\n", + "BATCH_SIZE =...\n", + "\n", + "train_loader = ...\n", + "\n", + "eval_loader = ..." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Batch_Size \n", + "\n", + "Implement the batch size on your own!\n", + "\n", + "Try to do the same as you did for the MNIST dataset and choose whatever you want for the batch_size, thinking about performance and speed" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO Implement the model\n", + "LEARNING_RATE = 0.001\n", + "\n", + "class CIFARModel(nn.Module):\n", + " def __init__(self):\n", + " super(CIFARModel, self).__init__()\n", + " #TODO Implement the model\n", + " self.conv1 = ...\n", + " #TODO Add other layers, what you want\n", + " ...\n", + " \n", + "\n", + " if torch.cuda.is_available():\n", + " self.device = torch.device('cuda')\n", + " elif torch.backends.mps.is_available():\n", + " self.device = torch.device('mps')\n", + " else:\n", + " self.device = torch.device('cpu')\n", + " print(f\"Device : {self.device}\")\n", + " self.to(self.device)\n", + "\n", + " def forward(self, x):\n", + " #TODO Implement the forward pass\n", + " ...\n", + "\n", + "\n", + " def train_model(self, epochs, train_loader):\n", + " self.train()\n", + "\n", + " for epoch in range(epochs):\n", + " start_time = time.time() # Start time of the epoch\n", + " running_loss = 0.0\n", + " total_batches = 0\n", + "\n", + " for i, data in enumerate(train_loader): # Enumerate the data, all the dataset\n", + " inputs, labels = data\n", + " inputs, labels = inputs.to(self.device), labels.to(self.device)\n", + " \n", + " #TODO Implement the training loop\n", + " ...\n", + "\n", + " ##############################################\n", + "\n", + " running_loss += loss.item()\n", + " total_batches += 1 # just help for print \n", + "\n", + " # print every 8 mini-batches\n", + " if (i + 1) % 8 == 0 or (i + 1) == len(train_loader):\n", + " print(f\"\\rEpochs {epoch + 1}/{epochs} | Lot {i + 1}/{len(train_loader)} | Loss : {loss.item():.4f}\", end='')\n", + "\n", + 
" \n", + " 
avg_loss = running_loss / len(train_loader)\n", + " epoch_time = time.time() - start_time\n", + "\n", + " print(\"\\n\")\n", + " print(\"-\" * 60)\n", + " print(f\"Epochs {epoch + 1}/{epochs} finished | Average Loss : {avg_loss:.4f} | Time : {epoch_time:.2f} seconds\")\n", + " print(\"-\" * 60)\n", + " \n", + "\n", + " # change the model_path if you want\n", + " model_path = \"cifar_model.pth\"\n", + " print('Training finished, saving model to :', model_path)\n", + " torch.save(self.state_dict(), model_path)\n", + "\n", + "\n", + " def eval_model(self, test_loader):\n", + " self.eval() # Evaluation mode\n", + " correct = 0\n", + " total = 0\n", + " with torch.no_grad():\n", + " for data in test_loader:\n", + " images, labels = data\n", + " images, labels = images.to(self.device), labels.to(self.device)\n", + " outputs = self(images)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + " print(f'Accuracy of the model on {total} images is : {100 * correct / total:.2f}%')\n", + "\n", + " def load_weights(self, model_path):\n", + " self.load_state_dict(torch.load(model_path, weights_only=True, map_location=self.device))\n", + " self.eval()\n", + " \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_model = CIFARModel()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#TODO Implement the number of epochs\n", + "EPOCHS = ...\n", + "\n", + "my_model.train_model(EPOCHS, train_loader)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_model.eval_model(eval_loader)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "Bravo! Amazing, you made it here. You can now try to make your model better, or create your own torch if you haven't already :)!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day04/2 - Vision-Models/2.2 - Cifar/Images/cnn_background.jpeg b/AI/Day04/2 - Vision-Models/2.2 - Cifar/Images/cnn_background.jpeg new file mode 100644 index 0000000..4706cdf Binary files /dev/null and b/AI/Day04/2 - Vision-Models/2.2 - Cifar/Images/cnn_background.jpeg differ diff --git a/AI/Day04/3 - MyTorch/MynnTorch.ipynb b/AI/Day04/3 - MyTorch/MynnTorch.ipynb new file mode 100644 index 0000000..a5a448a --- /dev/null +++ b/AI/Day04/3 - MyTorch/MynnTorch.ipynb @@ -0,0 +1,452 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d7d966e65a955a08", + "metadata": { + "collapsed": false + }, + "source": [ + "# ~ PoC AI Pool 2025 ~\n", + "- ## Day 3: Deep Learning\n", + " - ### Module 3: My_torch\n", + "-----------\n", + "\n", + "## My_torch\n", + "\n", + "### Utils-Methode\n", + "\n", + "After having a little introduction to the PyTorch library, we will now implement some of the methods in the library, with the objective of understanding how they work, how they are implemented and finally getting a better understanding of the library.\n", + "\n", + "### Let's Start\n", + "\n", + "First things first, we will implement easy methods such as ReLU, LeakyReLU, and Sigmoid.\n", + "\n", + "Then we will implement the class MyLinear, which is a simple linear (technically, affine) transformation, and the class BatchNorm2d, which is a simple batch normalization.\n", + "\n", + "Finally, we will implement the class MyConv2d, which is a simple 2D convolution, and the class 
ConvTranspose2d, which is a simple 2D transposed convolution." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "eb2624afca073dab", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "import torch as t\n", + "import torch.nn as nn\n", + "from typing import Tuple, Union, List\n", + "import torchvision.transforms as transforms\n", + "from PIL import Image\n", + "\n", + "\n", + "IntOrPair = Union[int, Tuple[int, int]]\n", + "Pair = Tuple[int, int]\n", + "\n", + "t1 = t.tensor([-1, -2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=t.float32)\n", + "t2 = t.tensor([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], dtype=t.float32)\n", + "\n", + "def force_pair(v: IntOrPair) -> Pair:\n", + " if isinstance(v, tuple):\n", + " if len(v) != 2:\n", + " raise ValueError(v)\n", + " return int(v[0]), int(v[1])\n", + " elif isinstance(v, int):\n", + " return (v, v)\n", + " raise ValueError(v)\n", + "image = Image.new('RGB', (256, 256), (255, 255, 255))\n", + "transform = transforms.ToTensor()\n", + "tensor_image = transform(image)\n", + "tensor_image = tensor_image.unsqueeze(0)\n", + "weights = t.tensor([[[[1, 0, -1], [1, 0, -1], [1, 0, -1]], [[1, 0, -1], [1, 0, -1], [1, 0, -1]], [[1, 0, -1], [1, 0, -1], [1, 0, -1]]]], dtype=t.float32)" + ] + }, + { + "attachments": { + "ReLU.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "118f34cc612f14e7", + "metadata": { + "collapsed": false + }, + "source": [ + "## ReLU\n", + "The fonction ReLu looks like this :\n", + "\n", + "![ReLU.png](attachment:ReLU.png)\n", + "\n", + "The principe of ReLU is to return the maximum between 0 and the input value. Why ? 
Because in AI, we want three things: a non-linear function, protection against the vanishing gradient problem (when the gradient is too small, the network barely learns anything), and cheap computation.\n", + "\n", + "For more information about ReLU and its implementation, you can check this [link](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "cef06308ce6aa517", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "#TODO: implement a ReLU function\n", + "def ReLU(x : t.Tensor) -> t.Tensor:\n", + " ...\n", + "assert ReLU(t1).equal(t.tensor([0, 0, 3, 4, 5, 6, 7, 8, 9, 10], dtype=t.float32)), \"Error in ReLU\"" + ] + }, + { + "attachments": { + "LeakyReLU.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "4573ff484918ae02", + "metadata": { + "collapsed": false + }, + "source": [ + "## LeakyReLU\n", + "The function LeakyReLU looks like this :\n", + "\n", + "![LeakyReLU.png](attachment:LeakyReLU.png)\n", + "\n", + "The principle of LeakyReLU is the same as ReLU for positive inputs, but negative inputs are multiplied by a small slope instead of being set to 0. Why ? 
Because here, we want to make sure that neurons won't die (a dead ReLU neuron always outputs 0, so its gradient is 0 and it stops learning): we keep the benefits of the ReLU function while still learning a bit from the negative values.\n", + "\n", + "For more information about LeakyReLU and its implementation, you can check this [link](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e80496989234e575", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# TODO: implement a LeakyReLU function\n", + "def MyLeakyReLU(x : t.Tensor, negative_slope : float = 0.01) -> t.Tensor:\n", + " ...\n", + "assert MyLeakyReLU(t1).equal(t.tensor([-0.01, -0.02, 3, 4, 5, 6, 7, 8, 9, 10])), \"Error in LeakyReLU\"" + ] + }, + { + "attachments": { + "Sigmoid.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "id": "b10fd97b4c7192ea", + "metadata": { + "collapsed": false + }, + "source": [ + "## Sigmoid\n", + "The function Sigmoid looks like this (in case you forgot):\n", + "\n", + "![Sigmoid.png](attachment:Sigmoid.png)\n", + "\n", + "The principle of Sigmoid is to squash the input into a value between 0 and 1, which can be read as a probability and keeps the output of the network bounded.\n", + "\n", + "For more information about Sigmoid and its implementation, you can check this [link](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5b779dcb2912ac22", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "#TODO: implement a Sigmoid function\n", + "def MySigmoid(x : t.Tensor) -> t.Tensor:\n", + " ...\n", + "assert MySigmoid(t1).round(decimals=4).equal(t.tensor([0.2689, 0.1192, 0.9526, 0.9820, 0.9933, 0.9975, 0.9991, 0.9997, 0.9999, 1.0000])), \"Error in Sigmoid\"" + ] + }, + { + "attachments": { + "Tanh.png": { + "image/png": 
"" + } + }, + "cell_type": "markdown", + "id": "faa40a8c953b5aea", + "metadata": { + "collapsed": false + }, + "source": [ + "## Tanh\n", + "The fonction Tanh looks like this :\n", + "\n", + "![Tanh.png](attachment:Tanh.png)\n", + "\n", + "The principe of Tanh is to return a value between -1 and 1 it's has the same propriety as the Sigmoid function but with a range between -1 and 1.\n", + "\n", + "To have more information about Tanh and it's implementation, you can check this [link](https://pytorch.org/docs/stable/generated/torch.nn.Tanh.html)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "ac87645ffe1392d9", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "#TODO: implement a Tanh fonction\n", + "def Tanh(x : t.Tensor) -> t.Tensor:\n", + " ...\n", + "assert Tanh(t1).round(decimals=4).equal(t.tensor([-0.7616, -0.9640, 0.9951, 0.9993, 0.9999, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])), \"Error in Tanh\"" + ] + }, + { + "cell_type": "markdown", + "id": "5994758437f4d5df", + "metadata": { + "collapsed": false + }, + "source": [ + "## MyLinear\n", + "The class MyLinear is a simple linear (technically, affine) transformation.\n", + "\n", + "It's a useful tool to modulate the simple relation between input and output characteristic.\n", + "\n", + "Here you will need a bit more help :\n", + "the in_features are the dimensions of the entry vector\n", + "\n", + "the out_features are the dimensions of the return vector.\n", + "\n", + "the bias is the b that we add in the form of y = w . x + b where w is the weight and x is the input.\n", + "\n", + "In that exercise you will just have to implement the forward method. and you have to use the einsum method from PyTorch. and to be sure that the bias is added only if it's not None." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "9cd4167e85eda019", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "class MyLinear(nn.Module):\n", + " def __init__(self, in_features: int, out_features: int, bias=True):\n", + " super().__init__()\n", + " # set all the parameters\n", + " \n", + " ...\n", + " \n", + " \n", + " bound = in_features**-0.5\n", + " self.weight = nn.parameter.Parameter(t.empty(..., ...).uniform_(-bound, bound))\n", + " if bias:\n", + " self.bias = nn.parameter.Parameter(t.empty(...).uniform_(-bound, bound))\n", + " else:\n", + " self.bias = None\n", + "\n", + " # The forward method is the same as the torch.nn.Linear\n", + " def forward(self, x: t.Tensor) -> t.Tensor:\n", + " pass\n", + "\n", + " def extra_repr(self) -> str:\n", + " return f\"in_features={self.in_features}, out_features={self.out_features}, bias={self.bias is not None}\"\n", + "\n", + "linear = MyLinear(3, 3)\n", + "tensor = t.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=t.float32)\n", + "t.manual_seed(0)\n", + "assert linear(tensor).round(decimals=4).equal(t.tensor([[-0.6577, -0.5797, 0.6369], [-1.1670, -2.0570, 1.8222], [-1.6764, -3.5343, 3.0075]])), \"Error in MyLinear\"" + ] + }, + { + "attachments": { + "padding_and_stride.gif": { + "image/gif": "" + } + }, + "cell_type": "markdown", + "id": "ec69a0c2502b4800", + "metadata": { + "collapsed": false + }, + "source": [ + "## Conv2d\n", + "The class Conv2d is a convolution in 2 dimensions. 
It's an essential tool for an AI model to be able to process images; you should already know it from the CNN model that you made this morning ^^.\n", + "\n", + "Here we will do 2 things: first we will work on myConv2d, a function that performs the convolution, and then we will create the class MyConv2d, which wraps that function.\n", + "\n", + "The padding and the stride are a bit tricky. Basically, the padding is the number of pixels that we add around the input image before computing, and the stride is the number of pixels the kernel moves between two consecutive positions.\n", + "\n", + "![padding_and_stride.gif](attachment:padding_and_stride.gif)\n", + "\n", + "Here the padding is in grey, the kernel is in green, and the image is in blue. Without padding, the output is smaller than the input, so we add pixels around the image to make the output the same size as the input.\n", + " \n", + "For more info, check this [link](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24eaa1e65dba41e2", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "def myConv2d(x : t.Tensor, weights : t.Tensor, stride : IntOrPair, padding : IntOrPair) -> t.Tensor:\n", + " # Get the stride and the padding as pairs (see force_pair)\n", + " padding_height, padding_width = ...\n", + "\n", + " # Get the dimensions of the input x and the weights\n", + " ...\n", + " # create the output height and output width using the formula : output = (input + 2 * padding - kernel) // stride + 1 \n", + " ...\n", + " # create a zero-filled tensor with the size of the padded input, ask for some help if you need\n", + " out = ...\n", + " # copy x into the center of that tensor (the zeros around it are the padding); this is the way that I recommend, it's not the only one\n", + " out[... , # don't replace the ...\n", + " padding_height : padding_height + ... , # replace the ... 
by the correct varriable\n", + " padding_width : padding_width + ... ] = x # replace the ... by the correct varriable\n", + " \n", + " # create the conv_size and the conv_stride\n", + " conv_size = ...\n", + " # hint \n", + " batch_stride, in_chanel_stride, image_height_stride, image_width_stride = out.stride()\n", + " conv_stride = ...\n", + " strided_x = t.as_strided(out, size=conv_size, stride=conv_stride)\n", + "\n", + " # and then you have to return the result of the convolution through the einsum method if you want an hint, continue to read\n", + " # the strided_x is a tensor with the shape (batch, out_height, out_width, in_channel, kernel_height, kernel_width)\n", + " # the weights is a tensor with the shape (out_channel, in_channel, kernel_height, kernel_width)\n", + " # the result is a tensor with the shape (batch, out_channel, out_height, out_width)\n", + " return ...\n", + "\n", + "assert myConv2d(tensor_image, weights, 1, 1).shape == (1, 1, 256, 256), \"Error in myConv2d\"" + ] + }, + { + "cell_type": "markdown", + "id": "75962a1136873bea", + "metadata": { + "collapsed": false + }, + "source": [ + "## extra_repr\n", + "\n", + "The extra_repr method is a method that is used to print the parameters of the class, it's a useful tool to have a better understanding of the class and to debug it.\n", + "\n", + "Ignore it if you don't understand it is not relevant for the exercise, but if you want to understand it, you can check this [link](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.extra_repr)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e09e17e9a65a10d", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "def extra_repr(module, arg_names: List[str], kwarg_names: List[str]) -> str:\n", + " reprs = [repr(getattr(module, arg_name)) for arg_name in arg_names] + [\n", + " f\"{k}={getattr(module, k)}\" for k in kwarg_names\n", + " ]\n", + " return \", \".join(reprs)" + ] + }, + 
{ + "cell_type": "markdown", + "id": "fff519a772d0256b", + "metadata": { + "collapsed": false + }, + "source": [ + "## MyConv2d\n", + "\n", + "Here you will have to implement the class MyConv2d, which is a simple 2D convolution, but as a class." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d81702d74dfaff66", + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "class MyConv2d(t.nn.Module):\n", + " def __init__(\n", + " self,\n", + " in_channels: int,\n", + " out_channels: int,\n", + " kernel_size: IntOrPair,\n", + " stride: IntOrPair = 1,\n", + " padding: IntOrPair = 0,\n", + " ):\n", + " \n", + " super().__init__()\n", + " # set all the parameters\n", + "\n", + " in_features = ...\n", + " bound = in_features**-0.5\n", + " self.weight = nn.parameter.Parameter(\n", + " t.empty((out_channels, in_channels, *self.kernel_size)).uniform_(-bound, bound)\n", + " )\n", + "\n", + " # The forward method is the same as the myConv2d function\n", + " def forward(self, x: t.Tensor) -> t.Tensor:\n", + " pass\n", + " def extra_repr(self) -> str:\n", + " return extra_repr(self, [\"in_channels\", \"out_channels\"], [\"kernel_size\", \"stride\"])\n", + " \n", + "assert MyConv2d(3, 3, 3, 1, 1)(tensor_image).shape == nn.Conv2d(3, 3, 3, 1, 1)(tensor_image).shape, \"Error in MyConv2d\"" + ] + }, + { + "cell_type": "markdown", + "id": "4c1dcc45", + "metadata": {}, + "source": [ + "---\n", + "Well done, you are now a master of torch ! 
gg wp\n", + "\n", + "You can know try to implement a model with your version of torch :) " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/AI/Day04/3 - MyTorch/assets/LeakyReLU.png b/AI/Day04/3 - MyTorch/assets/LeakyReLU.png new file mode 100644 index 0000000..210a250 Binary files /dev/null and b/AI/Day04/3 - MyTorch/assets/LeakyReLU.png differ diff --git a/AI/Day04/3 - MyTorch/assets/ReLU.png b/AI/Day04/3 - MyTorch/assets/ReLU.png new file mode 100644 index 0000000..d771ede Binary files /dev/null and b/AI/Day04/3 - MyTorch/assets/ReLU.png differ diff --git a/AI/Day04/3 - MyTorch/assets/Sigmoid.png b/AI/Day04/3 - MyTorch/assets/Sigmoid.png new file mode 100644 index 0000000..2f1b112 Binary files /dev/null and b/AI/Day04/3 - MyTorch/assets/Sigmoid.png differ diff --git a/AI/Day04/3 - MyTorch/assets/Tanh.png b/AI/Day04/3 - MyTorch/assets/Tanh.png new file mode 100644 index 0000000..3832658 Binary files /dev/null and b/AI/Day04/3 - MyTorch/assets/Tanh.png differ diff --git a/AI/Day04/3 - MyTorch/assets/padding_and_stride.gif b/AI/Day04/3 - MyTorch/assets/padding_and_stride.gif new file mode 100644 index 0000000..2ed4ab7 Binary files /dev/null and b/AI/Day04/3 - MyTorch/assets/padding_and_stride.gif differ diff --git a/AI/Day04/README.md b/AI/Day04/README.md new file mode 100644 index 0000000..89455ba --- /dev/null +++ b/AI/Day04/README.md @@ -0,0 +1,27 @@ +# ~ PoC AI Pool 2025 ~ + +- ## Day 4: Neural Networks + - ### Module 1: Torch + - **Notebook:** [`introduction_to_torch.ipynb`](<1 - Torch/Introduction_Torch.ipynb>) + - ### Module 2: Vision models + - **Notebook 2.1 :** 
[`mnist.ipynb`](<2 - Vision-Models/2.1 -Minst.ipynb>) + - **Notebook 2.2 :** [`cifar.ipynb`](<2 - Vision-Models/2.1 - Cifar.ipynb>) + - ### Module 3: My Torch + - **Notebook:** [`my_torch.ipynb`](<3 - MyTorch/MynnTorch.ipynb>) + +--- + +**Already the fourth day !!** +On today's menu: you will dive into the wonderful world of torch, first by learning to use it and then by building more advanced neural networks ! + +It's up to you to choose the order of the last 2 notebooks: doing `my_torch` before `cifar` is also a good option ! + +> Here's a list of resources that we believe can be useful to follow along (and that we've ourselves used to learn these topics before being able to write the subjects): + +## Resources + +[Torch in 100 seconds](https://www.youtube.com/watch?v=ORMx45xqWkA) + +[Neural Networks Mnist video](https://www.youtube.com/watch?v=aircAruvnKk&t=34s) (*3blue1brown*) + +[Convolution explained](https://www.youtube.com/watch?v=KuXjwB4LzSA) (*3blue1brown*) \ No newline at end of file diff --git a/AI/Day05/.gitignore b/AI/Day05/.gitignore new file mode 100644 index 0000000..4de10de --- /dev/null +++ b/AI/Day05/.gitignore @@ -0,0 +1,2 @@ +*/__pycache__/* +*/runs/* diff --git a/AI/Day05/1.Introduction/Q_Learning.ipynb b/AI/Day05/1.Introduction/Q_Learning.ipynb new file mode 100644 index 0000000..c68cabe --- /dev/null +++ b/AI/Day05/1.Introduction/Q_Learning.ipynb @@ -0,0 +1,509 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Value-based method: Q Learning\n", + "\n", + "In the first notebook of the day, we will learn about one of the most popular RL algorithms: Q Learning !\n", + "\n", + "**Key facts**:\n", + "- [It was first defined in 1989 by Christopher J.C.H. 
WATKINS](https://link.springer.com/content/pdf/10.1007/BF00992698.pdf?pdf=button)\n", + "- It uses a **temporal difference (TD)** approach\n", + "- It is an **Action Value** function\n", + "- It is **off-policy**" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### a. Temporal Difference (TD)\n", + "\n", + "There are two different learning strategies on how to train a value or policy function :\\\n", + "One of them is 'Monte Carlo', in which the agent experiences an entire episode of the environment before learning (updating its value function). This means that it stores the state, action and reward inside a memory which it unwraps at the end of each episode.\n", + "> Monte Carlo will be explained in more detail inside the `REINFORCE.ipynb` notebook.\n", + "\n", + "Q-Learning uses 'Temporal Difference', which means it learns at each time step of the environment. In other words, the agent updates its value function using the current state, action, reward and resulting state.\n", + "\n", + "![Temporal difference](./assets/fig9.svg)\n", + "> Formula for temporal difference\n", + "\n", + "Don't let the mathematical expressions scare you, all you need to understand is that we update the state's value at each time step by adding the difference between the target and the old value, multiplied by a learning rate, to our old value.\n", + "\n", + "If it helps, here is a version of this formula in pseudo code:\n", + "\n", + "```py\n", + "LR = 0.05\n", + "GAMMA = 0.99\n", + "\n", + "state_values = [...] 
# the list of values for each of our states\n", + "action = agent_choice(state) # choosing an action based on the state\n", + "new_state, reward = environment_step(action) # retrieving a new state and a reward from the environment\n", + "\n", + "target = reward + GAMMA * state_values[new_state] # computing the target\n", + "state_values[state] = state_values[state] + LR * (target - state_values[state]) # updating the state value \n", + "```" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### b. Action value\n", + "\n", + "Q-Learning is a value-based function and there are also two different types of those as well !\n", + "> We promise, these ones are easy to differentiate !\n", + "\n", + "- State-value functions, where each state has a different value\n", + "- Action-value functions, where each (state,action) pair has a different value\n", + "\n", + "![Action and state values](./assets/fig10.svg)\n", + "\n", + "Notice how there are (state,action) pairs where the value is 0. That is because our agent never performed the actions at those states." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Perhaps it's time to take a short break from the theory and get into implementation, shall we ?\n", + "> You'll see, it'll be much easier to understand if you take it all one step at a time !\n", + "\n", + "Let's begin by importing some libraries and defining some constants..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import the necessary libraries\n", + "import numpy as np\n", + "import random\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib import animation\n", + "from seaborn import heatmap\n", + "from scipy.ndimage import gaussian_filter1d\n", + "\n", + "from IPython.display import Image\n", + "from moviepy.editor import ImageSequenceClip\n", + "\n", + "# Import the Environment class from the envi module\n", + "from envi import Environment\n", + "\n", + "# Define the actions that the agent can take\n", + "ACTIONS = {'UP': 0, 'LEFT': 1, 'DOWN': 2, 'RIGHT': 3}\n", + "\n", + "# Define the size of the gridworld\n", + "MAP_SIZE = 10\n", + "\n", + "# Define the number of episodes to train for\n", + "EPISODES = 10_000\n", + "\n", + "# Define the learning rate\n", + "LR = 5e-3\n", + "\n", + "# Define the discount factor\n", + "GAMMA = 0.99" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, let's create an `Agent` class which will contain our action values and a method to update them !\n", + "\n", + "In Q-Learning, the table which contains our action values is called the Q-Table ! 
(catchy, right ?)\n", + "\n", + "In order to define this Q-Table, we simply create an array of shape `(number_of_states, number_of_actions)` and initialize all of its values to 0.\n", + "You can use `numpy`'s `zeros()` function to achieve this by defining the `self.q_table` property inside the `__init__()` method of our Agent.\n", + "\n", + "Our environment is a grid world, so you can consider each square a separate state, meaning that if our grid is of size 2 * 2, the number of states is 4.\n", + "\n", + "\n", + "| Action | State 1 | State 2 | State 3 | State 4 |\n", + "| ---------- | ------- | ------- | ------- | ------- |\n", + "| **UP** | 0.0 | 0.0 | 0.0 | 0.0 |\n", + "| **LEFT** | 0.0 | 0.0 | 0.0 | 0.0 |\n", + "| **DOWN** | 0.0 | 0.0 | 0.0 | 0.0 |\n", + "| **RIGHT** | 0.0 | 0.0 | 0.0 | 0.0 |\n", + "> This is what a fresh Q-Table of a 2 * 2 gridworld environment with 4 actions should look like. It is shown transposed here: in the array itself, each row is a state and each column is an action." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class Agent:\n", + " \"\"\"\n", + " This class defines our Agent which will interact with the environment and update its Q Table\n", + " \"\"\"\n", + " \n", + " def __init__(self):\n", + " # We initialize a value called `epsilon` (we will soon learn more about it)\n", + " self.epsilon = 1.0\n", + "\n", + " # Initialize the Q Table for the agent with zeros\n", + " self.q_table = None\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay, that was easy. Let's implement the Q-Function now !\\\n", + "Remember the TD formula ? 
Well, the Q-Function is the same as that one, except we are updating the action-value, not the state-value:\n", + "\n", + "Knowing this, let's update our temporal difference formula using Action values instead:\n", + "\n", + "![Q-Learning formula](./assets/fig11.svg)\n", + "\n", + "Now, define a new method which implements this formula in python code !\n", + "\n", + "> - The `update_q_table()` method doesn't need to return anything, you must update the q_table directly inside the method.\n", + "> - Refer to the `Agent` class above if you don't remember the contents of `self` and how they can be useful." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def update_q_table(self, new_state, state, action, reward):\n", + " \"\"\"\n", + " This method updates the Q Table\n", + " \"\"\"\n", + " pass\n", + "\n", + "Agent.update_q_table = update_q_table" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## c. Off-policy\n", + "\n", + "There is one more useful concept you need to understand before we continue !\n", + "\n", + "Because Q-Learning is off-policy, we do not know which action to choose for any given state.\\\n", + "All we have is our Q-Table, which contains valuable information which our agent will need in order to form an opinion.\n", + "\n", + "But how should the agent form an opinion ?\n", + "\n", + "An easy answer is to simply pick the action with the highest value.\\\n", + "This is called a **greedy policy**.\n", + "\n", + "There is one flaw with this policy, though. While it is a good idea to pick the action that we believe is optimal, we do not have access to an optimal policy. 
Therefore, while a greedy policy works once our agent is well trained (because that means our estimated policy will be close to the optimal policy), it will not work quite as well if the agent is only just discovering the environment.\n", + "\n", + "Imagine this scenario:\n", + "\n", + "![Greedy policy flaw](./assets/fig12.svg)\n", + "\n", + "Our agent has two choices: either left or right ! \n", + "\n", + "If he chooses left, he receives +10 reward !!!\\\n", + "On the other hand, if he chooses right, he receives a measly +1 reward...\n", + "\n", + "Alas, robot boy goes to the right on his first attempt, while both action-values are 0.\\\n", + "Because of this, he believes going right is the best choice, despite never having attempted to go left !\n", + "\n", + "It is a bit like deciding you hate sitcoms because you've only ever seen 'Big Bang Theory' and you hated it.\\\n", + "But because of your **greedy policy**, you miss out on a show like 'Seinfeld' ! What a bummer !\n", + "\n", + "Thankfully, there's another policy you can try: **Epsilon-Greedy policy** !\n", + "\n", + "With the epsilon greedy policy, you start by picking actions at random before gradually choosing the actions you value !\n", + "\n", + "```py\n", + "epsilon = 1.0\n", + "\n", + "for i in range(1000):\n", + " use_greedy_policy = random.random() > epsilon # use greedy policy with a probability of 1 - epsilon\n", + " if use_greedy_policy: # if epsilon is low, this will happen more often\n", + " action = greedy_action(state)\n", + " else: # if epsilon is high, this will happen more often instead\n", + " action = random_action()\n", + "\n", + " epsilon = max(epsilon * 0.995, 0.05) # decaying epsilon so that we gain confidence in our Q-Table (we tend to keep a small probability of random policy during training so we don't go below 0.05)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def epsilon_greedy_policy(self, state):\n", + 
" \"\"\"\n", + " This method is an implementation of the epsilon greedy policy\n", + " \"\"\"\n", + " pass\n", + "\n", + "Agent.epsilon_greedy_policy = epsilon_greedy_policy" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q-Learning Algorithm\n", + "\n", + "Well, with that out of the way, we've defined a nice Agent class. It will come in handy for the next part, which is training the agent to solve our gridworld environment !\n", + "\n", + "Let's start by initializing our Agent and Environment instances, as well as some lists we will use to store our rewards throughout the training for plotting purposes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create an environment and an agent\n", + "from collections import deque\n", + "\n", + "env = Environment(MAP_SIZE, ACTIONS)\n", + "agent = Agent()\n", + "\n", + "# Initialize empty lists for rewards and losses\n", + "recent_rewards = deque(maxlen=1_000)\n", + "train_rewards = []" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, your task is to implement the following algorithm (figure taken from [Sutton and Bartol's 'Reinforcement Learning: an Introduction'](http://incompleteideas.net/book/RLbook2020.pdf))\n", + "\n", + "![Q-Learning algorithm](./assets/fig8.svg)\n", + "\n", + "You've already implemented most of the algorithm inside the `Agent` class. 
Try to understand which lines correspond to which methods.\n", + "\n", + "The initialization is covered by `agent = Agent()` which we declared above.\n", + "The epsilon greedy policy is a method inside `Agent` and so is the penultimate line of the inner loop: updating the q-table !\n", + "\n", + "The Environment class has two methods you should know about:\n", + "\n", + "- `env.reset()` resets the environment and returns a state\n", + "- `env.step()` updates the environment by taking an action as argument and returns a tuple containing `new_state, reward, done`. The last of these is a boolean which tells us whether the episode is terminated or not.\n", + "\n", + "With this info, see if you can fill in the blanks and build your Q-Learning algorithm :" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Iterate over the number of episodes\n", + "for episode in range(EPISODES):\n", + " # Reset the environment to get the initial state\n", + " state = env.reset()\n", + "\n", + " # Initialize an empty list for the rewards of this episode\n", + " episode_reward = []\n", + "\n", + " # Iterate over the time steps in the episode\n", + " for i in range(1000):\n", + " action = agent.epsilon_greedy_policy(state)\n", + "\n", + " # Interact with the environment to get the new state, reward, and done flag \n", + "\n", + " # Update the Q-Table with this transition and store the reward in episode_reward\n", + "\n", + " # Set the new state as the current state\n", + "\n", + " # If the episode is done, break out of the loop\n", + " \n", + " # Log the total reward for this episode\n", + " train_rewards.append(np.sum(episode_reward))\n", + " recent_rewards.append(train_rewards[-1])\n", + "\n", + " # Print a summary line every 1,000 episodes\n", + " if episode % 1_000 == 0:\n", + " print(f\"Episode {episode:>6}: \\tR:{np.mean(recent_rewards):>6.3f}\\t Epsilon:{agent.epsilon:>6.3f}\\t State:{state:>6}\")\n", + "\n", + "# Reset the environment to get the initial state\n" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots()\n", + "\n", + "# plotting rewards\n", + "ax.plot(gaussian_filter1d(train_rewards, sigma=10))\n", + "ax.set_title('Rewards')\n", + "# show figure\n", + "fig.show()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If all went well, your rewards should increase before reaching a treshold.\n", + "> This graph depends on the parameters you set at the beginning of the notebook.\\\n", + "> You can try changing the MAP_SIZE for example for very different results.\\\n", + "> It is advised to stay below 30 for the MAP_SIZE, otherwise your agent might find that it is a better idea to kill itself rather than reach its goal ! " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check out our Q-table and observe our estimated policy as well as the values for each action-state pair !" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Extract the best known action for each state from the Q-table\n", + "# (-1 marks states whose mean action value equals the first one, e.g. states that were never updated)\n", + "best_actions = [np.argmax(x) if np.mean(x) != x[0] else -1 for x in agent.q_table]\n", + "\n", + "# Initialize a matrix for the policy\n", + "policy = np.zeros((MAP_SIZE ** 2, len(ACTIONS)))\n", + "\n", + "# Fill in the policy matrix\n", + "for y in range(MAP_SIZE ** 2):\n", + " for x in range(len(ACTIONS)):\n", + " if x == best_actions[y]:\n", + " policy[y][x] = 1\n", + "\n", + "# Create a figure with two subplots\n", + "fig, ax = plt.subplots(1,2)\n", + "\n", + "# Plot the policy matrix as a heatmap\n", + "heatmap(policy, ax=ax[0], xticklabels=ACTIONS, cbar=False)\n", + "\n", + "# Plot the Q-table as a heatmap\n", + "heatmap(agent.q_table, ax=ax[1], xticklabels=ACTIONS, annot=MAP_SIZE<6)\n", + "\n", + "# Show the figure\n", + "fig.show()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This code extracts the best known action for each state from the Q-table and uses them to create a matrix representing the policy. It then plots the policy matrix and the Q-table as heatmaps. The policy matrix shows which action is considered best in each state, while the Q-table shows the values of the actions in each state." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, let's record a video of our agent solving the grid world !" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "frames = []\n", + "\n", + "# Iterate over the time steps in the episode\n", + "for _ in range(1000):\n", + " # Add the current state of the environment to the list of frames\n", + " frames.append(np.array(env.graphic()))\n", + "\n", + " # Choose an action for the current state with the agent's epsilon-greedy policy\n", + " action = agent.epsilon_greedy_policy(state)\n", + "\n", + " # Interact with the environment to get the new state, reward, and done flag\n", + " new_state, reward, done = env.step(action)\n", + "\n", + " # Set the new state as the current state\n", + " state = new_state\n", + "\n", + " # If the episode is done, store the last frame, reset the environment and break out of the loop\n", + " if done:\n", + " frames.append(np.array(env.graphic()))\n", + " state = env.reset()\n", + " break\n", + "\n", + "clip = ImageSequenceClip(list(frames), fps=20)\n", + "clip = clip.resize(width=300)\n", + "clip.write_gif('output.gif', fps=20)\n", + "Image('output.gif', width=300)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Conclusion\n", + "\n", + "Awesome ! You've managed to implement the Q-Learning algorithm in Python !\n", + "\n", + "Isn't it cool to see the agent go from not knowing anything to seamlessly running through the maze while avoiding all obstacles ?\n", + "\n", + "What happens if the maze is bigger, though ? Increase the value of the `MAP_SIZE` constant at the top of the notebook to see the changes. Beware though: this environment is not suited for large sizes, so stay below 30 if you want good results. Also, the larger the size, the longer it will take your agent to learn. You can also change the episode count and length, if you want !\n", + "\n", + "Hopefully this notebook was fun.
We decided to spare you the creation of the environment because building it would teach you little about AI and would take more time than it is worth.\n", + "\n", + "In the next notebook, `REINFORCE.ipynb`, we'll show you a great tool that is used in RL to easily deal with pre-made environments which are tailor-made for RL ! We will also be implementing a policy-based, on-policy, Monte Carlo algorithm !\\\n", + "Basically the opposite of Q-Learning, which is value-based, off-policy and relies on temporal-difference updates !" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.10.8 ('pool')", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "6b483bbea0ef867292651300ca303e9b91f9a0c7db919f54df8d16a1790f2d11" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day05/1.Introduction/README.md b/AI/Day05/1.Introduction/README.md new file mode 100644 index 0000000..337ff2a --- /dev/null +++ b/AI/Day05/1.Introduction/README.md @@ -0,0 +1,80 @@ +![The Machine Learning Trinity](./assets/fig1.svg) + +So far, this Pool has covered supervised learning.\ +There are two other major fields inside machine learning: **Unsupervised learning** and **Reinforcement learning** ! + +On this fifth day, we will be exploring the various algorithms and libraries used in the latter. + +# Reinforcement Learning: The Basics + +Reinforcement Learning is perhaps the most appealing field in machine learning: teaching an artificial intelligence to play a video game or building a self-driving car can't possibly sound lame to anyone !
+ +But a lot of math is involved behind all of this fun, so let's quickly get into the theory (don't skip this part, you **will** regret it) : + +# 1. Introduction + +![Reinforcement Learning Model](assets/fig2.svg) +> A basic representation of a reinforcement learning model: the agent sends an 'action' to the environment, which returns a 'state' and a 'reward' to the agent. + +The key concepts in reinforcement learning are the following: +- **The Agent**, which takes an action based on the state of the environment. The agent is synonymous with the `Player` in a video game, for example, Mario in 'Super Mario Bros.'\ +In certain implementations of RL, this Agent can have a Memory of the previous events within the environment. +- **an action**, which is taken by the Agent within the environment. The set of possible actions can be, for example, a mapping of all the buttons on a keyboard or gamepad, or simply a predefined list of moves allowed within the environment.\ +In a grid world problem, like 'Sokoban', the actions could simply be 'left', 'right', 'up', 'down'. +- **The Environment**, which receives an action from the agent and returns a state and reward based on the given action. +- a **state**, which represents all the information regarding the environment. The agent may instead receive only an **observation**, that is, a partial view of the state.\ +For example, in a game of chess, the player always receives the full state of the game: no information is **hidden** from the agent.\ +On the contrary, in a game of 'Super Mario Bros.', the player only receives an observation of the state: a grid of pixels the size of the screen. The player cannot, at all times, see every detail of the state. + +![Example of observation](assets/fig3.svg) +> Example of an observation + +![Example of state](assets/fig4.svg) +> Example of a state + +- **a reward**, which is a value given by the environment to 'rate' the action the player has taken.
This value can be anything, but most importantly, a **negative** reward means that the action was probably not a good idea for the current state and a **positive** reward means that the action was of high value ! + +| Action | Reward | +| -------------------- | -------- | +| Taking a pawn | +1 | +| Losing a pawn | -1 | +| Taking a rook | +5 | +| Losing a rook | -5 | +| Castling | +0 | +| Winning by checkmate | **+100** | +| Losing by checkmate | **-100** | +> Possible rewards based on different chess situations + +### As you may have guessed, the goal of Reinforcement Learning is to build an AI which develops an optimal policy (a rule that picks an action for each state) for solving a certain environment by attempting to maximise its rewards ! + +# 2. Two main approaches + +There are two main approaches to solving this problem and finding this policy:\ +**Keep in mind** that neither approach is strictly better than the other: each can provide better results in different situations. Most methods build on one of these two approaches, though, so it is important to learn both ! + +## a. Direct approach (Policy-based) + +The first approach is to directly learn the policy function which will indicate the best action to take at each state of the environment. + +![Policy-based representation](./assets/fig6.svg) +> The arrows represent the optimal actions for each state, the red diamonds are obstacles (negative reward) and the green circle is the goal (positive reward). The blue robot is our agent. + +## b. Indirect approach (Value-based) + +The second approach is to indirectly learn the policy function by first learning a value for each state and then picking, at each step, the action that leads to the state with the best value ! + +![Value-based representation](./assets/fig7.svg) + +# 3. It's up to you !
+ +Now that you know the basic concepts of RL, we invite you to start implementing two basic algorithms: +- Q-Learning, which is a value-based method +- and the REINFORCE algorithm, which is a policy-based method + +>While there is no particular obligation to learn value-based methods before policy-based methods, we encourage you to follow this order today because we will begin using a very important Python library inside the `REINFORCE.ipynb` notebook ! + +**STEPS:** +- Follow the `Q_Learning.ipynb` notebook to get started with this day's first task !\ +You will implement the Q-Learning algorithm, a value-based approach, to solve a custom-made grid world environment ! +- Follow the `REINFORCE.ipynb` notebook to implement a policy-based approach, the REINFORCE algorithm, in order to solve an OpenAI Gym environment, `Cartpole` ! +- Use your favorite method to solve the `Cartpole` environment and share your results ! \ No newline at end of file diff --git a/AI/Day05/1.Introduction/REINFORCE.ipynb b/AI/Day05/1.Introduction/REINFORCE.ipynb new file mode 100644 index 0000000..c061e6f --- /dev/null +++ b/AI/Day05/1.Introduction/REINFORCE.ipynb @@ -0,0 +1,673 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Policy-based method: REINFORCE\n", + "\n", + "Now that we've implemented a value-based algorithm, it's only right that we should try out a policy-based one as well, right ? So let's learn about REINFORCE !\n", + "\n", + "**Key facts**:\n", + "- It was first defined in ['Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning' by Ronald J.
WILLIAMS in 1992](https://link.springer.com/content/pdf/10.1007/BF00992696.pdf?pdf=button).\n", + "- It uses a **Monte Carlo** method\n", + "\n", + "### Monte Carlo\n", + "\n", + "As explained in the previous notebook, you can think of Monte Carlo as a method in which our agent learns after each episode instead of doing so at each time step like Temporal Difference.\n", + "\n", + "This implies that there is no need to estimate the target: once the episode is over, we can compute the return for each time step directly from the rewards stored in memory:\n", + "\n", + "![Monte Carlo formula](./assets/fig13.svg)\n", + "\n", + "Let's begin by implementing this formula !\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But before we begin, let's import some modules and define some constants...\\\n", + "Notice this time we're using PyTorch because it will make it easier for us to deal with optimization: PyTorch has a built-in 'Adam' optimizer which will improve our `REINFORCE` algorithm." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import necessary libraries\n", + "import numpy as np\n", + "\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from scipy.ndimage import gaussian_filter1d\n", + "\n", + "# Set the learning rate and discount factor\n", + "lr = 1e-3\n", + "gamma = 0.995\n", + "\n", + "# Set the number of episodes to run\n", + "episodes = 300" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## OpenAI Gym\n", + "\n", + "From now on, we will be using a popular RL framework called OpenAI Gym !"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import gym\n", + "\n", + "# Set the environment to use\n", + "env_name = 'CartPole-v1'\n", + "\n", + "# Create the environment\n", + "env = gym.make(env_name, render_mode=\"rgb_array\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": "We will `make` our gym environment by providing an environment name. Here, we choose the [Cartpole environment](https://gymnasium.farama.org/environments/classic_control/cart_pole/). Feel free to take some time to read its documentation." + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Cart pole](./assets/fig14.gif)\n", + "> GIF of the Cartpole environment, taken from the official documentation" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's see what information we can retrieve from the `env` variable..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(f'Action space: {env.action_space} ({env.action_space.n} possible actions)')" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The action space is `Discrete(2)`, a discrete action space with 2 possible actions.\n", + "\n", + "A discrete action space means there is a finite set of actions that the agent can take, for example going left or right. On the contrary, a continuous action space means each action is a real-valued quantity (or a vector of them), for example the steering angle of a car or the force applied to the cart. Note that the moves of a chess game, however numerous, still form a discrete action space."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(f'Observation space: {env.observation_space}')\n", + "print()\n", + "print(f'State shape: {env.observation_space.shape}')" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The observation space is of shape (4,), meaning it has 4 values. We've printed the minimum and maximum for each of these values above.\n", + "\n", + "You can see that the first value's minimum is `-4.8` and its maximum is `4.8`.\\\n", + "It corresponds to the cart's position.\n", + "\n", + "The second and fourth values range from `-infinity` to `infinity` (the largest 32-bit float, about `3.4e+38`, representing infinity in this case).\\\n", + "These values correspond to the cart's velocity and the pole's angular velocity respectively.\n", + "\n", + "The third value ranges from `-0.42` to `0.42`.\\\n", + "It represents the pole's angle in radians.\n", + "\n", + "[Read this part of the documentation for more details](https://gymnasium.farama.org/environments/classic_control/cart_pole/#observation-space)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alright, now that we have these values, we have all we need to build our neural network because we know our input size and our number of actions !" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's build our Neural Network which we will use as our policy function.\\\n", + "Its input will be the environment's state and its output will be a list of probabilities, one for each action.\\\n", + "You can do whatever you want with the hidden layer(s).\n", + "\n", + "As for the activations, apply ReLU after the first linear layer and softmax on the output layer.\n", + "\n", + "> Use `env` to access the input and output sizes."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a neural network to model the policy\n", + "class NeuralNetwork(nn.Module):\n", + " def __init__(self, env):\n", + " super().__init__()\n", + "\n", + " # Create fully-connected layers with ReLU activations\n", + " self.fc1 = None\n", + " self.fc2 = None\n", + "\n", + " self.actions, self.states, self.rewards = [], [], []\n", + "\n", + " def forward(self, x):\n", + " # Convert the input tensor to a float tensor\n", + "\n", + " # Apply ReLU activations to the fully-connected layers\n", + "\n", + " # Apply a softmax activation to the final layer, to get probabilities for each action\n", + "\n", + " return x\n", + "\n", + "network = NeuralNetwork(env)\n", + "\n", + "# Use Adam optimizer to optimize the neural network\n", + "optim = None" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Neural Network](./assets/fig16.svg)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Awesome, now let's see what else gym can do !" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "state, info = env.reset()\n", + "\n", + "print(state)\n", + "print()\n", + "print(info)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By using `env.reset()`, we have access to the four values inside our state ! If you reload this cell, you'll notice that these values are initialized randomly.\n", + "\n", + "We also receive an empty dictionary which for other environments can contain additional information.\\\n", + "From now on, we will be receiving `state, _` from `env.reset()` because we don't have any need for the `info` dictionary." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "action = env.action_space.sample() # use this method to get a random action from the action space\n", + "print(f\"We choose the action {action}...\")\n", + "\n", + "new_state, reward, termination, truncation, _ = env.step(action) # the last returned value is the info dictionary\n", + "\n", + "print(\"And we receive:\")\n", + "print()\n", + "print(f'Our new state: {new_state}') \n", + "print(f'The reward: {reward}')\n", + "print(f'Whether our episode was terminated: {termination}')\n", + "print(f'Or truncated: {truncation}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(env.render())" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`env.render()` returns an RGB array representing our environment which we can plot using matplotlib's `imshow()` method !" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from time import sleep\n", + "\n", + "\n", + "env = gym.make(env_name, render_mode=\"human\")\n", + "\n", + "for _ in range(5):\n", + " env.reset()\n", + " termination = truncation = False\n", + " while not (termination or truncation):\n", + " _, _, termination, truncation, _ = env.step(env.action_space.sample())\n", + "\n", + "env.close()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With this simple loop, we can see how our agent fares when it chooses an action at random. Not so well, huh ?
Well, let's train it using REINFORCE and see how it improves !\n", + "\n", + "Here's the REINFORCE algorithm as defined in Chapter 13 of [Sutton and Barto's 'Reinforcement Learning: An Introduction'](http://incompleteideas.net/book/RLbook2020.pdf):\n", + "\n", + "![REINFORCE algorithm](./assets/fig15.svg)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's begin by setting up a few lists we'll be using for logging our reward and loss..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import the deque class from the collections module\n", + "from collections import deque\n", + "\n", + "# Keep the last 100 episode rewards in a rolling window, plus full logs of rewards and losses\n", + "recent_rewards = deque(maxlen=100)\n", + "train_rewards = []\n", + "train_loss = []\n", + "\n", + "# We will avoid rendering our environment during training: \n", + "# it would tremendously slow down the process\n", + "env = gym.make(env_name) " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, it's up to you to implement the REINFORCE algorithm using OpenAI Gym's Cartpole environment !\n", + "\n", + "Let's begin by defining our policy:\n", + "\n", + ">- Create a `policy_action()` method which returns an action based on the policy\n", + ">- Check out [PyTorch's `Categorical` class](https://pytorch.org/docs/stable/distributions.html) which provides a great tool for probability distributions.\\\n", + ">The provided link explains its usage within REINFORCE in particular ! 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from torch.distributions import Categorical\n", + "\n", + "def policy_action(self, state):\n", + " # Get the probabilities for each action, using the current state\n", + "\n", + " # Create a distribution according to the probabilities\n", + "\n", + " # Sample an action from the distribution\n", + "\n", + " # Return the chosen action\n", + " pass\n", + "\n", + "NeuralNetwork.policy_action = policy_action" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next up, we need to define a simple method which stores our `action, state, reward` tuple at each time step.\n", + "\n", + ">- Simply add the arguments, `Action, State, Reward`, to their respective lists inside the `network` object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def remember(self, Action, State, Reward):\n", + " pass\n", + " \n", + "NeuralNetwork.remember = remember" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, we need to compute our discounted rewards for each time step, in other words, the 'G' variable in the algorithm:\n", + "\n", + "You can think of discounting as a way to help the agent become a better long-term planner as opposed to a short-term opportunist. 
We do this by discounting the value of rewards based on the time step.\n", + "\n", + "$$ G = \\sum_{k=t+1}^{T} \\gamma^{k-t-1} R_k $$" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What this means:\n", + "\n", + "- We need to return a list of `T` numbers (`T` being the episode length in steps)\n", + " - Each of these numbers ( $ \\sum^{T}_{k=t+1} $ ) is defined as such: \n", + " - $ \\gamma^{k-t-1} * R_k $\n", + "\n", + "- You are free to achieve this using either loops or `numpy` methods like [`power()`](https://numpy.org/doc/stable/reference/generated/numpy.power.html) and [`cumsum`](https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html)\n", + "\n", + "- Don't forget we've declared a `gamma` constant at the top of the notebook !" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def discount_rewards(self):\n", + " ## Discount the returns using the discount factor\n", + " pass\n", + "\n", + "NeuralNetwork.discount_rewards = discount_rewards\n", + "\n", + "network.rewards = [0.2, 0.6, 0.1, 1.2, 0.9]\n", + "network.discount_rewards()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Expected output: `[2.9602269 , 2.7602269 , 2.1632269 , 2.0642244 , 0.88213455]`\n", + "> If your values are close to these, it means you've correctly implemented the discounting." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, let's implement our gradient ascent !\n", + "\n", + "Since Adam, our optimizer, does most of the job for us, we only need to provide it the correct loss.\\\n", + "In REINFORCE, the loss is defined as follows:\n", + "\n", + "$$ L = G \\nabla \\ln \\pi(A_t|S_t,\\theta) $$\n", + "\n", + "The $ \\nabla $ symbol represents the gradient, so you can understand this formula as: `loss = G * log_prob` \n", + "\n", + "Because we are attempting to find the parameters of our policy which **maximize** the expected cumulative reward, we will not use the familiar **gradient descent** and instead use what is known as **gradient ascent**.\n", + "\n", + "Fear not, because it is not very complicated !\n", + "\n", + "With gradient descent, we update the parameters in the opposite direction of the gradient, which decreases the training cost or the expected cumulative reward, in this case.\n", + "\n", + "So in order to use gradient **ascent**, we must aim to **increase** the expected cumulative reward. 
Which means we \n", + "can achieve this by simply doing a `backward()` on a negative loss !\n", + "\n", + "This leads us back to the Monte Carlo formula for value-based methods:\n", + "\n", + "$$ G_t - V(S_t) $$\n", + "\n", + "or, for policy-based methods:\n", + "\n", + "$$ G[- \\nabla \\ln \\pi(A_t|S_t,\\theta)] $$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def gradient_ascent(self, discounted_rewards):\n", + " # Perform gradient ascent to update the probabilities in the distribution\n", + " for State, Action, G in zip(self.states, self.actions, discounted_rewards):\n", + " # Get the probabilities for the current state\n", + " probs = None\n", + "\n", + " # Calculate the loss as the negative log probability of the chosen action\n", + " # multiplied by the discounted return\n", + " loss = None\n", + "\n", + " # Clear the gradients, backpropagate the loss, and update the network parameters\n", + " \n", + " \n", + " \n", + " #\n", + "\n", + "NeuralNetwork.gradient_ascent = gradient_ascent" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All that remains is to use this `NeuralNetwork` class inside a training loop to make our agent learn how to solve Cartpole using the REINFORCE algorithm !\n", + "\n", + "We'll leave this part up to you ! 
(Use the screenshot of the REINFORCE pseudo code from earlier in the notebook for reference)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Iterate over the number of episodes\n", + "for episode in range(episodes):\n", + " # Reset the environment and initialize empty lists for actions, states, and rewards\n", + " state, _ = env.reset()\n", + " network.actions, network.states, network.rewards = [], [], []\n", + "\n", + " # Train the agent for a single episode\n", + " for _ in range(1000):\n", + " action = network.policy_action(state)\n", + "\n", + " # Take the action in the environment and get the new state, reward, and done flag\n", + " new_state, reward, termination, truncation, _ = env.step(action)\n", + "\n", + " # Save the action, state, and reward for later\n", + " network.remember(action, state, reward)\n", + "\n", + " state = new_state\n", + "\n", + " # If the episode is done or the time limit is reached, stop training\n", + " if termination or truncation:\n", + " break\n", + "\n", + " # Perform gradient ascent\n", + " network.gradient_ascent(network.discount_rewards())\n", + "\n", + " # Save the total reward for the episode and append it to the recent rewards queue\n", + " train_rewards.append(np.sum(network.rewards))\n", + " recent_rewards.append(train_rewards[-1])\n", + "\n", + " # Print the mean recent reward every 50 episodes\n", + " if episode % 50 == 0:\n", + " print(f\"Episode {episode:>6}: \\tR:{np.mean(recent_rewards):>6.3f}\")\n", + "\n", + " if np.mean(recent_rewards) > 400:\n", + " break" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots()\n", + "\n", + "ax.plot(train_rewards)\n", + "ax.plot(gaussian_filter1d(train_rewards, sigma=20), linewidth=4)\n", + "ax.set_title('Rewards')\n", + "\n", + "fig.show()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "Finally, let's display five episodes of our trained agent to see how glorious it is:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "env = gym.make(env_name, render_mode=\"human\")\n", + "\n", + "for _ in range(5):\n", + " Rewards = []\n", + " \n", + " state, _ = env.reset()\n", + " done = False\n", + " \n", + " for _ in range(1000):\n", + " # Calculate the probabilities of taking each action using the trained\n", + " # neural network\n", + " probs = network.forward(state)\n", + " \n", + " # Sample an action from the resulting distribution using the \n", + " # torch.distributions.Categorical class (left for you to fill in)\n", + " action = None\n", + " \n", + " new_state, reward, termination, truncation, _ = env.step(action)\n", + " \n", + " state = new_state\n", + "\n", + " Rewards.append(reward)\n", + "\n", + " if termination or truncation:\n", + " break\n", + " \n", + " # Print the total reward for the current episode\n", + " print(f'Reward: {sum(Rewards)}')\n", + "\n", + "# Close the environment\n", + "env.close()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wow, you really did it ! You've successfully implemented both a value-based and a policy-based method in reinforcement learning ! And, to top it all off, you even managed to solve CartPole using OpenAI Gym !\n", + "\n", + "If you're up for it, let's head over to section 2 and go **deeper** within the field of RL by returning to value-based methods and implementing the successor to Q-Learning, Deep Q Network, or DQN for short !\n", + "\n", + "Good luck !"
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pool", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.8" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "6b483bbea0ef867292651300ca303e9b91f9a0c7db919f54df8d16a1790f2d11" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/AI/Day05/1.Introduction/assets/fig1.svg b/AI/Day05/1.Introduction/assets/fig1.svg new file mode 100644 index 0000000..149828e --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig1.svg @@ -0,0 +1,16 @@ + + + + + + + Machine LearningSupervised LearningUnsupervised LearningReinforcement Learning \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig10.svg b/AI/Day05/1.Introduction/assets/fig10.svg new file mode 100644 index 0000000..cb5c9f1 --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig10.svg @@ -0,0 +1,16 @@ + + + + + + + 0.10.51.0-1.00.10.1-1.01.00.50.10.10.10.00.00.00.00.00.00.00.0State valueAction value \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig11.svg b/AI/Day05/1.Introduction/assets/fig11.svg new file mode 100644 index 0000000..94551a2 --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig11.svg @@ -0,0 +1,16 @@ + + + + + + + ** \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig12.svg b/AI/Day05/1.Introduction/assets/fig12.svg new file mode 100644 index 0000000..1216f9f --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig12.svg @@ -0,0 +1,16 @@ + + + + + + + 0.01.0reward = +10reward = +1 \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig13.svg b/AI/Day05/1.Introduction/assets/fig13.svg new file mode 100644 index 0000000..29d2a27 --- /dev/null +++ 
b/AI/Day05/1.Introduction/assets/fig13.svg @@ -0,0 +1,16 @@ + + + + + + + Updated state valueOld state valueOld state valueTarget* \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig14.gif b/AI/Day05/1.Introduction/assets/fig14.gif new file mode 100644 index 0000000..60e5fed Binary files /dev/null and b/AI/Day05/1.Introduction/assets/fig14.gif differ diff --git a/AI/Day05/1.Introduction/assets/fig15.svg b/AI/Day05/1.Introduction/assets/fig15.svg new file mode 100644 index 0000000..33d2c1e --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig15.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig16.svg b/AI/Day05/1.Introduction/assets/fig16.svg new file mode 100644 index 0000000..bae124b --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig16.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig2.svg b/AI/Day05/1.Introduction/assets/fig2.svg new file mode 100644 index 0000000..cf53813 --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig2.svg @@ -0,0 +1,16 @@ + + + + + + + AgentEnvironmentactionstate reward \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig3.svg b/AI/Day05/1.Introduction/assets/fig3.svg new file mode 100644 index 0000000..bf90860 --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig3.svg @@ -0,0 +1,16 @@ + + + + + + + ?????? 
\ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig4.svg b/AI/Day05/1.Introduction/assets/fig4.svg new file mode 100644 index 0000000..57eaf7b --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig4.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig5.svg b/AI/Day05/1.Introduction/assets/fig5.svg new file mode 100644 index 0000000..0077e3c --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig5.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig6.svg b/AI/Day05/1.Introduction/assets/fig6.svg new file mode 100644 index 0000000..4c0be14 --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig6.svg @@ -0,0 +1,16 @@ + + + + + + + Policy-Based Methods \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig7.svg b/AI/Day05/1.Introduction/assets/fig7.svg new file mode 100644 index 0000000..070eeab --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig7.svg @@ -0,0 +1,16 @@ + + + + + + + -10.90.80.50.40.30.20.20.40.30.20.10.10.10.2-1-1-11Value-based Methods0.1 \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig8.svg b/AI/Day05/1.Introduction/assets/fig8.svg new file mode 100644 index 0000000..93cec39 --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig8.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/1.Introduction/assets/fig9.svg b/AI/Day05/1.Introduction/assets/fig9.svg new file mode 100644 index 0000000..a32ef9c --- /dev/null +++ b/AI/Day05/1.Introduction/assets/fig9.svg @@ -0,0 +1,16 @@ + + + + + + + TargetOld state valueOld state valueUpdated state value* learning rate** discount factor* \ No newline at end of file diff --git a/AI/Day05/1.Introduction/envi.py b/AI/Day05/1.Introduction/envi.py new file mode 100644 index 0000000..ef352a9 --- /dev/null +++ b/AI/Day05/1.Introduction/envi.py @@ -0,0 +1,90 @@ +""" +In this file we define the environment 
class to avoid filling the notebook with too much code +""" +import random +import numpy as np + +GRAPHICS = {' ': [255, 255, 255], + 'O': [255, 0, 0], + 'P': [0, 0, 255], + 'X': [0, 255, 0]} + +REWARDS = {'NEGATIVE': -100, + 'NEUTRAL': -1, + 'POSITIVE': 1000} + + +class Environment: + """ + The environment which will receive actions and update state + """ + + def __init__(self, map_size, actions): + self.map_size = map_size + self.actions = actions + + # generating a random map + self.map = [' '] * self.map_size + for i in range(self.map_size): + self.map[i] = [' '] * self.map_size + obstacle = random.randint(0, self.map_size - 1) + for j in range(self.map_size): + if i * self.map_size + j == self.map_size ** 2 - 1: + self.map[i][j] = 'X' + continue + if i != 0 and i != self.map_size - 1 and j == obstacle: + self.map[i][j] = 'O' + obstacle = True + # spawning obstacles + self.dangers = [] + for y in range(self.map_size): + for x in range(self.map_size): + if self.map[y][x] == 'O': + self.dangers.append(y*self.map_size+x) + # initialising agent state + self.state = 0 + + def graphic(self): + """ + This method will return an array of colors which can be used for animation in matplotlib + """ + color_array = [[GRAPHICS[x] for x in y] for y in self.map] + color_array[self.state // self.map_size][self.state % + self.map_size] = GRAPHICS['P'] + return np.array(color_array).repeat(10, axis=0).repeat(10, axis=1) + + def step(self, action): + """ + This method will update the environment based on the chosen action by the agent + """ + new_state = self.state + reward = 0 + done = False + + # movement + if action == self.actions['UP']: + new_state -= self.map_size + if action == self.actions['DOWN']: + new_state += self.map_size + if action == self.actions['LEFT'] and new_state % self.map_size != 0: + new_state -= 1 + if action == self.actions['RIGHT'] and new_state % self.map_size != self.map_size - 1: + new_state += 1 + if 0 <= new_state < self.map_size ** 2: + self.state 
= new_state + # granting rewards + reward = REWARDS['NEUTRAL'] + if self.state in self.dangers: + reward = REWARDS['NEGATIVE'] + done = True + if self.state == self.map_size ** 2 - 1: + reward = REWARDS['POSITIVE'] + done = True + return self.state, reward, done + + def reset(self): + """ + This method resets our environment to default values + """ + self.state = 0 + return self.state diff --git a/AI/Day05/2.GoingDeeper/README.md b/AI/Day05/2.GoingDeeper/README.md new file mode 100644 index 0000000..3c011e6 --- /dev/null +++ b/AI/Day05/2.GoingDeeper/README.md @@ -0,0 +1,118 @@ +# Going Deeper with Deep Q Networks + +![Lunar Lander](assets/fig1.gif) +> An agent solving the [Lunar Lander](https://gymnasium.farama.org/environments/box2d/lunar_lander/) environment + +Now that you've seen examples of both value and policy based methods for RL, let's take a deeper dive into the first one by implementing the Deep Q Network algorithm, which is what you get when you apply Deep Neural Networks to Q-Learning ! + +## 1. Why do we need neural networks ? + +We were able to train pretty good agents and receive nice rewards using simple Q-Learning by creating a Q-Table and updating its values. + +![Environment comparison](assets/fig2.svg) + +CartPole is one of the simpler Gym environments, with only four values in its state space and only two possible actions to be taken.\ +With more and more complex environments, we need to use neural networks to approximate our Q-Table ! + +![Q-Table](assets/fig3.svg) + +We have learned, during this week, of a function which allows to take an input and output a prediction based on that input. + +![DQN](assets/fig4.svg) + +The Deep Q Network can replace our Q-Table. It is a deep neural network which takes a state as input and outputs the q-values of each action within the state ! + +## Deep Q Network + +The goal of this exercise is to implement the following algorithm using PyTorch and OpenAI Gym to solve the Lunar Lander environment ! 
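To make the "state in, one Q-value per action out" idea concrete before reading the algorithm, here is a minimal sketch in plain Python. It is not the network you are asked to build: a single linear layer stands in for the deep network, the names `init_weights`, `q_values` and `greedy_action` are hypothetical, and the observation is made up. The sizes (8 state values, 4 actions) are those of Lunar Lander.

```python
import random

STATE_SIZE = 8  # a Lunar Lander observation has 8 components
N_ACTIONS = 4   # and the agent chooses between 4 discrete actions


def init_weights(state_size, n_actions, seed=0):
    """Small random weights: one row of `state_size` values per action."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(state_size)]
            for _ in range(n_actions)]


def q_values(weights, state):
    """Return one Q-value per action (a single linear layer, no bias)."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def greedy_action(weights, state):
    """Pick the index of the action with the highest Q-value."""
    qs = q_values(weights, state)
    return max(range(len(qs)), key=qs.__getitem__)


weights = init_weights(STATE_SIZE, N_ACTIONS)
state = [0.0, 1.4, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0]  # made-up observation
print(q_values(weights, state))       # a list of 4 numbers
print(greedy_action(weights, state))  # an action index between 0 and 3
```

In your implementation, a PyTorch module would play the role of `q_values`, and its weights are what the algorithm calls the parameters of $Q$.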
+ +![Algorithm](assets/fig5.svg) +> The Deep Q Network as defined in [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602v1.pdf) by Mnih et al. + +### Dissecting the algorithm: + +The first thing you can do with this algorithm is to extract its constants and variables: + +_Constants_: +- **Memory capacity** $N$ +- **Training length** $M$, the number of episodes before the training ends +- **Episode length** $T$, the maximum number of time steps before the episode ends + - and its **timestep** $t$ +- **Discount rate $\gamma$**, as you know, usually $\gamma = 0.99$ + +_Variables_: +- **Transitions $(\phi_t, a_t, r_t, \phi_{t+1})$**: a tuple containing the current state $\phi_t$, the action $a_t$, the reward $r_t$ and the new state $\phi_{t+1}$ for each time step $t$ +- **Memory $D$**: a list of at most $N$ transitions +- **Action-value function $Q$**: a neural network +- **Sequence $s$ and preprocessed sequence $\phi$**: using gym, we can simply consider that our sequence is already preprocessed and that $\phi$ is the state we receive when we call `env.reset()` or `env.step()`. +- **Parameters $\theta$**: these are the parameters of our neural network $Q$ + +_Methods_: +- **Execute action in emulator and observe reward and image**: the usual OpenAI Gym implementation: + ```py + state, reward, termination, truncation, _ = env.step(action) + ``` +- **For terminal / non-terminal state**: this is where `termination` finally comes in handy inside our algorithms ! 
+ > Tip: you can implement this condition in one line by replacing the formula with: + > $$ y_j = r_j + (1 - termination_j) \cdot \gamma \cdot \max_{a'} Q(\phi_{j+1}, a'; \theta) $$ + > Because `termination` is a boolean (`True` counts as 1 and `False` as 0), the second term vanishes and the target reduces to + > $$ y_j = r_j $$ + > when `termination` is equal to `True` +- **Gradient descent step**: the usual PyTorch implementation, nothing new here: + ```py + optimizer.zero_grad() ## reset the gradients + loss.backward() ## backward propagation + optimizer.step() ## updating the network + ``` + + +### Lunar Lander + +You need to solve the Lunar Lander environment, so read the [documentation](https://gymnasium.farama.org/environments/box2d/lunar_lander/) carefully. + +```py +env = gym.make("LunarLander-v2") +``` +> Bonus: your implementation should support setting parameters such as learning rate, discount rate, memory capacity, episode length, etc. from the command line, like so: +> ```bash +> python3 dqn.py --lr 5e-4 --gamma 0.99 -M 500 -N 1000 -T 1000 +> ``` + +**Good luck !** + +## DQN Extensions + +Many extensions have been proposed to improve upon the base DQN algorithm. + +We would like to ask you to implement this one, defined in [Human-level control through deep reinforcement learning](https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf) by Mnih et al. + +It introduces the concept of a **target network**, which lets the network compare its predictions with those of a stable network while it updates its "**online network**". + +This prevents a problem with the vanilla DQN, where the network tries to minimize its loss by getting closer to a target computed from its own predictions, so the target moves every time the network is updated. It is like a pig chasing a carrot that is attached to a rod carried by the human that's riding it. + +![Illustration of the problem with vanilla DQN](./assets/fig7.svg) + +With this extension, we use two separate neural networks: +- The **online** network, which is used for prediction. 
+ - This network's parameters are: + - initialized randomly. + - updated at each timestep according to the loss. +- The **target** network, which is used to compute the loss. + - This network's parameters are: + - initialized to the same values as those of the **online** network. + - updated every **C** timesteps, but instead of updating them with an optimizer, we replace its parameters with those of the **online** network. + +Here's the updated algorithm: + +![DQN with target network](./assets/fig6.svg) + +There aren't many changes, as you can see.\ +Let's go through each difference: + +- First, we initialize a second neural network using the same parameters as the first. +- Then, instead of using the same network for our predictions and to compute our target, we use the second network instead. +- Finally, we reset its parameters to those of the first network every **C** steps. + +See if you can implement this extension into your network and observe the differences in performance ! + +**Good luck !** \ No newline at end of file diff --git a/AI/Day05/2.GoingDeeper/assets/fig1.gif b/AI/Day05/2.GoingDeeper/assets/fig1.gif new file mode 100644 index 0000000..3c6a165 Binary files /dev/null and b/AI/Day05/2.GoingDeeper/assets/fig1.gif differ diff --git a/AI/Day05/2.GoingDeeper/assets/fig2.svg b/AI/Day05/2.GoingDeeper/assets/fig2.svg new file mode 100644 index 0000000..7383f5c --- /dev/null +++ b/AI/Day05/2.GoingDeeper/assets/fig2.svg @@ -0,0 +1,16 @@ + + + + + + + CartPole State SpaceLunarLander State Space \ No newline at end of file diff --git a/AI/Day05/2.GoingDeeper/assets/fig3.svg b/AI/Day05/2.GoingDeeper/assets/fig3.svg new file mode 100644 index 0000000..eb56ffd --- /dev/null +++ b/AI/Day05/2.GoingDeeper/assets/fig3.svg @@ -0,0 +1,16 @@ + + + + + + + StateQ-TableAction \ No newline at end of file diff --git a/AI/Day05/2.GoingDeeper/assets/fig4.svg b/AI/Day05/2.GoingDeeper/assets/fig4.svg new file mode 100644 index 0000000..f9fa9c6 --- /dev/null +++ 
b/AI/Day05/2.GoingDeeper/assets/fig4.svg @@ -0,0 +1,16 @@ + + + + + + + StateDQNAction \ No newline at end of file diff --git a/AI/Day05/2.GoingDeeper/assets/fig5.svg b/AI/Day05/2.GoingDeeper/assets/fig5.svg new file mode 100644 index 0000000..8e62a18 --- /dev/null +++ b/AI/Day05/2.GoingDeeper/assets/fig5.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/2.GoingDeeper/assets/fig6.svg b/AI/Day05/2.GoingDeeper/assets/fig6.svg new file mode 100644 index 0000000..19dabf6 --- /dev/null +++ b/AI/Day05/2.GoingDeeper/assets/fig6.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/2.GoingDeeper/assets/fig7.svg b/AI/Day05/2.GoingDeeper/assets/fig7.svg new file mode 100644 index 0000000..d86f295 --- /dev/null +++ b/AI/Day05/2.GoingDeeper/assets/fig7.svg @@ -0,0 +1,16 @@ + + + + + + + \ No newline at end of file diff --git a/AI/Day05/README.md b/AI/Day05/README.md new file mode 100644 index 0000000..d575e43 --- /dev/null +++ b/AI/Day05/README.md @@ -0,0 +1,19 @@ +# ~ PoC AI Pool 2025 ~ + +- ## Day 5: Reinforcement Learning + - ### Module 1: Q Learning + - **Notebook:** [`Q_Learning.ipynb`](./1.Introduction/Q_Learning.ipynb) + - ### Module 2: REINFORCE + - **Notebook:** [`REINFORCE.ipynb`](./1.Introduction/REINFORCE.ipynb) + - ### Module 3: Deep Q Network + - **Module:** [`2.GoingDeeper`](./2.GoingDeeper) + +--- + +Today, the focus is Reinforcement Learning : where machines can learn to adapt to environments and take actions based on their observations ! 
+ +> Here's a list of resources that we believe can be useful to follow along (and that we've ourselves used to learn these topics before being able to write the subjects): + +- [Reinforcement Learning: An Introduction](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf) +- [Huggingface Deep RL Course](https://huggingface.co/deep-rl-course/unit0/introduction) +- [Gymnasium](https://gymnasium.farama.org/index.html) diff --git a/AI/Day05/requirements.txt b/AI/Day05/requirements.txt new file mode 100644 index 0000000..30cf277 --- /dev/null +++ b/AI/Day05/requirements.txt @@ -0,0 +1,5 @@ +seaborn +scipy +torch +gym +tensorboard diff --git a/AI/README.md b/AI/README.md new file mode 100644 index 0000000..2975efc --- /dev/null +++ b/AI/README.md @@ -0,0 +1,76 @@ +# ~ PoC AI Pool 2026 ~ + +Welcome to the **PoC AI Pool 2026** ! During this week, you will discover the world of Artificial Intelligence, from Python basics to reinforcement learning, through machine learning, deep learning and large language models. + +## Program + +| Day | Topic | Description | +|-----|-------|-------------| +| [Day 01](Day01) | **Python Basics** | Python fundamentals, NumPy, Matplotlib and Pandas | +| [Day 02](Day02) | **Large Language Models** | Fine-tuning a pre-trained model and building a RAG system | +| [Day 03](Day03) | **Machine Learning** | Linear regression, logistic regression and neural network theory | +| [Day 04](Day04) | **Neural Networks** | PyTorch, vision models (MNIST, CIFAR) and building your own framework | +| [Day 05](Day05) | **Reinforcement Learning** | Q-Learning, REINFORCE and Deep Q-Networks | + +## Getting Started + +Before starting the pool, please follow the [setup guide](setup.md) to configure your environment (virtual environment or Docker). 
+ +### Prerequisites + +- Python 3.x and `pip` +- Jupyter Notebook +- Git + +### Quick Start + +```bash +git clone +cd AI +python3 -m venv venv +source venv/bin/activate +pip install -r requirements.txt +``` + +Then navigate to the day's folder and open the notebooks. + +## Structure + +``` +AI/ +├── Day01/ # Python, NumPy, Matplotlib, Pandas +├── Day02/ # Fine-tuning, RAG +├── Day03/ # Linear & Logistic Regression, Neural Networks +├── Day04/ # PyTorch, Vision Models, MyTorch +├── Day05/ # Reinforcement Learning +├── setup.md # Environment setup guide +└── requirements.txt +``` + +Each day has its own README with detailed instructions and resources. + +


+ +> 🚀 Don't hesitate to follow us on our different networks, and put a star 🌟 on `PoC's` repositories. diff --git a/AI/image.png b/AI/image.png new file mode 100644 index 0000000..0dbaa58 Binary files /dev/null and b/AI/image.png differ diff --git a/AI/requirements.txt b/AI/requirements.txt new file mode 100644 index 0000000..0d36e69 --- /dev/null +++ b/AI/requirements.txt @@ -0,0 +1,26 @@ +jupyter==1.1.1 +notebook==7.3.2 +matplotlib==3.10.0 +numpy==1.26.3 +pandas==2.2.3 +seaborn==0.13.2 +scikit-learn==1.6.1 +requests==2.32.3 +gym==0.26.2 +box2d-py==2.3.8 +gym[classic_control]==0.26.2 +einops==0.8.0 +torchvision==0.12.0 +pillow==10.2.0 +scipy==1.15.1 +IPython==8.20.0 +moviepy==2.1.2 +envi==0.2.2 +tensorboard==2.18.0 +datasets==3.2.0 +nltk==3.9.1 +torchvision==0.12.0 +gradio==3.0.2 +prophet==1.1.2 +h2o==3.38.0.4 +tqdm==4.67.1 \ No newline at end of file diff --git a/AI/setup.md b/AI/setup.md new file mode 100644 index 0000000..9a71a1c --- /dev/null +++ b/AI/setup.md @@ -0,0 +1,75 @@ +# Python Setup Guide + +This setup guide provides instructions to quickly get started with the AI Pool using a virtual environment (`venv`), `pip` and a `requirements.txt` file, or, as an alternative option, Docker. + +## Requirements +- Python 3.x and `pip` installed on your machine (for venv setup). +- Docker installed on your machine (for Docker setup). + +## Option 1: Local Setup with Virtual Environment + +If you want to set up the environment locally on your machine, you can use a Python virtual environment (`venv`). + +### 1. Create and Activate the Virtual Environment + +Navigate to your project directory and create a new virtual environment: + +```bash +python3 -m venv poc_ai_pool_venv +``` + +Activate the virtual environment: + +- On **macOS/Linux**: + + ```bash + source poc_ai_pool_venv/bin/activate + ``` + +- On **Windows**: + + ```bash + poc_ai_pool_venv\Scripts\activate + ``` + +### 2. 
Install Required Packages + +Ensure that you have a `requirements.txt` file in your project directory, containing all the necessary dependencies for the pool. + +To install the required packages listed in the `requirements.txt`, run: + +```bash +pip install -r requirements.txt +``` + +## Common error with VS Code + +- **The virtual environment is activated in your terminal, but not in VS Code** + - Check the bottom right of the status bar to make sure VS Code is using the interpreter from your virtual environment. + + +## Option 2: Docker Setup + +### 1. Pull the Docker Image + +To begin, pull the pre-built Docker image that contains the necessary setup for the pool. Run the following command in your terminal: + +```bash +docker pull laiheau/poc_ai_pool +``` + +### 2. Run the Docker Container + +Once the image is pulled, you can start a Docker container that will run the environment. Execute this command: + +```bash +docker run -t -i -v .:/workspace:z -p 8888:8888 laiheau/poc_ai_pool +``` + +This will start a Jupyter Notebook server **and mount your current directory into it** (at `/workspace`). Because the directory is mounted rather than copied, any change made on one side is immediately visible on the other. Then, you just need to follow the link printed in your terminal (see the picture below). + +![connection to server example picture](image.png) + +## Conclusion + +You now have two options for setting up the PoC AI Pool environment: using a local Python virtual environment or Docker. Both methods will give you access to Jupyter Notebooks and all the tools needed for the pool. Enjoy this week with us !
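
Whichever option you chose, you may want to confirm that the dependencies are actually visible from the interpreter your notebooks use. The following is a minimal sketch using only the standard library; the `PACKAGES` list is a hand-picked subset of `requirements.txt` and `check_packages` is a hypothetical helper name, so adjust both as needed.

```python
from importlib import metadata

# Assumption: a small subset of the packages pinned in requirements.txt.
PACKAGES = ["numpy", "pandas", "matplotlib", "notebook", "gym"]


def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        print(f"{name:12s} {version or 'MISSING'}")
```

Run it from a notebook cell or from the activated terminal. If a package shows up as `MISSING` in the notebook but not in the terminal, the notebook is probably using a different interpreter than your virtual environment.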