IBMDecisionOptimization · sdejonk · Jan 3, 2018
diff --git a/jupyter/MachineLearning_and_CPLEX.ipynb b/jupyter/MachineLearning_and_CPLEX.ipynb
@@ -8,23 +8,23 @@
     "\n",
     "In 2016, a retail bank sold several products (mortgage account, savings account, and pension account) to its customers.\n",
     "It kept a record of all historical data, and this data is available for analysis and reuse.\n",
-    "Following a merger in 2017, the bank has new customers and wants to start some marketing campaigns. \n",
+    "Following a merger in 2017, the bank has new customers and wants to launch some marketing campaigns. \n",
     "\n",
     "The budget for the campaigns is limited. The bank wants to contact a customer and propose only one product.\n",
     "\n",
     "\n",
     "The marketing department needs to decide:\n",
     "   * Who should be contacted?\n",
-    "   * Which product should be proposed? Proposing too many products is counter productive, so only one product per customer contact.\n",
+    "   * Which product should be proposed? (Proposing too many products is counterproductive, so only one product will be proposed per customer contact.)\n",
     "   * How will a customer be contacted? There are different ways, with different costs and efficiency.\n",
-    "   * How can they optimally use the limited budget?\n",
+    "   * How can they optimally use their limited budget?\n",
     "   * Will such campaigns be profitable?\n",
     "   \n",
     "#### Predictive and prescriptive workflow\n",
     "\n",
-    "From the historical data, we can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
-    "* We can apply this predictive model to the new customers data to predict for each new customer what they will buy.\n",
-    "* On this new data, we decide which offers are proposed. Which product is offered to which customer through which channel:\n",
+    "From the historical data, you can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
+    "* You can apply this predictive model to the new customer data to predict for each new customer what they will buy.\n",
+    "* With this new data, you decide which offers are proposed. Which product is offered to which customer through which channel is determined:\n",
     "   * a. with a greedy algorithm that reproduces what a human being would do\n",
     "   * b. using an optimization model with IBM Decision Optimization.\n",
     "* The solutions can be displayed, compared, and analyzed.\n",
@@ -34,7 +34,7 @@
     "\n",
     "*  [Understand the historical data](#Understanding-the-historical-data)\n",
     "*  [Predict the 2017 customer behavior](#Predict-the-2017-customer-behavior)\n",
-    "*  [Get business decisions on the 2017 data](#Get-business-decisions-on-the-2017-data)\n",
+    "*  [Get business decisions for the 2017 data](#Get-business-decisions-for-the-2017-data)\n",
     "*  [Conclusion on the decision making](#Conclusion)"
    ]
   },
@@ -54,10 +54,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The purpose of this Notebook is not to provide a perfect machine learning model nor a perfect optimization model.\n",
-    "The purpose is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
+    "The purpose of this notebook is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
     "\n",
-    "This notebook takes some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
+    "This notebook can take some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
    ]
   },
   {
@@ -394,16 +393,16 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "We have 1650 clients who bought several products\n",
-      "We have 123 clients who bought all the products\n"
+      "You have 1650 clients who bought several products\n",
+      "You have 123 clients who bought all the products\n"
      ]
     }
    ],
    "source": [
     "abc = known_behaviors[known_behaviors.nb_products > 1]\n",
-    "print(\"We have %d clients who bought several products\" %len(abc))\n",
+    "print(\"You have %d clients who bought several products\" %len(abc))\n",
     "abc = known_behaviors[known_behaviors.nb_products == 3]\n",
-    "print(\"We have %d clients who bought all the products\" %len(abc))"
+    "print(\"You have %d clients who bought all the products\" %len(abc))"
    ]
   },
   {
@@ -421,14 +420,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Do some visual analysis of the historical data"
+    "##### Provide some visual analysis of the historical data"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "It's possible to use pandas plotting capabilities, but it would require a new version of it. This Notebook relies on matplotlib as it is present everywhere."
+    "It's possible to use pandas plotting capabilities, but that would require a newer version of pandas. This notebook relies on matplotlib as it is commonly used."
    ]
   },
   {
@@ -489,9 +488,9 @@
    "metadata": {},
    "source": [
     "### Understanding the 2016 customers\n",
-    "We can see that:\n",
-    "   * The greater a customer's income, the more likely it is s/he will buy a savings account.\n",
-    "   * The older a customer is, the more likely it is s/he will buy a pension account.\n",
+    "You can see that:\n",
+    "   * The greater a customer's income, the more likely it is he or she will buy a savings account.\n",
+    "   * The older a customer is, the more likely it is he or she will buy a pension account.\n",
     "   * There is a correlation between the number of people in a customer's household, the number of loan accounts held by the customer, and the likelihood a customer buys a mortgage account. To see the correlation, look at the upper right and lower left corners of the mortgage chart."
    ]
   },
@@ -536,7 +535,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Let's use the following columns as machine-learning features:"
+    "Use the following columns as machine-learning features:"
    ]
   },
   {
@@ -658,7 +657,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We use a standard basic support gradient boosting algorithm to predict whether a customer might by product A, B, or C."
+    "You are using a standard basic support gradient boosting algorithm to predict whether a customer might buy product A, B, or C."
    ]
   },
   {
@@ -694,8 +693,8 @@
    "source": [
     "### New customer data and predictions\n",
     "\n",
-    "Load new customer data, predict behaviors using trained classifier, and do some visual analysis.\n",
-    "We have all the characteristics of the new customers, as for the 2016 clients, but the new customers did not buy any product yet.\n",
+    "Load new customer data, predict behaviors using a trained classifier, and perform some visual analysis.\n",
+    "You have all the characteristics of the new customers, as for the 2016 clients, but the new customers have not yet bought any product.\n",
     "\n",
     "##### Load new customer data"
    ]
@@ -921,7 +920,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Do some visual analysis of the predicted data"
+    "##### Perform some visual analysis of the predicted data"
    ]
   },
   {
@@ -1002,28 +1001,28 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "We predicted that 112 clients would buy more than one product\n",
-      "We predicted that 0 clients would buy all three products\n"
+      "It's predicted that 112 clients would buy more than one product\n",
+      "It's predicted that 0 clients would buy all three products\n"
      ]
     }
    ],
    "source": [
     "to_predict[\"nb_products\"] = to_predict.Mortgage + to_predict.Pension + to_predict.Savings\n",
     "\n",
     "abc = to_predict[to_predict.nb_products > 1]\n",
-    "print(\"We predicted that %d clients would buy more than one product\" %len(abc))\n",
+    "print(\"It's predicted that %d clients would buy more than one product\" %len(abc))\n",
     "abc = to_predict[to_predict.nb_products == 3]\n",
-    "print(\"We predicted that %d clients would buy all three products\" %len(abc))"
+    "print(\"It's predicted that %d clients would buy all three products\" %len(abc))"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Remarks on the prediction\n",
-    "The goal is to contact the customers to sell them only one product, so we cannot select all of them.\n",
-    "This increases the complexity of the problem: we need to determine the best contact channel, but also need to select which product will be sold to a given customer.  \n",
-    "It may be hard to compute this. In order to check, we will use two techniques:\n",
+    "The goal is to contact the customers to sell them only one product, so you cannot select all of them.\n",
+    "This increases the complexity of the problem: you need to determine the best contact channel, but also need to select which product will be sold to a given customer.  \n",
+    "It may be hard to compute this. In order to check, you will use two techniques:\n",
     "   * a greedy algorithm\n",
     "   * CPLEX, the IBM leading optimization solver."
    ]
@@ -1043,12 +1042,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Get business decisions on the 2017 data\n",
+    "# Get business decisions for the 2017 data\n",
     "## Assign campaigns to customers\n",
     "\n",
-    "* We have predicted who will buy what in the list of new customers.\n",
-    "* However, we do not have the budget to contact all of them. We have various contact channels with different costs and effectiveness.\n",
-    "* Furthermore, if we contact somebody, we don't want to frustrate them by proposing multiple products; we want to propose only one product per customer.\n",
+    "* You have a prediction of who will buy what in the list of new customers.\n",
+    "* However, you do not have the budget to contact all of them. You have various contact channels with different costs and effectiveness.\n",
+    "* Furthermore, if you contact a customer, you want to propose only one product per customer.\n",
     "\n",
     "##### Some input data for optimization\n"
    ]
@@ -1083,7 +1082,7 @@
    "metadata": {},
    "source": [
     "#### Using a greedy algorithm\n",
-    "* We create a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
+    "* You are creating a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
    ]
   },
   {
@@ -1251,7 +1250,7 @@
    "source": [
     "#### Using IBM Decision Optimization CPLEX Modeling for Python\n",
     "\n",
-    "Let's create the optimization model to select the best ways to contact customers and stay within the limited budget."
+    "Create the optimization model to select the best ways to contact customers and stay within the limited budget."
    ]
   },
   {
@@ -1402,7 +1401,7 @@
    "source": [
     "##### Express the objective\n",
     "\n",
-    "We want to maximize expected revenue, so we take into account the predicted behavior of each customer for each product."
+    "You want to maximize expected revenue, so you take into account the predicted behavior of each customer for each product."
    ]
   },
   {
@@ -1580,7 +1579,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "With the mathematical optimization, we made a better selection of customers."
+    "With the mathematical optimization, you made a better selection of customers."
    ]
   },
   {
@@ -1871,7 +1870,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Due to the business constraints, we can address a maximum of 1680 customers with a \\$35615 budget.\n",
+    "Due to the business constraints, you can address a maximum of 1680 customers with a \\$35615 budget.\n",
     "Any funds available above that amount won't be spent.\n",
     "The expected revenue is \\$87.1K."
    ]
@@ -1881,8 +1880,8 @@
    "metadata": {},
    "source": [
     "### Dealing with infeasibility\n",
-    "What about a context where we are in tight financial conditions, and our budget is very low?\n",
-    "We need to determine the minimum amount of budget needed to adress 1/20 of our customers."
+    "What about the context where you have tight financial conditions, and your budget is very low?\n",
+    "You need to determine the minimum amount of budget needed to address 1/20 of your customers."
    ]
   },
   {
@@ -1933,7 +1932,7 @@
     "    #setting all bool vars to 0 is an easy relaxation, so let's refuse it and force to offer something to 1/20 of the clients\n",
     "    mdl.add_constraint(totaloffers >= len(offers)//20, ctname=\"high\")\n",
     "    \n",
-    "    # solve has failed, we try relaxation, based on constraint names\n",
+    "    # solve has failed, trying relaxation, based on constraint names\n",
     "    # constraints are prioritized according to their names\n",
     "    # if a name contains \"low\", it has priority LOW\n",
     "    # if a ct name contains \"medium\" it has priority MEDIUM\n",
@@ -1987,8 +1986,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We need a minimum of 15950\\$ to be able to start a marketing campaign.\n",
-    "With this minimal budget, we will be able to adress 825 possible clients."
+    "You need a minimum of \\$15950 to be able to start a marketing campaign.\n",
+    "With this minimal budget, you will be able to address 825 possible clients."
    ]
   },
   {
@@ -2005,17 +2004,17 @@
     "| Greedy    |   50800 |              1123 |             299 |            111 |            713 |        21700 |\n",
     "| CPLEX     |   72600 |              1218 |             381 |            117 |            691 |        25000 |\n",
     "\n",
-    "* As you can see, with Decision Optimization, we can safely do this marketing campaign to contact <b>1218 customers</b> out of the 2756 customers. \n",
-    "* This will lead to a <b>\\$91.5K revenue</b>, significantly greater than the \\$49.5K revenue given by a greedy algorithm.\n",
-    "* With a greedy algorithm, we will:\n",
+    "* As you can see, with Decision Optimization, you can safely use this marketing campaign to contact <b>1218 customers</b> out of the 2756 customers. \n",
+    "* This will lead to a <b>\\$72.6K revenue</b>, significantly greater than the \\$50.8K revenue given by a greedy algorithm.\n",
+    "* With a greedy algorithm, you will:\n",
     "   * be unable to focus on the correct customers (it will select fewer of them), \n",
     "   * spend less of the available budget for a smaller revenue.\n",
     "   * focus on selling savings accounts that have the biggest revenue\n",
     "\n",
     "### Marketing campaign analysis\n",
-    "* We need a <b>minimum of \\$16K</b> to be able to start a valid campaign and we expect it will generate \\$47.5K.\n",
+    "* You need a <b>minimum of \\$16K</b> to be able to start a valid campaign and you expect it will generate \\$47.5K.\n",
     "\n",
-    "* Due to the business constraints, we will be able to address <b>1680 customers maximum</b> using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
+    "* Due to the business constraints, you will be able to address <b>1680 customers maximum</b> using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
    ]
   },
   {