Improved English and fixed errors #3

Open · wants to merge 1 commit into base: master
99 changes: 49 additions & 50 deletions jupyter/MachineLearning_and_CPLEX.ipynb
@@ -8,23 +8,23 @@
"\n",
"In 2016, a retail bank sold several products (mortgage account, savings account, and pension account) to its customers.\n",
"It kept a record of all historical data, and this data is available for analysis and reuse.\n",
"Following a merger in 2017, the bank has new customers and wants to start some marketing campaigns. \n",
"Following a merger in 2017, the bank has new customers and wants to launch some marketing campaigns. \n",
"\n",
"The budget for the campaigns is limited. The bank wants to contact a customer and propose only one product.\n",
"\n",
"\n",
"The marketing department needs to decide:\n",
" * Who should be contacted?\n",
" * Which product should be proposed? Proposing too many products is counter productive, so only one product per customer contact.\n",
" * Which product should be proposed? (Proposing too many products is counterproductive, so only one product will be proposed per customer contact.)\n",
" * How will a customer be contacted? There are different ways, with different costs and efficiency.\n",
" * How can they optimally use the limited budget?\n",
" * How can they optimally use their limited budget?\n",
" * Will such campaigns be profitable?\n",
" \n",
"#### Predictive and prescriptive workflow\n",
"\n",
"From the historical data, we can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
"* We can apply this predictive model to the new customers data to predict for each new customer what they will buy.\n",
"* On this new data, we decide which offers are proposed. Which product is offered to which customer through which channel:\n",
"From the historical data, you can train a machine learning product-based classifier on customer profile (age, income, account level, ...) to predict whether a customer would subscribe to a mortgage, savings, or pension account.\n",
"* You can apply this predictive model to the new customer data to predict for each new customer what they will buy.\n",
"* With this new data, you decide which offers are proposed. Which product is offered to which customer through which channel is determined:\n",
" * a. with a greedy algorithm that reproduces what a human being would do\n",
" * b. using an optimization model with IBM Decision Optimization.\n",
"* The solutions can be displayed, compared, and analyzed.\n",
@@ -34,7 +34,7 @@
"\n",
"* [Understand the historical data](#Understanding-the-historical-data)\n",
"* [Predict the 2017 customer behavior](#Predict-the-2017-customer-behavior)\n",
"* [Get business decisions on the 2017 data](#Get-business-decisions-on-the-2017-data)\n",
"* [Get business decisions for the 2017 data](#Get-business-decisions-on-the-2017-data)\n",
"* [Conclusion on the decision making](#Conclusion)"
]
},
@@ -54,10 +54,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The purpose of this Notebook is not to provide a perfect machine learning model nor a perfect optimization model.\n",
"The purpose is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
"The purpose of this notebook is to show how easy it is to mix machine learning and CPLEX data transformations by doing a forecast, then getting fast and reliable decisions on this new data. \n",
"\n",
"This notebook takes some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
"This notebook can take some time to run because multiple optimization models are solved and compared in the part dedicated to what-if analysis. The time it takes depends on your subscription type, which determines what optimization service configuration is used."
]
},
{
@@ -394,16 +393,16 @@
"name": "stdout",
"output_type": "stream",
"text": [
"We have 1650 clients who bought several products\n",
"We have 123 clients who bought all the products\n"
"You have 1650 clients who bought several products\n",
"You have 123 clients who bought all the products\n"
]
}
],
"source": [
"abc = known_behaviors[known_behaviors.nb_products > 1]\n",
"print(\"We have %d clients who bought several products\" %len(abc))\n",
"print(\"You have %d clients who bought several products\" %len(abc))\n",
"abc = known_behaviors[known_behaviors.nb_products == 3]\n",
"print(\"We have %d clients who bought all the products\" %len(abc))"
"print(\"You have %d clients who bought all the products\" %len(abc))"
]
},
{
@@ -421,14 +420,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Do some visual analysis of the historical data"
"##### Provide some visual analysis of the historical data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's possible to use pandas plotting capabilities, but it would require a new version of it. This Notebook relies on matplotlib as it is present everywhere."
"It's possible to use pandas plotting capabilities, but that would require a new version of it. This notebook relies on matplotlib as it is commonly used."
]
},
{
@@ -489,9 +488,9 @@
"metadata": {},
"source": [
"### Understanding the 2016 customers\n",
"We can see that:\n",
" * The greater a customer's income, the more likely it is s/he will buy a savings account.\n",
" * The older a customer is, the more likely it is s/he will buy a pension account.\n",
"You can see that:\n",
" * The greater a customer's income, the more likely it is he or she will buy a savings account.\n",
" * The older a customer is, the more likely it is he or she will buy a pension account.\n",
" * There is a correlation between the number of people in a customer's household, the number of loan accounts held by the customer, and the likelihood a customer buys a mortgage account. To see the correlation, look at the upper right and lower left corners of the mortgage chart."
]
},
@@ -536,7 +535,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use the following columns as machine-learning features:"
"Use the following columns as machine-learning features:"
]
},
{
@@ -658,7 +657,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We use a standard basic support gradient boosting algorithm to predict whether a customer might by product A, B, or C."
"You are using a standard gradient boosting algorithm to predict whether a customer might buy product A, B, or C."
]
},
{
@@ -694,8 +693,8 @@
"source": [
"### New customer data and predictions\n",
"\n",
"Load new customer data, predict behaviors using trained classifier, and do some visual analysis.\n",
"We have all the characteristics of the new customers, as for the 2016 clients, but the new customers did not buy any product yet.\n",
"Load new customer data, predict behaviors using a trained classifier, and perform some visual analysis.\n",
"You have all the characteristics of the new customers, as for the 2016 clients, but the new customers have not yet bought any product.\n",
"\n",
"##### Load new customer data"
]
@@ -921,7 +920,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Do some visual analysis of the predicted data"
"##### Perform some visual analysis of the predicted data"
]
},
{
@@ -1002,28 +1001,28 @@
"name": "stdout",
"output_type": "stream",
"text": [
"We predicted that 112 clients would buy more than one product\n",
"We predicted that 0 clients would buy all three products\n"
"It's predicted that 112 clients would buy more than one product\n",
"It's predicted that 0 clients would buy all three products\n"
]
}
],
"source": [
"to_predict[\"nb_products\"] = to_predict.Mortgage + to_predict.Pension + to_predict.Savings\n",
"\n",
"abc = to_predict[to_predict.nb_products > 1]\n",
"print(\"We predicted that %d clients would buy more than one product\" %len(abc))\n",
"print(\"It's predicted that %d clients would buy more than one product\" %len(abc))\n",
"abc = to_predict[to_predict.nb_products == 3]\n",
"print(\"We predicted that %d clients would buy all three products\" %len(abc))"
"print(\"It's predicted that %d clients would buy all three products\" %len(abc))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remarks on the prediction\n",
"The goal is to contact the customers to sell them only one product, so we cannot select all of them.\n",
"This increases the complexity of the problem: we need to determine the best contact channel, but also need to select which product will be sold to a given customer. \n",
"It may be hard to compute this. In order to check, we will use two techniques:\n",
"The goal is to contact the customers to sell them only one product, so you cannot select all of them.\n",
"This increases the complexity of the problem: you need to determine the best contact channel, but also need to select which product will be sold to a given customer. \n",
"It may be hard to compute this. In order to check, you will use two techniques:\n",
" * a greedy algorithm\n",
" * CPLEX, the IBM leading optimization solver."
]
@@ -1043,12 +1042,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get business decisions on the 2017 data\n",
"# Get business decisions for the 2017 data\n",
"## Assign campaigns to customers\n",
"\n",
"* We have predicted who will buy what in the list of new customers.\n",
"* However, we do not have the budget to contact all of them. We have various contact channels with different costs and effectiveness.\n",
"* Furthermore, if we contact somebody, we don't want to frustrate them by proposing multiple products; we want to propose only one product per customer.\n",
"* You have a prediction of who will buy what in the list of new customers.\n",
"* However, you do not have the budget to contact all of them. You have various contact channels with different costs and effectiveness.\n",
"* Furthermore, if you contact a customer, you want to propose only one product per customer.\n",
"\n",
"##### Some input data for optimization\n"
]
@@ -1083,7 +1082,7 @@
"metadata": {},
"source": [
"#### Using a greedy algorithm\n",
"* We create a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
"* You are creating a custom algorithm that ensures 10% of offers are made per channel by choosing the most promising per channel. The algorithm then continues to add offers until the budget is reached."
]
},
{
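Review note on the hunk above: the greedy strategy it describes (reserve 10% of offers per channel for the most promising customers, then keep adding offers until the budget is reached) can be sketched as follows. This is a minimal illustration, not the notebook's implementation. The function name `greedy_campaign`, the input shapes, the channel names, costs, and success factors are all hypothetical, and filling the remaining budget through the cheapest channel is an assumption about the second phase.

```python
def greedy_campaign(best_offer, channels, budget, min_share=0.10):
    """Greedy sketch: reserve a share of offers per channel, then fill up.

    best_offer: {customer: (product, expected_revenue)}  # hypothetical input shape
    channels:   {channel: (cost, success_factor)}        # hypothetical input shape
    """
    # Rank customers once, most promising first.
    ranked = sorted(best_offer, key=lambda c: best_offer[c][1], reverse=True)
    quota = int(min_share * len(ranked))
    chosen = {}  # customer -> (product, channel); at most one product per customer
    spent = 0.0

    def try_assign(customer, channel):
        nonlocal spent
        cost = channels[channel][0]
        if customer in chosen or spent + cost > budget:
            return False
        chosen[customer] = (best_offer[customer][0], channel)
        spent += cost
        return True

    # Phase 1: give each channel its quota of the most promising customers.
    for channel in channels:
        made = 0
        for customer in ranked:
            if made >= quota:
                break
            if try_assign(customer, channel):
                made += 1

    # Phase 2 (assumption): spend what is left through the cheapest channel.
    cheapest = min(channels, key=lambda ch: channels[ch][0])
    for customer in ranked:
        try_assign(customer, cheapest)

    revenue = sum(best_offer[c][1] * channels[ch][1] for c, (_, ch) in chosen.items())
    return chosen, spent, revenue


# Hypothetical data: ten customers with expected revenues 100, 90, ..., 10.
best_offer = {f"c{i}": ("Savings", 100 - 10 * i) for i in range(10)}
channels = {"gift": (20, 0.20), "newsletter": (15, 0.05), "seminar": (23, 0.30)}
chosen, spent, revenue = greedy_campaign(best_offer, channels, budget=100)
print(len(chosen), spent, revenue)
```

Because each step only looks at the next most promising customer, the result depends on the order in which channels and customers are visited, which is why the notebook compares it with an optimization model.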
@@ -1251,7 +1250,7 @@
"source": [
"#### Using IBM Decision Optimization CPLEX Modeling for Python\n",
"\n",
"Let's create the optimization model to select the best ways to contact customers and stay within the limited budget."
"Create the optimization model to select the best ways to contact customers and stay within the limited budget."
]
},
{
@@ -1402,7 +1401,7 @@
"source": [
"##### Express the objective\n",
"\n",
"We want to maximize expected revenue, so we take into account the predicted behavior of each customer for each product."
"You want to maximize expected revenue, so you take into account the predicted behavior of each customer for each product."
]
},
{
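Review note on the hunk above: one way to write down the objective it describes, together with the budget and one-product-per-customer rules stated elsewhere in the notebook text. The notation is assumed for illustration and does not come from the notebook: $x_{c,p,m}$ is 1 if customer $c$ is offered product $p$ through channel $m$, $\hat{y}_{c,p}$ is the predicted behavior of customer $c$ for product $p$, $r_p$ is the revenue of product $p$, $f_m$ and $k_m$ are the effectiveness and cost of channel $m$, and $B$ is the budget.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{c}\sum_{p}\sum_{m} r_p \, f_m \, \hat{y}_{c,p} \, x_{c,p,m} \\
\text{s.t.} \quad & \sum_{p}\sum_{m} x_{c,p,m} \le 1 \qquad \forall c
  && \text{(at most one product per customer)} \\
& \sum_{c}\sum_{p}\sum_{m} k_m \, x_{c,p,m} \le B
  && \text{(limited budget)} \\
& x_{c,p,m} \in \{0, 1\}
\end{aligned}
```

Any further business constraints in the notebook's model (for example, per-channel minimums) would be added as extra linear constraints on the same variables.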
@@ -1580,7 +1579,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"With the mathematical optimization, we made a better selection of customers."
"With the mathematical optimization, you made a better selection of customers."
]
},
{
@@ -1871,7 +1870,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Due to the business constraints, we can address a maximum of 1680 customers with a \\$35615 budget.\n",
"Due to the business constraints, you can address a maximum of 1680 customers with a \\$35615 budget.\n",
"Any funds available above that amount won't be spent.\n",
"The expected revenue is \\$87.1K."
]
@@ -1881,8 +1880,8 @@
"metadata": {},
"source": [
"### Dealing with infeasibility\n",
"What about a context where we are in tight financial conditions, and our budget is very low?\n",
"We need to determine the minimum amount of budget needed to adress 1/20 of our customers."
"What about a context where you are in tight financial conditions and your budget is very low?\n",
"You need to determine the minimum budget needed to address 1/20 of your customers."
]
},
{
@@ -1933,7 +1932,7 @@
" #setting all bool vars to 0 is an easy relaxation, so let's refuse it and force to offer something to 1/20 of the clients\n",
" mdl.add_constraint(totaloffers >= len(offers)//20, ctname=\"high\")\n",
" \n",
" # solve has failed, we try relaxation, based on constraint names\n",
" # solve has failed, trying relaxation, based on constraint names\n",
" # constraints are prioritized according to their names\n",
" # if a name contains \"low\", it has priority LOW\n",
" # if a ct name contains \"medium\" it has priority MEDIUM\n",
@@ -1987,8 +1986,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We need a minimum of 15950\\$ to be able to start a marketing campaign.\n",
"With this minimal budget, we will be able to adress 825 possible clients."
"You need a minimum of \\$15950 to be able to start a marketing campaign.\n",
"With this minimal budget, you will be able to address 825 possible clients."
]
},
{
@@ -2005,17 +2004,17 @@
"| Greedy | 50800 | 1123 | 299 | 111 | 713 | 21700 |\n",
"| CPLEX | 72600 | 1218 | 381 | 117 | 691 | 25000 |\n",
"\n",
"* As you can see, with Decision Optimization, we can safely do this marketing campaign to contact <b>1218 customers</b> out of the 2756 customers. \n",
"* This will lead to a <b>\\$91.5K revenue</b>, significantly greater than the \\$49.5K revenue given by a greedy algorithm.\n",
"* With a greedy algorithm, we will:\n",
"* As you can see, with Decision Optimization, you can safely use this marketing campaign to contact <b>1218 customers</b> out of the 2756 customers. \n",
"* This will lead to a <b>\\$72.6K revenue</b>, significantly greater than the \\$50.8K revenue given by a greedy algorithm.\n",
"* With a greedy algorithm, you will:\n",
" * be unable to focus on the correct customers (it will select fewer of them), \n",
" * spend less of the available budget for a smaller revenue.\n",
" * focus on selling savings accounts that have the biggest revenue\n",
"\n",
"### Marketing campaign analysis\n",
"* We need a <b>minimum of \\$16K</b> to be able to start a valid campaign and we expect it will generate \\$47.5K.\n",
"* You need a <b>minimum of \\$16K</b> to be able to start a valid campaign and you expect it will generate \\$47.5K.\n",
"\n",
"* Due to the business constraints, we will be able to address <b>1680 customers maximum</b> using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
"* Due to the business constraints, you will be able to address <b>1680 customers maximum</b> using a budget of \\$36K. Any money above that amount won't be spent. The expected revenue is \\$87K.\n"
]
},
{