
Commit 9e01d99

Initial commit with usual files and w-i-p README, notebook...

* Saving a version of the notebook to create the model.
* W-I-P describing creating the streaming app in the Flow UI.

1 parent f764498 · commit 9e01d99

24 files changed: +16,002 −2 lines

ACKNOWLEDGEMENTS.md (5 additions, 0 deletions)

# Acknowledgements

* Leveraged a lot of work from Rich Hagarty's ibm-streams-with-kafka.
* Training data is from the old product recommender pattern.

CONTRIBUTING.md (16 additions, 0 deletions)

# Contributing

This is an open source project, and we appreciate your help!

We use the GitHub issue tracker to discuss new features and non-trivial bugs.

In addition to the issue tracker, [#journeys on Slack](https://dwopen.slack.com) is the best way to get in contact with the project's maintainers.

To contribute code, documentation, or tests, please submit a pull request to the GitHub repository. Generally, we expect two maintainers to review your pull request before it is approved for merging. For more details, see the [MAINTAINERS](MAINTAINERS.md) page.

Contributions are subject to the [Developer Certificate of Origin, Version 1.1](https://developercertificate.org/) and the [Apache License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt).

MAINTAINERS.md (69 additions, 0 deletions)

# Maintainers Guide

This guide is intended for maintainers - anybody with commit access to one or more Code Pattern repositories.

## Methodology

This repository does not have a traditional release management cycle, but should instead be maintained as a useful, working, and polished reference at all times. While all work can therefore be focused on the master branch, the quality of this branch should never be compromised.

The remainder of this document details how to merge pull requests to the repositories.

## Merge approval

The project maintainers use LGTM (Looks Good To Me) in comments on the pull request to indicate acceptance prior to merging. A change requires LGTMs from two project maintainers. If the code is written by a maintainer, the change only requires one additional LGTM.

## Reviewing Pull Requests

We recommend reviewing pull requests directly within GitHub. This allows a public commentary on changes, providing transparency for all users. When providing feedback be civil, courteous, and kind. Disagreement is fine, so long as the discourse is carried out politely. If we see a record of uncivil or abusive comments, we will revoke your commit privileges and invite you to leave the project.

During your review, consider the following points:

### Does the change have positive impact?

Some proposed changes may not represent a positive impact to the project. Ask whether or not the change will make understanding the code easier, or if it could simply be a personal preference on the part of the author (see [bikeshedding](https://en.wiktionary.org/wiki/bikeshedding)).

Pull requests that do not have a clear positive impact should be closed without merging.

### Do the changes make sense?

If you do not understand what the changes are or what they accomplish, ask the author for clarification. Ask the author to add comments and/or clarify test case names to make the intentions clear.

At times, such clarification will reveal that the author may not be using the code correctly, or is unaware of features that accommodate their needs. If you feel this is the case, work up a code sample that would address the pull request for them, and feel free to close the pull request once they confirm.

### Does the change introduce a new feature?

For any given pull request, ask yourself "is this a new feature?" If so, does the pull request (or associated issue) contain narrative indicating the need for the feature? If not, ask them to provide that information.

Are new unit tests in place that test all new behaviors introduced? If not, do not merge the feature until they are! Is documentation in place for the new feature? (See the documentation guidelines.) If not, do not merge the feature until it is! Is the feature necessary for general use cases? Try to keep the scope of any given component narrow. If a proposed feature does not fit that scope, recommend to the user that they maintain the feature on their own, and close the request. You may also recommend that they see if the feature gains traction among other users, and suggest they re-submit when they can show such support.

README.md (257 additions, 2 deletions)

The original two-line README ("# ibm-streams-with-ml-model" / "Use a machine learning model in an IBM Streams application") is replaced with the following:

*** WORK-IN-PROGRESS ***

# Score streaming data with a machine learning model

In this code pattern, we will be streaming online shopping data and using the data to track the products that each customer has added to their cart. We will build a k-means clustering model with scikit-learn to group customers according to the contents of their shopping carts. The cluster assignment can be used to predict additional products to recommend.

Our application will be built using IBM Streams on IBM Cloud Pak for Data. IBM Streams provides a built-in IDE, called **Streams flows**, that allows developers to visually create their streaming application. The Cloud Pak for Data platform provides additional support, such as integration with multiple data sources, built-in analytics, Jupyter notebooks, and machine learning.

To build and deploy our machine learning model, we will use a Jupyter notebook in Watson Studio and a Watson Machine Learning instance. In our examples, both are running on IBM Cloud Pak for Data.

Using the Streams Flows editor, we will create a streaming application with the following operators:

* A `Source` operator that generates sample clickstream data
* A `Filter` operator that keeps only the "add to cart" events
* A `Code` operator where we use Python code to arrange the shopping cart items into an input array for scoring
* A `WML Deployment` operator to assign the customer to a cluster
* A `Debug` operator to demonstrate the results

## Flow

![architecture](doc/source/images/architecture.png)

1. User builds and deploys a machine learning model
1. User creates and runs an IBM Streams application
1. The Streams Flow UI shows streaming, filtering, and scoring in action

## Prerequisites

* IBM Cloud Pak for Data with Watson Studio, Watson Machine Learning, IBM Streams, and Streams Flows.
## Steps

1. [Verify access to your IBM Streams instance on Cloud Pak for Data](#1-verify-access-to-your-ibm-streams-instance-on-cloud-pak-for-data)
1. [Create a new project in Cloud Pak for Data](#2-create-a-new-project-in-cloud-pak-for-data)
1. [Build and store a model](#3-build-and-store-a-model)
1. [Associate the deployment space with the project](#4-associate-the-deployment-space-with-the-project)
1. [Deploy the model](#5-deploy-the-model)
1. [Create a Streams Flow in Cloud Pak for Data](#6-create-a-streams-flow-in-cloud-pak-for-data)
1. [Create a Streams Flow with Kafka as source](#7-create-a-streams-flow-with-kafka-as-source)
1. [Use Streams Flows option to generate a notebook](#8-use-streams-flows-option-to-generate-a-notebook)
1. [Run the generated Streams Flow notebook](#9-run-the-generated-streams-flow-notebook)
### 1. Verify access to your IBM Streams instance on Cloud Pak for Data

Once you log in to your `Cloud Pak for Data` instance, ensure that your administrator has provisioned an instance of `IBM Streams` and has given your user access to the instance.

To see the available services, click on the `Services` icon. Search for `Streams`. You should see an `Enabled` indicator for Streams. `Watson Studio` and `Watson Machine Learning` also need to be enabled to build and deploy the model.

![catalog_streams.png](doc/source/images/catalog_streams.png)

To see your provisioned instances, click on the (☰) menu icon in the top left corner of your screen and click `My Instances`.

![cpd-streams-instance](doc/source/images/cpd-streams-instance.png)

### 2. Create a new project in Cloud Pak for Data

Click on the (☰) menu icon in the top left corner of your screen and click `Projects`.

Click on `New project +`. Then select `Create an empty project` and enter a unique name for your project. Click on `Create` to complete your project creation.

![new-project](doc/source/images/new-project.png)

### 3. Build and store a model

We will build a model using a Jupyter notebook and scikit-learn. We're using k-means clustering to group customers based on the contents of their shopping carts. Later, we will use that model to predict which group a customer is most likely to fall into so that we can anticipate additional products to recommend.

Once we have built and stored the model, it will be available for deployment so that it can be used in our streaming application.

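
The notebook contains the complete, working code. Purely to illustrate the idea, here is a minimal sketch, using a made-up four-product catalog and an assumed cluster count, of fitting scikit-learn's `KMeans` on one-hot shopping-cart vectors and assigning a new cart to a cluster:

```python
# Minimal sketch of the modeling idea (not the notebook itself).
# Each row is a one-hot shopping cart: 1 if that product column is in the cart.
import numpy as np
from sklearn.cluster import KMeans

carts = np.array([
    [1, 1, 0, 0],  # hypothetical 4-product catalog for illustration;
    [1, 0, 0, 1],  # the real model uses the full list of product columns
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])

# The cluster count here is an assumption; the notebook chooses its own value.
model = KMeans(n_clusters=2, random_state=42).fit(carts)

# The cluster assignment for a new cart is what drives the recommendations.
print(model.predict([[1, 1, 1, 0]]))
```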
#### Import the notebook into your project

From the project `Assets` tab, click `Add to project +` on the top right and choose the `Notebook` asset type.

![add-notebook.png](doc/source/images/add-notebook.png)

Fill in the following information:

* Select the `From URL` tab. [1]
* Enter a `Name` for the notebook and optionally a description. [2]
* For `Select runtime`, select the `Default Python 3.6` option. [3]
* Under `Notebook URL`, provide the following URL [4]:

```url
https://raw.githubusercontent.com/IBM/ibm-streams-with-ml-model/master/notebooks/shopping_cart_kmeans_cluster_model.ipynb
```

![new-notebook](doc/source/images/new-notebook.png)

Click the `Create notebook` button.

#### Edit the notebook

When you import the notebook, you will be put in edit mode. Before running the notebook, you need to configure one thing. Edit the `WML Credentials` cell to set the `url` to the URL you use to access Cloud Pak for Data.

![wml_creds.png](doc/source/images/wml_creds.png)

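
The exact cell contents come from the notebook, but a WML credentials cell for Cloud Pak for Data typically looks something like the sketch below. The field names and the `version` value are assumptions based on common `ibm_watson_machine_learning` client usage; only the placeholder values should need editing.

```python
# Hypothetical example of a WML credentials cell for Cloud Pak for Data;
# the url must be the one you use to reach your Cloud Pak for Data cluster.
from ibm_watson_machine_learning import APIClient

wml_credentials = {
    "url": "https://<your-cloud-pak-for-data-url>",  # <-- edit this
    "username": "<your-username>",
    "password": "<your-password>",
    "instance_id": "openshift",
    "version": "3.5",  # your Cloud Pak for Data version
}

client = APIClient(wml_credentials)
```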
#### Run the notebook

Select `Cell > Run All` to run the notebook. If you prefer, you can use the `Run` button to run the cells one at a time. The notebook contains additional details about what it is doing. Here we are focusing on using IBM Streams with the resulting ML model, so we'll continue on once the notebook has completed.

### 4. Associate the deployment space with the project

The notebook created a deployment space named `streams_ml_deployment_space` and stored the `Shopping Cart k-means Model` there.

Inside your new project, select the `Settings` tab and click on `Associate a deployment space +`.

![deployment_space.png](doc/source/images/deployment_space.png)

* Use the `Existing` tab
* Select the `streams_ml_deployment_space` which was just created
* Click `Associate`

### 5. Deploy the model

* In your project, click on the newly associated deployment space named `streams_ml_deployment_space`.
* Select the `Assets` tab and click on the model named `Shopping Cart k-means Model`.
* Click on the `Create deployment` button.
* Select `Online`, provide a deployment name, and click `Create`.

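
If you prefer scripting to the UI, an equivalent online deployment can usually be created with the WML Python client. This is only a sketch: it assumes the `client` object from the credentials cell, and the model UID and deployment name are placeholders.

```python
# Hypothetical alternative to the UI steps above, using the WML Python client.
# Assumes `client` was created as in the credentials example and the space exists.
space_id = next(s["metadata"]["id"]
                for s in client.spaces.get_details()["resources"]
                if s["entity"]["name"] == "streams_ml_deployment_space")
client.set_default_space(space_id)

model_uid = "<uid of the stored 'Shopping Cart k-means Model'>"  # placeholder
deployment = client.deployments.create(
    model_uid,
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "shopping-cart-kmeans-online",
        client.deployments.ConfigurationMetaNames.ONLINE: {},
    },
)
```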
### 6. Create a Streams Flow in Cloud Pak for Data

From the project panel, click the `Add to project +` button. Choose the `Streams flow` tile from the list of options.

![add-streams-flow-type](doc/source/images/add-streams-flow-type.png)

Provide a unique name for the flow, and select the create `Manually` option. Then click `Create` to create the flow.

![create-streams-flow](doc/source/images/create_streams_flow.png)

Once created, it will be displayed in the `Streams flow` editor.

![blank-streams-flow](doc/source/images/blank-streams-flow.png)

On the left are the operators that can be dragged and dropped onto the editor canvas. The operators are grouped by type. For the purposes of this code pattern, we will use `Sources`, `Targets`, `Processing and Analytics`, and `WML Deployments`.

In the main icon bar at the top, you will see the options to `run` and `save` the flow.

#### Add sample clickstream data as data source

From the `Sources` list, select and drag the `Sample Data` operator onto the canvas.

![add-sample-data-source](doc/source/images/add-sample-data-source.png)

Click on the canvas object to see its associated properties. From the list of available data types in the `Topic` drop-down list, select `Clickstream`.

![set-sample-data-topic](doc/source/images/set-sample-data-topic.png)

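
The sample `Clickstream` topic provides tuples with several attributes; the ones this flow relies on are `customer_id`, `click_event_type`, and `product_name`. A single event therefore looks roughly like the following (the values are invented, and the topic includes other attributes not shown):

```python
# Illustrative shape of one sample clickstream event (values are made up)
event = {
    "customer_id": 12345,
    "click_event_type": "add_to_cart",
    "product_name": "Diapers",
    # ...plus additional attributes provided by the sample data topic
}
```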
#### Add a Filter operator

From the `Processing and Analytics` list, select and drag the `Filter` operator onto the canvas.

* To connect the operators, click on the right-hand circle icon located on the source operator, and drag your mouse to the Filter operator. This should result in a line connecting the two operators.

* Click on the Filter operator to see its associated properties. Set the `Condition Expression` to `click_event_type == 'add_to_cart'`. This will reduce the data stream to only the add_to_cart events.

![add-filter-target](doc/source/images/add-filter-target.png)

#### Notice error indicators

As you've probably already noticed, red error indicators tell you when an operator's required settings are missing. You will also see that some operators require a source and/or target connection.

When an operator has a red spot on it, you can hover over it to see what the errors are.

#### Add a Code operator

* From the `Processing and Analytics` list, select and drag the `Code` operator onto the canvas.
* Connect the `Filter` operator's target to this `Code` operator's source (using drag-and-drop as we did earlier).
* Click on the Code operator to see its associated properties. Select `Python 3.6` as the `Coding Language`.
* In the `Code` property, paste in the following code:
```python
# YOU MUST EDIT THE SCHEMA and add all attributes that you are returning as output.
#
# Preinstalled Python packages can be viewed from the Settings pane.
# In the Settings pane you can also install additional Python packages.

import sys
import logging

# Use this logger for tracing or debugging your code:
logger = logging.getLogger(__name__)
# Example:
# logger.info('Got to step 2...')

# init() function will be called once on flow initialization
# @state a Python dictionary object for keeping state. The state object is passed to the process function
def init(state):
    # The product columns, in the order the model expects them
    state['keep_columns'] = ['Baby Food','Diapers','Formula','Lotion','Baby wash','Wipes','Fresh Fruits','Fresh Vegetables','Beer','Wine','Club Soda','Sports Drink','Chips','Popcorn','Oatmeal','Medicines','Canned Foods','Cigarettes','Cheese','Cleaning Products','Condiments','Frozen Foods','Kitchen Items','Meat','Office Supplies','Personal Care','Pet Supplies','Sea Food','Spices']
    # One slot per product column, all empty to start
    state['empty_cart'] = [0] * len(state['keep_columns'])
    # Per-customer carts, keyed by customer_id
    state['customer_carts'] = {}

# process() function will be invoked on every event tuple
# @event a Python dictionary object representing the input event tuple as defined by the input schema
# @state a Python dictionary object for keeping state over subsequent function calls
# return must be a Python dictionary object. It will be the output of this operator.
# Returning None results in not submitting an output tuple for this invocation.
# You must declare all output attributes in the Edit Schema window.
def process(event, state):
    logger.info(event)
    customer_id = event['customer_id']
    product_name = event['product_name']
    # list.index() raises ValueError for unknown products, so guard with a membership test
    if product_name not in state['keep_columns']:
        return None
    product_index = state['keep_columns'].index(product_name)
    try:
        cart = state['customer_carts'][customer_id]
    except KeyError:
        # Copy the template so each customer gets an independent cart
        cart = list(state['empty_cart'])
    cart[product_index] = 1
    state['customer_carts'][customer_id] = cart
    return { 'customer_id': customer_id, 'cart_list': cart }
```
* Edit the `Output Schema`, set it as follows, and click `Apply`:

| Attribute Name | Type |
| --- | --- |
| customer_id | Number |
| cart_list | Text |

The output now contains just the customer_id and an array indicating which products are in the cart. This is the format we need to pass to our model.

> Notice: We used the Code operator specifically to arrange our shopping cart data for scoring, but if you take another look at the Code operator you'll see that it is a very powerful operator where you can put in Python code to do whatever manipulation you need in your streaming application.

#### Add a WML Deployment

* From the `WML Deployments` list, select your deployment and drag it onto the canvas.
* Connect the target end of the `Code` operator with the source end of the WML Deployment.
* Click on the deployment operator to edit the settings.
* Under `DEPLOYMENT INPUT PARAMETERS`, set `input_cart (array)` to `cart_list`. This maps our Code output array to the expected input parameter for prediction.
* Edit the `Output schema`:
  * Click `Add attributes from incoming schema`.
  * Click `Add attribute +`.
  * Set `Attribute Name` to `prediction`.
  * Set `Type` to `Number`.
  * Set `Model Field` to `prediction`.

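
As a sanity check of this mapping, you can also score the deployment directly with the WML Python client, outside of the flow. The sketch below assumes the standard WML online-scoring payload shape and the `prediction` output field named above; the deployment ID and cart values are placeholders.

```python
# Hypothetical direct scoring call to verify the deployment's input/output shape.
deployment_id = "<your deployment id>"  # placeholder

# One one-hot cart, in the same product-column order used by the Code operator.
cart_list = [0] * 29
cart_list[1] = 1  # e.g. 'Diapers'

payload = {"input_data": [{"fields": ["input_cart"], "values": [cart_list]}]}
response = client.deployments.score(deployment_id, payload)

# The cluster assignment comes back as the 'prediction' field of the response.
print(response["predictions"][0])
```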
#### Add a Debug operator as the target

For simplicity, we will assign a `Debug` operator as the target of our WML Deployment.

From the `Targets` list, select and drag the `Debug` operator onto the canvas, and then connect the two objects together.

<!-- ![add-debug-target](doc/source/images/add-debug-target.png) -->

#### Save and run the streams flow

Click the `Save and run` icon to start your flow. This will result in a new panel being displayed that shows real-time metrics. What is displayed is a live data stream. If you click on the stream between any two operator nodes, you can see the actual data, either in a table view or in JSON format.

![streams-flow-show-data](doc/source/images/streams-flow-show-data.png)

Use the `Stop` icon to stop the flow, and the `Pencil` icon to return to the flow editor.
## License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the [Developer Certificate of Origin, Version 1.1](https://developercertificate.org/) and the [Apache License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt).

[Apache License FAQ](https://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN)
