*** WORK-IN-PROGRESS ***
# Score streaming data with a machine learning model

In this code pattern, we will stream online shopping data and use it to track the products that each customer has added to their cart. We will build a k-means clustering model with scikit-learn to group customers according to the contents of their shopping carts. The cluster assignment can then be used to predict additional products to recommend.

Our application will be built using IBM Streams on IBM Cloud Pak for Data. IBM Streams provides a built-in IDE, called **Streams Flows**, that allows developers to visually create their streaming application. The Cloud Pak for Data platform provides additional support, such as integration with multiple data sources, built-in analytics, Jupyter notebooks, and machine learning.

To build and deploy our machine learning model, we will use a Jupyter notebook in Watson Studio and a Watson Machine Learning instance. In our examples, both are running on IBM Cloud Pak for Data.

Using the Streams Flows editor, we will create a streaming application with the following operators:

* A `Source` operator that generates sample clickstream data
* A `Filter` operator that keeps only the "add to cart" events
* A `Code` operator where we use Python code to arrange the shopping cart items into an input array for scoring
* A `WML Deployment` operator to assign the customer to a cluster
* A `Debug` operator to display the results

## Flow

1. User builds and deploys a machine learning model
1. User creates and runs an IBM Streams application
1. The Streams Flows UI shows streaming, filtering, and scoring in action

## Prerequisites

* IBM Cloud Pak for Data with Watson Studio, Watson Machine Learning, IBM Streams, and Streams Flows.

## Steps

1. [Verify access to your IBM Streams instance on Cloud Pak for Data](#1-verify-access-to-your-ibm-streams-instance-on-cloud-pak-for-data)
1. [Create a new project in Cloud Pak for Data](#2-create-a-new-project-in-cloud-pak-for-data)
1. [Build and store a model](#3-build-and-store-a-model)
1. [Associate the deployment space with the project](#4-associate-the-deployment-space-with-the-project)
1. [Deploy the model](#5-deploy-the-model)
1. [Create a Streams Flow](#6-create-a-streams-flow)

### 1. Verify access to your IBM Streams instance on Cloud Pak for Data

Once you log in to your `Cloud Pak for Data` instance, ensure that your administrator has provisioned an instance of `IBM Streams` and has given your user access to the instance.

To see the available services, click on the `Services` icon. Search for `Streams`. You should see an `Enabled` indicator for Streams. `Watson Studio` and `Watson Machine Learning` also need to be enabled to build and deploy the model.

To see your provisioned instances, click on the (`☰`) menu icon in the top left corner of your screen and click `My Instances`.

### 2. Create a new project in Cloud Pak for Data

Click on the (`☰`) menu icon in the top left corner of your screen and click `Projects`.

Click on `New project +`. Then select `Create an empty project` and enter a unique name for your project. Click on `Create` to complete your project creation.

### 3. Build and store a model

We will build a model using a Jupyter notebook and scikit-learn. We're using k-means clustering to group customers based on the contents of their shopping carts. Later, we will use that model to predict which group a customer is most likely to belong to, so that we can anticipate additional products to recommend.

Once we have built and stored the model, it will be available for deployment so that it can be used in our streaming application.
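
For context, the heart of the notebook's clustering step looks roughly like the sketch below. The synthetic cart data, `n_clusters=5`, and variable names here are illustrative assumptions; the notebook trains on the pattern's real shopping data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in training data: 500 carts, one 0/1 slot per tracked product
# (29 products, matching the keep_columns list used later in the Code operator).
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(500, 29))

# Group the carts into clusters; the cluster count is a tunable choice.
model = KMeans(n_clusters=5, random_state=42)
model.fit(X)

# predict() assigns any new cart vector to its nearest cluster.
new_cart = np.zeros((1, 29), dtype=int)
new_cart[0, [1, 2, 5]] = 1  # e.g. Diapers, Formula, Wipes
print(model.predict(new_cart))  # -> array with one cluster label
```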

#### Import the notebook into your project

From the project `Assets` tab, click `Add to project +` on the top right and choose the `Notebook` asset type.

Fill in the following information:

* Select the `From URL` tab. [1]
* Enter a `Name` for the notebook and optionally a description. [2]
* For `Select runtime`, select the `Default Python 3.6` option. [3]
* Under `Notebook URL`, provide the following URL [4]:
  ```url
  https://raw.githubusercontent.com/IBM/ibm-streams-with-ml-model/master/notebooks/shopping_cart_kmeans_cluster_model.ipynb
  ```

Click the `Create notebook` button.

#### Edit the notebook

When you import the notebook, you will be put in edit mode. Before running the notebook, you need to configure one thing. Edit the `WML Credentials` cell to set the `url` to the URL you use to access Cloud Pak for Data.

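For reference, a `WML Credentials` cell on Cloud Pak for Data typically looks something like this sketch. All values are placeholders, and the exact keys can vary with your Cloud Pak for Data version.

```python
# Placeholder credentials: substitute the values for your own cluster.
wml_credentials = {
    "url": "https://<your-cloud-pak-for-data-host>",  # set this to your CP4D URL
    "username": "<username>",
    "password": "<password>",
    "instance_id": "openshift",
    "version": "3.5",  # your Cloud Pak for Data version
}
```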

#### Run the notebook

Select `Cell > Run All` to run the notebook. If you prefer, you can use the `Run` button to run the cells one at a time. The notebook contains additional details about what it is doing. Here we are focusing on using IBM Streams with the resulting ML model, so we'll continue on once the notebook has completed.

### 4. Associate the deployment space with the project

The notebook created a deployment space named `streams_ml_deployment_space` and stored the `Shopping Cart k-means Model` model there.

Inside your new project, select the `Settings` tab and click on `Associate a deployment space +`.

* Select the `Existing` tab
* Select the `streams_ml_deployment_space` space which was just created
* Click `Associate`

### 5. Deploy the model

* In your project, click on the newly associated deployment space named `streams_ml_deployment_space`.
* Select the `Assets` tab and click on the model named `Shopping Cart k-means Model`.
* Click on the `Create deployment` button.
* Select `Online`, provide a deployment name, and click `Create`. (A quick way to sanity-check the deployment is sketched below.)
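
An online deployment exposes a scoring endpoint. Our Streams flow will call it for us, but here is a hedged sketch of testing it with the WML Python client; the space and deployment IDs are placeholders, and the payload shape assumes the standard WML `input_data` format.

```python
from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)             # credentials from the notebook step
client.set_default_space("<deployment-space-id>")

# One cart: a 29-slot 0/1 vector, the same shape the model was trained on.
payload = {"input_data": [{"values": [[0, 1, 1, 0, 0, 1] + [0] * 23]}]}
result = client.deployments.score("<deployment-id>", payload)
print(result)  # expected shape: {'predictions': [{'fields': [...], 'values': [[<cluster>]]}]}
```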

### 6. Create a Streams Flow

From the project panel, click the `Add to project +` button. Choose the `Streams flow` tile from the list of options.

Provide a unique name for the flow, and select the `Manually` option. Then click `Create` to create the flow.

Once created, it will be displayed in the Streams Flows editor.

On the left are the operators that can be dragged and dropped onto the editor canvas. The operators are grouped by type. For the purposes of this code pattern, we will use `Sources`, `Targets`, `Processing and Analytics`, and `WML Deployments`.

In the main icon bar at the top, you will see the options to `run` and `save` the flow.

#### Add sample clickstream data as the data source

From the `Sources` list, select and drag the `Sample Data` operator onto the canvas.

Click on the canvas object to see its associated properties. From the list of available data types in the `Topic` drop-down list, select `Clickstream`.

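Each clickstream tuple carries the attributes our flow relies on. A representative event might look like the following sketch; only `customer_id`, `click_event_type`, and `product_name` are actually referenced later, and the values shown are illustrative.

```python
# A representative clickstream tuple (values are illustrative).
event = {
    "customer_id": 1234,
    "click_event_type": "add_to_cart",  # the Filter operator keeps only these
    "product_name": "Diapers",          # looked up in keep_columns by the Code operator
}
```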

#### Add a Filter operator

From the `Processing and Analytics` list, select and drag the `Filter` operator onto the canvas.

* To connect the operators, click on the right-hand circle icon located on the source operator, and drag your mouse to the Filter operator. This should result in a line connecting the two operators.

* Click on the Filter operator to see its associated properties. Set the `Condition Expression` to `click_event_type == 'add_to_cart'`. This will reduce the data stream to only the `add_to_cart` events.


#### Notice error indicators

As you've probably already noticed, red error indicators tell you when required settings are missing. You will also see that some operators require a source and/or target connection.

When an operator has a red spot on it, you can hover over it to see what the errors are.

#### Add a Code operator

* From the `Processing and Analytics` list, select and drag the `Code` operator onto the canvas
* Connect the `Filter` operator's target to this `Code` operator's source (using drag-and-drop like we did earlier)
* Click on the Code operator to see its associated properties. Select `Python 3.6` as the `Coding Language`.
* In the `Code` property, paste in the following code:
  ```python
  # YOU MUST EDIT THE SCHEMA and add all attributes that you are returning as output.
  #
  # Preinstalled Python packages can be viewed from the Settings pane.
  # In the Settings pane you can also install additional Python packages.

  import logging

  # Use this logger for tracing or debugging your code:
  logger = logging.getLogger(__name__)
  # Example:
  # logger.info('Got to step 2...')

  # init() function will be called once on flow initialization
  # @state a Python dictionary object for keeping state. The state object is passed to the process function
  def init(state):
      # do something once on flow initialization and save in the state object
      state['keep_columns'] = ['Baby Food','Diapers','Formula','Lotion','Baby wash','Wipes','Fresh Fruits','Fresh Vegetables','Beer','Wine','Club Soda','Sports Drink','Chips','Popcorn','Oatmeal','Medicines','Canned Foods','Cigarettes','Cheese','Cleaning Products','Condiments','Frozen Foods','Kitchen Items','Meat','Office Supplies','Personal Care','Pet Supplies','Sea Food','Spices']
      state['empty_cart'] = [0] * len(state['keep_columns'])
      state['customer_carts'] = {}

  # process() function will be invoked on every event tuple
  # @event a Python dictionary object representing the input event tuple as defined by the input schema
  # @state a Python dictionary object for keeping state over subsequent function calls
  # return must be a Python dictionary object. It will be the output of this operator.
  # Returning None results in not submitting an output tuple for this invocation.
  # You must declare all output attributes in the Edit Schema window.
  def process(event, state):
      logger.info(event)
      customer_id = event['customer_id']
      try:
          product_index = state['keep_columns'].index(event['product_name'])
      except ValueError:
          return None  # ignore products that we are not tracking
      # Copy the empty cart template so we never mutate the shared list in state
      cart = state['customer_carts'].get(customer_id, list(state['empty_cart']))
      cart[product_index] = 1
      state['customer_carts'][customer_id] = cart
      return { 'customer_id': customer_id, 'cart_list': cart }
  ```
* Edit the `Output Schema`, set it as follows, and click `Apply`:
  | Attribute Name | Type |
  | --- | --- |
  | customer_id | Number |
  | cart_list | Text |

The output now contains just the `customer_id` and an array indicating which products are in the cart. This is the format we need to pass to our model.
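
For example, an output tuple from the Code operator would look something like this (values illustrative):

```python
# One output tuple: customer 1234 has Diapers, Formula, and Wipes in the cart.
{
    'customer_id': 1234,
    'cart_list': [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
```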

> Notice: We used the Code operator specifically to arrange our shopping cart data for scoring, but if you take another look at it, you'll see that the Code operator is very powerful: you can use Python code to do whatever manipulation you need in your streaming application.

#### Add a WML Deployment

* From the `WML Deployments` list, select your deployment and drag it onto the canvas.
* Connect the target end of the `Code` operator with the source end of the WML Deployment.
* Click on the deployment operator to edit the settings.
* Under `DEPLOYMENT INPUT PARAMETERS`, set `input_cart (array)` to `cart_list`. This maps our Code output array to the expected input parameter for prediction.
* Edit the `Output schema`.
* Click `Add attributes from incoming schema`.
* Click `Add attribute +`.
  * Set `Attribute Name` to `prediction`.
  * Set `Type` to `Number`.
  * Set `Model Field` to `prediction`. The resulting tuple is sketched below.
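
With this schema, each tuple that leaves the WML Deployment operator keeps the incoming attributes and gains the model's cluster assignment, roughly like this (values illustrative):

```python
# The incoming attributes plus the model's cluster assignment.
{
    'customer_id': 1234,
    'cart_list': [0, 1, 1, 0, 0, 1, 0, ...],  # truncated for readability
    'prediction': 3,                          # cluster assigned by the k-means model
}
```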

#### Add Debug operator as target

For simplicity, we will assign a `Debug` operator as the target of our WML Deployment.

From the `Targets` list, select and drag the `Debug` operator onto the canvas, and then connect the two objects together.


#### Save and run the flow

Click the `Save and run` icon to start your flow. This will result in a new panel being displayed that shows real-time metrics. What is displayed is a live data stream. If you click on the stream between any two operator nodes, you can see the actual data, in a table view or in JSON format.

Use the `Stop` icon to stop the flow, and the `Pencil` icon to return to the flow editor.

## License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the [Developer Certificate of Origin, Version 1.1](https://developercertificate.org/) and the [Apache License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt).

[Apache License FAQ](https://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN)