This repository was archived by the owner on Jul 3, 2024. It is now read-only.

Commit 7326d1d

Merge pull request #7 from IBM/results
Fix code and improve the results and explanation.
2 parents c79f3f7 + 5dd7ece commit 7326d1d

File tree

3 files changed: +55 −52 lines

README.md

Lines changed: 55 additions & 52 deletions
```diff
@@ -86,7 +86,7 @@ Select `Cell > Run All` to run the notebook. If you prefer, you can use the `Run
 
 The notebook created a deployment space named `ibm_streams_with_ml_model_deployment_space` and stored the model there.
 
-Inside your new project, select the `Settings` tab and click on `Associate a deployment space +`.
+* Inside your new project, select the `Settings` tab and click on `Associate a deployment space +`.
 
 ![deployment_space.png](doc/source/images/deployment_space.png)
```
```diff
@@ -129,7 +129,7 @@ In the main icon bar at the top, you will see the options to `run` and `save` th
 
 #### Add sample clickstream data as data source
 
-From the `Sources` list, select and drag the `Sample Data` operator onto the canvas.
+* From the `Sources` list, select and drag the `Sample Data` operator onto the canvas.
 
 ![add-sample-data-source](doc/source/images/add-sample-data-source.png)
```

````diff
@@ -158,48 +158,47 @@ From the `Processing and Analytics` list, select and drag the `Filter` operator
 * If you have a `Code Style` pulldown, select `Function`.
 * In the `Code` property, paste in the following code:
 
-```python
-#
-# Preinstalled Python packages can be viewed from the Settings pane.
-# In the Settings pane you can also install additional Python packages.
-
-import sys
-import logging
-
-# Use this logger for tracing or debugging your code:
-logger = logging.getLogger(__name__)
-# Example:
-# logger.info('Got to step 2...')
-
-# init() function will be called once on flow initialization
-# @state a Python dictionary object for keeping state. The state object is passed to the process function
-def init(state):
-    # do something once on flow initialization and save in the state object
-    state['keep_columns'] = ['Baby Food','Diapers','Formula','Lotion','Baby wash','Wipes','Fresh Fruits','Fresh Vegetables','Beer','Wine','Club Soda','Sports Drink','Chips','Popcorn','Oatmeal','Medicines','Canned Foods','Cigarettes','Cheese','Cleaning Products','Condiments','Frozen Foods','Kitchen Items','Meat','Office Supplies','Personal Care','Pet Supplies','Sea Food','Spices']
-    state['empty_cart'] = [0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5]
-    state['customer_carts'] = {}
-    pass
-
-# process() function will be invoked on every event tuple
-# @event a Python dictionary object representing the input event tuple as defined by the input schema
-# @state a Python dictionary object for keeping state over subsequent function calls
-# return must be a Python dictionary object. It will be the output of this operator.
-# Returning None results in not submitting an output tuple for this invocation.
-# You must declare all output attributes in the Edit Schema window.
-def process(event, state):
-    # Enrich the event, such as by:
-    # event['wordCount'] = len(event['phrase'].split())
-    logger.info(event)
-    customer_id = event['customer_id']
-    product_index = state['keep_columns'].index(event['product_name'])
-    try:
-        cart = state['customer_carts'][event['customer_id']]
-    except KeyError:
-        cart = state['empty_cart']
-    if product_index > -1:
-        cart[product_index] = 1
-    state['customer_carts'][event['customer_id']] = cart
-    return { 'customer_id': customer_id, 'cart_list': cart }
+```python
+# Preinstalled Python packages can be viewed from the Settings pane.
+# In the Settings pane you can also install additional Python packages.
+
+import sys
+import logging
+
+# Use this logger for tracing or debugging your code:
+logger = logging.getLogger(__name__)
+# Example:
+# logger.info('Got to step 2...')
+
+# init() function will be called once on flow initialization
+# @state a Python dictionary object for keeping state. The state object is passed to the process function
+def init(state):
+    # do something once on flow initialization and save in the state object
+    state['keep_columns'] = ['Baby Food','Diapers','Formula','Lotion','Baby wash','Wipes','Fresh Fruits','Fresh Vegetables','Beer','Wine','Club Soda','Sports Drink','Chips','Popcorn','Oatmeal','Medicines','Canned Foods','Cigarettes','Cheese','Cleaning Products','Condiments','Frozen Foods','Kitchen Items','Meat','Office Supplies','Personal Care','Pet Supplies','Sea Food','Spices']
+    state['num_columns'] = len(state['keep_columns'])
+    state['customer_carts'] = {}
+    pass
+
+# process() function will be invoked on every event tuple
+# @event a Python dictionary object representing the input event tuple as defined by the input schema
+# @state a Python dictionary object for keeping state over subsequent function calls
+# return must be a Python dictionary object. It will be the output of this operator.
+# Returning None results in not submitting an output tuple for this invocation.
+# You must declare all output attributes in the Edit Schema window.
+def process(event, state):
+    # Enrich the event, such as by:
+    # event['wordCount'] = len(event['phrase'].split())
+    # logger.info(event)
+
+    customer_id = event['customer_id']
+    cart = state['customer_carts'].get(customer_id, [0] * state['num_columns'])  # Find cart or start empty
+    product_index = state['keep_columns'].index(event['product_name'])
+    if product_index > -1:  # If product name recognized
+        cart[product_index] = 1  # Mark product in cart
+        state['customer_carts'][customer_id] = cart  # Save cart
+
+    # Return customer_id and list indicating which products are in cart
+    return { 'customer_id': customer_id, 'cart_list': cart }
 ```
 
 * Edit the `Output Schema` and set it as follows and click `Apply`:
````
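To see why this hunk is a fix: the old code seeded every unknown customer with the *same* shared `empty_cart` list, so all customers mutated one object; the new code gives each customer a fresh all-zero cart. The fixed `init()`/`process()` pair can be exercised outside of Streams. A minimal sketch (shortened product list and hand-made events, not part of the commit):

```python
def init(state):
    # Shortened product list for the demo; the real flow uses 29 products
    state['keep_columns'] = ['Baby Food', 'Diapers', 'Beer', 'Chips']
    state['num_columns'] = len(state['keep_columns'])
    state['customer_carts'] = {}

def process(event, state):
    customer_id = event['customer_id']
    # Fresh all-zero cart per customer; the old code handed every new
    # customer the same shared 'empty_cart' list object
    cart = state['customer_carts'].get(customer_id, [0] * state['num_columns'])
    product_index = state['keep_columns'].index(event['product_name'])
    if product_index > -1:
        cart[product_index] = 1
        state['customer_carts'][customer_id] = cart
    return {'customer_id': customer_id, 'cart_list': cart}

state = {}
init(state)
out1 = process({'customer_id': 'A', 'product_name': 'Beer'}, state)
out2 = process({'customer_id': 'B', 'product_name': 'Diapers'}, state)
out3 = process({'customer_id': 'A', 'product_name': 'Chips'}, state)
print(out3['cart_list'])  # [0, 0, 1, 1] — customer A's cart accumulates
print(out2['cart_list'])  # [0, 1, 0, 0] — customer B's cart is independent
```

Each returned `cart_list` has the shape the downstream deployment operator expects for its `input_cart (array)` parameter.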
```diff
@@ -219,29 +218,33 @@ The output now contains just the customer_id and an array indicating which produ
 * Click on the deployment operator to edit the settings.
 * Under `DEPLOYMENT INPUT PARAMETERS`, set `input_cart (array)` to `cart_list`. This maps our Code output array to the expected input parameter for prediction.
 * Edit the `Output schema`.
-* Click `Add attributes from incoming schema`.
 * Click `Add attribute +`.
 * Set `Attribute Name` to `prediction`.
 * Set `Type` to `Number`.
 * Set `Model Field` to `prediction`.
+* Click `Add attributes from incoming schema`.
 
 #### Add Debug operator as target
 
 For simplicity, we will assign a `Debug` operator as the target of our WML Deployment.
 
-From the `Targets` list, select and drag the `Debug` operator onto the canvas, and then connect the two object together.
+* From the `Targets` list, select and drag the `Debug` operator onto the canvas, and then connect the two objects together.
+
+#### Save and run
 
-#### Run the streams flow
+* Click the `Save and run` icon to start your flow.
 
-Click the `run` button to start the flow.
+This will result in a new panel being displayed that shows real-time metrics. What is displayed is a live data stream. If you click on the stream between any two operator nodes, you can see the actual data - in a table view or in JSON format.
 
-#### Save and run
+![results.png](doc/source/images/results.png)
+
+If you watch the output stream from the `Code` operator, you'll see that we used some Python code to build an array indicating which products are in each customer's cart. This is the format we needed for the next operator.
 
-Click the `Save and run` icon to start your flow. This will result in a new panel being displayed that shows real-time metrics. What is displayed is a live data stream. If you click on the stream between any two operator nodes, you can see the actual data - in a table view or in JSON format.
+If you watch the output stream from the Watson Machine Learning Deployment operator, you'll see that we used the k-means model (that we built and deployed) to add a `prediction` column to the data. This prediction indicates that this customer's cart is similar to other carts in this group. We could use this prediction to recommend products based on what other customers in this group frequently bought.
 
-![streams-flow-show-data](doc/source/images/streams-flow-show-data.png)
+For now, the `Debug` operator is where we'll stop. We wanted to demonstrate enriching data-in-motion with a machine learning model. Using a sample data source and debug output allowed us to do that. Of course, a production application would use a real live data stream as input and would make the product recommendation available to customers in real-time. Source and target operators such as Kafka, MQTT, databases and also IBM Streams are typically used for this.
 
-Use the `Stop` icon to stop the flow, and the `Pencil` icon to return to the flow editor.
+* Use the `Stop` icon to stop the flow
 
 ## License
 
```
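The `prediction` attribute described in this hunk is just the k-means cluster index assigned to the incoming cart vector. A minimal pure-Python sketch of that assignment step (the centroids below are invented for illustration; the tutorial's real model is trained and served via Watson Machine Learning):

```python
def predict_cluster(cart, centroids):
    """Return the index of the closest centroid by squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sqdist(cart, centroids[k]))

# Two toy centroids over a 4-product space: "baby shopper" vs "snack shopper"
centroids = [
    [0.9, 0.8, 0.1, 0.0],  # cluster 0: Baby Food / Diapers heavy
    [0.1, 0.0, 0.8, 0.9],  # cluster 1: Beer / Chips heavy
]
cart_list = [0, 0, 1, 1]  # the 0/1 array built by the Code operator
prediction = predict_cluster(cart_list, centroids)
print(prediction)  # 1
```

A recommendation step could then suggest products that are frequent in the matched cluster's centroid but absent from this customer's cart.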

doc/source/images/results.png

Binary file changed (163 KB → 151 KB, not shown).
