NLU training is very slow #4

jiangdongguo · 2019-03-29T08:32:17Z

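Hello! Thanks for your project, it's great. I have a question: when I train on my NLU data (51 examples in total, 4 intents, 10 entities), training takes 5 hours to finish and sometimes crashes. What could be causing this? Or did I annotate the entities and intents incorrectly when building the training data? Looking forward to your reply, thanks!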

howl-anderson · 2019-03-29T10:16:26Z

What is the error message when it fails? What is your machine's configuration? Did you change the project's default configuration?

jiangdongguo · 2019-04-01T01:19:40Z

Training gets stuck at:
Loading model cost 0.874 seconds.
Prefix dict has been built succesfully.
Training to recognize 10 labels: 'item', 'loc', 'number', 'hello', 'name', 'affirm', 'finish', 'thank', 'bye', 'time'
Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C: 20
epsilon: 0.01
num threads: 1
cache size: 5
max iterations: 2000
loss per missed segment: 3
C: 20 loss: 3 0.857143
C: 35 loss: 3 0.857143
C: 20 loss: 4.5 0.914286
C: 5 loss: 3 0.857143
C: 20 loss: 1.5 0.714286
C: 20 loss: 4.75 0.914286
C: 21.5 loss: 4.65 0.914286
C: 17.7498 loss: 4.60893 0.914286
C: 20 loss: 4.4 0.914286
C: 20.9071 loss: 4.45791 0.914286
best C: 20
best loss: 4.5
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.972222 1 0.985915
Part I: elapsed time: 1 seconds.

Part II: train segment classifier
now do training
num training samples: 36
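It stops making progress here. I'm running your project with every setting unchanged; the only difference is that I'm training on my own corpus: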
{
"rasa_nlu_data": {
"common_examples": [
{
"text": "车牌",
"intent": "request_search",
"entities": [
{
"start": 0,
"end": 2,
"value": "车牌",
"entity": "item"
}
]
},
{
"text": "帮我查个车",
"intent": "request_search",
"entities": [
{
"start": 4,
"end": 5,
"value": "车",
"entity": "item"
}
]
},
{
"text": "查车牌",
"intent": "request_search",
"entities": [
{
"start": 1,
"end": 3,
"value": "车牌",
"entity": "item"
}
]
},
{
"text": "搜索车牌号码",
"intent": "request_search",
"entities": [
{
"start": 2,
"end": 6,
"value": "车牌号码",
"entity": "item"
}
]
},
{
"text": "查看车牌号",
"intent": "request_search",
"entities": [
{
"start": 2,
"end": 5,
"value": "车牌号",
"entity": "item"
}
]
},
{
"text": "我想搜索车牌号码浙AB8888",
"intent": "request_search",
"entities": [
{
"start": 4,
"end": 8,
"value": "车牌号码",
"entity": "item"
},
{
"start": 8,
"end": 9,
"value": "浙",
"entity": "loc"
},
{
"start": 9,
"end": 15,
"value": "AB8888",
"entity": "number"
}
]
},
{
"text": "鲁JB1686",
"intent": "request_search",
"entities": [
{
"start": 1,
"end": 7,
"value": "JB1686",
"entity": "number"
},
{
"start": 0,
"end": 1,
"value": "鲁",
"entity": "loc"
}
]
},
{
"text": "hi",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "hi",
"entity": "hello"
}
]
},
{
"text": "嘿",
"intent": "greet",
"entities": []
},
{
"text": "嗨",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 1,
"value": "嗨",
"entity": "hello"
}
]
},
{
"text": "hi 小智",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "hi",
"entity": "hello"
},
{
"start": 3,
"end": 5,
"value": "小智",
"entity": "name"
}
]
},
{
"text": "你好",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "你好",
"entity": "hello"
}
]
},
{
"text": "你好小智",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "你好",
"entity": "hello"
},
{
"start": 2,
"end": 4,
"value": "小智",
"entity": "name"
}
]
},
{
"text": "早",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 1,
"value": "早",
"entity": "hello"
}
]
},
{
"text": "早，小丽",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 1,
"value": "早",
"entity": "hello"
},
{
"start": 2,
"end": 4,
"value": "小丽",
"entity": "name"
}
]
},
{
"text": "你好啊",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "你好",
"entity": "hello"
}
]
},
{
"text": "是的",
"intent": "affirm",
"entities": [
{
"start": 0,
"end": 1,
"value": "是",
"entity": "affirm"
}
]
},
{
"text": "对的",
"intent": "affirm",
"entities": [
{
"start": 0,
"end": 1,
"value": "对",
"entity": "affirm"
}
]
},
{
"text": "好的",
"intent": "affirm",
"entities": [
{
"start": 0,
"end": 1,
"value": "好",
"entity": "affirm"
}
]
},
{
"text": "算了",
"intent": "finish",
"entities": [
{
"start": 0,
"end": 2,
"value": "算了",
"entity": "finish"
}
]
},
{
"text": "不用了",
"intent": "finish",
"entities": [
{
"start": 0,
"end": 2,
"value": "不用",
"entity": "finish"
}
]
},
{
"text": "没事了",
"intent": "finish",
"entities": [
{
"start": 0,
"end": 2,
"value": "没事",
"entity": "finish"
}
]
},
{
"text": "好的，谢谢你",
"intent": "thanks",
"entities": [
{
"start": 3,
"end": 5,
"value": "谢谢",
"entity": "thank"
}
]
},
{
"text": "谢谢",
"intent": "thanks",
"entities": [
{
"start": 0,
"end": 2,
"value": "谢谢",
"entity": "thank"
}
]
},
{
"text": "再见",
"intent": "say_bye",
"entities": [
{
"start": 0,
"end": 2,
"value": "再见",
"entity": "bye"
}
]
},
{
"text": "多谢啦",
"intent": "thanks",
"entities": [
{
"start": 0,
"end": 2,
"value": "多谢",
"entity": "thank"
}
]
},
{
"text": "拜拜",
"intent": "say_bye",
"entities": [
{
"start": 0,
"end": 2,
"value": "拜拜",
"entity": "bye"
}
]
},
{
"text": "下次见",
"intent": "say_bye",
"entities": []
},
{
"text": "晚上好",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "晚上",
"entity": "time"
},
{
"start": 2,
"end": 3,
"value": "好",
"entity": "affirm"
}
]
},
{
"text": "车牌号码",
"intent": "request_search",
"entities": [
{
"start": 0,
"end": 4,
"value": "车牌号码",
"entity": "item"
}
]
}
]
}
}

What is the problem? Thanks!
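One way to rule out annotation mistakes is to check that every entity's `start`/`end` offsets actually slice out its `value`. The following is a minimal sketch, not part of the project: it assumes the offsets are Python character indices with an exclusive `end`, as the corpus above suggests. The helper name `find_span_mismatches` is hypothetical, and the third sample entry is deliberately broken for illustration (it does not come from the corpus).

```python
def find_span_mismatches(nlu_data):
    """Return (text, entity) pairs whose start/end offsets do not slice out `value`."""
    mismatches = []
    for example in nlu_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            # Assumption: offsets are character indices, end is exclusive.
            if text[entity["start"]:entity["end"]] != entity["value"]:
                mismatches.append((text, entity))
    return mismatches


sample = {
    "rasa_nlu_data": {
        "common_examples": [
            # Copied from the corpus above.
            {"text": "帮我查个车", "intent": "request_search",
             "entities": [{"start": 4, "end": 5, "value": "车", "entity": "item"}]},
            {"text": "hi 小智", "intent": "greet",
             "entities": [{"start": 0, "end": 2, "value": "hi", "entity": "hello"},
                          {"start": 3, "end": 5, "value": "小智", "entity": "name"}]},
            # Hypothetical broken annotation: end should be 3, not 2.
            {"text": "查车牌", "intent": "request_search",
             "entities": [{"start": 1, "end": 2, "value": "车牌", "entity": "item"}]},
        ]
    }
}

for text, entity in find_span_mismatches(sample):
    print("offset mismatch in", repr(text), "->", entity)
```

To check a full corpus, load the JSON file with `json.load` and pass the resulting dict to the same function. An empty result means the offsets are at least self-consistent; it does not say anything about training speed.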

howl-anderson · 2019-04-01T03:08:30Z

SVM training really is very slow; exactly how slow depends on the machine. So far it looks like it is simply slow, not a bug.

jiangdongguo · 2019-04-01T03:11:45Z

Thanks for the reply! Here is my machine configuration:

i5-7300
8 GB RAM
Windows 10 64-bit

Is this related to the number of entities and intents? Are there precise rules for writing an NLU corpus?

howl-anderson · 2019-04-01T07:01:26Z

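Is this related to the number of entities and intents?

Yes, it is.

Are there precise rules for writing an NLU corpus?

The general advice is to write examples that follow the distribution of your real data. Early in a project you can write just a portion first, then adjust based on how it performs in practice.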
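To compare a corpus against the distribution you expect from real traffic, it helps to count how many examples each intent and each entity label has. This is a rough sketch under the same assumption about the JSON layout as the corpus posted earlier in the thread; the function name `summarize` is hypothetical and the three sample entries are copied from that corpus.

```python
from collections import Counter


def summarize(nlu_data):
    """Return (number of examples, intent counts, entity-label counts)."""
    examples = nlu_data["rasa_nlu_data"]["common_examples"]
    intents = Counter(ex["intent"] for ex in examples)
    entities = Counter(
        ent["entity"] for ex in examples for ent in ex.get("entities", [])
    )
    return len(examples), intents, entities


sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "你好", "intent": "greet",
             "entities": [{"start": 0, "end": 2, "value": "你好", "entity": "hello"}]},
            {"text": "你好小智", "intent": "greet",
             "entities": [{"start": 0, "end": 2, "value": "你好", "entity": "hello"},
                          {"start": 2, "end": 4, "value": "小智", "entity": "name"}]},
            {"text": "谢谢", "intent": "thanks",
             "entities": [{"start": 0, "end": 2, "value": "谢谢", "entity": "thank"}]},
        ]
    }
}

total, intents, entities = summarize(sample)
print("examples:", total)
print("intents:", dict(intents))
print("entity labels:", dict(entities))
```

Intents or entity labels with only one or two examples are the first candidates for more data, and since training time grows with the number of labels, the counts also show which labels might be merged or dropped.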
