NLU training is very slow #4

Open
jiangdongguo opened this issue Mar 29, 2019 · 5 comments

@jiangdongguo

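Hello! Thanks for your project, it's great. I have a question: when I train the NLU data (51 examples in total, 4 intents, 10 entities), training takes 5 hours to finish and sometimes crashes. What is causing this? Or am I labeling the entities and intents incorrectly when building the training data? Looking forward to your answer, thanks!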

@howl-anderson
Owner

What is the error message when it fails? What are your machine's specs? Did you change the project's default configuration?

@jiangdongguo
Author

Training hangs at:
Loading model cost 0.874 seconds.
Prefix dict has been built succesfully.
Training to recognize 10 labels: 'item', 'loc', 'number', 'hello', 'name', 'affirm', 'finish', 'thank', 'bye', 'time'
Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C: 20
epsilon: 0.01
num threads: 1
cache size: 5
max iterations: 2000
loss per missed segment: 3
C: 20 loss: 3 0.857143
C: 35 loss: 3 0.857143
C: 20 loss: 4.5 0.914286
C: 5 loss: 3 0.857143
C: 20 loss: 1.5 0.714286
C: 20 loss: 4.75 0.914286
C: 21.5 loss: 4.65 0.914286
C: 17.7498 loss: 4.60893 0.914286
C: 20 loss: 4.4 0.914286
C: 20.9071 loss: 4.45791 0.914286
best C: 20
best loss: 4.5
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.972222 1 0.985915
Part I: elapsed time: 1 seconds.

Part II: train segment classifier
now do training
num training samples: 36
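It stops here and makes no further progress. I am running your project with all configuration unchanged; the only difference is that I am training on my own corpus: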
{
"rasa_nlu_data": {
"common_examples": [
{
"text": "车牌",
"intent": "request_search",
"entities": [
{
"start": 0,
"end": 2,
"value": "车牌",
"entity": "item"
}
]
},
{
"text": "帮我查个车",
"intent": "request_search",
"entities": [
{
"start": 4,
"end": 5,
"value": "车",
"entity": "item"
}
]
},
{
"text": "查车牌",
"intent": "request_search",
"entities": [
{
"start": 1,
"end": 3,
"value": "车牌",
"entity": "item"
}
]
},
{
"text": "搜索车牌号码",
"intent": "request_search",
"entities": [
{
"start": 2,
"end": 6,
"value": "车牌号码",
"entity": "item"
}
]
},
{
"text": "查看车牌号",
"intent": "request_search",
"entities": [
{
"start": 2,
"end": 5,
"value": "车牌号",
"entity": "item"
}
]
},
{
"text": "我想搜索车牌号码浙AB8888",
"intent": "request_search",
"entities": [
{
"start": 4,
"end": 8,
"value": "车牌号码",
"entity": "item"
},
{
"start": 8,
"end": 9,
"value": "浙",
"entity": "loc"
},
{
"start": 9,
"end": 15,
"value": "AB8888",
"entity": "number"
}
]
},
{
"text": "鲁JB1686",
"intent": "request_search",
"entities": [
{
"start": 1,
"end": 7,
"value": "JB1686",
"entity": "number"
},
{
"start": 0,
"end": 1,
"value": "鲁",
"entity": "loc"
}
]
},
{
"text": "hi",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "hi",
"entity": "hello"
}
]
},
{
"text": "嘿",
"intent": "greet",
"entities": []
},
{
"text": "嗨",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 1,
"value": "嗨",
"entity": "hello"
}
]
},
{
"text": "hi 小智",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "hi",
"entity": "hello"
},
{
"start": 3,
"end": 5,
"value": "小智",
"entity": "name"
}
]
},
{
"text": "你好",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "你好",
"entity": "hello"
}
]
},
{
"text": "你好小智",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "你好",
"entity": "hello"
},
{
"start": 2,
"end": 4,
"value": "小智",
"entity": "name"
}
]
},
{
"text": "早",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 1,
"value": "早",
"entity": "hello"
}
]
},
{
"text": "早,小丽",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 1,
"value": "早",
"entity": "hello"
},
{
"start": 2,
"end": 4,
"value": "小丽",
"entity": "name"
}
]
},
{
"text": "你好啊",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "你好",
"entity": "hello"
}
]
},
{
"text": "是的",
"intent": "affirm",
"entities": [
{
"start": 0,
"end": 1,
"value": "是",
"entity": "affirm"
}
]
},
{
"text": "对的",
"intent": "affirm",
"entities": [
{
"start": 0,
"end": 1,
"value": "对",
"entity": "affirm"
}
]
},
{
"text": "好的",
"intent": "affirm",
"entities": [
{
"start": 0,
"end": 1,
"value": "好",
"entity": "affirm"
}
]
},
{
"text": "算了",
"intent": "finish",
"entities": [
{
"start": 0,
"end": 2,
"value": "算了",
"entity": "finish"
}
]
},
{
"text": "不用了",
"intent": "finish",
"entities": [
{
"start": 0,
"end": 2,
"value": "不用",
"entity": "finish"
}
]
},
{
"text": "没事了",
"intent": "finish",
"entities": [
{
"start": 0,
"end": 2,
"value": "没事",
"entity": "finish"
}
]
},
{
"text": "好的,谢谢你",
"intent": "thanks",
"entities": [
{
"start": 3,
"end": 5,
"value": "谢谢",
"entity": "thank"
}
]
},
{
"text": "谢谢",
"intent": "thanks",
"entities": [
{
"start": 0,
"end": 2,
"value": "谢谢",
"entity": "thank"
}
]
},
{
"text": "再见",
"intent": "say_bye",
"entities": [
{
"start": 0,
"end": 2,
"value": "再见",
"entity": "bye"
}
]
},
{
"text": "多谢啦",
"intent": "thanks",
"entities": [
{
"start": 0,
"end": 2,
"value": "多谢",
"entity": "thank"
}
]
},
{
"text": "拜拜",
"intent": "say_bye",
"entities": [
{
"start": 0,
"end": 2,
"value": "拜拜",
"entity": "bye"
}
]
},
{
"text": "下次见",
"intent": "say_bye",
"entities": []
},
{
"text": "晚上好",
"intent": "greet",
"entities": [
{
"start": 0,
"end": 2,
"value": "晚上",
"entity": "time"
},
{
"start": 2,
"end": 3,
"value": "好",
"entity": "affirm"
}
]
},
{
"text": "车牌号码",
"intent": "request_search",
"entities": [
{
"start": 0,
"end": 4,
"value": "车牌号码",
"entity": "item"
}
]
}
]
}
}

What could the problem be? Thanks!
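
One way to rule out mislabeled entities is to check that each `start`/`end` pair actually selects the stated `value` in `text`. The following is a minimal sketch, assuming the file follows the `rasa_nlu_data` / `common_examples` layout shown above; the function name `find_misaligned_entities` and the `broken` example are hypothetical and not part of the project.

```python
def find_misaligned_entities(nlu_data):
    """Return (text, entity) pairs where text[start:end] differs from value."""
    problems = []
    for example in nlu_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            # Python slices by Unicode code point, which matches how the
            # offsets in the corpus above count Chinese characters.
            span = text[entity["start"]:entity["end"]]
            if span != entity["value"]:
                problems.append((text, entity))
    return problems


# Two examples copied from the corpus above.
sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "帮我查个车", "intent": "request_search",
             "entities": [{"start": 4, "end": 5, "value": "车", "entity": "item"}]},
            {"text": "hi 小智", "intent": "greet",
             "entities": [
                 {"start": 0, "end": 2, "value": "hi", "entity": "hello"},
                 {"start": 3, "end": 5, "value": "小智", "entity": "name"},
             ]},
        ]
    }
}

# Hypothetical broken example (not from the corpus): the span is off by one.
broken = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "查车牌", "intent": "request_search",
             "entities": [{"start": 0, "end": 2, "value": "车牌", "entity": "item"}]},
        ]
    }
}

print(find_misaligned_entities(sample))       # no mismatches expected
print(len(find_misaligned_entities(broken)))  # one mismatch expected
```

To check a whole file, load it with `json.load` and pass the resulting dict to the function. A reported mismatch is not necessarily an error, since `value` may be set deliberately to a normalized form that differs from the surface text, but it is worth reviewing each one.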

@howl-anderson
Owner

SVM training is indeed very slow, and exactly how slow depends on the machine. From what I can see so far, it is just slow; there is no bug.

@jiangdongguo
Author

Thanks for the reply! Here is my machine configuration:

CPU: i5-7300
RAM: 8 GB
OS: Windows 10, 64-bit

Is this related to the number of entities and intents? Are there precise rules for writing an NLU corpus?

@howl-anderson
Owner

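> Is this related to the number of entities and intents?

Yes, it is.

> Are there precise rules for writing an NLU corpus?

The general advice is to write the corpus so that it follows the distribution of your real data. Early in a project you can write just part of it, then adjust based on how well it works in practice.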
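
To compare a corpus against the distribution of real data, it helps to first see how many examples each intent and entity label has. The following is a minimal sketch, assuming the same `rasa_nlu_data` / `common_examples` layout as the corpus posted earlier in this thread; the function name `label_distribution` is hypothetical.

```python
from collections import Counter


def label_distribution(nlu_data):
    """Count examples per intent and annotations per entity label."""
    examples = nlu_data["rasa_nlu_data"]["common_examples"]
    intents = Counter(ex["intent"] for ex in examples)
    entities = Counter(
        ent["entity"] for ex in examples for ent in ex.get("entities", [])
    )
    return intents, entities


# Three examples copied from the corpus posted earlier in this thread.
sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "你好", "intent": "greet",
             "entities": [{"start": 0, "end": 2, "value": "你好", "entity": "hello"}]},
            {"text": "谢谢", "intent": "thanks",
             "entities": [{"start": 0, "end": 2, "value": "谢谢", "entity": "thank"}]},
            {"text": "晚上好", "intent": "greet",
             "entities": [
                 {"start": 0, "end": 2, "value": "晚上", "entity": "time"},
                 {"start": 2, "end": 3, "value": "好", "entity": "affirm"},
             ]},
        ]
    }
}

intents, entities = label_distribution(sample)
for name, count in intents.most_common():
    print(f"intent {name}: {count}")
for name, count in entities.most_common():
    print(f"entity {name}: {count}")
```

Labels with only one or two examples, or labels that are far more frequent than they would be in real traffic, are reasonable candidates to revisit first.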
