Performance issues with DialogRPT + DialoGPT #6
hi @pablogranolabar, I can think of several potential reasons for the OOM:
Hi @golsun, thanks for the quick response! The two-machine idea makes sense; I think I can do that with relative ease if it comes to that. For the DialogRPT models I am just using updown. So I should ensemble at least updown + human_vs_rand? This application is for a conversational agent that reranks dialog based on human scoring of the chatbot responses.
yes
Hi again @golsun. I'm working on ensembling human_vs_rand with updown per your advice, but I'm unsure how to proceed with ensemble.yml. Should human_vs_rand and updown both be part of prior with equal weights? Or should human_vs_rand be the prior, with updown as conditional? Given the performance issues above, I'm trying to do this with just a two-model ensemble as you suggested.
hi, in this case, I guess a simple way without dealing with ensemble.yml is something like this:

```python
import numpy as np

# `get_model` and `predict` are functions from score.py
hvm = get_model('restore/human_vs_machine.pth')
updown = get_model('restore/updown.pth')
score_hvm = predict(hvm, cxt, hyps)
score_updown = predict(updown, cxt, hyps)
score_overall = np.sqrt(score_updown * score_hvm)  # use this as the final score
```

I used geometric mean for the combined score.
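For the reranking use case mentioned earlier in the thread, here is a minimal sketch of how that two-model scoring could be wrapped into a reranker. It assumes `predict(model, cxt, hyps)` returns one score per hypothesis, as in the snippet above, and that `get_model` and `predict` are importable from `score.py` (the import path below is an assumption about how the repo is laid out):

```python
import numpy as np
from score import get_model, predict  # assumed import path for the score.py helpers

# Load both rankers once at startup; each is a full GPT-2-sized model.
hvm = get_model('restore/human_vs_machine.pth')
updown = get_model('restore/updown.pth')

def rerank(cxt, hyps):
    """Sort candidate responses by the geometric mean of the two DialogRPT scores."""
    score_hvm = np.asarray(predict(hvm, cxt, hyps))
    score_updown = np.asarray(predict(updown, cxt, hyps))
    overall = np.sqrt(score_updown * score_hvm)
    order = np.argsort(-overall)  # highest combined score first
    return [(hyps[i], float(overall[i])) for i in order]

# Usage: hand the DialoGPT candidates for a given context to the reranker and
# keep the top-ranked response, e.g. best_hyp, best_score = rerank(cxt, hyps)[0]
```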
Hi again @golsun,
I've been working with DialogRPT using DialoGPT-large for dialog generation and have hit some performance issues that aren't present when using DialoGPT-large alone. Round-trip responses with CPU inference take just a few seconds with gpt2-large, but whenever DialogRPT is used with the DialoGPT-large checkpoint, performance grinds to a halt. With GPU inference I can run gpt2-large on a 6GB GPU, but with DialogRPT I get OOM. I understand that multiple models are running with the DialogRPT + DialoGPT combination, which is the obvious culprit. Is there any way to serialize execution of the two models to prevent these resource consumption issues?
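Below is a minimal sketch of one way execution of the two models could be serialized on a single small GPU: keep only the model that is currently running on the device, and move it back to host memory before the next one runs. It assumes the DialoGPT-large generator and the DialogRPT ranker are ordinary PyTorch `nn.Module`s; the `generate` and `score` wrappers are hypothetical names, not the repo's API:

```python
import gc
import torch

def run_on_gpu(module: torch.nn.Module, fn, *args, **kwargs):
    """Move one model onto the GPU, run a step with it, then evict it so the next model fits."""
    module.to('cuda')
    try:
        return fn(*args, **kwargs)
    finally:
        module.to('cpu')          # push the weights back to host memory
        gc.collect()
        torch.cuda.empty_cache()  # release cached CUDA allocations

# Hypothetical usage, with only one large model on the 6GB GPU at a time:
#   hyps = run_on_gpu(generator, generate, cxt)
#   scores = run_on_gpu(ranker, score, cxt, hyps)
# The trade-off is copying each model's weights over PCIe on every turn.
```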