Performance issues with DialogRPT + DialoGPT #6

Open · pablogranolabar opened this issue Mar 1, 2021 · 5 comments

@pablogranolabar

Hi again @golsun,

I've been working with DialogRPT using DialoGPT-large for dialog generation and have hit some performance issues that aren't present when using DialoGPT-large alone. Round-trip responses with CPU inference take just a few seconds with gpt2-large, but whenever DialogRPT is used with the DialoGPT-large checkpoint, performance grinds to a halt. With GPU inference I can run gpt2-large on a 6 GB GPU, but with DialogRPT I get OOM. I understand that the combination of DialogRPT + DialoGPT means multiple models are running, which is the obvious culprit. Is there any way to serialize execution of the two models to prevent these resource consumption issues?

@golsun (Owner) commented Mar 1, 2021

Hi @pablogranolabar,

I can think of several potential reasons for OOM:

  • missing torch.no_grad, which prevents gradients from taking up memory --
    it was already applied in the scorer, but not in generation.py -- I've updated it here, please take a look.
  • the number of candidates to be scored -- if it's too large, you can split the candidates into several
    batches and send them to DialogRPT, similar to this (see the sketch after this list).
  • if that still doesn't work, I guess you can use two machines, one just for DialoGPT-large and one
    for DialogRPT, and use an API to communicate between them.
  • how many DialogRPT models are you using? I recommend at least updown and human_vs_rand,
    because updown doesn't capture context-response relevance.
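
For example, a minimal sketch of the batching idea, using `get_model` and `predict` from score.py (the helper name and the batch size are just illustrative):

import torch
from score import get_model, predict   # functions from score.py

scorer = get_model('restore/updown.pth')

def score_in_batches(cxt, hyps, batch_size=8):
    # score the candidate responses a few at a time, with gradients disabled to save memory
    scores = []
    with torch.no_grad():
        for i in range(0, len(hyps), batch_size):
            scores.extend(predict(scorer, cxt, hyps[i:i + batch_size]))
    return scores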

@pablogranolabar (Author)

Hi @golsun, thanks for the quick response!

The two-machine idea makes sense; I think I can do that with relative ease if it comes to that.
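
For reference, here's a minimal sketch of what the scoring side of a two-machine setup might look like (Flask, the /score route, and the port are my own assumptions, not part of this repo; `get_model` and `predict` are the functions from score.py):

from flask import Flask, request, jsonify
from score import get_model, predict   # functions from score.py

app = Flask(__name__)
scorer = get_model('restore/updown.pth')

@app.route('/score', methods=['POST'])
def score():
    # expects JSON like {"cxt": "...", "hyps": ["candidate 1", "candidate 2"]}
    data = request.get_json()
    scores = predict(scorer, data['cxt'], data['hyps'])
    return jsonify({'scores': [float(s) for s in scores]})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

The generation machine would then POST the context and candidate responses to this endpoint and rerank locally.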

For the DialogRPT models, I am just using updown. So I should ensemble at least updown + human_vs_rand? This application is for a conversational agent that can rerank dialog based on human scoring of the chatbot responses.

@golsun (Owner) commented Mar 1, 2021

Yes, human_vs_rand (together with updown) should help in that case.
If memory is a concern, a low-memory option that skips human_vs_rand is to decode responses with a small top_k or top_p; this should also help keep the response relevant to the context. But I guess the performance depends on the scenario.
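
For example, a minimal sketch of decoding with a small top_k / top_p, assuming the Hugging Face DialoGPT-large checkpoint rather than this repo's generation.py (the context string and parameter values are just placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-large')
model = AutoModelForCausalLM.from_pretrained('microsoft/DialoGPT-large')

cxt = "Can you recommend a good book?"
input_ids = tokenizer.encode(cxt + tokenizer.eos_token, return_tensors='pt')

with torch.no_grad():
    out = model.generate(
        input_ids,
        max_length=128,
        do_sample=True,
        top_k=10,            # a small top_k keeps sampling close to the most likely responses
        top_p=0.6,           # a small top_p has a similar effect
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True))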

@pablogranolabar (Author)

Hi again @golsun. I'm working on ensembling human_vs_rand with updown per your advice, but I'm unsure how to proceed with ensemble.yml. Should human_vs_rand and updown both be part of prior with equal weights? Or should human_vs_rand be the prior and updown the conditional? For the performance reasons above, I'm trying to do this with just a two-model ensemble, as you suggested.

@golsun (Owner) commented Mar 3, 2021

Hi, in this case I guess a simple way that avoids dealing with ensemble.yml is:

# `get_model` and `predict` are functions from score.py;
# `cxt` is the context string and `hyps` is the list of candidate responses
import numpy as np
from score import get_model, predict

hvm = get_model('restore/human_vs_machine.pth')
updown = get_model('restore/updown.pth')
score_hvm = predict(hvm, cxt, hyps)
score_updown = predict(updown, cxt, hyps)
score_overall = np.sqrt(score_updown * score_hvm)   # use this as the final score

I used a geometric mean for score_overall, but you can play with a weighted arithmetic mean instead.
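
For example, one possible weighted alternative (the 0.7 / 0.3 weights are just illustrative and would need tuning):

score_overall = 0.7 * score_updown + 0.3 * score_hvm   # weighted arithmetic mean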
