Performance issues with DialogRPT + DialoGPT #6

Open · pablogranolabar opened this issue Mar 1, 2021 · 5 comments

@pablogranolabar

Hi again @golsun,

I've been working with DialogRPT using DialoGPT-large for dialog generation and have hit some performance issues that aren't present when using DialoGPT-large alone. Round-trip responses with CPU inference take just a few seconds with gpt2-large, but whenever DialogRPT is used with the DialoGPT-large checkpoint, performance grinds to a halt. With GPU inference I can run gpt2-large on a 6 GB GPU, but with DialogRPT I get OOM. I understand that the combination of DialogRPT + DialoGPT means multiple models are running, which is the obvious culprit. Is there any way to serialize execution of the two models to prevent these resource consumption issues?

@golsun (Owner) commented Mar 1, 2021

Hi @pablogranolabar,

I can think of several potential reasons for OOM:

  • missing torch.no_grad, which prevents gradients from taking up memory --
    it was already applied in the scorer, but not in generation.py -- I've updated it here, please take a look.
  • the number of candidates to be scored -- if it's too large, you can split the candidates into several
    batches and send them to DialogRPT, similar to this (see the sketch after this list).
  • if that still doesn't work, I guess you can use two machines, one just for DialoGPT-large and one
    for DialogRPT, and use an API to communicate between them.
  • how many DialogRPT models are you using? I recommend at least updown and human_vs_rand,
    because updown doesn't capture context-response relevance.
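
For example, a minimal sketch of the batching idea, using `get_model` and `predict` from score.py (the helper name and the batch size are just illustrative):

import torch
from score import get_model, predict   # functions from score.py

scorer = get_model('restore/updown.pth')

def score_in_batches(cxt, hyps, batch_size=8):
    # score the candidate responses a few at a time, with gradients disabled to save memory
    scores = []
    with torch.no_grad():
        for i in range(0, len(hyps), batch_size):
            scores.extend(predict(scorer, cxt, hyps[i:i + batch_size]))
    return scores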

@pablogranolabar (Author)

Hi @golsun, thanks for the quick response!

The two-machine idea makes sense; I think I can do that with relative ease if it comes to that.
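
For reference, here's a minimal sketch of what the scoring side of a two-machine setup might look like (Flask, the /score route, and the port are my own assumptions, not part of this repo; `get_model` and `predict` are the functions from score.py):

from flask import Flask, request, jsonify
from score import get_model, predict   # functions from score.py

app = Flask(__name__)
scorer = get_model('restore/updown.pth')

@app.route('/score', methods=['POST'])
def score():
    # expects JSON like {"cxt": "...", "hyps": ["candidate 1", "candidate 2"]}
    data = request.get_json()
    scores = predict(scorer, data['cxt'], data['hyps'])
    return jsonify({'scores': [float(s) for s in scores]})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

The generation machine would then POST the context and candidate responses to this endpoint and rerank locally.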

For the DialogRPT models, I am just using updown. So I should ensemble at least updown + human_vs_rand? This application is for a conversational agent that can rerank dialog based on human scoring of the chatbot responses.

@golsun (Owner) commented Mar 1, 2021

Yes, human_vs_rand (together with updown) should help in that case.
If memory is a concern, a low-memory option that skips human_vs_rand is to decode responses with a small top_k or top_p; this should also help keep the response relevant to the context. But I guess the performance depends on the scenario.
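
For example, a minimal sketch of decoding with a small top_k / top_p, assuming the Hugging Face DialoGPT-large checkpoint rather than this repo's generation.py (the context string and parameter values are just placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-large')
model = AutoModelForCausalLM.from_pretrained('microsoft/DialoGPT-large')

cxt = "Can you recommend a good book?"
input_ids = tokenizer.encode(cxt + tokenizer.eos_token, return_tensors='pt')

with torch.no_grad():
    out = model.generate(
        input_ids,
        max_length=128,
        do_sample=True,
        top_k=10,            # a small top_k keeps sampling close to the most likely responses
        top_p=0.6,           # a small top_p has a similar effect
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True))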

@pablogranolabar (Author)

Hi again @golsun. I'm working on ensembling human_vs_rand with updown per your advice, but I'm unsure how to proceed with ensemble.yml. Should human_vs_rand and updown both be part of prior with equal weights? Or should human_vs_rand be the prior and updown the conditional? For the performance reasons above, I'm trying to do this with just a two-model ensemble, as you suggested.

@golsun (Owner) commented Mar 3, 2021

Hi, in this case I guess a simple way that avoids dealing with ensemble.yml is:

# `get_model` and `predict` are functions from score.py;
# `cxt` is the context string and `hyps` is the list of candidate responses
import numpy as np
from score import get_model, predict

hvm = get_model('restore/human_vs_machine.pth')
updown = get_model('restore/updown.pth')
score_hvm = predict(hvm, cxt, hyps)
score_updown = predict(updown, cxt, hyps)
score_overall = np.sqrt(score_updown * score_hvm)   # use this as the final score

I used a geometric mean for score_overall, but you can play with a weighted arithmetic mean instead.
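
For example, one possible weighted alternative (the 0.7 / 0.3 weights are just illustrative and would need tuning):

score_overall = 0.7 * score_updown + 0.3 * score_hvm   # weighted arithmetic mean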
