Jupyter notebook kernel can go idle during long processing, losing cell results and preventing completion of notebook #291

Open
cybersam opened this issue Nov 16, 2023 · 4 comments
Labels
BUG Something isn't working

Comments

@cybersam

cybersam commented Nov 16, 2023

Describe the bug
When running .predict on a model in a Jupyter notebook cell, the intervals between progress bar updates can become so long that Jupyter appears to decide the cell has finished, and after enough inactivity the kernel ends up in an idle (basically, "dead") state. This is a real risk for long-running predictions (say, running overnight), where the user steps away and does not touch Jupyter for many hours. When the Neo4j server finally finishes the prediction, the results of the many hours of processing (say, from .predict.stream) are lost, since the notebook is dead. I suspect the same problem can occur with other long-running GDS operations.

To Reproduce

I don't know if the eventual slow-down in progress bar output happens for all long-running use cases or server configurations. In my case, it usually happens, but not always.

GDS version: 2.5.3
Neo4j version: 5.11.0
Operating system: Amazon Linux

My specific Jupyter environment: JupyterLab 4.0.8, Python 3 (ipykernel) kernel, on AWS EC2 with Amazon Linux

Steps to reproduce the behavior:

  • Start a long-running (say, 10-hour) .predict.stream operation in a Jupyter cell, and do not touch Jupyter the whole time.
  • After a good amount of progress, the progress bar output will freeze at some percentage of completion and the kernel will go to idle state.
  • If you CALL gds.listProgress in the Browser, you will see that the prediction is still running (if it has not yet completed).
  • After the prediction completes on the server, the (dead) notebook does not display any new content, and no following cells are executed.
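The progress check in the third step can also be scripted rather than run by hand in the Browser. The following is only a rough sketch: `poll_progress` and `fetch_rows` are hypothetical names, the rows are printed generically because the exact columns returned by gds.listProgress are not assumed here, and treating an empty result as "nothing is running" is an assumption that may not hold on every GDS version. It would have to run in a separate Python session, since the notebook kernel is blocked by the prediction cell.

```python
import time

PROGRESS_QUERY = "CALL gds.listProgress()"


def poll_progress(fetch_rows, poll_seconds=60.0, max_polls=None, sleep=time.sleep):
    """Print the rows returned by fetch_rows() until none remain.

    fetch_rows is any zero-argument callable returning an iterable of
    dict-like rows. Returns the number of polls that were made.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        rows = list(fetch_rows())
        polls += 1
        if not rows:
            # Assumption: an empty result means no task is running any more.
            print("no running tasks reported")
            break
        for row in rows:
            print(", ".join(f"{key}={value}" for key, value in row.items()))
        sleep(poll_seconds)
    return polls


# Possible wiring with the Python client (connection details are placeholders):
# from graphdatascience import GraphDataScience
# gds = GraphDataScience("bolt://localhost:7687", auth=("neo4j", "password"))
# poll_progress(lambda: gds.run_cypher(PROGRESS_QUERY).to_dict("records"))
```

Passing the query function in as a callable keeps the loop independent of how the connection is made, so it can be tried out with canned rows before pointing it at a real server.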

Expected behavior
The Jupyter notebook should never go idle while any long-running GDS operation is still in progress.

This probably just requires ensuring that output is produced regularly (say, every x minutes).
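As a user-side stopgap, the "regular output" idea can be sketched with a background thread that prints a line at a fixed interval while the long call runs. This is not how the GDS client works internally, and `heartbeat` is a hypothetical helper name. Whether periodic output actually prevents the stall described above is untested, since the root cause is not yet known.

```python
import sys
import threading
import time
from contextlib import contextmanager


@contextmanager
def heartbeat(interval_seconds=300.0, stream=None):
    """Print a status line every interval_seconds until the with-block exits."""
    out = stream if stream is not None else sys.stdout
    stop = threading.Event()
    started = time.monotonic()

    def beat():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            elapsed = time.monotonic() - started
            print(f"[heartbeat] still running after {elapsed:.0f}s", file=out, flush=True)

    worker = threading.Thread(target=beat, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()


# Intended use in the notebook cell (prediction arguments elided):
# with heartbeat(interval_seconds=300):
#     result = model.predict_stream(G, ...)
```

The thread is stopped and joined when the block exits, including when the wrapped call raises, so no stray output appears after the cell is done.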

@cybersam cybersam added the BUG Something isn't working label Nov 16, 2023
@cybersam cybersam changed the title Jupyter notebook kernel can go idle during model prediction, causing streaming results to be lost Jupyter notebook kernel can go idle during long processing, causing streaming results to be lost Nov 16, 2023
@cybersam cybersam changed the title Jupyter notebook kernel can go idle during long processing, causing streaming results to be lost Jupyter notebook kernel can go idle during long processing, losing streaming results and stopping further cell processing Nov 16, 2023
@cybersam cybersam changed the title Jupyter notebook kernel can go idle during long processing, losing streaming results and stopping further cell processing Jupyter notebook kernel can go idle during long processing, losing cell results and stopping further cell processing Nov 16, 2023
@cybersam cybersam changed the title Jupyter notebook kernel can go idle during long processing, losing cell results and stopping further cell processing Jupyter notebook kernel can go idle during long processing, losing cell results and preventing completion of notebook Nov 16, 2023
@adamnsch
Contributor

adamnsch commented Nov 17, 2023

Hi @cybersam,

Thanks for bringing this to our attention.

The progress of a link prediction pipeline is not linear, so it may well be that there are substantial chunks of time where the algorithm has not reported progress in terms of %, even though it's still running. Is your Jupyter environment by any chance running an idle culler? If so, have you tried configuring the idle culler according to your needs? https://tljh.jupyter.org/en/latest/topic/idle-culler.html

Adam

@cybersam
Author

As far as I know, I am using the default Jupyter configuration, in which culling is supposed to be disabled. The config files do not set any culling values.

@cybersam
Author

cybersam commented Nov 17, 2023

Also, it turns out my idle kernel is not culled, even after a long time. It still remembers the state before the cell that "died".

@cybersam
Author

Some background: the cell in question stores the prediction result in a 'result' variable.

I tried the following experiment, and the results are very interesting:

  • I changed the cell to execute model.predict_mutate instead of model.predict_stream, so that the results are not completely lost when the kernel goes idle.
  • I saw that the cell output stopped showing new progress at some point.
  • In the Browser, I ran CALL gds.listProgress to keep tabs on the progress of the ongoing prediction operation.
  • When the prediction completed, I used the Browser to verify that the new relationships created by mutate (for link prediction pipeline) were in the GDS projection.
  • I then added a new cell to display the 'result' variable, and it actually showed the mutation result!

So, when the kernel goes idle, it is apparently still able to receive the final results. But the cell output is broken, and subsequent cells do not execute when they are supposed to.
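Given that the 'result' variable does end up populated, one possible workaround for the streaming case is to write the result to disk in the same cell as soon as the call returns, so it does not depend on later cells running. This is a minimal sketch under that assumption: `run_and_save` and `load_saved` are hypothetical helpers, and pickle is used only because it handles arbitrary Python objects.

```python
import pickle
from pathlib import Path


def run_and_save(compute, path):
    """Call compute(), pickle its return value to path, and return the value."""
    result = compute()
    target = Path(path)
    tmp = target.with_name(target.name + ".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(result, fh)
    # Rename only after the write finished, so a partial file never has the final name.
    tmp.replace(target)
    return result


def load_saved(path):
    """Load a value previously written by run_and_save."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Intended use in the notebook cell (prediction arguments elided):
# result = run_and_save(lambda: model.predict_stream(G, ...), "prediction.pkl")
```

If the cell output stalls, `load_saved("prediction.pkl")` in a fresh cell or a fresh kernel would recover the result once the file exists.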
