
Out of memory when dealing with large collections #529

Closed
@joepio

Description


The .tpf function currently stores all Atoms in memory. This leads to out-of-memory (OOM) crashes for large collections.

Some thoughts on this:

Rust OOM tools

Currently, we don't get any useful errors in the log: there is no stack trace and no unwind. This makes debugging OOM issues hard.

This may also have something to do with Linux overcommitting memory.

  • The RFC for try_reserve may help prevent panics and the OS killing Atomic-Server (a sketch follows below this list).
  • oom=panic might help give prettier error messages, but it's not implemented in stable Rust yet.
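
A minimal sketch of how try_reserve could be applied, assuming we collect Atoms into a Vec somewhere: an allocation failure becomes a Result we can log instead of the process being killed. The Atom struct and collect_atoms function here are illustrative placeholders, not atomic-server's real code.

```rust
// Sketch only: `Atom` and `collect_atoms` are placeholders, not atomic-server's
// real types. The point is that `Vec::try_reserve` reports allocation failure
// as an `Err` we can log, instead of the allocator aborting the process.
use std::collections::TryReserveError;

#[derive(Clone)]
#[allow(dead_code)]
struct Atom {
    subject: String,
    property: String,
    value: String,
}

fn collect_atoms(source: &[Atom]) -> Result<Vec<Atom>, TryReserveError> {
    let mut atoms: Vec<Atom> = Vec::new();
    // Reserve capacity up front; a failed allocation becomes a recoverable error.
    atoms.try_reserve(source.len())?;
    atoms.extend_from_slice(source);
    Ok(atoms)
}

fn main() {
    let source = vec![Atom {
        subject: "https://example.com/a".into(),
        property: "https://example.com/p".into(),
        value: "hello".into(),
    }];
    match collect_atoms(&source) {
        Ok(atoms) => println!("collected {} atoms", atoms.len()),
        Err(e) => eprintln!("allocation failed: {e}"),
    }
}
```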

Index all the TPF queries

Let's go over the types of TPF queries we use, and how we can index these:

  • Queries with a known subject are not relevant here.
  • By far most queries have a known property and value.
  • Queries with a known property probably need a property-value-subject index. We don't have that as of now. It would also let us build really performant queries for new, unindexed query filters (see the sketch after this list).
  • Queries with only a known value are indexed by the reference_index.
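
A rough sketch of what a property-value-subject index could look like, assuming an ordered key-value store (a BTreeMap stands in for it here). Keys sort by property, then value, then subject, so a query with a known property and value becomes a prefix scan. The key layout and separator are assumptions for illustration, not the final design.

```rust
// Sketch, not the real index: keys sort as property / value / subject, so a
// TPF query with a known property (and optionally value) is a prefix scan.
// A BTreeMap stands in for an ordered key-value store.
use std::collections::BTreeMap;

const SEP: char = '\u{0}'; // assumes properties and values contain no NUL bytes

fn index_key(property: &str, value: &str, subject: &str) -> String {
    format!("{property}{SEP}{value}{SEP}{subject}")
}

/// Find all subjects with a given property and value via a prefix scan.
fn query_prop_val<'a>(
    index: &'a BTreeMap<String, ()>,
    property: &str,
    value: &str,
) -> Vec<&'a str> {
    let prefix = format!("{property}{SEP}{value}{SEP}");
    index
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(prefix.as_str()))
        .map(|(key, _)| &key[prefix.len()..])
        .collect()
}

fn main() {
    let mut index = BTreeMap::new();
    let name = "https://atomicdata.dev/properties/name";
    index.insert(index_key(name, "Alice", "https://example.com/alice"), ());
    index.insert(index_key(name, "Bob", "https://example.com/bob"), ());
    // The prefix scan only touches the matching range instead of every Atom.
    println!("{:?}", query_prop_val(&index, name, "Alice"));
}
```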

How I found the issue


(Screenshot: Firefox, 2022-10-27)

The problem is that the WebSocket requests get no response.

Sometimes (but not always) the WebSocket connection seems to fail:

The connection to wss://atomicdata.dev/ws was interrupted while the page was loading. [websockets.js:23:19](https://atomicdata.dev/lib/dist/src/websockets.js)
websocket error: 
error { target: WebSocket, isTrusted: true, srcElement: WebSocket, currentTarget: WebSocket, eventPhase: 2, bubbles: false, cancelable: false, returnValue: true, defaultPrevented: false, composed: false, … }
[bugsnag.js:2579:15](https://atomicdata.dev/node_modules/.pnpm/@bugsnag+browser@7.16.5/node_modules/@bugsnag/browser/dist/bugsnag.js)

On the server, I see this every time:

Oct 29 10:50:49 vultr.guest atomic-server[2965299]: Visit https://atomicdata.dev
Oct 29 10:50:49 vultr.guest atomic-server[2965299]: 2022-10-29T10:50:49.596753Z  INFO actix_server::builder: Starting 1 workers
Oct 29 10:50:49 vultr.guest atomic-server[2965299]: 2022-10-29T10:50:49.596978Z  INFO actix_server::server: Actix runtime found; starting in Actix runtime
Oct 29 10:51:13 vultr.guest systemd[1]: atomic.service: Main process exited, code=killed, status=9/KILL
Oct 29 10:51:13 vultr.guest systemd[1]: atomic.service: Failed with result 'signal'.
Oct 29 10:51:14 vultr.guest systemd[1]: atomic.service: Scheduled restart job, restart counter is at 27.
Oct 29 10:51:14 vultr.guest systemd[1]: Stopped Atomic-Server.
Oct 29 10:51:14 vultr.guest systemd[1]: Started Atomic-Server.

What killed our process?

dmesg -T | grep -E -i -B100 'killed process'

oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/atomic.service,task=atomic-server,pid=2965353,uid=0
[Sat Oct 29 10:51:59 2022] Out of memory: Killed process 2965353 (atomic-server) total-vm:891908kB, anon-rss:278920kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:776kB oom_score_adj:0

An out-of-memory issue...

Since we can correctly load most of the Collections, but not all of them, I think one specific collection is causing this.

After checking them one by one, the culprit seems to be /commits. That makes sense: it is by far the largest collection!

I think the problem has to do with .tpf not being iterable; a sketch of the difference follows below.
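
To illustrate (this is not atomic-server's actual API): a tpf-style function that collects every match into a Vec holds the whole result set in memory, while one that returns an iterator lets the caller take just one page of a Collection like /commits.

```rust
// Illustration only, not atomic-server's actual API: the collecting variant
// holds every matching Atom in memory, the iterator variant lets the caller
// stop after one page of a Collection.
#[allow(dead_code)]
struct Atom {
    subject: String,
    property: String,
    value: String,
}

// Materializes the full result set: memory grows with the collection size.
fn tpf_collect<'a>(atoms: &'a [Atom], property: &str) -> Vec<&'a Atom> {
    atoms.iter().filter(|a| a.property == property).collect()
}

// Lazy variant: only the atoms the caller actually consumes are visited.
fn tpf_iter<'a>(atoms: &'a [Atom], property: &'a str) -> impl Iterator<Item = &'a Atom> {
    atoms.iter().filter(move |a| a.property == property)
}

fn main() {
    let atoms: Vec<Atom> = Vec::new();
    // A paginated Collection only needs the first page, e.g. 30 members.
    let first_page: Vec<&Atom> = tpf_iter(&atoms, "https://atomicdata.dev/properties/description")
        .take(30)
        .collect();
    println!("{} atoms on this page", first_page.len());
    // The collecting variant would allocate the entire result set at once.
    let _all = tpf_collect(&atoms, "https://atomicdata.dev/properties/description");
}
```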

Activity

changed the title from "Collections stuck on loading" to "Out of memory when dealing with large collections" on Oct 29, 2022
added 4 commits that reference this issue on Oct 31, 2022

#529 WIP propvalsub index

#529 WIP propvalsub index

#529 add property index, speed up queries

added 3 commits that reference this issue on Nov 2, 2022