Describe the issue:
Process memory grows steadily while the log likelihood is being computed (idata_kwargs={"log_likelihood": True}) until it consumes all available memory and swap, and the OS kills the process. Reproduced on Linux and on an M1 Mac.
Linux system:
Void Linux, kernel 6.3.12_1
64 GB DDR5 RAM (plus 64 GB swap)
24 GB RTX 4090 GPU
AMD Ryzen 9 7950X, 16 cores, 32 threads
Mac system (M1):
16 GB memory
8 cores
Dataset: ~161 MB total.
Reproducible code example:
```python
#!/usr/bin/env python3
import numpy as np
import pandas as pd
import pymc as pm


def pymc_bayes(df: pd.DataFrame):
    a, b, c, i = df.a.values, df.b.values, df.c.values, df.i.values
    n_i = int(i.max() + 1)  # number of groups indexed by i
    with pm.Model() as m:
        alpha = pm.Normal("alpha", 0, 1, shape=[n_i])
        beta_b = pm.HalfNormal("beta_b", 1)
        beta_c = pm.HalfNormal("beta_c", 1)
        beta_int = pm.Normal("beta_int", 0, 1)
        mu = alpha[i] + beta_b * b + beta_c * c + beta_int * b * c
        sigma = pm.Exponential("sigma", 1)
        a_hat = pm.Normal("a_hat", mu, sigma, observed=a)
        # Memory grows steadily once the log likelihood is computed for idata
        idata = pm.sample(mp_ctx="spawn", idata_kwargs={"log_likelihood": True})
    idata.to_netcdf("pymc_bayes.nc")
    print("finished!")


if __name__ == "__main__":
    n, n_int = 2618018, 17  # to match the real dataset I care about
    df = pd.DataFrame(np.random.randn(n, 3), columns=["a", "b", "c"])
    df["i"] = np.random.randint(0, n_int, size=n)
    pymc_bayes(df)
```
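For scale, here is a rough estimate of the size of the final log-likelihood array, assuming PyMC's sampling defaults (draws=1000, and 4 chains on machines with 4 or more cores, which covers both systems above) and float64 storage. The helper name loglik_bytes is hypothetical; the arithmetic is simply chains × draws × observations × bytes per value.

```python
def loglik_bytes(n_obs: int, chains: int = 4, draws: int = 1000, itemsize: int = 8) -> int:
    """Bytes needed for a dense (chain, draw, obs) log-likelihood array.

    Assumes one value per observation per posterior draw; itemsize=8
    corresponds to float64 (an assumption, not read from PyMC).
    """
    return chains * draws * n_obs * itemsize


n_obs = 2618018  # number of rows in the reproducer above
size = loglik_bytes(n_obs)
print(f"{size:,} bytes = {size / 1e9:.1f} GB = {size / 2**30:.1f} GiB")
# prints: 83,776,576,000 bytes = 83.8 GB = 78.0 GiB
```

Under these assumptions the array alone is about 84 GB, more than the RAM on either machine, before counting any temporary copies made while building or saving the InferenceData. If the estimate holds, much of the growth may be expected rather than a leak; the size scales linearly with the number of draws, chains, and observations.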
Error message:
None; the process is killed by the OS once memory (and swap) is exhausted.
PyMC version information:
PyMC 5.7.2
Context for the issue:
I need the pointwise log likelihood so that I can compare models with arviz.compare(...).
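For reference, arviz.compare uses PSIS-LOO by default, which needs the log likelihood of every observation under every posterior draw. A sketch of the standard formulas for the model in the reproducer, writing S = chains × draws and N for the number of observations (notation introduced here), and with w_sn the Pareto-smoothed importance weights:

```latex
\begin{aligned}
\mu_n^{(s)} &= \alpha_{i_n}^{(s)} + \beta_b^{(s)} b_n + \beta_c^{(s)} c_n + \beta_{\mathrm{int}}^{(s)} b_n c_n, \\
\ell_{sn} &= \log \mathcal{N}\!\left(a_n \mid \mu_n^{(s)}, \sigma^{(s)}\right)
  = -\tfrac{1}{2}\log(2\pi) - \log \sigma^{(s)} - \frac{\bigl(a_n - \mu_n^{(s)}\bigr)^2}{2\bigl(\sigma^{(s)}\bigr)^2}, \\
\widehat{\mathrm{elpd}}_{\mathrm{loo}} &= \sum_{n=1}^{N} \log \frac{\sum_{s=1}^{S} w_{sn}\, e^{\ell_{sn}}}{\sum_{s=1}^{S} w_{sn}}.
\end{aligned}
```

Each observation's term depends on all S values of \ell_{sn}, which is why the log_likelihood group holds a full S × N array rather than a per-draw total.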