PyMC3 Examples Known Errors & Notes (March 2021)

# Summary

These are outstanding issues and suggestions for the pymc3-examples repo, collected after one iteration of renaming plot dependencies from `pm.` to `arviz.`.

> Note: This is similar to the existing issue #34.

## Installs
* PyMC3: v3.11
* theano-pymc (aesara): v1.1.2

# Issues

## General Issues

* Pass `return_inferencedata=True` to `pm.sample`.

## File-specific Issues

- [ ] examples/pymc3_howto/lasso_block_update.ipynb

<details>
<summary>
Issue
</summary>

"`return_inferencedata` should be set explicitly, to either True or False, to avoid the warning.

If False, an `InferenceData` object should be created within the model context and passed to ArviZ. This will:
1. avoid the warning about conversion without the model context, and
2. push forward ArviZ best practices. It is probably not too relevant here, but conversion may not be cheap for some models because it requires computing all the pointwise log likelihood values. `az.plot_xyz(trace)` works because ArviZ internally converts the data to `InferenceData` and then plots."
</details>

- [ ] examples/pymc3_howto/data_container.ipynb

<details>
<summary>
Issue
</summary>
We should pass "`keep_size=True` to avoid the warning in the cell below, and also because in a future ArviZ release the behaviour of `hdi` will change for 2d arrays (not for 1d or 3d+ arrays), so 3d arrays with shape `(chain, draw, *shape)` should be used."
</details>
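
The shape convention mentioned above can be sketched with NumPy alone. This is only an illustration under stated assumptions: the sizes are made up, and it assumes the flattened draws are stored chain by chain, so that a plain reshape recovers the `(chain, draw, *shape)` layout that `keep_size=True` is meant to preserve.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_chains, n_draws, n_obs = 2, 500, 10

rng = np.random.default_rng(0)

# Flattened layout: chain and draw are merged into one leading axis,
# giving a 2d array of shape (chain * draw, n_obs).
flat = rng.normal(size=(n_chains * n_draws, n_obs))

# Layout with separate chain and draw axes: (chain, draw, n_obs).
# Assumes the flattened draws are ordered chain by chain.
stacked = flat.reshape(n_chains, n_draws, n_obs)

print(flat.shape)     # (1000, 10)
print(stacked.shape)  # (2, 500, 10)
```

The 3d array is unambiguous about which axes are sample axes, which is why it is the safer input for interval computations.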

- [ ] examples/pymc3_howto/sampling_conjugate_step.ipynb


<details>
<summary>
Code
</summary>

```python
traces = []
models = []
names = ["Partial conjugate sampling", "Full NUTS"]

for use_conjugate in [True, False]:
    with pm.Model() as model:
        tau = pm.Exponential("tau", lam=1, testval=1.0)
        alpha = pm.Deterministic("alpha", tau * np.ones([N, J]))
        p = pm.Dirichlet("p", a=alpha)

        if use_conjugate:
            # If we use the conjugate sampling, we don't need to define the likelihood
            # as it's already taken into account in our custom step method
            step = [ConjugateStep(p.transformed, counts, tau)]

        else:
            x = pm.Multinomial("x", n=ncounts, p=p, observed=counts)
            step = []

        trace = pm.sample(step=step, chains=2, cores=1, return_inferencedata=True)
        traces.append(trace)

    assert all(az.summary(trace)["r_hat"] < 1.1)
    models.append(model)
```

</details>

<details>
<summary>
Issue
</summary>

"Since we are not storing the summary dataframe anywhere and we only want the R-hat values, we should use `az.rhat` instead of `az.summary`. The assertion can be done with:

`assert (az.rhat(trace).to_array() < 1.1).all()`"

</details>
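
To make the `< 1.1` threshold concrete, here is a NumPy-only sketch of the classic Gelman-Rubin statistic. It is not what `az.rhat` computes by default (ArviZ uses a rank-normalized variant), and the function name `gelman_rubin` and the simulated chains are hypothetical, introduced only for illustration.

```python
import numpy as np


def gelman_rubin(draws):
    """Classic (non-split, non-rank-normalized) R-hat for one scalar variable.

    `draws` has shape (n_chains, n_draws).
    """
    draws = np.asarray(draws, dtype=float)
    n_draws = draws.shape[1]
    # W: mean of the within-chain variances.
    within = draws.var(axis=1, ddof=1).mean()
    # B / n: variance of the chain means.
    between_over_n = draws.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    var_hat = (n_draws - 1) / n_draws * within + between_over_n
    return np.sqrt(var_hat / within)


rng = np.random.default_rng(0)

# Four chains drawing from the same distribution: R-hat stays close to 1.
mixed = rng.normal(size=(4, 1000))

# The same draws with each chain shifted to a different location,
# mimicking chains that have not converged to a common distribution.
stuck = mixed + np.arange(4)[:, None]

print(gelman_rubin(mixed) < 1.1)  # True
print(gelman_rubin(stuck) < 1.1)  # False
```

The notebook assertion applies the same kind of threshold to every variable at once, which is why it reduces with `.all()`.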



- [ ] examples/ode_models/ODE_with_manual_gradients.ipynb

<details>
<summary>
Code
</summary>

Similar to PR 43, at line 33, where `trace = ` is assigned:

```python
    Y_obs = pm.Lognormal("Y_obs", mu=pm.math.log(forward), sigma=sigma, observed=Y)

    trace = pm.sample(1500, init="jitter+adapt_diag", cores=1)
trace["diverging"].sum()
```

I changed init from `adapt_diag` to `jitter+adapt_diag` & added param `cores=1`.
</details>


<details>
<summary>
Issue
</summary>
I get a sampling error when using `adapt_diag` or other adapter types; I am unsure why.

The error:

`SamplingError: Bad initial energy`

Seen here
<img width="850" alt="Screen Shot 2021-03-14 at 6 56 40 PM" src="https://user-images.githubusercontent.com/19514362/111094394-6f5c2e00-84f8-11eb-9d9e-50e4361780d8.png">

</details>

- [ ] examples/generalized_linear_models/GLM-model-selection.ipynb

#### Issue 1

<details>
<summary>
Code
</summary>


```python

ax = (
    dfll["log_likelihood"]
    .unstack()
    .plot.bar(subplots=True, layout=(1, 2), figsize=(12, 6), sharex=True)
)

ax[0, 0].set_xticks(range(5))
ax[0, 0].set_xticklabels(["k1", "k2", "k3", "k4", "k5"])
ax[0, 0].set_xlim(-0.25, 4.25);

```

</details>

<details>
<summary>
Issue
</summary>

One dependency errors out on a missing `sd_log__` variable, so the remainder of the notebook cannot run.

In particular:

`GLM-model-selection KeyError: 'var names: "['sd_log__'] are not present" in dataset'`


[More details here](https://github.com/pymc-devs/pymc-examples/pull/24#issuecomment-792884220)

</details>

#### Issue 2

<details>
<summary>
Code
</summary>


```python
dfll = pd.DataFrame(index=["k1", "k2", "k3", "k4", "k5"], columns=["lin", "quad"])
dfll.index.name = "model"

for nm in dfll.index:
    dfll.loc[nm, "lin"] = -models_lin[nm].logp(
        az.summary(traces_lin[nm], traces_lin[nm].varnames)["mean"].to_dict()
    )

    dfll.loc[nm, "quad"] = -models_quad[nm].logp(
        az.summary(traces_quad[nm], traces_quad[nm].varnames)["mean"].to_dict()
    )

dfll = pd.melt(dfll.reset_index(), id_vars=["model"], var_name="poly", value_name="log_likelihood")
dfll.index = pd.MultiIndex.from_frame(dfll[["model", "poly"]])
```

</details>

<details>
<summary>
Issue
</summary>
I'm not familiar with this notebook and find the pandas stuff happening above quite confusing. The model selection should be simplified with newer ArviZ features. Having to work with straces directly is not something we should need to teach 😬
</details>

