Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446

Draft
wants to merge 9 commits into
base: main
from

Conversation

andreacate
Contributor

@andreacate andreacate commented Mar 31, 2025

Dynamical Factor Models (DFM) Implementation

This PR provides a first draft implementation of Dynamical Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview

  • Added DFM.py with initial functionality

Current Status

This implementation is a work in progress, and I welcome any feedback.

Next Steps

  • Vectorize the construction of the transition and selection matrices (possibly by reordering state variables).
  • Add support for measurement error.
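
As a starting point for the vectorization item, here is a minimal NumPy sketch, not the code in this PR. It assumes the factor states are reordered lag-major (all factors at lag 0, then all factors at lag 1, and so on), which turns the VAR(p) transition block into a standard companion matrix. The helper names `var_companion` and `var_selection` and the `(p, k, k)` coefficient layout are hypothetical, and the real model would build these matrices symbolically rather than with NumPy.

```python
import numpy as np


def var_companion(ar_coefs: np.ndarray) -> np.ndarray:
    """Companion-form transition matrix for a VAR(p) with k factors.

    ar_coefs has shape (p, k, k), where ar_coefs[j] multiplies the state
    at lag j + 1. Assumes a lag-major state ordering (hypothetical).
    """
    p, k, _ = ar_coefs.shape
    # Top block row: [A_1 | A_2 | ... | A_p], shape (k, k * p)
    top = ar_coefs.transpose(1, 0, 2).reshape(k, k * p)
    # Bottom block rows: shift each lag down by one, shape (k * (p - 1), k * p)
    bottom = np.eye(k * (p - 1), k * p)
    return np.concatenate([top, bottom], axis=0)


def var_selection(k: int, p: int) -> np.ndarray:
    """Selection matrix: shocks enter only the lag-0 block of the state."""
    return np.eye(k * p, k)
```

With this ordering no Python loop over factors or lags is needed; whether the reordering is worth it depends on how the design matrix and the error states are laid out.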

@zaxtax
Contributor

zaxtax commented Apr 1, 2025

Looks interesting! Just say when you think it's ready for review

@fonnesbeck
Member

cc @jessegrabowski

@andreacate
Contributor Author

Thanks for the feedback!

I'm still exploring the best approach for implementing Dynamic Factor Models.
I've added a simple custom DFM model in a Jupyter notebook, which I plan to use as a prototype and testing tool while developing the main BayesianDynamicFactor class.

The notebook compares the custom DFM with the implemented DFM (which uses a hardcoded version of make_symbolic_graph that works only in this case).
Still to do:
1) vectorization/block matrices
2) measurement errors
factor_order : int
Order of the VAR process for the latent factors.

k_endog : int
Member

Suggested change
- k_endog : int
+ k_endog : int, optional

Order of the VAR process for the latent factors.

k_endog : int
Number of observed time series.
Member

Suggested change
- Number of observed time series.
+ Number of observed time series. If not provided, the number of observed series will be inferred from `endog_names`. At least one of `k_endog` or `endog_names` must be provided.

k_endog : int
Number of observed time series.

endog_names : Sequence[str], optional
Member

Suggested change
- endog_names : Sequence[str], optional
+ endog_names : list of str, optional

Number of observed time series.

endog_names : Sequence[str], optional
Names of the observed time series. If not provided, default names will be generated as `endog_1`, `endog_2`, ..., `endog_k`.
Member

Suggested change
- Names of the observed time series. If not provided, default names will be generated as `endog_1`, `endog_2`, ..., `endog_k`.
+ Names of the observed time series. If not provided, default names will be generated as `endog_1`, `endog_2`, ..., `endog_k` based on `k_endog`. At least one of `k_endog` or `endog_names` must be provided.
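
The "at least one of `k_endog` or `endog_names`" rule described in these suggestions could be enforced along the following lines. This is only a sketch: `resolve_endog` is a hypothetical helper, not a function in the PR, and the consistency check for the case where both arguments are given is an assumption that the suggestions do not spell out.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_endog(
    k_endog: Optional[int] = None,
    endog_names: Optional[Sequence[str]] = None,
) -> Tuple[int, List[str]]:
    """Return (k_endog, endog_names), inferring whichever one is missing."""
    if k_endog is None and endog_names is None:
        raise ValueError("At least one of k_endog or endog_names must be provided.")
    if endog_names is None:
        # Default names endog_1, ..., endog_k, as described in the docstring
        endog_names = [f"endog_{i + 1}" for i in range(k_endog)]
    names = list(endog_names)
    if k_endog is None:
        k_endog = len(names)
    elif k_endog != len(names):
        # Assumption: a mismatch between the two arguments is treated as an error
        raise ValueError(
            f"k_endog={k_endog} does not match len(endog_names)={len(names)}."
        )
    return k_endog, names
```

For example, `resolve_endog(endog_names=["gdp", "cpi"])` infers `k_endog = 2`, while `resolve_endog(k_endog=3)` generates the three default names.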

verbose: bool, default True
If true, a message will be logged to the terminal explaining the variable names, dimensions, and supports.

Notes
Member

We're going to have to add all the math equations and whatnot here eventually. No rush, but I want to make sure it's on your TODO list. Check the VARMAX docstring for what I have in mind

# Factor states
for i in range(self.k_factors):
    for lag in range(self.factor_order):
        names.append(f"factor_{i+1}_lag{lag}")
Member

nit: I've been using Stata notation for lagged states, e.g. `L{lag}.factor_{i+1}`

Not married to it, but consider it for consistency's sake.

if self.error_order > 0:
    for i in range(self.k_endog):
        for lag in range(self.error_order):
            names.append(f"error_{i+1}_lag{lag}")
Member

as above
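
Applying the `L{lag}.` prefix from the review to both the factor and the error states might look like the sketch below. `make_state_names` is a hypothetical standalone function written for illustration; in the PR the naming lives inside the model class, and the state ordering here simply mirrors the loops in the diff.

```python
from typing import List


def make_state_names(
    k_factors: int,
    factor_order: int,
    k_endog: int = 0,
    error_order: int = 0,
) -> List[str]:
    """Build state names with a Stata-style lag prefix, e.g. L1.factor_2."""
    names = []
    # Factor states: one entry per (factor, lag) pair
    for i in range(k_factors):
        for lag in range(factor_order):
            names.append(f"L{lag}.factor_{i + 1}")
    # Error states: only present when error_order > 0
    for i in range(k_endog):
        for lag in range(error_order):
            names.append(f"L{lag}.error_{i + 1}")
    return names
```

With two factors and `factor_order=2`, the factor states come out as `L0.factor_1`, `L1.factor_1`, `L0.factor_2`, `L1.factor_2`.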


# If error_order > 0
if self.error_order > 0:
    coords["error_ar_param"] = list(range(1, self.error_order + 1))
Member

Suggested change
- coords["error_ar_param"] = list(range(1, self.error_order + 1))
+ coords[ERROR_AR_PARAM_DIM] = list(range(1, self.error_order + 1))

It's weird to have a global everywhere except here

coord_map["factor_ar"] = (FACTOR_DIM, AR_PARAM_DIM)

if self.error_order > 0:
    coord_map["error_ar"] = (OBS_STATE_DIM, "error_ar_param")
Member

Suggested change
- coord_map["error_ar"] = (OBS_STATE_DIM, "error_ar_param")
+ coord_map["error_ar"] = (OBS_STATE_DIM, ERROR_AR_PARAM_DIM)
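
Taken together, the two suggestions amount to defining the dimension name once and reusing it in both `coords` and `coord_map`. A minimal sketch of that pattern follows. The value of `ERROR_AR_PARAM_DIM` is taken from the string literal in the diff, but the value given to `OBS_STATE_DIM` is a placeholder assumption, and `error_ar_coords` is a hypothetical function, not one from the PR.

```python
from typing import Dict, List, Sequence, Tuple

# Module-level dimension names, defined once and imported wherever needed
ERROR_AR_PARAM_DIM = "error_ar_param"  # matches the literal used in the diff
OBS_STATE_DIM = "observed_state"  # placeholder value (assumption)


def error_ar_coords(
    error_order: int, endog_names: Sequence[str]
) -> Tuple[Dict[str, List], Dict[str, Tuple[str, str]]]:
    """Return (coords, coord_map) entries for the error AR parameters."""
    coords: Dict[str, List] = {}
    coord_map: Dict[str, Tuple[str, str]] = {}
    if error_order > 0:
        coords[OBS_STATE_DIM] = list(endog_names)
        coords[ERROR_AR_PARAM_DIM] = list(range(1, error_order + 1))
        # The same constant is used on both sides, so the two cannot drift apart
        coord_map["error_ar"] = (OBS_STATE_DIM, ERROR_AR_PARAM_DIM)
    return coords, coord_map
```

If the string is ever renamed, only the constant changes, which is the consistency the review comment is after.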


self.ssm["initial_state_cov", :, :] = P0

# TODO vectorize the design matrix
Member

You're going to have to double-check all of these matrix constructions if you re-ordered the states.
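
One way to double-check the matrices after a state reordering is to verify that the reordered system implies the same impulse responses as the original. The sketch below uses plain NumPy with hypothetical helper names (`permute_states`, `impulse_responses`). It relies on the fact that permuting the state vector with a permutation matrix P maps the transition, design, and selection matrices (T, Z, R) to (P T Pᵀ, Z Pᵀ, P R), which leaves Z Tʰ R unchanged for every horizon h.

```python
import numpy as np


def permute_states(T, Z, R, order):
    """Reorder the state vector so that new_state[i] = old_state[order[i]]."""
    P = np.eye(len(order))[list(order)]  # permutation matrix: P @ x == x[order]
    return P @ T @ P.T, Z @ P.T, P @ R


def impulse_responses(T, Z, R, horizon):
    """Stack Z @ T**h @ R for h = 0, ..., horizon - 1."""
    out = []
    M = R
    for _ in range(horizon):
        out.append(Z @ M)
        M = T @ M
    return np.stack(out)


# Example: 2 factors, 2 lags, moving from factor-major to lag-major ordering,
# i.e. [L0.f1, L1.f1, L0.f2, L1.f2] -> [L0.f1, L0.f2, L1.f1, L1.f2]
rng = np.random.default_rng(0)
T = 0.3 * rng.normal(size=(4, 4))
Z = rng.normal(size=(3, 4))
R = rng.normal(size=(4, 2))
T2, Z2, R2 = permute_states(T, Z, R, order=[0, 2, 1, 3])
same = np.allclose(impulse_responses(T, Z, R, 6), impulse_responses(T2, Z2, R2, 6))
```

If `same` is false after a manual re-ordering, at least one of the matrices was not permuted consistently with the others.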

4 participants