Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446

Draft

andreacate wants to merge 9 commits into pymc-devs:main from andreacate:DFM_draft_implementation

+1,767 −0

Contributor

andreacate commented Mar 31, 2025 • edited

Dynamical Factor Models (DFM) Implementation

This PR provides a first draft implementation of Dynamical Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview

Added DFM.py with initial functionality

Current Status

This implementation is a work in progress, and I welcome any feedback.

Next Steps

Vectorize the construction of the transition and selection matrices (possibly by reordering state variables).
Add support for measurement error.
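As an illustration of the vectorization step listed above, here is a minimal NumPy sketch of a VAR(p) companion matrix built with block assignments instead of nested loops. It is not the PR's code: the function names `var_companion` and `var_selection` are hypothetical, the lag-major state ordering `[f_t, f_{t-1}, ..., f_{t-p+1}]` is an assumption, and the real implementation would operate on symbolic matrices rather than NumPy arrays.

```python
import numpy as np


def var_companion(ar_coefs: np.ndarray) -> np.ndarray:
    """Companion-form transition matrix of a VAR(p) with k variables.

    ar_coefs has shape (p, k, k); ar_coefs[l] multiplies the state at lag l + 1.
    States are assumed ordered lag-major: [f_t, f_{t-1}, ..., f_{t-p+1}].
    """
    p, k, _ = ar_coefs.shape
    T = np.zeros((k * p, k * p))
    # Top block row [A_1 | A_2 | ... | A_p], written in one assignment.
    T[:k, :] = np.concatenate(list(ar_coefs), axis=1)
    # Identity block below it shifts each lag down by one position.
    T[k:, : k * (p - 1)] = np.eye(k * (p - 1))
    return T


def var_selection(k: int, p: int) -> np.ndarray:
    """Selection matrix: shocks enter only the k current-period states."""
    R = np.zeros((k * p, k))
    R[:k, :] = np.eye(k)
    return R
```

With this ordering both matrices are made of contiguous blocks, which is why reordering the state variables can make the construction loop-free.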


          Added new file DFM.py for GSOC 2025 Dynamical Factor Models

174f8b0

Contributor

zaxtax commented Apr 1, 2025

Looks interesting! Just say when you think it's ready for review

Member

fonnesbeck commented Apr 5, 2025

cc @jessegrabowski


          Add initial notebook on custom DFM implementation

11ba543

review-notebook-app bot commented Apr 7, 2025

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Contributor Author

andreacate commented Apr 7, 2025

Thanks for the feedback!

I'm still exploring the best approach for implementing Dynamic Factor Models.
I've added a simple custom DFM model in a Jupyter notebook, which I plan to use as a prototype and testing tool while developing the main BayesianDynamicFactor class.

andreacate added 5 commits

April 22, 2025 10:54


          Update of DFM draft implementation

b4a85bb

The notebook compares the custom DFM with the implemented DFM (which uses a hardcoded version of make_symbolic_graph that works only in this case).


          Update of the implementation

a4624d6

Still to do:
1) vectorization/block matrices
2) measurement errors


          Merge branch 'pymc-devs:main' into DFM_draft_implementation

09adca1


          Update DFM.py

c806b64


          Merge branch 'DFM_draft_implementation' of https://github.com/andreacate/pymc-extras into DFM_draft_implementation

494d0b1

jessegrabowski requested changes

pymc_extras/statespace/models/DFM.py Outdated

+                  factor_order : int
+                      Order of the VAR process for the latent factors.
+                  k_endog : int

Member

jessegrabowski Jul 13, 2025

Suggested change

      
-                  k_endog : int
+                  k_endog : int, optional

pymc_extras/statespace/models/DFM.py Outdated

+                      Order of the VAR process for the latent factors.
+                  k_endog : int
+                      Number of observed time series.

Member

jessegrabowski Jul 13, 2025

Suggested change

      
-                      Number of observed time series.
+                      Number of observed time series. If not provided, the number of observed series will be inferred from `endog_names`. At least one of `k_endog` or `endog_names` must be provided.

pymc_extras/statespace/models/DFM.py Outdated

+                  k_endog : int
+                      Number of observed time series.
+                  endog_names : Sequence[str], optional

Member

jessegrabowski Jul 13, 2025

Suggested change

      
-                  endog_names : Sequence[str], optional
+                  endog_names : list of str, optional

pymc_extras/statespace/models/DFM.py Outdated

+                      Number of observed time series.
+                  endog_names : Sequence[str], optional
+                      Names of the observed time series. If not provided, default names will be generated as `endog_1`, `endog_2`, ..., `endog_k`.

Member

jessegrabowski Jul 13, 2025

Suggested change

      
-                      Names of the observed time series. If not provided, default names will be generated as `endog_1`, `endog_2`, ..., `endog_k`.
+                      Names of the observed time series. If not provided, default names will be generated as `endog_1`, `endog_2`, ..., `endog_k` based on `k_endog`. At least one of `k_endog` or `endog_names` must be provided.
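The behavior these docstring suggestions describe could be enforced roughly as follows. This is only a sketch: `resolve_endog` is a hypothetical helper, not a function from the PR, and the length-mismatch check is an added assumption that the suggestions do not mention.

```python
from __future__ import annotations

from collections.abc import Sequence


def resolve_endog(
    k_endog: int | None = None,
    endog_names: Sequence[str] | None = None,
) -> tuple[int, list[str]]:
    """Return a consistent (k_endog, endog_names) pair from partial input."""
    if k_endog is None and endog_names is None:
        raise ValueError("At least one of k_endog or endog_names must be provided.")
    if endog_names is None:
        # Default names endog_1, ..., endog_k based on k_endog.
        endog_names = [f"endog_{i + 1}" for i in range(k_endog)]
    elif k_endog is None:
        # Infer the number of observed series from the names.
        k_endog = len(endog_names)
    elif k_endog != len(endog_names):
        # Assumption: reject inconsistent input instead of silently picking one.
        raise ValueError("k_endog does not match the length of endog_names.")
    return k_endog, list(endog_names)
```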

pymc_extras/statespace/models/DFM.py

+                  verbose: bool, default True
+                      If true, a message will be logged to the terminal explaining the variable names, dimensions, and supports.
+                  Notes

Member

jessegrabowski Jul 13, 2025

We're going to have to add all the math equations and whatnot here eventually. No rush, but I want to make sure it's on your TODO list. Check the VARMAX docstring for what I have in mind
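For reference, a common textbook form of the dynamic factor model that such a Notes section might state is sketched below. The symbols are assumptions, not taken from the PR: y_t is the vector of `k_endog` observed series, f_t the vector of `k_factors` latent factors, p corresponds to `factor_order`, and q to `error_order`. The parameterization actually used in the PR (for example, whether the error AR coefficients are restricted to be diagonal) may differ.

```latex
\begin{aligned}
y_t &= \Lambda f_t + u_t, \\
f_t &= A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t, & \eta_t &\sim N(0, \Sigma_\eta), \\
u_t &= C_1 u_{t-1} + \dots + C_q u_{t-q} + \varepsilon_t, & \varepsilon_t &\sim N(0, \Sigma_\varepsilon).
\end{aligned}
```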

pymc_extras/statespace/models/DFM.py Outdated

+                      # Factor states
+                      for i in range(self.k_factors):
+                          for lag in range(self.factor_order):
+                              names.append(f"factor_{i+1}_lag{lag}")

Member

jessegrabowski Jul 13, 2025

nit: I've been using Stata notation for lagged states, e.g. L{lag}.factor_{i+1}

Not married to it, but consider it for consistency's sake.
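A minimal sketch of the naming scheme suggested here, keeping the loop order of the snippet under review. `lagged_state_names` is a hypothetical helper, not part of the PR.

```python
def lagged_state_names(prefix: str, k: int, order: int) -> list[str]:
    """Build state names in Stata-style lag notation, e.g. 'L1.factor_2'."""
    names = []
    for i in range(k):
        for lag in range(order):
            names.append(f"L{lag}.{prefix}_{i + 1}")
    return names


print(lagged_state_names("factor", k=2, order=2))
# ['L0.factor_1', 'L1.factor_1', 'L0.factor_2', 'L1.factor_2']
```

The same helper would cover the error states by passing `"error"` as the prefix.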

pymc_extras/statespace/models/DFM.py Outdated

+                      if self.error_order > 0:
+                          for i in range(self.k_endog):
+                              for lag in range(self.error_order):
+                                  names.append(f"error_{i+1}_lag{lag}")

Member

jessegrabowski Jul 13, 2025

as above

pymc_extras/statespace/models/DFM.py Outdated

+                      # If error_order > 0
+                      if self.error_order > 0:
+                          coords["error_ar_param"] = list(range(1, self.error_order + 1))

Member

jessegrabowski Jul 13, 2025

Suggested change

      
-                          coords["error_ar_param"] = list(range(1, self.error_order + 1))
+                          coords[ERROR_AR_PARAM_DIM] = list(range(1, self.error_order + 1))

It's weird to have a global everywhere except here

pymc_extras/statespace/models/DFM.py Outdated

+                          coord_map["factor_ar"] = (FACTOR_DIM, AR_PARAM_DIM)
+                      if self.error_order > 0:
+                          coord_map["error_ar"] = (OBS_STATE_DIM, "error_ar_param")

Member

jessegrabowski Jul 13, 2025

Suggested change

      
-                          coord_map["error_ar"] = (OBS_STATE_DIM, "error_ar_param")
+                          coord_map["error_ar"] = (OBS_STATE_DIM, ERROR_AR_PARAM_DIM)
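The point of the two suggestions above is that the dimension name is defined once and reused, so the coordinate key and the parameter's dims cannot drift apart. A minimal sketch, in which the constant values and the helper `error_ar_coords` are placeholders rather than the PR's actual definitions:

```python
# Placeholder values: the real constants live in the PR's module and may differ.
OBS_STATE_DIM = "observed_state"
ERROR_AR_PARAM_DIM = "error_ar_param"


def error_ar_coords(error_order: int) -> tuple[dict, dict]:
    """Build coords and the parameter-to-dims map from the shared constants."""
    coords, coord_map = {}, {}
    if error_order > 0:
        coords[ERROR_AR_PARAM_DIM] = list(range(1, error_order + 1))
        coord_map["error_ar"] = (OBS_STATE_DIM, ERROR_AR_PARAM_DIM)
    return coords, coord_map
```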

pymc_extras/statespace/models/DFM.py


		self.ssm["initial_state_cov", :, :] = P0

		# TODO vectorize the design matrix

Member

jessegrabowski Jul 13, 2025

You're going to have to double-check all of these matrix constructions if you re-ordered the states.
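To make the concern concrete: if the states are permuted, the transition, design, and selection matrices and the initial covariance must all be permuted consistently. The NumPy sketch below shows the relationship for a generic linear state space model. The helper `reorder_states` and the argument names are hypothetical and not taken from the PR.

```python
import numpy as np


def reorder_states(T, Z, R, P0, perm):
    """Re-express a linear state space model after permuting its states.

    perm[i] is the old index of the state placed at new position i,
    so the new state vector is x_new = x_old[perm].
    """
    perm = np.asarray(perm)
    T_new = T[np.ix_(perm, perm)]    # rows and columns of the transition matrix
    Z_new = Z[:, perm]               # columns of the design matrix
    R_new = R[perm, :]               # rows of the selection matrix
    P0_new = P0[np.ix_(perm, perm)]  # rows and columns of the initial covariance
    return T_new, Z_new, R_new, P0_new
```

A quick consistency check is to confirm that the observation `Z @ x` and the permuted next state `T @ x + R @ eps` are the same before and after reordering.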

andreacate added 2 commits

July 14, 2025 13:51


          Update following suggestions by Jesse

aa995b9


          Merge branch 'pymc-devs:main' into DFM_draft_implementation

a542ec7
