Is this your first time submitting a feature request?
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-spark functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Summary:
Introduce an option to run Python models within an existing session, similar to the session option available for SQL models.
Description:
Currently, users must choose between an all-purpose cluster or a job cluster to run Python models (see docs). This requirement limits the ability to execute dbt models inline within an existing notebook, forcing model execution to be triggered outside of Databricks.
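For reference, the two submission methods available today are configured roughly as follows. Values are illustrative; see the dbt-spark Python model docs for the full set of options.

```yaml
# dbt_project.yml — current submission options for Python models (illustrative values)
models:
  my_project:
    # Option 1: reuse an existing all-purpose cluster
    +submission_method: all_purpose_cluster
    +cluster_id: "1234-567890-abcde123"  # hypothetical cluster ID

    # Option 2: spin up a job cluster per run
    # +submission_method: job_cluster
    # +job_cluster_config:
    #   spark_version: "12.2.x-scala2.12"
    #   node_type_id: "i3.xlarge"
    #   num_workers: 2
```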
In contrast, SQL models in dbt can leverage the session connection method, allowing them to be executed as part of an existing session. This separation of model logic from job cluster definitions enables orchestration systems to define clusters based on different considerations.
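For comparison, a `profiles.yml` using the session method for SQL models looks roughly like this (a sketch based on the dbt-spark connection docs; profile and schema names are placeholders):

```yaml
# profiles.yml — session connection method for SQL models (sketch)
my_profile:
  target: dev
  outputs:
    dev:
      type: spark
      method: session
      schema: my_schema
      host: NA  # not used by the session method, but required by dbt-core
```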
Request:
We propose introducing a similar session option for Python models. This feature would allow users to submit Python models to be executed within a given session, thereby decoupling model definitions from job cluster specifications.
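Under this proposal, the configuration might look like the following. Note that `session` is not an accepted `submission_method` today, so this is purely a hypothetical illustration of the requested behavior:

```yaml
# dbt_project.yml — hypothetical; "session" is the proposed new value
models:
  my_project:
    +submission_method: session  # run the Python model within the existing session
```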
Describe alternatives you've considered
For job clusters, there is no viable alternative that uses the same Databricks API at a comparable cost. A possible, but problematic, workaround is to create an all-purpose cluster, pass its cluster ID to the model, and destroy the cluster after use. However, this approach is significantly more expensive (all-purpose clusters cost more per unit of compute than job clusters) and disrupts architectures that already rely on the session method to execute models within a job cluster.
Who will this benefit?
All dbt users currently leveraging the session method and considering adopting dbt Python models will benefit from this feature. Additionally, users who rely on third-party tools to generate job cluster specifications (whether AI-driven or otherwise) will be able to decouple model logic from cluster spec configuration, allowing for greater flexibility and efficiency.
Are you interested in contributing this feature?
Yes, I'm preparing a pull request.
Anything else?
No response