HIVE SSL mTLS capability #46023
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. |
Hello, thank you for having me. It is my first time contributing to any open source project, so please bear with me. Happy to learn from any wisdom shared |
My current inclination is to use JPype to funnel Python requests through a JVM running the JDBC driver alongside Airflow. The goal is to write Hive DAGs natively in Python but communicate in Java, which is more native to Hive and supports the JKS keystore format that Hive uses by default. |
On pause for now and considering ramifications of this in my environment |
The |
I'm currently looking at two capabilities. The first is to connect to an NGINX proxy that requires SSL certificates and expects mTLS, and that serves Hive commands locally within our cluster to the HiveServer2 instance running right behind it. The second, which I am hoping for further down the line, is to find or create support for connecting PyHive directly to a HiveServer2 instance running with SSL, and to perform mTLS on that connection. The problem is that Python does not natively support the .jks format that HiveServer2 expects, hence the NGINX proxy. Also, looking at PyHive and its most recent issues, it seems to me that PyHive does not support SSL connections either. Forgive me for any misunderstanding, this is all a learning process for me. Thank you for the patience and help @nevcohen |
So how do you connect to Hive from code today? |
Thank you for the patience, this has taken some digging on my end to get accustomed to current practice in my org. Currently our PyHive queries are written as a more manual script and sent to an NGINX server that redirects the appropriate traffic to a HiveServer2 proxy. The Thrift communication is wrapped in HTTPS using the THttpClient module from the Thrift library. I have found that this exists within PyHive as well, where it is made accessible through the Connection class. My goal is to add a method that uses the ssl library to create an SSL context from the provided extras and attaches it to the connection being created when a "use_https_proxy" boolean is set. In addition, an "enable_mtls" boolean option will be included for cases where someone needs to use mTLS. |
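A minimal sketch of the SSL-context step described above, using only the standard-library ssl module. The "use_https_proxy" and "enable_mtls" keys come from the comment; the "ca_file", "client_cert" and "client_key" keys and the function name are hypothetical, chosen for illustration, and are not part of PyHive or the Airflow provider.

```python
import ssl
from typing import Any, Mapping, Optional


def build_ssl_context(extras: Mapping[str, Any]) -> Optional[ssl.SSLContext]:
    """Build an SSLContext from connection extras, or return None if HTTPS is off.

    The extras key names other than use_https_proxy and enable_mtls are
    assumptions made for this sketch.
    """
    if not extras.get("use_https_proxy", False):
        return None

    # Verify the proxy's certificate. With ca_file unset (None), the
    # system's default CA bundle is used.
    context = ssl.create_default_context(cafile=extras.get("ca_file"))

    if extras.get("enable_mtls", False):
        cert_file = extras.get("client_cert")
        key_file = extras.get("client_key")
        if not cert_file or not key_file:
            raise ValueError("enable_mtls requires client_cert and client_key")
        # Present a PEM client certificate and key during the TLS handshake.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)

    return context
```

The returned context could then be handed to the Thrift HTTP transport that PyHive's connection wraps, assuming the installed Thrift and PyHive versions accept an externally built context or transport. Note that load_cert_chain expects PEM files, so a JKS keystore would first have to be converted outside of Python.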
Finding a way to do this through the pyHive scheme and default constructor parameters |
I think I understand what you want to do. The way I see it, there are two options.
|
Hello, thank you for the help. I have opened an issue to make a PR for PyHive first, since it fundamentally lacks this capability. Once I get that merged, I will come back here to make a PR for the Airflow provider. |
I'm closing this issue as it's a missing feature in the upstream library: dropbox/PyHive#480 |
Just wanted to add a comment that PyHive has since been adopted by the apache/kyuubi project. The PR for this support has been made upstream and is awaiting release. The issue can remain closed. |
Description
Add HIVE provider capability to use SSL and perform mTLS handshake with connecting host
Use case/motivation
Airflow runs in a different network from HIVE due to policy and company support.
Require mTLS encrypted connection between the two instances to securely run HIVE jobs remotely.
Related issues
None that I am aware of
Are you willing to submit a PR?
Code of Conduct