Commit
[opt](kerberos) use ticket cache instead of principal+keytab on BE side (#47299)

### What problem does this PR solve?

#### Overview

Previously, BE nodes used a principal and keytab for Kerberos authentication. However, only the modified Hadoop libhdfs supports authenticating this way; the original libhdfs only supports setting a Kerberos ticket cache path or using the system-level Kerberos authentication context.

This pull request introduces a Kerberos authentication module for the BE. The module handles Kerberos ticket management, including initialization, authentication, and periodic ticket refresh, and provides an interface for integrating Kerberos authentication and managing credentials.

#### Key Components

1. **KerberosConfig** (`kerberos_config.h` and `kerberos_config.cpp`):
   - Encapsulates the configuration settings required for Kerberos authentication, such as principal, keytab path, and refresh intervals.
   - Provides methods to set and retrieve configuration parameters.

2. **KerberosTicketCache** (`kerberos_ticket_cache.h` and `kerberos_ticket_cache.cpp`):
   - Manages the Kerberos ticket cache, including initialization, login, and periodic refresh of tickets.
   - Supports operations such as writing to the ticket cache and checking whether a refresh is needed.
   - Uses a background thread to periodically refresh tickets based on the configured intervals.
   - By default the cache file is written to the `/tmp` directory; this can be changed with `kerberos_ccache_path` in be.conf.

3. **KerberosTicketMgr** (`kerberos_ticket_mgr.h` and `kerberos_ticket_mgr.cpp`):
   - Acts as a manager for multiple Kerberos ticket caches, handling their lifecycle, including creation, access, and cleanup.
   - Provides methods to get or set ticket caches and to retrieve cache file paths.
   - Includes a background thread that cleans up expired ticket caches every hour. If a cache is no longer being referenced, it is removed.

4. **HdfsMgr**
   - A new, simple class that manages the HDFS fs handlers.
   - It replaces the old `HdfsHandlerCache`.
   - It checks each HdfsHandler every hour and removes an unused HdfsHandler after 24 hours.

#### Main Changes

1. Introduce comprehensive Kerberos ticket cache management on the BE side.
   1. Use the ticket cache path instead of principal and keytab for the Kerberos authentication of libhdfs.
2. Fix the issue that `kerberos_krb5_conf_path` in be.conf does not take effect.
3. Add a new system table, `backend_kerberos_ticket_cache`, to view the Kerberos ticket cache of each backend:

   ```
   Doris > select * from information_schema.backend_kerberos_ticket_cache\G
   *************************** 1. row ***************************
   BE_ID: 1738304534666
   BE_IP: 172.20.32.136
   PRINCIPAL: hdfs/[email protected]
   KEYTAB: /path/to/hdfs.keytab
   SERVICE_PRINCIPAL: krbtgt/[email protected]
   TICKET_CACHE_PATH: /tmp/doris_krb_ce93d5ebb2a6554c7ba9f43aee3a9e6c
   HASH_CODE: ce93d5ebb2a6554c7ba9f43aee3a9e6c
   START_TIME: 2025-02-01 00:08:26
   EXPIRE_TIME: 2025-02-01 00:09:26
   AUTH_TIME: 2025-02-01 00:08:26
   REF_COUNT: 1
   REFRESH_INTERVAL_SECOND: 3600
   ```

#### Usage

The user interface remains unchanged.

1. Set the krb5.conf path in be.conf with `kerberos_krb5_conf_path`; the default is `/etc/krb5.conf`.
2. Provide the Kerberos principal and the keytab path as usual.

#### Configurations

be.conf

1. `kerberos_ccache_path`

   The directory where Kerberos ticket cache files are saved. File names follow the format `doris_krb_xxxx`.

2. `kerberos_krb5_conf_path`

   The path of the krb5.conf file.

3. `kerberos_refresh_interval_second`

   The minimum interval between refreshes of a Kerberos ticket cache file. The default is 1h.

4. Cleanup logic

   If a ticket cache is not used for 1 day, it is deleted.
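
The lifecycle described for `KerberosTicketMgr` (get or create a cache per principal and keytab, refresh on an interval, remove caches nobody references) can be illustrated with a small sketch. This is not the code from this PR: `TicketCacheSketch`, `TicketMgrSketch`, and all of their members are hypothetical names, the key is derived with `std::hash` as a placeholder for whatever hash the real module uses, and no actual Kerberos call is made.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for one ticket cache; not the real KerberosTicketCache.
struct TicketCacheSketch {
    std::string principal;
    std::string keytab_path;
    std::string cache_file;          // <ccache_dir>/doris_krb_<key>
    Clock::time_point last_refresh;  // when the ticket was last (re)acquired
};

// Hypothetical, single-threaded sketch of a manager for several ticket caches.
// A real implementation would need locking and actual Kerberos login calls.
class TicketMgrSketch {
public:
    TicketMgrSketch(std::string ccache_dir, std::chrono::seconds refresh_interval)
            : _ccache_dir(std::move(ccache_dir)), _refresh_interval(refresh_interval) {
        assert(refresh_interval.count() > 0);
    }

    // Return the cache for (principal, keytab), creating it on first use.
    std::shared_ptr<TicketCacheSketch> get_or_create(const std::string& principal,
                                                     const std::string& keytab_path,
                                                     Clock::time_point now) {
        const std::string key = make_key(principal, keytab_path);
        auto it = _caches.find(key);
        if (it != _caches.end()) {
            return it->second;
        }
        auto cache = std::make_shared<TicketCacheSketch>(TicketCacheSketch{
                principal, keytab_path, _ccache_dir + "/doris_krb_" + key, now});
        _caches.emplace(key, cache);
        return cache;
    }

    // True once at least one refresh interval has passed since the last refresh.
    bool need_refresh(const TicketCacheSketch& cache, Clock::time_point now) const {
        return now - cache.last_refresh >= _refresh_interval;
    }

    // Remove caches that only the manager still references; return the count.
    std::size_t cleanup_unreferenced() {
        std::size_t removed = 0;
        for (auto it = _caches.begin(); it != _caches.end();) {
            if (it->second.use_count() == 1) {
                it = _caches.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return _caches.size(); }

private:
    // Placeholder key: the real module may hash different inputs differently.
    static std::string make_key(const std::string& principal, const std::string& keytab_path) {
        return std::to_string(std::hash<std::string>{}(principal + "|" + keytab_path));
    }

    std::string _ccache_dir;
    std::chrono::seconds _refresh_interval;
    std::map<std::string, std::shared_ptr<TicketCacheSketch>> _caches;
};
```

In this sketch, two callers that ask for the same principal and keytab receive the same `shared_ptr`, so the reference count plays the role of the `REF_COUNT` column in the system table, and `cleanup_unreferenced()` stands in for the hourly cleanup thread. The time-based rule from the Configurations section (delete after 1 day unused) is left out to keep the example short.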