[opt](kerberos) use ticket cache instead of principal+keytab on BE side (#47299)

### What problem does this PR solve?

#### Overview

Previously, BE nodes used a principal and keytab for Kerberos
authentication.
However, only the modified Hadoop libhdfs supports authenticating this way;
the original libhdfs only supports setting a Kerberos ticket cache path
or using the system-level Kerberos authentication context.

This pull request introduces a comprehensive Kerberos authentication
module for the BE.
The module is designed to handle Kerberos ticket management, including
initialization, authentication, and periodic ticket refresh.
It provides a robust interface for integrating Kerberos authentication,
ensuring secure and efficient credential management.
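
As a rough illustration of the periodic refresh decision, here is a minimal sketch. It is not the PR's code: `TicketTimes`, `needs_refresh`, and the `expiry_margin` parameter are hypothetical names, and the rule itself (refresh once the interval has elapsed, or when the ticket is close to expiring) is an assumption about how such a check could work.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical type for illustration only; not part of the PR.
struct TicketTimes {
    std::chrono::system_clock::time_point last_refresh; // time of the last login/refresh
    std::chrono::system_clock::time_point expire;       // ticket end time
};

// Assumed rule: refresh when the configured interval has elapsed since the
// last refresh, or when the remaining ticket lifetime is within the margin.
inline bool needs_refresh(const TicketTimes& t,
                          std::chrono::system_clock::time_point now,
                          std::chrono::seconds refresh_interval,
                          std::chrono::seconds expiry_margin) {
    assert(refresh_interval.count() >= 0 && expiry_margin.count() >= 0);
    if (now - t.last_refresh >= refresh_interval) {
        return true;
    }
    return t.expire - now <= expiry_margin;
}
```

A background thread could call such a check on each wake-up and log in again from the keytab only when it returns true.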

#### Key Components

1. **KerberosConfig** (`kerberos_config.h` and `kerberos_config.cpp`):
   - Encapsulates the configuration settings required for Kerberos
     authentication, such as principal, keytab path, and refresh intervals.
   - Provides methods to set and retrieve configuration parameters.

2. **KerberosTicketCache** (`kerberos_ticket_cache.h` and
   `kerberos_ticket_cache.cpp`):
   - Manages the Kerberos ticket cache, including initialization, login,
     and periodic refresh of tickets.
   - Supports operations such as writing to the ticket cache and checking
     whether a refresh is needed.
   - Uses a background thread to periodically refresh tickets based on
     configured intervals.
   - By default the cache file is written to the `/tmp` directory; this can
     be changed with `kerberos_ccache_path` in be.conf.

3. **KerberosTicketMgr** (`kerberos_ticket_mgr.h` and
   `kerberos_ticket_mgr.cpp`):
   - Acts as a manager for multiple Kerberos ticket caches, handling their
     lifecycle, including creation, access, and cleanup.
   - Provides methods to get or set ticket caches and retrieve cache file
     paths.
   - Includes a background thread that cleans up expired ticket caches
     every hour. If a cache is no longer being referenced, it is removed.

4. **HdfsMgr**
   - A new, simple class to manage the HDFS fs handlers.
   - It replaces the old `HdfsHandlerCache`.
   - It checks each HdfsHandler every hour and removes handlers that have
     been unused for 24 hours.
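
The get-or-create and cleanup behavior described for the ticket manager can be sketched as below. This is a simplified stand-in, not the PR's `KerberosTicketMgr`: `TicketCache`, `TicketCacheRegistry`, and their methods are hypothetical names, and using `std::shared_ptr::use_count()` to detect unreferenced caches is an assumption about one possible approach.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a per-(principal, keytab) ticket cache.
struct TicketCache {
    std::string key;
};

// Hypothetical registry: one shared cache per key, removed once unreferenced.
class TicketCacheRegistry {
public:
    // Return the cache registered under `key`, creating it on first use.
    std::shared_ptr<TicketCache> get_or_create(const std::string& key) {
        assert(!key.empty());
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _caches.find(key);
        if (it != _caches.end()) {
            return it->second;
        }
        auto cache = std::make_shared<TicketCache>(TicketCache{key});
        _caches.emplace(key, cache);
        return cache;
    }

    // Remove every cache that only the registry still references and
    // return how many were removed. A background thread could call this
    // periodically.
    std::size_t cleanup_unreferenced() {
        std::lock_guard<std::mutex> lock(_mutex);
        std::size_t removed = 0;
        for (auto it = _caches.begin(); it != _caches.end();) {
            if (it->second.use_count() == 1) {
                it = _caches.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _caches.size();
    }

private:
    mutable std::mutex _mutex;
    std::unordered_map<std::string, std::shared_ptr<TicketCache>> _caches;
};
```

Because new references are only handed out under the registry's mutex, a use count of 1 observed while holding the lock means no other holder exists, so the entry can be erased safely.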

#### Main Changes

1. Introduce Kerberos ticket cache management on the BE side.
2. Use the ticket cache path instead of principal and keytab for the
Kerberos authentication of libhdfs.
3. Fix the issue that `kerberos_krb5_conf_path` in be.conf does not take
effect.
4. Add a new system table, `backend_kerberos_ticket_cache`, to view the
Kerberos ticket cache of each backend:

```
Doris > select * from information_schema.backend_kerberos_ticket_cache\G
*************************** 1. row ***************************
                  BE_ID: 1738304534666
                  BE_IP: 172.20.32.136
              PRINCIPAL: hdfs/[email protected]
                 KEYTAB: /path/to/hdfs.keytab
      SERVICE_PRINCIPAL: krbtgt/[email protected]
      TICKET_CACHE_PATH: /tmp/doris_krb_ce93d5ebb2a6554c7ba9f43aee3a9e6c
              HASH_CODE: ce93d5ebb2a6554c7ba9f43aee3a9e6c
             START_TIME: 2025-02-01 00:08:26
            EXPIRE_TIME: 2025-02-01 00:09:26
              AUTH_TIME: 2025-02-01 00:08:26
              REF_COUNT: 1
REFRESH_INTERVAL_SECOND: 3600
```

#### Usage

The user interface remains unchanged.

1. Set the krb5.conf path in be.conf via `kerberos_krb5_conf_path`; the
default is `/etc/krb5.conf`.
2. Provide the Kerberos principal and keytab path as usual.

#### Configurations

be.conf

1. `kerberos_ccache_path`
   The directory where Kerberos ticket cache files are saved. File names
   follow the format `doris_krb_xxxx`.

2. `kerberos_krb5_conf_path`
   The path of the krb5.conf file.

3. `kerberos_refresh_interval_second`
   The minimum interval for refreshing a Kerberos ticket cache file. The
   default is 1h.

4. Cleanup logic
   If a ticket cache is not used for 1 day, it is deleted.
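
To make the file naming concrete, the sketch below builds a ticket cache path from the configured directory and a hash code, matching the shape of the `TICKET_CACHE_PATH` and `HASH_CODE` values in the sample output above. The helper name `ticket_cache_path` is hypothetical and the joining logic is an assumption; the PR's actual path construction may differ.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: join the ccache directory (e.g. the value of
// `kerberos_ccache_path`) with "doris_krb_" + hash code.
inline std::string ticket_cache_path(const std::string& ccache_dir,
                                     const std::string& hash_code) {
    assert(!hash_code.empty());
    std::filesystem::path p(ccache_dir);
    // operator/= inserts a separator only when the directory lacks one,
    // so "/tmp" and "/tmp/" produce the same result on POSIX systems.
    p /= "doris_krb_" + hash_code;
    return p.string();
}
```

Using `std::filesystem::path` rather than plain string concatenation avoids a doubled or missing slash when the configured directory is given with or without a trailing separator.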
morningman authored Feb 10, 2025
1 parent fa09d46 commit b1e7432
Showing 40 changed files with 2,865 additions and 307 deletions.
3 changes: 3 additions & 0 deletions be/CMakeLists.txt
@@ -536,6 +536,9 @@ if ((ARCH_AMD64 OR ARCH_AARCH64) AND OS_LINUX)
 hadoop_hdfs
 )
 add_definitions(-DUSE_HADOOP_HDFS)
+# USE_DORIS_HADOOP_HDFS means use hadoop deps from doris-thirdparty.
+# the hadoop deps from doris-thirdparty contains some modification diff from the standard hadoop, such as log interface
+add_definitions(-DUSE_DORIS_HADOOP_HDFS)
 else()
 add_library(hdfs3 STATIC IMPORTED)
 set_target_properties(hdfs3 PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libhdfs3.a)
3 changes: 2 additions & 1 deletion be/src/common/config.cpp
@@ -1151,8 +1151,9 @@ DEFINE_Int32(rocksdb_max_write_buffer_number, "5");

 DEFINE_mBool(allow_zero_date, "false");
 DEFINE_Bool(allow_invalid_decimalv2_literal, "false");
-DEFINE_mString(kerberos_ccache_path, "");
+DEFINE_mString(kerberos_ccache_path, "/tmp/");
 DEFINE_mString(kerberos_krb5_conf_path, "/etc/krb5.conf");
+DEFINE_mInt32(kerberos_refresh_interval_second, "3600");

 DEFINE_mString(get_stack_trace_tool, "libunwind");
 DEFINE_mString(dwarf_location_info_mode, "FAST");
2 changes: 2 additions & 0 deletions be/src/common/config.h
@@ -1205,6 +1205,8 @@ DECLARE_mBool(allow_invalid_decimalv2_literal);
 DECLARE_mString(kerberos_ccache_path);
 // set krb5.conf path, use "/etc/krb5.conf" by default
 DECLARE_mString(kerberos_krb5_conf_path);
+// the interval for renew kerberos ticket cache
+DECLARE_mInt32(kerberos_refresh_interval_second);

 // Values include `none`, `glog`, `boost`, `glibc`, `libunwind`
 DECLARE_mString(get_stack_trace_tool);
45 changes: 45 additions & 0 deletions be/src/common/kerberos/kerberos_config.cpp
@@ -0,0 +1,45 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include "common/kerberos/kerberos_config.h"

#include <filesystem>

#include "common/config.h"
#include "util/md5.h"

namespace doris::kerberos {

KerberosConfig::KerberosConfig()
        : _refresh_interval_second(3600), _min_time_before_refresh_second(600) {}

std::string KerberosConfig::get_hash_code(const std::string& principal, const std::string& keytab) {
    return _get_hash_code(principal, keytab);
}

std::string KerberosConfig::_get_hash_code(const std::string& principal,
                                           const std::string& keytab) {
    // use md5(principal + keytab) as hash code
    // so that same (principal + keytab) will have same name.
    std::string combined = principal + keytab;
    Md5Digest digest;
    digest.update(combined.c_str(), combined.length());
    digest.digest();
    return digest.hex();
}

} // namespace doris::kerberos
77 changes: 77 additions & 0 deletions be/src/common/kerberos/kerberos_config.h
@@ -0,0 +1,77 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#pragma once

#include <chrono>
#include <string>

#include "common/status.h"

namespace doris::kerberos {

// Configuration class for Kerberos authentication
class KerberosConfig {
public:
    // Constructor with default values for refresh intervals
    KerberosConfig();

    // Set the Kerberos principal and keytab file path
    void set_principal_and_keytab(const std::string& principal, const std::string& keytab) {
        _principal = principal;
        _keytab_path = keytab;
    }
    // Set the path to krb5.conf configuration file
    void set_krb5_conf_path(const std::string& path) { _krb5_conf_path = path; }
    // Set the interval for refreshing Kerberos tickets (in seconds)
    void set_refresh_interval(int32_t interval) { _refresh_interval_second = interval; }
    // Set the minimum time before refreshing tickets (in seconds)
    void set_min_time_before_refresh(int32_t time) { _min_time_before_refresh_second = time; }

    // Get the Kerberos principal name
    const std::string& get_principal() const { return _principal; }
    // Get the path to the keytab file
    const std::string& get_keytab_path() const { return _keytab_path; }
    // Get the path to krb5.conf configuration file
    const std::string& get_krb5_conf_path() const { return _krb5_conf_path; }
    // Get the ticket refresh interval in seconds
    int32_t get_refresh_interval_second() const { return _refresh_interval_second; }
    // Get the minimum time before refresh in seconds
    int32_t get_min_time_before_refresh_second() const { return _min_time_before_refresh_second; }

    std::string get_hash_code() const { return _get_hash_code(_principal, _keytab_path); }

    // Use principal and keytab to generate a hash code.
    static std::string get_hash_code(const std::string& principal, const std::string& keytab);

private:
    static std::string _get_hash_code(const std::string& principal, const std::string& keytab);

private:
    // Kerberos principal name (e.g., "[email protected]")
    std::string _principal;
    // Path to the Kerberos keytab file
    std::string _keytab_path;
    // Path to the Kerberos configuration file (krb5.conf)
    std::string _krb5_conf_path;
    // Interval for refreshing Kerberos tickets (in seconds)
    int32_t _refresh_interval_second;
    // Minimum time before refreshing tickets (in seconds)
    int32_t _min_time_before_refresh_second;
};

} // namespace doris::kerberos