|
1 | 1 | # Hash Functions
|
2 |
| -In C++, a hash refers to a function or algorithm that takes an input (or "key") and produces a fixed-size string of characters, which is typically a hexadecimal number or a sequence of bytes. Hash functions are commonly used in data structures like hash tables and for various other purposes, such as data encryption, password storage, and digital signatures. |
| 2 | +In C++, `std::hash` is a class template whose specializations are function objects (also known as functors): each one computes a hash value for inputs of a specific type. When you declare something like `std::hash<float> float_hasher;` or `std::hash<std::string> str_hasher;`, you're creating an instance of `std::hash` specialized for `float` or `std::string`. These are used to generate hash values from floating-point numbers or strings, respectively. |
3 | 3 |
|
4 |
| -Here's a basic explanation of how to use a hash function in C++ with an example using the `std::hash` function from the C++ Standard Library: |
| 4 | +### How Does `std::hash` Work? |
| 5 | + |
| 6 | +1. **Hash Function**: |
| 7 | + - Each `std::hash` specialization provides an `operator()` that takes an object of the specified type and returns a `std::size_t`, an unsigned integer type used to represent sizes. |
| 8 | + - This `operator()` essentially converts the input (e.g., a `float` or a `std::string`) into a hash value, which is a number that ideally distributes inputs uniformly across its range. |
| 9 | + |
| 10 | +2. **Usage Example**: |
| 11 | + ```cpp |
| 12 | + std::hash<float> float_hasher; |
| 13 | + std::size_t hash_value = float_hasher(3.14f); // Hash value for the float 3.14 |
| 14 | + |
| 15 | + std::hash<std::string> str_hasher; |
| 16 | + std::size_t hash_value_str = str_hasher("hello"); // Hash value for the string "hello" |
| 17 | + ``` |
| 18 | + |
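The property that matters most in practice is that equal inputs give equal hash values within one run of the program. The sketch below illustrates this; `hash_string`, `hash_float` and `check_same_input_same_hash` are hypothetical helper names introduced only for this example, and no concrete hash values are shown because they depend on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not from the original text): hash a string with
// the standard library's std::hash<std::string> specialization.
std::size_t hash_string(const std::string &s) {
  return std::hash<std::string>{}(s);
}

// Hypothetical helper: hash a float with std::hash<float>.
std::size_t hash_float(float f) {
  return std::hash<float>{}(f);
}

// Equal inputs produce equal hash values within one run of the program.
// The actual numbers vary between standard library implementations.
void check_same_input_same_hash() {
  assert(hash_string("hello") == hash_string("hello"));
  assert(hash_float(3.14f) == hash_float(3.14f));
}
```

Calling `check_same_input_same_hash()` from `main` should pass silently; the hash of `"hello"` itself may differ between compilers or platforms.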
| 19 | +### User-defined Hash functions |
5 | 20 |
|
6 | 21 | ```cpp
|
7 |
| -#include <iostream> |
8 |
| -#include <functional> |
| 22 | +class student { |
| 23 | +public: |
| 24 | + int id; |
| 25 | + std::string first_name; |
| 26 | + std::string last_name; |
| 27 | + |
| 28 | + bool operator==(const student &other) const { |
| 29 | + return (first_name == other.first_name && last_name == other.last_name && |
| 30 | + id == other.id); |
| 31 | + } |
| 32 | +}; |
| 33 | +``` |
9 | 34 |
|
10 |
| -int main() { |
11 |
| - // Create a hash function object using std::hash |
12 |
| - std::hash<std::string> hasher; |
| 35 | +The hash function, written as a specialization of `std::hash`: |
13 | 36 |
|
14 |
| - // Input data (a string) |
15 |
| - std::string input = "Hello, World!"; |
| 37 | +```cpp |
| 38 | +namespace std { |
| 39 | +
|
| 40 | +template <> struct hash<student> { |
| 41 | + std::size_t operator()(const student &k) const { |
16 | 42 |
|
17 |
| - // Calculate the hash value of the input |
18 |
| - size_t hashValue = hasher(input); |
| 43 | + // Compute individual hash values for first_name, |
| 44 | + // last_name and id, then combine them using XOR |
| 45 | + // and bit shifting: |
19 | 46 |
|
20 |
| - // Display the hash value |
21 |
| - std::cout << "Hash value of '" << input << "': " << hashValue << std::endl; |
| 47 | + return ((std::hash<string>()(k.first_name) ^ |
| 48 | + (std::hash<string>()(k.last_name) << 1)) >> |
| 49 | + 1) ^ |
| 50 | + (std::hash<int>()(k.id) << 1); |
| 51 | + ; |
| 52 | + } |
| 53 | +}; |
22 | 54 |
|
23 |
| - return 0; |
24 | 55 | }
|
25 | 56 | ```
|
| 57 | +If you don't want to specialize the template inside the `std` namespace (although it's perfectly legal in this case), you can define the hash function as a separate class and pass it as the third template argument of the map: |
26 | 58 |
|
27 |
| -In this example, we include the `<iostream>` and `<functional>` headers, which are necessary for input/output and using the `std::hash` function, respectively. |
| 59 | +```cpp |
| 60 | +struct KeyHasher { |
| 61 | + std::size_t operator()(const student &k) const { |
| 62 | + |
| 63 | + return ((std::hash<std::string>()(k.first_name) ^ |
| 64 | + (std::hash<std::string>()(k.last_name) << 1)) >> |
| 65 | + 1) ^ |
| 66 | + (std::hash<int>()(k.id) << 1); |
| 67 | + } |
| 68 | +}; |
| 69 | +``` |
28 | 70 |
|
29 |
| -Here's how the code works: |
| 71 | +In your main: |
30 | 72 |
|
31 |
| -1. We create a hash function object called `hasher` using the `std::hash` template. In this case, we specify that we want to hash `std::string` objects. |
| 73 | +```cpp |
| 74 | +std::unordered_map<student, std::string> student_umap = { |
| 75 | + {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}}; |
32 | 76 |
|
33 |
| -2. We define a `std::string` variable called `input` with the string "Hello, World!" as our input data. |
| 77 | + std::unordered_map<student, std::string, KeyHasher> m6 = { |
| 78 | + {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}}; |
| 79 | +``` |
| 80 | + |
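The pieces above can be put together in one self-contained sketch. It repeats `student` and `KeyHasher` from the text; the alias `student_map` and the functions `make_student_map` and `check_lookup` are hypothetical names added for illustration, and the map contents are the same sample entries used above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct student {
  int id;
  std::string first_name;
  std::string last_name;

  // Keys that compare equal must also hash equal, so operator== and
  // KeyHasher look at the same three fields.
  bool operator==(const student &other) const {
    return id == other.id && first_name == other.first_name &&
           last_name == other.last_name;
  }
};

struct KeyHasher {
  std::size_t operator()(const student &k) const {
    return ((std::hash<std::string>()(k.first_name) ^
             (std::hash<std::string>()(k.last_name) << 1)) >> 1) ^
           (std::hash<int>()(k.id) << 1);
  }
};

// Hypothetical alias: a map keyed by student that uses KeyHasher.
using student_map = std::unordered_map<student, std::string, KeyHasher>;

// Hypothetical helper: builds the sample map from the text.
student_map make_student_map() {
  student_map m;
  m.emplace(student{1, "John", "Doe"}, "example");
  m.emplace(student{2, "Mary", "Sue"}, "another");
  return m;
}

// Hypothetical helper: looks up one present key and one absent key.
void check_lookup() {
  const student_map m = make_student_map();
  assert(m.size() == 2);
  assert(m.at(student{1, "John", "Doe"}) == "example");
  assert(m.count(student{3, "Nobody", "Here"}) == 0);
}
```

Passing the hasher as a template argument keeps the `std` namespace untouched, at the cost of naming `KeyHasher` wherever the map type is spelled out; an alias such as `student_map` hides that.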
34 | 81 |
|
35 |
| -3. We calculate the hash value of the input string by invoking the `hasher` function with the `input` as its argument. This will return a `size_t` value representing the hash code of the input string. |
36 | 82 |
|
37 |
| -4. Finally, we display the hash value using `std::cout`. |
| 83 | +### Where is the Hash Table? |
38 | 84 |
|
39 |
| -It's important to note that the `std::hash` function is suitable for basic use cases, but it's not suitable for cryptographic purposes or when you need a hash function with specific properties like collision resistance. For cryptographic purposes, you should use cryptographic hash functions like SHA-256 or SHA-3, which are available in C++ through various libraries, such as OpenSSL or the C++ Standard Library's `<cryptopp>` library. |
| 85 | +- **No Hash Table in `std::hash`**: |
| 86 | + - The `std::hash` function object itself does not involve a hash table. It simply computes a hash value for a given input. |
| 87 | + - The responsibility of organizing and storing these hash values in a hash table belongs to containers that utilize hashing, such as `std::unordered_map`, `std::unordered_set`, etc. |
40 | 88 |
|
41 |
| -# Hash Data Structure (Hash Table) |
42 |
| -A hash data structure, often referred to as a hash table or hash map, is a data structure that uses a hash function to map keys to values. It allows for efficient retrieval and storage of values based on their associated keys. Hash tables are commonly used in computer science and programming for tasks like implementing dictionaries, caches, and database indexing. |
| 89 | +- **Hash Table in Containers**: |
| 90 | + - When you use a container like `std::unordered_map`, it uses `std::hash<Key>` by default to compute hash values for keys and places each element in a bucket chosen from that hash value. |
| 91 | + - The hash table itself is managed by the container, not by the `std::hash` function. |
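The split between computing a hash and managing the table can be seen through the container's own member functions. In this sketch, `fruit_map`, `bucket_of` and `hash_of` are hypothetical names chosen for the example; `bucket()`, `bucket_count()` and `hash_function()` are standard members of `std::unordered_map`. How a hash value is turned into a bucket index is left to the implementation, so the code only relies on the index being in range.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical alias used only in this example.
using fruit_map = std::unordered_map<std::string, int>;

// Hypothetical helper: index of the bucket the container assigns to `key`.
std::size_t bucket_of(const fruit_map &m, const std::string &key) {
  assert(m.bucket_count() > 0);  // bucket() needs at least one bucket
  return m.bucket(key);
}

// Hypothetical helper: the raw hash value, computed by the container's
// hasher (std::hash<std::string> by default). No table is involved here.
std::size_t hash_of(const fruit_map &m, const std::string &key) {
  return m.hash_function()(key);
}
```

For a non-empty map `m`, `bucket_of(m, "apple")` is always smaller than `m.bucket_count()`, while `hash_of(m, "apple")` can be any `std::size_t` value.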
43 | 92 |
|
44 |
| -Here's a basic explanation of a hash table in C++ with an example using the `std::unordered_map` container from the C++ Standard Library: |
| 93 | +### What is the Size of the Hash Table? |
45 | 94 |
|
46 |
| -```cpp |
47 |
| -#include <iostream> |
48 |
| -#include <unordered_map> |
| 95 | +In C++, to get information about the size of the hash table (i.e., the number of buckets) in a `std::unordered_map` or `std::unordered_set`, you can use the `bucket_count()` member function. This function returns the current number of buckets in the hash table. |
49 | 96 |
|
50 |
| -int main() { |
51 |
| - // Create an unordered_map (hash table) to store key-value pairs |
52 |
| - std::unordered_map<std::string, int> hashMap; |
| 97 | +Here are the relevant functions you can use: |
53 | 98 |
|
54 |
| - // Insert key-value pairs into the hash table |
55 |
| - hashMap["apple"] = 3; |
56 |
| - hashMap["banana"] = 2; |
57 |
| - hashMap["cherry"] = 5; |
| 99 | +1. **`bucket_count()`**: Returns the number of buckets in the hash table. |
| 100 | +2. **`load_factor()`**: Returns the current load factor, which is the average number of elements per bucket. |
| 101 | +3. **`max_load_factor()`**: Returns the maximum load factor before the container will automatically increase the number of buckets (rehash). |
| 102 | +4. **`bucket_size(bucket_index)`**: Returns the number of elements in the specified bucket. |
58 | 103 |
|
59 |
| - // Access values by their keys |
60 |
| - std::cout << "Number of apples: " << hashMap["apple"] << std::endl; |
61 |
| - std::cout << "Number of cherries: " << hashMap["cherry"] << std::endl; |
| 104 | +### Example Code |
62 | 105 |
|
63 |
| - return 0; |
64 |
| -} |
65 |
| -``` |
| 106 | +```cpp |
| 107 | + std::unordered_map<int, std::string> my_map = {{1, "one"}, {2, "two"}, {3, "three"}}; |
66 | 108 |
|
67 |
| -In this example, we use the `std::unordered_map` container, which is an implementation of a hash table. Here's how the code works: |
| 109 | + std::cout << "Number of buckets in my_map: " << my_map.bucket_count() << std::endl; |
| 110 | + std::cout << "Current load factor in my_map: " << my_map.load_factor() << std::endl; |
| 111 | + std::cout << "Max load factor in my_map: " << my_map.max_load_factor() << std::endl; |
68 | 112 |
|
69 |
| -1. We include the necessary header `<iostream>` for input/output and `<unordered_map>` for using the `std::unordered_map` container. |
| 113 | + // Example with std::unordered_set |
| 114 | + std::unordered_set<int> my_set = {1, 2, 3, 4, 5}; |
70 | 115 |
|
71 |
| -2. We create an `std::unordered_map` named `hashMap` that associates `std::string` keys with `int` values. |
| 116 | + std::cout << "Number of buckets in my_set: " << my_set.bucket_count() << std::endl; |
| 117 | + std::cout << "Current load factor in my_set: " << my_set.load_factor() << std::endl; |
| 118 | + std::cout << "Max load factor in my_set: " << my_set.max_load_factor() << std::endl; |
72 | 119 |
|
73 |
| -3. We insert key-value pairs into the hash table using the `[]` operator. For example, we associate the key "apple" with the value 3. |
| 120 | + // Accessing the size of a specific bucket |
| 121 | + size_t bucket_index = 0; |
| 122 | + std::cout << "Elements in bucket " << bucket_index << " of my_map: " << my_map.bucket_size(bucket_index) << std::endl; |
| 123 | +``` |
74 | 124 |
|
75 |
| -4. We access values from the hash table using their corresponding keys. In this case, we retrieve the number of apples and cherries from the hash table and display them using `std::cout`. |
76 | 125 |
|
77 |
| -So, to clarify, the example provided above demonstrates the use of a hash data structure (hash table) in C++. It uses a hash function internally to efficiently store and retrieve values based on their keys. |
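| 126 | +For the code above, you might see output similar to the following (the bucket counts depend on the standard library implementation, so your numbers may differ): |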
78 | 127 |
|
| 128 | +```bash |
| 129 | +Number of buckets in my_map: 8 |
| 130 | +Current load factor in my_map: 0.375 |
| 131 | +Max load factor in my_map: 1 |
| 132 | +Number of buckets in my_set: 10 |
| 133 | +Current load factor in my_set: 0.5 |
| 134 | +Max load factor in my_set: 1 |
| 135 | +Elements in bucket 0 of my_map: 0 |
| 136 | +``` |
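The load factor reported above is the element count divided by the bucket count, and the bucket count can be requested up front with `reserve()`. The sketch below assumes the default `max_load_factor()` of 1.0; `manual_load_factor` and `buckets_after_reserve` are hypothetical helper names, not part of the original example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: average number of elements per bucket, computed by
// hand. It should agree with s.load_factor().
float manual_load_factor(const std::unordered_set<int> &s) {
  assert(s.bucket_count() > 0);  // avoid dividing by zero
  return static_cast<float>(s.size()) / static_cast<float>(s.bucket_count());
}

// Hypothetical helper: bucket count of an empty set after reserve(n).
// reserve(n) asks for enough buckets to hold n elements without exceeding
// max_load_factor(), so with the default of 1.0 the result is at least n.
std::size_t buckets_after_reserve(std::size_t n) {
  std::unordered_set<int> s;
  s.reserve(n);
  return s.bucket_count();
}
```

Reserving before a bulk insert is one way to avoid repeated rehashing; the exact bucket count returned is still chosen by the implementation.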
79 | 137 |
|
80 | 138 |
|
81 | 139 | [code](../src/hash.cpp)
|