Commit 62dd98d

hashing updated
1 parent 75a8d7c commit 62dd98d

3 files changed, +263 -48 lines changed

docs/hash_function_hash_table.md

Lines changed: 104 additions & 46 deletions
@@ -1,81 +1,139 @@
 # Hash Functions
-In C++, a hash refers to a function or algorithm that takes an input (or "key") and produces a fixed-size string of characters, which is typically a hexadecimal number or a sequence of bytes. Hash functions are commonly used in data structures like hash tables and for various other purposes, such as data encryption, password storage, and digital signatures.
+In C++, `std::hash` is a function object, also known as a functor, that provides a way to obtain a hash value for a given input of a specific type. When you declare something like `std::hash<float> float_hasher;` or `std::hash<std::string> str_hasher;`, you're creating an instance of `std::hash` specialized for `float` or `std::string`. These are used to generate hash values from floating-point numbers or strings, respectively.
 
-Here's a basic explanation of how to use a hash function in C++ with an example using the `std::hash` function from the C++ Standard Library:
+### How Does `std::hash` Work?
+
+1. **Hash Function**:
+   - `std::hash` provides an `operator()` function that takes an object of the specified type and returns a `std::size_t`, which is an unsigned integer type used to represent sizes.
+   - This `operator()` essentially converts the input (e.g., a `float` or a `std::string`) into a hash value, which is a number that ideally distributes inputs uniformly across its range.
+
+2. **Usage Example**:
+   ```cpp
+   std::hash<float> float_hasher;
+   std::size_t hash_value = float_hasher(3.14f); // Hash value for the float 3.14
+
+   std::hash<std::string> str_hasher;
+   std::size_t hash_value_str = str_hasher("hello"); // Hash value for the string "hello"
+   ```
+
+### User-defined Hash functions
 
 ```cpp
-#include <iostream>
-#include <functional>
+class student {
+public:
+  int id;
+  std::string first_name;
+  std::string last_name;
+
+  bool operator==(const student &other) const {
+    return (first_name == other.first_name && last_name == other.last_name &&
+            id == other.id);
+  }
+};
+```
 
-int main() {
-  // Create a hash function object using std::hash
-  std::hash<std::string> hasher;
+Hash function:
 
-  // Input data (a string)
-  std::string input = "Hello, World!";
+```cpp
+namespace std {
+
+template <> struct hash<student> {
+  std::size_t operator()(const student &k) const {
 
-  // Calculate the hash value of the input
-  size_t hashValue = hasher(input);
+    // Compute individual hash values for first_name,
+    // last_name and id and combine them using XOR
+    // and bit shifting:
 
-  // Display the hash value
-  std::cout << "Hash value of '" << input << "': " << hashValue << std::endl;
+    return ((std::hash<string>()(k.first_name) ^
+             (std::hash<string>()(k.last_name) << 1)) >>
+            1) ^
+           (std::hash<int>()(k.id) << 1);
+    ;
+  }
+};
 
-  return 0;
 }
 ```
+If you don't want to specialize the template inside the std namespace (although it's perfectly legal in this case), you can define the hash function as a separate class and add it to the template argument list for the map:
 
-In this example, we include the `<iostream>` and `<functional>` headers, which are necessary for input/output and using the `std::hash` function, respectively.
+```cpp
+struct KeyHasher {
+  std::size_t operator()(const student &k) const {
+
+    return ((std::hash<std::string>()(k.first_name) ^
+             (std::hash<std::string>()(k.last_name) << 1)) >>
+            1) ^
+           (std::hash<int>()(k.id) << 1);
+  }
+};
+```
 
-Here's how the code works:
+In your main:
 
-1. We create a hash function object called `hasher` using the `std::hash` template. In this case, we specify that we want to hash `std::string` objects.
+```cpp
+std::unordered_map<student, std::string> student_umap = {
+    {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
 
-2. We define a `std::string` variable called `input` with the string "Hello, World!" as our input data.
+std::unordered_map<student, std::string, KeyHasher> m6 = {
+    {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
+```
+
 
-3. We calculate the hash value of the input string by invoking the `hasher` function with the `input` as its argument. This will return a `size_t` value representing the hash code of the input string.
 
-4. Finally, we display the hash value using `std::cout`.
+### Where is the Hash Table?
 
-It's important to note that the `std::hash` function is suitable for basic use cases, but it's not suitable for cryptographic purposes or when you need a hash function with specific properties like collision resistance. For cryptographic purposes, you should use cryptographic hash functions like SHA-256 or SHA-3, which are available in C++ through various libraries, such as OpenSSL or the C++ Standard Library's `<cryptopp>` library.
+- **No Hash Table in `std::hash`**:
+  - The `std::hash` function object itself does not involve a hash table. It simply computes a hash value for a given input.
+  - The responsibility of organizing and storing these hash values in a hash table belongs to containers that utilize hashing, such as `std::unordered_map`, `std::unordered_set`, etc.
 
-# Hash Data Structure (Hash Table)
-A hash data structure, often referred to as a hash table or hash map, is a data structure that uses a hash function to map keys to values. It allows for efficient retrieval and storage of values based on their associated keys. Hash tables are commonly used in computer science and programming for tasks like implementing dictionaries, caches, and database indexing.
+- **Hash Table in Containers**:
+  - When you use a container like `std::unordered_map`, it uses `std::hash<Key>` by default (or the hasher you pass as a template argument) to compute hash values for keys and organizes the elements in a hash table.
+  - The hash table itself is managed by the container, not by the `std::hash` function.
 
-Here's a basic explanation of a hash table in C++ with an example using the `std::unordered_map` container from the C++ Standard Library:
+### What is the Size of the Hash Table?
 
-```cpp
-#include <iostream>
-#include <unordered_map>
+In C++, to get information about the size of the hash table (i.e., the number of buckets) in a `std::unordered_map` or `std::unordered_set`, you can use the `bucket_count()` member function. This function returns the current number of buckets in the hash table.
 
-int main() {
-  // Create an unordered_map (hash table) to store key-value pairs
-  std::unordered_map<std::string, int> hashMap;
+Here are the relevant functions you can use:
 
-  // Insert key-value pairs into the hash table
-  hashMap["apple"] = 3;
-  hashMap["banana"] = 2;
-  hashMap["cherry"] = 5;
+1. **`bucket_count()`**: Returns the number of buckets in the hash table.
+2. **`load_factor()`**: Returns the current load factor, which is the average number of elements per bucket.
+3. **`max_load_factor()`**: Returns the maximum load factor; when an insertion would push the load factor above it, the container automatically increases the number of buckets (rehash).
+4. **`bucket_size(bucket_index)`**: Returns the number of elements in the specified bucket.
 
-  // Access values by their keys
-  std::cout << "Number of apples: " << hashMap["apple"] << std::endl;
-  std::cout << "Number of cherries: " << hashMap["cherry"] << std::endl;
+### Example Code
 
-  return 0;
-}
-```
+```cpp
+std::unordered_map<int, std::string> my_map = {{1, "one"}, {2, "two"}, {3, "three"}};
 
-In this example, we use the `std::unordered_map` container, which is an implementation of a hash table. Here's how the code works:
+std::cout << "Number of buckets in my_map: " << my_map.bucket_count() << std::endl;
+std::cout << "Current load factor in my_map: " << my_map.load_factor() << std::endl;
+std::cout << "Max load factor in my_map: " << my_map.max_load_factor() << std::endl;
 
-1. We include the necessary header `<iostream>` for input/output and `<unordered_map>` for using the `std::unordered_map` container.
+// Example with std::unordered_set
+std::unordered_set<int> my_set = {1, 2, 3, 4, 5};
 
-2. We create an `std::unordered_map` named `hashMap` that associates `std::string` keys with `int` values.
+std::cout << "Number of buckets in my_set: " << my_set.bucket_count() << std::endl;
+std::cout << "Current load factor in my_set: " << my_set.load_factor() << std::endl;
+std::cout << "Max load factor in my_set: " << my_set.max_load_factor() << std::endl;
 
-3. We insert key-value pairs into the hash table using the `[]` operator. For example, we associate the key "apple" with the value 3.
+// Accessing the size of a specific bucket
+size_t bucket_index = 0;
+std::cout << "Elements in bucket " << bucket_index << " of my_map: " << my_map.bucket_size(bucket_index) << std::endl;
+```
 
-4. We access values from the hash table using their corresponding keys. In this case, we retrieve the number of apples and cherries from the hash table and display them using `std::cout`.
 
-So, to clarify, the example provided above demonstrates the use of a hash data structure (hash table) in C++. It uses a hash function internally to efficiently store and retrieve values based on their keys.
+For the code above, you might see output similar to the following (bucket counts are implementation-defined):
 
+```bash
+Number of buckets in my_map: 8
+Current load factor in my_map: 0.375
+Max load factor in my_map: 1
+Number of buckets in my_set: 10
+Current load factor in my_set: 0.5
+Max load factor in my_set: 1
+Elements in bucket 0 of my_map: 0
+```
 
 
 [code](../src/hash.cpp)
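The `student`, `std::hash<student>` and `KeyHasher` snippets added to the doc are fragments (no includes, no `main`). The sketch below puts them into one self-contained translation unit so both ways of supplying a hash can be compiled side by side. The helpers `lookup_in_default_map` and `found_with_key_hasher` are hypothetical names added only for illustration; the hash formula is the same XOR-and-shift combination used in the doc, which is fine for examples but is not a tuned hash combiner.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Same members and equality as the `student` class in the doc.
class student {
public:
  int id;
  std::string first_name;
  std::string last_name;

  bool operator==(const student &other) const {
    return first_name == other.first_name && last_name == other.last_name &&
           id == other.id;
  }
};

// Option 1: specialize std::hash so std::unordered_map<student, V>
// works without an extra template argument.
namespace std {
template <> struct hash<student> {
  std::size_t operator()(const student &k) const {
    // Same XOR-and-shift combination as in the doc.
    return ((std::hash<std::string>()(k.first_name) ^
             (std::hash<std::string>()(k.last_name) << 1)) >>
            1) ^
           (std::hash<int>()(k.id) << 1);
  }
};
} // namespace std

// Option 2: a standalone hasher passed as the map's third template argument.
struct KeyHasher {
  std::size_t operator()(const student &k) const {
    return ((std::hash<std::string>()(k.first_name) ^
             (std::hash<std::string>()(k.last_name) << 1)) >>
            1) ^
           (std::hash<int>()(k.id) << 1);
  }
};

// Hypothetical helper: looks a student up in a map that uses std::hash<student>.
// Returns an empty string when the key is absent.
std::string lookup_in_default_map(const student &key) {
  std::unordered_map<student, std::string> student_umap = {
      {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
  auto it = student_umap.find(key);
  return it == student_umap.end() ? std::string() : it->second;
}

// Hypothetical helper: the same lookup through a map that uses KeyHasher.
bool found_with_key_hasher(const student &key) {
  std::unordered_map<student, std::string, KeyHasher> m6 = {
      {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
  return m6.count(key) == 1;
}
```

Both maps depend on the hash and `operator==` agreeing: keys that compare equal must hash equally. That holds here because both hashers read exactly the three members that `operator==` compares.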

docs/set_map_pair_tuple.md

Lines changed: 125 additions & 1 deletion
@@ -8,6 +8,14 @@
 * [unordered_map user defined type](#unordered-map-user-defined-type)
 * [set user defined type](#set-user-defined-type)
 * [unordered_set user defined type](#unordered-set-user-defined-type)
+* [Real-world Examples and Applications of std::unordered_map and std::unordered_set](#real-world-examples-and-applications-of-std--unordered-map-and-std--unordered-set)
+  + [1. **Caching (Memoization) in Dynamic Programming**](#1---caching--memoization--in-dynamic-programming--)
+  + [2. **Counting Word Frequencies in Text Processing**](#2---counting-word-frequencies-in-text-processing--)
+  + [3. **Tracking Unique Visitors on a Website**](#3---tracking-unique-visitors-on-a-website--)
+  + [4. **Building an Inverted Index for a Search Engine**](#4---building-an-inverted-index-for-a-search-engine--)
+  + [5. **Routing in Network Applications**](#5---routing-in-network-applications--)
+  + [6. **Deduplication in Large Datasets**](#6---deduplication-in-large-datasets--)
+  + [Why `std::unordered_map` and `std::unordered_set`?](#why--std--unordered-map--and--std--unordered-set--)
 - [tie](#tie)
 - [tuple](#tuple)
 - [pair](#pair)
@@ -459,6 +467,122 @@ courses.insert(c2);
 courses.find(c2)->m_name;
 ```
 
+## Real-world Examples and Applications of std::unordered_map and std::unordered_set
+
+
+`std::unordered_map` and `std::unordered_set` are powerful containers in C++ that are particularly useful in situations where you need fast lookups, insertions, and deletions. Here are some real-world examples where these containers shine:
+
+### 1. **Caching (Memoization) in Dynamic Programming**
+- **Scenario**: You're implementing a dynamic programming solution, such as computing Fibonacci numbers, and want to avoid recalculating results for the same inputs.
+- **Use Case**: A `std::unordered_map` can be used to store previously computed values (e.g., `fib(n)`) so that when the function is called with the same argument again, the result can be retrieved in average constant time.
+- **Example**: Calculating Fibonacci numbers using memoization.
+  ```cpp
+  std::unordered_map<int, long long> fib_cache;
+
+  long long fib(int n) {
+      if (n <= 1) return n;
+      if (fib_cache.find(n) != fib_cache.end()) {
+          return fib_cache[n];
+      }
+      long long result = fib(n - 1) + fib(n - 2);
+      fib_cache[n] = result;
+      return result;
+  }
+  ```
+
+### 2. **Counting Word Frequencies in Text Processing**
+- **Scenario**: In text processing or natural language processing (NLP), you often need to count the frequency of words in a large corpus of text.
+- **Use Case**: A `std::unordered_map<std::string, int>` can efficiently store words as keys and their frequencies as values. The fast lookups provided by the hash table are crucial when processing large amounts of text.
+- **Example**: Counting the frequency of each word in a document.
+  ```cpp
+  std::unordered_map<std::string, int> word_count;
+
+  void count_words(const std::string& text) {
+      std::istringstream stream(text);
+      std::string word;
+      while (stream >> word) {
+          ++word_count[word];
+      }
+  }
+  ```
+
+### 3. **Tracking Unique Visitors on a Website**
+- **Scenario**: A website wants to track the number of unique visitors in a day by storing their IP addresses.
+- **Use Case**: A `std::unordered_set<std::string>` is ideal for this scenario, where each IP address is stored only once, ensuring uniqueness. The fast insertions and lookups help maintain performance even with a large number of visitors.
+- **Example**: Tracking unique visitor IP addresses.
+  ```cpp
+  std::unordered_set<std::string> unique_ips;
+
+  void log_visit(const std::string& ip_address) {
+      unique_ips.insert(ip_address);
+  }
+
+  size_t unique_visitors() {
+      return unique_ips.size();
+  }
+  ```
+
+### 4. **Building an Inverted Index for a Search Engine**
+- **Scenario**: In a search engine, you need to build an inverted index that maps each word to the list of documents that contain that word.
+- **Use Case**: A `std::unordered_map<std::string, std::unordered_set<int>>` can be used, where the key is a word, and the value is a set of document IDs. The `std::unordered_set` ensures that each document ID is unique for a given word.
+- **Example**: Building an inverted index.
+  ```cpp
+  std::unordered_map<std::string, std::unordered_set<int>> inverted_index;
+
+  void index_document(int doc_id, const std::string& content) {
+      std::istringstream stream(content);
+      std::string word;
+      while (stream >> word) {
+          inverted_index[word].insert(doc_id);
+      }
+  }
+  ```
+
+### 5. **Routing in Network Applications**
+- **Scenario**: In a peer-to-peer network application, you need to manage a dynamic list of active connections (identified by IP and port) to route data efficiently.
+- **Use Case**: A `std::unordered_set<Connection, ConnectionHash>` can track active connections. The custom hash combines the IP address and port so that connections can be looked up efficiently, while `operator==` lets the set reject duplicate connections.
+- **Example**: Managing active network connections.
+  ```cpp
+  struct Connection {
+      std::string ip;
+      int port;
+      bool operator==(const Connection& other) const {
+          return ip == other.ip && port == other.port;
+      }
+  };
+
+  struct ConnectionHash {
+      std::size_t operator()(const Connection& conn) const {
+          return std::hash<std::string>()(conn.ip) ^ std::hash<int>()(conn.port);
+      }
+  };
+
+  std::unordered_set<Connection, ConnectionHash> active_connections;
+
+  void add_connection(const std::string& ip, int port) {
+      active_connections.insert({ip, port});
+  }
+  ```
+
+### 6. **Deduplication in Large Datasets**
+- **Scenario**: You have a large dataset with potential duplicate records, and you want to remove these duplicates efficiently.
+- **Use Case**: A `std::unordered_set` is ideal for deduplication, as it only allows unique elements. You can insert records into the set, and duplicates will be automatically discarded.
+- **Example**: Deduplicating a list of user IDs.
+  ```cpp
+  std::unordered_set<int> unique_user_ids;
+
+  void add_user(int user_id) {
+      unique_user_ids.insert(user_id);
+  }
+  ```
+
+### Why `std::unordered_map` and `std::unordered_set`?
+
+- **Performance**: Both containers provide average-case constant time complexity (`O(1)`) for insertions, lookups, and deletions (degrading to `O(n)` in the worst case when many keys collide), which is often crucial in performance-sensitive applications.
+- **Ease of Use**: They provide a simple and effective way to manage key-value pairs or unique collections without worrying about the underlying implementation details.
+- **Flexibility**: They can be used in a variety of real-world scenarios that require fast access to data, uniqueness enforcement, or efficient key-based retrieval.
+
+These examples demonstrate how `std::unordered_map` and `std::unordered_set` can be applied to solve practical problems efficiently, leveraging their strengths in situations where speed and unique data management are critical.
 
 
 # tie
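The word-frequency and inverted-index snippets added above use `std::istringstream` (which needs `<sstream>`) and write into global maps. Below is a self-contained sketch of the same two techniques that returns the maps instead. The name `build_index` and the convention that a document's id is its position in the input vector are assumptions made for this illustration. Like the originals, it splits on whitespace only, so `"Fish"` and `"fish,"` count as different words.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Word -> number of occurrences (example 2, without the global map).
std::unordered_map<std::string, int> count_words(const std::string &text) {
  std::unordered_map<std::string, int> word_count;
  std::istringstream stream(text);
  std::string word;
  while (stream >> word) {
    ++word_count[word]; // operator[] value-initializes a missing count to 0
  }
  return word_count;
}

// Word -> ids of the documents containing it (example 4).
// Assumption: a document's id is its index in `docs`.
std::unordered_map<std::string, std::unordered_set<int>>
build_index(const std::vector<std::string> &docs) {
  std::unordered_map<std::string, std::unordered_set<int>> inverted_index;
  for (int doc_id = 0; doc_id < static_cast<int>(docs.size()); ++doc_id) {
    std::istringstream stream(docs[doc_id]);
    std::string word;
    while (stream >> word) {
      // A word repeated inside one document still records its id only once.
      inverted_index[word].insert(doc_id);
    }
  }
  return inverted_index;
}
```

Returning the containers by value keeps the functions free of shared state; the move on return makes this cheap.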
@@ -546,7 +670,7 @@ if(items["mumbo jumo"]==NULL)
 {
     std::cout<<"not found" <<std::endl;
 }
-```
+```
 
 
 
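The context lines of this hunk test for a missing key with `items["mumbo jumo"]==NULL`. If `items` is a `std::map` or `std::unordered_map` (its type is not visible in this diff, so that is an assumption), `operator[]` inserts a value-initialized element whenever the key is absent, so the check itself grows the container. A minimal sketch of a non-inserting check follows; `has_key`, `size_after_bracket_probe`, and the `std::unordered_map<std::string, int>` type are hypothetical. In C++20, `items.contains(key)` expresses the same test as `has_key`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: checks for a key without modifying the container.
bool has_key(const std::unordered_map<std::string, int> &items,
             const std::string &key) {
  return items.find(key) != items.end();
}

// Hypothetical helper: probes a copy of `items` with operator[] and returns
// the copy's size afterwards, to show that operator[] inserts {key, 0}
// whenever the key is missing.
std::size_t size_after_bracket_probe(std::unordered_map<std::string, int> items,
                                     const std::string &key) {
  (void)items[key];
  return items.size();
}
```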
src/hash.cpp

Lines changed: 34 additions & 1 deletion
@@ -1,6 +1,8 @@
 #include <iostream>
+#include <set>
 #include <string>
 #include <unordered_map>
+#include <unordered_set>
 
 // user-defined hash functions:
 // Example 1
@@ -87,4 +89,35 @@ void unordered_mapCustomClasstype() {
       {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
 }
 
-int main() {}
+void sizeOfTheHashTable() {
+  // Example with std::unordered_map
+  std::unordered_map<int, std::string> my_map = {
+      {1, "one"}, {2, "two"}, {3, "three"}};
+
+  std::cout << "Number of buckets in my_map: " << my_map.bucket_count()
+            << std::endl;
+  std::cout << "Current load factor in my_map: " << my_map.load_factor()
+            << std::endl;
+  std::cout << "Max load factor in my_map: " << my_map.max_load_factor()
+            << std::endl;
+
+  // Example with std::unordered_set
+  std::unordered_set<int> my_set = {1, 2, 3, 4, 5};
+
+  std::cout << "Number of buckets in my_set: " << my_set.bucket_count()
+            << std::endl;
+  std::cout << "Current load factor in my_set: " << my_set.load_factor()
+            << std::endl;
+  std::cout << "Max load factor in my_set: " << my_set.max_load_factor()
+            << std::endl;
+
+  // Accessing the size of a specific bucket
+  size_t bucket_index = 0;
+  std::cout << "Elements in bucket " << bucket_index
+            << " of my_map: " << my_map.bucket_size(bucket_index) << std::endl;
+}
+
+int main() {
+  unordered_mapCustomClasstype();
+  sizeOfTheHashTable();
+}
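`sizeOfTheHashTable()` prints the bucket statistics once. The sketch below checks how they behave as the table grows; the helper names are hypothetical. It relies on two documented guarantees: `reserve(n)` leaves `bucket_count()` at least `n / max_load_factor()`, and `bucket(key)` names the bucket in which an element with that key is stored. It also relies on the behaviour of the common standard libraries of rehashing so that `load_factor()` does not exceed `max_load_factor()` after an insertion. The standard words that last point as something the container "attempts" to do, so treat it as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Inserts keys 0..n-1 and reports whether load_factor() stayed at or below
// max_load_factor() after every insertion (the container rehashes as it grows).
bool load_factor_stays_bounded(int n) {
  std::unordered_map<int, int> m;
  for (int i = 0; i < n; ++i) {
    m[i] = i;
    if (m.load_factor() > m.max_load_factor()) {
      return false;
    }
  }
  return true;
}

// reserve(n) pre-sizes the table so that n elements fit without a rehash.
std::size_t bucket_count_after_reserve(std::size_t n) {
  std::unordered_map<int, int> m;
  m.reserve(n);
  return m.bucket_count();
}

// Scans only the bucket that bucket(key) reports, using the local
// iterators begin(b)/end(b). Call it on a non-empty container.
bool found_in_its_bucket(const std::unordered_map<int, int> &m, int key) {
  const std::size_t b = m.bucket(key);
  for (auto it = m.begin(b); it != m.end(b); ++it) {
    if (it->first == key) {
      return true;
    }
  }
  return false;
}
```

When the final element count is known in advance, calling `reserve` before a bulk insert avoids the intermediate rehashes that `load_factor_stays_bounded` triggers.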
