hashing updated

behnamasadi · behnamasadi · commit 62dd98dc368f · 2024-08-25T22:02:44.000+02:00
diff --git a/docs/hash_function_hash_table.md b/docs/hash_function_hash_table.md
@@ -1,81 +1,139 @@
 # Hash Functions
-In C++, a hash refers to a function or algorithm that takes an input (or "key") and produces a fixed-size string of characters, which is typically a hexadecimal number or a sequence of bytes. Hash functions are commonly used in data structures like hash tables and for various other purposes, such as data encryption, password storage, and digital signatures.
+In C++, `std::hash` (declared in `<functional>`) is a function object, also known as a functor, that computes a hash value for an input of a given type. When you declare something like `std::hash<float> float_hasher;` or `std::hash<std::string> str_hasher;`, you create an instance of `std::hash` specialized for `float` or `std::string`. You can then call it to hash floating-point numbers or strings, respectively.
 
-Here's a basic explanation of how to use a hash function in C++ with an example using the `std::hash` function from the C++ Standard Library:
+### How Does `std::hash` Work?
+
+1. **Hash Function**:
+   - `std::hash` provides an `operator()` that takes an object of the specified type and returns a `std::size_t`, an unsigned integer type used to represent sizes.
+   - This `operator()` converts the input (e.g., a `float` or a `std::string`) into a hash value. Ideally, this number spreads different inputs uniformly across the range of `std::size_t`.
+
+2. **Usage Example**:
+   ```cpp
+   std::hash<float> float_hasher;
+   std::size_t hash_value = float_hasher(3.14f);  // Hash value for the float 3.14
+   
+   std::hash<std::string> str_hasher;
+   std::size_t hash_value_str = str_hasher("hello");  // Hash value for the string "hello"
+   ```
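The standard requires equal inputs to produce equal hash values for the duration of a program run. The actual numbers are implementation-specific. The following is a minimal sketch of that guarantee using only the standard library. It includes the `float` case `0.0f` versus `-0.0f`: the two compare equal despite having different bit patterns, so they must hash the same. The helper names `hash_string` and `zeros_hash_equal` are illustrative, not part of any API.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hashes a string with the standard hasher. Calling it twice with equal
// strings gives the same value during one run of the program.
std::size_t hash_string(const std::string& s) {
  return std::hash<std::string>{}(s);
}

// 0.0f == -0.0f is true, so std::hash<float> must give both the same hash
// even though their bit patterns differ.
bool zeros_hash_equal() {
  std::hash<float> float_hasher;
  return float_hasher(0.0f) == float_hasher(-0.0f);
}
```

Do not store these values or compare them across runs or platforms. Only their consistency within one execution is guaranteed.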
+
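+### User-defined Hash Functions
+
+To use your own type as a key in `std::unordered_map` or `std::unordered_set`, the container needs two things: a way to compare keys for equality (`operator==`) and a way to hash them. Consider the following `student` class: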
 
 ```cpp
-#include <iostream>
-#include <functional>
+class student {
+public:
+  int id;
+  std::string first_name;
+  std::string last_name;
+
+  bool operator==(const student &other) const {
+    return (first_name == other.first_name && last_name == other.last_name &&
+            id == other.id);
+  }
+};
+```
 
-int main() {
-    // Create a hash function object using std::hash
-    std::hash<std::string> hasher;
+Next, specialize `std::hash` for `student` so that the unordered containers can hash it:
 
-    // Input data (a string)
-    std::string input = "Hello, World!";
+```cpp
+namespace std {
+
+template <> struct hash<student> {
+  std::size_t operator()(const student &k) const {
 
-    // Calculate the hash value of the input
-    size_t hashValue = hasher(input);
+    // Compute individual hash values for first_name,
+    // last_name and id, and combine them using XOR
+    // and bit shifting:
 
-    // Display the hash value
-    std::cout << "Hash value of '" << input << "': " << hashValue << std::endl;
+    return ((std::hash<std::string>()(k.first_name) ^
+             (std::hash<std::string>()(k.last_name) << 1)) >>
+            1) ^
+           (std::hash<int>()(k.id) << 1);
+  }
+};
 
-    return 0;
 }
 ```
+If you don't want to specialize the template inside the `std` namespace, you can define the hash function as a separate class and pass it as the third template argument of the map. Specializing `std::hash` for a user-defined type is perfectly legal, so this is a matter of preference:
 
-In this example, we include the `<iostream>` and `<functional>` headers, which are necessary for input/output and using the `std::hash` function, respectively.
+```cpp
+struct KeyHasher {
+  std::size_t operator()(const student &k) const {
+
+    return ((std::hash<std::string>()(k.first_name) ^
+             (std::hash<std::string>()(k.last_name) << 1)) >>
+            1) ^
+           (std::hash<int>()(k.id) << 1);
+  }
+};
+```
 
-Here's how the code works:
+Then, in `main`, you can use either approach:
 
-1. We create a hash function object called `hasher` using the `std::hash` template. In this case, we specify that we want to hash `std::string` objects.
+```cpp
+// Uses the std::hash<student> specialization:
+std::unordered_map<student, std::string> student_umap = {
+    {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
 
-2. We define a `std::string` variable called `input` with the string "Hello, World!" as our input data.
+// Uses KeyHasher, passed as the third template argument:
+std::unordered_map<student, std::string, KeyHasher> m6 = {
+    {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
+```
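The XOR-and-shift combination above is a quick recipe that mixes the field hashes fairly lightly. A widely used alternative is the formula behind Boost's `hash_combine`. Below is a hand-written sketch of that idea, not the Boost library itself. `hash_combine` and `StudentHash` are hypothetical helpers rather than standard names, and `student` is repeated so the block compiles on its own.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

class student {
public:
  int id;
  std::string first_name;
  std::string last_name;

  bool operator==(const student &other) const {
    return first_name == other.first_name && last_name == other.last_name &&
           id == other.id;
  }
};

// Hypothetical helper modeled on the classic boost::hash_combine formula:
// folds the hash of `value` into `seed`.
template <typename T>
void hash_combine(std::size_t &seed, const T &value) {
  seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Alternative hasher for student built on hash_combine.
struct StudentHash {
  std::size_t operator()(const student &s) const {
    std::size_t seed = 0;
    hash_combine(seed, s.id);
    hash_combine(seed, s.first_name);
    hash_combine(seed, s.last_name);
    return seed;
  }
};
```

Because each field is folded into `seed` in turn, the field order matters. Swapping `first_name` and `last_name` therefore generally yields a different hash, which plain XOR of the field hashes would not.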
+    
 
-3. We calculate the hash value of the input string by invoking the `hasher` function with the `input` as its argument. This will return a `size_t` value representing the hash code of the input string.
 
-4. Finally, we display the hash value using `std::cout`.
+### Where is the Hash Table?
 
-It's important to note that the `std::hash` function is suitable for basic use cases, but it's not suitable for cryptographic purposes or when you need a hash function with specific properties like collision resistance. For cryptographic purposes, you should use cryptographic hash functions like SHA-256 or SHA-3, which are available in C++ through various libraries, such as OpenSSL or the C++ Standard Library's `<cryptopp>` library.
+- **No Hash Table in `std::hash`**:
+   - The `std::hash` function object itself does not involve a hash table. It simply computes a hash value for a given input.
+   - The responsibility of organizing and storing these hash values in a hash table belongs to containers that utilize hashing, such as `std::unordered_map`, `std::unordered_set`, etc.
 
-# Hash Data Structure (Hash Table)
-A hash data structure, often referred to as a hash table or hash map, is a data structure that uses a hash function to map keys to values. It allows for efficient retrieval and storage of values based on their associated keys. Hash tables are commonly used in computer science and programming for tasks like implementing dictionaries, caches, and database indexing.
+- **Hash Table in Containers**:
+   - When you use a container like `std::unordered_map`, it internally uses `std::hash` to compute hash values for keys and organizes these values in a hash table.
+   - The hash table itself is managed by the container, not by the `std::hash` function.
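To see this division of labor, the sketch below asks a `std::unordered_map` for two things:
- a copy of its hasher, via `hash_function()`;
- the bucket a key falls into, via `bucket(key)`.

It assumes the default `Hash` template argument, `std::hash<std::string>`, and the helper names are illustrative. How a hash value is reduced to a bucket index is left to the implementation.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// The container keeps its own hasher object; hash_function() returns a copy.
// With the default template arguments that hasher is std::hash<std::string>.
bool map_uses_std_hash(const std::unordered_map<std::string, int>& m,
                       const std::string& key) {
  auto hasher = m.hash_function();
  return hasher(key) == std::hash<std::string>{}(key);
}

// bucket(key) reports which bucket the container maps the key to.
// The result is always less than bucket_count().
std::size_t bucket_of(const std::unordered_map<std::string, int>& m,
                      const std::string& key) {
  return m.bucket(key);
}
```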
 
-Here's a basic explanation of a hash table in C++ with an example using the `std::unordered_map` container from the C++ Standard Library:
+### What is the Size of the Hash Table?
 
-```cpp
-#include <iostream>
-#include <unordered_map>
+In C++, to get information about the size of the hash table (i.e., the number of buckets) in a `std::unordered_map` or `std::unordered_set`, you can use the `bucket_count()` member function. This function returns the current number of buckets in the hash table.
 
-int main() {
-    // Create an unordered_map (hash table) to store key-value pairs
-    std::unordered_map<std::string, int> hashMap;
+Here are the relevant functions you can use:
 
-    // Insert key-value pairs into the hash table
-    hashMap["apple"] = 3;
-    hashMap["banana"] = 2;
-    hashMap["cherry"] = 5;
+1. **`bucket_count()`**: Returns the number of buckets in the hash table.
+2. **`load_factor()`**: Returns the current load factor, which is the average number of elements per bucket.
+3. **`max_load_factor()`**: Returns the maximum load factor before the container will automatically increase the number of buckets (rehash).
+4. **`bucket_size(bucket_index)`**: Returns the number of elements in the specified bucket.
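Besides reading these values, you can also influence the table:
- `max_load_factor(float)` sets the threshold. The standard treats the argument as a hint.
- `reserve(n)` prepares the container for `n` elements. The standard defines it as `rehash(ceil(n / max_load_factor()))`.

A minimal sketch with an illustrative helper name:

```cpp
#include <cstddef>
#include <unordered_set>

// Creates an empty set, sets its maximum load factor, reserves room for
// `expected` elements, and checks that the bucket count is large enough to
// hold them without exceeding the (actual) maximum load factor.
bool reserve_makes_room(std::size_t expected, float max_lf) {
  std::unordered_set<int> s;
  s.max_load_factor(max_lf);  // setter overload; the value is a hint
  s.reserve(expected);
  return static_cast<float>(s.bucket_count()) >=
         static_cast<float>(expected) / s.max_load_factor();
}
```

Pre-sizing like this is useful when you know roughly how many elements you will insert. It avoids repeated rehashing as the container grows.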
 
-    // Access values by their keys
-    std::cout << "Number of apples: " << hashMap["apple"] << std::endl;
-    std::cout << "Number of cherries: " << hashMap["cherry"] << std::endl;
+### Example Code
 
-    return 0;
-}
-```
+```cpp
+#include <cstddef>
+#include <iostream>
+#include <string>
+#include <unordered_map>
+#include <unordered_set>
+
+int main() {
+  std::unordered_map<int, std::string> my_map = {{1, "one"}, {2, "two"}, {3, "three"}};
 
-In this example, we use the `std::unordered_map` container, which is an implementation of a hash table. Here's how the code works:
+  std::cout << "Number of buckets in my_map: " << my_map.bucket_count() << std::endl;
+  std::cout << "Current load factor in my_map: " << my_map.load_factor() << std::endl;
+  std::cout << "Max load factor in my_map: " << my_map.max_load_factor() << std::endl;
 
-1. We include the necessary header `<iostream>` for input/output and `<unordered_map>` for using the `std::unordered_map` container.
+  // Example with std::unordered_set
+  std::unordered_set<int> my_set = {1, 2, 3, 4, 5};
 
-2. We create an `std::unordered_map` named `hashMap` that associates `std::string` keys with `int` values.
+  std::cout << "Number of buckets in my_set: " << my_set.bucket_count() << std::endl;
+  std::cout << "Current load factor in my_set: " << my_set.load_factor() << std::endl;
+  std::cout << "Max load factor in my_set: " << my_set.max_load_factor() << std::endl;
 
-3. We insert key-value pairs into the hash table using the `[]` operator. For example, we associate the key "apple" with the value 3.
+  // Accessing the size of a specific bucket
+  std::size_t bucket_index = 0;
+  std::cout << "Elements in bucket " << bucket_index << " of my_map: " << my_map.bucket_size(bucket_index) << std::endl;
+}
+```
 
-4. We access values from the hash table using their corresponding keys. In this case, we retrieve the number of apples and cherries from the hash table and display them using `std::cout`.
 
-So, to clarify, the example provided above demonstrates the use of a hash data structure (hash table) in C++. It uses a hash function internally to efficiently store and retrieve values based on their keys.
+For the code above, you might see output similar to the following. The exact bucket counts depend on your standard library implementation:
 
+```bash
+Number of buckets in my_map: 8
+Current load factor in my_map: 0.375
+Max load factor in my_map: 1
+Number of buckets in my_set: 10
+Current load factor in my_set: 0.5
+Max load factor in my_set: 1
+Elements in bucket 0 of my_map: 0
+```
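To look inside the buckets rather than just count them, the containers provide bucket-local iterators `begin(i)` and `end(i)`. The sketch below collects the keys stored in each bucket of a map like `my_map`. The function name `bucket_contents` is illustrative. Which key lands in which bucket will vary between implementations.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Collects the keys stored in each bucket using the bucket-local
// iterators begin(i) / end(i). Index i of the result is bucket i.
std::vector<std::vector<int>> bucket_contents(
    const std::unordered_map<int, std::string>& m) {
  std::vector<std::vector<int>> result(m.bucket_count());
  for (std::size_t i = 0; i < m.bucket_count(); ++i) {
    for (auto it = m.begin(i); it != m.end(i); ++it) {
      result[i].push_back(it->first);
    }
  }
  return result;
}
```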
 
 
 [code](../src/hash.cpp)
diff --git a/docs/set_map_pair_tuple.md b/docs/set_map_pair_tuple.md
@@ -8,6 +8,14 @@
   * [unordered_map user defined type](#unordered-map-user-defined-type)
   * [set user defined type](#set-user-defined-type)
   * [unordered_set user defined type](#unordered-set-user-defined-type)
+  * [Real-world Examples and Applications of std::unordered_map and std::unordered_set](#real-world-examples-and-applications-of-std--unordered-map-and-std--unordered-set)
+    + [1. **Caching (Memoization) in Dynamic Programming**](#1---caching--memoization--in-dynamic-programming--)
+    + [2. **Counting Word Frequencies in Text Processing**](#2---counting-word-frequencies-in-text-processing--)
+    + [3. **Tracking Unique Visitors on a Website**](#3---tracking-unique-visitors-on-a-website--)
+    + [4. **Building an Inverted Index for a Search Engine**](#4---building-an-inverted-index-for-a-search-engine--)
+    + [5. **Routing in Network Applications**](#5---routing-in-network-applications--)
+    + [6. **Deduplication in Large Datasets**](#6---deduplication-in-large-datasets--)
+    + [Why `std::unordered_map` and `std::unordered_set`?](#why--std--unordered-map--and--std--unordered-set--)
 - [tie](#tie)
 - [tuple](#tuple)
 - [pair](#pair)
@@ -459,6 +467,122 @@ courses.insert(c2);
 courses.find(c2)->m_name;
 ```
 
+## Real-world Examples and Applications of std::unordered_map and std::unordered_set
+
+
+`std::unordered_map` and `std::unordered_set` are powerful containers in C++ that are particularly useful in situations where you need fast lookups, insertions, and deletions. Here are some real-world examples where these containers shine:
+
+### 1. **Caching (Memoization) in Dynamic Programming**
+   - **Scenario**: You're implementing a dynamic programming solution, such as computing Fibonacci numbers, and want to avoid recalculating results for the same inputs.
+   - **Use Case**: A `std::unordered_map` can store previously computed values (e.g., `fib(n)`). When the function is called again with the same argument, the result is then retrieved in average constant time instead of being recomputed.
+   - **Example**: Calculating Fibonacci numbers using memoization.
+     ```cpp
+     std::unordered_map<int, long long> fib_cache;
+
+     long long fib(int n) {
+         if (n <= 1) return n;
+         auto it = fib_cache.find(n);
+         if (it != fib_cache.end()) {
+             return it->second;  // reuse the cached result with a single lookup
+         }
+         long long result = fib(n - 1) + fib(n - 2);
+         fib_cache[n] = result;
+         return result;
+     }
+     ```
+
+### 2. **Counting Word Frequencies in Text Processing**
+   - **Scenario**: In text processing or natural language processing (NLP), you often need to count the frequency of words in a large corpus of text.
+   - **Use Case**: A `std::unordered_map<std::string, int>` can efficiently store words as keys and their frequencies as values. The fast lookups provided by the hash table are crucial when processing large amounts of text.
+   - **Example**: Counting the frequency of each word in a document.
+     ```cpp
+     std::unordered_map<std::string, int> word_count;
+
+     void count_words(const std::string& text) {
+         std::istringstream stream(text);
+         std::string word;
+         while (stream >> word) {
+             ++word_count[word];
+         }
+     }
+     ```
+
+### 3. **Tracking Unique Visitors on a Website**
+   - **Scenario**: A website wants to track the number of unique visitors in a day by storing their IP addresses.
+   - **Use Case**: A `std::unordered_set<std::string>` is ideal for this scenario, where each IP address is stored only once, ensuring uniqueness. The fast insertions and lookups help maintain performance even with a large number of visitors.
+   - **Example**: Tracking unique visitor IP addresses.
+     ```cpp
+     std::unordered_set<std::string> unique_ips;
+
+     void log_visit(const std::string& ip_address) {
+         unique_ips.insert(ip_address);
+     }
+
+     size_t unique_visitors() {
+         return unique_ips.size();
+     }
+     ```
+
+### 4. **Building an Inverted Index for a Search Engine**
+   - **Scenario**: In a search engine, you need to build an inverted index that maps each word to the list of documents that contain that word.
+   - **Use Case**: A `std::unordered_map<std::string, std::unordered_set<int>>` can be used, where the key is a word, and the value is a set of document IDs. The `std::unordered_set` ensures that each document ID is unique for a given word.
+   - **Example**: Building an inverted index.
+     ```cpp
+     std::unordered_map<std::string, std::unordered_set<int>> inverted_index;
+
+     void index_document(int doc_id, const std::string& content) {
+         std::istringstream stream(content);
+         std::string word;
+         while (stream >> word) {
+             inverted_index[word].insert(doc_id);
+         }
+     }
+     ```
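Once the index is built, queries should look words up with `find()` rather than `operator[]`, because `operator[]` inserts an empty entry for every unknown word. The sketch below repeats `index_document` so it compiles on its own. It adds a hypothetical `docs_with_both` helper that answers a two-word AND query by intersecting the two document sets.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

std::unordered_map<std::string, std::unordered_set<int>> inverted_index;

// Same indexing routine as above: map each word to the documents containing it.
void index_document(int doc_id, const std::string& content) {
  std::istringstream stream(content);
  std::string word;
  while (stream >> word) {
    inverted_index[word].insert(doc_id);
  }
}

// Hypothetical AND query: IDs of documents that contain both words.
// find() is used so that looking up an unknown word inserts nothing.
std::unordered_set<int> docs_with_both(const std::string& a,
                                       const std::string& b) {
  std::unordered_set<int> result;
  auto it_a = inverted_index.find(a);
  auto it_b = inverted_index.find(b);
  if (it_a == inverted_index.end() || it_b == inverted_index.end()) {
    return result;  // at least one word never appears
  }
  for (int doc : it_a->second) {
    if (it_b->second.count(doc) != 0) {
      result.insert(doc);
    }
  }
  return result;
}
```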
+
+### 5. **Routing in Network Applications**
+   - **Scenario**: In a peer-to-peer network application, you need to manage a dynamic list of active connections (identified by IP and port) to route data efficiently.
+   - **Use Case**: A `std::unordered_set<Connection, ConnectionHash>` can track active connections, where `Connection` holds an IP address and a port. The custom hash function `ConnectionHash` combines the hashes of the IP address and the port. As a result, each connection is stored only once and can be looked up or removed efficiently.
+   - **Example**: Managing active network connections.
+     ```cpp
+     struct Connection {
+         std::string ip;
+         int port;
+         bool operator==(const Connection& other) const {
+             return ip == other.ip && port == other.port;
+         }
+     };
+
+     struct ConnectionHash {
+         std::size_t operator()(const Connection& conn) const {
+             return std::hash<std::string>()(conn.ip) ^ std::hash<int>()(conn.port);
+         }
+     };
+
+     std::unordered_set<Connection, ConnectionHash> active_connections;
+
+     void add_connection(const std::string& ip, int port) {
+         active_connections.insert({ip, port});
+     }
+     ```
+
+### 6. **Deduplication in Large Datasets**
+   - **Scenario**: You have a large dataset with potential duplicate records, and you want to remove these duplicates efficiently.
+   - **Use Case**: A `std::unordered_set` is ideal for deduplication, as it only allows unique elements. You can insert records into the set, and duplicates will be automatically discarded.
+   - **Example**: Deduplicating a list of user IDs.
+     ```cpp
+     std::unordered_set<int> unique_user_ids;
+
+     void add_user(int user_id) {
+         unique_user_ids.insert(user_id);
+     }
+     ```
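The snippet above keeps the unique IDs but does not preserve their original order. If order matters, a common pattern uses the return value of `insert()`. Its `.second` member is `true` only when the element was newly added. A minimal sketch with an illustrative function name:

```cpp
#include <unordered_set>
#include <vector>

// Removes duplicate IDs while keeping the first occurrence of each, in the
// original order. insert().second is true only for values not seen before.
std::vector<int> dedupe_preserving_order(const std::vector<int>& ids) {
  std::unordered_set<int> seen;
  std::vector<int> result;
  result.reserve(ids.size());
  for (int id : ids) {
    if (seen.insert(id).second) {
      result.push_back(id);
    }
  }
  return result;
}
```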
+
+### Why `std::unordered_map` and `std::unordered_set`?
+
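+- **Performance**: Both containers provide average-case constant time complexity (`O(1)`) for insertions, lookups, and deletions, which is often crucial in performance-sensitive applications. In the worst case, when many keys land in the same bucket, these operations degrade to `O(n)`, so a good hash function matters.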
+- **Ease of Use**: They provide a simple and effective way to manage key-value pairs or unique collections without worrying about the underlying implementation details.
+- **Flexibility**: They can be used in a variety of real-world scenarios that require fast access to data, uniqueness enforcement, or efficient key-based retrieval.
+
+These examples demonstrate how `std::unordered_map` and `std::unordered_set` can be applied to solve practical problems efficiently, leveraging their strengths in situations where speed and unique data management are critical.
 
  
 # tie
@@ -546,7 +670,7 @@ if(items["mumbo jumo"]==NULL)
 {
     std::cout<<"not found" <<std::endl;
 }
-```    
+```
 
 
 
diff --git a/src/hash.cpp b/src/hash.cpp
@@ -1,6 +1,8 @@
 #include <iostream>
+#include <set>
 #include <string>
 #include <unordered_map>
+#include <unordered_set>
 
 // user-defined hash functions:
 // Example 1
@@ -87,4 +89,35 @@ void unordered_mapCustomClasstype() {
       {{1, "John", "Doe"}, "example"}, {{2, "Mary", "Sue"}, "another"}};
 }
 
-int main() {}
+void sizeOfTheHashTable() {
+  // Example with std::unordered_map
+  std::unordered_map<int, std::string> my_map = {
+      {1, "one"}, {2, "two"}, {3, "three"}};
+
+  std::cout << "Number of buckets in my_map: " << my_map.bucket_count()
+            << std::endl;
+  std::cout << "Current load factor in my_map: " << my_map.load_factor()
+            << std::endl;
+  std::cout << "Max load factor in my_map: " << my_map.max_load_factor()
+            << std::endl;
+
+  // Example with std::unordered_set
+  std::unordered_set<int> my_set = {1, 2, 3, 4, 5};
+
+  std::cout << "Number of buckets in my_set: " << my_set.bucket_count()
+            << std::endl;
+  std::cout << "Current load factor in my_set: " << my_set.load_factor()
+            << std::endl;
+  std::cout << "Max load factor in my_set: " << my_set.max_load_factor()
+            << std::endl;
+
+  // Accessing the size of a specific bucket
+  size_t bucket_index = 0;
+  std::cout << "Elements in bucket " << bucket_index
+            << " of my_map: " << my_map.bucket_size(bucket_index) << std::endl;
+}
+
+int main() {
+  unordered_mapCustomClasstype();
+  sizeOfTheHashTable();
+}