This assignment is a simulation of realistic caching behavior, with message access patterns generated by message bots. It uses a Trie-based structure (`numtrie_t`) to manage cache operations under both LRU and Random replacement policies.
You can run main with the command
make test
Initially, we worked in parallel, and the first step toward integration was to modify cache_test.c. The original build used an array-based caching approach for functionality and independence. The steps toward integration were as follows:
- Replaced the array structure with the Trie (`numtrie_t`), using the Trie functions for insertion (`add_node()`), lookup (`search()`), and deletion (`delete_min()`, `delete_random()`).
- Unified the cache initialization via `init_numtrie()`. The caching policy is set based on command-line args:
  - `./cache_test random`
  - `./cache_test lru`
- Lookup and insertion. With the integration, lookup is handled through the Trie `search()` function, and insertion now uses `add_node()`. For eviction, Random uses `delete_random()` and LRU uses `delete_min()`.
- Message bots generate message access patterns that simulate real-world usage:
  - Long-term messages (50%) → many reuses, better for caching
  - Medium-term messages (30%) → reused a few times
  - Short-term messages (20%) → accessed once, cause more misses
These changes allow for a more realistic simulation: message bots generate message IDs with short-term, medium-term, and long-term access patterns that mimic real-world scenarios. The system tracks cache hits and misses, and checks whether the cache is full before applying a replacement policy. The output provides structured, easy-to-read printouts of cache hits, misses, and hit ratios for straightforward analysis.
At first, we were only generating two numbers:
- The random number generator was not initialized correctly.
- The seed value x_0 was not being set or updated, so the output was always the same.
Fixes
- Added `srand(time(NULL))` to seed the random number generator at start-up
- Added a seed-initialization function (`initialize_random_seed()`) that runs before any random number generation
- Implemented a global seed variable for consistency
- A test file (test_genrand.c) was added
  - compile with: gcc -Wall test_genrand.c genrand.c -o test_genrand
  - test with: ./test_genrand
sample output:
➜ CS5600-PJ1 git:(main) ✗ ./test_genrand
[INIT] Seed initialized with x_0 = 79545697
Random number (1-100): 61
Random number (0-50): 50
Random number (0-1000): 215
Random string of length 10: xezktuvqx
➜ CS5600-PJ1 git:(main) ✗ ./test_genrand
[INIT] Seed initialized with x_0 = 81999519
Random number (1-100): 15
Random number (0-50): 9
Random number (0-1000): 237
Random string of length 10: vqnidolyl
➜ CS5600-PJ1 git:(main) ✗ ./test_genrand
[INIT] Seed initialized with x_0 = 82033133
Random number (1-100): 17
Random number (0-50): 37
Random number (0-1000): 979
Random string of length 10: zarwrmrqt
➜ CS5600-PJ1 git:(main) ✗
Every run now produces unique output: the numbers differ each time, and the random string is the correct length with fresh characters on every run.
During testing, we encountered a major inconsistency in cache hit ratio results:
LRU Hit Ratio: 8.96% | RANDOM Hit Ratio: 47.41%
--- vs ---
LRU Hit Ratio: 29.70% | RANDOM Hit Ratio: 40.70%
--- vs ---
RANDOM Hit Ratio: 70.50%
After investigating, we discovered the root cause:
One test system was incorrectly generating 70% short-term messages instead of 70% long-term. This drastically reduced the expected number of hits.
The fix was to verify that the intended message ratio is used:
- 50% long-term
- 30% medium-term
- 20% short-term
This ratio was designed to mimic realistic access behavior, giving long-term messages a higher reuse probability (and therefore, better cache hit potential).
- Always double-check your weighted sampling logic when testing probabilistic systems.
- A flipped ratio can still yield "valid" results — just not meaningful ones.
file name: cache_test.c
make
make cache_test
./cache_test lru
./cache_test random

We've hit 70%+ cache hit ratios with balanced weights and bot-driven access simulation.
- Track the hit ratio over time and adjust the access weights dynamically.
- If the hit ratio drops below 50%, increase the weight for long-term messages.
- If the hit ratio is consistently high, slightly increase medium/short-term weights to maintain freshness.
- Modify the `weighted_random_message()` function to include dynamic weight adjustments.
- Introduce a "hit ratio tracker" that recalculates the hit ratio after a fixed number of accesses (e.g., 1000).
- Adjust the weight distribution based on the hit ratio.
Code Block cache_test.c:

```c
// Track hits and misses globally
int hits = 0, misses = 0;

// Access weights, declared here so adjust_weights() can update them
int long_weight = 50, medium_weight = 30, short_weight = 20;

// Function to dynamically adjust weights
void adjust_weights() {
    if (hits + misses == 0) return; // avoid division by zero on first call
    float hit_ratio = (hits / (float)(hits + misses)) * 100;
    if (hit_ratio < 50.0) {
        // Increase long-term weight
        long_weight = 60;
        medium_weight = 25;
        short_weight = 15;
        printf("[ADJUST] Increased long-term weight to 60%%\n");
    } else if (hit_ratio > 70.0) {
        // Slightly increase freshness
        long_weight = 50;
        medium_weight = 30;
        short_weight = 20;
        printf("[ADJUST] Restored default weights for freshness\n");
    }
}

int weighted_random_message() {
    adjust_weights(); // Adjust weights dynamically
    int chance = rand() % 100;
    if (chance < long_weight) {
        return generate_long_term_message();
    } else if (chance < long_weight + medium_weight) {
        return generate_medium_term_message();
    } else {
        return generate_short_term_message();
    }
}
```

The current setup with 50% long-term, 30% medium-term, and 20% short-term works well. We can experiment with 60% long-term, 25% medium-term, 15% short-term for even higher stability.
- Modify the `weighted_random_message()` function to change the weight distribution.
- Adjust the `if-else` chain in `weighted_random_message()` to reflect the new weights.
Code Block cache_test.c:

```c
int weighted_random_message() {
    int chance = rand() % 100;
    // Adjusted weights
    if (chance < 60) {        // 60% long-term
        return generate_long_term_message();
    } else if (chance < 85) { // Next 25% medium-term
        return generate_medium_term_message();
    } else {                  // Remaining 15% short-term
        return generate_short_term_message();
    }
}
```

- Combining RANDOM and LRU based on access frequency might yield even better results. For example:
- LRU for long-term messages.
- RANDOM for short-term messages to keep the cache agile.
- We can add logic to the `insert_into_cache` function to select a policy based on message type.
- This introduces a message type parameter to differentiate long-term from short-term messages.
Code Block cache_test.c:

```c
// Modified insert_into_cache function
void insert_into_cache(const char *identifier, message_t *message, int message_type) {
    // Check if the cache is full
    if (cache.trie->elements >= TOTAL_ACCESSES) {
        if (message_type == LONG_TERM) {
            delete_min(cache.trie); // LRU eviction
            printf("[CACHE] LRU eviction for long-term message\n");
        } else {
            delete_random(cache.trie); // Random eviction
            printf("[CACHE] Random eviction for short-term message\n");
        }
    }
    add_node(cache.trie, (void *)message, (char *)identifier);
    printf("[CACHE] Inserted message %s into cache\n", identifier);
}

// Call the function with the message type
insert_into_cache(random_id, message, LONG_TERM);
insert_into_cache(random_id, message, SHORT_TERM);
```