Skip to content

nxxgxx/trie-based-cache-simulator

Repository files navigation

Practicum 1

This assingment is a siumlation of realistic caching behavior with message access patters generated by message bots. It uses a Trie-based (numtrie_t) structure to manage cache operations using both LRU and Random replacement policies.

You can run main with the commmand

make test

Integration

Initially, we worked in parallel, and the first step towards integration was to modify cache_test.c. The original build, for functionality and independence, was constructed using an array-based caching approach. The steps towards integration were as follows:

  1. Replaced the array structure

    • replaced with the Trie using numtrie_t
    • this utilizes the Trie functions for insertion (add_node()), lookup (search()), and deletion (delete_min(), delete_random()).
  2. Unified the Cache initializaion.

    • this was done by using init_numtrie()
    • the caching policy is set based on command line args:
      • ./cache_test random
      • ./cache_test lru
  3. Lookup and Insertion.

    • with the integration, lookup is handled through the Trie search() function
    • Insertion now uses add_node()
      • Random uses delete_random()
      • LRU uses delete_min()
  4. Message Bots

    • generate message access patterns that simulate real-world usage:
      • Long-term messages (50%) → many reuses, better for caching
      • Medium-term messages (30%) → reused a few times
      • Short-term messages (20%) → accessed once, cause more misses

These changes now allow for a more realistic simulation with message accessing patterns rolled out with message bots to generate realistic message IDs. The approach is to track cache hits and misses while maintaining a clean output. To ensure efficient cache management, the system checks whether the cache is full before applying a replacement policy. The simulation leverages message bots to generate realistic message access patterns, creating short-term, medium-term, and long-term message IDs that mimic real-world scenarios. The output is designed to be clean and easy to read, providing structured printouts of cache hits, misses, and hit ratios for straightforward analysis.


Friday March 28th: rand() function in get_weighted_id:

We were only generating 2 numbers:

  1. The random number generator was not initialized right
  2. The seed value x_0 was not set or updating, so the output was always the same.

Fixes

- The addition of `srand(time(NULL))` function to seed the random number generator at the start
- A seed initialization function `(initialize_random_seed())` for seeding before any random number gen
- Implementation of a global seed variable for consistency
  • A test file (test_genrand.c) was added
    • compile with: gcc -Wall test_genrand.c genrand.c -o test_genrand
    • test with: ./test_genrand

sample output:

➜  CS5600-PJ1 git:(main) ✗ ./test_genrand
[INIT] Seed initialized with x_0 = 79545697
Random number (1-100): 61
Random number (0-50): 50
Random number (0-1000): 215
Random string of length 10: xezktuvqx
➜  CS5600-PJ1 git:(main) ✗ ./test_genrand
[INIT] Seed initialized with x_0 = 81999519
Random number (1-100): 15
Random number (0-50): 9
Random number (0-1000): 237
Random string of length 10: vqnidolyl
➜  CS5600-PJ1 git:(main) ✗ ./test_genrand
[INIT] Seed initialized with x_0 = 82033133
Random number (1-100): 17
Random number (0-50): 37
Random number (0-1000): 979
Random string of length 10: zarwrmrqt
➜  CS5600-PJ1 git:(main) ✗ 

Every time the program runs, it generates unique outputs, the string is the right length with random numbers and letters with each run.


Saturday March 29th: Bug discovery

During testing, we encountered a major inconsistency in cache hit ratio results:

LRU Hit Ratio: 8.96%  |  RANDOM Hit Ratio: 47.41%
--- vs ---
LRU Hit Ratio: 29.70% | RANDOM Hit Ratio: 40.70%
--- vs ---
RANDOM Hit Ratio: 70.50%

After investigating, we discovered the root cause:

One test system was incorrectly generating 70% short-term messages instead of 70% long-term. This drastically reduced the expected number of hits.

The fix was to verify that the correct intended message ratio is used:

Correct Weighted Access Distribution

  • 50% long-term
  • 30% medium-term
  • 20% short-term

This ratio was designed to mimic realistic access behavior, giving long-term messages a higher reuse probability (and therefore, better cache hit potential).

Lesson Learned

  • Always double-check your weighted sampling logic when testing probabilistic systems.
  • A flipped ratio can still yield "valid" results — just not meaningful ones.

Build and test the cache simulator

file name: cache_test.c

make
make cache_test
./cache_test lru
./cache_test random

We've hit 70%+ cache hit ratios with balanced weights and bot-driven access simulation

Further Optimization Ideas:

1. Dynamic Adjustment of Weights:

  • Track the hit ratio over time and adjust the access weights dynamically.
    • If the hit ratio drops below 50%, increase the weight for long-term messages.
    • If the hit ratio is consistently high, slightly increase medium/short-term weights to maintain freshness.

HOW TO:

  • Modify the weighted_random_message() function to include dynamic weight adjustments.
  • Introduce a "hit ratio tracker" that calculates the hit ratio after a fixed number of accesses (e.g., 1000).
  • Adjust the weight distribution based on the hit ratio.

Code Block cache_test.c:

// Track hits and misses globally
int hits = 0, misses = 0;

// Function to dynamically adjust weights
void adjust_weights() {
    float hit_ratio = (hits / (float)(hits + misses)) * 100;

    if (hit_ratio < 50.0) {
        // Increase long-term weight
        long_weight = 60;
        medium_weight = 25;
        short_weight = 15;
        printf("[ADJUST] Increased long-term weight to 60%%\n");
    } else if (hit_ratio > 70.0) {
        // Slightly increase freshness
        long_weight = 50;
        medium_weight = 30;
        short_weight = 20;
        printf("[ADJUST] Restored default weights for freshness\n");
    }
}

int weighted_random_message() {
    adjust_weights(); // Adjust weights dynamically

    int chance = rand() % 100;
    if (chance < long_weight) {
        return generate_long_term_message();
    } else if (chance < long_weight + medium_weight) {
        return generate_medium_term_message();
    } else {
        return generate_short_term_message();
    }
}

2. Fine-Tuning Weights:

  • the current setup with 50% long-term, 30% medium-term, and 20% short-term works well. We can experiment with:
    • 60% long-term, 25% medium-term, 15% short-term for even higher stability.

HOW TO:

  • modify the weighted_random_message() function to change the weight distribution
  • adjust the if-else in the weighted_random_message() function to show the new weights.

Code Block cache_test.c:

   int weighted_random_message() {
    int chance = rand() % 100;

    // Adjusted weights
    if (chance < 60) {  // 60% long-term
        return generate_long_term_message();
    } else if (chance < 85) {  // Next 25% medium-term
        return generate_medium_term_message();
    } else {  // Remaining 15% short-term
        return generate_short_term_message();
    }
   }

3. Hybrid Policy:

  • Combining RANDOM and LRU based on access frequency might yield even better results. For example:

    • LRU for long-term messages.
    • RANDOM for short-term messages to keep the cache agile.
  • we can add logic to the insert_into_cache function to select a policy based on message type.

  • this should introduce a message type parameter to differntiate long-term from short-term messages

Code Block cache_test.c:

// Modified insert_into_cache function
void insert_into_cache(const char *identifier, message_t *message, int message_type) {
    // Check if the cache is full
    if (cache.trie->elements >= TOTAL_ACCESSES) {
        if (message_type == LONG_TERM) {
            delete_min(cache.trie);  // LRU eviction
            printf("[CACHE] LRU eviction for long-term message\n");
        } else {
            delete_random(cache.trie);  // Random eviction
            printf("[CACHE] Random eviction for short-term message\n");
        }
    }
    add_node(cache.trie, (void *)message, (char *)identifier);
    printf("[CACHE] Inserted message %s into cache\n", identifier);
}

// Call the function with the message type
insert_into_cache(random_id, message, LONG_TERM);
insert_into_cache(random_id, message, SHORT_TERM);

About

Showcase repository for a collaborative C-based cache simulator using Trie data structures and LRU/Random replacement policies

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors