
Cache Effectiveness

Simulate Cache Effectiveness

An effective cache is one that reduces expensive calls to the main source (the DB).

In this simulation, each agent is a cache of the query (key) and the results (value).

Visualization

Each cache is visualized as a box on an x-y plane. All caches along the X axis share the same TTL (Time to Live); along the Y axis, the TTL increases by 1 minute per row. You can specify the "rows" and "columns" with the numAtConstTTL (width along X) and numTTLs (depth along Y) values. A numAtConstTTL greater than 1 results in a nice "bar chart" of caches at each TTL.
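To make the layout concrete, here is a minimal sketch (in TypeScript, which the project does not necessarily use) of how the grid could be generated. `CacheBox` and `layoutCaches` are hypothetical names; only `numAtConstTTL` and `numTTLs` come from the description above.

```ts
// Sketch: lay out the cache grid described above.
interface CacheBox {
  x: number;          // column index, 0..numAtConstTTL-1 (same TTL per row)
  y: number;          // row index, 0..numTTLs-1 (TTL grows with y)
  ttlSeconds: number; // TTL increases by 1 minute per row along Y
}

function layoutCaches(numAtConstTTL: number, numTTLs: number): CacheBox[] {
  const boxes: CacheBox[] = [];
  for (let y = 0; y < numTTLs; y++) {
    const ttlSeconds = (y + 1) * 60; // +1 minute of TTL per row
    for (let x = 0; x < numAtConstTTL; x++) {
      boxes.push({ x, y, ttlSeconds });
    }
  }
  return boxes;
}
```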

The height of each box is the "effectiveness" of the cache: the percentage of requests that result in a call to the DB, divided by 10. The lower the height, the more effective the cache. A height of 10 means every request hits the DB, i.e., the cache is not preventing any DB calls.
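As a sketch, the height could be computed like this; `dbCalls` and `requests` are hypothetical per-agent counters, not names taken from the project:

```ts
// Sketch: a cache's "effectiveness" height on the plot.
function effectivenessHeight(dbCalls: number, requests: number): number {
  if (requests === 0) return 0;
  const pctDbCalls = (dbCalls / requests) * 100; // % of requests that hit the DB
  return pctDbCalls / 10;                        // height 10 => every request hits the DB
}
```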

Action for Each Step

Each step is one second. Every refreshInterval seconds, the cache receives a request from the client for the results. This is the point where the DB is called or not.
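A minimal sketch of that request handling, assuming a hypothetical `onStep` function and a `queryDb` callback standing in for the expensive DB call:

```ts
// Sketch of one simulation step (1 second). The cache fields are assumed.
function onStep(
  cache: { valid: boolean; value?: unknown },
  step: number,
  refreshInterval: number,
  queryDb: () => unknown
): void {
  if (step % refreshInterval === 0) {
    // Client asks for the results: the DB is called only if the cache is invalid.
    if (!cache.valid) {
      cache.value = queryDb(); // the expensive call we want to avoid
      cache.valid = true;
    }
  }
}
```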

Each second, the agent checks whether the cache should be invalidated (see the sketch after this list). The inputs to this decision are:

  • TTL - after TTL steps, the cache is marked invalid.
  • Incoming changes to the DB affect the cache. This is tricky and discussed below.
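A sketch of that per-second check; `shouldInvalidate`, `ageSeconds`, and `perStepInvalidationChance` are hypothetical names, with the probability derived in the next section:

```ts
// Sketch: the per-second invalidation decision described above.
function shouldInvalidate(
  ageSeconds: number,
  ttlSeconds: number,
  perStepInvalidationChance: number
): boolean {
  if (ageSeconds >= ttlSeconds) return true;        // TTL expired
  return Math.random() < perStepInvalidationChance; // a DB change affected us
}
```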

Probability DB changes affect the cache

This is the biggest assumption in the simulation. The following global variables feed into the "per step" chance that a cache is invalidated (see the sketch after this list).

  • changesPerHour - This includes all changes to the DB that would affect the results - new/deleted items and updated items (because a user accessed one, or finished their work and marked it done). This is more than just how many new items arrive per hour.
    • This is also per tenant. When translating throughput numbers from the full system to this simulation, be sure to divide by the number of tenants or to filter the throughput by tenant.
  • chanceChangeAffectsQueryResults - This one is tricky and depends on the query and the persona doing the search.
    • Say there are 40 users and entities are assigned evenly among them. If those users search by "assigned to me", then the chance that any random change affects a cached query is 1/40 = 0.025.
    • Say the query is "Get all new items in the last 12 hours" and that 90% of the changes to the DB are due to new items. Then the results of the query will be invalidated by a random change 90% of the time.
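Putting the two variables together: one plausible reading (an assumption, not spelled out above) is that the per-step chance is the expected number of relevant changes per one-second step.

```ts
// Sketch: derive the per-step invalidation chance from the globals above.
// Assumes changes arrive uniformly across the hour (3600 one-second steps).
function perStepInvalidationChance(
  changesPerHour: number,
  chanceChangeAffectsQueryResults: number
): number {
  const changesPerStep = changesPerHour / 3600;
  return changesPerStep * chanceChangeAffectsQueryResults;
}

// Example: 400 changes/hour and the "assigned to me" case above (0.025)
// => 400/3600 * 0.025 ≈ 0.0028 chance of invalidation per second.
```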

TODO

  • Break out the "chance" part into more properties. For example, newEntriesPerHour and modifiedEntriesPerHour. The total changesPerHour is the sum, but now you can easily calculate the "chance" for queries that care about modifications only.
  • Instead of using rows with the same TTL, vary the invalidation probabilities across rows so we can see cache effectiveness across a wide range of conditions.