LeetCode MEDIUM 981 Time Based Key-Value Store Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Time Based Key-Value Store
Topics: Hash Table, String, Binary Search, Design

Problem gist

Design a key-value store that remembers history. A normal hash map answers “what is the current value for this key?” This problem asks “what was the value for this key at this timestamp?”

The data structure supports two operations:

set(key, value, timestamp) stores a value for a key at a specific time.

get(key, timestamp) returns the value from the latest previous set call for that key whose timestamp is less than or equal to the query timestamp. If the key did not have a value yet, return the empty string.

The useful constraint is that set timestamps are strictly increasing. That means each key’s history can be stored in chronological order simply by appending new entries.

Deriving the optimal solution

Start with the simplest idea: each key needs more than one value, so map each key to a list of historical entries. Every entry needs the timestamp and the value stored at that timestamp.

For set, append the new timestamp and value to that key’s history. Because timestamps arrive in increasing order, no sorting or rebalancing is needed.

For get, the question becomes: in this sorted timestamp list, find the rightmost timestamp <= query_timestamp. That is exactly a binary search upper-bound query. In Python, bisect_right(timestamps, query_timestamp) returns the insertion position after all timestamps less than or equal to the query. The answer is one slot to the left of that insertion position.

This gives the natural optimal design for the given constraints:

set runs in O(1) time because it only appends.

get runs in O(log m) time, where m is the number of values stored for that key.

The total space is O(n) for all stored set calls.

Python solution

from bisect import bisect_right
from collections import defaultdict
from typing import DefaultDict, List


class TimeMap:
    def __init__(self) -> None:
        self._timestamps_by_key: DefaultDict[str, List[int]] = defaultdict(list)
        self._values_by_key: DefaultDict[str, List[str]] = defaultdict(list)

    def set(self, key: str, value: str, timestamp: int) -> None:
        """
        Store a value at the given timestamp.

        LeetCode guarantees that set timestamps are strictly increasing, so
        appending preserves sorted timestamp order for every key.
        """
        self._timestamps_by_key[key].append(timestamp)
        self._values_by_key[key].append(value)

    def get(self, key: str, timestamp: int) -> str:
        """
        Return the value from the latest timestamp not greater than timestamp.
        """
        timestamps = self._timestamps_by_key.get(key)
        if not timestamps:
            return ""

        latest_index = self._latest_index_not_after(timestamps, timestamp)
        if latest_index == -1:
            return ""

        return self._values_by_key[key][latest_index]

    @staticmethod
    def _latest_index_not_after(timestamps: List[int], timestamp: int) -> int:
        """
        Return the index of the rightmost timestamp <= timestamp, or -1 if
        every stored timestamp is greater than timestamp.
        """
        insertion_index = bisect_right(timestamps, timestamp)
        return insertion_index - 1

Complexity

set is O(1) time because it appends to two lists. get is O(log m) time for one binary search within the requested key’s history, where m is the number of stored values for that key.

The data structure uses O(n) total space, where n is the total number of set calls. Keeping timestamps and values in parallel arrays avoids tuple comparison tricks and makes the binary search target explicit.

Interview follow-ups

What if timestamps are not guaranteed to arrive in increasing order?

The append-only design no longer preserves sorted order. The direct fix is to insert each new (timestamp, value) into the correct position for that key using binary search to find the position. The search is O(log m), but inserting into the middle of a Python list shifts later elements, so the full set operation becomes O(m).

If all writes are known before reads, a better batch approach is to append unsorted entries first, then sort each key’s history once before serving queries. That changes preprocessing to O(n log n) overall and keeps each get at O(log m). The right answer depends on whether reads must be correct immediately after every write.

What if get needs to return all values in a time range?

Store the same sorted timestamp list, then use two binary searches. bisect_left(timestamps, start) finds the first entry inside the range, and bisect_right(timestamps, end) finds the first entry after the range. The matching values are the slice between those two indices.

This works because all values for a key are sorted by timestamp, so the valid range is contiguous. The search cost is O(log m), and returning the values costs O(k) for the number of matching entries. The tradeoff is that a large time range can legitimately return many values, so the output size dominates.

What if the store must support deleting a key at a timestamp?

Treat deletion as another historical event instead of physically removing old entries. Store a tombstone marker at the delete timestamp. A later get still binary-searches for the latest event not after the query timestamp. If that event is a tombstone, return the empty string; if it is a value, return that value.

This preserves the same O(log m) read path because deletion is just another timestamped state change. The space cost grows with every delete event, so a production system might compact old histories once it is safe to discard queries before a retention cutoff.

What if the number of operations is huge and memory becomes the bottleneck?

The first step is to keep the per-key arrays compact, as this solution does. Two parallel arrays avoid per-entry object overhead compared with storing many small wrapper objects.

If memory is still too high, the expected approach is to add retention or compression based on product requirements. For example, if queries never ask for timestamps older than 30 days, old prefixes can be pruned while preserving the most recent value before the cutoff if needed. If timestamps are dense or predictable, delta encoding can reduce timestamp storage. These optimizations trade implementation complexity for lower memory usage, while the core lookup idea remains binary search over a sorted history.