Persistent Data Structures
Authors: Andi Qu, Benjamin Qi
What if data structures could time travel?
Prerequisites
Overview
A persistent data structure is a data structure that preserves the previous versions of itself when modified, allowing access to any historical version. In other words, once a change is made to the structure, both the original and modified versions remain accessible. This is particularly useful in scenarios where you need to keep track of the history of updates or backtrack to previous states of the data structure.
Persistent Array
Persistent arrays are one of the simplest persistent data structures. A persistent array should be able to access and update its elements at given times.
Fat Nodes
C++
In C++, one can implement this to run in time per query and
time per update by using an array of vector
s.
vector<pair<int, int>> arr[100001]; // The persistent arrayint get_item(int index, int time) {// Gets the array item at a given index and timeauto ub =upper_bound(arr[index].begin(), arr[index].end(), make_pair(time, INT_MAX));return prev(ub)->second;}void update_item(int index, int value, int time) {
This approach (i.e. storing multiple values at each index without erasing old values) is known as fat nodes.
Although easy to implement, fat nodes are only partially persistent, meaning that only the latest version of the data structure can be modified.
For most competitive programming problems involving persistent data structures, we use path copying instead.
Path Copying
One can implement path copying to run in time per query and update by using a binary-tree-like structure where array elements are the leaves.
This is very similar to a sparse segment tree. The key differences are that we have multiple roots and every time we "update" a node, we actually create a new node in its place (hence persistence).
C++
struct Node {int val;Node *l, *r;Node(ll x) : val(x), l(nullptr), r(nullptr) {}Node(Node *ll, Node *rr) : val(0), l(ll), r(rr) {}};int n, a[100001]; // The initial array and its sizeNode *roots[100001]; // The persistent array's roots
Path copying is fully persistent.
Persistent Segment Tree
Since persistent arrays with path copying are so similar to sparse segment trees, it's relatively straightforward to convert one into a persistent segment tree - just add range queries!
Focus Problem – try your best to solve this problem before continuing!
Resources
Resources | ||||
---|---|---|---|---|
oml1111 | ||||
Anudeep2011 | not great formatting | |||
cp-algo | ||||
SecondThread | good video on persistent data structures |
This section is not complete.
Solution
Since this problem involves range queries, we'll use some type of segment tree to solve it. (We can also use a Fenwick tree, but that's much harder to make persistent.)
When dealing with problems involving multiple dimensions, it's often helpful to view one of those dimensions as time. In this problem, we'll view the index of each array as its time dimension.
Using a persistent segment tree, we can then turn the problem into the following:
- Type 1 queries involve a point update on the segment tree at some time.
- Type 2 queries involve a range query on the segment tree at some time.
- Type 3 queries involve copying the root of the segment tree at some time and appending it to the array of segment tree roots.
C++
#include <bits/stdc++.h>typedef long long ll;using namespace std;struct Node {ll val;Node *l, *r;Node(ll x) : val(x), l(nullptr), r(nullptr) {}Node(Node *ll, Node *rr) {
Application 1 - Static 2D Range Sums on Large Grids
Persistent segment trees can be used for online 2D static range sum queries in time (think of it like prefix sums).
Note that 2D Fenwick trees with coordinate compression often also work for this (and are easier to implement), but it's still good to know this application.
Application 2 - Largest Interval Completely Inside a Range
Consider the following problem:
Given intervals on the number line, answer queries of the form "what is the largest interval completely contained inside the range ?"
.
Since each interval has two dimensions (i.e. left and right endpoints and ), we can view it as a point on the number line at with "value" that was inserted at time .
Now, each query becomes "what is the most valuable point in the range that was inserted at or before time ?" This is much easier to handle, so we can solve this problem in time.
Problems
Status | Source | Problem Name | Difficulty | Tags | |
---|---|---|---|---|---|
CF | Easy | Show TagsPersistent Segtree | |||
SPOJ | Easy | Show TagsPersistent Segtree | |||
SPOJ | Normal | Show TagsPersistent Segtree | |||
SPOJ | Normal | Show TagsPersistent Segtree | |||
SPOJ | Normal | Show TagsPersistent Segtree | |||
COCI | Normal | Show TagsPersistent Segtree | |||
NOI | Normal | Show TagsPersistent Segtree | |||
CEOI | Hard | Show TagsPersistent Segtree, Sqrt | |||
APIO | Hard | Show Tags2DRQ, Euler's Formula, Persistent Segtree | |||
COCI | Very Hard | Show TagsPersistent Segtree | |||
IOI | Very Hard | Show Tags2DRQ, Persistent Segtree |
Persistent Heap
Resources | ||||
---|---|---|---|---|
Benq |
Status | Source | Problem Name | Difficulty | Tags | |
---|---|---|---|---|---|
Wesley's Anger Contest | Very Hard | ||||
YS | Very Hard |
Module Progress:
Join the USACO Forum!
Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!