Towards Process Foundation Models

A self-supervised approach to learning the latent state of dynamic graphs, for radically better process modeling
May 22nd, 2026
Geri Skenderi
Member of Technical Staff

Temporal Graphs

In our previous post on graph-based world models we explored how prevalent graph data is in the modern world, and discussed the open research question of learning useful representations of graphs at scale. One important aspect of process data that we omitted from that introduction was its temporal nature.

In the real world, graphs rarely sit still; they evolve. A supply chain changes when a supplier fails, demand spikes, or a shipment is rerouted. A software system changes when a service is deployed, a package is upgraded, or a vulnerability is discovered. An organization changes when a team restructures, a decision is made, or a workflow breaks down. Generally speaking, knowledge graphs change (or at least they should change!) when new evidence arrives, old claims are contradicted, and confidence shifts.

This leads us to some challenging questions: “How do we represent dynamic processes over a graph?” And, more ambitiously: “Can we learn a compact state of a graph that tells us what is likely to happen next, what may go wrong, and what actions might change its future?”

Such questions are hard by their nature, but that is what makes them fun and impactful. In this post we outline some of the ideas we're developing at HASH in answer to them.

Relevance to the broader AI community

This post represents our research position: a view about where graph machine learning, self-supervised learning, agentic systems, and simulation may need to go next. Our central claim is simple:

AI has become remarkably good at modeling language, but remains much weaker at modeling structured, evolving systems.

HASH has developed infrastructure around strongly typed, temporal, provenance-aware graphs. These are graphs that do not merely store facts as "subject-predicate-object" tuples (semantic triples), but represent entities, relationships, uncertainty, and sources, including the history of all of these things over time. Unlike static graph databases (which are furthermore typically untyped), they are living models of real systems.

If we want AI systems that can reason over such worlds, we need models that treat dynamic graphs as first-class objects (world states and processes that evolve under actions), not just as things to query or to embed statically (as in "Graph RAG").

Learning predictive representations through time

The problem of Graph Representation Learning usually begins with a graph and asks for an embedding, i.e., a vector representation of the input structure. A model observes the graph and produces a vector representing a node, an edge, a subgraph, or the whole graph. That embedding can then be used for classification, regression, retrieval, ranking, anomaly detection, clustering, or some other downstream task.

Many important real-world questions concern not what a graph is now, but how it will change (prediction), or how it might be changed (optimization). This problem doesn't simply require embedding a graph; it requires learning representations through time. A static graph embedding may summarize the current shape of a system, but decision-making requires more than a summary of the present. It requires a complete representation of state: what has happened (path dependency), what matters, what is uncertain, and how possible actions may influence what comes next. This is the idea behind a process foundation model.

What would a process foundation model be?

By “process foundation model”, we mean a model that learns a reusable latent state over a structured, evolving system. The input is not just text, tabular data, or a static graph, but a history of graph states and actions. This graph may be typed, attributed, temporal, and heterogeneous. It may contain people, organizations, tasks, documents, machines, shipments, software components, claims, simulated agents, or any other kind of entity. Suppose furthermore that at every moment in time we have access to an action or intervention: a deployment, a rerouting, a policy change, a decision, an update, or some other event that may influence the system.

Given the history up to a time t, our goal is to learn a latent state that both compresses the relevant process history (the canonical embedding problem) and is predictive of future changes and outcomes under possible future actions.
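To make the input concrete, here is a minimal Python sketch of a process history: a sequence of typed graph snapshots, each paired with the action taken at that step. Every name in it (`Entity`, `Relation`, `Snapshot`, `Step`, `history_up_to`, and the example entity and action labels) is hypothetical and chosen for illustration only; it is not a description of HASH's data model or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str
    type: str            # hypothetical type label, e.g. "Supplier"


@dataclass(frozen=True)
class Relation:
    source: str          # id of the source entity
    target: str          # id of the target entity
    type: str            # hypothetical type label, e.g. "ships_to"


@dataclass
class Snapshot:
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)   # entity id -> attribute dict


@dataclass
class Step:
    t: int
    graph: Snapshot
    action: Optional[str]   # action or intervention observed at this step


def history_up_to(trajectory, t):
    """Return the steps observed up to and including time t."""
    return [step for step in trajectory if step.t <= t]


supplier = Entity("s1", "Supplier")
plant = Entity("p1", "Plant")
link = Relation("s1", "p1", "ships_to")

trajectory = [
    Step(0, Snapshot([supplier, plant], [link], {"s1": {"status": "ok"}}), None),
    Step(1, Snapshot([supplier, plant], [link], {"s1": {"status": "late"}}),
         "reroute_shipment"),
    Step(2, Snapshot([supplier, plant], [], {"s1": {"status": "failed"}}),
         "switch_supplier"),
]

history = history_up_to(trajectory, t=1)
```

A model of the kind described here would consume `history` and produce a latent state; the later steps of `trajectory` are what that state should help predict.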

In this way the model does not merely learn “an embedding of a graph”, but a predictive process state. A good process state should answer questions like: “Given what has happened so far, and given this possible intervention, what futures are likely?” That learning procedure is much closer in nature to world modeling than ordinary graph embedding.

This is not simply forecasting over a graph

One might ask whether this is simply time-series forecasting with graphs. The answer, however, is no. Time-series forecasting often assumes that the system state is already represented in a convenient numerical form. But many real systems are relational. Their most important properties are not just values changing over time, but dependencies, paths, neighborhoods, hierarchies, constraints, and interactions.

In a supply chain, risk depends on network position, substitutability, geography, inventory, supplier reliability, and cascading dependencies. In software, deployment risk depends on dependency structure, ownership, test coverage, code paths, traffic, version history, and hidden coupling. In an organization, the effect of a decision depends on roles, permissions, communication channels, incentives, institutional memory, and informal dependencies. These are graph-structured facts with a temporal component. They are not just temporal facts. We are not simply forecasting over a scalar or multivariate sequence, but rather trying to both learn such a sequence and forecast it.

The promise of self-supervised process learning

If you want to rely on labels to learn at scale, you will quickly find out that they're scarce in graph-heavy domains. We may have many observed process histories, but relatively few clean labels saying: this process will fail, this intervention was optimal, this subgraph caused the problem, and so on.

The key thing to consider is that process histories themselves contain signal. Graphs change, meaning edges appear and disappear, node states update, interventions occur, and the system responds. Through self-supervised learning we aim to turn these raw trajectories into their own training signal.

For static graphs, a self-supervised model might predict missing subgraphs, masked attributes, or latent representations of graph neighborhoods. For dynamic graphs, the more interesting task is: “Given the current process state and a possible action, can the model predict the latent representation of the future process state?” This is where the connection to JEPA-style learning becomes natural.

From Graph-JEPA to dynamic graph state models

Graph-JEPA learns graph-level representations by predicting latent representations of target subgraphs from context subgraphs. Rather than attempting to reconstruct every missing node, edge, or feature, it predicts in representation space. The underlying intuition is that a model which predicts useful latent structure may learn abstractions that matter.

For evolving process graphs, we can start with an analogous idea. Don't force the model to reconstruct the future graph exactly, but train it to predict the latent state of the future process. In this situation, we need to distinguish between two things:

First, we need context (up to time t). This can be a recent window of graph states, actions, and relevant local neighborhoods. Next, we need a target to predict. In this setting, we can instantiate the previous desiderata of our representations by choosing a future target: perhaps the graph at the next time step, a future graph difference, or a downstream outcome.

An encoder network turns the current context into a latent representation of the current system state. A target encoder network produces a representation of the true future. A predictor network then tries to predict that future representation, conditioned on the current state representation and action. The intuition is straightforward: encode the present, condition on an action, and predict the latent future.
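The three roles can be sketched in a few lines of plain Python. This is a toy illustration under strong assumptions, not the architecture itself: the graph context is stood in for by a flat feature vector, each network is a single bias-free linear map, and all names and dimensions (`context_encoder`, `target_encoder`, `predictor`, `STATE_DIM`, and so on) are made up for the example.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def init(rows, cols, rng):
    """Random weight matrix with entries in [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
STATE_DIM, ACTION_DIM, LATENT_DIM = 4, 2, 3   # illustrative sizes only

context_encoder = init(LATENT_DIM, STATE_DIM, rng)
target_encoder = [row[:] for row in context_encoder]   # starts as a copy
predictor = init(LATENT_DIM, LATENT_DIM + ACTION_DIM, rng)


def predict_future_latent(state_t, action_t):
    """Encode the present, condition on the action, predict the latent future."""
    z_t = linear(context_encoder, state_t)
    return linear(predictor, z_t + action_t)   # list concatenation


def latent_loss(state_t, action_t, state_next):
    """Squared distance between predicted and target latents."""
    z_hat = predict_future_latent(state_t, action_t)
    # The target latent is treated as a constant: no gradient would flow into it.
    z_target = linear(target_encoder, state_next)
    return sum((a - b) ** 2 for a, b in zip(z_hat, z_target))


state_t = [1.0, 0.0, 0.5, 0.2]       # stand-in for the encoded graph context
action_t = [1.0, 0.0]                # one-hot stand-in for an action
state_next = [0.9, 0.1, 0.5, 0.3]    # stand-in for the true future
loss = latent_loss(state_t, action_t, state_next)
```

Note that the loss never compares graphs directly: only the two latent vectors are compared, which is what distinguishes this setup from reconstructing the future graph.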

Masking entities to force interaction reasoning

A recent and compelling direction in latent world modeling is to use object-level masking, as proposed in Causal-JEPA (Han et al., 2025). Suppose a model observes a video as a set of object representations: one slot for each object. Instead of giving the model every object at every time step, we mask some objects across time and ask the model to infer their states from the other objects and auxiliary variables, such as actions. This prevents the model from solving the task by merely extrapolating each object independently from its own past. If an object is hidden, the model must infer its dynamics from its interactions with the rest of the scene.

This idea transfers naturally to graphs. In a dynamic process graph, the “objects” are entities, nodes, edges, subgraphs, or typed relational units. At each time step, the graph can be represented as a set of entity tokens. Now imagine masking selected entities across the history window: perhaps a node’s current state is hidden, or a relation, or an entire subgraph trajectory. The model is given only a minimal identity anchor so it knows which entity is being referred to, but it cannot directly observe that entity’s full state over the masked interval. It must infer the missing state from the surrounding graph. This changes the learning problem. Without masking, a model may learn trivial self-dynamics: “This node was in state A yesterday, so it will probably be in state A tomorrow.” With entity-level masking, the model is forced to ask: “Given how the rest of the system evolved, and given the actions that occurred, what must have happened to this entity?” This encourages the model to learn more useful, interaction-aware process representations.

[Figure: Two-panel comparison built from one shared trajectory over steps t−2, t−1, t, and t+1. Operator U goes A, A, B, B; Order O goes new, open, open, done; an action a occurs at each step. Left panel (without entity masking): every token is observed, so copying each entity’s own previous state forward is right most of the time, and the model never has to learn how entities interact. Right panel (with entity masking): Operator U’s two middle states (at t−1 and t) are masked, leaving only its identity anchors, so the model must recover them from Order O and the actions in that window. This is the interaction-aware reasoning the section argues for.]
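The masking operation on the Operator U and Order O example can be written down directly. The sketch below is a simplified illustration: the trajectory is a plain dictionary with one row per entity, the dictionary key plays the role of the identity anchor, and `MASK` and `mask_entity` are hypothetical names introduced here.

```python
MASK = "?"

# Shared example trajectory: one row per entity, one column per time step
# (index 0 is t-2, index 1 is t-1, index 2 is t, index 3 is t+1).
trajectory = {
    "Operator U": ["A", "A", "B", "B"],
    "Order O": ["new", "open", "open", "done"],
    "Action a": ["a", "a", "a", "a"],
}


def mask_entity(trajectory, entity, steps):
    """Return a copy in which `entity`'s state is hidden at the given steps.

    The row key is left in place as the identity anchor, so the model still
    knows which entity it is being asked about, but not its state.
    """
    masked = {name: list(states) for name, states in trajectory.items()}
    for step in steps:
        masked[entity][step] = MASK
    return masked


masked = mask_entity(trajectory, "Operator U", steps=[1, 2])
# masked["Operator U"] is now ["A", "?", "?", "B"]; all other rows are unchanged.
```

With the two middle states hidden, the only evidence for Operator U's switch from A to B is Order O's progression and the actions in the window.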

Latent interventions on graphs

This masking operation can be interpreted as a kind of latent intervention. Rather than intervene in the real system, we intervene in the model’s access to information, selectively removing an entity’s state and asking the model to reconstruct or predict it from other variables. In doing so, we force the model to rely on relational dependencies.

The history mask discourages trivial reliance on an entity’s own observed trajectory, and the future mask enforces forward world modeling. Together, they create a more demanding predictive game: infer hidden parts of the process from interactions, and use those inferred dynamics to predict the future. This leads precisely to the kind of representations that we would like our process foundation model to produce.

Building such a model involves a combinatorial number of design choices, which we plan to explore in depth at HASH:

  1. Optimized masking: In our dynamical learning game, masking is itself a design space. We might mask node states, edges, time intervals, and so on. Each masking strategy asks the model to learn a different kind of dependency, and the most interesting objectives may combine several of them. For example, one could mask entity trajectories over the history window while always masking future tokens; the model would then learn both retrospective inference and forward prediction.

  2. Objective design: Another central question is what counts as the future target. A naive version of the project might try to predict the exact future graph: which edge will appear, which edge will disappear, which node attribute will change. This may be useful in some settings, but exact graph prediction is often brittle. The more interesting target may be a representation of future-relevant change, where relevance is decided by the domain. The choice of target is not a detail. It defines what the model is being asked to understand. The scientific challenge is to find objectives that preserve the right abstractions.

  3. Action-conditioned learning: In many systems, the future is not something that simply happens to us, but something people and agents can influence. A useful process model should therefore learn not only “what is likely to happen next?”, but also “what is likely to happen if this action is taken?” For example, two supply chains may look similar today but respond very differently to rerouting a shipment. Conversely, two systems may look different on the surface but behave similarly under the same intervention. This means the model’s representation of state should capture the parts of the process that matter for action and decision-making. Learning representations conditioned on actions or interventions is a natural way to move in this direction. Ideally, these representations would also capture causal structure, but how best to induce that causality through self-supervised objectives remains an open research challenge.
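As one concrete point in the masking design space, the combined strategy mentioned in the first item (hide some entity trajectories over the history window, and always hide the future) could be sampled as follows. This is only a sketch of one possible scheme: `sample_masks`, `mask_ratio`, and the token ids are assumptions made for the example, and real strategies might mask edges, time intervals, or subgraphs instead.

```python
import random


def sample_masks(entity_ids, history_len, mask_ratio=0.5, seed=None):
    """Return {entity_id: [bool per step]} over history_len + 1 steps.

    True means the token's state is hidden. The final step is the future and
    is always masked; a random subset of entities is additionally hidden
    across the whole history window.
    """
    rng = random.Random(seed)
    n_hidden = max(1, round(mask_ratio * len(entity_ids)))
    hidden = set(rng.sample(sorted(entity_ids), n_hidden))
    return {
        entity: [entity in hidden] * history_len + [True]
        for entity in entity_ids
    }


masks = sample_masks(["e1", "e2", "r1"], history_len=3, seed=0)
```

Entities whose history is hidden exercise retrospective inference, while the always-masked final step exercises forward prediction, so a single sampled mask covers both objectives.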

Towards a dynamic graph JEPA

Putting the pieces together, one possible research direction is a dynamic graph JEPA: a latent predictive architecture for typed, temporal process graphs. We have a temporal trajectory of a typed attributed graph alongside actions or interventions, and optionally some observable outcome such as delay, failure, cost, reward, anomaly, contradiction, or success. Each graph is then converted into a set of tokens that can represent an entity, relation, or subgraph state.

During training, selected entity tokens are masked across the history window, preserving only identity anchors. Future tokens are also masked. A predictor receives the partially observed trajectory and predicts the latent representations of the masked tokens. The model must therefore learn both to infer hidden history from context and to predict the future. An appropriate distance-based loss function gives a self-supervised objective for learning the dynamic process states. The representation learned at a particular time can then be used for downstream tasks, in typical “foundation model” fashion.

[Figure: A four-step dynamic Graph-JEPA pipeline. (1) Tokenize the process trajectory: each typed graph in the temporal trajectory (the graphs at t−2, t−1, and t) is converted into a set of typed tokens per timestep, namely entity tokens (e₁, e₂), relation tokens (r₁), and the action token (a) taken at each step. (2) Mask over history and the future: scattered individual tokens are hidden across the history window while identity anchors are kept, and the entire next step (t+1) is masked as the prediction target. (3) Predict latents with a JEPA predictor: a context encoder embeds the partially observed trajectory, and a predictor predicts latent vectors (ẑ₁, ẑ₂) for the masked tokens, conditioned on the action at time t. (4) Compare latents: a squared-distance loss ‖ẑ − z‖² pulls the predicted latents toward target latents (z₁, z₂) produced by an exponential-moving-average (EMA) target encoder, with stop-gradient, over the true tokens. Minimizing this distance is the self-supervised objective that lets the model learn reusable dynamic process states without labels.]
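Two pieces of this objective are simple enough to sketch in plain Python: a squared-distance loss averaged over masked token positions only, and an exponential-moving-average update of the target encoder's parameters. The function names, the flat parameter list, and the momentum value are assumptions made for illustration; a practical implementation would use a tensor library and operate on full parameter sets.

```python
def masked_latent_loss(predicted, target, is_masked):
    """Mean squared distance between latents, over masked positions only."""
    total, count = 0.0, 0
    for z_hat, z, masked in zip(predicted, target, is_masked):
        if masked:
            total += sum((a - b) ** 2 for a, b in zip(z_hat, z))
            count += 1
    return total / count if count else 0.0


def ema_update(target_params, online_params, momentum=0.99):
    """Move the target parameters a small step toward the online parameters.

    The target encoder is never trained by gradient descent (stop-gradient);
    it only follows the context encoder through this moving average.
    """
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]


predicted = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]   # one latent per token
target = [[9.0, 9.0], [1.0, 3.0], [0.5, 0.5]]
is_masked = [False, True, True]                    # first token was visible

loss = masked_latent_loss(predicted, target, is_masked)        # 2.0
new_target = ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)  # roughly [0.9, 0.1]
```

The visible first token contributes nothing to the loss even though its latents differ greatly, since only masked positions are prediction targets.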

Correct benchmarking is crucial

Unlike the process-specific models HASH builds for clients, a process foundation model cannot be evaluated only by asking whether it improves one supervised metric in one environment, or on a single given dataset: the whole point is to learn reusable process states. A good evaluation should therefore test several things.

First, prediction: can the model forecast future graph changes or outcomes? Second, action sensitivity: does conditioning on actions improve the forecast? Third, temporal understanding: does the model use history, or only the latest snapshot? Fourth, transfer: does the learned state help on new tasks or related processes? And fifth, utility: does it help agents, analysts, or downstream systems make better decisions? This is why the early stages of such a project should be unusually careful. The goal is not to rush to the most advanced architecture. The goal is to define the right graph representation, the right trajectory format, the right graph difference operators, the right splits, and the right baselines. The goal is never to build an impressive model that answers the wrong question.
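The action-sensitivity check, for instance, can be phrased as a simple ablation: compare the forecast error of the same model with and without access to the actions. The sketch below assumes scalar outcomes and mean squared error purely for illustration; the function names and the numbers are invented and do not report any result.

```python
def mean_squared_error(predictions, outcomes):
    """Average squared difference between forecasts and observed outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def action_sensitivity(with_action, without_action, outcomes):
    """Error reduction from conditioning on actions (positive means it helps)."""
    return (mean_squared_error(without_action, outcomes)
            - mean_squared_error(with_action, outcomes))


# Made-up numbers for illustration only.
outcomes = [1.0, 0.0, 1.0, 1.0]
with_action = [0.9, 0.1, 0.8, 1.0]        # forecasts that see the action
without_action = [0.5, 0.5, 0.5, 0.5]     # forecasts with the action withheld

gain = action_sensitivity(with_action, without_action, outcomes)
```

The same pattern would apply to the temporal-understanding question, by comparing forecasts made from the full history against forecasts made from the latest snapshot alone.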

Why we’re excited about process foundation models (PFMs) at HASH

Language models learn from sequences of tokens, image models learn from pixels or patches, and object-centric world models learn from interacting entities in scenes (i.e. agent-based simulations).

Real-world systems are evolving relational processes that can be defined through temporal graphs, as enabled by HASH. These graphs may already be close to the level at which real-world decisions are made, representing not just text, but entities with specific relationships over time. As a data structure, they can become the substrate on which organizations, agents, and simulation models internally represent the world. Graph learning can help us understand how those worlds evolve. A process foundation model promises to make graphs not only queryable, but learnable as dynamic systems.

AI needs stronger models of such systems. Not because graphs are fashionable, or because of a misplaced belief that every problem should be forced into graph form, but because, we argue, so much of the world we care about is made up of entities (things connected to other things), changing over time, as a result of actions that occur. Graph-JEPA points toward a useful principle: learn graph representations by predicting latent structure rather than reconstructing every surface detail. Causal-JEPA-style object masking adds another: hide parts of the entity history so that interaction reasoning becomes necessary.

Our ambition is not merely to embed graphs, but to learn representations of structured change. If we want AI systems that can understand, assist with, and eventually help improve real-world processes, this is one of the central problems we need to solve, and it is a research direction I am excited to develop at HASH.
