Towards Process Foundation Models
A self-supervised approach to learning the latent state of dynamic graphs, for radically better process modeling

Geri Skenderi
Member of Technical Staff
Temporal Graphs
In our previous post on graph-based world models we explored how prevalent graph data is in the modern world, and discussed the open research question of learning useful representations of graphs at scale. One important aspect of process data that we omitted from that introduction was its temporal nature.
In the real world, graphs rarely sit still; they evolve. A supply chain changes when a supplier fails, demand spikes, or a shipment is rerouted. A software system changes when a service is deployed, a package is upgraded, or a vulnerability is discovered. An organization changes when a team restructures, a decision is made, or a workflow breaks down. Generally speaking, knowledge graphs change (or at least they should change!) when new evidence arrives, old claims are contradicted, and confidence shifts.
This leads us to some challenging questions: "How do we represent dynamic processes over a graph?" And, more ambitiously: "Can we learn a compact state of a graph that tells us what is likely to happen next, what may go wrong, and what actions might change its future?"
Such questions are hard by their nature, but that is what makes them fun and impactful. In this post we outline some of the ideas we're developing at HASH in answer to them.
Relevance to the broader AI community
This post represents our research position: a view about where graph machine learning, self-supervised learning, agentic systems, and simulation may need to go next. Our central claim is simple:
AI has become remarkably good at modeling language, but remains much weaker at modeling structured, evolving systems.
HASH has developed infrastructure around strongly typed, temporal, provenance-aware graphs. These graphs do not merely store facts as "subject-predicate-object" tuples (semantic triples); they represent entities, relationships, uncertainty, and sources, including the history of all of these over time. Unlike static graph databases (which are also typically untyped), they are living models of real systems.
If we want AI systems that can reason over such worlds, we need models that treat dynamic graphs as first-class objects: world states and processes that evolve under actions, not just things to query or to embed statically (as in "Graph RAG").
Learning predictive representations through time
The problem of Graph Representation Learning usually begins with a graph and asks for an embedding, i.e., a vector representation of the input structure. A model observes the graph and produces a vector representing a node, an edge, a subgraph, or the whole graph. That embedding can then be used for classification, regression, retrieval, ranking, anomaly detection, clustering, or some other downstream task.
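Here is a minimal sketch of a static graph embedding, to make the starting point concrete. It assumes nothing beyond node feature vectors and an edge list, and it is not a learned model. A parameter-free averaging step stands in for a message-passing layer, and mean pooling turns the node embeddings into a graph embedding. The names `embed_nodes` and `embed_graph` and the toy supply-chain data are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_nodes(
    features: Dict[str, Vector], edges: List[Tuple[str, str]], rounds: int = 2
) -> Dict[str, Vector]:
    """Parameter-free message passing: each round, a node averages itself with its neighbors."""
    neighbors: Dict[str, List[str]] = {node: [] for node in features}
    for u, v in edges:  # treat edges as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    h = dict(features)
    for _ in range(rounds):
        # The comprehension reads the previous round's h before it is rebound.
        h = {n: mean([h[n]] + [h[m] for m in neighbors[n]]) for n in h}
    return h


def embed_graph(features: Dict[str, Vector], edges: List[Tuple[str, str]]) -> Vector:
    """Graph-level embedding: mean-pool the node embeddings."""
    return mean(list(embed_nodes(features, edges).values()))


# Hypothetical toy graph: supplier -> warehouse -> store.
features = {"supplier": [1.0, 0.0], "warehouse": [0.0, 1.0], "store": [0.0, 1.0]}
edges = [("supplier", "warehouse"), ("warehouse", "store")]
print(embed_graph(features, edges))
```

A learned encoder would replace the plain average with trainable weights. The output would have the same shape: one vector per node and one for the whole graph, with no notion of time.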
Many important real-world questions concern not what a graph is now, but how it will change (prediction) or how it might be changed (optimization). Answering them requires more than embedding a graph; it requires learning representations through time. A static graph embedding may summarize the current shape of a system, but decision-making requires more than a summary of the present. It requires a representation of state that captures what has happened (path dependency), what matters, what is uncertain, and how possible actions may influence what comes next. This is the idea behind a process foundation model.
What would a process foundation model be?
By "process foundation model", we mean a model that learns a reusable latent state over a structured, evolving system. The input is not just text, tabular data, or a static graph, but a history of graph states and actions. This graph may be typed, attributed, temporal, and heterogeneous. It may contain people, organizations, tasks, documents, machines, shipments, software components, claims, simulated agents, or any other kind of entity. Suppose furthermore that at every moment in time we have access to an action or intervention: a deployment, a rerouting, a policy change, a decision, an update, or some other event that may influence the system.
Given the history up to a time t, our goal is to learn a latent state that both compresses the relevant process history (the canonical embedding problem) and is predictive of future changes and outcomes under possible future actions.
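One way to picture this input is as a trajectory of typed graph snapshots, each paired with the action taken after it. The sketch below is a hypothetical data layout for illustration only, not HASH's schema. All class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # typed relationship, e.g. "supplies"


@dataclass
class GraphState:
    node_types: Dict[str, str]               # node id -> entity type
    node_attrs: Dict[str, Dict[str, float]]  # node id -> attribute name -> value
    edges: FrozenSet[Edge]


@dataclass
class Action:
    kind: str                 # e.g. "reroute", "deploy"
    targets: Tuple[str, ...]  # node ids the action touches


@dataclass
class Step:
    t: int
    graph: GraphState
    action: Optional[Action]  # intervention applied after observing `graph`, if any


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def history(self, t: int) -> List[Step]:
        """Everything observed up to and including time t: the input to the latent state."""
        return [s for s in self.steps if s.t <= t]


g0 = GraphState(
    node_types={"s1": "Supplier", "w1": "Warehouse"},
    node_attrs={"s1": {"reliability": 0.9}, "w1": {"stock": 120.0}},
    edges=frozenset({Edge("s1", "w1", "supplies")}),
)
traj = Trajectory([Step(0, g0, None), Step(1, g0, Action("reroute", ("w1",)))])
print(len(traj.history(0)), len(traj.history(1)))
```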
In this way the model does not merely learn "an embedding of a graph", but a predictive process state. A good process state should answer questions like: "Given what has happened so far, and given this possible intervention, what futures are likely?" That learning procedure is much closer in nature to world modeling than to ordinary graph embedding.
This is not simply forecasting over a graph
One might ask whether this is simply time-series forecasting with graphs. It is not. Time-series forecasting often assumes that the system state is already represented in a convenient numerical form. But many real systems are relational. Their most important properties are not just values changing over time, but dependencies, paths, neighborhoods, hierarchies, constraints, and interactions.
In a supply chain, risk depends on network position, substitutability, geography, inventory, supplier reliability, and cascading dependencies. In software, deployment risk depends on dependency structure, ownership, test coverage, code paths, traffic, version history, and hidden coupling. In an organization, the effect of a decision depends on roles, permissions, communication channels, incentives, institutional memory, and informal dependencies. These are graph-structured facts with a temporal component, not just temporal facts. We are not simply forecasting a scalar or multivariate sequence; we are trying both to learn such a sequence and to forecast it.
The promise of self-supervised process learning
If you want to rely on labels to learn at scale, you will quickly find out that they're scarce in graph-heavy domains. We may have many observed process histories, but relatively few clean labels saying: this process will fail, this intervention was optimal, this subgraph caused the problem, and so on.
The key thing to consider is that process histories themselves contain signal. Graphs change, meaning edges appear and disappear, node states update, interventions occur, and the system responds. Through self-supervised learning we aim to turn these raw trajectories into their own training signal.
For static graphs, a self-supervised model might predict missing subgraphs, masked attributes, or latent representations of graph neighborhoods. For dynamic graphs, the more interesting task is: "Given the current process state and a possible action, can the model predict the latent representation of the future process state?" This is where the connection to JEPA-style learning becomes natural.
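The sketch below shows one way a raw trajectory can supply its own training signal. It slides a window over the history and pairs each context with the action taken and the state that followed. No labels are needed. The function name is hypothetical, and so is the convention that `actions[t]` is the action taken between `states[t]` and `states[t + 1]`.

```python
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a graph state, in whatever representation is used
A = TypeVar("A")  # an action


def make_examples(
    states: Sequence[S], actions: Sequence[A], window: int
) -> List[Tuple[List[S], A, S]]:
    """Turn one unlabeled trajectory into (context, action, future) training triples."""
    if len(actions) < len(states) - 1:
        raise ValueError("need one action per transition")
    examples = []
    for t in range(window - 1, len(states) - 1):
        context = list(states[t - window + 1 : t + 1])  # the last `window` observed states
        examples.append((context, actions[t], states[t + 1]))
    return examples


print(make_examples(["g0", "g1", "g2", "g3"], ["a0", "a1", "a2"], window=2))
```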
From Graph-JEPA to dynamic graph state models
Graph-JEPA learns graph-level representations by predicting latent representations of target subgraphs from context subgraphs. Rather than attempting to reconstruct every missing node, edge, or feature, it predicts in representation space. The underlying intuition is that a model which predicts useful latent structure may learn abstractions that matter.
For evolving process graphs, we can start with an analogous idea. Don't force the model to reconstruct the future graph exactly, but train it to predict the latent state of the future process. In this situation, we need to distinguish between two things:
First, we need context (up to time t): a recent window of graph states, actions, and relevant local neighborhoods. Second, we need a target to predict. Here we can encode the desiderata for our representations described above through the choice of future target: perhaps the graph at the next time step, a future graph difference, or a downstream outcome.
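If the target is a future graph difference, a simple difference operator over two snapshots might look like the sketch below. The representation is hypothetical: edges are `(source, relation, target)` triples, and every node carries an attribute dictionary, possibly empty. The names are also hypothetical. A real operator would also need to handle types and provenance.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

EdgeKey = Tuple[str, str, str]       # (source, relation, target)
Attrs = Dict[str, Dict[str, float]]  # node id -> attribute name -> value


@dataclass
class GraphDiff:
    added_nodes: Set[str]
    removed_nodes: Set[str]
    added_edges: Set[EdgeKey]
    removed_edges: Set[EdgeKey]
    changed_attrs: Dict[Tuple[str, str], Tuple[Optional[float], Optional[float]]]


def graph_diff(edges_t: Set[EdgeKey], attrs_t: Attrs, edges_next: Set[EdgeKey], attrs_next: Attrs) -> GraphDiff:
    """Describe what changed between two snapshots: one candidate prediction target."""
    changed = {}
    for node in attrs_t.keys() & attrs_next.keys():  # nodes present in both snapshots
        for key in attrs_t[node].keys() | attrs_next[node].keys():
            old, new = attrs_t[node].get(key), attrs_next[node].get(key)
            if old != new:
                changed[(node, key)] = (old, new)
    return GraphDiff(
        added_nodes=set(attrs_next) - set(attrs_t),
        removed_nodes=set(attrs_t) - set(attrs_next),
        added_edges=edges_next - edges_t,
        removed_edges=edges_t - edges_next,
        changed_attrs=changed,
    )


# Hypothetical example: supplier s1 is replaced by s2 and warehouse stock drops.
diff = graph_diff(
    {("s1", "supplies", "w1")}, {"s1": {"reliability": 0.9}, "w1": {"stock": 120.0}},
    {("s2", "supplies", "w1")}, {"s2": {"reliability": 0.7}, "w1": {"stock": 80.0}},
)
print(diff)
```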
An encoder network turns the current context into a latent representation of the current system state. A target encoder network produces a representation of the true future. A predictor network then tries to predict that future representation, conditioned on the current state representation and action. The intuition is straightforward: encode the present, condition on an action, and predict the latent future.
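The encode, condition, predict loop can be sketched with linear maps. This is a forward pass only, and it rests on several assumptions that the post does not state:

- the networks are linear;
- the action is conditioned on by concatenation;
- the loss is a mean squared distance in latent space;
- the target encoder is an exponential moving average (EMA) of the online encoder, as in common JEPA setups.

Gradients for the encoder and predictor would come from an autodiff framework, and no gradient should flow through the target encoder. `TinyJEPA` and its methods are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def init(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyJEPA:
    """Linear encoder, target encoder and action-conditioned predictor (forward pass only)."""

    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.encoder = init(latent_dim, obs_dim, rng)
        self.target_encoder = [row[:] for row in self.encoder]  # starts as a copy
        self.predictor = init(latent_dim, latent_dim + act_dim, rng)

    def loss(self, context: Vector, action: Vector, future: Vector) -> float:
        z_now = matvec(self.encoder, context)            # encode the present
        z_pred = matvec(self.predictor, z_now + action)  # condition on the action (list concatenation)
        z_future = matvec(self.target_encoder, future)   # target latent; treated as a constant
        return sum((p - q) ** 2 for p, q in zip(z_pred, z_future)) / len(z_pred)

    def update_target(self, momentum: float = 0.99) -> None:
        """EMA update: the target encoder slowly tracks the online encoder."""
        self.target_encoder = [
            [momentum * t + (1.0 - momentum) * e for t, e in zip(t_row, e_row)]
            for t_row, e_row in zip(self.target_encoder, self.encoder)
        ]


model = TinyJEPA(obs_dim=4, act_dim=2, latent_dim=3)
print(model.loss([1.0, 0.0, 0.5, 0.0], [1.0, 0.0], [0.0, 1.0, 0.5, 0.0]))
```

Predicting in latent space means the model is never asked to reproduce every surface detail of the future graph. The EMA target is one common way to keep such setups from collapsing to a trivial constant representation.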
Masking entities to force interaction reasoning
A recent and compelling direction in latent world modeling is to use object-level masking, as proposed in Causal-JEPA (Han et al., 2025). Suppose a model observes a video as a set of object representations: one slot for each object. Instead of giving the model every object at every time step, we mask some objects across time and ask the model to infer their states from the other objects and auxiliary variables, such as actions. This prevents the model from solving the task by merely extrapolating each object independently from its own past. If an object is hidden, the model must infer its dynamics from its interactions with the rest of the scene.
This idea transfers naturally to graphs. In a dynamic process graph, the "objects" are entities, nodes, edges, subgraphs, or typed relational units. At each time step, the graph can be represented as a set of entity tokens. Now imagine masking selected entities across the history window: perhaps a node's current state is hidden, or a relation, or an entire subgraph trajectory. The model is given only a minimal identity anchor so it knows which entity is being referred to, but it cannot directly observe that entity's full state over the masked interval. It must infer the missing state from the surrounding graph.
This changes the learning problem. Without masking, a model may learn trivial self-dynamics: "This node was in state A yesterday, so it will probably be in state A tomorrow." With entity-level masking, the model is forced to ask: "Given how the rest of the system evolved, and given the actions that occurred, what must have happened to this entity?" This encourages the model to learn more useful, interaction-aware process representations.
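Here is a minimal sketch of entity-level masking across a history window. It assumes each time step is a dictionary from entity id to state vector. A masked entity keeps its id (the identity anchor) but loses its state at every step, and the hidden states become prediction targets. The function names and the masking ratio are illustrative assumptions.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Snapshot = Dict[str, List[float]]               # entity id -> state vector at one time step
Token = Tuple[str, int, Optional[List[float]]]  # (identity anchor, time, state or None if masked)


def sample_masked_entities(entities: List[str], ratio: float, rng: random.Random) -> Set[str]:
    """Pick a fraction of entities to hide for the whole history window."""
    k = max(1, int(len(entities) * ratio))
    return set(rng.sample(sorted(entities), k))


def mask_entities(history: List[Snapshot], masked: Set[str]) -> Tuple[List[Token], List[Token]]:
    """Hide masked entities' states at every time step, keeping only their ids as anchors."""
    visible: List[Token] = []
    targets: List[Token] = []
    for t, snapshot in enumerate(history):
        for entity, state in snapshot.items():
            if entity in masked:
                visible.append((entity, t, None))   # the model knows which entity, not its state
                targets.append((entity, t, state))  # what it must infer from the other entities
            else:
                visible.append((entity, t, state))
    return visible, targets


history = [
    {"supplier": [0.9], "warehouse": [120.0], "store": [40.0]},
    {"supplier": [0.2], "warehouse": [95.0], "store": [38.0]},
]
masked = sample_masked_entities(list(history[0]), ratio=0.34, rng=random.Random(0))
visible, targets = mask_entities(history, masked)
print(masked, targets)
```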
Latent interventions on graphs
This masking operation can be interpreted as a kind of latent intervention. Rather than intervening in the real system, we intervene in the model's access to information, selectively removing an entity's state and asking the model to reconstruct or predict it from other variables. In doing so, we force the model to rely on relational dependencies.
The history mask discourages trivial reliance on an entity's own observed trajectory, and the future mask enforces forward world modeling. Together, they create a more demanding predictive game: infer hidden parts of the process from interactions, and use those inferred dynamics to predict the future. This game is designed to elicit precisely the kind of representations that we would like our process foundation model to produce.
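Combining the two masks is straightforward if each token is addressed by `(entity, t)`. Tokens in the future horizon are always hidden, and so are the history tokens of the selected entities. The sketch below is one hypothetical way to build that combined mask.

```python
from typing import Dict, Sequence, Set, Tuple


def build_mask(
    entities: Sequence[str], history_len: int, future_len: int, masked_entities: Set[str]
) -> Dict[Tuple[str, int], bool]:
    """True means token (entity, t) is hidden from the predictor and becomes a target."""
    mask = {}
    for t in range(history_len + future_len):
        is_future = t >= history_len  # future mask: every future token is hidden
        for e in entities:
            # history mask: selected entities are hidden over the whole history window
            mask[(e, t)] = is_future or e in masked_entities
    return mask


m = build_mask(["a", "b"], history_len=2, future_len=1, masked_entities={"a"})
print(sorted(k for k, hidden in m.items() if hidden))
```

Hidden history tokens exercise retrospective inference, and hidden future tokens exercise forward prediction. Both can feed one loss.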
Navigating the design space
Building such a model involves a combinatorial number of design choices, which we plan to explore in depth at HASH:
Optimized masking: In our dynamical learning game, masking is itself a design space. We might mask node states, edges, time intervals, and so on. Each masking strategy asks the model to learn a different kind of dependency, and the most interesting objectives may combine several of them. For example, one could mask entity trajectories over the history window while always masking future tokens; the model would then learn both retrospective inference and forward prediction.
Objective design: Another central question is what counts as the future target. A naive version of the project might try to predict the exact future graph: which edge will appear, which edge will disappear, which node attribute will change. This may be useful in some settings, but exact graph prediction is often brittle. The more interesting target may be a representation of future-relevant change, where relevance is decided by the domain. The choice of target is not a detail. It defines what the model is being asked to understand. The scientific challenge is to find objectives that preserve the right abstractions.
Action-conditioned learning: In many systems, the future is not something that simply happens to us, but something people and agents can influence. A useful process model should therefore learn not only "what is likely to happen next?", but also "what is likely to happen if this action is taken?" For example, two supply chains may look similar today but respond very differently to rerouting a shipment. Conversely, two systems may look different on the surface but behave similarly under the same intervention. This means the model's representation of state should capture the parts of the process that matter for action and decision-making. Learning representations conditioned on actions or interventions is a natural way to move in this direction. Ideally, these representations would also capture causal structure, but how best to induce that causality through self-supervised objectives remains an open research challenge.
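At inference time, action conditioning lets the same current state be queried under different candidate interventions. The sketch below assumes a trained predictor `predict(state, action)` and a domain-chosen `score` on predicted states; here, lower predicted delay and cost is better. Both are toy stand-ins, and the numbers are made up for illustration.

```python
from typing import Callable, Dict, List

Latent = List[float]


def compare_actions(
    state: Latent,
    candidate_actions: Dict[str, Latent],
    predict: Callable[[Latent, Latent], Latent],
    score: Callable[[Latent], float],
) -> List[str]:
    """Rank candidate actions by the score of the predicted future state (higher is better)."""
    scored = {name: score(predict(state, a)) for name, a in candidate_actions.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Toy stand-ins: state = [delay, cost]; an action shifts both.
toy_predict = lambda z, a: [zi + ai for zi, ai in zip(z, a)]
toy_score = lambda z: -z[0] - 0.5 * z[1]  # prefer low delay, then low cost

ranking = compare_actions(
    state=[5.0, 1.0],
    candidate_actions={"wait": [0.0, 0.0], "reroute": [-3.0, 2.0]},
    predict=toy_predict,
    score=toy_score,
)
print(ranking)
```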
Towards a dynamic graph JEPA
Putting the pieces together, one possible research direction is a dynamic graph JEPA: a latent predictive architecture for typed, temporal process graphs. We have a temporal trajectory of a typed attributed graph alongside actions or interventions, and optionally some observable outcome such as delay, failure, cost, reward, anomaly, contradiction, or success. Each graph is then converted into a set of tokens, each representing an entity, relation, or subgraph state.
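Tokenizing one snapshot might look like the sketch below: one token per entity and one per relation. Each token carries a stable identity anchor, so masked tokens can still be addressed. Subgraph tokens are omitted. The `GraphToken` layout is a hypothetical choice, not a fixed specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class GraphToken:
    kind: str                     # "entity" or "relation"
    anchor: str                   # stable identity, kept even when the token is masked
    type_name: str                # entity type or relation type
    t: int                        # time step
    features: Tuple[float, ...]   # observed state (empty for plain relations here)


def tokenize(
    t: int,
    node_types: Dict[str, str],
    node_attrs: Dict[str, Dict[str, float]],
    edges: Set[Tuple[str, str, str]],
) -> List[GraphToken]:
    tokens = []
    for node in sorted(node_types):
        attrs = node_attrs.get(node, {})
        feats = tuple(attrs[k] for k in sorted(attrs))  # fixed attribute order
        tokens.append(GraphToken("entity", node, node_types[node], t, feats))
    for src, rel, dst in sorted(edges):
        tokens.append(GraphToken("relation", f"{src}-{rel}->{dst}", rel, t, ()))
    return tokens


tokens = tokenize(
    0,
    {"s1": "Supplier", "w1": "Warehouse"},
    {"s1": {"reliability": 0.9}, "w1": {"stock": 120.0}},
    {("s1", "supplies", "w1")},
)
for tok in tokens:
    print(tok)
```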
During training, selected entity tokens are masked across the history window, preserving only identity anchors. Future tokens are also masked. A predictor receives the partially observed trajectory and predicts the latent representations of the masked tokens. The model must therefore learn both to infer hidden history from context and to predict the future. An appropriate distance-based loss function gives a self-supervised objective for learning the dynamic process states. The representation learned at a particular time can then be used for downstream tasks, in typical "foundation model" fashion.
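The loss can be restricted to masked tokens only, so visible tokens contribute nothing. The sketch uses mean squared distance between predicted and target latents as one example of "an appropriate distance-based loss". The choice of distance and the function name are assumptions.

```python
from typing import Dict, Hashable, List


def masked_latent_loss(
    predicted: Dict[Hashable, List[float]],
    target: Dict[Hashable, List[float]],
    masked: List[Hashable],
) -> float:
    """Mean squared distance between predicted and target latents, over masked tokens only."""
    if not masked:
        raise ValueError("no masked tokens to score")
    total = 0.0
    for key in masked:
        p, q = predicted[key], target[key]
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)
    return total / len(masked)


predicted = {("a", 0): [1.0, 0.0], ("b", 1): [0.0, 0.0], ("c", 0): [9.0, 9.0]}
target = {("a", 0): [1.0, 2.0], ("b", 1): [0.0, 1.0], ("c", 0): [0.0, 0.0]}
# ("c", 0) is visible, so its large error is ignored.
print(masked_latent_loss(predicted, target, masked=[("a", 0), ("b", 1)]))
```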
Correct benchmarking is crucial
Unlike the process-specific models HASH builds for clients, a process foundation model cannot be evaluated only by asking whether it improves one supervised metric in one environment or on a single dataset; the whole point is to learn reusable process states. A good evaluation should therefore test several things.
First, prediction: can the model forecast future graph changes or outcomes? Second, action sensitivity: does conditioning on actions improve the forecast? Third, temporal understanding: does the model use history, or only the latest snapshot? Fourth, transfer: does the learned state help on new tasks or related processes? And fifth, utility: does it help agents, analysts, or downstream systems make better decisions? This is why the early stages of such a project should be unusually careful. The goal is not to rush to the most advanced architecture. The goal is to define the right graph representation, the right trajectory format, the right graph difference operators, the right splits, and the right baselines. The goal is never to build an impressive model that answers the wrong question.
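Two of these basics, the right splits and the right baselines, can be sketched briefly. First, split time-ordered examples chronologically, never shuffled, so evaluation is always on the future. Second, compare against a persistence baseline that only looks at the latest snapshot; a model that genuinely uses history should beat it. The function names, split fractions, and toy shipment statuses are assumptions.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    items: Sequence[T], train_frac: float = 0.7, val_frac: float = 0.15
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered items without shuffling, so evaluation is always on the future."""
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return list(items[:n_train]), list(items[n_train:n_train + n_val]), list(items[n_train + n_val:])


def persistence_baseline(history: Sequence[T]) -> T:
    """Predict that the next state equals the latest one."""
    return history[-1]


def accuracy(predictions: Sequence[T], truths: Sequence[T]) -> float:
    return sum(p == y for p, y in zip(predictions, truths)) / len(truths)


# Hypothetical shipment statuses, one per time step.
statuses = ["ok", "ok", "ok", "delayed", "delayed", "ok", "ok", "ok", "delayed", "delayed"]
train_idx, val_idx, test_idx = chronological_split(list(range(len(statuses))))
preds = [persistence_baseline(statuses[:t]) for t in test_idx]
truths = [statuses[t] for t in test_idx]
print(test_idx, accuracy(preds, truths))
```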
Why we're excited about PFMs at HASH
Language models learn from sequences of tokens, image models learn from pixels or patches, and object-centric world models learn from interacting entities in scenes (e.g., agent-based simulations).
Real-world systems are evolving relational processes that can be defined through temporal graphs, as enabled by HASH. These graphs may already be close to the level at which real-world decisions are made, representing not just text but entities with specific relationships over time. As a data structure, they can become the substrate on which organizations, agents, and simulation models internally represent the world. Graph learning can help us understand how those worlds evolve. A process foundation model promises to make graphs not only queryable, but learnable as dynamic systems.
AI needs stronger models of such systems. Not because graphs are fashionable, or because of a misplaced belief that every problem should be forced into graph form, but because, we argue, so much of the world we care about is made up of entities (things connected to other things), changing over time, as a result of actions that occur. Graph-JEPA points toward a useful principle: learn graph representations by predicting latent structure rather than reconstructing every surface detail. Causal-JEPA-style object masking adds another: hide parts of the entity history so that interaction reasoning becomes necessary.
Our ambition is not merely to embed graphs, but to learn representations of structured change. If we want AI systems that can understand, assist with, and eventually help improve real-world processes, this is one of the central problems we need to solve, and a research direction I am excited to develop at HASH.