Graphs are everywhere
Molecules are graphs of atoms and bonds. Supply chains are graphs of firms, facilities, products, routes, and dependencies. Software systems are graphs of packages, services, functions, and data flows. And an organization’s collective knowledge is often best represented as a graph of entities, facts, sources, claims, and relationships.
In fact, many of the hardest problems in AI aren’t about rows in tables or standalone chunks of text at all, but about connected structures:
→ What depends on what?
→ What resembles what?
→ What caused what?
→ What changed (and how), what conflicts (and where), and what is missing (which might actually be important)?
HASH has built a variety of tools for inferring graphs from both unstructured information and structured data, and for building systems around these strongly typed, temporal, and provenance-aware graphs. Real-world decision-making requires this level of insight: an ability to reason over entities, relationships, history, uncertainty, and context, and not merely to perform loose text retrieval.
But these capabilities raise a practical question: how can machine learning systems learn useful representations of graphs, especially when labels are scarce?
Graph-JEPA
In this post I’ll explain what Graph-JEPA is, why it differs from other graph representation learning methods, and how ideas from this line of research may be useful for knowledge graphs, agentic systems, simulation, and other graph-heavy applications.
The problem: graphs are rich, but labels are expensive
Graph Neural Networks, or GNNs, have become a standard tool for learning from graph-structured data. Given a graph, a GNN can produce embeddings for nodes, edges, subgraphs, or entire graphs. These embeddings can then be used for tasks such as classification, regression, ranking, retrieval, anomaly detection, and clustering.
The catch is that many successful GNN applications depend on supervised learning. We need examples of graphs paired with labels:
→ this molecule is toxic, this one is not;
→ this transaction network is fraudulent, this one is not;
→ this process graph will fail, this one will not;
→ this entity cluster is a duplicate, this one is not.
In many domains, these labels are expensive, noisy, delayed, proprietary, or simply unavailable. But the graphs themselves often contain a huge amount of structure. The arrangement of nodes and edges, the local neighborhoods, the repeated motifs, the hierarchy, and the global shape all provide signals.
The promise of self-supervised learning
Self-supervised learning tries to use those signals. Instead of requiring humans to label every example, the model creates a training task from the data itself.
For images, a self-supervised model might hide part of an image and learn to infer something about the missing region. For text, a language model might hide or predict tokens. For graphs, the question becomes: what should the model predict, and in what space should it make that prediction?
A common self-supervised strategy for learning is reconstruction. Mask part of the input, then train a model to reconstruct what was removed. While this works well in some settings, graphs make reconstruction tricky.
Graphs are generally non-Euclidean, irregular, and heterogeneous, with no natural ordering of nodes. Two graphs can look different on the surface while expressing the same underlying structure. Conversely, a small local change may matter enormously in one context and not at all in another.
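To make the "no natural ordering" point concrete, here is a minimal sketch in plain Python. The two edge lists and the helper names (`relabel`, `same_structure`) are illustrative assumptions, and the brute-force check is only practical for tiny graphs. It shows the same path graph written down under two node numberings: the raw edge sets differ, yet the structure is identical.

```python
from itertools import permutations

def relabel(edges, mapping):
    """Apply a node relabeling to an undirected edge list."""
    return {frozenset((mapping[u], mapping[v])) for u, v in edges}

def same_structure(edges_a, edges_b, n):
    """Brute-force isomorphism check over all node orderings (tiny graphs only)."""
    target = {frozenset(edge) for edge in edges_b}
    return any(
        relabel(edges_a, dict(enumerate(perm))) == target
        for perm in permutations(range(n))
    )

# The same 4-node path, written down with two different node numberings.
graph_a = [(0, 1), (1, 2), (2, 3)]   # path 0-1-2-3
graph_b = [(2, 0), (0, 3), (3, 1)]   # path 2-0-3-1

# The surface form (which node ids are connected) is different...
surface_equal = {frozenset(e) for e in graph_a} == {frozenset(e) for e in graph_b}
# ...but some relabeling of the nodes maps one graph onto the other.
structure_equal = same_structure(graph_a, graph_b, n=4)

print(surface_equal, structure_equal)
```

A model that is asked to reproduce the raw edge list has to commit to one arbitrary numbering, which is part of why exact reconstruction is an awkward objective for graphs.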
If a model is trained to reconstruct low-level details too faithfully, it may learn to preserve incidental information rather than the abstract structure we care about. For many downstream tasks, we do not need a model that can reproduce every node feature or edge. We need a model that understands what the graph semantically means.
That is the intuition behind Joint-Embedding Predictive Architectures, or JEPAs.
What is a JEPA?
JEPAs learn by prediction, but not by reconstructing raw input. Instead, they work in what is called “representation space”. The model observes one part, or view, of the data: the context. It then tries to predict the embedding of another part: the target. In other words, the model does not ask:
> “Can I reconstruct the missing data exactly?”
Instead, it asks:
> “Given this context, can I predict the representation of the missing part?”
This distinction is important. Predicting in embedding space (a dense, continuous representation space in which distance and direction carry semantic meaning) encourages the model to capture higher-level regularities. It can ignore details that are hard to reconstruct but irrelevant to the task, while preserving information that helps explain how different parts of the data relate. This gives JEPAs an appealing middle ground.
Contrastive methods often require carefully designed augmentations and negative samples. Generative methods often require reconstructing detailed input data.
JEPAs aim to avoid the pitfalls of both contrastive and generative methods, instead learning by predicting latent representations.
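The difference between the two objectives can be sketched with a toy example. Everything here is an assumption made for illustration: the weights are hand-picked rather than learned, the "encoder" is a single linear map, and the third input feature stands in for incidental detail. In a real JEPA the encoders and predictor are trained neural networks.

```python
def linear(x, weights):
    """Apply a linear map: each output is a weighted sum of the inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

# Hypothetical toy weights. The encoder drops the third input feature,
# which plays the role of detail that is irrelevant to the task.
ENCODER = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
PREDICTOR = [[0.0, 1.0],
             [1.0, 0.0]]   # toy predictor: swaps the two latent dimensions

def latent_loss(context, target):
    """JEPA-style loss: compare the predicted and actual target embeddings."""
    predicted = linear(linear(context, ENCODER), PREDICTOR)
    return mse(predicted, linear(target, ENCODER))

def reconstruction_loss(reconstructed, target):
    """Generative-style loss: compare against the raw target features."""
    return mse(reconstructed, target)

context = [2.0, 1.0, 0.3]
target_clean = [1.0, 2.0, 0.0]
target_noisy = [1.0, 2.0, 5.0]   # identical except for the incidental feature

latent_clean = latent_loss(context, target_clean)
latent_noisy = latent_loss(context, target_noisy)
recon_noisy = reconstruction_loss([1.0, 2.0, 0.0], target_noisy)
```

In this toy setup the latent loss is unchanged by the noisy third feature, while the reconstruction loss is dominated by it. That is the behavior the latent objective is meant to encourage, although how well it holds in practice depends on what the trained encoder learns to keep.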
Graph-JEPA adapts this idea to graphs, unlocking joint-embedding predictive architectures for graph-level representation learning. The result is a graph-level representation that can be used for downstream tasks, such as classification or regression, without needing to train the entire model from scratch on a large labeled dataset.
How Graph-JEPA works
This section is where things get technical. Feel free to skip ahead if you aren’t interested in hearing about the underlying process by which Graph-JEPA works.
The core approach
The following key steps make Graph-JEPA work:
→ Split a graph into subgraphs. The input graph is partitioned into smaller graph “patches,” somewhat like splitting an image into patches for a Vision Transformer. In the paper, this is done using graph partitioning, followed by a local neighborhood expansion so that subgraphs retain useful context.
→ Encode each subgraph. A graph neural network produces an embedding for each subgraph. Positional information is also added, so the model has some sense of where each subgraph sits within the overall graph.
→ Choose a context and targets. At each training step, one subgraph is treated as the context, and other subgraphs are treated as targets.
→ Predict target representations from context. The model uses a predictor network to infer the latent representation of target subgraphs from the context subgraph and positional information.
→ Pool subgraph representations into a graph representation. After training, the learned subgraph representations can be aggregated into a single embedding for the whole graph.
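The data flow through these steps can be sketched in plain Python. This is not the paper's implementation: the BFS-chunk partitioner is a stand-in for a proper graph partitioning algorithm, `encode_patch` is a hand-written summary standing in for a GNN, and positional information and predictor training are left out entirely. All function names are hypothetical.

```python
from collections import deque

def partition(adj, num_parts):
    """Stand-in partitioner: chunk a BFS ordering into roughly equal parts."""
    order, seen = [], set()
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adj[node]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    size = -(-len(order) // num_parts)   # ceiling division
    return [set(order[i:i + size]) for i in range(0, len(order), size)]

def expand_one_hop(adj, part):
    """Local neighborhood expansion: add every direct neighbor of the part."""
    return part | {nb for node in part for nb in adj[node]}

def encode_patch(adj, patch):
    """Toy stand-in for a GNN: [node count, internal edge count, mean degree]."""
    internal = sum(1 for u in patch for v in adj[u] if v in patch) // 2
    mean_degree = sum(len(adj[u]) for u in patch) / len(patch)
    return [float(len(patch)), float(internal), mean_degree]

def mean_pool(vectors):
    """Aggregate patch embeddings into one graph-level embedding."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Two 4-cycles joined by the bridge 3-4.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2, 4],
       4: [3, 5, 7], 5: [4, 6], 6: [5, 7], 7: [4, 6]}

parts = partition(adj, num_parts=2)                      # split into patches
patches = [expand_one_hop(adj, part) for part in parts]  # keep local context
embeddings = [encode_patch(adj, patch) for patch in patches]
context, targets = embeddings[0], embeddings[1:]         # one context, rest targets
graph_embedding = mean_pool(embeddings)                  # graph-level representation
```

During training, a predictor would take `context` (plus positional information) and be optimized to match each entry of `targets`. Only the pooling step is needed once the encoder has been trained.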
The hyperbolic twist
One of Graph-JEPA’s most interesting design choices is its use of a hyperbolic objective. Many real-world graphs have hierarchical structure. Organizations have departments and teams. Supply chains have tiers. Knowledge graphs contain categories, subcategories, instances, and relationships. Biological systems contain nested functional structures. Even social and communication networks often exhibit hierarchy.
Euclidean space is not always the best fit for representing hierarchy. Hyperbolic geometry, by contrast, is often a more natural way to represent tree-like or hierarchical structures, because it has “more room” as you move outward from the origin.
Graph-JEPA uses this intuition without requiring the whole model to operate as a conventional hyperbolic embedding system. Instead, the target subgraph embedding is treated as a high-dimensional description of a hyperbolic angle, and the predictor learns to predict coordinates on the 2D unit hyperbola.
The core underlying idea is simple:
> Graph-JEPA encourages the model to organize subgraph representations in a way that reflects hierarchical relationships, while retaining the expressivity of higher-dimensional neural embeddings.
Or, in more technical terms, Graph-JEPA penalizes representations that drift too far from the origin towards the hyperbola’s asymptotes. This creates a regularization effect that forces the model to compactly reflect the subgraph hierarchies near the root while preserving high-dimensional expressive capacity.
That makes the prediction task better aligned with the kinds of structure graphs often contain.
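As a rough numerical illustration of the hyperbolic objective, the sketch below maps an embedding to an angle and then to a point on the unit hyperbola x² − y² = 1. Two details are assumptions on my part and are not stated above: the angle is taken as the mean of the embedding dimensions, and the distance between predicted and target points is measured with a smooth L1 loss. The function names are hypothetical.

```python
import math

def angle_from_embedding(embedding):
    """Assumption: collapse the embedding to one angle by averaging its dimensions."""
    return sum(embedding) / len(embedding)

def point_on_unit_hyperbola(angle):
    """Map a hyperbolic angle to (cosh, sinh), which satisfies x^2 - y^2 = 1."""
    return (math.cosh(angle), math.sinh(angle))

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)

# A target embedding becomes a 2D point the predictor has to hit.
x, y = point_on_unit_hyperbola(angle_from_embedding([0.2, -0.1, 0.5]))

# The same angular error (0.5) costs far more away from the origin,
# because coordinates on the hyperbola grow exponentially with the angle.
near = smooth_l1(point_on_unit_hyperbola(0.5), point_on_unit_hyperbola(0.0))
far = smooth_l1(point_on_unit_hyperbola(3.5), point_on_unit_hyperbola(3.0))
print(near < far)
```

The growing cost of errors at large angles is one way to read the regularization effect described above: mistakes near the asymptotes are expensive, so the model is pushed to keep representations closer to the origin.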
How well does it work?
In Graph-level Representation Learning with Joint-Embedding Predictive Architectures, we evaluate the Graph-JEPA approach on graph classification and regression benchmarks, and demonstrate that Graph-JEPA can learn competitive graph-level representations in a self-supervised way.
On standard graph classification datasets, Graph-JEPA achieves state-of-the-art results, performing strongly as a pretrained backbone. It is also evaluated on ZINC, a molecular regression benchmark, showing that the approach is not limited to classification tasks.
In the paper we also test Graph-JEPA on a synthetic graph isomorphism benchmark containing graph pairs that are hard for the 1-WL test to distinguish. This is a useful probe because it asks whether the learned graph representations capture structural differences that common message-passing GNNs may miss. Graph-JEPA achieves near-perfect performance on this benchmark, closely approaching the fully supervised Graph-MLP-Mixer baseline.
Graph-JEPA is also attractive from an efficiency perspective: because it does not rely on negative samples or complex graph augmentations, it avoids some of the overhead associated with contrastive graph learning.
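For readers unfamiliar with the 1-WL test mentioned above, here is a minimal sketch of its color refinement procedure in plain Python. The example graphs (two disjoint triangles versus a single hexagon) are my own choice of a classic hard pair and are not taken from the paper's benchmark.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """1-WL color refinement: each round, a node's new color combines its own
    color with the sorted multiset of its neighbors' colors."""
    colors = {node: () for node in adj}   # every node starts with the same color
    for _ in range(rounds):
        colors = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adj[node])))
            for node in adj
        }
    return Counter(colors.values())

def count_triangles(adj):
    """Count triangles directly, as a structural property 1-WL cannot see here."""
    return sum(
        1
        for a in adj for b in adj[a] for c in adj[b]
        if a < b < c and a in adj[c]
    )

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Every node has degree 2 in both graphs, so refinement never separates them.
indistinguishable = wl_histogram(two_triangles) == wl_histogram(hexagon)
print(indistinguishable, count_triangles(two_triangles), count_triangles(hexagon))
```

Standard message-passing GNNs are bounded by this test, so they produce the same output for both graphs even though one contains triangles and the other does not. A benchmark built from such pairs checks whether a learned representation captures structure beyond that limit.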
Why this matters
Graph-JEPA is useful because it offers a way to learn from graph structure before labels are available.
That has several practical implications.
1. Better pretraining for graph-heavy domains
Many domains have abundant graph data and limited labels. Examples include molecules, biological pathways, transaction networks, infrastructure networks, knowledge graphs, process graphs, and software dependency graphs.
A Graph-JEPA-style approach can pretrain on the structure already present in those graphs. Later, the learned representations can be fine-tuned for a specific task with fewer labels.
This is especially valuable when labels require expert judgment, long observation windows, or real-world outcomes that are expensive to obtain.
2. More semantic graph embeddings
Not all graph embeddings are equally useful.
For many applications, we want embeddings that capture the meaning of a whole graph: its role, function, risk, similarity to other graphs, or likely behavior. Graph-JEPA’s latent prediction objective encourages the model to learn representations that are not just local summaries of edges, but semantic summaries of how parts of a graph relate to one another.
That can help with tasks such as:
→ finding similar entities or subgraphs;
→ clustering related process patterns;
→ detecting anomalous graph structures;
→ ranking candidate matches during entity resolution;
→ predicting missing or unreliable relationships;
→ comparing alternative simulated worlds or scenarios.
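Several of these tasks reduce to comparing graph embeddings. As a minimal sketch, assuming graph-level embeddings have already been produced by a pretrained encoder, similarity search can be done with cosine similarity. The embedding values and graph names below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query, index, k=2):
    """Return the names of the k indexed graphs closest to the query embedding."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical graph-level embeddings for three process graphs.
index = {
    "invoice_flow_a": [0.9, 0.1, 0.0],
    "invoice_flow_b": [0.8, 0.2, 0.1],
    "onboarding_flow": [0.0, 0.2, 0.9],
}
query = [1.0, 0.1, 0.0]

neighbors = most_similar(query, index, k=2)
```

The same distance can feed a clustering algorithm or an anomaly score (for example, flagging graphs whose nearest neighbors are all far away). The quality of the results depends entirely on how well the embeddings capture the structure that matters for the task.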
3. Less dependence on brittle augmentations
Contrastive learning often depends on augmentations: create two altered views of the same object, then train the model to bring them together while pushing apart unrelated examples.
For graphs, choosing good augmentations is hard. Removing an edge might be harmless in one graph and destructive in another. Masking features might preserve meaning in one domain but erase it in another. The augmentation policy can end up encoding domain assumptions that are difficult to validate.
Graph-JEPA avoids the need for negative samples and complex augmentations. It instead learns by predicting target subgraph representations from context. This does not remove all modeling choices, but it does shift the burden away from manually defining which graph transformations should preserve meaning.
4. A bridge between graphs and agentic AI
Graphs are not just another data structure in HASH, but a backbone for AI consisting of typed, temporal, and provenance-aware entities and relationships. Agents can use these graphs to retrieve information, update knowledge, check claims, and reason about the state of the world. Organizations, plans, workflows, and (most interestingly) simulations can also be represented as HASH graphs.
This opens up several possible uses for graph representation learning:
→ Entity resolution: learning when two parts of a graph refer to the same real-world thing.
→ Schema and ontology assistance: suggesting types, links, or constraints from observed graph structure.
→ Confidence and anomaly signals: identifying relationships or subgraphs that do not fit expected patterns.
→ Graph-based retrieval: improving RAG by retrieving not just text chunks, but relevant entities, neighborhoods, and paths. While HASH enables direct corpus interaction, providing secure, permissioned access to the raw information underlying derived graphs, graph RAG offers benefits in a range of scenarios, minimizing the need for multi-step querying and its associated latency.
→ Agent memory: helping agents summarize and compare graph-structured states over time.
→ Simulation analysis: embedding simulated worlds so that outcomes, interventions, and scenarios can be compared.
By itself Graph-JEPA is not a complete solution to all of these problems, but it points toward a useful direction: models that learn from the internal structure of graphs, without needing every downstream task to be labeled in advance.
Why we’re excited about this direction at HASH
The promise of JEPA is that intelligence might be learned by predicting missing abstractions rather than by reconstructing every detail of the world.
For graphs, this is especially compelling, as they are already abstractions that turn messy reality into entities, relationships, constraints, and events.
A model that learns to predict graph structure in latent space is learning over abstractions of abstractions, which is exactly the level at which real-world AI systems need to operate.
HASH is building infrastructure for trustworthy, structured, evolving graphs. That requires models that can work with graphs as first-class objects: not just as storage formats, but as learnable, queryable, analyzable representations of the world.
Graph-JEPA shows that a model can learn rich graph-level representations by predicting the latent structure of subgraphs. It shows that hierarchy-aware objectives can be useful when learning from graph data. And it suggests a path toward graph learning systems that are less dependent on labels, less dependent on brittle augmentations, and more aligned with the structured knowledge that real AI applications require.
Alongside HASH, I’m excited to continue exploring these ideas: how graph machine learning, self-supervised objectives, and typed knowledge systems can come together to make AI more reliable, inspectable, and useful.