Mission

Announcing HASH

Today, along with Joel Spolsky and Jude Allred, I’m excited to introduce HASH, the company we founded a little over a year ago. We believe that most bad things in the world are the product of some form of information failure. From economic collapses and outbreaks of war and disease, to choosing the right life partner or university degree, information failure shapes outcomes at every scale. We’re on a mission to help everybody overcome it and make the right decisions.

Brilliant innovators have sought to organize the world’s information and make it accessible to all, and the next step on this journey is to make that information understandable and usable by everybody.

While high-tech, highly-funded organizations like hedge funds are able to process vast swathes of the world’s information efficiently for minute gains and millisecond edges in economic trades, the vast majority of businesses and individuals have no systematic way of parsing the wealth of signals contained in the world around them.

Simulation has the power to unlock a better world: advancing our understanding and appreciation of the world around us. Not only are simulations useful cognitive tools for humans, but they have the potential to be rich, machine-readable representations of real world problems as well. As such, they are universal interfaces for both humans and AI — and in our view the best bet we have for growing connective tissue that bridges both human and machine learning.

We hope to enable better human and automated decision-making: bringing about the rational resolution of conflict, reducing and eliminating market failures, and helping people achieve happier, healthier lives. We don’t want to wait for this.

If you can’t wait to get started either, sign up now – or read on to find out more.

Origins

I used to run a digital consultancy in London that developed sites and software, and ran data-driven campaigns, for a wide range of clients: from private equity firms and startups through to the very largest government clients.

From time to time we’d encounter really interesting problems, such as how to track the spread of a behaviorally-driven disease (such as a sexually transmitted infection), assess the effectiveness of interventions against it (e.g. informational advertising campaigns), and optimize ad-spend (i.e. target the nodes in a network whose influence is most likely to stymie the disease’s spread).

It turns out that a single gold standard exists in both epidemiology and behavioral advertising for answering these types of questions: ‘agent-based modeling’ (ABM). ABMs work as follows (with a minimal code sketch after the list)…

  • Agents represent actors: be they individuals, companies, households, machinery in a factory, or anything else. Different models look at systems at differing levels of granularity. In theory an ‘agent’ could be a molecule.
  • Agents have properties: these are values attached to agents which vary between them. In the case of a person, a property might be a boolean like ‘is registered voter’ (Y/N), a categorical value like ‘party affiliation’ (multi-choice), or a numeric value like ‘annual income’.
  • Agents exist in environments — and often multiple at once — e.g. geospatially and on a network graph.
  • Agents are driven by behaviors: behaviors are essentially code that explains how agents should interact with and react to the world around them.
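
To make those building blocks concrete, here is a deliberately minimal sketch in plain Python (illustrative only, not HASH code, and every name in it is made up for the example): agents carry properties, sit on a network environment, and are driven by a single contagion behavior of the kind used to model disease spread through a population.

    import random

    # A deliberately minimal agent-based model (illustrative only, not HASH code).
    # Agents carry properties, live on a network environment, and are driven by
    # a single "contagion" behavior of the sort used to model disease spread.

    class Agent:
        def __init__(self, agent_id, infected=False):
            self.agent_id = agent_id      # identity
            self.infected = infected      # a boolean property
            self.neighbors = []           # network environment: connected agents

    def contagion_behavior(agent, transmission_rate=0.05):
        """Behavior: an infected agent may infect each of its neighbors."""
        if not agent.infected:
            return
        for neighbor in agent.neighbors:
            if not neighbor.infected and random.random() < transmission_rate:
                neighbor.infected = True

    # Build a small random contact network and seed a single infection.
    agents = [Agent(i) for i in range(100)]
    for agent in agents:
        agent.neighbors = random.sample([a for a in agents if a is not agent], k=4)
    agents[0].infected = True

    # Step the simulation: at every time step, each agent runs its behaviors.
    for step in range(30):
        for agent in agents:
            contagion_behavior(agent)

    print(sum(a.infected for a in agents), "of", len(agents), "agents infected")

Swap the contagion behavior for a purchasing or voting behavior and the same skeleton serves an entirely different domain.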

ABMs can be constructed from the ground up, on first principles, and are useful for counterfactual “what if” hypothesis testing, enabling safe exploration of digital twins of real-world systems. That makes multi-agent simulations useful for a whole lot more than predicting the spread of disease and information through networks.

Solving Problems Traditional Data Science Can’t

A whole range of complex-systems problems defy attempts at predictive modeling. These are typically problems characterized by nonlinearity, emergence, adaptation, interdependence and feedback loops between agents. The resulting “black swan” events are by definition not reflected in existing patterns and historical data, and are therefore missed entirely by traditional data science.

No systems exist in true isolation — all are part of our complex real world — and as such all business, policy, and human problems are ultimately problems of understanding complex systems. Smart abstraction enables us to discount most of the extraneous world, most of the time, but it’s hard to know what might be interesting when, and under which circumstances.

In some systems this doesn’t matter, but for other questions, like how we can contribute to a more stable economy or good foreign relations, the answers can be matters of life and death. In order to fully understand these high-impact, critical-risk problems, we need to generatively search the space around them based on the observable dynamics of those systems. Pattern recognition and analysis of historical outputs alone are good for cheap base-casing, but provide little understanding of problems’ tails.

Because the space of all possible configurations of the world is so much greater than the historical space in which problems have actually been observed, there is a temptation to write off proper scientific simulation as infeasible. But simulation, properly used, doesn’t seek to simulate every possible version of the world that might ever occur (an infinite task, of course). Rather, it helps its users understand which versions are likely to occur, and brings attention to novel scenarios which, due to their emergent nature, might not previously have occurred to human analysts alone.

Crises like the 2007-08 financial crash became disasters precisely because decision-makers didn’t understand or account for the underlying dynamics of complex systems — in this case the economy. Well-intentioned pieces of regulation such as Basel II put in place capital reserve requirements which, when combined with mark-to-market accounting practices, led to asset fire sales: market participants were forced to sell into declining markets, deepening the trough.

While historical and present-day data can be used to pre-populate and backtest agent-based models, such data is not required to construct ABMs, which opens the door to explicit formal modeling in a wide variety of domains where machine learning cannot readily be applied today.

Moreover, simulations combine the benefits of formal modeling with the richness of qualitative description, making them highly explainable and easy for humans to understand. In contrast to frequently black-box alternatives, agent-based simulations are inherently inspectable: users can step through time to see exactly how outcomes are arrived at, and what factors contribute.

So why, then, do they remain so rarely talked about, unappreciated and underutilized?

Problems with Agent-Based Simulations Today

Simulations are time-consuming and costly to build, as well as expensive to maintain, run and support. They require knowledge of specialist tools, frameworks, and even weird proprietary programming languages. The resulting simulations are often not particularly portable or repurposable, and where simulation logic is the product of conjecture or lacks calibration, this can lead to a false sense of confidence or security which may compound existing poor decision-making.

Although simulations claim prominent users across the worlds of supply chain, manufacturing, finance, defense, and more, market-leading agent-based modeling software packages today run north of $10k/user/year, and are based on dated technologies and paradigms which don’t lend themselves well to distributed computation at real scale. Their user interfaces haven’t been touched since the 1990s, the developer experience they offer is equally dated, they don’t run in the browser at all, they can’t be used on mobile devices, and users often need to deploy special software just to access them.

For the most part these simulations are toy models, built to showcase specific dynamics, and lack interoperability. Once built, models are siloed, and there’s little sharing or building on the work of others. Most models are so scoped down, to ensure they run in a timely fashion, that they capture only a fraction of the dynamics within the systems they represent. Rather than build rich virtual worlds and selectively include relevant parts on a per-experiment basis, modelers create cheap toy abstractions which fail to inspire confidence amongst users and are much less easily explored. There’s deep, justifiable skepticism as to whether toy models are truly ‘scientific’, and, on the flip side, whether more complex models can be appropriately calibrated and parameterized.

Throw into the mix problems finding appropriately granular agent-level data, difficulties translating domain expertise into code, and a wide range of structural barriers to creating ABMs, and it’s not hard to see why general-purpose simulation remains out of favor and rarely used in business today.

Simulation For Everybody

Faced with so many systemic problems, we want to build system-level solutions. HASH aims to ‘solve simulation’ by vertically integrating the entire stack, providing a unified platform for building, running, and learning from simulations.

Today we’re publicly launching two parts of HASH:

  1. HASH Core: a web-based developer environment and viewer for simulations.
  2. HASH Index: a collection of simulations and modular component parts.

All HASH simulations consist of agents (represented by descriptive schemas) and behaviors (which are generally pure functions). Behaviors drive agents; datasets can be used to instantiate or update agents within simulations based on real-world observations, or to backtest and calibrate models. Behaviors and datasets are mapped to appropriate subjects and schemas, making them easily discoverable by model-builders using H-Index, and in future cross-linkable within H-Core.
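
As a rough illustration of that structure (a sketch only: the exact H-Core authoring API is described in our docs and may differ from this), an agent can be thought of as a bundle of named properties conforming to a schema, and a behavior as a pure function from the agent’s current state to its next state.

    # Illustrative sketch only; HASH's real authoring API may differ from this.
    # An agent is a bundle of named properties (conforming to a schema);
    # a behavior is a pure function from current state to next state.

    def grow_revenue(state: dict, context: dict) -> dict:
        """Hypothetical behavior: compound an agent's 'revenue' property each step."""
        growth_rate = context.get("globals", {}).get("growth_rate", 0.02)
        return {**state, "revenue": state["revenue"] * (1 + growth_rate)}

    # An agent instance described by plain properties:
    company = {"agent_name": "acme", "revenue": 1_000_000, "behaviors": ["grow_revenue"]}

    # One simulated time step, applied as a pure function:
    company = grow_revenue(company, {"globals": {"growth_rate": 0.03}})
    print(company["revenue"])  # 1030000.0

Because a behavior like this touches nothing but its inputs, the same function can be reused across simulations and swapped in or out per experiment.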

Simulations, datasets and behaviors are all accessible on the H-Index. Today, everything within the H-Index is available free of charge. Envisaged as something of a cross between GitHub and a package manager, H-Index will in the future be extended into a marketplace as well, facilitating the purchase and sale of paid behaviors, datasets and simulations. We imagine consultancies publishing components for free to establish credibility and expertise, then selling more complete simulations and consultancy services atop.

Our future plans for H-Index involve explicit Git-like support for forking, branching, reporting issues and making pull requests — functionality that, like the use of package managers, is now second nature to most software developers.

The impact of these changes to developer workflow is significant: as H-Index grows, domain experts with limited programming knowledge will be able to fork and adapt existing behaviors, or incorporate them wholesale into their simulations, enabling them to model complex dynamics without needing to write vast swathes of custom code from scratch.

Our current lineup is not, however, complete. Although our blazing-fast HASH Engine enables simulation at unparalleled speed, it is currently only available through our H-Core web-based IDE, which necessarily constrains it to the memory and CPU available to the browser tab, which in many cases is severely limited. This has meant that while H-Engine is designed to handle truly world-sized simulations, our early beta users have been limited to building relatively small-scale models on our platform. This makes H-Core in its current iteration comparable to something like NetLogo, the academic agent-based modeling tool: useful for illustrating the impact of heterogeneous agents within complex systems, and helpful in explaining to users the dynamics of these systems, but limited in its capacity to model real-world environments with a high degree of fidelity or at scale. Because of these current constraints, tools for running optimization experiments (parameter sweeps, Monte Carlo simulations, and more exotic reinforcement learning) have been hidden away for now — but they are very much priorities for us.
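
For a sense of what those experiment types involve, here is a generic sketch (plain Python, not HASH’s experiments interface): a toy simulation is run many times across a grid of parameter values, with Monte Carlo replication at each value, and the distribution of outcomes summarized per setting.

    import random
    import statistics

    def run_simulation(transmission_rate, steps=30, population=100.0):
        """Stand-in for a full simulation run; returns one outcome metric."""
        infected = 1.0
        for _ in range(steps):
            infected += infected * transmission_rate * random.uniform(0.5, 1.5)
        return min(infected, population)

    # Parameter sweep with Monte Carlo replication: many stochastic runs per value.
    for rate in [0.01, 0.05, 0.10]:
        outcomes = sorted(run_simulation(rate) for _ in range(200))
        print(f"rate={rate:.2f}  mean={statistics.mean(outcomes):6.1f}  "
              f"p95={outcomes[int(0.95 * len(outcomes))]:6.1f}")

The experiments interface planned for H-Core and H-Cloud is aimed at roughly this kind of workload, run at far larger scale in the cloud rather than looped over locally.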

To this end, today we’re releasing our roadmap for unlocking these features and the use of simulation for everyday ‘real world’ decision-making:

  • HASH Core and HASH Index are both now officially live in beta.
    • We’ll be iterating on both platforms heavily in the coming weeks and welcome your input.
  • We’re proud to announce that we’ll be open-sourcing HASH Engine, the simulation engine at the heart of HASH, later this year.
    • Written in Rust, with bindings already available for JavaScript and Python, H-Engine is the ultra-fast actor system that underpins all computation in HASH.
    • Our goal is to make the platform accessible to everybody, and enabling folks to run H-Engine locally and within closed systems is a significant part of this.
    • We’re currently aiming to release a public version of H-Engine under an open-source license by the end of 2020. 
  • We’ll also be beginning the rollout of HASH Cloud to select beta users this year.
    • H-Cloud is the part of our platform that enables users to run simulations in the cloud with just one click, from within HASH’s existing H-Core authoring and viewing interface (and, upon H-Engine’s open-source launch, via the command line as well).
    • Alongside this we’ll be exposing an ‘experiments’ interface in H-Core that opens the door to deriving commercial insight from simulations at scale.
    • Through H-Cloud, users will be able to access simulation and experiment results programmatically, to drive algorithms and applications outside of HASH.

You can find out more about upcoming features on our public roadmap at hash.ai/roadmap

We started as just two people a little over a year ago, and are now a team of ~10. I’m incredibly proud of the team we’ve built, and what we’ve achieved in this time.

We’re excited to meet users of HASH and have launched a Slack community which can be accessed via the icon in the bottom-right of any page at hash.ai — we’ll be around to help you build your models, answer your questions, and take your feature suggestions and bug reports.

We’re working to make HASH more accessible and to extend support to as wide an audience of developers as possible. The Rust engine has bindings for Python as well as JavaScript, but until now authoring behaviors in H-Core was limited to the latter. As of today, we’re proud to announce that Python behavior authoring and simulation running are supported locally, in-browser, via H-Core. Thanks to Mozilla’s amazing Pyodide project we’ve been able to bring experimental support for Python to our browser-based H-Core IDE, and although it currently comes with a large performance hit, we’re hopeful that we’ll be able to improve this in a number of ways before the full rollout of H-Cloud and H-Engine (both of which will allow users to avoid any performance penalty). Developers can now build models in HASH using Python, and import any number of popular scientific Python packages (more in our docs).
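
As an example of what this unlocks (the behavior below is a hypothetical sketch, and the exact signature H-Core expects is described in our docs rather than guaranteed here), a Python behavior can now lean on a scientific package such as NumPy instead of hand-rolled math.

    # Hypothetical sketch: the exact behavior signature H-Core expects may differ.
    # The point is that scientific Python packages (here, NumPy) can be imported
    # and used from behaviors running in the browser via Pyodide.
    import numpy as np

    def behavior(state, context):
        """Move an agent one step in a random direction (a simple random walk)."""
        position = np.array(state["position"], dtype=float)
        step = np.random.normal(loc=0.0, scale=0.1, size=position.shape)
        state["position"] = (position + step).tolist()
        return state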

To eliminate information failure, we need to build tools that have never been created before to solve problems that can’t be solved today. We need to give people superpowers, and that’s what we’re on a mission to do.

If you’d like to build a model with HASH, you can sign up at hash.ai/signup

If you want to join us on our mission of helping everybody make the right decisions, you can help publish simulations, behaviors and data to H-Index, or apply for any one of our open roles at hash.ai/careers

And finally, if you’re a business decision-maker interested in learning how HASH can be applied to help you, get in touch at hash.ai/contact

We’re grateful to HASH’s early investors for their support: amazing community creators such as Stack Overflow founder Joel Spolsky and Kaggle founder Anthony Goldbloom, as well as Ash Fontana and Lee Edwards from Zetta Venture Partners and Root Ventures. We’re excited to kick off our public mission.


David Wilkinson
Founder and CEO of HASH