Directed Acyclic Graphs

If you don’t know your DAGs from your dogs, you can finally get some clarity and sleep easily tonight. Learn what makes a Directed Acyclic Graph a DAG.

What is a DAG?

A Directed Acyclic Graph (or DAG) is a special type of graph made up of nodes (also known as vertices) and edges, in which:

  1. all edges have a direction associated with them, and
  2. the graph as a whole contains no cycles (a.k.a. loops): following the edges from any node never leads back to that same node.
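
The two conditions above can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a plain Python dictionary that maps each node to the list of nodes its outgoing edges point to. The names `has_cycle`, `is_dag`, `diamond` and `loop` are made up for this illustration and are not part of HASH.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]

def has_cycle(graph: Graph) -> bool:
    """Return True if following directed edges can lead back to a visited node."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored

    # Collect every node, including ones that only appear as edge targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    color = {node: WHITE for node in nodes}

    def visit(node: Hashable) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            if color[nxt] == GRAY:  # edge back into the current path: a cycle
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in nodes)

def is_dag(graph: Graph) -> bool:
    """A directed graph is a DAG exactly when it has no cycles."""
    return not has_cycle(graph)

# Hypothetical example graphs.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
loop = {"a": ["b"], "b": ["c"], "c": ["a"]}

print(is_dag(diamond))  # True
print(is_dag(loop))     # False
```

The sketch uses a recursive depth-first search, which is fine for small graphs; very deep graphs would need an iterative version to stay within Python's recursion limit.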

The figure below illustrates a classic DAG, in which every node is connected by at least one directed edge, and all paths lead to a single end state.

A Directed Acyclic Graph in HASH

Uses of DAGs in Data Science

In data applications like HASH, DAGs are commonly used to illustrate:

  • data pipelines: the decision and processing steps taken as data flows through a pipeline
  • schedules: the ordering constraints in any system of tasks (not just a data pipeline)
  • dependency/citation graphs: a list of dependencies or citations that allows the provenance of work to be tracked
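
To make the scheduling use case concrete, here is a small sketch that turns ordering constraints into a valid run order using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The task names are hypothetical, and the snippet says nothing about how HASH itself schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "clean": {"extract"},
    "aggregate": {"clean"},
    "validate": {"clean"},
    "report": {"aggregate", "validate"},
}

# static_order() yields every task after all of its dependencies.
# It raises graphlib.CycleError if the constraints contain a cycle.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Tasks with no constraint between them (here `aggregate` and `validate`) may appear in either order, which is why a DAG generally admits more than one valid schedule.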

More information

In mathematical terms, DAGs are a specific subclass of oriented graphs (directed graphs in which no pair of nodes is joined by edges in both directions). Ultimately though, you don’t have to understand the technical ins and outs of DAGs in order to use them as part of a data pipeline.

Modern data engineering tools such as hCore abstract away this complexity through simple, easy-to-use interfaces that provide prompts and feedback, preventing the creation of malformed DAGs (for example, those which inadvertently contain cycles).
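
One way a tool could prevent malformed DAGs is to validate each edge before accepting it. The sketch below is an illustration of that idea only, not hCore's actual implementation: `GuardedDag` and `CycleRejected` are invented names, and the approach shown (reject an edge if its target can already reach its source) is simply one common technique.

```python
class CycleRejected(ValueError):
    """Raised when a new edge would introduce a cycle (hypothetical name)."""

class GuardedDag:
    """A directed graph that refuses any edge that would break acyclicity."""

    def __init__(self):
        self._successors = {}  # node -> set of nodes it points to

    def _reachable(self, start, goal):
        """Return True if goal can be reached from start along existing edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._successors.get(node, ()))
        return False

    def add_edge(self, source, target):
        # If target already leads back to source, source -> target closes a loop.
        # This also rejects self-loops, because a node trivially reaches itself.
        if self._reachable(target, source):
            raise CycleRejected(f"{source!r} -> {target!r} would create a cycle")
        self._successors.setdefault(source, set()).add(target)
        self._successors.setdefault(target, set())

    def edges(self):
        return {(s, t) for s, targets in self._successors.items() for t in targets}

dag = GuardedDag()
dag.add_edge("load", "transform")
dag.add_edge("transform", "publish")
try:
    dag.add_edge("publish", "load")  # would close the loop, so it is refused
except CycleRejected as err:
    print(f"Rejected: {err}")
```

Checking at insertion time means the graph is never in an invalid state, at the cost of one reachability search per added edge.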
