What is a DAG?
A Directed Acyclic Graph (DAG) is a type of graph made up of nodes (also known as vertices) and edges, in which:
- all edges have a direction associated with them, and
- the graph as a whole contains no cycles (also known as loops): following the edges from any node never leads back to that node.
The figure below illustrates a classic DAG, in which every node is connected by at least one directed edge and all paths lead to a single end state.
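The two defining properties can also be checked in code. The following is a minimal sketch, assuming the graph is stored as a plain Python dictionary that maps each node to the nodes its edges point to. The node names and the `has_cycle` helper are illustrative and do not come from any particular tool.

```python
def has_cycle(graph):
    """Return True if the directed graph contains at least one cycle.

    `graph` maps each node to an iterable of the nodes its edges point to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished

    # Collect every node, including ones that only appear as edge targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    colour = {node: WHITE for node in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            if colour[nxt] == GREY:
                # We walked back onto the path we are still exploring.
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in nodes)


# A small DAG: every path ends at the single end state "D".
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Not a DAG: A -> B -> C -> A forms a cycle.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}

print(has_cycle(dag))     # False
print(has_cycle(cyclic))  # True
```

The check is a depth-first search that flags any edge pointing back to a node on the path currently being explored. The recursive form is fine for small graphs; very deep graphs would need an iterative version.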
Uses of DAGs in Data Science
In data applications like HASH, DAGs are commonly used to illustrate:
- data pipelines: the decision and processing steps taken as data flows through a pipeline
- schedules: any system of tasks with ordering constraints (not just a data pipeline) can be illustrated with a DAG
- dependency/citation graphs: a list of dependencies or citations that allows the provenance of work to be tracked
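As a rough illustration of the scheduling use case, the sketch below orders a set of tasks so that each one runs only after the tasks it depends on. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The task names are made up for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# These task names are hypothetical.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# static_order() yields the tasks in an order that respects every constraint.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# A circular dependency cannot be scheduled, so the sorter raises CycleError.
circular = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(circular).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)
```

Several valid orderings can exist for the same DAG (here `transform` and `validate` may run in either order), which is why a DAG describes ordering constraints rather than one fixed sequence.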
In mathematical terms, DAGs are a subclass of oriented graphs (directed graphs in which no pair of nodes is connected in both directions). That said, you don’t need to understand the technical ins and outs of DAGs to use them as part of a data pipeline.
Modern data engineering tools such as hCore abstract away this complexity through easy-to-use interfaces that provide prompts and feedback, preventing the creation of malformed DAGs (for example, those that inadvertently contain cycles).
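To give a sense of how a tool might guard against malformed DAGs, here is a minimal sketch that refuses any new edge that would close a cycle. The `DagBuilder` class and its methods are hypothetical names invented for this example; they are not hCore's API, and a real tool may well work differently.

```python
class DagBuilder:
    """Hypothetical helper that only accepts edges which keep the graph acyclic."""

    def __init__(self):
        self.edges = {}  # node -> set of nodes its edges point to

    def _reachable(self, start, goal):
        """Return True if `goal` can be reached from `start` along existing edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def add_edge(self, src, dst):
        # If src is already reachable from dst, adding src -> dst closes a cycle.
        # This also rejects self-loops, because every node reaches itself.
        if self._reachable(dst, src):
            raise ValueError(f"edge {src!r} -> {dst!r} would create a cycle")
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())


builder = DagBuilder()
builder.add_edge("clean", "aggregate")
builder.add_edge("aggregate", "report")

try:
    builder.add_edge("report", "clean")  # would loop back to the start
except ValueError as error:
    print(error)
```

Checking reachability before each insertion means a malformed graph is never stored, so feedback can be given at the moment the offending edge is drawn.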