Metadata is data about data. Any piece of data typically has a good deal of metadata associated with it. Some common types of metadata found on HASH include:

  • Provenance metadata – information such as the timestamp of creation, the date and time a file was last modified, the identity of a file’s creator, and other information about the data’s origin
  • Quality metadata – a margin of error, confidence interval, or probability score that indicates the likely accuracy of the data
  • Statistical metadata – information that describes the process (e.g. a data pipeline) that produced the data
  • Legal metadata – the copyright holder and any licensing terms under which the data is available
  • Security metadata – logs recording attempts to access data, and information regarding authorized users
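
To make the categories above concrete, here is a minimal Python sketch of a single data point carrying one field for each kind of metadata. The class names, field names, and values (`Metadata`, `DataPoint`, `produced_by`, and so on) are hypothetical and chosen only for illustration; they do not reflect how HASH stores metadata internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Metadata:
    # Provenance: where the data came from and when
    created_at: datetime
    created_by: str
    # Quality: how far the value can be trusted
    margin_of_error: float
    # Statistical: the process that produced the data
    produced_by: str
    # Legal: the licensing terms the data is available under
    license: str
    # Security: a log of attempts to access the data
    access_log: list[str] = field(default_factory=list)


@dataclass
class DataPoint:
    value: float
    metadata: Metadata


# Hypothetical example values
reading = DataPoint(
    value=21.4,
    metadata=Metadata(
        created_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
        created_by="alice",
        margin_of_error=0.5,
        produced_by="sensor-ingest-pipeline",
        license="CC-BY-4.0",
    ),
)

# Record an access attempt without touching the underlying value
reading.metadata.access_log.append("bob: read")
```

Note that the value itself (`21.4`) never changes when the metadata does: the metadata sits alongside the data and describes it.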

Metadata is useful because it provides context for the data we use.

Some metadata is attached automatically by systems such as HASH when users perform certain actions (e.g. creating a file, connecting a datasource, constructing a flow, or editing a row in a dataset).
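
As a rough illustration of automatic attachment, the sketch below shows a function that edits a row and stamps provenance metadata as a side effect, so the user never has to enter it. The `edit_row` function and the dictionary layout are assumptions made for this example, not part of any HASH API.

```python
from datetime import datetime, timezone


def edit_row(dataset, row_index, column, new_value, user):
    """Update one cell and record who changed it and when."""
    row = dataset["rows"][row_index]
    row[column] = new_value
    # Provenance metadata is stamped by the system, not typed in by the user
    dataset["metadata"]["last_modified_at"] = datetime.now(timezone.utc)
    dataset["metadata"]["last_modified_by"] = user
    return dataset


# Hypothetical dataset with one row and no metadata yet
dataset = {
    "rows": [{"city": "Paris", "population": 2_100_000}],
    "metadata": {},
}
edit_row(dataset, 0, "population", 2_102_650, user="alice")
```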

Other metadata can be added manually. For example, mapping data to schemas within HASH purposefully attaches metadata that describes the types of the columns in a dataset, and the properties of the agents or events those columns represent.
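
The idea of mapping columns to a schema can be sketched in a few lines of Python. Everything here is hypothetical: `person_schema`, `column_mapping`, and `apply_schema` are illustrative names, and the dictionary format is an assumption rather than the schema format HASH actually uses.

```python
# Hypothetical schema: describes what each property means and its type
person_schema = {
    "entity": "Person",
    "properties": {
        "full_name": {"type": str, "description": "The person's name"},
        "age": {"type": int, "description": "Age in years"},
    },
}

# Hypothetical mapping from raw column headers to schema properties
column_mapping = {"name": "full_name", "yrs": "age"}


def apply_schema(row, mapping, schema):
    """Rename columns per the mapping and check each value's type."""
    typed = {}
    for column, value in row.items():
        prop = mapping[column]
        expected = schema["properties"][prop]["type"]
        if not isinstance(value, expected):
            raise TypeError(f"{column!r} should be {expected.__name__}")
        typed[prop] = value
    return typed


person = apply_schema({"name": "Ada", "yrs": 36}, column_mapping, person_schema)
```

Once the mapping is in place, an otherwise opaque column such as `yrs` is known to hold the age, in years, of a `Person`, which is exactly the kind of context that manually attached metadata provides.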