Top 10 Datasets: June 2021

HASH’s Index includes a large collection of user-uploaded datasets you can use to generate and power a simulation. Below are ten of our favorites:

  • Master List of LEGO Part Numbers: A master list of official LEGO™ parts and part numbers, covering thousands of parts with descriptions. Provider: Peeron.
  • SARS-CoV-2 Superspreading Events: A dataset with 1,100 SSEs – cases where at least five people were infected with SARS-CoV-2 – from around the world. Provider: Koen Swinkels.
  • Cambridge Bitcoin Electricity Consumption Index (CBECI): Metrics on the Bitcoin hashrate and the electricity consumed by Bitcoin mining, broken down by country. Provider: Cambridge University.
  • Alcohol Consumption: Country-level information on the number of servings of beer, spirits, and wine consumed. Provider: FiveThirtyEight.
  • AWS Instances: A dataset of EC2 instances offered through AWS. Includes the instance name, hourly rate, number of virtual CPUs, memory, storage, and network performance. Provider: Amazon Web Services.
  • Earth Surface Temperature Data: The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. The data is nicely packaged and can be sliced into interesting subsets. Provider: Berkeley Earth.
  • California Fires: A dataset of fire incidents in the state of California. Provider: California Department of Forestry and Fire Prevention.
  • FDA Product Recalls: A compilation of FDA press releases regarding product recalls issued by companies from 2009 onwards. Provider: Food and Drug Administration.
  • Anonymous Bank Call-Center Data: Call center data from a bank over a one-year period. Provider: Technion University.
  • Northern Gannet Tracking Data – Bempton Breeding Birds: Migratory patterns of northern gannets breeding at Bempton. GIS data that tracks the birds over a three-year period. Provider: Royal Society for the Protection of Birds.

To use these and other datasets in HASH, read about using datasets in simulations in the HASH docs, then explore the HASH city infection model for an example of data used in practice: real-world housing data determines the location of houses and offices in an SEIR model.
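
As a rough illustration of how location data can seed an SEIR-style simulation, here is a minimal sketch in plain Python. It does not use HASH's API or the actual city infection model: the sample CSV, its column names (`id`, `kind`, `lat`, `lng`), and the agent fields are all hypothetical stand-ins chosen for the example.

```python
import csv
import io

# Hypothetical sample standing in for a real-world housing dataset.
SAMPLE_CSV = """id,kind,lat,lng
1,house,52.20,0.12
2,office,52.21,0.13
3,house,52.19,0.11
"""


def load_buildings(csv_text):
    """Parse CSV rows into building records with numeric [lng, lat] positions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "id": row["id"],
            "kind": row["kind"],
            "position": [float(row["lng"]), float(row["lat"])],
        }
        for row in reader
    ]


def make_agents(buildings, initially_exposed=1):
    """Create one person agent per house, placed at that house's location."""
    houses = [b for b in buildings if b["kind"] == "house"]
    agents = []
    for i, house in enumerate(houses):
        agents.append({
            "agent_name": f"person_{house['id']}",
            "position": house["position"],
            "home": house["id"],
            # SEIR compartments: susceptible, exposed, infectious, recovered.
            # The first `initially_exposed` agents start out exposed.
            "health": "exposed" if i < initially_exposed else "susceptible",
        })
    return agents


buildings = load_buildings(SAMPLE_CSV)
agents = make_agents(buildings)
```

The point of the sketch is only that the dataset, rather than random placement, decides where agents start; a real model would also assign offices and step agents through the SEIR states over time.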