COVID-19 Canada Open Data Working Group

Archive of Canadian COVID-19 Data

Sourced from https://github.com/ccodwg/Covid19CanadaArchive

This repository provides automated, daily backups of COVID-19 data from Canadian governmental and non-governmental sources.

THE DATA FOR THIS ARCHIVE ARE NO LONGER HOSTED ON GOOGLE DRIVE. For information on how to access the data in the archive, please see Accessing the data below.

File name timestamps are given in ET (America/Toronto) in the following format: %Y-%m-%d_%H-%M. Files are archived nightly around 23:00 ET.
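
As a minimal sketch of working with this timestamp format, the Python snippet below extracts and parses a timestamp from a file name. The example file name and the helper names (parse_archive_timestamp, to_eastern) are hypothetical and used only for illustration; the only details taken from this README are the format string and the America/Toronto time zone.

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Format string documented in this README
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M"
# Regular expression matching text shaped like that format
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}")


def parse_archive_timestamp(file_name):
    """Return the naive (wall-clock ET) datetime embedded in a file name."""
    match = TIMESTAMP_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no timestamp found in {file_name!r}")
    return datetime.strptime(match.group(0), TIMESTAMP_FORMAT)


def to_eastern(naive_timestamp):
    """Attach the America/Toronto time zone to a naive archive timestamp."""
    return naive_timestamp.replace(tzinfo=ZoneInfo("America/Toronto"))


# Hypothetical file name, for illustration only
stamp = parse_archive_timestamp("example-dataset_2020-11-04_23-05.csv")
print(stamp)
```

Calling to_eastern(stamp) yields a time zone-aware value, which may be useful when comparing archive times against sources that report in UTC.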

All code in this repository is covered by the MIT License. Licenses and terms of use for each archived dataset are given below.

This repository is maintained by Jean-Paul R. Soucy on behalf of the COVID-19 Open Data Working Group.

Accessing the data

The easiest way to explore the data in the archive and download individual files is with the interactive file explorer: http://data.opencovid.ca/archive/index.html#archive/

The files in the archive are hosted at the following URLs:

For example, the PHAC Epidemiology Update from November 4, 2020 may be downloaded at the following URLs:

All files in a particular directory may be listed in Python using the following code (change Prefix as desired):

# load modules
import re
from boto3 import client

# get list of files in directory (the paginator retrieves every key, not just the first 1,000)
cli = client('s3')
pages = cli.get_paginator('list_objects_v2').paginate(Bucket='data.opencovid.ca', Prefix='archive/can/epidemiology-update-2/')
files = [obj['Key'] for page in pages for obj in page.get('Contents', [])]

# (optional) filter out supplementary material from list of files in the directory
pat = re.compile(r'^.*/supplementary/') # match files in supplementary folder
files = [s for s in files if not pat.match(s)]

# print list of files
print(files)

These files can then be downloaded by prepending the base URL to each key in the file list above.
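
A minimal sketch of that step is shown below. The base URL and the key in the example call are placeholders, not the archive's real hosting URL or a real file, and build_download_urls is a hypothetical helper name; substitute one of the hosting URLs for this archive and the keys returned by the listing code.

```python
from urllib.parse import quote


def build_download_urls(base_url, keys):
    """Join a base URL and a list of object keys into download URLs."""
    base = base_url.rstrip("/")
    # quote() percent-encodes characters such as spaces but leaves "/" intact
    return [f"{base}/{quote(key)}" for key in keys]


# Placeholder base URL and key, for illustration only
urls = build_download_urls(
    "https://example.org/",
    ["archive/can/example dataset/file.csv"],
)
print(urls)
```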

In R, the above may be achieved using the following code:

# load packages
library(aws.s3)

# get list of files in directory
files <- aws.s3::get_bucket(bucket = "data.opencovid.ca", prefix = "archive/can/epidemiology-update-2/", region = "us-east-2")
files <- unlist(lapply(files, function(x) x[["Key"]]), use.names = FALSE)

# (optional) filter out supplementary material from list of files in the directory
files <- files[!grepl("^.*/supplementary/", files)]

# print list of files
print(files)

Contribution guide

Community members may contribute to the project in several ways. In the future, more ways of contributing will be added (e.g., adding metadata).

Add a new dataset

New datasets may be added in the following ways:

  • Create a pull request on GitHub adding the dataset to the appropriate location in the "active" section of data/datasets.json. See other entries for examples.
  • Create an issue on GitHub requesting the new dataset be added.
  • Email the maintainer requesting the new dataset be added.

If you have archived versions of the dataset you are adding (e.g., you previously downloaded the dataset daily), see "Contribute historical data" below.

Retire an inactive dataset

Some datasets continue to exist at a URL but are no longer updated. These datasets should be removed from the nightly update. This may be achieved in the following ways:

  • Create a pull request on GitHub moving the dataset's entry from the "active" section of data/datasets.json to the appropriate location in the "inactive" section. Also, change the dataset's "active" flag from "True" to "False". See other entries for examples.
  • Create an issue on GitHub requesting the dataset be retired.
  • Email the maintainer requesting the dataset be retired.
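
The edit described in the pull request option can be sketched in Python as follows. The layout assumed here (top-level "active" and "inactive" sections, each mapping a group name to entries keyed by an identifier) is a guess made for illustration and may not match the real data/datasets.json; the group name, identifier, URL, and the retire_dataset helper are all hypothetical. Check the existing entries in the file before relying on this.

```python
import json


def retire_dataset(catalogue, group, dataset_id):
    """Move one entry from the "active" section to the "inactive" section.

    Assumes catalogue[section][group][dataset_id] holds the entry, which is
    an assumption about the file layout rather than a documented fact.
    """
    entry = catalogue["active"][group].pop(dataset_id)
    # The README describes the flag values as the strings "True" and "False"
    entry["active"] = "False"
    catalogue.setdefault("inactive", {}).setdefault(group, {})[dataset_id] = entry
    return catalogue


# Hypothetical catalogue contents, for illustration only
catalogue = {
    "active": {
        "example-group": {
            "example-id": {"url": "https://example.org/data.csv", "active": "True"}
        }
    },
    "inactive": {},
}
retire_dataset(catalogue, "example-group", "example-id")
print(json.dumps(catalogue, indent=2))
```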

Contribute historical data

Historical data (e.g., archived versions of a dataset newly added to the archival tool) may be contributed in the following ways:

  • Create an issue on GitHub regarding the historical data.
  • Email the maintainer regarding the historical data.

Recommended citation

COVID-19 Canada Open Data Working Group. Archive of Canadian COVID-19 Data. https://github.com/ccodwg/Covid19CanadaArchive. (Access date).

Running archiver.py

archiver.py can run in two modes:

  • python archiver.py prod: Download files and upload them to the server.
  • python archiver.py test: Don't upload files to the server, just test that they can be successfully downloaded.

The script relies on environment variables being set to function properly. See archiver.py for more details.
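
For readers unfamiliar with this pattern, the sketch below shows one way a two-mode command-line script with an environment variable check could be structured. It is not the code in archiver.py: the function names and the variable name EXAMPLE_VAR are hypothetical, and the real script should be consulted for the variables it actually requires.

```python
import argparse


def parse_mode(argv=None):
    """Parse the run mode ("prod" or "test") from command-line arguments."""
    parser = argparse.ArgumentParser(description="Illustrative two-mode script.")
    parser.add_argument(
        "mode",
        choices=["prod", "test"],
        help="prod: download and upload; test: download only",
    )
    return parser.parse_args(argv).mode


def should_upload(mode):
    """Only prod mode uploads files to the server."""
    return mode == "prod"


def missing_env_vars(required_names, environ):
    """Return the required variable names that are absent or empty."""
    return [name for name in required_names if not environ.get(name)]


# EXAMPLE_VAR is a hypothetical name; pass os.environ as the second argument in real use
mode = parse_mode(["test"])
print(mode, should_upload(mode), missing_env_vars(["EXAMPLE_VAR"], {}))
```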

Data sources/terms of use/supplementary material

The sources and terms of use for each included dataset are linked below. Supplementary material, such as data dictionaries and codebooks, is also listed below where available. These files are included with the relevant datasets in a directory named supplementary.

Alberta

Edmonton

British Columbia

Canada

Manitoba

Winnipeg

New Brunswick

Newfoundland and Labrador

Northwest Territories

Nova Scotia

Nunavut

Ontario

Toronto

  • Supplementary material: Technical notes - COVID-19 Active Outbreaks - Community and Workplace Settings
  • Terms of use: Assumed to be Open Government Licence – Toronto

Ottawa

Quebec

When both French and English data files are available, the French files should generally be considered definitive (and in many cases, these files have been captured in the archive for a longer duration). The English versions of files available in both languages always have their directories marked with "-en" at the end.

Montreal

Prince Edward Island

Saskatchewan

Yukon

Other: Non-governmental sources

Canada

Quebec

COVID-19 Canada Open Data Working Group

Data from the COVID-19 Canada Open Data Working Group is being added on an experimental basis. The full catalogue of historical data will be available in the future.

Data notes

On several occasions, the nightly archival script has failed to run. Depending on when the failure was identified, this may have resulted in a partial or total loss of archival data for that day. A list of these days is provided below:

  • Wednesday, October 21, 2020
  • Thursday, November 19, 2020

In the future, a package will be provided to more easily access the data provided in this archive and to document missing or incomplete data.
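
In the meantime, the failure dates listed above can be encoded directly when analysing the archive. The following is a minimal sketch; the names KNOWN_FAILURE_DATES and may_be_incomplete are hypothetical, and the set contains only the two dates given in this README.

```python
from datetime import date

# Dates on which the nightly archival script failed, as listed in this README
KNOWN_FAILURE_DATES = frozenset({date(2020, 10, 21), date(2020, 11, 19)})


def may_be_incomplete(archive_date):
    """Return True if archived data for this date may be partial or missing."""
    return archive_date in KNOWN_FAILURE_DATES


print(may_be_incomplete(date(2020, 10, 21)))
print(may_be_incomplete(date(2020, 10, 22)))
```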

Acknowledgements

Many people are to thank for contributing archived data and code to this repository.