CDC Provisional Deaths

Data from https://www.cdc.gov/nchs/nvss/vsrr/COVID19/index.htm

"Number of deaths reported in [CDC] table are the total number of deaths received and coded as of the date of analysis, and do not represent all deaths that occurred in that period.

"Data during this period are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction, age, and cause of death.

"Percent of expected deaths is the number of deaths for all causes for this week in 2020 compared to the average number across the same week in 2017–2019. Previous analyses of 2015–2016 provisional data completeness have found that completeness is lower in the first few weeks following the date of death."

importdata.py is the script I used to massage the (deliberately obfuscated?) CSV (comma-separated values) file from the CDC into the JavaScript format needed by the Google Chart framework.
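As a rough illustration of that kind of conversion (this is not the actual importdata.py; the `rows_to_js` helper, the column names, and the sample numbers are all made up), CSV rows can be turned into a JavaScript array-of-arrays literal, which is the shape that Google Charts' arrayToDataTable accepts:

```python
import csv
import io
import json


def rows_to_js(csv_text, columns, var_name="chartData"):
    """Turn selected CSV columns into a JavaScript array-of-arrays literal.

    Hypothetical helper: cells that look numeric (after removing thousands
    separators) become numbers, everything else stays a string.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    table = [list(columns)]  # header row first
    for row in reader:
        out = []
        for name in columns:
            raw = row[name].strip()
            digits = raw.replace(",", "")
            try:
                out.append(float(digits) if "." in digits else int(digits))
            except ValueError:
                out.append(raw)  # not a number, keep the original text
        table.append(out)
    return "var %s = %s;" % (var_name, json.dumps(table))


# Made-up sample data, for illustration only.
sample = 'Week,All Deaths\n"2/1/2020","1,000"\n"2/8/2020","1,100"\n'
print(rows_to_js(sample, ["Week", "All Deaths"]))
```

The printed line can be pasted into a script tag ahead of the chart code.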

But as of May 1, 2020, the CDC removed the CSV file download from the page and now generates the table with JavaScript. I have written a Selenium script to scrape it automatically, but it's still possible to "scrape" it manually by highlighting the table on the page and pasting it into a file. Luckily the pasted text is parseable directly as a TSV (tab-separated values) file.
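For the manual route, here is a minimal sketch of reading such pasted text with Python's csv module. The `parse_pasted_table` name, the column headers, and the numbers are assumptions for illustration; the real table has more columns.

```python
import csv
import io


def parse_pasted_table(text):
    """Parse tab-separated text pasted from a web page into a list of dicts.

    Assumes the first non-blank line holds the column headers.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Drop blank lines, which tend to creep in when pasting from a browser.
    rows = [[cell.strip() for cell in row]
            for row in reader
            if any(cell.strip() for cell in row)]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Made-up sample, standing in for text highlighted and pasted from the page.
pasted = "Week ending\tAll Deaths\n2/1/2020\t1,000\n\n2/8/2020\t1,100\n"
for record in parse_pasted_table(pasted):
    print(record)
```

The values are left as strings here, so thousands separators such as "1,000" still need to be stripped before doing arithmetic.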

See also https://www.cdc.gov/flu/weekly/ and https://gis.cdc.gov/grasp/fluview/mortality.html for more interesting charts and data. The latter is my source for historical death data, by clicking the Downloads button at the top of the page. And I just now (May 3, 2020) found out about yet another CDC page that gives a much higher number of deaths attributed to covid-19. But this uses both confirmed and suspected deaths, and the raw data is from usafacts.org.

I got inspired to look for that fluview data source after seeing I. Ratel's chart from March 2020, which makes this year's overall mortality look unimpressive compared to previous years. But then I figured "Well, April data will surely show a significant increase." As of April 25 it had not, but shortly after that a large spike appeared in this year's deaths, retroactive to April 4th, week 14.

There are oddities you will notice with this chart. First, there are fractional numbers here and there. This is due to incomplete data for some weeks, which I dealt with by dividing the reported deaths by the percent complete (that is, multiplying by the inverse of the percentage) to estimate the total deaths after the rest of the data comes in. Second, the final two weeks of the provisional data are likely to be low even after this adjustment, for whatever reason. I only know this by observing the data week after week. Another problem is the "week 53" data from 2014-15. After trying two other approaches, I used Ratel's method and put week 53 in the "week 1" slot and bumped all the rest of the weeks over.
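The completeness adjustment is simple arithmetic. A minimal sketch follows; the `project_deaths` name and the numbers are made up, and the percentage is assumed to be on a 0-100 scale:

```python
def project_deaths(reported, percent_complete):
    """Scale a partial weekly count up to an estimate of the final count.

    Multiplying by the inverse of the percentage is the same as
    reported / (percent_complete / 100).
    """
    if percent_complete <= 0:
        raise ValueError("percent complete must be positive")
    return reported * 100.0 / percent_complete


# Made-up numbers: 45,000 deaths reported at 90% complete.
print(project_deaths(45000, 90))            # 50000.0
# Most percentages do not divide evenly, hence the fractional values.
print(round(project_deaths(1000, 97), 1))   # 1030.9
```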

Since it takes a few days for the flu season mortality chart to update, I wrote yet another script to merge the more-often-updated covid-19 "Total Deaths" number into the data for the chart below. Note that whenever the script merges a number in, it marks that week as 100% complete to avoid overrepresenting the deaths, but that doesn't mean the data is really complete. You can compare to fludata.csv for what was actually downloaded.
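A rough sketch of that merge step, not the actual script: the `merge_latest_totals` name, the dictionary layout, and the numbers are assumptions, and this version simply overwrites every week that appears in the newer table.

```python
def merge_latest_totals(flu_rows, newer_totals):
    """Return a copy of flu_rows with newer weekly totals swapped in.

    flu_rows:     {week: (deaths, percent_complete)} from the slower source
    newer_totals: {week: deaths} from the more frequently updated table

    Merged weeks are marked 100% complete so that a later completeness
    adjustment does not scale them up any further.
    """
    merged = dict(flu_rows)
    for week, total in newer_totals.items():
        merged[week] = (total, 100.0)
    return merged


# Made-up numbers for illustration.
flu = {14: (50000, 95.0), 15: (40000, 70.0)}
newer = {15: 60000, 16: 30000}
print(merge_latest_totals(flu, newer))
```

Week 14 keeps its downloaded value and percentage, while weeks 15 and 16 take the newer totals at a nominal 100%.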

This New York Times article from April continues to be updated. It shows many other countries with a steep spike in all-cause deaths followed by an equally steep dropoff. This gives me hope of a similar phenomenon here in the U.S. And note that a few places, e.g. Norway, Israel, and South Africa, have shown no spike in overall deaths at all during the periods charted to date (May 13, 2020).

I've got two other spinoffs of this code at heatmap.html and statechart.html. And finally I've put all the code up on GitHub for the community to take over, with some notes on where a good place to start might be.