Corona Virus


Since mid-March I have been posting short snippets of my analysis of Corona (COVID-19) statistics to my Facebook page (see the following example).

John Newell

March 15

RE Corona Virus situation Canada versus USA
Currently the Corona cases per 1000 people look similar in Canada and US; however, deaths and testing tell a different story.
The USA currently has ~3000 cases, 60 deaths and ~ 15K people tested (as of 2 days ago).
Canada currently has ~300 cases, 1 death and ~ 15K people tested as of 2 days ago.
Based on cases, the rates per 1000 look similar, so we are no better than the US.
However, if we estimate cases based on deaths adjusted for population (X10), then the US has 6 X more cases, or a true number of around 18,000 cases.
If we adjust for cases versus tests done, then the US really has 10 X our cases per 1000 people, or 30,000 cases.
This explains why Trump was holding back on testing. My estimate is that when the US really starts testing at the same rate as Canada per 1000 pop, US cases will skyrocket. Assuming Canada is at 600 cases next week, then the US, if they do more tests, might hit 50,000 cases??

On April 15 I did a comparison of the relationship between Corona virus deaths and the number of reported cases for major urban areas in Europe and the USA that was too big for a Facebook posting, so I moved it here (see the April 15 report further down the page). I have subsequently added more updates.

Update April 30, 2020

The following is a continuation of the work posted on April 27 with updates and a reanalysis of the relationship between COVID-19 (Corona) deaths per million in selected geographic areas and the population of those areas. In the April 27 posting I was focusing on how the number of deaths per million increases as you move towards the epicenters of the outbreak. After further research I have prepared the following graph, which demonstrates that the upper limit of the deaths per million follows an inverse power relation with the population of the region considered (remember that the stats are adjusted per million pop). The data points are based on the data presented in Table 1 and Table 2 of the April 27 report (data for Bergamo have been updated based on a recent press report).
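As a rough illustration of what an inverse power relation looks like numerically, the sketch below fits deaths per million = a * population^b by least squares on log-log values. The three points are the highest deaths-per-million rows in Table 2 of the April 27 report. Treating them as the "upper limit" is my assumption, not necessarily how the line on the graph was drawn, and the Bronx point is left out because its figures are not in the tables.

```python
import math

# (population, deaths per million) taken from Table 2 of the April 27 report.
# Using these three rows as the upper envelope is an assumption for illustration.
UPPER_POINTS = [
    (1_100_000, 2041),   # Bergamo
    (8_400_000, 1407),   # New York City
    (10_060_000, 1303),  # Lombardy
]


def fit_power_law(points):
    """Least-squares fit of log10(deaths/mill) against log10(population)."""
    xs = [math.log10(pop) for pop, _ in points]
    ys = [math.log10(rate) for _, rate in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # exponent (negative for an inverse relation)
    a = 10 ** (mean_y - b * mean_x)    # scale factor
    return a, b


a, b = fit_power_law(UPPER_POINTS)
print(f"deaths per million ~ {a:.0f} * population^{b:.2f}")
```

With only three points the fit is not a statistical result; it only shows that the exponent comes out negative, meaning the ceiling on deaths per million falls as the population of the unit grows.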

This graph suggests that the upper limit to COVID deaths per million is partially determined by population density, since in general population density increases as you zoom in on urban areas. However, this is not just density, since many urban areas with much lower COVID death rates have population densities similar to the Bronx and Bergamo. My interpretation is that within a particular population range the death rate increases as the social, economic, ethnic and demographic profile of the community converges (becomes more uniform) on the factors that increase both infection rate and mortality rate.

Basically, the Bronx and Bergamo both have populations that represent the perfect storm of factors for promoting COVID infections and deaths.

Factors that promote high infection rates:

  • an early start to the infection,
  • overall population density in area,
  • percent of population living in multi-family buildings,
  • average number of people in a household,
  • degree (number) of social interactions,
  • type of interactions (different forms of greeting),
  • use of mass transit,
  • frequency of different types of employment in area (e.g. public-facing versus office versus factory).

Factors that limit transmission:

  • wearing masks,
  • social distancing,
  • closing venues that promote social interactions.

Factors that promote high mortality:

  • demographic profile of area (% seniors in population)
  • overall health of population
  • availability of health care especially to low income individuals,
  • living conditions of seniors (with family, independent of family, in group homes).

This is not implying that both of these areas have the same combination of factors but both likely share factors like: early initial start, population density, family size and makeup.

Update April 27, 2020

This report is intended to show how unrepresentative average COVID statistics are for large geo-political units like Canada, the USA, Europe and the UK. Basically, COVID is not evenly distributed across countries but occurs in clusters. These clusters start as pseudo random outbreaks in different locations. The initial spatial distribution looks random (see map below) but was determined by travel patterns of infected individuals.

US now has more coronavirus cases than either China or Italy

Once established in an area, the virus can grow rapidly if conditions are right or linger as a virtually invisible background level of infection. Some of the factors that determine the spread of the infection in any area include:

  • overall population density in area,
  • percent of population living in multi-family buildings,
  • degree (number) of social interactions,
  • type of interactions (different forms of greeting),
  • average number of people in a household,
  • use of mass transit,
  • frequency of different types of employment in area (e.g. public-facing versus office versus factory),
  • any steps in place to mitigate transmission (e.g. wearing masks, social distancing).

The upper limit to the infection (# of cases) in an area is determined by the total population in an area and any subsequent measures to limit spread.

The death rate from the infection will be determined by a separate set of factors including:

  • demographic profile of area (% seniors in population)
  • overall health of population
  • availability of health care especially to low income individuals,
  • living conditions of seniors (with family, independent of family, in group homes),
  • any unknown variations in genetic susceptibility.

Table 1 (see below in this section) summarizes reported cases and deaths from COVID-19 Corona Virus in various regions and sub-regions. It was compiled on April 25 and 27 from a number of sources. The analysis focuses on the worst-case situation in three large geo-political areas: the USA, the European Union and the UK. It shows how the situation changes as you zero in on the epicenter of the infection hot spots. I have attempted to go to the finest detail possible by identifying the areas within cities with the highest rates of infection (statistics on Corona deaths are not readily available on the Internet for sub-city regions). Within a city or region there may be multiple hot spots, and I have just identified the one with the highest rate of infection.

As you zero in on the epicenter in each region, the number of cases and deaths per million population increases. The increase in the absolute number of cases is partially due to the fact that population in urban areas is generally higher than in surrounding areas, but there is also an increase in cases per million (which adjusts for population). This likely occurs because population density is also greater, which promotes the spread of the virus. This is not that different from a colony of bacteria in a Petri dish, where the density of bacteria decreases as you go away from the center of the infection. What this demonstrates is that once the virus becomes established in an area where the factors outlined previously are conducive to its spread, it will multiply until it either runs out of victims (herd immunity) or some factor changes (e.g. introduction of mitigating factors like social distancing).

I have included data on cases and deaths. However, statistics on Corona cases are very much a function of testing rates and testing strategies, which vary widely, so I focus on deaths which, while still subject to problems, are much more reliable.

The data in Table 1 below show that the deaths per million from Corona in New York State are 5 X the rate for the USA (New York State being more densely populated than the rest of the US, with roughly 64% of the state’s population in the New York City metropolitan area and 40% in New York City alone), the rate in New York City is 1.6 X the rate for the State, and the rate for the Bronx is 2.2 X that for New York City as a whole. Overall, the Bronx has a death rate 19.5 X that for the US as a whole. A similar pattern occurs in Europe and the UK, with the areas at the epicenter having rates considerably higher than the national average. However, the degree of difference appears to vary between regions. To investigate these differences I compiled statistics on other urban areas in all three regions plus Canada (see Table 2).
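The per-million figures and the zoom-in ratios can be reproduced from the raw deaths and populations in Table 1. The sketch below uses only rows that appear in the table as posted (the USA and Bronx rows are not listed there, so their ratios are not recomputed); the function names are my own.

```python
# (deaths, population) for rows that appear in Table 1.
TABLE_1 = {
    "New York State": (16_600, 19_500_000),
    "New York City": (11_817, 8_400_000),
    "Lombardy": (13_106, 10_060_000),
    "Bergamo": (2_245, 1_100_000),
}


def deaths_per_million(deaths, population):
    """Deaths adjusted to a population of one million."""
    return deaths / population * 1_000_000


def rate_ratio(inner, outer, table=TABLE_1):
    """How many times higher the inner area's death rate is than the outer area's."""
    return deaths_per_million(*table[inner]) / deaths_per_million(*table[outer])


for name, (deaths, population) in TABLE_1.items():
    print(f"{name}: {deaths_per_million(deaths, population):.0f} deaths per million")

print(f"New York City vs New York State: {rate_ratio('New York City', 'New York State'):.2f} X")
print(f"Bergamo vs Lombardy: {rate_ratio('Bergamo', 'Lombardy'):.2f} X")
```

The New York City to State ratio comes out at about 1.65, which is the figure quoted as 1.6 X in the text.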

The following graph shows the relationship between the number of Corona Virus deaths per million people and the population of selected urban areas. Note that the data for Bergamo, Italy, the hardest hit area in Lombardy, are for April 6th, which was the latest data I could find, so they likely significantly underestimate the true figure for April 26th. I have added a line (red) that might represent a trend in the upper limit of deaths per million versus population. Remember, these figures are adjusted for population, so the relationship does not reflect the number of people but is likely a proxy for other factors like population density. Also, the smaller urban areas are likely to have more uniform social, economic, ethnic and demographic profiles than larger units (the Bronx versus New York City). As a result, the perfect storm of factors leading to higher infection rates is more likely to be found in areas with smaller populations. The ultimate example of this would be a large nursing home for seniors, which would contain a large proportion of the most vulnerable people and which in many cases has had very high death rates from Corona.

Table 1.

Region              | Cases   | Deaths | Deaths/Cases | Population  | Deaths/Mill | Cases/Mill
New York State      | 282,000 | 16,600 | 5.9%         | 19,500,000  | 851         | 14,462
New York City       | 155,000 | 11,817 | 7.6%         | 8,400,000   | 1,407       | 18,452
EU                  | 930,123 | 99,052 | 10.6%        | 446,000,000 | 222         | 2,085
Lombardy It         | 71,256  | 13,106 | 18.4%        | 10,060,000  | 1,303       | 7,083
Bergamo (see notes) | 9,712   | 2,245  | 23.1%        | 1,100,000   | 2,041       | 8,829
London              | 23,833  | 4,693  | 19.7%        | 8,900,000   | 527         | 2,678

Table 1, April 27, 2020

Table 2

Urban Area             | Corona Deaths | Population | Deaths/Mill
Vancouver Coastal      | 64            | 1,250,000  | 51
New York City          | 11,817        | 8,400,000  | 1,407
Wayne Co MI            | 1,580         | 1,749,000  | 903
New Orleans            | 1,644         | 4,650,000  | 354
Miami Dade             | 287           | 2,717,000  | 106
LA County              | 894           | 10,040,000 | 89
San Francisco          | 22            | 882,000    | 25
Lombardy It            | 13,106        | 10,060,000 | 1,303
Bergamo Lombardy       | 2,245         | 1,100,000  | 2,041
Greater London         | 4,693         | 9,300,000  | 505
Stockholm County       | 1,128         | 2,344,000  | 481
Ile de France          | 5,578         | 12,210,000 | 457
Netherlands            | 4,518         | 17,135,000 | 264
North Rhine-Westphalia | 1,096         | 17,839,000 | 61

Table 2, April 27, 2020

Update April 17, 2020.

The following Graph is the latest update to my analysis of Corona Virus statistics. It looks at the relationship between reported cases and deaths for 49 selected urban areas (counties, regions, cities, small countries) in Europe, the UK, the USA and Korea. The data was captured on April 15 using the latest data available on the Internet (see Table at end of this Section). All of these areas are in developed countries with relatively reliable statistical data on Corona deaths (though not perfect). The objective was to examine variations in the relationship in different regions.

In earlier analysis I noted regional variations in the relationship, and the data points have been labeled to reflect these: (A) Europe & UK minus Germany and Switzerland; (B) the USA; and (C) Germany & Switzerland plus Korea. In area (A) deaths were approximately 14.8% of reported cases, in (B) 7% and in (C) 3%.

One factor that might explain this relationship is the level of testing in each region. The countries in (C), Korea, Germany and Switzerland are noted for high rates of testing. The USA (B) was initially slow in ramping up testing but since the start of April the rates of testing have increased significantly, especially in the urban areas plotted in this graph. Europe and the UK started mass testing earlier than the USA but the rapid growth in the number of cases overwhelmed the systems.

What this graph indicates is that if we use the number of deaths as the most reliable (but not perfect) indicator, and we assume the death rate from the virus is relatively constant, then the reported cases in Europe and the UK underestimate the true case count when compared to the USA. If we use London as an example, the statistic used for April 15 was 17,479 cases with 3,265 deaths. If London was following the USA trend line then this would equate to approximately 46,000 cases, or a case count 2.6 X higher. This would mean that London’s case count is not 16.4% of the case count for the City of New York but more like 43%.
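The London example amounts to dividing deaths by the US deaths-to-cases ratio. A minimal sketch, assuming the rounded 7% figure for region (B) stands in for the US trend line and taking the New York City case count from the April 15 data table (the function name is my own):

```python
LONDON_CASES = 17_479
LONDON_DEATHS = 3_265
NYC_CASES = 106_863          # reported cases, from the April 15 data table
US_DEATHS_PER_CASE = 0.07    # region (B) ratio quoted above, used as a stand-in for the trend line


def implied_cases(deaths, deaths_per_case):
    """Case count that would produce this many deaths at the given deaths-to-cases ratio."""
    return deaths / deaths_per_case


estimate = implied_cases(LONDON_DEATHS, US_DEATHS_PER_CASE)
print(f"Implied London cases: {estimate:,.0f}")
print(f"Increase over reported: {estimate / LONDON_CASES:.1f} X")
print(f"Reported London cases as share of NYC: {LONDON_CASES / NYC_CASES:.1%}")
print(f"Implied London cases as share of NYC: {estimate / NYC_CASES:.1%}")
```

Because 7% is itself a rounded value, the result lands slightly above the 46,000 read off the graph, but the conclusion is the same.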

Putting this in the US context, it means that many European regions have much higher case counts compared to USA cities than the raw published statistics would suggest. Some reports make projections for US cities like New York based on the situation in European regions like Lombardy, but based on this analysis the situation in Lombardy is much worse than in New York and there is potential for the situation to get much worse in the US. It should also be noted that, based on area (C), it is likely that the reported statistics for the USA significantly underestimate the true case count.

Posting April 15, 2020

Background and Methodology

I have been attempting to understand the relationship between the number of reported cases of Coronavirus (i.e. COVID-19) and the number of deaths resulting from the Virus. The true relationship between these factors is masked by a number of factors related to how the data is collected; these include:

  • The rate of testing for Corona in different countries and regions,
  • How different regions compile death statistics,
  • Geographic variability in occurrence of Corona virus in Countries,
  • Variations in the Health and Demographics (%>60) by Country,
  • Variations in Population density and culture between countries,
  • Possible political interference in data collection and reporting.

To overcome some of these problems I focused on data from large urban clusters (major cities, urban regions/counties and small high density countries) in Europe, the UK and the USA. The measurement unit depended on what statistics were available (e.g. Urban Counties in US, Regions or countries in Europe). By focusing on these smaller geographic areas I reduced problems resulting from geographic variations in occurrence and testing in large geographic areas like the USA and Italy. The analysis focused on those regions with the highest rates of Corona. The data was compiled on April 14th from sources like Google/Wikipedia. A list of the data used is provided in a Table at the end of this Report.

The Corona Virus Paradox

The following graph presents a scatter chart for the data contained in the Table. Each dot represents the relationship between reported cases and deaths for that region. The chart excludes New York City and the Lombardy region in Italy. These were excluded since they are such extreme outliers (high values) that they would skew the trend lines (more discussion on these later). I also added labels on each data point to indicate whether the data point was in Europe/UK (E), the USA (A), Germany (G) or Switzerland (S). The latter two markers (G & S) were identified separately from the rest of Europe since they fit a different pattern.

I fitted separate linear regression lines to the data for Europe (including Germany and Switzerland) and the USA. The data for the USA (excluding New York) cluster around a regression line suggesting a death rate of approximately 3.4 deaths per 100 reported cases (3.4%). The European data (including the UK, Germany and Switzerland) show much more variability and suggest a much higher death rate of ~10.5%. However, the chart shows that Germany and Switzerland are a better fit for the USA trend line.
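A trend line of this kind can be sketched with a least-squares fit of deaths against reported cases. The version below forces the line through the origin (zero cases, zero deaths), which is a simplifying assumption of mine; the regression on the chart may have included an intercept. The data are the US rows, excluding New York City, as they appear in the data table at the end of this report.

```python
# (reported cases, deaths) for US urban areas from the April 15 data table,
# excluding New York City.
US_URBAN_AREAS = [
    (9_420, 320),    # Los Angeles County
    (957, 15),       # San Francisco County
    (4_517, 295),    # King County (Seattle)
    (7_459, 109),    # Miami-Dade County
    (5_651, 244),    # Orleans Parish
    (5_088, 186),    # Jefferson Parish
    (1_346, 51),     # Denver County
    (6_004, 262),    # Fairfield County
    (11_648, 760),   # Wayne County
    (5_073, 347),    # Oakland County
    (3_418, 240),    # Macomb County
    (3_012, 123),    # Marion County
    (1_635, 52),     # Fulton County
    (3_261, 40),     # Harris County
    (1_537, 25),     # Dallas County
    (9_784, 453),    # Bergen County
    (7_469, 226),    # Hudson County
    (7_410, 428),    # Essex County
    (6_180, 209),    # Union County
    (5_693, 193),    # Middlesex County
    (5_590, 131),    # Passaic County
    (24_358, 910),   # Nassau County
    (21_643, 568),   # Suffolk County
    (19_786, 557),   # Westchester County
    (7_965, 200),    # Rockland County
    (5_598, 171),    # Orange County
]


def slope_through_origin(points):
    """Least-squares slope of deaths on cases for a line constrained to pass through (0, 0)."""
    sum_xy = sum(cases * deaths for cases, deaths in points)
    sum_xx = sum(cases * cases for cases, _ in points)
    return sum_xy / sum_xx


slope = slope_through_origin(US_URBAN_AREAS)
print(f"Deaths per 100 reported cases: {slope * 100:.1f}")
```

On these rows the slope comes out in the region of 3.5 deaths per 100 reported cases, close to the 3.4% read from the chart.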

The following graph shows the data for Europe and the UK including Lombardy but minus Germany and Switzerland. By excluding the latter two countries there is a much better fit for the new trend line, and Lombardy now fits the data. However, this new trend line suggests a death rate of 14.7%!

Possible Explanation

Ignoring possible genetic or cultural differences between the two groups, the obvious explanation is differences in how the data is collected. Deaths are perhaps the more reliable of the two measures, but recent articles have pointed out that death statistics in the US and Germany might be underestimating Corona deaths due to failure to run tests and not recording deaths in seniors' homes. This might increase US and German death rates somewhat but can only explain a small part of the difference. The most likely cause of the differences is that most European countries (excluding Germany and Switzerland) are significantly underestimating the number of cases relative to the data plotted for the USA (by a factor of about 4). We know that reported cases in the USA underestimate the actual number of cases, but the European data is even further off the mark.

Germany was noted as having the highest rate of testing in Europe. The USA started testing later than most European countries and initially was testing at a lower rate, but the rate of testing accelerated in April. In addition, the data I show are for major urban areas with high numbers of cases, and these are the regions of the USA that have the highest rates of testing. By contrast, many European countries started testing early but the rate of infections overwhelmed their systems.

Conclusion April 15th section

The true number of Corona Virus cases in most European countries (Italy, Spain, France, UK) is likely 4 times higher than the reported number (using USA reported rates as the baseline). What this suggests is that while European rates of new infections may be leveling off somewhat, they are doing so at much higher levels than the USA is experiencing. This may mean that urban areas in the USA may not start leveling off until much later.

The bottom line is that the true number of cases in Europe (excluding Germany and Switzerland) may be 4 X greater than in the USA. The good news is that the USA is catching up on testing, but the bad news is that the true number of cases per million pop in Italy is not 1.4 X that in the US but 7 X, i.e. the US has a lot further to go before reaching where Italy is.

Update April 16: Refining the Results for April 15

This project is a work in progress and I have updated the report to reflect the new research. The following graph shows how the number of cases in London would change using the American trend line from the previous graph:

At the time I compiled the chart the City of London had 17,479 cases and 3265 deaths. If we extend the American trend line (Red) then London with 3265 deaths should have approximately 90,000 cases (nearing the situation in New York which had 106,000 reported cases at the time the data was compiled).

This figure of 90,000 cases for London does appear high when compared to reports of the situation in London compared to New York City. To investigate this I redid the USA graph to include data for New York City which was excluded from the earlier analysis (see below).

Deaths versus Reported Cases US Urban Areas

Doing this changed the slope of the line and left three outliers which are areas outside New York City (two of which are Long Island and Hudson River valley). These latter two are low density high income suburban areas so the new curve likely is a better fit for more Urban areas.

Using this new American trend line, the estimated number of cases for London (pop 9 million) would be around 50,000, or about half of those in New York City (pop 8.4 million), which is a better fit to the reported situation. Applying the same trend to the data for Lombardy, Italy (pop 10 million) would increase the number of cases from 60,314 (see table) to around 160,000 (a 2.7 X increase).
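As a quick consistency check, the deaths-to-cases ratio implied by these revised estimates can be backed out from the reported deaths. The sketch below uses only the figures quoted in this section; the function name is my own.

```python
def implied_death_ratio(deaths, estimated_cases):
    """Deaths as a fraction of the estimated (not reported) case count."""
    return deaths / estimated_cases


london_ratio = implied_death_ratio(3_265, 50_000)       # London deaths vs revised estimate
lombardy_ratio = implied_death_ratio(10_901, 160_000)   # Lombardy deaths vs revised estimate
lombardy_increase = 160_000 / 60_314                    # revised estimate vs reported cases

print(f"London: {london_ratio:.1%} of estimated cases")
print(f"Lombardy: {lombardy_ratio:.1%} of estimated cases")
print(f"Lombardy increase over reported cases: {lombardy_increase:.2f} X")
```

Both ratios fall between 6% and 7%, which is what one would expect if the two estimates were read off the same trend line.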

It should be remembered that the American reported cases are themselves an underestimate of the true cases since they miss many asymptomatic cases. As a result the actual number of cases in places like London and Lombardy will be considerably higher than my estimates.

Data Table April 15

Region                     | Reported Cases | Deaths
Lombardy Italy             | 60,314         | 10,901
Grand Est                  | 5,479          | 2,008
Provence-Alpes-Côte d’Azur | 1,924          | 317
North Rhine-Westphalia     | 25,300         | 545
Community of Madrid        | 48,048         | 6,568
Castile-La Mancha          | 14,329         | 1,714
Castile and León           | 13,180         | 1,299
Basque Country             | 11,226         | 859
Los Angeles County         | 9,420          | 320
San Francisco County       | 957            | 15
King County Seattle        | 4,517          | 295
Miami-Dade County          | 7,459          | 109
Orleans Parish LA          | 5,651          | 244
Jefferson Parish LA        | 5,088          | 186
Denver County              | 1,346          | 51
Fairfield County Conn      | 6,004          | 262
Wayne County MI            | 11,648         | 760
Oakland County MI          | 5,073          | 347
Macomb County MI           | 3,418          | 240
Marion County IN           | 3,012          | 123
Fulton County GA           | 1,635          | 52
Harris County TX           | 3,261          | 40
Dallas County TX           | 1,537          | 25
Bergen County NJ           | 9,784          | 453
Hudson County NJ           | 7,469          | 226
Essex County NJ            | 7,410          | 428
Union County NJ            | 6,180          | 209
Middlesex County NJ        | 5,693          | 193
Passaic County NJ          | 5,590          | 131
Nassau County NY           | 24,358         | 910
Suffolk County NY          | 21,643         | 568
Westchester County NY      | 19,786         | 557
Rockland County NY         | 7,965          | 200
Orange County NY           | 5,598          | 171
New York City              | 106,863        | 7,349