How do we determine how deadly COVID is in India?
Unpacking infection fatality rate (IFR) and case fatality rate (CFR)
In other posts we talk about the infection fatality rate (IFR) and how that can, e.g., provide information on the causes of the second wave in India. The problem is that we don’t often have access to estimates of IFR, mainly because we do not have unbiased estimates of the level of infection in the population. Here we explore some alternatives to IFR to estimate how deadly COVID infections are.
A. IFR
The infection fatality rate is
The number of deaths among those infected with COVID, divided by
The number of people infected with COVID
We will call the number of deaths the numerator, and the number of infections the denominator.
People usually estimate the numerator with official counts of the number of COVID-related deaths. The denominator is typically measured by testing for antibodies in a representative sample of people from the population. The specific calculation looks like this:
The number of deaths among those infected with COVID to date t, divided by
The fraction of people with antibodies to COVID at date t times the population
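As a minimal sketch of this calculation, the Python snippet below estimates IFR from a cumulative death count and a seroprevalence estimate. The function name and the input numbers are hypothetical, chosen only to illustrate the arithmetic; they are not estimates for India.

```python
def ifr_from_serosurvey(deaths_to_date, seroprevalence, population):
    """Estimate IFR as cumulative deaths divided by estimated cumulative infections.

    seroprevalence is the fraction (0 to 1) of a representative sample
    with antibodies at the survey date.
    """
    if not 0 < seroprevalence <= 1:
        raise ValueError("seroprevalence must be a fraction in (0, 1]")
    estimated_infections = seroprevalence * population
    return deaths_to_date / estimated_infections


# Hypothetical inputs, for illustration only.
example_ifr = ifr_from_serosurvey(deaths_to_date=1_000,
                                  seroprevalence=0.20,
                                  population=1_000_000)
print(f"Estimated IFR: {example_ifr:.2%}")
```

Note that the deaths are counted up to the same date as the antibody survey, since antibodies reflect all past infections rather than current ones.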
The key here is that we are examining a representative sample of the population. As we shall see, if the sample of people you are testing is not representative, e.g., not drawn at random, then the estimate of the number of people infected may be biased.
There are typically two types of tests for COVID: an RT-PCR or rapid antigen test for current infection, and a blood test for antibodies. Few tests for current infection are done on representative samples; these are usually done instead on people with symptoms, perhaps as a diagnostic to determine appropriate treatment. The main tests done on representative samples are antibody tests, which have little diagnostic value to doctors because they do not tell you who is currently infected; they tell you who was previously infected. Therefore, if you use antibody tests (in what is sometimes called a serological study), you want to change your numerator from the number of people who recently died to the number of people who have died up until the date of the serological survey.
Analyses of official death counts and a few population-representative infection or antibody surveys done in 2020 suggest that older individuals have an exponentially higher death rate from COVID. The black solid line and the dashed lines are estimates from meta-analyses of studies in developed countries. The colored lines are the results from 4 different surveys that one of us (Malani) has done over the last year.
The figure also reveals that the infection fatality rate in India is lower than in other countries at each age group, perhaps by as much as an order of magnitude (see figure below).
Recently there have been reports of undercounts of deaths. This suggests that the gap between developed country IFR and India’s IFR may not be as large as the graph below suggests. It is certainly the case that there is substantial undercounting; the question is whether it is large enough to bridge the gap, which may be an order of magnitude or more. It is hard to imagine that deaths could be undercounted 10-fold, let alone by as much as cases are undercounted.
Another problem we face, e.g., when we want to figure out what is driving the second wave in India, is that we do not have similar numbers for 2021, when new variants are more common. (Tamil Nadu is in the middle of doing its second sero-survey; we may be able to calculate 2021 IFR by age from that. But we do not have that information just yet.)
B. CFR and test positivity rate
The main data we have in the second wave is case fatality rate. (We had the same problem, i.e., only CFR data, at the start of the epidemic last March - July.) Case fatality rate is
The number of deaths among those infected with COVID, divided by
The number of people who test positive for COVID
The numerator is the same. The denominator is different: it is the number of people who test positive for COVID rather than all people infected with COVID. The two numbers can differ if there are many people who are infected but not tested. While testing rates have increased, many infected people (likely the vast majority) are still not tested. That means CFR is usually much higher than IFR.
There are two solutions. One is to assume that the change or trend in CFR is the same as the change or trend in IFR. That way, even though we cannot figure out the level of IFR, we can figure out whether the IFR is greater in 2020 or 2021 by checking whether the CFR is. But is this assumption reasonable? Probably not. The problem is not so much that the people tested are not a representative sample from the population; it is that the fraction of people tested changes over time. For example, if we hold the level of infections constant but increase testing, then we will find more positives than before. It will appear as if infections increased, but the rise is due to more testing, not to more infections. The CFR will also fall, even though the IFR has not changed.
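The testing example above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every death is counted, and reported cases are a fixed fraction of true infections. The function name and all numbers are hypothetical.

```python
def observed_cfr(true_infections, true_ifr, fraction_detected):
    """Return (reported cases, CFR) when only a fraction of infections is detected.

    Assumes every death is counted and that detected cases are a simple
    fraction of true infections; both are simplifications.
    """
    deaths = true_infections * true_ifr
    cases = true_infections * fraction_detected
    return cases, deaths / cases


# Hypothetical scenario: infections and IFR are held fixed, testing doubles.
TRUE_INFECTIONS = 100_000
TRUE_IFR = 0.002

cases_low, cfr_low = observed_cfr(TRUE_INFECTIONS, TRUE_IFR, fraction_detected=0.05)
cases_high, cfr_high = observed_cfr(TRUE_INFECTIONS, TRUE_IFR, fraction_detected=0.10)

print(f"5% detected:  cases={cases_low:,.0f}, CFR={cfr_low:.1%}")
print(f"10% detected: cases={cases_high:,.0f}, CFR={cfr_high:.1%}")
```

In this sketch, doubling the share of infections that are detected doubles reported cases and halves the CFR, while the true IFR stays the same. A trend in CFR can therefore reflect a change in testing rather than a change in how deadly the infection is.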
Another solution is to replace the denominator with the test positivity rate (TPR) times the population. The CFR based on TPR is
The number of deaths among those infected with COVID, divided by
The test positivity rate (i.e., the fraction of tests that come back positive) times the population
This may not technically be a CFR anymore, since the denominator changed. We’ll call it IFR-TPR, because the denominator is calculated like one might calculate the denominator of the IFR with a serological study. There we took the fraction of people with antibodies and multiplied by the population. Here we just replace the fraction with antibodies with the fraction with virus as indicated by RT-PCR or rapid antigen tests.
This change to the CFR denominator addresses the problem that the number of tests may increase over time. But it faces a different problem: the folks who are tested are not representative of the population.
The government has a tendency to test people who have symptoms. It tends to focus on people who show up at the hospital (who do so because they have symptoms). It also tests people as part of contact tracing. But the initial contact is typically tested because they are symptomatic. And if there are lots of contacts, then the more symptomatic ones will tend to be tested. The best proof of this is to compare the test positivity rate to (a) the number of positive cases as a fraction of the population and (b) the fraction of people with antibodies. Typically (b) will lie above (a) and below the TPR, because TPR selects people who are more likely to be infected than a random person in the population. Because the TPR overstates the infection rate, the denominator is too large, and the result is that IFR-TPR likely underestimates the IFR.
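A small Python sketch shows the direction of this bias. The function name and all numbers below are hypothetical and chosen only to make the arithmetic easy to follow: the TPR among tested people is set above the infection rate that a representative survey would find.

```python
def ifr_tpr(deaths, positive_tests, total_tests, population):
    """Deaths divided by (test positivity rate times population)."""
    tpr = positive_tests / total_tests
    return deaths / (tpr * population)


# Hypothetical numbers, for illustration only.
POPULATION = 1_000_000
TRUE_INFECTED_FRACTION = 0.10   # what a representative survey would find
DEATHS = 200

true_ifr = DEATHS / (TRUE_INFECTED_FRACTION * POPULATION)

# Testing that targets symptomatic people yields a TPR above true prevalence.
estimate = ifr_tpr(DEATHS, positive_tests=4_000, total_tests=20_000,
                   population=POPULATION)

print(f"True IFR: {true_ifr:.2%}, IFR-TPR: {estimate:.2%}")
```

Because the tested group is twice as likely to be infected as the population in this example, the implied number of infections is twice too large and IFR-TPR comes out at half the true IFR.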
One might wonder whether we could at least use the trend in IFR-TPR to tell us about the trend in IFR. The problem with using the trend is partly that testing policy is often set based on TPR. If TPR is too high, governments are advised that they are not testing enough and they increase testing. This means that the government is setting testing rates to limit TPR. While this is good public health practice, it does not help us in estimating the level of infections.
An even bigger part of the problem in examining the trend in IFR-TPR is that the composition of the people who are being tested changes over time. The set of people who get tested depends on the testing regime in place. When the infection level is high, only symptomatic people get tested, which increases the TPR. When negative tests are required for travel or to attend school, more asymptomatic people get tested and the TPR decreases.
This all sounds depressing. But it is a problem that we could solve if we wanted to. India has the capacity to get representative estimates of the infection rate and IFR. It could keep a constantly running, population-representative survey going. This survey would repeatedly pick representative samples and test them. India runs surveys like this all the time, e.g., the National Family Health Survey. From this we could get an unbiased estimate of the infection rate in the population.
For deaths, we could stand up a better death registry, with even crude classification of cause of death. We don’t have to fix the whole system at once; we just have to pick random places to improve. Short of that, we could take a random sample of deaths reported in the surveys such as the one in the last paragraph and do verbal autopsies for deaths in the prior months or year. Folks like Prabhat Jha have used them to give India a much better sense of the leading causes of death in the country.
But until this is done, we can rely on CFR and IFR-TPR, which use data that are imperfect but reported, to estimate a range on IFR. The discussion above suggests that:
IFR-TPR < IFR < CFR
If we can get deaths, cases and number of tests by age, we can calculate this range even by age.
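The range can be computed directly from reported counts. The sketch below is illustrative only: the function name is hypothetical, the age groups and counts are made up, and it assumes each reported case corresponds to one positive test, so that TPR equals cases divided by tests.

```python
def ifr_range(deaths, cases, tests, population):
    """Return (IFR-TPR, CFR) as a rough lower and upper bound on IFR.

    Assumes each reported case corresponds to one positive test, so that
    TPR = cases / tests.
    """
    tpr = cases / tests
    lower = deaths / (tpr * population)   # IFR-TPR
    upper = deaths / cases                # CFR
    return lower, upper


# Hypothetical age-stratified data: (deaths, cases, tests, population).
by_age = {
    "0-39":  (20, 10_000, 80_000, 600_000),
    "40-59": (90, 6_000, 40_000, 300_000),
    "60+":   (240, 4_000, 20_000, 100_000),
}

ranges = {age: ifr_range(*row) for age, row in by_age.items()}
for age, (lower, upper) in ranges.items():
    print(f"{age}: {lower:.3%} < IFR < {upper:.3%}")
```

Under the one-case-per-positive-test assumption, the lower bound sits below the upper bound whenever the number of tests is smaller than the population. How wide the range is depends on how selectively people are tested.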
Using this framework, we can now examine the CFR and IFR-TPR over time in India. This is plotted in the figure below. Both CFR and IFR-TPR appear to have decreased between January and March and increased subsequently. However, we cannot say with confidence whether the trend until March is driven by better treatment/case management when cases were low, or whether the trend since March is driven by newer variants entering circulation or a change in the composition of people who are getting infected.