Science and Empirical Research
Thanks to the media, scientific research and theories have increasingly become part of our daily lives. Right now in particular, we hear about a lot of research on the efficacy of different drugs and devices (including masks) in controlling the pandemic. However, some of the findings coming from scientists and researchers seem inconclusive and even contradictory. Why aren’t the results of this research reliable? To understand this, let us go back to the basic definitions.
Science is about understanding natural phenomena through a systematic approach and using this knowledge to make predictions. Often, scientists try to explain a phenomenon in terms of maths, where certain input variables are transformed to give the output variables. Sometimes the maths is simple (at least for practical cases): equations for many physical quantities such as gravitational force, energy and electric current can be written as simple arithmetic operations on the input variables. Of course, other variables will also have an impact on the output, but for all practical purposes we can get the output value from the inputs using a simple equation. Because of this, the results can be replicated across the world.
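As a small illustration of such a "simple equation", here is a sketch of Newton's law of gravitation, F = G * m1 * m2 / r^2. The function name and the rounded values for the Earth's mass and radius are my own choices for the example; the point is only that the same inputs give the same output wherever the calculation is run.

```python
# Newton's law of gravitation: F = G * m1 * m2 / r^2
G = 6.674e-11  # gravitational constant in N m^2 / kg^2 (rounded)

def gravitational_force(m1_kg, m2_kg, distance_m):
    """Force in newtons between two point masses a given distance apart."""
    return G * m1_kg * m2_kg / distance_m ** 2

# Illustrative inputs: the Earth (about 5.972e24 kg) and a 70 kg person
# standing on its surface (about 6.371e6 m from the centre).
force = gravitational_force(5.972e24, 70.0, 6.371e6)
print(f"Force on a 70 kg person: {force:.0f} N")
```

Other variables (altitude, the Earth not being a perfect sphere) are ignored here, which is exactly the "for all practical purposes" simplification described above.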
However, not all natural phenomena are that simple. Often there are too many variables involved, and they interact in complex ways to produce results. For example, weather is notoriously difficult to predict in spite of so much historical data, because too many factors determine it. In such cases, scientists use several approaches. We will look at each with an example: are face masks effective in preventing the spread of COVID?
The first approach, of course, is to find all the variables and all their interactions and arrive at the complex mathematical equation. But it is not easy to find all the variables, and especially their interactions, so only the important variables and a limited set of interactions are used to arrive at approximate results. In our example, to test whether masks are useful against COVID, we would need an exact equation for how many virus particles are released in every sneeze, how factors like weather, temperature and the density of air contribute to transmission, how many particles actually reach the next person, and so on. All of these are difficult to measure and hence almost impossible to put in an equation. Even a simplified version of the equation may not be particularly accurate.
The second approach, increasingly used in recent years, is machine learning. Here, the algorithm is provided with a large number of input-output data points and 'learns' the important variables and their interactions to produce results. While it is fashionable to use this approach these days, it suffers from several drawbacks. The quality of the data is often poor, and hence the equation learnt by the algorithm is not very accurate. We also don’t know what the algorithm has learnt or how the variables interact: we just know that if input x is given, the algorithm gives output y; we don’t know how or why. Because of these and a few other shortcomings, the outcome of machine learning, though useful, may not always be accurate. As in the first case, we need to identify all the factors that might affect our hypothesis, and we need a huge amount of data to train the algorithm. Perhaps if we fitted a range of sensors on a large number of masks, we could get that data.
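To make the idea of 'learning' from input-output pairs concrete, here is a deliberately tiny sketch: a straight-line fit by least squares, which is about the simplest learning algorithm there is. Everything in it is an assumption made for illustration. The data is synthetic, generated from a made-up rule y = 2x + 1, and the helper names are hypothetical. It shows how the equation recovered from the data drifts away from the true one as the data gets noisier.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def make_data(noise_sd, n=200, seed=0):
    """Synthetic data from the made-up rule y = 2x + 1, plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

# Clean data: the fit recovers the underlying rule almost exactly.
clean_slope, clean_intercept = fit_line(*make_data(noise_sd=0.0))
# Poor-quality data: the learnt equation is only approximately right.
noisy_slope, noisy_intercept = fit_line(*make_data(noise_sd=5.0))

print(f"clean: y = {clean_slope:.3f}x + {clean_intercept:.3f}")
print(f"noisy: y = {noisy_slope:.3f}x + {noisy_intercept:.3f}")
```

A real machine learning model has many more variables and is far harder to inspect than a single slope and intercept, which is where the "we don’t know how or why" problem comes from.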
The third approach, which has been used extensively, is empirical research. In this statistical approach too, the researchers are not concerned with the exact dynamics of the phenomenon; they want to see whether x causes y with statistical significance. This approach requires a lot of assumptions, such as that all other variables are constant, that the sample being tested represents the population, and so on. When these assumptions do not hold, the results can be unreliable or contradictory. In simple terms, the researcher observes x in a certain population and checks whether y follows often enough in the sample to be statistically significant. To continue with our example, the researcher may study groups of people with and without masks and compare how much the disease has spread in each. Here, the researchers may assume that other factors, such as the health of the members of the sample, climatic conditions and the material of the mask, remain the same or do not affect the study. This is why different studies of the same question can produce contradictory results.
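A rough sketch of what such a significance check can look like is a two-proportion z-test, one common way (among several) to compare infection rates in two groups. The numbers below are entirely hypothetical and are not taken from any real study: both imaginary studies see the same infection rates (6% with masks, 10% without) and differ only in how many people they enrolled.

```python
import math

def two_proportion_z_test(infected_a, n_a, infected_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = infected_a / n_a
    p_b = infected_b / n_b
    # Pooled rate under the assumption that the groups do not differ.
    pooled = (infected_a + infected_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical large study: 30 of 500 masked vs 50 of 500 unmasked infected.
z_large, p_large = two_proportion_z_test(30, 500, 50, 500)
# Hypothetical small study with the same rates: 6 of 100 vs 10 of 100.
z_small, p_small = two_proportion_z_test(6, 100, 10, 100)

print(f"large study: z = {z_large:.2f}, p = {p_large:.3f}")
print(f"small study: z = {z_small:.2f}, p = {p_small:.3f}")
```

With the conventional 0.05 threshold, the large study comes out as significant and the small one does not, even though the observed effect is identical. The test also says nothing about whether the two groups differed in health, climate or mask material, so those assumptions remain exactly that.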
Most of the research published about COVID-19 these days follows the third approach, and hence there is so much variance and confusion about the results. This is precisely why research results should always be viewed with scepticism. In fact, a 2016 survey found that more than 70% of the researchers polled had tried and failed to reproduce another scientist’s experiments, and more than half had failed to reproduce their own! Perhaps the conditions under which the original experiments were conducted no longer held. So whenever you see new research claiming that some food cures a disease or condition, you may well wonder whether the result is replicable in daily life.
In short, while the first approach is the most accurate and explainable, it is also the most difficult to achieve for most natural phenomena because of their complexity. Hence, the approach we see most often is the third one, which may not be accurate and also does not explain causation.
However, there is another aspect I would like to discuss: what about experiments that have been conducted, say, thousands of times over generations and have always come to the same conclusion? This is often the case with traditional methods or medicines. For example, in India, the medicinal properties of turmeric are well known. No one may have tested their statistical significance with scientific experiments, but over thousands of years there has been sufficient evidence of turmeric stopping bleeding. Perhaps these traditional approaches are applicable only to Indians in a specific climate (as this was the population in which the experiment was run) and not to other geographies, or to Indians who move to other climes. But, at least in my opinion, they should not be dismissed immediately and should be examined for replicability just as one would do for scientific research papers.
To summarize, most complex phenomena are difficult to capture in simple equations or models that make accurate predictions. Hence, most research employs an empirical approach, which may be accurate only under certain restrictions and should not be adopted blindly.