Methodological Scepticism | Hendrik Erz

Methodological Scepticism

It took me several weeks to write this article. This was not because I consider today’s topic unimportant, but because I felt I would do an injustice to the papers cited here. I still have that vague feeling, but I am convinced that I did my best to honour the work of my fellow researchers. This article deals with methodological scepticism, or, in short: never trust your methods.

There are a few fundamental truths about every science. For example, every science has its general disagreements between groups holding different viewpoints. And while the basic facts are mostly undisputed, their interpretation can result in fierce arguments.

In (German) sociology, one particularly hot topic that researchers have been debating for over a century is the Werturteilsstreit (the dispute over value judgements), which was later continued in the Positivismusstreit. The gist of the debate is that there are several competing answers to the question “Can a sociologist be completely objective?” One position holds that every researcher can be one hundred percent objective if they try hard enough. The opposing stance is that a researcher, as a human being, will always be biased, and hence should not even try. Somewhere in the middle are positions that revolve around strategies to mitigate potential bias, such as disclosing the researcher’s assumptions so that readers can “subtract” this bias from the results and arrive at an objective reading.

While reading analytical sociology, I recently came across another such instance, in which the fundamental truths are undisputed but the positions differ.

There are roughly three camps within quantitative sociology that I can identify (with my still limited knowledge): the trusting, the sceptics, and the virtuosos. I call the first group “trusting” because they tend to use methods without fully acknowledging their potential implications. I call the second group “sceptics” because they tend to question the very heart of analytical methods. And I call the third group “virtuosos” because they are aware of the limitations of the methods they use, yet still able to use them to achieve their ends without running into larger metaphysical problems.

Naturally, this last group consists of papers whose scope is narrow enough that the chosen method makes sense and yields results that leave little room for discussion. Usually, such papers analyse one single problem. The paper by Salganik et al. (2006) on an “artificial cultural market” belongs to this group. The paper has other problems, of course, but these belong to a different debate (see van de Rijt 2019).

Salganik et al. use the Gini coefficient to measure inequality among the songs in their experiment. The Gini coefficient measures how unequally a non-negative quantity is distributed across a limited set of observations. The team treats each song as an observation and its download count as the quantity over which the Gini coefficient is computed, which, in this limited application, is perfectly fine.
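To make the measure concrete, here is a minimal Python sketch of the Gini coefficient in its common mean-absolute-difference form, G = Σᵢ Σⱼ |xᵢ − xⱼ| / (2n²x̄). This is not Salganik et al.’s code; the function name and the download counts in the usage line are hypothetical, and the sketch assumes non-negative values such as download counts. A value of 0 means every song was downloaded equally often; values approach 1 as downloads concentrate on a single song.

```python
def gini(counts):
    """Gini coefficient via the mean absolute difference.

    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean(x)).
    Assumes non-negative values, e.g. download counts per song.
    """
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0  # no downloads at all: treat as perfect equality
    # Sum of absolute differences over all ordered pairs of songs
    abs_diffs = sum(abs(a - b) for a in counts for b in counts)
    # 2 * n**2 * mean(x) simplifies to 2 * n * total
    return abs_diffs / (2 * n * total)


# Hypothetical download counts for five songs
downloads = [120, 45, 30, 8, 2]
print(gini(downloads))
```

The double loop is quadratic in the number of songs, which is harmless for a small experimental market but would call for a sorted-cumulative-sum formulation on large data.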

The first group, however, contains papers that use methods without reflecting on the assumptions of either the researchers or the methods. In this case, the results are debatable. Two papers that fall into this category are a paper on the shifting meaning of class in the twentieth century (Kozlowski, Taddy, and Evans 2019) and one on partisanship in US-American politics (Hunzaker and Valentino 2019).

Both papers use rather recent machine learning methods to uncover textual clues that answer their research questions. However, both rest on many assumptions that remain undiscussed in the main text. One fundamental question left unanswered is this: Can the data actually be described by a continuous function, as is necessary as soon as a model-based classifier (“neural network”) is involved? Other assumptions concern the words chosen to define the dimensions along which the remaining vocabulary is sorted (Kozlowski et al.), or the idea that moderates would necessarily skew the results if you are interested in partisan politicians. These choices always feel right on a gut level, but whether they actually are remains unclear.

And then there is the middle group, the “sceptics.” The sceptics refrain from using methods loosely, because they are convinced that no model can solve the problem at hand on its own. The most famous critic, though not the first, is certainly Leo Breiman (2001). In “Statistical Modeling: The Two Cultures,” he distinguishes two “cultures” of data modelling: one that tries to find a model resembling the data at hand in order to understand it, and one that tries to find models fit for the analytical task at hand, which he dubs “algorithmic modeling.” The gist of his argument is summarised in this great statement:

Statisticians in applied research consider data modeling as the template for statistical analysis: Faced with an applied problem, think of a data model. This enterprise has at its heart the belief that a statistician, by imagination and by looking at the data, can invent a reasonably good parametric class of models for a complex mechanism devised by nature. Then parameters are estimated and conclusions are drawn. (Breiman 2001, 202)
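To make Breiman’s contrast concrete, here is a deliberately crude toy sketch in Python (my own illustration, not Breiman’s). The hypothetical “mechanism devised by nature” is y = x², which the analyst does not know. The data modeller assumes a straight line, estimates its parameters by least squares, and would draw conclusions from them; the algorithmic modeller uses a one-nearest-neighbour predictor with no assumed functional form and judges it purely by its error on held-out points. All function and variable names are hypothetical.

```python
def fit_linear(xs, ys):
    """'Data modelling': assume y = a + b*x and estimate a and b by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_nearest_neighbour(xs, ys):
    """'Algorithmic modelling': no assumed form, predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(truth, predictions):
    """Mean squared error on held-out data, the yardstick of the algorithmic culture."""
    return sum((t - p) ** 2 for t, p in zip(truth, predictions)) / len(truth)


# Hypothetical mechanism y = x**2, unknown to the analyst
train_x = [i / 10 for i in range(-20, 21, 2)]  # -2.0, -1.8, ..., 2.0
test_x = [i / 10 for i in range(-19, 20, 2)]   # -1.9, -1.7, ..., 1.9
train_y = [x ** 2 for x in train_x]
test_y = [x ** 2 for x in test_x]

linear = fit_linear(train_x, train_y)
nearest = fit_nearest_neighbour(train_x, train_y)
linear_error = mse(test_y, [linear(x) for x in test_x])
nearest_error = mse(test_y, [nearest(x) for x in test_x])
print(linear_error, nearest_error)
```

On this toy data the nearest-neighbour predictor has a far lower held-out error, while the fitted linear slope is essentially zero by symmetry, so a data modeller reading the parameters would wrongly conclude that x has no effect on y. The nearest-neighbour model, in turn, predicts well but says nothing about the mechanism itself.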

Blindly searching for a model that approximates the data well resembles a circular argument: it ignores the actual analytical task and stops short of considering the mechanisms that generated the data in the first place. The first to make this argument explicit, however, were White et al., as early as 1976:

We argue, instead, that sociological analysis needs explicit models of the structures in observed populations, not measures or statistical indices of deviations from some convenient ideal structure. (White, Boorman, and Breiger 1976, 737)

A more contemporary version of this insight comes from Grimmer and Stewart, who note with regard to text analysis that no method accurately represents the reality around us, but that with proper validation we can still use such models:

And because text analysis methods are necessarily incorrect models of language, the output always necessitates careful validation. For supervised classification methods, this requires demonstrating that the classification from machines replicates hand coding. For unsupervised classification and scaling methods, this requires validating that the measures produced correspond with the concepts claimed. (Grimmer and Stewart 2013, 28)
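As a minimal sketch of the first kind of validation Grimmer and Stewart describe, the Python snippet below compares hypothetical machine labels with hypothetical hand codes. It reports raw percent agreement and Cohen’s kappa, which corrects agreement for what two coders would reach by chance. The choice of kappa is my assumption, not something the quoted passage prescribes, and the function names and labels are illustrative.

```python
from collections import Counter


def percent_agreement(machine, hand):
    """Share of documents where the machine label equals the hand code."""
    if not hand or len(machine) != len(hand):
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(m == h for m, h in zip(machine, hand)) / len(hand)


def cohens_kappa(machine, hand):
    """Agreement corrected for the agreement expected by chance."""
    observed = percent_agreement(machine, hand)
    n = len(hand)
    machine_counts, hand_counts = Counter(machine), Counter(hand)
    labels = set(machine_counts) | set(hand_counts)
    # Chance agreement: probability both coders pick the same label independently
    expected = sum(machine_counts[l] * hand_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return float("nan")  # both coders used one and the same label: kappa undefined
    return (observed - expected) / (1 - expected)


# Hypothetical hand codes and machine labels for eight documents
hand = ["class", "class", "other", "other", "class", "other", "class", "other"]
machine = ["class", "class", "other", "class", "class", "other", "other", "other"]
print(percent_agreement(machine, hand), cohens_kappa(machine, hand))  # 0.75 0.5

# A "classifier" that always says "other" on skewed hand codes
skewed_hand = ["other"] * 9 + ["class"]
always_other = ["other"] * 10
print(percent_agreement(always_other, skewed_hand), cohens_kappa(always_other, skewed_hand))  # 0.9 0.0
```

The second example shows why the chance correction matters: a classifier that labels every document “other” reaches 90 percent agreement on skewed hand codes, yet its kappa is zero.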

Why am I telling you this? Let us return to the German Werturteilsstreit from the beginning. In my opinion, while no human being can be fully unbiased, it is still possible to try to be as objective as we can. If we, as scientists, make as many of our assumptions as possible explicit by rigorously reflecting on our own opinions, we can at least come close to scientific objectivity.

The same applies to models: only if we really think about the underlying assumptions of data models and algorithms can we be objective and do science to a degree where our results fulfil the basic requirements of science: objectivity, validity, and reproducibility.

Salganik et al. used a single, appropriate method to analyse data for a single purpose. What they missed were some long-term implications, but later research has since corrected this. Kozlowski et al. in particular, however, used methods that are good at analysing large swathes of text without theoretically justifying their choice of corpus, their choice of method, and their choice of words.

Sure enough, word embeddings can be interpreted geometrically using cosine similarities (I won’t go into much detail, but in short, the cosine similarity of two word vectors gives you a number that tells you how close the meanings of the two words are). But if you trace a single concept (“class”) across a full century using a single corpus that mixes various types of text, without taking different styles of writing into account, and simply assume that certain biases will cancel each other out, you may be trusting your methods a little too much.
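For readers unfamiliar with the geometry: the cosine similarity of two word vectors u and v is u·v / (‖u‖‖v‖) and ranges from −1 to 1. The Python sketch below computes it and, roughly in the spirit of Kozlowski et al.’s approach as I understand it, builds a “cultural dimension” by averaging the difference vectors of antonym pairs and projects a word onto that dimension. The three-dimensional vectors are invented for illustration (real embeddings are learned from a corpus and have hundreds of dimensions), and all names are hypothetical. The point is only that the score a word receives depends on which word pairs you choose.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def dimension(pairs, vectors):
    """Average the difference vectors of antonym pairs, e.g. ("rich", "poor")."""
    diffs = [[a - b for a, b in zip(vectors[x], vectors[y])] for x, y in pairs]
    return [sum(column) / len(diffs) for column in zip(*diffs)]


# Hypothetical toy embeddings; real ones are trained on a corpus
vectors = {
    "rich":   [0.9, 0.1, 0.3],
    "poor":   [-0.8, 0.2, 0.3],
    "elite":  [0.7, 0.9, -0.2],
    "common": [-0.3, 0.1, -0.1],
    "opera":  [0.2, 0.9, 0.1],
}

# Same word, two different choices of anchor words for the "class" dimension
wealth_axis = dimension([("rich", "poor")], vectors)
status_axis = dimension([("elite", "common")], vectors)
print(round(cosine(vectors["opera"], wealth_axis), 2))  # 0.16
print(round(cosine(vectors["opera"], status_axis), 2))  # 0.76
```

Nothing about “opera” changed between the two lines; only the researcher’s choice of anchor words did, and that choice alone moves the word from barely related to strongly associated with “class.”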

Machine learning opens up great new research avenues, but these methods are still in their early phases, and we, as researchers, need to fully understand what they can and cannot do before using them for higher-order tasks. Using methods always involves a trade-off. Either you use something that has been developed for decades; then you can be sure that the method does what you intend it to do and rest assured that your results show what you think they show. Or you use something shiny and new and run the risk that you have not actually found what you hoped to find, in which case your results may be questionable. (There is a great book by John Levi Martin, “Thinking Through Methods,” which is very good at showing that what you think you are doing is never what you are actually doing.)

The more state-of-the-art research I read (often abbreviated SOTA; remember this, because I might one day be swallowed up by the abbreviations of my field), the more aware I become of this trade-off. And the more aware I become of it, the less reasonable I find the majority opinion among data scientists and artificial intelligence practitioners, who seem to see no methodological flaws in their methods. And that, finally, might also explain why the discourse around artificial intelligence is such a car wreck to look at.


  • Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures.” Statistical Science 16 (3): 199–231.
  • Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97.
  • Hunzaker, M.B. Fallin, and Lauren Valentino. 2019. “Mapping Cultural Schemas: From Theory to Method.” American Sociological Review 84 (5): 950–81.
  • Kozlowski, Austin C., Matt Taddy, and James A. Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84 (5): 905–49.
  • van de Rijt, Arnout. 2019. “Self-Correcting Dynamics in Social Influence Processes.” American Journal of Sociology 124 (5): 1468–95.
  • Salganik, Matthew J., Peter Sheridan Dodds, and Duncan J. Watts. 2006. “Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market.” Science 311 (5762): 854–56.
  • White, Harrison C., Scott A. Boorman, and Ronald L. Breiger. 1976. “Social Structure from Multiple Networks. I. Blockmodels of Roles and Positions.” American Journal of Sociology 81 (4): 730–80.
