Paths in strange spaces, part II
by Danielle Navarro, 24 Nov 2019
[WORK IN PROGRESS. NOT READY FOR HUMAN CONSUMPTION!!]
I’m caught up in something I don’t get,
And I don’t understand how I got here,
And I’m losing everything I knew,
And it was all for you.
Could you be a little easier?
Could you be a little easier on me?
– Leddra Chapman, A Little Easier
The first half of this post paints a bleak picture of the scientific process. Not for the first time, I argued that in most situations facing psychological scientists, there is little reason to be worried about “p-hacking” per se because the p-values we report in our papers were never fit for purpose in the first place. Nor do I let Bayesians such as myself off the hook. While discussion among psychological methods researchers tends to focus on the pathologies of p-values in both the presence and absence of preregistration, statisticians are quick to point out that in practice the Bayes factor – often touted as the Bayesian alternative to orthodox hypothesis tests – has pathologies of its own. If preregistration is unlikely to provide me with the oracular foreknowledge I need to construct a Neyman-admissible decision procedure, it is hardly any better equipped to provide me with the precise knowledge I require to specify priors, particularly not for complicated models that require me to think about high-dimensional parameter spaces. Taken at face value, I seem to be arguing a rather nihilistic position. Nothing works. Everything is broken. Inference is a doomed enterprise. Perhaps we are deluding ourselves to think that there is any hope…
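library(dplyr)     # provides the %>% pipe
library(jasmines)

# assumption: stands in for the page background colour defined elsewhere in the post
pagecolour <- "white"

use_seed(126) %>%
  entity_heart(1000, size = 10) %>%
  unfold_tempest(iterations = 50, scale = .1) %>%
  style_ribbon(
    type = "curve",
    size = .5,
    alpha = c(.5, .02),
    palette = palette_manual("gray"),
    background = pagecolour
  )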
On epistemological anarchy
In my early 20s I read Paul Feyerabend’s (1975) classic work on the philosophy of science, provocatively entitled Against Method, and I hated it. Part of the reason I hated it so much is the way his work was introduced to me in my undergraduate philosophy of science class (yes, I actually took one!). As it was described to me, Feyerabend was arguing that there is nothing special that differentiates science from any other belief system, that scientific methodology adds nothing worthwhile, and that when it comes to making inferences about our world, there is only one principle: “anything goes”. I approached the book with a very hostile mindset, and to me it seemed disorganised, unscientific, and riddled with logical errors. In retrospect, I suspect mine was an uncharitable reading. Because I’ve lost my copy of the book, I’ll cheat again and use the Stanford Encyclopedia of Philosophy to supply the context I was missing:
By the early 1970s Feyerabend had flown the falsificationist coop and was ready to expound his own perspective on scientific method. In 1970, he published a long article entitled “Against Method” in which he attacked several prominent accounts of scientific methodology. In their correspondence, he and Lakatos subsequently planned the construction of a debate volume, to be entitled For and Against Method, in which Lakatos would put forward the “rationalist” case that there was an identifiable set of rules of scientific method which make all good science science, and Feyerabend would attack it. Lakatos’ unexpected death in February 1974, which seems to have shocked Feyerabend deeply, meant that the rationalist part of the joint work was never completed.
In other words, the intended structure of the work was one that should be familiar to most of us as scientists: on the one side (Lakatos) we have theory building, and on the other (Feyerabend) we have theory criticism. These two components are supposed to work together, and as individual scientists we try (hope) to engage in both sides of this process iteratively: we build a theory, attack it, build a new one when it fails, and so forth. Both are necessary, and in retrospect I think Against Method is a less impressive work than For and Against Method would have been. In any case, here’s the summary of Feyerabend’s argument:
Against Method explicitly drew the “epistemological anarchist” conclusion that there are no useful and exceptionless methodological rules governing the progress of science or the growth of knowledge. The history of science is so complex that if we insist on a general methodology which will not inhibit progress the only “rule” it will contain will be the useless suggestion: “anything goes”. In particular, logical empiricist methodologies and Popper’s Critical Rationalism would inhibit scientific progress by enforcing restrictive conditions on new theories. The more sophisticated “methodology of scientific research programmes” developed by Lakatos either contains ungrounded value-judgements about what constitutes good science, or is reasonable only because it is epistemological anarchism in disguise. The phenomenon of incommensurability renders the standards which these “rationalists” use for comparing theories inapplicable.
This summary matches my recollection of the book rather well… and it mirrors my own experience as a scientist rather well too. For example, in my own area of research there is a degree of tension between the “Bayesian models of cognition” school of thought that views human inductive reasoning as a form of probabilistic inference, the “heuristics and biases” school that assumes our reasoning is based on simple, error-prone approximations, and the “connectionist” school of thought that emphasises the importance of parallel distributed computation and the underlying cognitive architecture. These three different frameworks are largely incommensurable. I’ve used all three at different stages of my career, and while most of my work falls within the “Bayesian cognition” framework I don’t necessarily think it is “better” than the other two.
I’m not even sure the question makes sense. Much as Kuhn points out in The Structure of Scientific Revolutions, each paradigm emphasises different empirical phenomena, selects different operationalisations, and produces formal models that have different intended scope. While in the long run – of course – we would hope to construct a single unifying framework that encompasses all of human cognition, we are not even remotely close to that point. Trying to decide which of these three incommensurable paradigms is “least wrong”, when the reality is almost certainly that all three are spectacularly wrong, is just silly. Right now, with the cognitive science literature being where it stands, all three paradigms offer useful insights, and it is a good thing that we as a discipline retain all three.
The Stanford Encyclopedia entry continues:
At a time when Kuhn was downplaying the “irrationalist” implications of his own book, Feyerabend was perceived to be casting himself in the role others already saw as his for the taking.
I respect Feyerabend a lot for this. I did read Kuhn’s book too (though I don’t remember it very well) and I did get the impression that he was a little nervous about the entailments of the “incommensurability” problem, and sought to hide or minimise them. Feyerabend did not shy away from it, and as a result Against Method is a very provocative and unsettling read for a scientist.
On modesty and the scientific bootstrap
Oh dear. I seem to be digging myself into a deeper and deeper hole. I started this post with some statistical concerns about p-values and apparently I’m now at the point of endorsing epistemological anarchy? Really? That’s… not a good place for a scientist to be! Well… maybe it’s not so bad. To me, the major, substantive point that Feyerabend made is this one:
The history of science is so complex that if we insist on a general methodology which will not inhibit progress the only “rule” it will contain will be the useless suggestion: “anything goes”.
I think this is entirely correct. There are no hard and fast rules for good science, no magical set of procedures that we can follow that will guarantee – not even to a known probability of error – discovery of truths. To me it seems logically incoherent even to imagine that such a set of rules could be proposed by humans. It would be a different story if we already knew the truth. If we already knew the truth about our world, and how observations can be made within that world, then we would be able to work out what inferential rules make sense for that world. Until we have such a complete understanding, however, we are relying on our best guess about the structure of the world to work out what the rules for learning about the world should be! As scientists we are hoping to bootstrap our way to the truth.
To me that seems like a very reasonable strategy, and I can’t think of anything better, but let’s not pretend that we really know what we’re doing here. We’re all making it up as we go along, to the best of our ability, hoping not to make a mess of everything. Under the circumstances, I think a little modesty in our scientific and statistical claims would be in order, no?
In the garden of forking paths
Besides the importance of being modest, my little story about reading Feyerabend in the misspent years of my youth contains the kernel of a defence for preregistration. Why was I so hostile to Feyerabend the first time I read Against Method? Mostly it was because of my history: I brought my own preconceptions to the book that were based on someone else’s reading of the book (i.e., my undergrad lecturer) and that led me to emphasise some things and not others. I chose those parts of the book that seemed most relevant to me at the time, and those choices shaped my conclusions. Not only that, I was unaware of Feyerabend’s history. I did not know that the work was originally intended to be joint work with Lakatos. Had I read the intended work, For and Against Method, I suspect I’d have come to different conclusions.
Sure, when I sat down as an impressionable 20-something to read the book, I read it with what I thought of as an open mind, and the book itself is what it is, but this “state” is not sufficient to properly describe or make sense of the situation. The earlier states matter too. The book has a history, a path that brought this specific volume to me, and I had a path that brought me to read it. Both of those histories matter. Knowing those histories better, as I do now, leads to a rather different impression of the book (and the reader!) than one might have without knowing this history. If you want to understand my reading of Against Method you need to know where it came from, and where I came from.
So it goes in scientific research also… if you want to understand any scientific work properly, you need to know its history. You need to know where the data are coming from and what inferences the experiment was designed to support (and what inferences it was not). Because as psychologists we have only limited knowledge of what is and is not relevant to the study of a phenomenon, the only plausible mechanism we have is to document as much as we possibly can, as carefully as we possibly can. Epistemic modesty requires that we acknowledge our own limitations as researchers; we all make mistakes, we all miss details that matter, and the best hope we have for making progress – in my view at least – is to document the path we took through “the garden of forking paths” in the hope that anyone who wants to build from our work can “backtrack”, auditing and retracing the process. It also, I think, entails a principle of kind critique. Acknowledging our own flaws requires us to avoid harshness in how we evaluate the work of others; to the extent that we begin to endorse a culture of harsh criticism, we encourage others to be competitive, defensive and hostile. This is the antithesis of what we should desire in a scientific process, I think.
The parable of Jasmine and Rosemary
Throughout this post I’ve been inserting small snippets of code that construct generative artwork using the jasmines R package that I’ve been slowly developing for myself. The story behind the jasmines package is an interesting one, because I started it as an exercise in reverse engineering – I wanted to understand how Thomas Lin Pedersen was creating beautiful pieces like this one:
As a matter of personal aesthetic principle Thomas doesn’t release the source code to his artwork, which does make it tricky to work out what he’s doing. Over time my goals with jasmines have shifted, but my initial intent was simple: work out how the magic of Thomas’ art is performed and construct my own code that could approximate the behaviour of his system. This endeavour has a lot in common with how I do science: “out there” in the world there is a system (the mind) whose behaviour I don’t understand, and my goal when developing computational models of cognition is to develop a formal system whose properties I do understand and whose behaviour approximates (in some limited way) that unknown system. Theoretical modelling in science is fundamentally an exercise in reverse engineering: trying to work out how system A works by building another system B that you control, and whose behaviour is the same as system A. The way I wrote (and am still writing) the jasmines package provides a neat way of thinking about how I do science.
As is usually the case in science, my artistic project did not start as a blank slate. I began the project using some clues that the world had already given me. For instance, I knew that Thomas wrote the ambient package that provides an R interface into the C++ FastNoise library, and I’d played with the ambient package before, enough to know you can use it to generate a variety of different textures. I also knew from Twitter that Thomas had been playing around with curl noise shortly before these lovely pieces started to appear. So it was natural to expect that the ambient::curl_noise() function would be the place to start, and the jasmines::unfold_tempest() function is built on top of it.
My first efforts did not resemble Thomas’ in any meaningful sense:
This is very pretty, but it tells me very little about how Thomas performs his magic. If you squint hard enough you might suspect that there is something in common between the long flowing tendrils in Thomas’ piece and those in my piece, but it’s hardly very compelling. I retained the source code for this “constellations” piece, and continued tinkering. I rewrote various functions in the jasmines package, I tried new parameter settings and I started generating other pieces. Most of this process was exploratory. Sometimes I would create things that felt qualitatively similar to Thomas’ pieces. For example, while there is almost no pixel-by-pixel match between Thomas’ art and this piece…
or this piece…
… it is hard to avoid the intuition that these pieces share something with Thomas’ at a fundamental level, in a way that the constellations piece does not. As time went on, I created more and more pieces using the jasmines package, and from these creations I started to extract some general sense of what Thomas is doing when he creates his artwork. I liked these experiments in generative artwork enough that I decided to bundle them together into a separate R package that I called rosemary.
The two packages have a symbiotic relationship. It is immediately obvious that the rosemary package is deeply reliant on the toolkit provided by jasmines: it is a collection of experiments designed using jasmines, and cannot function without it. However, it is no less true to note that jasmines depends on rosemary. The experiments that I ran using rosemary are my only mechanism to link my artistic “theory” (jasmines) with the underlying phenomenon (Thomas’ artwork) that I’m studying. Whenever I create a new artistic work with rosemary, it influences how I think about how the generative system needs to be structured, and guides the next step in the development of jasmines.
This kind of “virtuous cycle” is how we hope our scientific processes unfold. We rely on our theoretical insights (jasmines) to design experiments (rosemary) that allow us to modify our theories, design new experiments, and so on. The interplay between these two components is – we hope – the process that allows our theories to better approximate the truth, and our experimental results to better target the phenomenon of interest (I’ve talked about this iterative process before). However… the scientific bootstrap is not magic, and if we are not careful we may find ourselves wandering around at random, making little progress.
The principle that underpins the scientific bootstrap, as I see it, is one of incremental improvement. Each new experiment that Rosemary conducts provides a constraint that Jasmine can incorporate into her theories, giving Rosemary an opportunity to design a better experiment, and so on. In an ideal world, every time Jasmine adjusts her theory she should do so in a way that is consistent with Rosemary’s history. Jasmine is very reactive, and her theory building is almost entirely post hoc, adapting to each new piece of evidence that Rosemary provides. However, she must operate within tight constraints: she must maintain backward compatibility with previous experiments by Rosemary. As the volume of experiments increases, this is much harder to do than it looks. As an example, a couple of weeks ago I pushed an innocuous-looking update to the jasmines package that broke backward compatibility and rendered a few of my very favourite pieces irreproducible:
You know, the whole purpose of writing integration tests was to prove that my artwork is reproducible… it wasn't supposed to reveal that it is not. Sheesh. 🙄 pic.twitter.com/T5pgIoapIn— Danielle Navarro (@djnavarro) December 27, 2019
The vast majority of the output was fine but a smallish number of the pieces had distortions introduced to the palettes, and it took me several hours of work to figure out what had happened. Why did it distort such a small number of pieces, and why did it do so only in this specific way?
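The post doesn’t show what these integration tests look like. As a rough sketch of the general idea, assuming the output of a piece can be captured as an R object, a reproducibility test saves a reference copy once and then, after every code change, regenerates the piece and compares the two. The names make_piece() and check_reproducible() are hypothetical, and a data frame of random points stands in for a rendered image.

```r
# A minimal sketch of a reproducibility ("snapshot") test in base R.
# make_piece() is a hypothetical stand-in for a function that renders a
# piece; here it just returns the random points that would be drawn.
make_piece <- function(seed, n = 5) {
  set.seed(seed)
  data.frame(x = runif(n), y = runif(n))
}

# Step 1 (done once, when the piece is first created): store a reference copy.
reference_file <- tempfile(fileext = ".rds")
saveRDS(make_piece(seed = 126), reference_file)

# Step 2 (run after every code change): regenerate the piece and compare it
# with the stored reference, value for value.
check_reproducible <- function(seed, reference_file) {
  current  <- make_piece(seed)
  expected <- readRDS(reference_file)
  identical(current, expected)
}

check_reproducible(126, reference_file)  # TRUE while make_piece() is unchanged
```

The point of a check like this is that a change which silently alters the output fails loudly instead of going unnoticed, even when the code that caused it looks harmless.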
Path tracing in scientific hypothesis spaces
By the time I encountered this problem I’d created a great many pieces with rosemary, and the code base for the jasmines package had become a sprawling mess. This is of course to be expected in any exploratory process, be it scientific or artistic in nature. The underlying process is a search for something, and as you react to different cues in your environment you can end up tracing a very strange path and leaving quite a mess behind you. When something breaks your connection to your past, it is easy to find yourself completely, hopelessly lost.
What saved me is that I had documentation. Both of my packages were developed using git for version control, and I put a modest amount of work into ensuring that each change I made to the source code was added to the repositories with a somewhat informative commit message. It took me a while but I found the specific, utterly innocuous-looking change that had broken my code. There was a line of code that I had used to count the number of colours needed to draw an image that was unnecessarily restrictive, so I modified it so that it would work for a broader range of possible scenarios. The old code only gave a meaningful answer in some cases, so I wrote a new version that always returned the same answer in those cases, but would also work in other cases.
You’d think that would ensure reproducibility of my code, right? The new version always returned the same number as the old one in every situation where the old one worked. So it’s functionally “the same” as far as all retrospective cases are concerned and nothing should break. Unfortunately, this intuition is wrong. The two functions produce their answers by invoking the random number generator (RNG), and although both versions produce the same answer, they invoke the RNG a different number of times to do so. When executed within R sessions that have the same RNG state, they always produce the same output, and so one is tempted to conclude that they are equivalent, but because they leave the RNG in different states, any subsequent computations that rely on the RNG will no longer yield the same output. A simplified illustration of the problem I encountered can be found here and here.
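To make the failure mode concrete, here is a minimal R sketch of the same kind of bug. The two helpers, n_colours_old() and n_colours_new(), are hypothetical stand-ins rather than the actual jasmines code: each returns the same count, but the runif() calls inside them (standing in for whatever RNG-dependent work the real function did) consume a different number of random draws. Compared on their return values alone they agree; the damage only shows up in whatever the script computes next.

```r
# Hypothetical helpers: same answer, different RNG consumption.
n_colours_old <- function(palette) {
  invisible(runif(1))   # stand-in for RNG-dependent work: consumes one draw
  length(palette)
}

n_colours_new <- function(palette) {
  invisible(runif(2))   # "improved" version: consumes two draws
  length(palette)
}

palette <- c("red", "green", "blue")

set.seed(1)
old_count <- n_colours_old(palette)
old_next  <- runif(1)   # a later computation that relies on the RNG

set.seed(1)
new_count <- n_colours_new(palette)
new_next  <- runif(1)   # the same later computation

identical(old_count, new_count)  # TRUE: the helpers return the same count
identical(old_next, new_next)    # FALSE: everything downstream has shifted
```

This is also why testing the helper’s return value in isolation is not enough: the check that matters is whether everything computed after it still matches.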
Corrupting the state of the RNG is one particularly insidious way to break the reproducibility of one’s code, but as any software developer will tell you, there’s no shortage of frustrating ways to accidentally break code, even when you are being careful and using good coding practice, especially when you are involved in a collaborative project where you rely on other people’s code and are not always aware of exactly what that code does and how it changes over time. As I said at the time I discovered my RNG state bug, when seeking to write reproducible code, dark shapes move beneath us…
TL;DR: When it comes to reproducibility…https://t.co/Zr2ImhLIyE— Danielle Navarro (@djnavarro) December 27, 2019
Documentation in a time of anarchy
In keeping with my terribly meandering habits when writing blog posts, this discussion has meandered across a variety of topics. Even so, I think there is a single underlying theme to all this, and an important cautionary note for psychologists and other scientists. Whenever you are investigating a complicated system that you barely understand, there is a need for “epistemic modesty”. We need to recognise that there are limits to the diagnosticity of experiments, to the informativeness of our theories and to the relevance of our statistical tools. Very often we don’t even know what is relevant to a phenomenon and what is irrelevant. We will – more often than not, I suspect – turn out to be wrong in what we infer from our data.
If this is the case, what hope do we have for incremental science? If it is in fact true (and I suspect that it is) that most of our empirical findings are wrong and our theoretical models poorly constructed, how will we ever “build” on previous findings? If you endorse a full blown “epistemological anarchist” view the way that Feyerabend does (and the older I get the more my view does start to look rather anarchic), what hope do we have?
The answer, I think, lies in meticulous documentation. In any given project I will always try my very best not to make mistakes, not to rely on foolish assumptions, and so on, but there are so very many ways to make a mistake that it is almost inevitable that I will slip up somewhere. This is exactly the kind of scenario I found myself in with the Rosemary/Jasmine code. I tried so hard to preserve reproducibility, to write good code, and I still made mistakes – but because I left behind a trail showing exactly what I had done and what decisions I had made at each step, I was able to work out what my error was and fix it. I think this principle holds more generally, and highlights the overwhelming importance of transparency and documentation. If someone else wants to rely on my work (even if that’s just me a few months later), it’s not sufficient for me to simply assert “I did X and found Y” the way that a brief report journal article often does. I need to give you more details than that. I need to leave behind a rich trail of breadcrumbs, exposing all the decisions I made and my reasons for making them. In an ideal world my work should “speak for itself”, and it should be possible for anyone reading my 4000-word brief report to extract what they need. The real world, however, is less than ideal, and documentation is critical.
Preregistration as a documentation system
Returning at long last to preregistration, I hope it is clear that while I have been deeply skeptical of the idea that psychologists should be using preregistration to prevent p-hacking (see part 1 of this post), I am extremely sympathetic when people advocate preregistration as a tool to improve documentation and the transparency of the research process. The only sense in which I have “reservations” about preregistration in this context is that I worry that it doesn’t go far enough. Here’s what I wrote in my original blog post. There I was talking primarily about the kind of computational modelling work that I do, but I suspect the point has value for other situations as well.
There are reasons why one might want to employ something akin to preregistration here: building a new computational model is a creative and iterative process of trying out different possible models, evaluating them against data, revising the model and so on. As a consequence, of course, there is quite rightly a concern that any model that emerges will be overfit to the data from my experiments. There are tools that I can use to minimize this concern (e.g., cross validation on a hold-out data set, evaluating the model on a replication experiment, and so on), but to employ them effectively I need to have alternatives to my model, and this is where an extremely high degree of transparency is important. Should someone else (even if that’s just me a few months later) want to test this model properly at a later date, it helps to be able to follow my trail backwards through the “garden of forking paths” to see all the little decisions I made along the way, in order to ask what are the alternative models she didn’t build? To my mind this really matters – it’s very easy to make one’s preferred model look good by pitting it against a few competitors that you aren’t all that enthusiastic about. To develop a “severe test”, a model should be evaluated against the best set of competitors you can think of, and that’s the reason I want to be able to “go back in time” to various different decision points in my process and then try to work out what plausible alternatives might have emerged if I’d followed different paths.
With this in mind, I don’t think that (for example) the current OSF registration system provides the right toolkit. To produce the fine-grained document trail that shows precisely what I did, I would need to create a great many registrations for every project (dozens, at the very least). This is technically possible within the OSF system, of course, but there are much better ways to do it, because what I’m really talking about here is something closer to an “open notebook” approach to research, and there are other excellent tools that can support this. For my own part I try to use git repositories to leave an auditable trail of commit logs that can be archived on any number of public servers (e.g., GitHub, BitBucket, GitLab), and I use literate programming methods such as R Markdown and Jupyter notebooks to allow me to document my thinking on the fly during the model building process. Other researchers might have different approaches.
Although the goals of the open notebook approach are not dissimilar to preregistration insofar as transparency is relevant to both, there are a lot of differences. The workflow around the OSF registration system makes it easy to lodge a small number of detailed registrations, whereas a notebook approach based around git repositories emphasizes many small registrations (“commits”) and allows many paths to be explored in parallel in different branches of the repository. Neither workflow seems inherently better than the other in my view: they’re useful for different things, and as such it is best not to conflate them. Trying to force an open notebook to fit within a framework designed for preregistrations seems likely to cause confusion rather than clarity, so I typically don’t use preregistration as a tool for aiding transparency.
My point when writing this was not to suggest that nobody should use preregistration as a tool for aiding transparency. On the contrary, I think it is a very useful tool for many people. However, it is not the only tool in our toolbox, and there will be many occasions when it isn’t the right tool for the job. From my perspective, I’m in favour of any effort that a researcher can undertake that makes their research more transparent (within the limits of what is ethical), and any form of documentation of process they can provide.