Marginal Revolution: Why Most Published Research Findings are False

There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims. However, this should not be surprising. It can be proven that most claimed research findings are false. - John Ioannidis

The argument is from a paper by John Ioannidis, but Alex Tabarrok gives a much easier-to-read analysis of the fairly simple Bayesian reasoning behind it. Essentially, this is the classic problem of false positives vs. true positives when the condition being tested for is rare in the population (e.g. the presence of AIDS in non-high-risk groups, or in this case the truth of a hypothesis).
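To make the false-positive arithmetic concrete, here is a minimal sketch in Python. The numbers are my own illustrative assumptions, not figures from Ioannidis’ paper or Tabarrok’s post: a study that detects a true effect 80% of the time (power) and reports a false positive 5% of the time (alpha). The function name and parameters are hypothetical.

```python
def positive_predictive_value(prior, power, alpha):
    """Return P(hypothesis is true | study reported a positive), via Bayes' rule.

    prior: assumed fraction of tested hypotheses that are actually true
    power: P(positive result | hypothesis true)
    alpha: P(positive result | hypothesis false), the false-positive rate
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Illustrative (assumed) priors: the rarer a true hypothesis is,
# the larger the share of positive findings that are false.
for prior in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(prior, power=0.8, alpha=0.05)
    print(f"prior={prior}: P(true | positive) = {ppv:.2f}")
```

Under these assumed numbers, once only one tested hypothesis in a hundred is true, most positive findings are false even though each individual study looks reasonably rigorous.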

It might be tempting to argue that the odds of a hypothesis under test being true aren’t typically as bad as the general assumptions driving the argument suggest, since the researchers presumably have some thought or intuition that leads them to pick a particular hypothesis to test (they’re not just throwing darts at a board). But consider that this works both ways. Despite the common complaint that this or that study is “just another case of science proving what everybody already knows (and so a waste of money)”, I suspect very few researchers deliberately pick hypotheses that are widely believed to be true, particularly if there’s a lot of evidence and research backing up that belief. That’s not, generally speaking, believed to be the way to advance the frontiers of scientific knowledge. But in that case, the sample is biased in the other direction: a randomly chosen hypothesis would include already-known-to-be-true hypotheses in the same proportion that they occur in the population of all hypotheses, so the hypotheses actually attracting attention are less likely to be true than random chance would dictate. Whether the scientist’s intuition towards selecting true hypotheses is a bigger bias than the elimination of all the ones already believed to be true is something you can’t really be sure of, so I’d be really cautious about asserting that P(hypothesis is true) must be a lot better than Ioannidis’ calculations allow for.

Monday, November 19th, 2007

More Nerd Pr0n

SAGE: Open Source Mathematics Software

General and Advanced Pure and Applied Mathematics
Use SAGE for studying a huge range of mathematics, including algebra, calculus, elementary to very advanced number theory, cryptography, numerical computation, commutative algebra, group theory, combinatorics, graph theory, and exact linear algebra.

Recently I’ve found myself having to do real mathematics for the first time in many years. Surprisingly, despite being a programmer of actuarial math calculations, there’s not a lot of call for solving equations; the algorithms seldom change. This year, though, we’ve been rushing to implement changes for the Pension Protection Act which have made things very much more complicated and changed the way we calculate benefits in a big way. And what I’ve found is that I’d forgotten a lot of what I once knew.

So I decided to refresh my memory of a lot of the math I once studied as an undergrad, and, being a nerd, one of the first things I did (after buying a couple of books) was to go fishing around for some software to play with. I was particularly interested in stuff that would let me model and graph equations, to try to regain some intuitive sense of their behavior, and I wanted something fairly easy to program. While it’s possible to write spreadsheets to validate the actuarial calculations I’ve been working on (and I have), it’s a bitch-and-a-half to read them again later or debug them. And if possible I wanted it to be open-source.

SAGE is what I found, and to my delight it’s yet another application that makes heavy use of Python. And, no, I didn’t go looking on a Python site to find these. Straight Google searches turned up both SAGE and the NLTK. It’s no coincidence that applications looking for a way to provide straightforward but powerful programming tend to be built on or with Python, but it wasn’t one of my search criteria.

Again I installed it on both my Windows and Mac boxen, and again the Windows installation was a bit more straightforward, though this time not by much. In the case of Windows, you first have to install a VMware player (free, but not open source) so that SAGE can run in its own virtual machine, and then you have to configure your firewall (I use ZoneAlarm) so that you can hit the web server that SAGE runs (if you’re going to use the graphical interface, which is built as a web application). In the case of Mac, StuffIt repeatedly had problems opening the .tar.gz file and I ended up just downloading it and unzipping and untarring it from the command line; after that, running the setup.py script was straightforward.

Once you have it set up, it’s a breeze to use. You can run it from a command prompt (in fact, you have to start it that way), but the most convenient way to use it is to run a “notebook” sub-application that sets up a web server; surf to that server on your localhost and you get a graphical interface (web-page) that lets you create and manage “notebook” pages–basically persistable interactive sessions. You can even upload these sessions to public instances of SAGE running on the internet (for instance at the University of Washington math department); in fact a great way to explore SAGE is to surf there, create an account and start playing around.

Like most really powerful pieces of software, there is a learning curve to using SAGE, and it’s a steeper curve if you don’t know any Python, but a lot of the most basic stuff (assigning variables, solving simple equations) is pretty much exactly what you expect. Tip: Type what you want to evaluate into a box on the screen (the boxes are for code). Press Shift+Enter to have SAGE evaluate what you’ve entered in the box and create a new empty box below. To edit a box you’ve already evaluated, use your mouse to put the cursor in it. To insert a new empty box before an existing box, hover your mouse above the top border of that box until you see a bluish-purple line across the page, then click.

I’d also recommend starting with the SAGE Programming for Newbies link from the SAGE Documentation, even though it’s incomplete, rather than the SAGE Tutorial. The Tutorial jumps right into operator precedence, “rings” and other such minutiae without even stopping to explain what’s facing you at the prompt once you’ve completed the install. SAGE Programming for Newbies is a much gentler introduction (and you can skip the parts that are too gentle, like “what is a computer”).

SAGE is also built to interface with other standard mathematics software packages, like Maple, Mathematica, MATLAB, and so forth, if that floats your boat. I don’t have access to them, but I can see how that would be useful.

What can I say? I find this kind of thing really, really cool.

Wednesday, November 7th, 2007