Genetics researchers using Excel: not a good combination, since Excel (and other spreadsheet apps) autoformat gene names like SEPT2 to the 2nd of September, which is not a gene but a date. Oops.
The story was in the news all over the place (it was triggered by this paper). The problem is not new: there is an older paper on Excel's malicious number conversions, which also shows some practical workarounds.
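To make the failure mode concrete, here is a minimal Python sketch that scans a column of gene symbols for values that look as if they have already been turned into dates. It assumes an English locale, where Excel typically displays SEPT2 as "2-Sep". The function name flag_date_like and the sample list are hypothetical, and other locales or Excel versions may display converted dates differently, so treat the pattern as a starting point rather than a complete check.

```python
import re

# Month abbreviations as Excel typically shows them in an English locale.
# Assumption: other locales or Excel versions may display converted dates differently.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches day-month cells ("2-Sep") and month-number cells ("Dec-01").
DATE_LIKE = re.compile(rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}})$", re.IGNORECASE)


def flag_date_like(values):
    """Return (row index, original value) for cells that look like auto-converted dates."""
    return [(i, v) for i, v in enumerate(values) if DATE_LIKE.match(v.strip())]


genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "SEPT2", "Dec-01"]
print(flag_date_like(genes))
# [(1, '2-Sep'), (3, '1-Mar'), (5, 'Dec-01')]
```

A check like this cannot recover the original symbol, so flagged rows still need to be compared against the original source data.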
While the recent examples focused on biology, other sciences might have similar problems, be it due to autoformatting, a lack of needed statistical methods or something else: Excel is not made for use in science. This naturally leads to calls for scientists to learn a »proper« way to do analysis, like R, Julia or Python, as mentioned in this article.
Even though I think it is a great idea to learn and use one of these, spreadsheet apps like Excel have some advantages that most »proper« tools don't have:
Actions can be (at least partly) done through a graphical interface, and the application provides direct feedback on user actions: if you change something, you see the effect immediately. Contrast this with Julia, R, or Python, where you execute a command and usually get no feedback at all. You can make an error and only notice it 10 minutes later. (This happened to me last week; luckily, it was easy to spot.)
This is particularly important in the early stages of analysis, when data cleaning and preparation take place. In a spreadsheet, you can easily rename columns (in R it is something like colnames(myData)[2] <- "newnameCol2"), highlight all missing or undefined values, or make a simple scatterplot. Yes, all of these are easy, and even quicker, if you know the right commands in R (except perhaps for the highlighting; would that even be possible?). But often I don't, and that is when an application with a UI shines.
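For comparison, the two cleaning steps mentioned above, renaming a column and finding undefined values, might look like this in Python using only the standard library. This is a sketch under assumptions: the sample data, the helper names (load_rows, rename_column, missing_cells) and the set of strings treated as missing are all hypothetical. Instead of coloured cells, it prints the positions of missing values, which illustrates the feedback gap described above: you only see what you explicitly ask for.

```python
import csv
import io

# Hypothetical example data; in practice this would be read from a file.
RAW = """sample,expression,gene
s1,2.5,TP53
s2,,BRCA1
s3,NA,SEPT2
"""

# Assumption: these strings mark undefined values in the source data.
MISSING = {"", "NA", "NaN"}


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def rename_column(rows, old, new):
    """Return new rows with column `old` renamed to `new`, keeping column order."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def missing_cells(rows):
    """List (row index, column name) for every cell holding a missing-value marker."""
    return [(i, k) for i, row in enumerate(rows) for k, v in row.items() if v.strip() in MISSING]


rows = rename_column(load_rows(RAW), "expression", "expression_level")
print(list(rows[0]))        # ['sample', 'expression_level', 'gene']
print(missing_cells(rows))  # [(1, 'expression_level'), (2, 'expression_level')]
```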
So for many early-stage tasks, spreadsheet apps may make sense and may even reduce errors thanks to immediate feedback. This is certainly not an endorsement of spreadsheet apps as a great tool for statistical analysis in science. But it highlights that, for important standard tasks in standard situations, they satisfy user needs like immediate feedback, data exploration and quick execution of common tasks better than »proper« tools do.
Why do scientists use Excel instead of »proper« tools? by Jan Dittrich is licensed under a Creative Commons Attribution 4.0 International License.
- 2018-08-20: If you are interested in organizing your data in spreadsheets, read, well, Data Organization in Spreadsheets (Karl W. Broman & Kara H. Woo (2018) Data Organization in Spreadsheets, The American Statistician, 72:1, 2-10, DOI: 10.1080/00031305.2017.1375989).
- 2018-08-20: Have a look at the EuSpRIG, the European Spreadsheet Risk Interest Group.