Many of us became Excel experts when we were undergraduates, and with good reason. It is easy to correct errors in your data or your analysis when you can see the data. Also, the WYSIWYG graphics tools are simple to use and can generate pleasant illustrations. (They can also create abominations, but that’s another story for another time.)
The fact is, however, that Excel or other spreadsheet software is usually not the right tool to use for graduate or other academic research. There are several reasons for this:
- Excel is not reproducible. Point-and-click steps leave no record, so five or six months after you did that analysis, it can be very hard to figure out what exactly you did.
- Excel has a limited sheet size (1,048,576 rows by 16,384 columns in Office 2010). That’s better than it used to be, but if you’ve got mountains of data, you’re sunk. And let’s face it: if you have 17 million data points, you’re never going to want to visually inspect them anyway.
- Excel’s statistics tools are limited and somewhat weak. Try running a regression with two or more explanatory variables, and you’ll see what I mean.
- Excel is limited in the platforms it runs on. The Mac version is buggy, and nothing exists for Linux (except this hack).
If Excel doesn’t work, what should you replace it with? I submit that R is the program for you.
- R is command-driven, so you are essentially forced to record the steps you take in data management and analysis (see the short sketch after this list).
- R is limited only by the size of your computer’s memory.
- R has a worldwide community of users and developers, many of whom are professional statisticians, computer scientists, or other experts.
- R is free and open-source, meaning you can install it on whatever machine you want, whenever you want. It runs the same way on Windows, macOS, and Linux.
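To make the first point concrete, here is a minimal sketch of what a command-driven analysis looks like in R. The file name, column names, and model formula are hypothetical placeholders, but every step is written down and can be re-run months later, which is exactly what Excel makes hard.

```r
# A minimal, hypothetical example of a scripted analysis in base R.

# Read the data (assumes a CSV with columns y, x1, and x2)
dat <- read.csv("survey_data.csv")

# Quick numeric summary of every column
summary(dat)

# A regression with two explanatory variables -- painful in Excel,
# one line here
fit <- lm(y ~ x1 + x2, data = dat)
summary(fit)

# Save the fitted model so the analysis can be inspected or re-run later
saveRDS(fit, "fit.rds")
```

Because the whole analysis lives in a script, rerunning it on corrected data means sourcing the file again rather than retracing a series of clicks.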
R has a fairly steep learning curve, and the application itself feels clunkier than competitors like Stata. But I have found that the documentation and the help community are better, and RStudio provides a fairly elegant interface.
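One concrete upside of that documentation: R’s help system ships with the language, so you can look things up without leaving the console. These are all standard base R commands:

```r
?lm                        # open the help page for lm()
help.search("regression")  # search installed documentation by keyword
example(lm)                # run the worked examples from the lm() help page
```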