A large part of R’s success is the ecosystem of open-source packages that add functionality beyond what the core installation of R (referred to as “base R”) includes.

CRAN

The Comprehensive R Archive Network (CRAN) is the official central repository for R packages. Packages submitted to CRAN go through automated quality checks and human review before they are published to CRAN mirrors around the world. It is through CRAN that you install packages with the install.packages() function, or via the Packages pane in the bottom right of RStudio.

One of the nicer aspects of having CRAN as a centralized resource is that, well, you don’t actually have to visit the site very often. When you use the install.packages() function without any of the optional parameters, R automatically looks for the package on CRAN, downloads it, and installs it. There is no need to download a file and then double-click to install it.

Using a Package

This will become absolutely second nature, but it’s one of those things that is not 100% intuitive at first. When you’re using a package (which, really, means you’re using one or more of the functions in a package), there are two things that have to happen:

  1. The package has to be installed. This is what is described above. The first time you use a package, you will need to type install.packages("[the package name]") (note that the name of the package is inside quotation marks) in the console and press <Enter>. That will pull the package (and any packages the package depends on – few packages are built entirely from scratch!) from CRAN. And…it will show up in the Packages tab in the bottom right of your RStudio environment. This is a one-time step, although there is no harm in installing the same package multiple times, and, occasionally, you will find that a package has been updated and needs to be re-installed.

  2. The package has to be loaded. In your script (or in the console), enter library([the package name]) (unlike install.packages(), library() does not require quotation marks around the package name, although it accepts them). Typically, you will have a list of these at the beginning of your scripts (although there are other techniques for centralizing a list of packages you commonly use…we’re not going to go there for now). Once a package is loaded, it will show with a checkbox next to it in the Packages list in your RStudio environment.

When you get a “could not find function” error when running a script, 9 times out of 10 that’s because the function you’re calling is in a package that isn’t loaded.
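To make the installed-versus-loaded distinction concrete, here is a minimal sketch. It assumes nothing beyond a standard R installation: it uses splines, a package that ships with R itself, so nothing has to be downloaded. The variable names are just for illustration.

```r
# "splines" ships with R, so it is installed but not loaded by default
pkg <- "splines"

# Step 1 check: is the package installed in your local library?
is_installed <- pkg %in% rownames(installed.packages())

# Step 2 check: loaded packages appear on the search path as "package:<name>"
loaded_before <- paste0("package:", pkg) %in% search()

# Load the package (no quotation marks needed)
library(splines)

loaded_after <- paste0("package:", pkg) %in% search()

# bs() is a function in splines; calling it before library(splines)
# is the kind of thing that produces a "could not find function" error
bs_found <- exists("bs", mode = "function")

print(c(installed = is_installed, loaded_before = loaded_before,
        loaded_after = loaded_after, bs_found = bs_found))
```

The same checks work for any package name you swap in for splines, although a package from CRAN would need the install.packages() step first.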

Dynamically Installing Packages

Throughout this site, as well as in examples across the interwebs, you will see code examples that look something like this:

# This installs googleAnalyticsR if you haven't got it installed already
if(!require(googleAnalyticsR)) install.packages("googleAnalyticsR")

# This then loads the library
library(googleAnalyticsR)

The if() statement checks whether the package can be loaded: require() tries to load it and returns FALSE if the package is not installed. If it is already installed, then we don’t want the script to re-install it. While there is (generally) no harm in re-installation, it’s unnecessary and will tack on a few seconds to the running time of your script.

If, however, the package is not already installed, then you generally want to install it. Otherwise, the script will break when you try to load the library or use any of the functions within it.
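If your script needs several packages, the same idea can be wrapped up once rather than repeated. The following is only a sketch: missing_packages() is a made-up helper name, the nonexistent package name is a deliberate placeholder, and stats and utils stand in for whatever packages your script really uses (they ship with R, so nothing gets downloaded here).

```r
# require() returns TRUE if the package loaded, and FALSE (with a warning) if not
has_stats <- require(stats)
has_fake  <- suppressWarnings(
  require("aPackageThatDoesNotExist", character.only = TRUE)
)

# Hypothetical helper: which of these packages are not installed yet?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Swap in the packages your own script depends on
needed <- c("stats", "utils")
to_install <- missing_packages(needed)

# Only go out to CRAN when something is actually missing
if (length(to_install) > 0) install.packages(to_install)

# character.only = TRUE lets library() take a name stored in a variable
for (p in needed) library(p, character.only = TRUE)
```

One design note: requireNamespace() checks that a package is available without attaching it, which keeps the check separate from the library() calls that follow.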

Some Useful Packages

This list simply cannot be comprehensive, but, to give you a sense of the breadth and scope of R Packages, below are a few representative examples:

Accessing Web Analytics, Search, and Social Media Data

Many of the platforms that digital analysts work with already have one or more packages developed for accessing their data quickly and easily. See the pages under WORKING WITH APIs in the I/O section of this site for a list of packages that are particularly useful to digital analysts.

Some Other Packages You Will Get Familiar With

You will come to know the name “Hadley Wickham,” as he is quite possibly the most influential R user (and package and content creator) of the modern era. He also now works for RStudio. All of the packages below are part of what has been dubbed “the Hadleyverse,” because Wickham was key in their creation. He’s actually been working to rebrand this as “the tidyverse” because, well, that’s how he rolls.

For a more comprehensive list of packages in the Hadley-tidyverse, you can check out The Hitchhiker’s Guide to the Hadleyverse. But, here, we’re just going to list a small handful of those packages that are practically core to using R:

  • ggplot2 – It’s a package with its own web site! ggplot2 is a very powerful data visualization package. It can also be confusing to learn, as you have to throw out the window how you usually think about constructing charts and graphs. We’ll be using ggplot2 in this course a bit, but know that it takes a while to fully learn. You can check out this site for some examples of visualizations with ggplot2.
  • dplyr – in some respects, this is a non-essential package, in that there are “base R” ways to do everything that its handful of functions do. On the other hand…that handful of functions, as well as the %>% operator (which is available once you’ve loaded dplyr…but actually comes from magrittr, another package that dplyr uses) can streamline the heck out of your code!
  • lubridate – it’s a play on words: you’re “lubricating the process of working with dates” (I think). Dates come in all sorts of formats and structures, and this package makes short work of working with them.
  • tidyr – this is a package for getting your data into a “tidy” format…which, put in Excel terms, means a bunch of rows and a limited number of columns so that you could pivot the data very easily (because data that can be easily pivoted can be easily subsetted, plotted, summarized, and aggregated).
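To give a feel for three of the packages above, here is a small sketch on made-up data (the visits data frame and its values are invented for illustration). Each package-specific part only runs if that package is installed, and the tidyr part assumes tidyr 1.0.0 or later, which is where pivot_wider() and pivot_longer() were introduced.

```r
# Made-up example data: sessions per channel per day
visits <- data.frame(
  date     = c("2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02"),
  channel  = c("organic", "paid", "organic", "paid"),
  sessions = c(120, 80, 150, 60),
  stringsAsFactors = FALSE
)

# The "base R" way: total sessions per channel
base_totals <- aggregate(sessions ~ channel, data = visits, FUN = sum)

# The dplyr way: the same totals, read top to bottom with %>%
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  dplyr_totals <- visits %>%
    group_by(channel) %>%
    summarise(sessions = sum(sessions))
}

# lubridate: turn "year-month-day" text into real Date values
if (requireNamespace("lubridate", quietly = TRUE)) {
  visit_dates <- lubridate::ymd(visits$date)
}

# tidyr: spread the data wide (one column per channel), then back to tidy
if (requireNamespace("tidyr", quietly = TRUE)) {
  wide <- tidyr::pivot_wider(visits, names_from = channel,
                             values_from = sessions)
  long_again <- tidyr::pivot_longer(wide, cols = c(organic, paid),
                                    names_to = "channel",
                                    values_to = "sessions")
}
```

The base R and dplyr versions produce the same totals; the difference is mostly in how the code reads once the steps start to pile up.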

If you’re not worried about cluttering up your system with too much stuff, there is actually a tidyverse package that is just a collection of all of the packages in the tidyverse. So, you get all of the above (and a lot, lot more) with just install.packages("tidyverse"), which you can then load and use with library(tidyverse).

GitHub

There are also thousands of more experimental packages that are only available through GitHub. These packages have become much easier to work with since the introduction of the devtools package, which, via its install_github() function, provides access to this universe of packages through scriptable commands.

Beware, though, that the packages on GitHub are, by their nature, more experimental, so exercise due care to ensure they are trustworthy!
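As a rough sketch of what that looks like in a script: install_from_github() below is a made-up wrapper name, and "owner/repository" is a placeholder rather than a real package. The wrapper is only defined here, not called, so running this block installs nothing.

```r
# Hypothetical wrapper around devtools::install_github()
# repo is a string in the form "owner/repository"
install_from_github <- function(repo) {
  # devtools itself comes from CRAN, so install it first if it is missing
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github(repo)
}

# Usage (placeholder repository, not run here):
# install_from_github("owner/repository")
```

Once a package has been installed this way, you load it with library() exactly as you would a package from CRAN.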