A large part of R’s success is the ecosystem of open-sourced packages that add extra functionality to what the core installation of R (referred to as “base R”) includes.

CRAN

The Comprehensive R Archive Network (CRAN) is the official central repository for R packages, where human-reviewed packages go through quality checks for publishing on the CRAN network around the world. It is through CRAN that you install packages with the install.packages() function, or via the Packages pane in the bottom right of RStudio.

One of the nicer aspects of having CRAN as a centralized resource is that, well, you don’t actually have to visit the site very often. When you use the install.packages() function without any of the optional parameters, R automatically looks for the package on CRAN, downloads it, and installs it. There is no need to download a file and then double-click to install it.

Using a Package

This will become absolutely second nature, but it’s one of those things that is not 100% intuitive at first. When you’re using a package (which, really, means you’re using one or more of the functions in a package), there are two things that have to happen:

  1. The package has to be installed. This is what is described above. The first time you use a package, you will need to type install.packages("[the package name]") (note that the name of the package is inside quotation marks) in the console and press <Enter>. That will pull the package (and any packages the package depends on – few packages are built entirely from scratch!) from CRAN. And…it will show up in the Packages tab in the bottom right of your RStudio environment. This is a one-time step, although there is no harm in installing the same package multiple times, and, occasionally, you will find that a package has been updated and needs to be re-installed.

  2. The package has to be loaded. In your script (or in the console), enter library([the package name]) (unlike install.packages(), the name of the package is not placed in quotation marks when using library()). Typically, you will have a list of these at the beginning of your scripts (although there are other techniques for centralizing a list of packages you commonly use…we’re not going to go there for now). Once a package is loaded, it will show with a checkbox next to it in the Packages list in your RStudio environment.

When you get a “function not found” error when running a script, 9 times out of 10 that’s because the function you’re calling is in a package that isn’t loaded.

Dynamically Installing Packages

Throughout this site, as well as in examples across the interwebs, you will see code examples that look something like this:

# This installs googleAnalyticsR if you haven't got it installed already
if(!require(googleAnalyticsR)) install.packages("googleAnalyticsR")

# This then loads the library
library(googleAnalyticsR)

The if() statement is simply conditionally checking to see if the package is already installed. If it is, then we don’t want the script to re-install it. While there is (generally) no harm in re-installation, it’s unnecessary and will tack on a few seconds to the running time for your script.

If, however, the package is not already installed, then you generally want to install it. Otherwise, the script will break when you try to load the library or use any of the functions within it.

Some Useful Packages

This list simply cannot be comprehensive, but, to give you a sense of the breadth and scope of R Packages, below are a few representative examples:

Accessing Web Analytics, Search, and Social Media Data

Many of the platforms that digital analysts work with already have one or more packages developed for accessing their data quickly and easily. See the pages under WORKING WITH APIs in the I/O section of this site for a list of packages that are particularly useful to digital analysts.

Some Other Packages You Will Get Familiar With

You will learn to know the name “Hadley Wickham,” as he is quite possibly the most influential R user (and package and content creator) of the modern era. He also has now worked for RStudio for a number of years. Most of the packages below are part of what used to be called “the Hadleyverse,” because Wickham was key in their creation. But, Wickham was successful at rebranding these as “the Tidyverse” because, at their core, they’re focused on working with data that is in a “tidy” format (which we’ll get to later).

The creators of this site recommend working in the tidyverse world, which means almost every one of their scripts installs the tidyverse package (library(tidyverse)), which is a quick way to install all (or…most?) of the packages in the tidyverse (there are some Tidyverse-adjacent packages that are part of the Tidyverse, but that aren’t part of the core Tidyverse, so they have to be installed separately if they are being used). There is a whole website that goes into great detail as to the tidyverse packages. Some of the key packages in the Tidyverse are:

  • ggplot2 – It’s a package with its own web site! ggplot2 is a very powerful data visualization package. It is also extremely confusing to learn, as you have to throw how you think about the construction of charts and graphs out the window. We’ll be using ggplot2 in this course a bit, but know that it takes a while to fully learn. You can check out this site for some examples of visualizations with ggplot2.
  • dplyr – in some respects, this is a non-essential package, in that there are “base R” ways to do everything that its handful of functions do. On the other hand…those handful of functions, as well as the %>% operator (which is available once you’ve loaded dplyr…but actually comes from another package that dplyr uses called magrittr) can streamline the heck out of your code!
  • tidyr – this is a package for getting your data into a “tidy” format…which, put in Excel terms, means a bunch of rows and a limited number of columns so that you could pivot the data very easily (because data that can be easily pivoted can be easily subsetted, plotted, summarized, and aggregated).
  • purrr – as you dive into R, you will hear that, in general, you should avoid “loops” in your code. This is a somewhat unique aspect of R and its “vector operations” ability. This will all become clearer (and downright magical) later, but, suffice it to say that this package provides some really slick functions for working with functions and vectors.

There are other packages in the core Tidyverse, but we don’t want to overwhelm you (yet). There are a few other packages that are super common, so it’s good to know about them:

  • lubridate – it’s a play on words: you’re “lubricating the process of working with dates”. Dates come in all sorts of formats and structures, and this package makes short work of working with them. This package is considered part of the Tidyverse, but it’s not part of the core Tidyverse.
  • scales – this package is great when working with how “guides” (legends and axes) get displayed and formatted, so it is often handy to have when working with ggplot2. Seriously. If you want to have a comma included in your large numbers, scales is the way to go.

Github

There are also thousands of more experimental packages that are only available through Github. These packages have become much easier to work with since the introduction of the package devtools, which via its function install_github, provides access to this universe of packages through scriptable commands.

Beware, though, that the packages on Github are, by their nature, more experimental, so exercise due care to ensure they are trustworthy!