Credit to Jamarius Taylor for thinking up this one. The idea is that visitors that are new have the opportunity to return to the site (just like customers who buy once have an opportunity to buy more). Obviously, not every site can reasonably expect all visitors to return, and, even those that do will have differing timeframes within which they would expect visitors to return.

Nonetheless, this seems to be an illustrative example. We’re going to use a waterfall chart for the visualization. The waterfalls package offers one option for doing this, but that doesn’t exactly seem to be a mainstream package, so this example simply relies on plain ol’ ggplot2 and the various machinations required therein.

Website Churn

If a website has zero churn, then we would expect each week’s users to be the number of users from the _previous week plus the number of new users from the current week. This assumes that active users return at least once per week (which is fair, if, clearly, an ideal that is unachievable):

$[\#\ of\ Users\ Current\ Week] = [\#\ of\ Users\ Previous\ Week] + [\#\ of\ New\ Users\ Current\ Week]$

Therefore, one way to calculate the churn is to take that theoretical ideal and subtract the actual number of users from it. That would be the number of non-returning users:

$[Churned\ Visitors] = [\#\ of\ Users\ Previous\ Week] + [\#\ of\ New\ Users\ Current\ Week] - [\#\ of\ Users\ Current\ Week]$

The churn rate would then be:

$[Churn\ Rate] = \frac{[Churned\ Visitors]}{[\#\ of\ Users\ Previous\ Week]}$

We’re not actually going to calculate Churn Rate in this example…but we could.

Setup/Config

Be sure you’ve completed the steps on the Initial Setup page before running this code.

# Load the necessary libraries. These libraries aren't all necessarily required for every
# example, but, for simplicity's sake, we're going ahead and including them in every example.
# The "typical" way to load these is simply with "library([package name])." But, the handy
# thing about using the approach below -- which uses the pacman package -- is that it will
# check that each package exists and actually install any that are missing before loading
# the package.
if (!require("pacman")) install.packages("pacman")
tidyverse,         # Includes dplyr, ggplot2, and others; very key!
devtools,          # Generally handy
googleVis,         # Useful for some of the visualizations
scales)            # Useful for some number formatting in the visualizations

# Authorize GA. Depending on if you've done this already and a .ga-httr-oauth file has
# been saved or not, this may pop you over to a browser to authenticate.

ga_auth(token = ".ga-httr-oauth")

# Set the view ID and the date range. If you want to, you can swap out the Sys.getenv()
# call and just replace that with a hardcoded value for the view ID. And, the start
# and end date are currently set to choose the last 30 days, but those can be
# hardcoded as well.
view_id <- Sys.getenv("GA_VIEW_ID")
totals_and_churn_df$newUsers[1] <- NA # Display the results head(totals_and_churn_df) date users newUsers previous_week_users churned_users 2018-33 1063 NA 0 NA 2018-34 1121 934 1063 876 2018-35 1049 866 1121 938 2018-36 992 825 1049 882 2018-37 966 807 992 833 2018-38 1034 864 966 796 Prep the Data to Be a Ready for Plotting Unfortunately, waterfall charts are kind of tricky to do in ggplot2. But, we can use geom_rect() to draw this, which means we need to define the min and max for x and y for each box on the chart. We’ll go ahead and introduce some redundancy in the plotting for the sake of clarity by defining three types of bars that, for each week, will appear in this order: • newUsers – these are the new users for each week; we’ll plot these as green • churned_users – these are the churned users each week; we’ll plot these as red • users – these are the actual users for each week; we’ll plot these as gray The xmin and xmax for each of these will differ: newUsers will be offset to the left of the week, and users will be offset to the right. Yeesh. This is confusing. And it takes a bit of code to get that all worked out: # Add a few intermediate values that we'll use after we convert to a long forma waterfall_df <- totals_and_churn_df %>% mutate(week_max = users + churned_users, week_current_users = users) # Convert to long format waterfall_df <- waterfall_df %>% gather(key = "user_type", value = "number_of_users", -date, -previous_week_users, -week_max, -week_current_users, na.rm = TRUE) # Function to calculate xmin, xmax, ymin, and ymax when passed a row from the # data set. calc_rect_coords <- function(date, previous_week_users, week_max, week_current_users, user_type, number_of_users){ x_offset <- 0.25 # Base offset for x for rects bar_half_width <- 0.1 # Define the bar width if(user_type == "users"){ xmin <- as.numeric(date) + x_offset - bar_half_width - 0.02 # Make it a little wider xmax <- as.numeric(date) + x_offset + bar_half_width + 0.02 ymin <- 0 ymax <- number_of_users user_type = "Users" } else if(user_type == "churned_users"){ xmin <- as.numeric(date) - bar_half_width xmax <- as.numeric(date) + bar_half_width ymin <- week_current_users ymax <- week_current_users + number_of_users user_type = "Churned Users" } else # New Users { xmin <- as.numeric(date) - x_offset - bar_half_width xmax <- as.numeric(date) - x_offset + bar_half_width ymin <- previous_week_users ymax <- previous_week_users + number_of_users user_type = "New Users" } result = data.frame(date = date, user_type = user_type, xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax) } # Build out all of our rectangles. plot_data <- pmap_dfr(waterfall_df, calc_rect_coords) # This trashes our nice date factor, so drop the one that came back # and add on our original plot_data <- select(plot_data, -date) plot_data <- cbind(plot_data, waterfall_df$date)
plot_data <- plot_data %>%
select(date = waterfall_df\$date,
user_type,
xmin, xmax, ymin, ymax)

head(plot_data, 20)
date user_type xmin xmax ymin ymax
2018-33 Users 1.13 1.37 0 1063
2018-34 Users 2.13 2.37 0 1121
2018-35 Users 3.13 3.37 0 1049
2018-36 Users 4.13 4.37 0 992
2018-37 Users 5.13 5.37 0 966
2018-38 Users 6.13 6.37 0 1034
2018-39 Users 7.13 7.37 0 1098
2018-40 Users 8.13 8.37 0 1059
2018-34 New Users 1.65 1.85 1063 1997
2018-35 New Users 2.65 2.85 1121 1987
2018-36 New Users 3.65 3.85 1049 1874
2018-37 New Users 4.65 4.85 992 1799
2018-38 New Users 5.65 5.85 966 1830
2018-39 New Users 6.65 6.85 1034 1975
2018-40 New Users 7.65 7.85 1098 2007
2018-34 Churned Users 1.90 2.10 1121 1997
2018-35 Churned Users 2.90 3.10 1049 1987
2018-36 Churned Users 3.90 4.10 992 1874
2018-37 Churned Users 4.90 5.10 966 1799
2018-38 Churned Users 5.90 6.10 1034 1830

Data Visualization

With all that data prep, we can now plot the results. The way to read this is that, each week, there is a number of users. Then, for the next week, in order, we:

• Add the new users gained (new users for the week) – green
• Then, subtract the (calculated) churned users – red
• So we wind up with the total users for that week – gray

The thing that may be a little misleading here is that we’re making an assumption that “non-churned users” actually return to the site each subsequent week. But, that may not be too bad, especially if we’re looking to illustrate that it’s a constant battle to get new traffic to the site!

As one note, ideally, this chart would also include horizontal lines connecting the top of each Users bar to the bottom of each New Users bar, from the top of each New Users bar to the top of each Churned Users bar, and from the bottom of each Churned Users bar to the top of each Users bar. But, that’s an even bigger hassle, so it’s not included here.

# Define the theme
theme_waterfall <- theme_light() +
theme(legend.position = "top",
legend.title = element_blank(),
plot.title = element_text(hjust=0.5),
panel.grid.major.x = element_blank(),
panel.border = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())

# Build the actual plot. What we're actually doing here is drawing each bar as a
# separate rectangle.
gg_results <- ggplot(plot_data, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax,
fill = user_type, x=date)) +
geom_rect() +
scale_fill_manual(values = c("#f98e8e","#8ee26a","gray60")) +
scale_y_continuous(label = comma, expand = c(0,0)) +
labs(title = "Waterfall of Users Over Time", y="Users") +
theme_waterfall

# Output the plot
gg_results

This site is a sub-site to dartistics.com