Here are some good habits, learned from experience, that make working with R easier.
- Create a script and data folder in the project straight away.
- Keep scripts in the script folder (./scripts/your-script.R), and raw or processed data written to the data folder (./data/your_data.csv).
- Avoid setting the working directory (setwd()) in the scripts; it ruins portability.
- Use stop() to write informative error messages.
- Comment functions in roxygen format (#' lines above the function).
- Watch for misspelled names (rowname instead of row.name, for instance).
- Check that an object is the class a function expects (e.g. class(obj) == "list" not class(obj) == "data.frame"). Read the function documentation carefully.
- Use str() a lot.
- Look up help with ?[function name].
- paste0 - shortcut for paste("blah", "blah", sep = "")
- list() - it’s a way to group related objects:

stuff = list(mydata = data,
             the_author = "Bob",
             created = Sys.Date())
## can then access items via $
stuff$mydata
stuff$the_author
stuff$created
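
Several of the habits listed above (roxygen comments, stop() with an informative message, paste0() and str()) can be sketched together in one small function. The function name summarise_values and its arguments are hypothetical, chosen only for illustration:

```r
#' Summarise a numeric vector (hypothetical example function).
#'
#' @param x A numeric vector.
#' @return A list holding the mean and the number of values.
summarise_values <- function(x) {
  if (!is.numeric(x)) {
    # say what was expected and what was actually received
    stop("x must be numeric, but has class: ",
         paste0(class(x), collapse = "/"))
  }
  list(mean = mean(x), n = length(x))
}

result <- summarise_values(c(1, 2, 3))
str(result)  # inspect the structure of the returned list

# tryCatch() captures the error so its message can be looked at
msg <- tryCatch(summarise_values("a"),
                error = function(e) conditionMessage(e))
msg
```

A message that names the offending class tends to be quicker to act on than a bare "invalid input".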

- Lists work with lapply(), apply(), sapply(), etc.
- data.frames are special lists, just with equal-length elements.
- A named vector can act as a lookup table:

## example with some user Ids
lookup <- c("Bill","Ben","Sue","Linda","Gerry")
names(lookup) <- c("1231","2323","5353","3434","9999")
lookup
##    1231    2323    5353    3434    9999
##  "Bill"   "Ben"   "Sue" "Linda" "Gerry"
## this is a big vector of Ids you want to lookup
big_list_of_ids <- c("2323","2323","3434","9999","9999","1231","5353","9999","2323","1231","9999")
lookup[big_list_of_ids]
##    2323    2323    3434    9999    9999    1231    5353    9999    2323
##   "Ben"   "Ben" "Linda" "Gerry" "Gerry"  "Bill"   "Sue" "Gerry"   "Ben"
##    1231    9999
##  "Bill" "Gerry"
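
One thing worth knowing about the named-vector lookup is what happens when an ID has no entry: the result is NA rather than an error. The short sketch below shows this, with "0000" as a made-up ID that is deliberately absent from the lookup:

```r
lookup <- c("1231" = "Bill", "2323" = "Ben", "5353" = "Sue")

# "0000" is a hypothetical ID that is not in the lookup
ids <- c("2323", "0000", "1231")

# unname() drops the ID labels and keeps just the values
found <- unname(lookup[ids])
found

# IDs that had no match come back as NA, so they are easy to list
missing_ids <- ids[is.na(found)]
missing_ids
```

Checking for NA after the lookup is a cheap way to catch IDs that were mistyped or never registered.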

- Reduce() applies a function repeatedly over a list, combining each element with the result accumulated so far. A good example is reading a folder full of files:

## say you have lots of files in folder "./data/myfolder"
## we can use lapply on read.csv to read in all the files:
folder <- "./data/myfolder"
## full.names = TRUE keeps the folder in each path so read.csv can find the files
filenames <- list.files(folder, full.names = TRUE)
## a list of data.frames read from the csv
df_list <- lapply(filenames, read.csv)
## operate rbind (bind the rows) on the list, iteratively
all_files <- Reduce(rbind, df_list)
## all_files is now one big data.frame
## in one line:
all_files <- Reduce(rbind, lapply(filenames, read.csv))
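
To see what Reduce() does step by step, and to try the folder example without needing real files, here is a self-contained sketch. The temporary folder and the two small CSV files are made up for the illustration:

```r
# Reduce(f, list(a, b, c)) computes f(f(a, b), c)
total <- Reduce(`+`, 1:4)                      # ((1 + 2) + 3) + 4
steps <- Reduce(`+`, 1:4, accumulate = TRUE)   # keeps each intermediate result

# build a throwaway folder holding two small csv files
folder <- tempfile("myfolder")
dir.create(folder)
write.csv(data.frame(id = 1:2, x = c("a", "b")),
          file.path(folder, "one.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, x = c("c", "d")),
          file.path(folder, "two.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv can open directly
filenames <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
df_list <- lapply(filenames, read.csv)

all_files <- Reduce(rbind, df_list)

# do.call() passes the whole list to a single rbind() call instead
all_files_alt <- do.call(rbind, df_list)
```

With many files, do.call(rbind, df_list) is a common alternative because it binds everything in one call rather than pair by pair.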
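
- To drop the first column, use the_data[-1] where the_data is your data.frame. This is equivalent to the_data[,-1] as long as more than one column remains; with a single remaining column, the_data[,-1] returns a vector rather than a data.frame.
- To drop columns by name, use library(dplyr) and the select() function that lets you specify dropped columns like select(-notthisone, -notthisonetoo) or in base R use setdiff() against the names:

names(mtcars)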
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
new_data <- mtcars[ , setdiff(names(mtcars), c("mpg","disp","drat"))]
names(new_data)
## [1] "cyl"  "hp"   "wt"   "qsec" "vs"   "am"   "gear" "carb"
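
The two ways of dropping the first column differ in one edge case, and base R also has subset() as another way to drop columns by name. The sketch below uses only base R and the built-in mtcars data; the object names are illustrative:

```r
# a data.frame with only two columns shows the edge case
two_cols <- mtcars[, c("mpg", "cyl")]

class(two_cols[-1])                  # list-style indexing keeps a data.frame
class(two_cols[, -1])                # matrix-style indexing drops to a vector
class(two_cols[, -1, drop = FALSE])  # drop = FALSE keeps the data.frame

# dropping by name: setdiff() on the names, or subset() with a negative select
dropped <- c("mpg", "disp", "drat")
by_setdiff <- mtcars[, setdiff(names(mtcars), dropped)]
by_subset  <- subset(mtcars, select = -c(mpg, disp, drat))

identical(names(by_setdiff), names(by_subset))
```

When the number of remaining columns is not known in advance, the_data[-1] or drop = FALSE is the safer choice.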