Everything you store in R (datasets, variables, a list of village names, a total population number, even outputs such as graphs) is an object, which is assigned a name and can be referenced in later commands.
An object exists when you have assigned it a value (see the assignment section below). When it is assigned a value, the object appears in the Environment (see the upper right pane of RStudio). It can then be operated upon, manipulated, changed, and re-defined.
You create objects by assigning them a value with the <- operator. You can think of the assignment operator <- as the words “is defined as”. Assignment commands generally follow a standard order:
object_name <- value (or process/calculation that produces a value)
For example: while using an outbreak template you may want to record the current reporting week for later reference. In this example, the object reporting_week is created when it is assigned the character value "2018-W10". reporting_week will appear in the R Environment (upper-right pane) and can be referenced in later commands.
See the commands and their output in the boxes below. Note that the [1] in the output simply indicates that you are viewing the first (and only) item of the output.
reporting_week <- "2018-W10" # creates the object reporting_week by assigning a value
reporting_week # prints the current value of reporting_week
## [1] "2018-W10"
IMPORTANT: An object’s value can be overwritten at any time by running an assignment command to re-define its value. Thus, the order in which commands are run is very important.
reporting_week <- "2018-W51" # assigns a NEW value to the object reporting_week
reporting_week # prints the current value of reporting_week
## [1] "2018-W51"
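The value on the right-hand side of <- can also be the result of a calculation, and that result is not refreshed automatically when its inputs change later. The sketch below illustrates this; the object names and case counts are invented for illustration and do not come from the templates.

```r
# Hypothetical weekly case counts (invented for this example)
cases_week_10 <- 34
cases_week_11 <- 51

# The assigned value is the result of a calculation
total_cases <- cases_week_10 + cases_week_11
total_cases                       # 85

# Re-defining an input does NOT update objects calculated from it earlier
cases_week_11 <- 55
total_before_rerun <- total_cases # still 85

# The assignment must be run again to pick up the new value
total_cases <- cases_week_10 + cases_week_11
total_cases                       # 89
```

This is why running commands out of order can leave objects holding outdated values.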
Datasets are also assigned names and defined as objects when they are imported. In the code below:
The object linelist_raw is created and assigned the value of an imported CSV file
The object linelist_cleaned is created and assigned the value of linelist_raw
linelist_cleaned is re-defined as itself, but mutated to include a new variable, obs_days, representing the number of days between patient admission and exit
# here() comes from the here package and mutate() from dplyr; load them if they are not already loaded
library(here)
library(dplyr)
# linelist_raw is created and assigned the value of the imported CSV file
linelist_raw <- rio::import(here("linelist.csv"))
# linelist_cleaned is created and assigned the value of linelist_raw
linelist_cleaned <- linelist_raw
# linelist_cleaned is RE-defined as itself, but modified to include a new variable
linelist_cleaned <- mutate(linelist_cleaned,
                           obs_days = as.numeric(date_of_exit - date_of_admission))
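If you do not have the CSV file at hand, the same three steps can be tried on a small made-up data frame. This is only a sketch: the three patients and their dates are invented, and base R's transform() stands in for mutate() so that no extra packages are needed.

```r
# A tiny, invented stand-in for the imported linelist
linelist_raw <- data.frame(
  patient_id        = c("P1", "P2", "P3"),
  date_of_admission = as.Date(c("2018-03-01", "2018-03-04", "2018-03-10")),
  date_of_exit      = as.Date(c("2018-03-05", "2018-03-04", "2018-03-20"))
)

# linelist_cleaned starts as a copy of linelist_raw
linelist_cleaned <- linelist_raw

# linelist_cleaned is re-defined as itself plus a new column.
# Subtracting two Dates gives a time difference in days;
# as.numeric() turns that into a plain number.
linelist_cleaned <- transform(linelist_cleaned,
                              obs_days = as.numeric(date_of_exit - date_of_admission))

linelist_cleaned$obs_days   # 4, 0 and 10 days
ncol(linelist_raw)          # 3: the raw object is left untouched
ncol(linelist_cleaned)      # 4
```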
You can read more about importing and exporting datasets with the rio package in this vignette.
A quick note on naming of objects:
Objects can be a single piece of data (e.g. my_number <- 24), or they can consist of structured data. The graphic below, sourced from this online R tutorial, shows some common data structures and their names. Not included in this image is spatial data, which is discussed in another page.
Using the templates, you will most commonly encounter data frames and vectors:
Common structure | Explanation | Example from templates |
---|---|---|
Vectors | A container for a sequence of singular objects, all of the same class (e.g. numeric, character). | “Variables” (columns) in data frames are vectors (e.g. the variable age_years). |
Data Frames | Vectors (e.g. columns) that are bound together that all have the same number of rows. | linelist_raw and linelist_cleaned are both data frames. |
Note that to create a vector that “stands alone”, or is not part of a data frame (such as a list of location names), the function c() is often used:
list_of_names <- c("Ruhengeri", "Gisenyi", "Kigali", "Butare")
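A few base R functions help to inspect a stand-alone vector like this one. The sketch below also shows what "all of the same class" means in practice; the object mixed_values is a made-up name used only for illustration.

```r
list_of_names <- c("Ruhengeri", "Gisenyi", "Kigali", "Butare")

length(list_of_names)   # 4 items in the vector
class(list_of_names)    # "character"
list_of_names[3]        # "Kigali" (the third item)

# A vector holds one class only: mixing text and numbers
# silently converts everything to character
mixed_values <- c("Kigali", 24)
class(mixed_values)     # "character"
```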
$ to access/call variables
Vectors within a data frame (variables in a dataset) can be called or referenced using the $ symbol. The $ symbol connects the name of the column to the name of its dataset. Unless a function has already been told which data frame to use (as with mutate() above), the $ symbol must be used, otherwise R will not know where to look for or create the column.
# Retrieve the length of the vector age_years
# (age_years is in the data frame linelist_cleaned)
length(linelist_cleaned$age_years)
## [1] 300
By typing the name of the data frame followed by $ in RStudio, you will also see a list of all variables in the data frame. You can scroll through them using your arrow keys, select one with your Enter key, and avoid spelling mistakes!
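The $ symbol works both for reading an existing column and for creating a new one. The following is a minimal sketch on an invented data frame; linelist_demo and its values are made up for illustration, while the column names echo those used in the templates.

```r
# Small invented data frame
linelist_demo <- data.frame(
  age_years      = c(4, 26, 60),
  patient_origin = c("Kigali", "Butare", "Gisenyi")
)

# Read a column: the result is a vector
linelist_demo$age_years          # 4, 26, 60
mean(linelist_demo$age_years)    # 30

# Create a column: assigning to a name that does not exist yet adds it
linelist_demo$age_months <- linelist_demo$age_years * 12

names(linelist_demo)             # "age_years" "patient_origin" "age_months"
```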
All the objects stored in R have a class which tells R how to handle the object. There are many possible classes, but common ones include:
Class | Explanation | Examples |
---|---|---|
Character | These are text/words/sentences “within quotation marks”. Math cannot be done on these objects. | “Character objects are in quotation marks” |
Numeric | These are numbers and can include decimals. If within quotation marks they will be considered character. | 23.1 or 14 |
Integer | Numbers that are whole only (no decimals) | -5, 14, or 2000 |
Factor | These are vectors that have a specified order or hierarchy of values | Variable msf_involvement with ordered values N, S, SUB, and U. |
Date | Once R is told that certain data are Dates, these data can be manipulated and displayed in special ways. See the page on Dates for more information. | 2018-04-12 or 15/3/1954 or Wed 4 Jan 1980 |
Logical | Values must be one of the two special values TRUE or FALSE (note these are not “TRUE” and “FALSE” in quotation marks) | TRUE or FALSE |
data.frame | A data frame is how R stores a typical dataset. It consists of vectors (columns) of data bound together, that all have the same number of observations (rows). | The example AJS dataset named linelist_raw contains 68 variables with 300 observations (rows) each. |
You can test the class of an object by writing it within the function class(). Note: you can reference a specific column within a dataset using the $ notation to separate the name of the dataset and the name of the column.
class(linelist_raw$age_years)
## [1] "integer"
class(linelist_raw$patient_origin)
## [1] "character"
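class() can also be tried on single values, which is a quick way to see the classes from the table above. The values below are illustrative only.

```r
class("2018-W10")                   # "character"
class(23.1)                         # "numeric"
class(14)                           # "numeric" (whole numbers typed directly are numeric)
class(14L)                          # "integer" (the L suffix requests an integer)
class(TRUE)                         # "logical"
class("TRUE")                       # "character" (quotation marks make it text)
class(as.Date("2018-04-12"))        # "Date"
class(factor(c("N", "S", "SUB")))   # "factor"
class(data.frame(age_years = 1:3))  # "data.frame"
```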
The templates sometimes contain code converting objects between classes.
Function | Action |
---|---|
as.character() | Converts to character class |
as.numeric() | Converts to numeric class |
as.integer() | Converts to integer class |
as.Date() | Converts to Date class - Note: see section on dates for details |
as.factor() | Converts to factor - Note: re-defining order of value levels requires extra arguments |
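The conversion functions in the table can be sketched as follows. All object names and values here are invented for illustration, and the custom level order in the last step is an arbitrary example rather than the order used in the templates.

```r
# Numbers stored as text cannot be used in calculations until converted
age_text   <- c("23", "14", "61")
age_number <- as.numeric(age_text)
sum(age_number)                      # 98

as.integer(23.9)                     # 23 (the decimal part is dropped, not rounded)
as.character(14)                     # "14"

# Dates in year-month-day order convert directly;
# other layouts need a format argument describing them
as.Date("2018-04-12")
birth_date <- as.Date("15/3/1954", format = "%d/%m/%Y")
format(birth_date)                   # "1954-03-15"

# as.factor() orders the levels alphabetically by default
involvement <- c("S", "N", "U", "SUB", "N")
involvement_default <- as.factor(involvement)
levels(involvement_default)          # "N" "S" "SUB" "U"

# A different order requires factor() with an explicit levels argument
involvement_custom <- factor(involvement, levels = c("U", "SUB", "S", "N"))
levels(involvement_custom)           # "U" "SUB" "S" "N"
```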