Chapter 5 Functions and functional programming
In the previous chapter, we introduced data structures that serve as the building blocks for holding, manipulating, and analyzing data. Functions, such as c()
, data.frame()
, str()
, length()
, sum()
, and mean()
, are used to build data structures, describe characteristics of those structures, and summarize/analyze data held within those structures—functions are fundamental to the R language. Gaining a better understanding of existing functions and the ability to write your own functions dramatically increases what you can do with R.
In this chapter, we introduce some key elements of functional programming used in R and found in nearly all programming and scripting languages. We begin with describing characteristics of functions and how you can define and use your own functions. We then introduce several control statements commonly found in functions and used to manipulate inputs and form desired output. We motivate this initial exploration of functional programming using forestry topics and examples. We cover only those topics needed in subsequent chapters, hence this is a limited introduction. References for further readings are provided in the chapter summary.
5.1 Introduction to functions
A function has four key components:
- Name: adheres to R’s naming syntax (Section 3.2.2).
- Arguments: the inputs (optional). Arguments are variables in the function’s body.
- Body: performs some task.
- Return: the output (optional).
Let’s map these components to function pseudocode.31
name_of_function <- function(arg_1, arg_2) {
do something
return(something)
}
- Name: we assign the function to
name_of_function
. - Arguments: are defined between
function()
parentheses. The function has argumentsarg_1
andarg_2
. - Body: what the function will do is defined between the left and right curly brace
{}
. We wrotedo something
to indicate this is where the body of the function is written. - Return: before closing the function body, we define the function output within
return()
. While this component is optional—not all functions return something—most functions we write do return something.
Let’s look at a syntactically correct function.
Consider the four components:
- Name:
my_first_function
. - Arguments: this function has no arguments, i.e., there is nothing defined between the
function()
parentheses. - Body: tells R to print a fun quote by Alan A. Milne.
- Return: no objects are returned by this function.
Once defined, the function can be called. Below we call my_first_function()
.
#> [1] "Ever stop to think, and forget to start again?"
Here’s another function.
Consider the four components:
- Name:
pow
. - Arguments: this function has two arguments,
x
andv
. - Body: uses the values held in
x
andv
to computex^v
and assigns the result toresult
. - Return: returns the value held in
result
.
Let’s try it out.
#> [1] 32
#> [1] 1
#> Warning in x^v: longer object length is not a multiple
#> of shorter object length
#> [1] 4 NA 25
Notice in the last call to pow()
the return vector is the length of the longest input vector (in this case x
) and a warning is thrown. Given R’s recycling rules this behavior is expected, see Section 4.2.3 for topics on vector operations and recycling (here, to match length(x)
, v
is recycled to c(2, 4, 2)
before x^v
is computed).
5.1.1 Specifying arguments
When calling a function, arguments can be specified using their full name, partial name, or position.
We demonstrate each approach to calling a function using args_fn()
defined below.
args_fn()
takes two arguments first_arg
and second_arg
. The values for arguments are combined into a single vector and returned. To reduce clutter, we define the vector to be returned within return()
.
Let’s try calling the function using the full name of each argument. Below we set the arguments to values 1
and 2
, respectively.
#> [1] 1 2
When using the argument names, the order in which they are passed doesn’t matter.
#> [1] 1 2
Partial matching exists to save you from typing long argument names. With partial matching you can specify arguments using the first portion of an argument’s name as long as it’s unambiguous with names of other arguments. Consider the example below, where s
partially matches second_arg
and f
partially matches first_arg
.
#> [1] 1 2
Because it can lead to user mistakes and is not nice to read if you can’t recall what the partial argument name refers to, we rarely use partial matching in practice.
You can call a function without giving argument names, but they must follow the defined function argument order. This is often done in practice when we know the argument order and are feeling lazy and don’t want to type argument names.
#> [1] 1 2
5.1.2 Default values for arguments
Functions can be defined with default values for arguments. In such cases, the user can override a default value by specifying their own value for the argument. Consider the following example that defines a function fn()
with arguments a
and b
. Argument b
is defined with default value 1. After defining the function, it’s called using the default value for b
, and then again with b
set to 2.
#> [1] 2
#> [1] 3
Most functions introduced in previous chapters have default values for arguments. Some functions we’ll introduce later in the book have dozens of arguments and, fortunately, authors of those functions have set reasonable default values.
5.1.3 Scope rules
A variable defined using a function argument or within the function body is a local variable, meaning the variable does not exist outside the function. Consider the example below, where a
and b
are defined in fn()
and hence local to fn()
. When a
or b
are called outside the function, R throws an object not found error.
#> [1] 2
#> [1] 1
#> Error in eval(expr, envir, enclos): object 'b' not found
The code above illustrates a scope rule. A language’s scope rules determine where variables exist, their values, and how those values can be changed. In the code above, these rules dictate that a
and b
do not exist outside of fn()
.
Consider the next example, where b
is used in fn()
but not defined in fn()
. Here, b
is referred to as a free variable and it’s value will need to be set outside the function. Here too, b
is a global variable because it’s available outside and inside fn()
.
#> Error in fn(a = 1): object 'b' not found
#> [1] 2
#> [1] 11
Above, when fn()
is called, R first looks inside the function body to see if b
is defined, when it finds that b
has not been defined in the function body, it looks outside the function, which in this case is the environment from which fn()
was called. In the first call to fn(a = 1)
, the value for b
has yet to be defined, so an object not found error is thrown. Then, prior to the second call to fn(a = 1)
, a global b
is defined and equals 1, hence the free b
is 1 and the output is 2. Then, prior to the third call to fn(a = 1)
, the global b
is redefined to equal 10, hence the free b
is 10 and the output is 11.
So what’s the rule when b
is defined both inside and outside the function? Consider the example below.
fn <- function() {
b <- 1 # Local b.
return(b)
}
b <- 10 # Separate from fn()'s local b.
fn() # Uses the local b.
#> [1] 1
#> [1] 10
Above, when fn()
is called it always uses its local b
, i.e., the local b
is separate from the b
defined outside fn()
. Here, there are two variables with the same name but they exist separately because of their scope.
There are often times when the same variable name is used inside and outside of functions. In such cases, it’s critical to understand the basic scope rules illustrated above.
Functions are objects and hence follow the same scope rules as variables. You can have local and global functions. For example, a function defined within another function has local scope. Consider the following example.
fn_1 <- function(){
# Define local fn_2().
fn_2 <- function(){
print("I'm fn_2() in fn_1().")
}
fn_2() # Call local fn_2().
}
fn_1() # Will call its local fn_2().
#> [1] "I'm fn_2() in fn_1()."
#> Error in fn_2(): could not find function "fn_2"
Above, fn_2()
is local to fn_1()
. Each time fn_1()
is called, fn_2()
is defined and then called to print "I'm fn_2() in fn_1()"
. Then, consistent with local scope rules, when fn_2()
is then called outside fn_1()
an error message lets you know that fn_2()
could not be found.
We can make fn_2()
global to fn_1()
by defining it prior to calling fn_1()
as demonstrated below.
# Define global fn_2().
fn_2 <- function(){
print("I'm global fn_2().")
}
fn_1 <- function(){
fn_2() # Call global fn_2().
}
fn_1() # Will call global fn_2().
#> [1] "I'm global fn_2()."
Talking about scope using local and global terms is useful for building some initial intuition about how functions you write will behave, but it leads to a limited view of what’s going on behind the scenes. We encourage you to eventually seek a more formal understanding of R environments and scope because it will make you a more effective programmer and analyst. Wickham (2019) is an excellent resource for building understanding about these and related topics.
5.1.4 Checking function inputs
It’s easy to make mistakes while coding. One common mistake is providing the wrong data type or object as input to a function. A goal when writing functions is to catch such input errors before they create an unexpected function error or output.
Consider our earlier pow()
function that computes \(x^v\).
Clearly \(x\) and \(v\) should be numeric. So, what happens if we send in a character data type?
#> Error in x^v: non-numeric argument to binary operator
As you can see from the output above, when attempting "a"^2
R throws an error that stops the code. It’s easy to see where the error originates in this simple example. However, for more complex functions, an error might be more difficult to diagnose or, worse yet, the function does not throw an error and returns erroneous output. So, when writing a function, it’s good practice to include input checks and, if needed, stop the function early and provide informative error messages.
A simple approach to including input checks is to use base R’s stopifnot()
(stop if not). This function takes any number of logical expressions as inputs and if any expression is false it calls stop()
and produces an error message indicating the first false expression. Below we add stopifnot()
to pow()
to check that x
and v
are numeric, then test it out.
pow <- function(x, v) {
stopifnot(is.numeric(x), is.numeric(v))
result <- x^v
return(result)
}
# Try it out.
pow(x = "a", v = 2)
#> Error in pow(x = "a", v = 2): is.numeric(x) is not TRUE
You can add a name to each stopifnot()
expression, which is printed if the expression is false. As demonstrated below, this name can provide user-friendly guidance.
pow <- function(x, v) {
stopifnot("The x argument value should be numeric." = is.numeric(x),
"The v argument value should be numeric." = is.numeric(v))
result <- x^v
return(result)
}
# Try it out.
pow(x = "a", v = 2)
#> Error in pow(x = "a", v = 2): The x argument value should be numeric.
5.2 Control statements
5.2.1 if
and if else
statements
We often want to run different code conditional on characteristics of the data or object. This can be done using an if
statement, the pseudocode for which is given below.
if (condition) {
do something
}
The if
statement’s condition
argument is a single logical value. If condition
is TRUE
the code within the if
statement’s body is run. If condition
is FALSE
the code within the if
statement’s body is skipped. Importantly, the if
statement considers a single logical value for condition
, you cannot use a logical vector longer than 1. The condition
value often involves comparison and/or logical operators (see Section 4.7).
Consider the following example that tests if spp
is "Quercus rubrum"
and if true the common name is printed.
#> [1] "The species is northern red oak."
Often we want to run one bit of code if some logical condition is true, and run a different bit of code if the condition is false. This is accomplished using an if else
statement which has the following form.
if (condition) {
do something
} else {
do something else
}
Consider the following example that extends the previous if
example to print a message that lets us know the common name was not found.
spp <- "Acer saccharum"
if (spp == "Quercus rubrum") {
print("The species is northern red oak.")
} else {
print("The species' common name was not found.")
}
#> [1] "The species' common name was not found."
The if else
statement can be extended with an else if
to consider multiple conditions. The code within the first TRUE
else if (condition)
is run and evaluation of any subsequent else if
statements is skipped.
Below we use else if
to match spp
with different Latin names before printing the species’ common name was not found.
spp <- "Populus tremuloides"
if (spp == "Quercus rubrum") {
print("The species is northern red oak.")
} else if (spp == "Acer saccharum") {
print("The species is sugar maple.")
} else if (spp == "Populus tremuloides") {
print("The species is quaking aspen.")
} else {
print("The species' common name was not found.")
}
#> [1] "The species is quaking aspen."
Above, spp == "Quercus rubrum"
is FALSE
so the code next tests spp == "Acer saccharum"
which is also FALSE
, then it tests spp == "Populus tremuloides"
which is TRUE
so "The species is quaking aspen."
is printed.
Next, let’s embed the if else
code above in a function that prints the common name of a given species. We’ll call our function get_common()
.
get_common <- function(spp){
if (spp == "Quercus rubrum") {
print("The species is northern red oak.")
} else if(spp == "Acer saccharum") {
print("The species is sugar maple.")
} else if(spp == "Populus tremuloides") {
print("The species is quaking aspen.")
} else {
print("The species' common name was not found.")
}
}
# Try it out.
get_common("Quercus rubrum")
#> [1] "The species is northern red oak."
#> [1] "The species is quaking aspen."
#> [1] "The species' common name was not found."
5.2.2 Vectorized ifelse
As noted previously, the if
statement accepts a logical condition
of length 1. So what happens if you want to condition on a longer vector of logical values? For these cases, base R offers ifelse()
which is a vectorized function of the following form.
ifelse(condition, if true do something, if false do something else)
Here’s an example of ifelse()
that determines if values in x
are even or odd (%%
is the modulus operator which computes the remainder of x / 2
).
#> [1] "odd" "even" "odd" "even" "odd"
We’ll work with this and related vectorized control functions in Section 7.6.
5.2.3 Loops
Loops are an important component of most programming languages, and R is no exception. That said, in R there are often more efficient and less code intensive approaches to performing the type of iterative operations that typically employ loops. We’ll cover a few of these more efficient approaches in Section 7. There are, however, several good reasons for learning about loops, they are: 1) convenient in some situations; 2) common in R and other languages; 3) instructive for understanding iterative operations and translating mathematical notation to code.
We provide a very brief introduction to while
and for
loops.
A while
loop has the following form.
while (condition) {
do something
}
As long as condition
is TRUE
, do something
is iteratively evaluated. Once condition
is FALSE
, the loop stops. For example, the body of the while loop below prints the value for i
, then increments i
by 1, until i
is greater than 3.
#> [1] 1
#> [1] 2
#> [1] 3
A for
loop has the following form.
for (variable in vector) {
do something
}
Here’s a for
loop example.
#> [1] 1
#> [1] 2
#> [1] 3
As shown using print()
, for each successive iteration in the loop, the variable i
takes the associated increment value in the vector 1:3
.
This increment change in the variable makes for
loops particularly useful for stepping through elements in a vector. As an example, let’s use a for
loop to sum a numeric vector’s values. Below, x
is the vector to be summed, total
will hold the sum, and the for
loop variable i
indexes each successive element in x
. Notice, how length(x)
is used to get the correct number of elements in x
#> [1] 3
Now let’s embed the for
loop above in a function called my_sum()
.
my_sum <- function(x){
total <- 0
for(i in 1:length(x)){
total <- total + x[i]
}
return(total)
}
# Try it out.
v <- c(0, 1, 2)
my_sum(v)
#> [1] 3
5.3 Illustrations
5.3.1 Basal area
Basal area is the cross-sectional area of a tree at breast height (4.5 ft or 1.35 m above ground) and is calculated from the tree’s DBH measurement. Total basal area for a collection of trees, often expressed on a per unit area basis (e.g., square feet per acre or square meters per hectare), is used to characterize stand structure. Such basal area summaries are directly related to stand volume and, to some degree, density, which is the number of trees per unit area.
Given DBH measured in inches, basal area in square feet is \[\begin{equation} \frac{\pi}{144}\cdot \left(\frac{\text{DBH}}{2}\right)^2 = 0.005454\cdot \text{DBH}^2. \tag{5.1} \end{equation}\] Similarly, when DBH is measured in centimeters, basal area in square meters is \[\begin{equation} \frac{\pi}{10000}\cdot\left(\frac{\text{DBH}}{2}\right)^2 = 0.00007854\cdot \text{DBH}^2. \tag{5.2} \end{equation}\]
Looking at (5.1) and (5.2), a general formula for basal area is \(c\cdot \text{DBH}^2\), where \(c\) is often referred to as the ``foresters constant” and, depending on your measurement system, is either 0.005454 or 0.00007854.
Let’s write a function that takes DBH in inches or centimeters and returns basal area in square feet or square meters. We define basal_area()
below with default arguments that assume DBH in inches and basal area to be returned in square feet. Within the function body, we include some input checks that also serve to let the user know accepted values for the dbh_units
and ba_units
arguments. After the input checks, an if
statement is used to compute the appropriate constant \(c\), then an else if
statement computes the desired basal area ba
, which is subsequently returned.
basal_area <- function(dbh, dbh_units = "in", ba_units = "sq_ft"){
# Some input checks.
stopifnot("dbh should be numeric." =
is.numeric(dbh),
"dbh_units should be either 'in' or 'cm'." =
dbh_units %in% c("in", "cm"),
"ba_units should be either 'sq_ft' or 'sq_m'." =
ba_units %in% c("sq_ft", "sq_m")
)
# Define c.
if(ba_units == "sq_ft"){
c <- pi / (4 * 144)
} else{ # ba_units == "sq_m"
c <- pi / (4 * 10000)
}
# Compute ba and return.
if (dbh_units == "in" & ba_units == "sq_ft") {
ba <- c * dbh^2
} else if (dbh_units == "in" & ba_units == "sq_m") {
ba <- c * (2.54 * dbh)^2
} else if (dbh_units == "cm" & ba_units == "sq_m") {
ba <- c * dbh^2
} else{ # dbh_units == "cm" & ba_units == "sq_ft"
ba <- c * (dbh / 2.54)^2
}
return(ba)
}
Let’s try our basal area function.
#> [1] 0.54542
#> [1] 0.050671
# Basal area in square meters from 10 centimeters DBH.
basal_area(dbh = 10, dbh_units = "cm", ba_units = "sq_m")
#> [1] 0.007854
#> [1] 0.08454
#> Error in basal_area(dbh = 10, dbh_units = "ft"): dbh_units should
be either 'in' or 'cm'.
Looks pretty good. The first four calls to basal_area()
try the various input combinations. The last call to basal_area()
tests some error handling by wrongly setting dbh_units = "ft"
, which stops the function and throws an error with an informative message.
Because if
and else if
in basal_area()
involve only dbh_units
and ba_units
, our function can work on a vector of multiple DBH values as demonstrated by the code below.
#> [1] 0.13635 1.22718 0.54542 NA 0.13635
5.3.2 Aboveground biomass allometric equation
Aboveground biomass (AGB) is directly related to carbon stored in woody material and hence a key variable used in forest carbon monitoring programs. A tree’s total AGB is related to its DBH and species. Jenkins et al. (2003) provide the following allometric equation to estimate AGB given a tree’s species and DBH. \[\begin{equation} \text{AGB} = \exp\left(\beta_0 + \beta_1 \ln(\text{DBH})\right), \tag{5.3} \end{equation}\] where
- \(\text{AGB}\) is total aboveground biomass (kg dry weight) for trees 2.5 (cm) DBH and larger,
- \(\text{DBH}\) is diameter at breast height in centimeters,
- \(\exp\) is the exponential function,
- \(\ln\) is the natural logarithm, i.e., inverse function of the exponential function,
- \(\beta_0\) and \(\beta_1\) are species-specific regression coefficients with values given in Jenkins et al. (2003) Table 4.
Let’s implement (5.3) in R. Our function will return AGB given a tree’s DBH and species.
To motivate this example, we consider some tree’s measured on one forest inventory plot in the Penobscot Experimental Forest (PEF). You can read more about the PEF in Section 1.2.4. We’ll work with a set of 99 trees measured in PEF management unit 10 on plot 11 in 2010. Below, these tree data are read into the pef_trees
data frame and the data frame’s structure and first few column values are printed using str()
.
#> 'data.frame': 99 obs. of 2 variables:
#> $ common_name: chr "red maple" "red maple" "balsam fir" "balsam fir" ...
#> $ DBH_in : num 0.7 0.5 1.8 6.4 1.7 ...
To give us a better sense of these data, the next bit of code prints the number of trees by species. The sort()
function sorts the table()
output from least to most abundant species. This table shows there are nine species present on the inventory plot and their respective stem counts.
#>
#> red or black spruce American beech
#> 1 2
#> paper birch northern white-cedar
#> 2 3
#> eastern hemlock white ash
#> 5 7
#> yellow birch red maple
#> 11 22
#> balsam fir
#> 46
There are some features we’ll need to accommodate in our function. Specifically, the allometric equation is parameterized for:
- DBH in centimeters, so the PEF’s DBH values need to be converted from inches to centimeters;
- trees of DBH 2.5 (cm) and larger, we’ll want to return
NA
for all trees less than 2.5 (cm); - species groups, so we’ll need to pass in species and have the function apply the correct set of regression coefficients.
For this example, we tailor the function to those species in the current dataset. If the function was called for a species not in the current dataset then we’d like an informative error to be thrown.
Table 5.1 provides regression coefficient values from Jenkins et al. (2003) for those species inpef_trees
. These coefficient values will be used in our function to compute species-specific AGB.
Species | \(\beta_0\) | \(\beta_1\) |
---|---|---|
American beech | -2.0127 | 2.4342 |
balsam fir | -2.5384 | 2.4814 |
eastern hemlock | -2.5384 | 2.4814 |
northern white-cedar | -2.0336 | 2.2592 |
paper birch | -1.9123 | 2.3651 |
red maple | -1.9123 | 2.3651 |
red or black spruce | -2.0773 | 2.3323 |
white ash | -2.48 | 2.4835 |
yellow birch | -1.923 | 2.3651 |
Given the desired functionality, we define agb()
below. The general form of agb()
with all its if
and if else
statements is a common approach to writing code for allometric equations with coefficient values that differ among species or other characteristics.
agb <- function(dbh, species, dbh_units = "cm") {
# Some input checks.
stopifnot("dbh should be a numeric vector of length 1." =
is.numeric(dbh) & length(dbh) == 1,
"species should be a character of length 1." =
is.character(species) & length(dbh) == 1,
"dbh_units should be either 'in' or 'cm'." =
dbh_units %in% c("in", "cm"))
# Convert DBH to cm if needed.
if(dbh_units == "in"){
dbh <- dbh * 2.54
}
# Assign the species-specific regression coefficients.
if (dbh < 2.5) {
beta_0 <- NA
beta_1 <- NA
} else if (species == "American beech") {
beta_0 <- -2.0127
beta_1 <- 2.4342
} else if (species %in% c("balsam fir", "eastern hemlock")) {
beta_0 <- -2.5384
beta_1 <- 2.4814
} else if (species == "northern white-cedar") {
beta_0 <- -2.0336
beta_1 <- 2.2592
} else if (species %in%
c('paper birch', 'red maple', 'yellow birch')) {
beta_0 <- -1.9123
beta_1 <- 2.3651
} else if (species == "red or black spruce") {
beta_0 <- -2.0773
beta_1 <- 2.3323
} else if (species == 'white ash') {
beta_0 <- -2.48
beta_1 <- 2.4835
} else {
stop(paste0("No allometric equation specified for ", species, "."))
}
# Apply the regression coefficients and return biomass.
agb_km <- exp(beta_0 + beta_1 * log(dbh))
return(agb_km)
}
Let’s step through some features and coding decisions in agb()
.
- Inputs are
dbh
,species
, anddbh_units
with a default value of"cm"
. - Given the way in which the function assigns values to
beta_0
andbeta_1
within theif else
statement, it cannot handle more than onedbh
andspecies
value pair at a time (i.e.,dbh
andspecies
must be vectors of length 1). This vector length constraint and other input requirements are checked in thestopifnot()
. - If
dbh_units == "in"
thendbh
is transformed to centimeters using anif
statement. - The
if else
statement used to assign species-specific coefficient values tobeta_0
andbeta_1
(using values in Table 5.1), also serves to assignNA
to trees with DBH < 2.5 (cm) and throw the desired error and informative message if the species name is not found.
Okay, let’s compute some AGB.
#> [1] 4.2856
#> [1] 43.308
#> [1] NA
#> Error in agb(3, "quaking aspen"): No allometric equation specified
for quaking aspen.
In practice, we’ve seen code written to implement allometric equations, like agb()
, be hundreds of lines long, with seemingly endless control structures, many of which are nested (e.g., if
statements within if
statements). Such approaches have substantial disadvantages, for example they:
- require a lot of code and hence more opportunities to make mistakes;
- are difficult to maintain and fragile, meaning small changes can break something in the control statement sequence;
- are challenging to read, following the flow of multiple branching logical statements is painful;
- need to be embedded in even more code when applied to data (see continuation of the example below);
- are computationally inefficient compared with other preferred approach (see, e.g., Section 7.14.3).
With all that said, let’s continue on and use agb()
to compute AGB for pef_trees
. In the code below, we define AGB_kg
as a vector of zeros the same length as the number of trees in pef_trees
. AGB_kg
will eventually hold AGB for each tree computed using agb()
. Then, using a for
loop, we iterate over the pef_trees
trees, passing each tree’s DBH and species into agb()
and storing the result AGB in AGB_kg
. Finally, we add a AGB_kg
column to pef_trees
.
AGB_kg <- rep(0, nrow(pef_trees)) # Empty vector to hold AGB.
# i indexes rows in pef_trees and elements in AGB_kg.
for(i in 1:nrow(pef_trees)) {
AGB_kg[i] <- agb(pef_trees$DBH_in[i],
pef_trees$common_name[i],
dbh_units = "in")
}
# Assign AGB_kg as a new column in pef_trees.
pef_trees$AGB_kg <- AGB_kg
# Take a look.
str(pef_trees)
#> 'data.frame': 99 obs. of 3 variables:
#> $ common_name: chr "red maple" "red maple" "balsam fir" "balsam fir" ...
#> $ DBH_in : num 0.7 0.5 1.8 6.4 1.7 ...
#> $ AGB_kg : num NA NA 3.43 79.91 3.8 ...
We’ll return to allometric equations in Section 7.14.3, where we present an alternative and preferred approach to implementation.
5.4 Keeping track of functions
As introduced in Section 3, scripts facilitate efficient coding and provide a reproducible data management and analysis record. To this end, it’s advantageous to also keep track of functions you’ve written. Functions you write for use in a script should be placed at the top of the script. Alternatively, if you have functions that you routinely use, they can be collected into a single .R
file then “sourced” via source()
.
For example, we’ve written lots of functions to do various tasks, e.g., compute basal area given DBH or convert various measurements from English to metric units. We keep these functions in a file called “utils.R”. As the file name suggests, these are utility functions used in near daily scripting. Instead of copying and pasting these functions into each new script, we simply load all the utility functions at once by placing source("utils.R")
at the top of the script. Like any file read into R, “utils.R” needs to be in the working directory or you can provide a path to the file’s location on your computer.
Keeping track of the functions you write over time in a “utils.R”, or script name of your choosing, is a great way to build efficiency.
5.5 Summary
Choices and loops are the two primary structures used to control flow in a program. Choices, implanted using if
and if else
, allow different code to be run conditional on the result of a logical test. These logical tests typically involve comparison and/or logical operators introduced in Section 4.7. While not covered in this chapter, switch
is an additional control structure, see, e.g., ?switch
, that is occasionally useful. We’ll return to control structures later in Section 7.6, where we consider vectorized versions of if else
and switch
. Loops, like while
and for
, allow you to iterate over a piece of code until some condition is met or desired number of iterations is reached. Loops can be very useful, however, there are many scenarios where desired iterative operation can be done using other, more efficient, functions.
As noted when introducing this chapter, our focus was narrow and high-level covering only topics needed in subsequent chapters. Our treatment of functions touched on the basics, which is enough for you to start writing your own simple functions and dissecting more complex functions written by others. For a deeper understanding of functional programming in R, we encourage you to study topics on scope and environments covered well in Advanced R by Wickham (2019) and other freely available resources you can find online. These references will also guild you if your eventual aim is to write robust software for public consumption.
5.6 Exercises
Tree id | Species | DBH (in) | Logs (16 foot) |
---|---|---|---|
1 | red oak | 38 | 3.5 |
2 | sugar maple | 10 | 2 |
3 | sugar maple | 17 | 1.5 |
4 | bitternut hickory | 22 | 3 |
5 | tulip poplar | 32 | 3 |
6 | red oak | 15 | 3 |
7 | white oak | 31 | 3 |
8 | sugar maple | 40 | 2.5 |
Exercises 5.1 - 5.6 use tree measurements recorded in Table 5.2.
Exercise 5.1 Create a data frame called plot_trees
that holds data given in Table 5.2. Your data frame should match the following when printed.
#> tree_id spp dbh_in logs_16ft
#> 1 1 red oak 38 3.5
#> 2 2 sugar maple 10 2.0
#> 3 3 sugar maple 17 1.5
#> 4 4 bitternut hickory 22 3.0
#> 5 5 tulip poplar 32 3.0
#> 6 6 red oak 15 3.0
#> 7 7 white oak 31 3.0
#> 8 8 sugar maple 40 2.5
Exercise 5.2 Write a function to compute basal area in square feet given DBH in inches. Apply your function and add your resulting basal area as a new column in plot_trees
. We called our new basal area column BA_sqft
. Your updated plot_trees
should look similar to ours below.32
#> tree_id spp dbh_in logs_16ft BA_sqft
#> 1 1 red oak 38 3.5 7.87580
#> 2 2 sugar maple 10 2.0 0.54542
#> 3 3 sugar maple 17 1.5 1.57625
#> 4 4 bitternut hickory 22 3.0 2.63981
#> 5 5 tulip poplar 32 3.0 5.58505
#> 6 6 red oak 15 3.0 1.22718
#> 7 7 white oak 31 3.0 5.24144
#> 8 8 sugar maple 40 2.5 8.72665
The following information is used for Exercises 5.3 - ??. Foresters use mathematical formulas to estimate the amount of lumber in logs. These formulas are referred to as log rules and typically take into account average taper in the tree’s stem and wood waste from saw kerf, i.e., wood lost due to the blade width. Log taper is often species specific, hence foresters use form classes to describe taper characteristics of species groups. In the eastern US, for example, form class 78 is often used to characterize hardwood species taper.33
International 1/4-inch (Int-1/4), Scribner, and Doyel are the names of common log rules used in the US. Wiant (1986) published the formulas below that estimate board foot volume given DBH (in) and number of 16 foot logs (L) for form class 78 under Int-1/4, Scribner and Doyel log rules.34 The subscript on each estimated volume variable () below indicates the log rule.
Int-1/4 log rule: \[\begin{align} \text{Vol}_I = &(-13.35212 + 9.58615 \text{L} + 1.52968 \text{L}^2) +\nonumber\\ &(1.79620 - 2.59995 \text{L} - 0.27465 \text{L}^2)\text{DBH} +\nonumber\\ &(0.04482 + 0.45997 \text{L} - 0.00961 \text{L}^2)\text{DBH}^2. \tag{5.4} \end{align}\]
Scribner log rule: \[\begin{align} \text{Vol}_S = &(-22.50365 + 17.53508 \text{L} - 0.59242 \text{L}^2) +\nonumber\\ &(3.02988 - 4.34381 \text{L} - 0.02302 \text{L}^2)\text{DBH} +\nonumber\\ &(-0.01969 + 0.51593 \text{L} - 0.02035 \text{L}^2)\text{DBH}^2. \tag{5.5} \end{align}\]
Doyle log rule: \[\begin{align} \text{Vol}_D = &(-29.37337 + 41.51275 \text{L} + 0.55743 \text{L}^2) +\nonumber\\ &(2.78043 - 8.77272 \text{L} - 0.04516 \text{L}^2)\text{DBH} +\nonumber\\ &(0.04177 + 0.59042 \text{L} - 0.01578 \text{L}^2)\text{DBH}^2. \tag{5.6} \end{align}\]
Exercise 5.3 Write a function called vol_I()
that implements (5.4). Use arguments DBH_in
and L_16ft
for DBH (in) and number of logs (16-ft), respectively. Consistent with (5.4), your function should return board foot volume. Your function should accept vectors for DBH_in
and L_16ft
and return a corresponding volume vector.
Below are some tests using our vol_I()
that you can use to test your implementation.
#> [1] 56.129
#> [1] 519.67
#> [1] 56.129 519.665
Exercise 5.4 Using vol_I()
you wrote in Exercise 5.3, add a column called vol_I_bdft
to plot_trees
. You should be able to pass plot_tree$dbh_in
and plot_trees$logs_16ft
into vol_I()
to compute vol_I_16ft
values. Your result should match ours below (we omitted the spp
columns so our output fits nicely on the page).
#> tree_id dbh_in logs_16ft BA_sqft vol_I_bdft
#> 1 1 38 3.5 7.87580 1852.968
#> 2 2 10 2.0 0.54542 59.548
#> 3 3 17 1.5 1.57625 164.301
#> 4 4 22 3.0 2.63981 490.421
#> 5 5 32 3.0 5.58505 1128.315
#> 6 6 15 3.0 1.22718 203.145
#> 7 7 31 3.0 5.24144 1052.482
#> 8 8 40 2.5 8.72665 1578.856
Exercise 5.5 Extend the function you wrote for Exercise 5.4 to allow the user to choose between the three log rules given in (5.4), (5.5), and (5.6). Call your new function vol_bdft()
. In addition to DBH_in
and L_16ft
arguments, your new function should have an argument called log_rule
that takes values "Int-1/4"
, "Scribner"
, or "Doyle"
that are used to control which log rule is computed. Make the default value for log_rule
be "Int-1/4"
. Have your function throw an informative error if the log_rule
value provided by the user does not match one of the accepted values.
Below are some tests using our vol_bdft()
that you can use to test your implementation.
#> [1] 56.129 519.665
#> [1] 46.924 484.605
#> [1] 29.01 428.94
# Check log_rule error handling.
vol_bdft(DBH_in = c(12, 24), L_16ft = c(1, 2.5), log_rule = "Int-1/8")
#> Error in vol_bdft(DBH_in = c(12, 24), L_16ft = c(1, 2.5), log_rule = "Int-1/8"): log_rule should be either 'Int-1/4',
#> 'Scribner', or 'Doyle'.
Exercise 5.6 Using vol_bdft()
from Exercise 5.5, add estimated board foot volume using the tree log rules to plot_trees
. Your result should match ours below (we included only tree_id
and the volume columns so our output fits nicely on the page).
#> tree_id vol_I_bdft vol_S_bdft vol_D_bdft
#> 1 1 1852.968 1777.412 1805.779
#> 2 2 59.548 45.776 22.374
#> 3 3 164.301 147.059 113.743
#> 4 4 490.421 451.134 382.179
#> 5 5 1128.315 1075.319 1045.083
#> 6 6 203.145 174.253 116.996
#> 7 7 1052.482 1000.796 963.753
#> 8 8 1578.856 1527.407 1571.260
References
Pseudocode is notation resembling a simplified programming language and used to convey code features but doesn’t actually conform to syntax requirements and hence doesn’t run.↩︎
Our basal area function uses
pi/(4*144)
. If you wrote your function using the forester’s constant 0.005454 then your basal area will differ slightly from ours beyond the second or third decimal place.↩︎The form class number, in this case 78, is form quotient and is the ratio, expressed as a percent, of some upper-stem diameter to DBH, see, e.g., Burkhart, Avery, and Bullock (2018). A larger form quotient means less stem taper, i.e., more stem volume.↩︎
One board foot as dimension \(1\times 12\times 12\) (in) with volume 144 (in\(^3\)).↩︎