This book is in Open Review. We want your feedback to make the book better for you and other readers. To add your annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the in the upper right hand corner of the page

Chapter 5 Functions and functional programming

In the previous chapter, we introduced data structures that serve as the building blocks for holding, manipulating, and analyzing data. Functions, such as c(), data.frame(), str(), length(), sum(), and mean(), are used to build data structures, describe characteristics of those structures, and summarize/analyze data held within those structures—functions are fundamental to the R language. Gaining a better understanding of existing functions and the ability to write your own functions dramatically increases what you can do with R.

In this chapter, we introduce some key elements of functional programming used in R and found in nearly all programming and scripting languages. We begin with describing characteristics of functions and how you can define and use your own functions. We then introduce several control statements commonly found in functions and used to manipulate inputs and form desired output. We motivate this initial exploration of functional programming using forestry topics and examples. We cover only those topics needed in subsequent chapters, hence this is a limited introduction. References for further readings are provided in the chapter summary.

5.1 Introduction to functions

A function has four key components:

Name: adheres to R’s naming syntax (Section 3.2.2).
Arguments: the inputs (optional). Arguments are variables in the function’s body.
Body: performs some task.
Return: the output (optional).

Let’s map these components to function pseudocode.³¹

name_of_function <- function(arg_1, arg_2) {
  do something
  return(something)
}

Name: we assign the function to name_of_function.
Arguments: are defined between function() parentheses. The function has arguments arg_1 and arg_2.
Body: what the function will do is defined between the left and right curly brace {}. We wrote do something to indicate this is where the body of the function is written.
Return: before closing the function body, we define the function output within return(). While this component is optional—not all functions return something—most functions we write do return something.

Let’s look at a syntactically correct function.

my_first_function <- function() {
  print("Ever stop to think, and forget to start again?")
}

Consider the four components:

Name: my_first_function.
Arguments: this function has no arguments, i.e., there is nothing defined between the function() parentheses.
Body: tells R to print a fun quote by Alan A. Milne.
Return: no objects are returned by this function.

Once defined, the function can be called. Below we call my_first_function().

my_first_function()

#> [1] "Ever stop to think, and forget to start again?"

Here’s another function.

pow <- function(x, v) {
  result <- x^v
  return(result)
}

Consider the four components:

Name: pow.
Arguments: this function has two arguments, x and v.
Body: uses the values held in x and v to compute x^v and assigns the result to result.
Return: returns the value held in result.

Let’s try it out.

pow(x = 2, v = 5)

#> [1] 32

pow(x = 2, v = 0)

#> [1] 1

pow(x = c(2, NA, 5), v = c(2, 4))

#> Warning in x^v: longer object length is not a multiple
#> of shorter object length

#> [1]  4 NA 25

Notice in the last call to pow() the return vector is the length of the longest input vector (in this case x) and a warning is thrown. Given R’s recycling rules this behavior is expected, see Section 4.2.3 for topics on vector operations and recycling (here, to match length(x), v is recycled to c(2, 4, 2) before x^v is computed).

5.1.1 Specifying arguments

When calling a function, arguments can be specified using their full name, partial name, or position.

We demonstrate each approach to calling a function using args_fn() defined below.

args_fn <- function(first_arg, second_arg) {
  return(c(first_arg, second_arg))
}

args_fn() takes two arguments first_arg and second_arg. The values for arguments are combined into a single vector and returned. To reduce clutter, we define the vector to be returned within return().

Let’s try calling the function using the full name of each argument. Below we set the arguments to values 1 and 2, respectively.

args_fn(first_arg = 1, second_arg = 2)

#> [1] 1 2

When using the argument names, the order in which they are passed doesn’t matter.

args_fn(second_arg = 2, first_arg = 1)

#> [1] 1 2

Partial matching exists to save you from typing long argument names. With partial matching you can specify arguments using the first portion of an argument’s name as long as it’s unambiguous with names of other arguments. Consider the example below, where s partially matches second_arg and f partially matches first_arg.

args_fn(s = 2, f = 1)

#> [1] 1 2

Because it can lead to user mistakes and is not nice to read if you can’t recall what the partial argument name refers to, we rarely use partial matching in practice.

You can call a function without giving argument names, but they must follow the defined function argument order. This is often done in practice when we know the argument order and are feeling lazy and don’t want to type argument names.

args_fn(1, 2)

#> [1] 1 2

5.1.2 Default values for arguments

Functions can be defined with default values for arguments. In such cases, the user can override a default value by specifying their own value for the argument. Consider the following example that defines a function fn() with arguments a and b. Argument b is defined with default value 1. After defining the function, it’s called using the default value for b, and then again with b set to 2.

fn <- function(a, b = 1) {
  return(a + b)
}

fn(a = 1)

#> [1] 2

fn(a = 1, b = 2)

#> [1] 3

Most functions introduced in previous chapters have default values for arguments. Some functions we’ll introduce later in the book have dozens of arguments and, fortunately, authors of those functions have set reasonable default values.

5.1.3 Scope rules

A variable defined using a function argument or within the function body is a local variable, meaning the variable does not exist outside the function. Consider the example below, where a and b are defined in fn() and hence local to fn(). When a or b are called outside the function, R throws an object not found error.

fun <- function(a) {
  b <- 1 # Local a and b.
  return(a + b)
}

fn(a = 1)

#> [1] 2

a # Not defined.

#> [1] 1

b # Not defined.

#> Error in eval(expr, envir, enclos): object 'b' not found

The code above illustrates a scope rule. A language’s scope rules determine where variables exist, their values, and how those values can be changed. In the code above, these rules dictate that a and b do not exist outside of fn().

Consider the next example, where b is used in fn() but not defined in fn(). Here, b is referred to as a free variable and it’s value will need to be set outside the function. Here too, b is a global variable because it’s available outside and inside fn().

fn <- function(a) {
  return(a + b) # Local a and free b.
}

fn(a = 1) # Free b not defined.

#> Error in fn(a = 1): object 'b' not found

b <- 1 # Global b.

fn(a = 1)

#> [1] 2

b <- 10 # Global b redefined.

fn(a = 1)

#> [1] 11

Above, when fn() is called, R first looks inside the function body to see if b is defined, when it finds that b has not been defined in the function body, it looks outside the function, which in this case is the environment from which fn() was called. In the first call to fn(a = 1), the value for b has yet to be defined, so an object not found error is thrown. Then, prior to the second call to fn(a = 1), a global b is defined and equals 1, hence the free b is 1 and the output is 2. Then, prior to the third call to fn(a = 1), the global b is redefined to equal 10, hence the free b is 10 and the output is 11.

So what’s the rule when b is defined both inside and outside the function? Consider the example below.

fn <- function() {
  b <- 1 # Local b.
  return(b)
}

b <- 10 # Separate from fn()'s local b.

fn() # Uses the local b.

#> [1] 1

b # Uses the b defined outside fn().

#> [1] 10

Above, when fn() is called it always uses its local b, i.e., the local b is separate from the b defined outside fn(). Here, there are two variables with the same name but they exist separately because of their scope.

There are often times when the same variable name is used inside and outside of functions. In such cases, it’s critical to understand the basic scope rules illustrated above.

Functions are objects and hence follow the same scope rules as variables. You can have local and global functions. For example, a function defined within another function has local scope. Consider the following example.

fn_1 <- function(){
  
  # Define local fn_2().
  fn_2 <- function(){
    print("I'm fn_2() in fn_1().")
  }
  
  fn_2() # Call local fn_2().
}

fn_1() # Will call its local fn_2().

#> [1] "I'm fn_2() in fn_1()."

fn_2() # Not defined outside fn_1().

#> Error in fn_2(): could not find function "fn_2"

Above, fn_2() is local to fn_1(). Each time fn_1() is called, fn_2() is defined and then called to print "I'm fn_2() in fn_1()". Then, consistent with local scope rules, when fn_2() is then called outside fn_1() an error message lets you know that fn_2() could not be found.

We can make fn_2() global to fn_1() by defining it prior to calling fn_1() as demonstrated below.

# Define global fn_2().
fn_2 <- function(){
    print("I'm global fn_2().")
}

fn_1 <- function(){
  fn_2() # Call global fn_2().
}

fn_1() # Will call global fn_2().

#> [1] "I'm global fn_2()."

Talking about scope using local and global terms is useful for building some initial intuition about how functions you write will behave, but it leads to a limited view of what’s going on behind the scenes. We encourage you to eventually seek a more formal understanding of R environments and scope because it will make you a more effective programmer and analyst. Wickham (2019) is an excellent resource for building understanding about these and related topics.

5.1.4 Checking function inputs

It’s easy to make mistakes while coding. One common mistake is providing the wrong data type or object as input to a function. A goal when writing functions is to catch such input errors before they create an unexpected function error or output.

Consider our earlier pow() function that computes $x^v$.

pow <- function(x, v) {
  result <- x^v
  return(result)
}

Clearly $x$ and $v$ should be numeric. So, what happens if we send in a character data type?

pow(x = "a", v = 2)

#> Error in x^v: non-numeric argument to binary operator

As you can see from the output above, when attempting "a"^2 R throws an error that stops the code. It’s easy to see where the error originates in this simple example. However, for more complex functions, an error might be more difficult to diagnose or, worse yet, the function does not throw an error and returns erroneous output. So, when writing a function, it’s good practice to include input checks and, if needed, stop the function early and provide informative error messages.

A simple approach to including input checks is to use base R’s stopifnot() (stop if not). This function takes any number of logical expressions as inputs and if any expression is false it calls stop() and produces an error message indicating the first false expression. Below we add stopifnot() to pow() to check that x and v are numeric, then test it out.

pow <- function(x, v) {
  stopifnot(is.numeric(x), is.numeric(v))
  
  result <- x^v
  return(result)
}

# Try it out.
pow(x = "a", v = 2)

#> Error in pow(x = "a", v = 2): is.numeric(x) is not TRUE

You can add a name to each stopifnot() expression, which is printed if the expression is false. As demonstrated below, this name can provide user-friendly guidance.

pow <- function(x, v) {
  stopifnot("The x argument value should be numeric." = is.numeric(x), 
            "The v argument value should be numeric." = is.numeric(v))
  
  result <- x^v
  return(result)
}

# Try it out.
pow(x = "a", v = 2)

#> Error in pow(x = "a", v = 2): The x argument value should be numeric.

5.2 Control statements

5.2.1 `if` and `if else` statements

We often want to run different code conditional on characteristics of the data or object. This can be done using an if statement, the pseudocode for which is given below.

if (condition) {
  do something
}

The if statement’s condition argument is a single logical value. If condition is TRUE the code within the if statement’s body is run. If condition is FALSE the code within the if statement’s body is skipped. Importantly, the if statement considers a single logical value for condition, you cannot use a logical vector longer than 1. The condition value often involves comparison and/or logical operators (see Section 4.7).

Consider the following example that tests if spp is "Quercus rubrum" and if true the common name is printed.

spp <- "Quercus rubrum"

if(spp == "Quercus rubrum"){
  print("The species is northern red oak.")
}

#> [1] "The species is northern red oak."

Often we want to run one bit of code if some logical condition is true, and run a different bit of code if the condition is false. This is accomplished using an if else statement which has the following form.

if (condition) {
  do something
} else {
  do something else
}

Consider the following example that extends the previous if example to print a message that lets us know the common name was not found.

spp <- "Acer saccharum"

if (spp == "Quercus rubrum") {
  print("The species is northern red oak.")
} else {
  print("The species' common name was not found.")
}

#> [1] "The species' common name was not found."

The if else statement can be extended with an else if to consider multiple conditions. The code within the first TRUE else if (condition) is run and evaluation of any subsequent else if statements is skipped.

Below we use else if to match spp with different Latin names before printing the species’ common name was not found.

spp <- "Populus tremuloides"

if (spp == "Quercus rubrum") {
  print("The species is northern red oak.")
} else if (spp == "Acer saccharum") {
  print("The species is sugar maple.")
} else if (spp == "Populus tremuloides") {
  print("The species is quaking aspen.")
} else {
  print("The species' common name was not found.")
}

#> [1] "The species is quaking aspen."

Above, spp == "Quercus rubrum" is FALSE so the code next tests spp == "Acer saccharum" which is also FALSE, then it tests spp == "Populus tremuloides" which is TRUE so "The species is quaking aspen." is printed.

Next, let’s embed the if else code above in a function that prints the common name of a given species. We’ll call our function get_common().

get_common <- function(spp){
  
  if (spp == "Quercus rubrum") {
    print("The species is northern red oak.")
  } else if(spp == "Acer saccharum") {
    print("The species is sugar maple.")
  } else if(spp == "Populus tremuloides") {
    print("The species is quaking aspen.")
  } else {
    print("The species' common name was not found.")
  }
  
}

# Try it out.
get_common("Quercus rubrum")

#> [1] "The species is northern red oak."

get_common("Populus tremuloides")

#> [1] "The species is quaking aspen."

get_common("Pseudotsuga menziesii")

#> [1] "The species' common name was not found."

5.2.2 Vectorized `ifelse`

As noted previously, the if statement accepts a logical condition of length 1. So what happens if you want to condition on a longer vector of logical values? For these cases, base R offers ifelse() which is a vectorized function of the following form.

ifelse(condition, if true do something, if false do something else)

Here’s an example of ifelse() that determines if values in x are even or odd (%% is the modulus operator which computes the remainder of x / 2).

x <- 1:5
ifelse(x %% 2 == 0, "even", "odd")

#> [1] "odd"  "even" "odd"  "even" "odd"

We’ll work with this and related vectorized control functions in Section 7.6.

5.2.3 Loops

Loops are an important component of most programming languages, and R is no exception. That said, in R there are often more efficient and less code intensive approaches to performing the type of iterative operations that typically employ loops. We’ll cover a few of these more efficient approaches in Section 7. There are, however, several good reasons for learning about loops, they are: 1) convenient in some situations; 2) common in R and other languages; 3) instructive for understanding iterative operations and translating mathematical notation to code.

We provide a very brief introduction to while and for loops.

A while loop has the following form.

while (condition) {
  do something
}

As long as condition is TRUE, do something is iteratively evaluated. Once condition is FALSE, the loop stops. For example, the body of the while loop below prints the value for i, then increments i by 1, until i is greater than 3.

i <- 1

while(i <= 3) {
  print(i)
  i <- i + 1
}

#> [1] 1
#> [1] 2
#> [1] 3

A for loop has the following form.

for (variable in vector) {
  do something
}

Here’s a for loop example.

for(i in 1:3){
  print(i)
}

#> [1] 1
#> [1] 2
#> [1] 3

As shown using print(), for each successive iteration in the loop, the variable i takes the associated increment value in the vector 1:3.

This increment change in the variable makes for loops particularly useful for stepping through elements in a vector. As an example, let’s use a for loop to sum a numeric vector’s values. Below, x is the vector to be summed, total will hold the sum, and the for loop variable i indexes each successive element in x. Notice, how length(x) is used to get the correct number of elements in x

x <- c(0, 1, 2)

total <- 0

for (i in 1:length(x)) {
  total <- total + x[i]
}

total

#> [1] 3

Now let’s embed the for loop above in a function called my_sum().

my_sum <- function(x){

  total <- 0
  for(i in 1:length(x)){
    total <- total + x[i]
  }
  
  return(total)
}

# Try it out.
v <- c(0, 1, 2) 

my_sum(v)

#> [1] 3

5.3 Illustrations

5.3.1 Basal area

Basal area is the cross-sectional area of a tree at breast height (4.5 ft or 1.35 m above ground) and is calculated from the tree’s DBH measurement. Total basal area for a collection of trees, often expressed on a per unit area basis (e.g., square feet per acre or square meters per hectare), is used to characterize stand structure. Such basal area summaries are directly related to stand volume and, to some degree, density, which is the number of trees per unit area.

Given DBH measured in inches, basal area in square feet is \[\begin{equation} \frac{\pi}{144}\cdot \left(\frac{\text{DBH}}{2}\right)^2 = 0.005454\cdot \text{DBH}^2. \tag{5.1} \end{equation}\] Similarly, when DBH is measured in centimeters, basal area in square meters is \[\begin{equation} \frac{\pi}{10000}\cdot\left(\frac{\text{DBH}}{2}\right)^2 = 0.00007854\cdot \text{DBH}^2. \tag{5.2} \end{equation}\]

Looking at (5.1) and (5.2), a general formula for basal area is $c\cdot \text{DBH}^2$, where $c$ is often referred to as the ``foresters constant” and, depending on your measurement system, is either 0.005454 or 0.00007854.

Let’s write a function that takes DBH in inches or centimeters and returns basal area in square feet or square meters. We define basal_area() below with default arguments that assume DBH in inches and basal area to be returned in square feet. Within the function body, we include some input checks that also serve to let the user know accepted values for the dbh_units and ba_units arguments. After the input checks, an if statement is used to compute the appropriate constant $c$, then an else if statement computes the desired basal area ba, which is subsequently returned.

basal_area <- function(dbh, dbh_units = "in", ba_units = "sq_ft"){
  
  # Some input checks.
  stopifnot("dbh should be numeric." = 
              is.numeric(dbh),
            "dbh_units should be either 'in' or 'cm'." = 
              dbh_units %in% c("in", "cm"),
            "ba_units should be either 'sq_ft' or 'sq_m'." = 
              ba_units %in% c("sq_ft", "sq_m")
            )
  
  # Define c.
  if(ba_units == "sq_ft"){
    c <- pi / (4 * 144)
  } else{ # ba_units == "sq_m"
    c <- pi / (4 * 10000)
  }
  
  # Compute ba and return.
  if (dbh_units == "in" & ba_units == "sq_ft") {
    ba <- c * dbh^2
  } else if (dbh_units == "in" & ba_units == "sq_m") {
    ba <- c * (2.54 * dbh)^2 
  } else if (dbh_units == "cm" & ba_units == "sq_m") {
    ba <- c * dbh^2
  } else{ # dbh_units == "cm" & ba_units == "sq_ft"
    ba <- c * (dbh / 2.54)^2
  }
    
  return(ba)
}

Let’s try our basal area function.

# Basal area in square feet from 10 inches DBH.
basal_area(dbh = 10)

#> [1] 0.54542

# Basal area in square meters from 10 inches DBH.
basal_area(dbh = 10, ba_units = "sq_m")

#> [1] 0.050671

# Basal area in square meters from 10 centimeters DBH.
basal_area(dbh = 10, dbh_units = "cm", ba_units = "sq_m")

#> [1] 0.007854

# Basal area in square feet from 10 centimeters DBH.
basal_area(dbh = 10, dbh_units = "cm")

#> [1] 0.08454

# Some initial error checking.
basal_area(dbh = 10, dbh_units = "ft")

#> Error in basal_area(dbh = 10, dbh_units = "ft"): dbh_units should
be either 'in' or 'cm'.

Looks pretty good. The first four calls to basal_area() try the various input combinations. The last call to basal_area() tests some error handling by wrongly setting dbh_units = "ft", which stops the function and throws an error with an informative message.

Because if and else if in basal_area() involve only dbh_units and ba_units, our function can work on a vector of multiple DBH values as demonstrated by the code below.

# Vector of DBH values.
dbh <- c(5, 15, 10, NA, 5)

basal_area(dbh = dbh)

#> [1] 0.13635 1.22718 0.54542      NA 0.13635

5.3.2 Aboveground biomass allometric equation

Aboveground biomass (AGB) is directly related to carbon stored in woody material and hence a key variable used in forest carbon monitoring programs. A tree’s total AGB is related to its DBH and species. Jenkins et al. (2003) provide the following allometric equation to estimate AGB given a tree’s species and DBH. \[\begin{equation} \text{AGB} = \exp\left(\beta_0 + \beta_1 \ln(\text{DBH})\right), \tag{5.3} \end{equation}\] where

$\text{AGB}$ is total aboveground biomass (kg dry weight) for trees 2.5 (cm) DBH and larger,
$\text{DBH}$ is diameter at breast height in centimeters,
$\exp$ is the exponential function,
$\ln$ is the natural logarithm, i.e., inverse function of the exponential function,
$\beta_0$ and $\beta_1$ are species-specific regression coefficients with values given in Jenkins et al. (2003) Table 4.

Let’s implement (5.3) in R. Our function will return AGB given a tree’s DBH and species.

To motivate this example, we consider some tree’s measured on one forest inventory plot in the Penobscot Experimental Forest (PEF). You can read more about the PEF in Section 1.2.4. We’ll work with a set of 99 trees measured in PEF management unit 10 on plot 11 in 2010. Below, these tree data are read into the pef_trees data frame and the data frame’s structure and first few column values are printed using str().

pef_trees <- read.csv("datasets/PEF/PEF_mu_10_plot_11.csv")

str(pef_trees) # Take a look.

#> 'data.frame':    99 obs. of  2 variables:
#>  $ common_name: chr  "red maple" "red maple" "balsam fir" "balsam fir" ...
#>  $ DBH_in     : num  0.7 0.5 1.8 6.4 1.7 ...

To give us a better sense of these data, the next bit of code prints the number of trees by species. The sort() function sorts the table() output from least to most abundant species. This table shows there are nine species present on the inventory plot and their respective stem counts.

sort(table(pef_trees$common))

#> 
#>  red or black spruce       American beech 
#>                    1                    2 
#>          paper birch northern white-cedar 
#>                    2                    3 
#>      eastern hemlock            white ash 
#>                    5                    7 
#>         yellow birch            red maple 
#>                   11                   22 
#>           balsam fir 
#>                   46

There are some features we’ll need to accommodate in our function. Specifically, the allometric equation is parameterized for:

DBH in centimeters, so the PEF’s DBH values need to be converted from inches to centimeters;
trees of DBH 2.5 (cm) and larger, we’ll want to return NA for all trees less than 2.5 (cm);
species groups, so we’ll need to pass in species and have the function apply the correct set of regression coefficients.

For this example, we tailor the function to those species in the current dataset. If the function was called for a species not in the current dataset then we’d like an informative error to be thrown.

Table 5.1 provides regression coefficient values from Jenkins et al. (2003) for those species in pef_trees. These coefficient values will be used in our function to compute species-specific AGB.

TABLE 5.1: Regression coefficient values used in allometric equation ((5.3)) for tree species present in PEF example data.
Species	$\beta_0$	$\beta_1$
American beech	-2.0127	2.4342
balsam fir	-2.5384	2.4814
eastern hemlock	-2.5384	2.4814
northern white-cedar	-2.0336	2.2592
paper birch	-1.9123	2.3651
red maple	-1.9123	2.3651
red or black spruce	-2.0773	2.3323
white ash	-2.48	2.4835
yellow birch	-1.923	2.3651

Given the desired functionality, we define agb() below. The general form of agb() with all its if and if else statements is a common approach to writing code for allometric equations with coefficient values that differ among species or other characteristics.

agb <- function(dbh, species, dbh_units = "cm") {
  
  # Some input checks.
  stopifnot("dbh should be a numeric vector of length 1." = 
              is.numeric(dbh) & length(dbh) == 1,
            "species should be a character of length 1." = 
              is.character(species) & length(dbh) == 1,
            "dbh_units should be either 'in' or 'cm'." = 
              dbh_units %in% c("in", "cm"))
  
  # Convert DBH to cm if needed.
  if(dbh_units == "in"){
    dbh <- dbh * 2.54
  }
  
  # Assign the species-specific regression coefficients.
  if (dbh < 2.5) {
      beta_0 <- NA
      beta_1 <- NA
  } else if (species == "American beech") {
      beta_0 <- -2.0127
      beta_1 <- 2.4342
  } else if (species %in% c("balsam fir", "eastern hemlock")) {
      beta_0 <- -2.5384
      beta_1 <- 2.4814
  } else if (species == "northern white-cedar") {
      beta_0 <- -2.0336
      beta_1 <- 2.2592
  } else if (species %in% 
             c('paper birch', 'red maple', 'yellow birch')) {
      beta_0 <- -1.9123
      beta_1 <- 2.3651
  } else if (species == "red or black spruce") {
      beta_0 <- -2.0773
      beta_1 <- 2.3323
  } else if (species == 'white ash') {
      beta_0 <- -2.48
      beta_1 <- 2.4835
  } else {
      stop(paste0("No allometric equation specified for ", species, "."))
  }
  
  # Apply the regression coefficients and return biomass.
  agb_km <- exp(beta_0 + beta_1 * log(dbh))
  
  return(agb_km)
}

Let’s step through some features and coding decisions in agb().

Inputs are dbh, species, and dbh_units with a default value of "cm".
Given the way in which the function assigns values to beta_0 and beta_1 within the if else statement, it cannot handle more than one dbh and species value pair at a time (i.e., dbh and species must be vectors of length 1). This vector length constraint and other input requirements are checked in the stopifnot().
If dbh_units == "in" then dbh is transformed to centimeters using an if statement.
The if else statement used to assign species-specific coefficient values to beta_0 and beta_1 (using values in Table 5.1), also serves to assign NA to trees with DBH < 2.5 (cm) and throw the desired error and informative message if the species name is not found.

Okay, let’s compute some AGB.

agb(5, "balsam fir")

#> [1] 4.2856

agb(5, "balsam fir", dbh_units = "in")

#> [1] 43.308

agb(1, "balsam fir") # DBH < 2.5 cm, should be NA.

#> [1] NA

agb(3, "quaking aspen") # This species is not defined in agb().

#> Error in agb(3, "quaking aspen"): No allometric equation specified
for quaking aspen.

In practice, we’ve seen code written to implement allometric equations, like agb(), be hundreds of lines long, with seemingly endless control structures, many of which are nested (e.g., if statements within if statements). Such approaches have substantial disadvantages, for example they:

require a lot of code and hence more opportunities to make mistakes;
are difficult to maintain and fragile, meaning small changes can break something in the control statement sequence;
are challenging to read, following the flow of multiple branching logical statements is painful;
need to be embedded in even more code when applied to data (see continuation of the example below);
are computationally inefficient compared with other preferred approach (see, e.g., Section 7.14.3).

With all that said, let’s continue on and use agb() to compute AGB for pef_trees. In the code below, we define AGB_kg as a vector of zeros the same length as the number of trees in pef_trees. AGB_kg will eventually hold AGB for each tree computed using agb(). Then, using a for loop, we iterate over the pef_trees trees, passing each tree’s DBH and species into agb() and storing the result AGB in AGB_kg. Finally, we add a AGB_kg column to pef_trees.

AGB_kg <- rep(0, nrow(pef_trees)) # Empty vector to hold AGB.

# i indexes rows in pef_trees and elements in AGB_kg.
for(i in 1:nrow(pef_trees)) { 
  AGB_kg[i] <- agb(pef_trees$DBH_in[i], 
                   pef_trees$common_name[i], 
                   dbh_units = "in")
}

# Assign AGB_kg as a new column in pef_trees.
pef_trees$AGB_kg <- AGB_kg

# Take a look.
str(pef_trees)

#> 'data.frame':    99 obs. of  3 variables:
#>  $ common_name: chr  "red maple" "red maple" "balsam fir" "balsam fir" ...
#>  $ DBH_in     : num  0.7 0.5 1.8 6.4 1.7 ...
#>  $ AGB_kg     : num  NA NA 3.43 79.91 3.8 ...

We’ll return to allometric equations in Section 7.14.3, where we present an alternative and preferred approach to implementation.

5.4 Keeping track of functions

As introduced in Section 3, scripts facilitate efficient coding and provide a reproducible data management and analysis record. To this end, it’s advantageous to also keep track of functions you’ve written. Functions you write for use in a script should be placed at the top of the script. Alternatively, if you have functions that you routinely use, they can be collected into a single .R file then “sourced” via source().

For example, we’ve written lots of functions to do various tasks, e.g., compute basal area given DBH or convert various measurements from English to metric units. We keep these functions in a file called “utils.R”. As the file name suggests, these are utility functions used in near daily scripting. Instead of copying and pasting these functions into each new script, we simply load all the utility functions at once by placing source("utils.R") at the top of the script. Like any file read into R, “utils.R” needs to be in the working directory or you can provide a path to the file’s location on your computer.

Keeping track of the functions you write over time in a “utils.R”, or script name of your choosing, is a great way to build efficiency.

5.5 Summary

Choices and loops are the two primary structures used to control flow in a program. Choices, implanted using if and if else, allow different code to be run conditional on the result of a logical test. These logical tests typically involve comparison and/or logical operators introduced in Section 4.7. While not covered in this chapter, switch is an additional control structure, see, e.g., ?switch, that is occasionally useful. We’ll return to control structures later in Section 7.6, where we consider vectorized versions of if else and switch. Loops, like while and for, allow you to iterate over a piece of code until some condition is met or desired number of iterations is reached. Loops can be very useful, however, there are many scenarios where desired iterative operation can be done using other, more efficient, functions.

As noted when introducing this chapter, our focus was narrow and high-level covering only topics needed in subsequent chapters. Our treatment of functions touched on the basics, which is enough for you to start writing your own simple functions and dissecting more complex functions written by others. For a deeper understanding of functional programming in R, we encourage you to study topics on scope and environments covered well in Advanced R by Wickham (2019) and other freely available resources you can find online. These references will also guild you if your eventual aim is to write robust software for public consumption.

5.6 Exercises

TABLE 5.2: Tree measurements used for Exercises 5.1.
Tree id	Species	DBH (in)	Logs (16 foot)
1	red oak	38	3.5
2	sugar maple	10	2
3	sugar maple	17	1.5
4	bitternut hickory	22	3
5	tulip poplar	32	3
6	red oak	15	3
7	white oak	31	3
8	sugar maple	40	2.5

Exercises 5.1 - 5.6 use tree measurements recorded in Table 5.2.

Exercise 5.1 Create a data frame called plot_trees that holds data given in Table 5.2. Your data frame should match the following when printed.

plot_trees

#>   tree_id               spp dbh_in logs_16ft
#> 1       1           red oak     38       3.5
#> 2       2       sugar maple     10       2.0
#> 3       3       sugar maple     17       1.5
#> 4       4 bitternut hickory     22       3.0
#> 5       5      tulip poplar     32       3.0
#> 6       6           red oak     15       3.0
#> 7       7         white oak     31       3.0
#> 8       8       sugar maple     40       2.5

Exercise 5.2 Write a function to compute basal area in square feet given DBH in inches. Apply your function and add your resulting basal area as a new column in plot_trees. We called our new basal area column BA_sqft. Your updated plot_trees should look similar to ours below.³²

#>   tree_id               spp dbh_in logs_16ft BA_sqft
#> 1       1           red oak     38       3.5 7.87580
#> 2       2       sugar maple     10       2.0 0.54542
#> 3       3       sugar maple     17       1.5 1.57625
#> 4       4 bitternut hickory     22       3.0 2.63981
#> 5       5      tulip poplar     32       3.0 5.58505
#> 6       6           red oak     15       3.0 1.22718
#> 7       7         white oak     31       3.0 5.24144
#> 8       8       sugar maple     40       2.5 8.72665

The following information is used for Exercises 5.3 - ??. Foresters use mathematical formulas to estimate the amount of lumber in logs. These formulas are referred to as log rules and typically take into account average taper in the tree’s stem and wood waste from saw kerf, i.e., wood lost due to the blade width. Log taper is often species specific, hence foresters use form classes to describe taper characteristics of species groups. In the eastern US, for example, form class 78 is often used to characterize hardwood species taper.³³

International 1/4-inch (Int-1/4), Scribner, and Doyel are the names of common log rules used in the US. Wiant (1986) published the formulas below that estimate board foot volume given DBH (in) and number of 16 foot logs (L) for form class 78 under Int-1/4, Scribner and Doyel log rules.³⁴ The subscript on each estimated volume variable () below indicates the log rule.

Int-1/4 log rule: \[\begin{align} \text{Vol}_I = &(-13.35212 + 9.58615 \text{L} + 1.52968 \text{L}^2) +\nonumber\\ &(1.79620 - 2.59995 \text{L} - 0.27465 \text{L}^2)\text{DBH} +\nonumber\\ &(0.04482 + 0.45997 \text{L} - 0.00961 \text{L}^2)\text{DBH}^2. \tag{5.4} \end{align}\]

Scribner log rule: \[\begin{align} \text{Vol}_S = &(-22.50365 + 17.53508 \text{L} - 0.59242 \text{L}^2) +\nonumber\\ &(3.02988 - 4.34381 \text{L} - 0.02302 \text{L}^2)\text{DBH} +\nonumber\\ &(-0.01969 + 0.51593 \text{L} - 0.02035 \text{L}^2)\text{DBH}^2. \tag{5.5} \end{align}\]

Doyle log rule: \[\begin{align} \text{Vol}_D = &(-29.37337 + 41.51275 \text{L} + 0.55743 \text{L}^2) +\nonumber\\ &(2.78043 - 8.77272 \text{L} - 0.04516 \text{L}^2)\text{DBH} +\nonumber\\ &(0.04177 + 0.59042 \text{L} - 0.01578 \text{L}^2)\text{DBH}^2. \tag{5.6} \end{align}\]

Exercise 5.3 Write a function called vol_I() that implements (5.4). Use arguments DBH_in and L_16ft for DBH (in) and number of logs (16-ft), respectively. Consistent with (5.4), your function should return board foot volume. Your function should accept vectors for DBH_in and L_16ft and return a corresponding volume vector.

Below are some tests using our vol_I() that you can use to test your implementation.

vol_I(DBH_in = 12, L_16ft = 1)

#> [1] 56.129

vol_I(DBH_in = 24, L_16ft = 2.5)

#> [1] 519.67

vol_I(DBH_in = c(12, 24), L_16ft = c(1, 2.5))

#> [1]  56.129 519.665

Exercise 5.4 Using vol_I() you wrote in Exercise 5.3, add a column called vol_I_bdft to plot_trees. You should be able to pass plot_tree$dbh_in and plot_trees$logs_16ft into vol_I() to compute vol_I_16ft values. Your result should match ours below (we omitted the spp columns so our output fits nicely on the page).

#>   tree_id dbh_in logs_16ft BA_sqft vol_I_bdft
#> 1       1     38       3.5 7.87580   1852.968
#> 2       2     10       2.0 0.54542     59.548
#> 3       3     17       1.5 1.57625    164.301
#> 4       4     22       3.0 2.63981    490.421
#> 5       5     32       3.0 5.58505   1128.315
#> 6       6     15       3.0 1.22718    203.145
#> 7       7     31       3.0 5.24144   1052.482
#> 8       8     40       2.5 8.72665   1578.856

Exercise 5.5 Extend the function you wrote for Exercise 5.4 to allow the user to choose between the three log rules given in (5.4), (5.5), and (5.6). Call your new function vol_bdft(). In addition to DBH_in and L_16ft arguments, your new function should have an argument called log_rule that takes values "Int-1/4", "Scribner", or "Doyle" that are used to control which log rule is computed. Make the default value for log_rule be "Int-1/4". Have your function throw an informative error if the log_rule value provided by the user does not match one of the accepted values.

Below are some tests using our vol_bdft() that you can use to test your implementation.

vol_bdft(DBH_in = c(12, 24), L_16ft = c(1, 2.5)) # Int-1/4 default.

#> [1]  56.129 519.665

vol_bdft(DBH_in = c(12, 24), L_16ft = c(1, 2.5), log_rule = "Scribner")

#> [1]  46.924 484.605

vol_bdft(DBH_in = c(12, 24), L_16ft = c(1, 2.5), log_rule = "Doyle")

#> [1]  29.01 428.94

# Check log_rule error handling.
vol_bdft(DBH_in = c(12, 24), L_16ft = c(1, 2.5), log_rule = "Int-1/8")

#> Error in vol_bdft(DBH_in = c(12, 24), L_16ft = c(1, 2.5), log_rule = "Int-1/8"): log_rule should be either 'Int-1/4', 
#>             'Scribner', or 'Doyle'.

Exercise 5.6 Using vol_bdft() from Exercise 5.5, add estimated board foot volume using the tree log rules to plot_trees. Your result should match ours below (we included only tree_id and the volume columns so our output fits nicely on the page).

#>   tree_id vol_I_bdft vol_S_bdft vol_D_bdft
#> 1       1   1852.968   1777.412   1805.779
#> 2       2     59.548     45.776     22.374
#> 3       3    164.301    147.059    113.743
#> 4       4    490.421    451.134    382.179
#> 5       5   1128.315   1075.319   1045.083
#> 6       6    203.145    174.253    116.996
#> 7       7   1052.482   1000.796    963.753
#> 8       8   1578.856   1527.407   1571.260

References

Burkhart, H. E., T. E. Avery, and B. P. Bullock. 2018. Forest Measurements: Sixth Edition. Waveland Press. https://books.google.com/books?id=thxpDwAAQBAJ.

———. 2003. “National-Scale Biomass Estimators for United States Tree Species.” Forest Science 49 (1): 12–35.

Wiant, Harry V. 1986. “Computer Corner: Formulas for Mesavage and Girard’s Volume Tables.” Northern Journal of Applied Forestry 3 (3): 124–24. https://doi.org/10.1093/njaf/3.3.124.

———. 2019. Advanced r. CRC press.

Pseudocode is notation resembling a simplified programming language and used to convey code features but doesn’t actually conform to syntax requirements and hence doesn’t run.↩︎
Our basal area function uses pi/(4*144). If you wrote your function using the forester’s constant 0.005454 then your basal area will differ slightly from ours beyond the second or third decimal place.↩︎
The form class number, in this case 78, is form quotient and is the ratio, expressed as a percent, of some upper-stem diameter to DBH, see, e.g., Burkhart, Avery, and Bullock (2018). A larger form quotient means less stem taper, i.e., more stem volume.↩︎
One board foot as dimension $1\times 12\times 12$ (in) with volume 144 (in$^3$).↩︎

Want to know when the book is for sale? Enter your email so we can let you know.