Chapter 10 Preliminary definitions and concepts
Translating mathematical expressions, found in publications and reference material, into code is a fundamental data analyst skill. This chapter defines the terminology and mathematical notation that the remainder of the book uses to introduce statistical concepts and to show how those concepts translate into efficient analysis workflows.
10.1 Estimating population parameters
A central aim of data analysis, as covered in this book, is to infer (meaning to deduce or learn about) population characteristics of interest from a sample taken from the population. The population is an aggregate of units that belong to some defined group. The units are individuals or objects that collectively comprise the population. In some cases we’re able to observe (meaning measure) all units in the population. Observing all population units is a census and means we can describe the population characteristic of interest in its entirety and without error (assuming no measurement error). Recall that the Harvard Forest dataset described in Section 1.2.5 is a census of the population with units defined as all woody vegetation with a DBH of 1 cm and larger. In most practical cases, however, it’s too time-consuming and/or expensive to observe all units in the population. In such cases, we settle for observing a sample, which means a subset, of population units. If selected appropriately, this sample might ultimately provide a useful estimate of the population characteristic of interest.
We use the word parameter to mean a summary measure of a population characteristic; for example, a parameter could be an average, total, or proportion. Similarly, we use the word statistic to mean a summary measure of a sample characteristic, and, like a parameter, a statistic could be an average, total, or proportion. Statistics are used to infer something about the population parameter of interest.
A characteristic that might vary from unit to unit is called a variable. Our observations, or measurements, of a variable in a sample of population units serve to inform statistics, which serve as estimates of the population parameter of interest. So, the typical data analysis progression is:
- define the population, population units, and population parameters of interest, then
- measure one or more variables on a sample of population units and compute one or more statistics that serve as estimates of the population parameters. Typically, these statistics provide our best estimate of each parameter and quantify how certain we are in that estimate.
Here’s an example to illustrate this new vocabulary. Say you want to know the total merchantable timber volume on a 50-acre property, where you define “merchantable timber volume” as volume from live trees 10.0 inches DBH or larger with at least one 16-foot sawlog. The population comprises all trees on the 50-acre property that meet your definition of merchantable timber volume. The population parameter of interest is total merchantable timber volume. The population units are individual trees that meet your merchantable timber volume definition. The variable measured on each unit (i.e., tree) is its volume (although in practice we typically measure DBH and height then compute volume using an allometric equation like those briefly discussed in Section 1.2.1 and discussed in detail in Chapter ??). Alternatively, and more typically in practice, we might define the population units as fixed-area field plots. For example, we might divide the 50-acre property into 250 non-overlapping 1/5-acre plots. In this case the variable is the cumulative volume of all trees on each 1/5-acre plot that meet the population definition for merchantable timber. Regardless of whether our population unit is an individual tree or a group of trees on a plot, we’ll likely not be able to census the property due to time or effort constraints; rather, we’ll select a sample of population units for which we’ll measure volume. Then, given the sample measurements, we compute a statistic that serves as our estimate of total merchantable timber volume on the property.
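To make the last step of this progression concrete, here is a minimal sketch assuming the 250-plot population described above, simple random sampling of plots, and five sampled plot volumes that are hypothetical numbers invented for illustration. It uses the common expansion estimator (number of population units times the sample mean); the estimators themselves, and how to quantify their uncertainty, are developed in later chapters, so treat this as a preview rather than the book’s method.

```r
# Hypothetical example: estimate total merchantable volume on a 50-acre
# property divided into non-overlapping 1/5-acre plots.
N <- 50 * 5                # 50 acres at 5 plots per acre gives 250 population units

# Hypothetical measured volumes (cubic feet per plot) on a sample of 5 plots.
plot_vol <- c(820, 1040, 610, 950, 780)

# Statistic: the sample mean volume per plot.
y_bar <- mean(plot_vol)    # 840

# Expansion estimate of the population total: N times the sample mean.
total_hat <- N * y_bar     # 210000 cubic feet
total_hat
```

Here N and y_bar play the roles of a known population size and a sample statistic, and total_hat is the statistic that serves as our estimate of the population parameter.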
10.1.1 Types of variables
It’s useful to connect the R data types introduced in Section 4.2.1 with the types of variables we encounter in subsequent chapters. Following Figure 10.1, a variable is initially classified as either quantitative or qualitative.
A quantitative variable has values that give a notion of magnitude; that is, the values are a numerical measure. This numerical measure is either continuous or discrete. A quantitative continuous variable can take an infinite (not countable) number of possible values, i.e., there is an infinitely fine increment from one value to the next. Said differently, a continuous variable can, in principle, be recorded to an arbitrary number of decimal places. Examples include height, weight, and volume. In comparison, a quantitative discrete variable can take only a countable number of possible values, with a gap between one possible value and the next. The values are often (but not always) integers. Examples include age in whole years, number of trees, and number of fire events.
A qualitative variable (also referred to as a categorical variable) has values that represent different categories. Qualitative variables are either nominal or ordinal. A qualitative nominal variable takes values for which no ordering is possible or implied in the categories. For example, the variable species is nominal because there is no inherent or natural order in the species names. Similarly, sex coded as male or female is nominal because there is no apparent ordering. In contrast, the categories of a qualitative ordinal variable have some natural ordering. For example, the variable tree canopy position can take values suppressed, intermediate, co-dominant, or dominant, where these categories themselves imply an ordering. Further examples are disturbance severity code with categories low, medium, and high, or perhaps forest or ecosystem succession stages. Again, in these examples, there is a natural order to the categories.
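One common way to store each of these variable types in R is sketched below. The mapping (double for continuous, integer for counts, factor for nominal, ordered factor for ordinal) is a convention rather than a rule, and the species names and values are hypothetical.

```r
# Quantitative continuous: stored as a double (numeric) vector.
dbh <- c(10.2, 14.7, 21.3)

# Quantitative discrete: counts can be stored as integers (the L suffix).
n_trees <- c(12L, 7L, 30L)

# Qualitative nominal: an unordered factor; level order carries no meaning.
species <- factor(c("red maple", "sugar maple", "red maple"))

# Qualitative ordinal: an ordered factor, levels listed from lowest to highest.
canopy <- factor(c("dominant", "suppressed", "intermediate"),
                 levels = c("suppressed", "intermediate", "co-dominant", "dominant"),
                 ordered = TRUE)

class(dbh)             # "numeric"
class(n_trees)         # "integer"
levels(species)        # "red maple" "sugar maple"
canopy[1] > canopy[2]  # TRUE, because dominant ranks above suppressed
```

Comparisons such as > are meaningful only for the ordered factor; on an unordered factor like species, R returns NA with a warning.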
10.2 Tools of the trade: Review of notation
Frank Freese was a prolific forest biometrician who worked for the USFS Southern Experiment Station in Asheville, North Carolina, and Forest Products Lab in Madison, Wisconsin. He wrote several primers on applied statistics for practicing foresters. Despite being somewhat dated, Frank’s works are clearly written, fun to read, and offer lots of worked examples that make them enduring resources for those interested in an efficient and accessible introduction to classical statistics and sampling methods. In one of his works, entitled Elementary Forest Sampling (Freese 1962), he includes a section called “Tools of the Trade: Subscripts, Summations, and Brackets” that provides a refresher on (or perhaps a first look at) frequently used mathematical expressions in applied statistics and sampling. Following Frank’s lead, we offer a similar section that covers common mathematical notation and its translation to R code.
Throughout the remainder of this book, Greek letters, e.g., \(\mu\), \(\sigma\), \(\alpha\), \(\beta\), and \(\eta\),[96] represent population parameters, and Roman letters, e.g., \(x\), \(y\), \(z\), represent variables. For example, we might say \(x=5\), which is a scalar (i.e., single value), or \(y=(1,4,5)\), which is a vector of three values. A vector is a collection of one or more values organized such that the values can be referenced using an index, analogous to the R vector data object introduced in Section 4.2. A sample statistic, which is any quantity computed from values in a sample and used to estimate population parameters, will be represented using a Roman letter.
10.2.1 Mathematical operators, subscripts, and nested data
You already know many mathematical operators: addition +, subtraction -, multiplication *, division /, exponentiation ^, and assignment =. Like the comparison and logical operators introduced in Section 4.7, mathematical operators are applied to operands. For example, in the addition 1+2 there are two operands: the left operand is 1 and the right operand is 2. An operator is binary if it has two operands and unary if it has a single operand. Although you might not have thought about this concept in these terms, the familiar - operator has both a unary and a binary form. The unary negation form reverses the sign of the operand on its right, and the binary form subtracts the right operand from the left operand.
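A short R sketch of the two forms of -, plus the other operators listed above; a and b are arbitrary example values. It also uses the fact that R operators are ordinary functions, so a backtick-quoted - can be called with one or two arguments. (In R code, <- is the conventional assignment operator, although = also works for top-level assignment.)

```r
a <- 5
b <- 3

-a          # unary negation: one operand, returns -5
a - b       # binary subtraction: two operands, returns 2

# The same `-` function called with one argument (unary) or two (binary).
`-`(a)      # -5
`-`(a, b)   # 2

# The other arithmetic operators applied to the same operands.
a + b       # 8
a * b       # 15
a / b       # 1.666667
a^b         # 125
```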
Subscripts appear as one or more letters to the lower right side of a variable.[97] For example, \(x_{i}\) is a variable \(x\) with subscript \(i\). The subscript letter is a placeholder for an integer value used to index elements in a vector. Say \(x=(4,1,5,2,8)\), that is, the variable \(x\) is a vector of length 5 with values 4, 1, 5, 2, and 8. Setting \(i = 1\) means to reference or retrieve the vector value at position or index 1, so \(x_i\) equals 4. Similarly, when \(i = 2\) then \(x_i\) equals 1, when \(i = 3\) then \(x_i\) equals 5, and so on.
A repeating theme we’ll see later when computing sample statistics is summation of various quantities. Following from the previous example, the sum of the vector \(x\) could be written as \[\begin{equation*} (x_1 + x_2 + x_3 + x_4 + x_5). \end{equation*}\] A more compact way to write this summation is \[\begin{equation*} (x_1 + x_2 + \ldots + x_5), \end{equation*}\] where the \(\ldots\), which is an ellipsis, means do the same thing for \(x_3\) and \(x_4\). An even more compact way to express this summation is \[\begin{equation*} \sum^5_{i=1}x_i, \end{equation*}\] where \(\sum\) is a symbol that means add a \(+\) between the values of \(x_i\) each time \(i\) changes. The \(i=1\) below the summation symbol says subscript \(i\) starts at value 1 and increases by 1 until it equals the number above the summation symbol, which in this case is 5. In some books, you will see the number on top of the summation symbol is omitted, which should be taken to mean increment the subscript to the number of elements in the vector being summed.
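The subscript and summation ideas above map directly onto R, as in this sketch using the same example vector \(x=(4,1,5,2,8)\). The loop spells out what \(\sum\) does: start \(i\) at 1 and keep adding \(x_i\) until \(i\) reaches the number of elements. Using seq_along(x) mirrors the convention of omitting the upper limit, since it runs from 1 to however many elements x has.

```r
x <- c(4, 1, 5, 2, 8)

# The subscript i corresponds to the index in square brackets.
x[1]  # 4, i.e., x_1
x[2]  # 1, i.e., x_2
x[3]  # 5, i.e., x_3

# Summation written as a loop over i = 1, 2, ..., 5.
total <- 0
for (i in seq_along(x)) {
  total <- total + x[i]
}
total   # 20

# sum() performs the same summation in a single call.
sum(x)  # 20
```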
Very often we use more than one subscript to reference values of a variable. For example, from Section 1.2.1, recall the FEF dataset comprises 88 trees sampled on 17 plots across two watersheds. More specifically, 44 trees were measured across 8 plots in watershed 1, and 44 trees were measured across 9 plots in watershed 2. We’ll see later that it’s helpful to use a separate subscript to index each level in these kinds of nested datasets.
For now let’s just focus on the data in one of the two watersheds. Let \(x\) represent stem biomass and use subscripts \(i\) for tree and \(j\) for plot. Using this notation, we can uniquely identify the biomass of any tree as \(x_{i,j}\). Translating this notation into words, we might say \(x_{i,j}\) is the biomass of the \(i\)-th tree in the \(j\)-th plot. Now, using these subscripts, say we want the total biomass of all trees across all plots, which can be written \[\begin{equation*} \sum^{m}_{j=1}\sum^{n_j}_{i=1}x_{i,j}, \tag{10.1} \end{equation*}\] where \(m\) is the number of plots and \(n_j\) is the number of trees in plot \(j\) (notice the subscript on \(n\) is necessary because there might be a different number of trees on each plot, and the \(j\) says that \(n_j\) is plot-specific). The expanded equivalent of this summation is \[\begin{align*} x_{1,1}&+x_{2,1}+\ldots +x_{n_1,1}\nonumber\\ +&x_{1,2}+x_{2,2}+\ldots +x_{n_2,2}\nonumber\\ &\quad\quad\quad\quad\quad\vdots\nonumber\\ +&x_{1,j}+x_{2,j}+\ldots +x_{n_j,j}\nonumber\\ &\quad\quad\quad\quad\quad\vdots\nonumber\\ +&x_{1,m}+x_{2,m}+\ldots +x_{n_m,m}, \tag{10.2} \end{align*}\] where the \(\vdots\) indicates that each omitted row in the expression looks the same as the first two except that the \(j\) index increments until it reaches \(m\).
Plot (\(j\)) | Tree (\(i\)) | \(x\) | \(y\) |
---|---|---|---|
1 | 1 | 1.2 | 3 |
1 | 2 | 2.4 | 4 |
2 | 1 | 0.4 | 2 |
2 | 2 | 6.3 | 6 |
2 | 3 | 2.2 | 5 |
Consider the example data given in Table 10.1. Using the same notation as the FEF biomass summation example above, these example data comprise \(m=2\) plots (indexed using \(j\)) with multiple trees measured in each plot (indexed using \(i\)). For plot \(j=1\) there are \(n_1=2\) trees measured, and for plot \(j=2\) there are \(n_2=3\) trees measured. The columns \(x\) and \(y\) represent two variables measured on each tree.
Using the indexing described above, the sum of the \(x\) values in Table 10.1 is written as \[\begin{align*} \sum^{m}_{j=1}\sum^{n_j}_{i=1}x_{i,j}&=x_{1,1}+x_{2,1}+x_{1,2}+x_{2,2}+x_{3,2}\\ &=1.2+2.4+0.4+6.3+2.2\\ &=12.5. \end{align*}\]
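To connect the double summation to code, here is a sketch that stores the Table 10.1 \(x\) values as an R list with one element per plot (a storage choice made only for illustration; later in this chapter the data live in a data frame) and sums them with nested for loops whose indices play the roles of \(j\) and \(i\).

```r
# Table 10.1 x values: list element j holds the trees on plot j.
x_plots <- list(c(1.2, 2.4),       # plot j = 1, n_1 = 2 trees
                c(0.4, 6.3, 2.2))  # plot j = 2, n_2 = 3 trees

m <- length(x_plots)                # number of plots, 2
total <- 0
for (j in seq_len(m)) {             # outer sum over plots
  n_j <- length(x_plots[[j]])       # number of trees on plot j
  for (i in seq_len(n_j)) {         # inner sum over trees on plot j
    total <- total + x_plots[[j]][i]  # add x_{i,j}
  }
}
total  # 12.5
```

Flattening the list with sum(unlist(x_plots)) gives the same total, because every \(x_{i,j}\) is added exactly once either way.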
This subscript notation extends in a straightforward way to accommodate any number of data levels combined with any mathematical operator (we focus on the summation operator here because it’s used most frequently in the methods considered in this book). As an example of extending this notation, consider the watershed level of the FEF data: index watershed using the subscript \(k\) and let \(h\) represent the number of watersheds. Tree biomass measurements \(x\) can now be indexed as the \(i\)-th tree, within the \(j\)-th plot, within the \(k\)-th watershed. The total biomass is now expressed as \[\begin{equation} \sum^{h}_{k=1}\sum^{m_k}_{j=1}\sum^{n_{j,k}}_{i=1}x_{i,j,k}. \tag{10.3} \end{equation}\] Note, in the summation above we added the \(k\) subscript to \(m\) to acknowledge that there might be a different number of plots within each watershed, and to \(n\) because the number of trees on plot \(j\) depends on which watershed that plot is in.
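The triple summation in (10.3) nests one more loop, as in this sketch. The data are hypothetical: watershed 1 reuses the Table 10.1 \(x\) values and watershed 2 uses made-up values, stored as a list of watersheds, each holding a list of plot vectors.

```r
# Hypothetical nested biomass data: watershed (k) > plot (j) > tree (i).
biomass <- list(
  list(c(1.2, 2.4), c(0.4, 6.3, 2.2)),  # watershed k = 1 has m_1 = 2 plots
  list(c(3.0, 1.0), c(2.0))             # watershed k = 2 has m_2 = 2 plots
)

h <- length(biomass)                     # number of watersheds
total <- 0
for (k in seq_len(h)) {                  # outer sum over watersheds
  m_k <- length(biomass[[k]])            # plots in watershed k
  for (j in seq_len(m_k)) {              # middle sum over plots
    n_trees <- length(biomass[[k]][[j]]) # trees on plot j of watershed k
    for (i in seq_len(n_trees)) {        # inner sum over trees
      total <- total + biomass[[k]][[j]][i]
    }
  }
}
total  # 18.5
```

Each loop bound is read from the data at its own level, which is exactly why \(m\) carries the \(k\) subscript and the tree count depends on both plot and watershed.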
A common characteristic of datasets we encounter is that observations are nested in one or more levels. The FEF data are a perfect example: tree measurements (observations) are nested in plots, plots are nested in watersheds, and the watersheds are nested in the entire FEF. The Elk County inventory data in Section 1.2.2 have tree measurements (observations) nested in plots, which are nested in the forested property. The PEF data in Section 1.2.4 have tree measurements (observations) nested in year, nested in inventory plot, nested in management unit, and the management units are nested in the entire PEF. The FACE data in Section 1.2.3 have a complex nesting structure, with aspen tree diameter measurements (observations) nested in clone, nested in year, nested in experimental replicate, nested in treatment. As illustrated in (10.3), subscripts provide a useful notation to describe how observations are nested at different levels and the order in which different mathematical operators apply to the observations. This notion of nesting becomes very important in the programming and analysis chapters, where we apply different estimators and seek data summaries at various data levels.
As demonstrated in subsequent chapters, the dplyr group_by() and summarize() workflows, along with other tidyverse functions, are perfectly suited to efficiently analyzing the kinds of nested data we encounter in practice.
10.2.2 Order of operations
Perhaps at one time you learned the acronym PEMDAS for Parentheses, Exponents, Multiplication, Division, Addition, and Subtraction, along with its handy mnemonic Please Excuse My Dear Aunt Sally. Given two or more operations in a single expression, PEMDAS tells us what to calculate first, second, third, and so on, until the calculation is complete. One subtlety is that multiplication and division share the same precedence and are evaluated from left to right, as are addition and subtraction. Confusing the order of operations is a common coding error when implementing the estimators considered in subsequent chapters; it’s time well invested reviewing and practicing these foundational rules.
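The snippets below, using arbitrary numbers, show how R resolves a few cases where a literal reading of PEMDAS can mislead: multiplication and division share a precedence level and are applied left to right, ^ is applied before unary minus, and ^ groups from right to left. R’s ?Syntax help page is the authoritative source for these rules.

```r
2 + 3 * 4    # 14: multiplication before addition
(2 + 3) * 4  # 20: parentheses first
8 / 2 * 4    # 16: / and * share precedence, applied left to right
8 - 2 + 1    # 7: - and + share precedence, applied left to right
-2^2         # -4: ^ binds tighter than unary minus, so this is -(2^2)
(-2)^2       # 4: parentheses force the negation first
2^3^2        # 512: ^ groups right to left, so this is 2^(3^2)
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit to both R and the reader.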
Consider the sum of the squares of \(n\) values of a variable \(x\), expressed as \[\begin{equation} \sum^n_{i=1}x_i^2. \tag{10.4} \end{equation}\] The two operations in (10.4) are raising each \(x_i\) to the exponent of 2 (i.e., squared) and summing the \(n\) resulting values of \(x_i^2\). Because summation is repeated addition, our handy acronym PEMDAS reminds us to apply exponents before summation, so we first square each \(x_i\) then sum the resulting \(n\) values. Using the \(n = 5\) \(x\) measurements in Table 10.1, this expression equals \(1.2^2 + 2.4^2 + 0.4^2 + 6.3^2 + 2.2^2 = 51.89\).
Now let’s add some parentheses to (10.4) and see how the order of operations changes, \[\begin{equation} \left(\sum^n_{i=1}x_i\right)^2. \tag{10.5} \end{equation}\] Looking at (10.5) and considering PEMDAS, the order of operations is to first sum the \(n\) \(x\) values, because they are within the parentheses, then square the scalar result. Recall that a scalar is a single real number, which is what you’re left with after the summation in (10.5). Using the \(n = 5\) \(x\) measurements in Table 10.1, this expression equals \((1.2 + 2.4 + 0.4 + 6.3 + 2.2)^2 = 12.5^2 = 156.25\).
Following the data structure in Table 10.1, consider the expression \[\begin{equation} \sum^m_{j=1}\left(\sum^{n_j}_{i=1}x_{i,j}\right)^2, \tag{10.6} \end{equation}\] where, recall, \(j\) indexes the \(m\) plots and \(i\) indexes the \(n_j\) trees on each plot. Evaluating (10.6) takes two steps. First, for each value of \(j\), sum over the \(n_j\) values of \(x_{i,j}\) then square the resulting scalar. Second, sum the \(m\) resulting scalars. Here’s the expanded form \[\begin{align} \sum^m_{j=1}\left(\sum^{n_j}_{i=1}x_{i,j}\right)^2= &\left(x_{1,1}+x_{2,1}+\ldots+x_{n_1,1}\right)^2\nonumber\\ &+\left(x_{1,2}+x_{2,2}+\ldots+x_{n_2,2}\right)^2\nonumber\\ &\quad\quad\quad\quad\quad\vdots\nonumber\\ &+\left(x_{1,m}+x_{2,m}+\ldots+x_{n_m,m}\right)^2. \tag{10.7} \end{align}\] Using data in Table 10.1, expression (10.7) equals 92.17.
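As a check on the 92.17 reported for (10.7), here is the arithmetic with the Table 10.1 values, where \(m = 2\), \(n_1 = 2\), and \(n_2 = 3\): \[\begin{align*} \sum^2_{j=1}\left(\sum^{n_j}_{i=1}x_{i,j}\right)^2 &= (1.2 + 2.4)^2 + (0.4 + 6.3 + 2.2)^2\\ &= 3.6^2 + 8.9^2\\ &= 12.96 + 79.21\\ &= 92.17. \end{align*}\] Squaring each plot total before the outer sum is what separates this result from the 156.25 of (10.5), where all five values are summed before squaring.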
You might also encounter expressions that involve two or more subscripted variables. For example \[\begin{equation} \sum^n_{i=1}x_iy_i = x_1y_1 + x_2y_2 + \ldots + x_ny_n, \tag{10.8} \end{equation}\] where, following PEMDAS, the multiplication of \(x_i\) and \(y_i\) comes before the summation. Using measurements for all \(n = 5\) trees in Table 10.1, expression (10.8) equals 62.8.
Here’s another expression with two variables \[\begin{equation} \left(\sum^n_{i=1}x_i\right)\left(\sum^n_{i=1}y_i\right) = (x_1 + x_2 + \ldots + x_n)(y_1 + y_2 + \ldots + y_n), \tag{10.9} \end{equation}\] where, because of the parentheses, summation of the \(n\) \(x\) values and summation of the \(n\) \(y\) values come before the multiplication of the components involving \(x\) and \(y\). Using data in Table 10.1, expression (10.9) equals 250.
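Writing out both expressions with the five trees in Table 10.1, taken in table order, shows where the values 62.8 and 250 come from: \[\begin{align*} \sum^5_{i=1}x_iy_i &= (1.2)(3) + (2.4)(4) + (0.4)(2) + (6.3)(6) + (2.2)(5)\\ &= 3.6 + 9.6 + 0.8 + 37.8 + 11.0 = 62.8,\\ \left(\sum^5_{i=1}x_i\right)\left(\sum^5_{i=1}y_i\right) &= (12.5)(3 + 4 + 2 + 6 + 5) = (12.5)(20) = 250. \end{align*}\] The two results differ because (10.8) multiplies matched pairs before summing, whereas (10.9) multiplies two totals.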
10.2.3 Tools of the trade using R
As noted at the beginning of Section 10.2, a vector is a collection of one or more values organized so that they can be referenced using an index. Because we’re talking about mathematical operations here, it’s implied that the vector elements are numeric. As we saw in Section 2.1.1, a vector is created using the c() function. We use this function again to make a vector x using the \(x\) values in Table 10.1.
x <- c(1.2, 2.4, 0.4, 6.3, 2.2)
Recall from Section 4.2 that we access vector elements using their position, or index, in square brackets, starting with 1 for the first element and ending with length(x) (which is 5 here) for the last element. For example, x[1] is 1.2, x[2] is 2.4, and x[5] is 2.2. Notice these indexes follow the mathematical subscript notation introduced above and are equivalent to \(x_1\), \(x_2\), and \(x_5\).
R applies unary operators to all elements in a vector. In the code below, we see how the unary negation operator changes the sign of all elements.
-x
#> [1] -1.2 -2.4 -0.4 -6.3 -2.2
Note the code above only changes the sign of the printed values, not the values stored in x. If you want to change the values in x, you need to assign the negated vector back to x. Negating x a second time restores its original values, which the examples that follow rely on.
x <- -x
x
#> [1] -1.2 -2.4 -0.4 -6.3 -2.2
x <- -x
x
#> [1] 1.2 2.4 0.4 6.3 2.2
A binary operator applied to two vectors of the same length pairs up the elements at each position, so the first elements are the operands for the first result, the second elements for the second result, and so on; this is called an elementwise operation. This is illustrated in the elementwise multiplication of the \(x\) and \(y\) variables from Table 10.1.
y <- c(3, 4, 2, 6, 5)
x * y
#> [1] 3.6 9.6 0.8 37.8 11.0
Here’s a very important concept that we initially touched upon in Sections 4.2.3 and 4.7. Elementwise operations that involve two or more vectors should use vectors of the same length. If R encounters two vectors of different lengths in a binary operation, it replicates (recycles) the shorter vector until it’s the same length as the longer vector, then it does the operation. For example, say we scale the vector x by 5, i.e., 5*x. Recall, R stores single values as a vector of length 1, so the 5 is replicated length(x) times before the elementwise multiplication. Consider another example where the vector z (defined in the code below) is shorter than x. Given x is the longer vector with length 5, z is recycled until it matches the length of x, so z will be 1,2,1,2,1. Also, because the length of x is not a multiple of the original length of z, the recycled z is truncated and R kindly prints a warning, but still returns the result, as illustrated below.
z <- c(1, 2)
x + z
#> Warning in x + z: longer object length is not a
#> multiple of shorter object length
#> [1] 2.2 4.4 1.4 8.3 3.2
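One way to see, or deliberately control, recycling is base R’s rep_len(), which repeats a vector to an exact length. The sketch below redefines x and z so it runs on its own; building the recycled vector explicitly reproduces the result of x + z without triggering the warning, because the lengths now match.

```r
x <- c(1.2, 2.4, 0.4, 6.3, 2.2)
z <- c(1, 2)

# Make the recycling explicit: repeat z until it has length(x) elements.
z_long <- rep_len(z, length(x))
z_long       # 1 2 1 2 1

# Same values as x + z, but no warning since both vectors have length 5.
x + z_long   # 2.2 4.4 1.4 8.3 3.2

# A length-1 vector is recycled to every element, so scaling never warns.
5 * x        # 6, 12, 2, 31.5, 11
```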
R follows the PEMDAS order of operations. Take a moment and study R’s precedence of operators manual page, accessed via ?Syntax. The manual page lists operators in precedence groups from highest (applied first) to lowest (applied last). Notice the PEMDAS ordering is reflected in the manual page’s precedence ordering, and note where these mathematical operators occur relative to other R operators and syntax.
The next lines of code implement (10.4) and (10.5), respectively. Notice the sum() function is equivalent to the \(\sum\) operator, and the placement of the parentheses determines whether squaring happens before or after the summation.
sum(x^2)
#> [1] 51.89
sum(x)^2
#> [1] 156.25
Using values from Table 10.1, we implement (10.8) and (10.9) below.
sum(x * y)
#> [1] 62.8
sum(x) * sum(y)
#> [1] 250
Above, and in subsequent chapters, we often illustrate ideas using single vectors that represent measurements on a variable. In practice, however, these vectors are typically columns in a data frame. So let’s extend the order of operations examples above to variables in a data frame. We begin by moving the data in Table 10.1 to a data frame via the tibble() function, which was introduced in Section 6.2.
library(dplyr) # Provides tibble(), %>%, group_by(), and summarize().
plots <- tibble(plot_index = c(1, 1, 2, 2, 2),
                tree_index = c(1, 2, 1, 2, 3),
                x = c(1.2, 2.4, 0.4, 6.3, 2.2),
                y = c(3, 4, 2, 6, 5))
Now let’s repeat the last few order of operations examples above using dplyr functions covered in Chapter 7.[98] The code below computes (10.4), (10.5), (10.8), and (10.9) all in one call to summarize().
plots %>%
  summarize(`sum(x^2)` = sum(x^2),
            `sum(x)^2` = sum(x)^2,
            `sum(x*y)` = sum(x * y),
            `sum(x)*sum(y)` = sum(x) * sum(y))
#> # A tibble: 1 × 4
#> `sum(x^2)` `sum(x)^2` `sum(x*y)` `sum(x)*sum(y)`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 51.9 156. 62.8 250
Let’s return to subscripts and nested data. We’ll often want to apply the same equation to subsets of data that are indexed by one or more qualitative variables indicating group membership. For example, let’s use the plot_index column in the plots data frame to apply the equations to trees within each plot. This is done simply by using plot_index as a grouping variable, as illustrated below.
plots %>%
  group_by(plot_index) %>%
  summarize(`sum(x^2)` = sum(x^2),
            `sum(x)^2` = sum(x)^2,
            `sum(x*y)` = sum(x * y),
            `sum(x)*sum(y)` = sum(x) * sum(y))
#> # A tibble: 2 × 5
#> plot_index `sum(x^2)` `sum(x)^2` `sum(x*y)`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 7.2 13.0 13.2
#> 2 2 44.7 79.2 49.6
#> # ℹ 1 more variable: `sum(x)*sum(y)` <dbl>
Let’s go back and look at (10.6) and its expanded form in (10.7). Recall that \(j\) indexes plot and \(i\) indexes tree within plot, \(m\) is the number of plots, and \(n_j\) is the number of trees within plot \(j\). So, for data in Table 10.1, \(m = 2\), \(n_1 = 2\), and \(n_2 = 3\). The double summation can be computed using the following code, where within_plots is the squared sum of x for each plot and across_plots is the sum of the within_plots values.
plots %>%
  group_by(plot_index) %>%
  summarize(within_plots = sum(x)^2) %>%
  summarize(across_plots = sum(within_plots))
#> # A tibble: 1 × 1
#> across_plots
#> <dbl>
#> 1 92.2
As always, it can be instructive to run the code above incrementally and compare the output of each call to summarize() to the right-hand side of (10.7).
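For comparison, here is a sketch of the same calculation in base R without dplyr, using tapply() to apply sum() to x within each level of a plot index. It redefines x and plot_index from Table 10.1 so it runs on its own; note that plot_sums holds the unsquared plot totals, so the squaring happens in a separate, visible step.

```r
x <- c(1.2, 2.4, 0.4, 6.3, 2.2)
plot_index <- c(1, 1, 2, 2, 2)

# Inner sums of (10.6): total x within each plot j.
plot_sums <- tapply(x, plot_index, sum)   # 3.6 for plot 1, 8.9 for plot 2

# Square each plot total, then apply the outer sum across the m plots.
result <- sum(plot_sums^2)                # 92.17
result
```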
10.3 Summary
In this chapter, we covered essential data analysis vocabulary and statistical notation used throughout the remainder of the book. We began our discussion by defining concepts related to populations, samples, parameters, and variables. We described the typical data analysis progression of defining the population, population units, and population parameters of interest and subsequently measuring one or more variables on a sample of population units, which are used to estimate the population parameters of interest. We’ll see more concrete examples of this workflow beginning in Chapter 11 and in much more detail in Chapter 13.
Subsequently, we extended our discussion to quantitative and qualitative variables. Quantitative variables are numeric and are either discrete (a countable set of possible values with gaps between them, e.g., integers) or continuous (any value within a range, with infinitely fine increments between values). Qualitative variables represent different categories and can be nominal (no order) or ordinal (ordered).
We then shifted our discussion to notation used throughout the remainder of the book. We use Greek letters (e.g., \(\mu\), \(\sigma\)) to represent population parameters and Roman letters (e.g., \(x\), \(y\)) to represent variables. We reviewed mathematical operators and discussed subscripts and their role in summations and nested data. We stressed the need to understand how subscripts reference variable values in a nested dataset, as nested data are extremely common in both forestry and environmental sciences. We finished by discussing the ever-important order of operations. While it might seem conceptually straightforward, we’ve spent precious hours (even days) searching code only to find the problem was a simple error that could have been avoided by paying closer attention to the acronym PEMDAS.
With tools of the trade in hand, you are ready for Chapter 11, which develops the statistical foundations to use sample information to make inferences about populations.
10.4 Exercises
With the exception of Exercise 10.10, all exercises should be completed by hand, i.e., not using R. Use the information in Table 10.1 to complete Exercises 10.1 through 10.9.
Exercise 10.1 How many tree measurements of variable \(x\) are there for Plot 1 and Plot 2?
Exercise 10.2 Evaluate the expression \(\sum^{2}_{i=1}x_{ij}\), where \(j=1\), i.e., sum the \(x\) variable values for Plot 1.
Exercise 10.3 Evaluate the expression \(\sum^2_{j=1}\sum^{n_j}_{i=1}x_{ij}\), where \(n_j\) is the total number of trees in each plot.
Exercise 10.4 Evaluate the expression \(\sum^2_{j=1}\sum^{n_j}_{i=1}x_{ij}^2\), where \(n_j\) is the total number of trees in each plot.
Exercise 10.5 Evaluate the expression \(\sum^2_{j=1}(\sum^{n_j}_{i=1}x_{ij})^2\), where \(n_j\) is the total number of trees in each plot.
Exercise 10.6 Evaluate the expression \(\sum^2_{j=1}\sum^{n_j}_{i=1}x_{ij}y_{ij}\), where \(n_j\) is the total number of trees in each plot.
Exercise 10.7 Evaluate the expression \(\sum^2_{j=1}\sum^{n_j}_{i=1}x_{ij}^2y_{ij}\), where \(n_j\) is the total number of trees in each plot.
Exercise 10.8 Is \(\sum^n_{i=1}x_i^2\) equal to \(\left(\sum^n_{i=1}x_i\right)^2\)?
Exercise 10.9 Is \(\sum^n_{i=1}x_iy_i\) equal to \(\left(\sum^n_{i=1}x_i\right)\left(\sum^n_{i=1}y_i\right)\)?
Exercise 10.10 Evaluate this expression first by hand (with the aid of a calculator) then using R. \[ \frac{\left(\exp(14) + 10\right) \times \sqrt{5}}{\ln(4) - 5 \times 10^2}, \] where \(\exp\) is the exponential function and \(\ln\) is the natural log (i.e., \(\log_e\)).
References
[96] These are the letters mu, sigma, alpha, beta, and eta, respectively.
[97] Later on in Chapters 13 and ?? we’ll also use subscripts to index vectors of parameters or statistics.
[98] In this example we’re using non-syntactic names in the summarize() call and protecting them with backticks (see Section (syntacticNames) for naming rules). Using non-syntactic names should be avoided, but it’s useful in this example to match each mathematical expression with its resulting value.