Effective Sample Distance 2: CIs for Specific Values

Confidence intervals for specific effective sampling distances

Advanced

Author

Trent L. McDonald

Published

September 24, 2025

Modified

September 26, 2025

Introduction

A previous tutorial, Effective Sample Distance 1, demonstrated confidence interval estimation for average effective sample distance (ESD). Confidence intervals for average ESD provide researchers with a reasonable ESD range and can be important for study planning or recognizing changes in detection probabilities. By definition, average ESD amalgomates ESD values over all covariate values. If, for example, detection differs between two observers, average ESD will fall somewhere between the individual ESDs.

Effective Sample Distance

Effective Sample Distance (ESD) is the distance at which observers miss nearby targets as often as they find targets farther away. It is the distance at which missed targets offset included targets, and can be thought of as the effective distance of 100% detection. ESD plays a key role in density estimation. ESD can be off-transect distance, which we call Effective Strip Width (ESW), or radial distance from an observer, which we call Effective Detection Radius (EDR).

While useful in some cases, average ESD is not interesting in other cases. In some cases researchers would like ESD associated with specific values of detection covariates. For example, researchers might compare effective distances among observers, or in shrub and forest habitat. Rdistance’s effectiveDistance() function makes ESD point estimation easy when its newdata parameter is a data frame containing prediction covariate values.

But, what about a reasonable range for ESD point estimates? Variation in ESD would be useful if, for example, researchers plan transects in both shrub and forest habitat and they desire an estimate of the proportion of their study area they “effectively” cover. In these cases, confidence intervals on specific ESD values are needed, and the question is, “How do I do that?” How do we place a confidence interval on ESD associated with specific covariate values?

Unfortunately, confidence interval estimation for specific ESD is not easy in Rdistance (this was pointed out to the author in a recent email from an Rdistance user). The good news is that such confidence intervals are feasible. The not-so-good news is that these confidence intervals require a customized bootstrap loop like the one in this tutorial (Section 3, Listing 2).

Purpose

Demonstrate bootstrap confidence interval estimation for ESD associated with specific covariate values.

Note

The customized bootstrap loop used in this tutorial is not built into Rdistance. To use this method, analysts must cut and paste code from Listing 2 and Listing 3 (Section 3) into a local text file (e.g., in RStudio, press Ctrl-Shift-n to create a fresh .R script, and paste code there) and either run it line-by-line (Ctrl-Enter) or all at once (Ctrl-Shift-s).

Note

At the time of writing, this tutorial used a development version of Rdistance, i.e., version 4.1.0 (2025-08-21). The current CRAN version (4.0.5; 2025-04-10) works equally well with only minor output differences.

1 Base Distance Function Estimation

This tutorial follows Part 1 by fitting a distance function to Rdistance’s Brewer’s sparrow data that includes observer.

library(Rdistance)

Loading required package: units

udunits database from C:/Users/trent/AppData/Local/R/win-library/4.5/units/share/udunits/udunits2.xml

Rdistance (v4.1.0)

data("sparrowDf") # pull pre-defined Rdistance data frame to workspace

tenHectares <- units::set_units(10, "ha") # pretend study area size
whi <- units::set_units(200, "m") # right cutoff

The following statements estimate and summarize the distance function.

dfuncFit <- sparrowDf |>
  dfuncEstim(dist ~ observer + groupsize(groupsize)
             , likelihood = "hazrate"
             , w.hi = whi) 

abundFit <- dfuncFit |> 
  abundEstim(area = tenHectares
             , ci = NULL)

summary(abundFit)

Call: dfuncEstim(data = sparrowDf, dist ~ observer +
   groupsize(groupsize), likelihood = "hazrate", w.hi = whi)
Coefficients:
              Estimate     SE         z           p(>|z|)      
(Intercept)    3.97923469  0.1289595  30.8564643  4.587821e-209
observerobs2   0.19457534  0.1687260   1.1532029   2.488271e-01
observerobs3   0.05063578  0.1408215   0.3595741   7.191656e-01
observerobs4  -0.37709138  0.1588333  -2.3741329   1.759022e-02
observerobs5  -0.10615066  0.1478300  -0.7180590   4.727209e-01
k              3.25594658  0.3646960   8.9278381   4.344146e-19

Message: Success; Asymptotic SE's
Function: HAZRATE  
Strip: 0 [m] to 200 [m] 
Average effective strip width (ESW): 67.04909 [m] (range 47.674 [m] to 82.80148 [m]) 
Average probability of detection: 0.3352454 (range 0.23837 to 0.4140074)
Scaling: g(0 [m]) = 1
Log likelihood: -1642.054 
AICc: 3296.35

     Surveyed Units: 36000 [m] 
   Individuals seen: 372 in 354 groups 
 Average group size: 1.050847 
   Group size range: 1 to 3 
Density in sampled area: 7.900855e-05 [1/m^2]
Abundance in 1e+05 [m^2] study area: 7.900855

For good measure, we usually plot the estimated function on top of a histogram of observed distances. The following code demonstrates that Rdistance’s plot method allows users to specify arbitrary histogram bins to, for example, hightlight the steeper sections where detection drops quickly.

binCuts <- c(0, seq(20,100,by=5), 150, 200)
binCols <- c("green4", rep("red4", 16), rep("dodgerblue",2))
plot(dfuncFit
      , newdata = data.frame(
          observer = c("obs1", "obs2", "obs3", "obs4", "obs5")
        )
      , nbins = binCuts
      , col = binCols
      , lwd.dfunc = 1
      , vertLines = F)

Figure 1: An estimated distance function relating Brewer’s sparrow observation distances (histogram bars) to individual observers (colored lines).

2 ESD For Specific Covariate Values

The ESD (an Effective Strip Width or ESW in this case) for each observer can be computed by calling effectiveDistance and specifying a newdata argument. The newdata argument is a data frame containing coavariate values at which to predict effective distances. Usage of newdata in Rdistance is identical in purpose as usage of newdata in base R’s lm, glm, gam, etc. (Note that we already used a newdata argument in the plot statement above to produce the five lines in Figure 1.)

The following code computes point estimates of effective distance for each observer.

Listing 1: Point estimates of ESD by observer.

observers <- data.frame(observer = c("obs1", "obs2", "obs3", "obs4", "obs5"))
observerESD <- effectiveDistance(dfuncFit
                               , newdata = observers)
observerESD <- cbind(observers, data.frame(esw = observerESD))
observerESD

  observer          esw
1     obs1 68.82074 [m]
2     obs2 82.80148 [m]
3     obs3 72.24249 [m]
4     obs4 47.67400 [m]
5     obs5 62.11996 [m]

ESD varies from 47.7 (obs4) to 82.8 (obs2) in this example. We compute confidence intervals for these numbers in Section 3 (Listing 2 and 3).

To illustrate ESD, the following plot shows observer-specific estimated distance functions and delineates each curve’s ESD.

Warning

Creating Figure 2, with its custom axis ticks and colors, is a bit complicated and is not really appropriate for R newbies. The main purpose for including this plot is to show the location of each curve’s ESD location and probability of detection at the ESD. Probability of detection at ESD is obtained by calling Rdistance’s predict method.

# Reorder by ESD for plotting
ord <- order(observerESD$esw)
observerESD <- observerESD[ord, ] 
observers <- observers[ord,,drop=FALSE]

# Plot distance functions and identify ESD
myCols <- rainbow(nrow(observerESD))
plot(dfuncFit
      , newdata = observers
      , plotBars = FALSE
      , col = myCols
      , vertLines = F
      , xaxt = "n"
      , xlab = "Effective Strip Width [m]")
axis(1
   , tick = TRUE
   , at=c(0, observerESD$esw, 200)
   , labels = FALSE
   )
axis(1
   , tick = FALSE
   , at=c(0, 200)
   )
dfuncHeights <- predict(dfuncFit
                     , type = "dfuncs"
                     , newdata = observers
                     , distances = observerESD$esw
                     ) # height of curves @ ESD
for( i in 1:nrow(observerESD) ){
  axis(1
     , at = round(observerESD$esw[i])
     , line = (i %% 2) - 0.5
     , tick = FALSE
     , col.axis = myCols[i]
     )
  lines(rep(observerESD$esw[i], 2)
      , c(0,dfuncHeights[i,i])
      , col = myCols[i]
  )
  lines(c(0, observerESD$esw[i])
      , rep(dfuncHeights[i,i], 2)
      , col = myCols[i]
  )
}

Figure 2: Detection functions associated with observers in the example, and the probability of detection at their effective sampling distances.

Aside

In Figure 2, the probability of detection (height of the curve) at ESD varies slightly among observers. Probability of detection at ESD varies slightly because ESD is the area under each curve from left truncation to right truncation distance (here, 0 to 200) and this interval is fixed. To balance the number of detections at distances less than the ESD and the number of detections beyond ESD, the probability of detection at ESD must change slightly because the interval 0 to 200 is does not change. If we could integrate from 0 to infinity, curve heights at ESD would be equal; but, this makes no sense because ESD does not exist (i.e., is itself infinite) in this case.

3 Confidence Intervals

Confidence interval construction for the ESD point estimates associated with each observer requires a custom bootstrap loop (Listing 2), followed by summarization (Listing 3). The general algorithm for the bootstrap loop is,

Resample transects with replacement until we obtain the same number as in the original data set.
Fit the distance function to bootrap data.
Compute ESD at the specific covariate values.
Store ESD values and repeat the above many times.
Compute CI’s by the percentile or bias-corrected method.

Note

The ability of compute CIs for specific ESDs will likely be added to a future Rdistance release, assuming interest remains high.

In practice, the custom bootstrap loop proceeds as follows (Readers are encouraged run Listing 2 one-line-at-a-time and inspect the intermediate results):

Make a data frame containing the covariate values of interest (newDf below).
Make a data frame containing the original ESD estimates and their associated covariate values (origEsts below).
Write a bootstrap function that will be applied to every bootstrap data set. There are many ways to write this function. The function implemented below (estESW) takes the bootstrap identifier, random row indices, a data frame containing the covariate values of interest, and the original fitted object. It computes and returns a data frame containing ESD for each observer.
Construct a data frame containing “bootstrap indices” (bsIndices below). This data frame randomly samples, with replacement, the row indices of the original data, which in turn determines the rows in each bootstrap iteration (recall, each row in the original data frame represents one transect).
Use dplyr::group_modify to evaluate the bootstrap function on each bootstrap sample. dplyr::group_modify produces a giant data frame containing x rows for each of R bootstrap iterations (for a total of R $\times$ x rows), where x is the number of rows returned by the bootstrap function.

Warning

Because ‘observer’ contains 5 levels (4 coefficients), bootstrap estimation is slow. 500 iterations took well over an hour on an otherwise fast desktop. Planned updates to ‘Rdistance’ should speed calculation considerably. Until then, patience is required.

Listing 2: R code implementing the custom bootstrap loop for computing confidence intervals on specific ESD values.

# Covariate values to predict at ----
#   'observer' set in a previous code block
newDf <- observers

# The original point estimates ----
#   'observerESD' set in a previous code block
origEsts <- observerESD |> 
  dplyr::rename(origESW = esw)

# The 'bootstrap function' ----
#   A function applied to each BS sample 
estESW <- function(indexDf
                 , key
                 , data
                 , newData
                 , fit){

    # the BS sample
    bsdf <- data[indexDf$rowIndex,] 

    # estimate distance function using BS data
    dfunc.bs <- Rdistance::dfuncEstim(
                  data = bsdf
                , formula = formula(fit)
                , likelihood = fit$likelihood
                , w.lo = fit$w.lo
                , w.hi = fit$w.hi
                # + other options, e.g., x.scl
                )
                
    # estimate ESD for specific covariates, and return
    eswBS <- Rdistance::effectiveDistance(
                 object = dfunc.bs
               , newdata = newData
                )
    eswBS <- cbind(newData, data.frame(ESW = eswBS))
    return(eswBS)
 }
      
# Prepare bootstrap indices ----
R <- 500                         # number of BS interations
nDigits <- ceiling(log10(R + 1)) # max digits in index
nDataRows <- nrow(sparrowDf)     # number of transects
id <- rep(1:R, each = nDataRows) 
bsIndices <-  data.frame(
           id = paste0("Bootstrap_",
                       formatC(id
                             , format = "f"
                             , digits = 0
                             , width = nDigits
                             , flag = "0"))
         , rowIndex = sample( 1:nDataRows
                         , size = R*nDataRows
                         , replace = TRUE
                        ))
# In bsIndicies: 'id' is BS iteration number; 'rowIndex' is
# row in original data data set

# Apply 'bootstrap function' to ID groups using dplyr::group_modify ----
eswB <- bsIndices |>
  dplyr::group_by(id) |>
  dplyr::group_modify(.f = estESW      # BS function
                    , data = sparrowDf # the original
                    , newData = newDf  # covariates
                    , fit = dfuncFit   # the original
  )

When finished, data frame eswB contains one value of id for each bootstrap iteration, and each bootstrap iteration contains five predicted ESD values, one for each observer.

head(eswB, 2*nrow(observers))

# A tibble: 10 × 3
# Groups:   id [2]
   id            observer  ESW
   <chr>         <chr>     [m]
 1 Bootstrap_001 obs4     54.1
 2 Bootstrap_001 obs5     54.4
 3 Bootstrap_001 obs1     61.7
 4 Bootstrap_001 obs3     38.5
 5 Bootstrap_001 obs2     76.0
 6 Bootstrap_002 obs4     50.9
 7 Bootstrap_002 obs5     62.0
 8 Bootstrap_002 obs1     87.7
 9 Bootstrap_002 obs3     75.8
10 Bootstrap_002 obs2     99.3

We compute bootstrap confidence intervals, after re-grouping (by observer in this case), by calling functions that return either the lower or upper endpoint of the interval. The following code demonstrates computation of both the percentile and bias-corrected bootstrap intervals. In this case, the author leans toward reporting bias-corrected bootstrap intervals because the distribution of ESD is not very skewed (see e.g., x <- dplyr::filter(eswB, observer == "obs4"); hist(x)).

Listing 3: Calculation of percentile and bias-corrected confidence interval endpoints on results of the bootstrap loop.

# Regroup BS data by covariate values ----
eswB <- eswB |> 
  dplyr::group_by(dplyr::across(dplyr::all_of(names(newDf))))

# Merge in original point estimates b/c bias corrected ----
# intervals require the originals 
eswB <- eswB |> 
  dplyr::left_join(origEsts, by = names(newDf))

# Compute both percentile and bias-corrected intervals ----
eswCI <- eswB |>
  dplyr::summarise(
     pt_eswLo95 = quantile(ESW, p = 0.025)       # Percentile Low
   , pt_eswHi95 = quantile(ESW, p = 0.975)       # Percentile High
   , bc_eswLo95 = Rdistance::bcCI(x.bs = ESW
                                , x = origESW[1]
                                , ci = 0.95)[1]  # BiasCorr Low
   , bc_eswHi95 = Rdistance::bcCI(x.bs = ESW
                                , x = origESW[1]
                                , ci = 0.95)[2]  # BiasCorr High
                                ) 
                                
finalAnswer <- origEsts |>
  dplyr::full_join(eswCI, by = names(newDf))
  
finalAnswer

  observer      origESW   pt_eswLo95    pt_eswHi95   bc_eswLo95    bc_eswHi95
1     obs4 47.67400 [m] 38.71526 [m]  60.36355 [m] 38.89587 [m]  60.37731 [m]
2     obs5 62.11996 [m] 51.97999 [m]  82.38067 [m] 51.54298 [m]  80.86837 [m]
3     obs1 68.82074 [m] 55.40769 [m]  94.27039 [m] 57.01351 [m]  99.39565 [m]
4     obs3 72.24249 [m] 38.10893 [m]  89.67545 [m] 37.63629 [m]  88.85406 [m]
5     obs2 82.80148 [m] 63.56921 [m] 117.34634 [m] 63.46182 [m] 117.34316 [m]

Interval endpoints differ by less than 3 [m] between methods in this case (compare columns pt_esw??95 to bc_esw??95). The author deems these differences immaterial and opts for the bias-corrected version. The final, nicely formatted, intervals for observer ESD appear in Table 1.

finalAnswer <- finalAnswer |> 
  dplyr::select(dplyr::all_of(names(origEsts))
              , dplyr::starts_with("bc_")) |> 
  dplyr::mutate(ESW = paste0(
                 round(origESW, 1)
               , " ("
               , round(bc_eswLo95, 1)
               , ", "
               , round(bc_eswHi95, 1)
               , ")" )) |> 
  dplyr::select(observer, ESW) |> 
  dplyr::rename(Observer = observer
              , `ESW (95% CI)` = ESW)

knitr::kable(finalAnswer, align = "c")

Table 1: Final point estimates and 95% bias-corrected bootstrap confidence intervals for observer effective strip width, sorted from shortest to longest.

Observer	ESW (95% CI)
obs4	47.7 (38.9, 60.4)
obs5	62.1 (51.5, 80.9)
obs1	68.8 (57, 99.4)
obs3	72.2 (37.6, 88.9)
obs2	82.8 (63.5, 117.3)

In this example, Observer 4’s confidence interval does not overlap that of Observer 2. While this is not a formal test of ESD differences, it is pretty close. A formal (bootstrap) test for differences among ESD requires bootstrap sampling under the null hypothesis of equality and is outside the scope of this tutorial (perhaps it will be the topic of a future tutorial).

Final Notes

The bootstrap method demonstrated here is fairly general. Analysts can place confidence intervals on other derived parameters of distance functions by changing the output of the bootstrap function (estESW above). For example, the bootstrap function could return:

A linear combination of distance function coefficients
The probability of detection at a particular distance
Mean ESD over sub-sets of the covariate values

To illustrate the last point, suppose detections were collected using multiple walking transect types (e.g., two vs. one observer; trials vs. bushwhacking; slow vs. fast). Analysts could modify the bootstrap function to average ESD over two or more types while fitting all types in the base distance function.