The current industry protocol for durability testing of vascular stents and frames involves testing many implants simultaneously at a range of different stimulus magnitudes (typically strain or stress). The test levels are spread out like a grid across the stimulus range of interest. Each implant is tested to failure or run-out at its specified level and a model is fit to the data using methods similar to those described in Meeker and Escobar.1 An example of a completed durability test of this type is shown below.2 Of interest to the designer is the distribution of stimulus levels that would be expected to cause a failure at a specified cycle count like 400 million.
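To make this concrete, here is a minimal sketch (entirely simulated, my own illustration rather than the analysis from the cited study) of how such grid-style fatigue data might be fit with a parametric model. The stress levels, the lognormal stress-life form, and the coefficients are all invented; survreg() from the survival package treats parts that reach the 400-million-cycle run-out as right-censored observations.

# Hypothetical grid-style fatigue data: 6 parts at each of 4 stress levels,
# tested to fracture or run-out at 400 million cycles (all values simulated)
library(survival)
library(tibble)

set.seed(42)
stress <- rep(c(540, 570, 600, 630), each = 6)
latent_cycles <- rlnorm(length(stress), meanlog = 40 - 0.035 * stress, sdlog = 1)

grid_tbl <- tibble(
  stress_mpa = stress,
  cycles = pmin(latent_cycles, 4e8),        # censor at the run-out limit
  failed = as.integer(latent_cycles < 4e8)  # 1 = fracture, 0 = run-out
)

# lognormal stress-life model; run-outs enter as right-censored observations
fit <- survreg(Surv(cycles, failed) ~ stress_mpa, data = grid_tbl, dist = "lognormal")
summary(fit)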
This methodology works well if you have a large budget, and it is accepted by the FDA. However, blanketing the parameter space with enough samples to fit a robust model can be very costly and requires enough lab equipment to run many tests in parallel. I recently learned there is another way that has the potential to significantly reduce the test burden while offering some unique advantages. I will attempt to describe it below.
Sensitivity testing represents a different paradigm in which samples are tested sequentially rather than in parallel. The outcome of the first test informs the stimulus level of the second test, and so on. When an implant fails at a given stimulus, the next part is tested at a lower level; when it survives, the stimulus for the next part is increased. In this way the test converges on the latent distribution of part strength, with each additional run improving precision.
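To build intuition before the formal machinery, here is a minimal sketch of the simplest sequential rule: a fixed-step up-and-down staircase. This is not the D-optimal scheme discussed below, and the starting level, step size, and latent strength distribution are all invented for illustration.

set.seed(1)
true_mean <- 600 # latent strength parameters (unknown in a real test)
true_sd <- 13
level <- 620 # arbitrary starting stimulus (MPa)
step <- 10   # fixed step size (MPa)

runs <- tibble::tibble(run = 1:10, level_mpa = NA_real_, failed = NA)
for (i in 1:10) {
  failed <- rnorm(1, true_mean, true_sd) < level # part fails if strength < stimulus
  runs$level_mpa[i] <- level
  runs$failed[i] <- failed
  level <- if (failed) level - step else level + step # down on failure, up on survival
}
runs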
Two key questions must be answered to execute a sequential sensitivity test:
- How do we specify the test levels for subsequent runs in a principled and efficient way?
- How do we estimate the underlying failure strength distribution when the failure threshold is not directly observed on each run?
Neyer was the first to present a D-optimal answer to the first question.3 Implementing an algorithm that chooses sequential levels to minimize the test burden and converge quickly on the true underlying strength distribution is not trivial, and such implementations historically remained unavailable outside of pricey commercial software.4 Fortunately, generous and diligent researchers have recently created an open-source tool in R called “gonogo” which can execute sequential testing according to the D-optimal Neyer algorithm (or its improved direct descendants).5
In the remainder of this post I will show how to use gonogo to determine the underlying strength distribution of a hypothetical vascular implant and to visualize the uncertainty in the estimate.
- Load Libraries
- Accessing gonogo Functions
- Simulating A Sequential Sensitivity Experiment
- Confidence, Reliability, and Visualization
- Conclusion and Thanks
- Session Info
Load Libraries
library(tidyverse)
library(gt)
library(plotly)
library(ggrepel)
library(here)
library(gonogo)
library(ggalt)
Accessing gonogo Functions
The functions in gonogo can be accessed in 2 ways: by installing and loading the package, or by sourcing the raw functions from a local copy of the code. I will be using the code sourced from Jeff’s site, as the functions are more readily editable.
source(here("gonogoRiley.R"))
Simulating A Sequential Sensitivity Experiment
gonogo has some nice built-in simulation features via the gonogoSim function. For this walkthrough, though, we’ll simulate the data manually for better visibility into what’s happening under the hood.

Suppose we have a part budget of n=25 stents from manufacturing, and our goal is to estimate the underlying distribution of the stress amplitude that causes fatigue fracture. First, let’s simulate the failure strength of the 25 parts from a known distribution (known to us only because this is a simulation). We’ll assume the failure stress amplitude for these parts is normally distributed with a mean of 600 MPa and a standard deviation of 13 MPa.
Simulate the Strength of Each Part
set.seed(805)
stent_strength_tbl <- tibble(part_strength_mpa = rnorm(n = 25, mean = 600, sd = 13))
stent_strength_tbl %>%
gt_preview()
|   | part_strength_mpa |
|---|---|
| 1 | 602.0110 |
| 2 | 585.2761 |
| 3 | 603.3943 |
| 4 | 569.8526 |
| 5 | 601.5328 |
| 6..24 | |
| 25 | 600.9657 |
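As a quick sanity check (my own addition, not part of the gonogo workflow), the sample moments of the simulated strengths should land near the specified parameters:

stent_strength_tbl %>%
  summarise(
    sample_mean = mean(part_strength_mpa),
    sample_sd = sd(part_strength_mpa)
  )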
See Test Levels and Input Test Results
The gonogo() function initiates the test plan for sequential testing, prescribing each test level based on the outcome of the previous run. This operation happens interactively in the R console.
We’ll use the Neyer option by setting the “test” argument equal to 2. One key feature of sequential testing is that high and low initial guesses for the mean must be provided at the onset of testing, along with an initial guess for the standard deviation. These guesses help set the scale of the test levels. We’ll provide vague guesses that would be reasonable given our domain knowledge of cobalt-based alloys and their fatigue properties.

If we were really running the test on physical parts, we’d set up a test at the prescribed stimulus level and then report the result to the console by entering 2 numbers separated by a single space: the level the test was actually run at (which should match the recommendation) and the outcome (1 for failure, 0 for survival). After the results from the first test are entered, the recommended level for the next test is presented. When all runs are complete, the results are stored in the assigned object.
# z <- gonogo(mlo = 500, mhi = 700, sg = 10, test = 2, reso = 1) #input results in console
The resulting object z is a list containing a lot of information. The following code pulls out the run history and cleans it up with tools and formatting from the gt package.
z$d0 %>%
  gt() %>%
  cols_align("center") %>%
  opt_row_striping(row_striping = TRUE) %>%
  tab_header(
    title = "Run History of Sensitivity Test of n=25 Stents",
    subtitle = "X and EX are Stimulus Levels, Y = Outcome (1 = part failed, 0 = part survived)"
  ) %>%
  cols_width(everything() ~ px(100)) %>%
  tab_style(
    style = list(
      cell_text(weight = "bold")
    ),
    locations = cells_column_labels(columns = everything())
  )
Run History of Sensitivity Test of n=25 Stents
X and EX are Stimulus Levels, Y = Outcome (1 = part failed, 0 = part survived)

| X | Y | COUNT | RX | EX | TX | ID |
|---|---|---|---|---|---|---|
| 600 | 0 | 1 | 600 | 600.0000 | 600 | B0 |
| 650 | 1 | 1 | 650 | 650.0000 | 650 | B1 |
| 625 | 1 | 1 | 625 | 625.0000 | 625 | B3 |
| 612 | 1 | 1 | 612 | 612.5000 | 612 | B3 |
| 606 | 1 | 1 | 606 | 606.0000 | 606 | B3 |
| 590 | 0 | 1 | 590 | 589.7220 | 590 | B4 |
| 593 | 0 | 1 | 593 | 592.5733 | 593 | B4 |
| 611 | 0 | 1 | 611 | 610.7146 | 611 | B4 |
| 617 | 1 | 1 | 617 | 616.8731 | 617 | II1 |
| 599 | 1 | 1 | 599 | 599.4648 | 599 | II2 |
| 590 | 0 | 1 | 590 | 590.4793 | 590 | II2 |
| 616 | 1 | 1 | 616 | 615.7384 | 616 | II2 |
| 593 | 1 | 1 | 593 | 592.9877 | 593 | II2 |
| 584 | 0 | 1 | 584 | 583.6982 | 584 | II2 |
| 585 | 0 | 1 | 585 | 585.3147 | 585 | II2 |
| 615 | 1 | 1 | 615 | 615.2118 | 615 | II2 |
| 587 | 0 | 1 | 587 | 587.4391 | 587 | II2 |
| 613 | 1 | 1 | 613 | 613.3351 | 613 | II2 |
| 612 | 1 | 1 | 612 | 612.3336 | 612 | II2 |
| 589 | 0 | 1 | 589 | 589.3771 | 589 | II2 |
| 611 | 1 | 1 | 611 | 611.0667 | 611 | II2 |
| 590 | 0 | 1 | 590 | 590.4844 | 590 | II2 |
| 610 | 1 | 1 | 610 | 610.0713 | 610 | II2 |
| 591 | 0 | 1 | 591 | 591.3679 | 591 | II2 |
| 609 | 1 | 1 | 609 | 609.2816 | 609 | II2 |
gonogo also has some really convenient built-in visualization tools. The run history can be plotted using gonogo’s ptest() function with the argument plt = 1.
ptest(z, plt = 1)
## NULL
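(The ## NULL is just ptest()’s invisible return value; the plot is drawn as a side effect.) If you prefer a tidyverse look, here is a sketch of my own that plots the same run history directly from z$d0, assuming only the X and Y columns shown above:

z$d0 %>%
  mutate(run = row_number(),
         outcome = if_else(Y == 1, "failed", "survived")) %>%
  ggplot(aes(x = run, y = X)) +
  geom_line(color = "grey70") +
  geom_point(aes(shape = outcome), size = 2) +
  theme_bw() +
  labs(x = "Run order", y = "Stress amplitude (MPa)",
       title = "Sequential Test Levels Converge Toward the Latent Mean")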
The values we really care about are the estimates of the mean and standard deviation of the latent failure strength distribution. Numerical MLEs for these parameters can be extracted as follows:
z$musig
## [1] 600.383131 7.894673
The mean estimate is impressively close to the true value of 600 MPa given only 25 parts and vague initial guesses! The standard deviation is underestimated (7.9 vs. the true 13 MPa), a reminder that sigma is harder to pin down than mu in binary-outcome testing.
I like to occasionally look under the hood to see how the outputs are generated. I don’t always understand it, but it helps me learn. In this case the calculation is pretty simple: gonogo fits a GLM with a probit link function, in which the intercept and slope encode the distribution parameters. The results are replicated here with a manual calculation.
# probit model: P(fail at X) = pnorm(a + b*X) = pnorm((X - mu)/sigma)
xglm <- glm(Y ~ X, family = binomial(link = probit), data = z$d0)
ab <- as.vector(xglm$coef)
muhat <- -ab[1] / ab[2] # mu = -a/b, since a = -mu/sigma
sighat <- 1 / ab[2]     # sigma = 1/b, since b = 1/sigma
tibble(
  true_parameter = c(600, 13),
  gonogo_estimate = c(muhat, sighat)
) %>%
  mutate(gonogo_estimate = gonogo_estimate %>% round(1)) %>%
  gt()
| true_parameter | gonogo_estimate |
|---|---|
| 600 | 600.4 |
| 13 | 7.9 |
Confidence, Reliability, and Visualization
We just identified the MLEs for the latent strength distribution. But any time we’re working with medical devices we have to be conservative and respect the uncertainty in our methods. We don’t want to describe a design by the mean of the distribution; instead, we identify a lower quantile of interest and then put a confidence interval on it to be extra conservative.6 gonogo has some great tools for achieving these goals. I prefer the tidyverse framework for data analysis and visualization, so I combine gonogo’s built-in functions with some helper functions I wrote to keep things tidy. I present them here with some description of what they do but not much explanation of how they work. Sorry.
I will note, however, that gonogo provides multiple forms of confidence intervals: Fisher matrix (FM), GLM, and likelihood-ratio (LR) based. For details on how each type is calculated, refer to the gonogo documentation. In the tests I ran, the GLM and LR intervals performed almost identically, while the FM intervals differed; I use the GLM intervals below.
In the following work, I assume we care about a 95% confidence bound on the 0.10 quantile. gonogo calculates all intervals as two-sided, so to get the one-sided 95% bound we request the two-sided 90% limits: the lower endpoint of a symmetric two-sided 90% interval cuts off the same 5% lower tail as a one-sided 95% bound.
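A quick check of that equivalence using normal quantiles:

# the lower endpoint of a two-sided 90% interval and a one-sided 95% lower
# bound both cut off 5% in the lower tail, so they share the same z value
qnorm((1 - 0.90) / 2) # -1.645
qnorm(1 - 0.95)       # -1.645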
Extract a tibble of 2-sided, pointwise confidence limits about all quantiles
This first function takes the run history (z$d0) from the gonogo() output and computes two-sided 90% confidence bounds across the quantiles. The lims() function is from gonogo.
# helper function: wrap gonogo's lims() output in a tibble with readable names
conf_limits_fcn <- function(tbl, conf = .9) {
  lims(ctyp = 2, dat = tbl, conf = conf, P = seq(from = .01, to = .99, by = .01)) %>%
    as_tibble() %>%
    rename(
      ciq_lower = V1,
      estimate_q = V2,
      ciq_upper = V3,
      cip_lower = V4,
      estimate_p = V5,
      cip_upper = V6
    )
}
rileysim_conf_tbl <- conf_limits_fcn(z$d0)
rileysim_conf_tbl %>%
gt_preview()
|   | ciq_lower | estimate_q | ciq_upper | cip_lower | estimate_p | cip_upper |
|---|---|---|---|---|---|---|
| 1 | 566.0095 | 582.0174 | 590.1744 | 0.000057 | 0.01 | 0.127107 |
| 2 | 569.7538 | 584.1695 | 591.6789 | 0.000283 | 0.02 | 0.165152 |
| 3 | 572.1145 | 585.5349 | 592.6478 | 0.000714 | 0.03 | 0.193273 |
| 4 | 573.8819 | 586.5620 | 593.3851 | 0.001369 | 0.04 | 0.216580 |
| 5 | 575.3128 | 587.3975 | 593.9914 | 0.002260 | 0.05 | 0.236954 |
| 6..98 | | | | | | |
| 99 | 610.9205 | 618.7489 | 633.9138 | 0.884141 | 0.99 | 0.999923 |
Extract the lower limit of the 2-sided confidence interval at the .10 quantile

We present results to the FDA using “confidence and reliability” terminology. By extracting the lower end of the two-sided 90% confidence band at the 0.10 quantile, we obtain an estimate of the 95/90 tolerance bound, i.e. a bound above which we would expect 90% or more of the population to lie, with 95% confidence. This is a conservative estimate of the endurance limit for this stent design.
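For contrast, here is the naive plug-in estimate of the 0.10 quantile using only the MLEs from earlier, with no confidence adjustment (my own comparison, not part of the gonogo workflow); the confidence-bounded value extracted below is appropriately lower:

# plug-in 10th percentile from the MLEs alone (no uncertainty adjustment)
muhat + qnorm(0.10) * sighat # ~590 MPa, vs. the ~580 MPa bound extracted below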
lower_conf_fcn <- function(tbl, percentile) {
  # single row of the confidence-limit table at the requested percentile
  row <- tbl %>%
    mutate(qprobs = qnorm(estimate_p)) %>%
    filter(estimate_p == percentile)
  # note: the lower confidence bound is stored under the name estimate_q so
  # the plotting functions below can reuse the same aesthetic mappings
  tibble(
    estimate_q = round(row$ciq_lower, 1),
    estimate_p = percentile,
    qprobs = row$qprobs
  )
}
lower_conf_fcn(rileysim_conf_tbl, percentile = .1) %>% pluck(1)
## [1] 580.2
Plot CDF
Now some plotting functions to help visualize things. First, we use our confidence-limit dataframe from above to construct a CDF of failure strength. With some clever use of geom_label_repel() and str_glue() we can embed the key result right on the plot for easy communication. The estimated endurance limit from our test sequence is about 580 MPa.
plot_cdf_fcn <- function(tbl, percentile = .1, units = "MPa") {
tbl %>%
ggplot(aes(x = ciq_lower, y = estimate_p)) +
geom_line(size = 1.3) +
geom_line(aes(x = ciq_upper, y = estimate_p), size = 1.3) +
geom_line(aes(x = estimate_q, y = estimate_p), linetype = 2) +
geom_ribbon(aes(xmin = ciq_lower, xmax = ciq_upper), alpha = .1, fill = "purple") +
geom_point(data = lower_conf_fcn(tbl, percentile = percentile), aes(estimate_q, estimate_p), color = "purple", size = 2) +
geom_label_repel(
data = lower_conf_fcn(tbl, percentile = percentile),
aes(estimate_q, estimate_p),
label = str_glue("1-sided, 95% Confidence Bound:\n{lower_conf_fcn(tbl, percentile = percentile) %>% pluck(1)} {units}"),
fill = "#8bd646ff",
color = "black",
segment.color = "black",
segment.size = .5,
nudge_y = .13,
nudge_x = -25
) +
geom_segment(data = tbl %>%
filter(estimate_p == percentile), aes(x = ciq_lower, xend = ciq_upper, y = estimate_p, yend = estimate_p), linetype = 3) +
theme_bw() +
labs(
x = "Stress (MPa)",
y = "Percentile",
caption = "Dotted line is estimate of failure strength distribution\nConfidence bands are 2-sided 90% (i.e. 1-sided 95 for lower)",
title = "Sensitivity Testing of Stent Durability",
subtitle = str_glue("placeholder") # placeholder; replaced via labs() below
)
}
a <- plot_cdf_fcn(tbl = rileysim_conf_tbl, percentile = .1)
a +
labs(subtitle = str_glue("Estimates: Mean = {muhat %>% round(1)} MPa, Sigma = {sighat %>% round(1)} MPa"))
Normal Probability Plot
Engineers in my industry have an unhealthy obsession with normal probability plots. Personally I don’t like them, but Minitab presents them by default and engineers will look for data in this format. As part of this project I taught myself how to convert distributional data from a CDF into a normal probability plot: plot qnorm(p) on the y-axis, which straightens a normal CDF into a line, then relabel the axis breaks with the original percentiles. The helper function and the result are shown here.
norm_probability_plt_fcn <- function(tbl, percentile, units = "MPa") {
conf_tbl <- tbl %>%
mutate(qprobs = qnorm(estimate_p)) %>%
arrange(qprobs)
probs_of_interest <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
probs_of_interest_tbl <- conf_tbl %>%
filter(estimate_p %in% probs_of_interest)
conf_tbl %>%
ggplot() +
geom_line(aes(x = ciq_lower, y = qprobs), size = 1.3) +
geom_line(aes(x = ciq_upper, y = qprobs), size = 1.3) +
geom_line(aes(x = estimate_q, y = qprobs), linetype = 2) +
geom_ribbon(aes(xmin = ciq_lower, xmax = ciq_upper, y = qprobs), alpha = .1, fill = "purple") +
geom_point(data = lower_conf_fcn(tbl, percentile = percentile), aes(estimate_q, qprobs), color = "purple", size = 2) +
geom_label_repel(
data = lower_conf_fcn(tbl, percentile = percentile),
aes(estimate_q, qprobs),
label = str_glue("1-sided, 95% Confidence Bound:\n{lower_conf_fcn(tbl, percentile = percentile) %>% pluck(1)} {units}"),
fill = "#8bd646ff",
color = "black",
segment.color = "black",
segment.size = .5,
nudge_y = 1,
nudge_x = -20
) +
geom_segment(data = conf_tbl %>%
filter(estimate_p == percentile), aes(x = ciq_lower, xend = ciq_upper, y = qprobs, yend = qprobs), linetype = 3) +
scale_y_continuous(limits = range(conf_tbl$qprobs), breaks = probs_of_interest_tbl$qprobs, labels = 100 * probs_of_interest_tbl$estimate_p) +
theme_bw() +
labs(
x = "Stress (MPa)",
y = "Percentile",
caption = "Dotted line is estimate of failure strength distribution\nConfidence bands are 2-sided 90%",
title = "Sensitivity Testing of Stent Durability",
subtitle = str_glue("TBD_2") # placeholder; replaced via labs() below
)
}
b <- norm_probability_plt_fcn(rileysim_conf_tbl, percentile = .1)
b +
labs(subtitle = str_glue("Estimates: Mean = {muhat %>% round(1)} MPa, Sigma = {sighat %>% round(1)} MPa"))
Conclusion and Thanks
There you have it! A way to determine failure strength distributions using principled, sequential testing techniques guided by gonogo. If you’ve made it this far, I really appreciate your attention. I would like to thank Paul Roediger for producing this amazing toolkit and for graciously offering his time to help me troubleshoot and work through some challenges on this project earlier this year. I’ll always be grateful to the open source community for making these things possible.
Session Info
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggalt_0.4.0 gonogo_0.1.0 here_1.0.0 ggrepel_0.8.2
## [5] plotly_4.9.2.1 gt_0.2.2 forcats_0.5.0 stringr_1.4.0
## [9] dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
## [13] tibble_3.1.4 ggplot2_3.3.5 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.5.0 lubridate_1.7.9.2 ash_1.0-15 RColorBrewer_1.1-2
## [5] httr_1.4.2 rprojroot_2.0.2 tools_4.0.3 backports_1.2.0
## [9] bslib_0.3.0 utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-18
## [13] DBI_1.1.0 lazyeval_0.2.2 colorspace_2.0-2 withr_2.4.2
## [17] tidyselect_1.1.1 compiler_4.0.3 extrafontdb_1.0 cli_3.0.1
## [21] rvest_0.3.6 xml2_1.3.2 labeling_0.4.2 bookdown_0.21
## [25] sass_0.4.0 scales_1.1.1 checkmate_2.0.0 proj4_1.0-10
## [29] digest_0.6.28 rmarkdown_2.11 pkgconfig_2.0.3 htmltools_0.5.2
## [33] extrafont_0.17 dbplyr_2.0.0 fastmap_1.1.0 highr_0.9
## [37] maps_3.3.0 htmlwidgets_1.5.4 rlang_0.4.11 readxl_1.3.1
## [41] rstudioapi_0.13 jquerylib_0.1.4 generics_0.1.0 farver_2.1.0
## [45] jsonlite_1.7.2 magrittr_2.0.1 Rcpp_1.0.7 munsell_0.5.0
## [49] fansi_0.5.0 lifecycle_1.0.1 stringi_1.7.4 yaml_2.2.1
## [53] MASS_7.3-53 grid_4.0.3 crayon_1.4.1 haven_2.3.1
## [57] hms_0.5.3 knitr_1.34 pillar_1.6.2 reprex_0.3.0
## [61] glue_1.4.2 evaluate_0.14 blogdown_0.15 data.table_1.14.0
## [65] modelr_0.1.8 vctrs_0.3.8 Rttf2pt1_1.3.8 cellranger_1.1.0
## [69] gtable_0.3.0 assertthat_0.2.1 xfun_0.26 broom_0.7.8
## [73] viridisLite_0.4.0 ellipsis_0.3.2
https://www.wiley.com/en-us/Statistical+Methods+for+Reliability+Data%2C+2nd+Edition-p-9781118115459
https://www2.lbl.gov/ritchie/Library/PDF/2020%20Cao-Nitinol%20HCF.pdf
Neyer provided a commercial software package called Sentest to facilitate practitioners who wanted to use D-optimal methods without developing their own software.
Since we only care about a one-sided confidence interval, the one-sided confidence bound on a specified quantile becomes analogous to a tolerance interval.