A Google search for ‘Gaussian Process Regression’ returns some intimidating material for a non-statistician. After filtering away the obscure stuff I’ll never understand and digging around in the code that makes GPR happen, I’m proud to say that I feel I’ve gotten my arms around the basics of GPR. The only way to confirm whether that’s true is for me to try to explain it - so that’s what I plan to do in this post.
When testing is costly or resource-intensive, it’s not uncommon for management to ask an engineer, “What are the chances that we pass?” The answer will depend on the team’s collective domain knowledge of the product and the requirement, but also on how the question is interpreted. It will also be sensitive to sample size considerations. In this post, I will show how to conduct an analysis to inform the response from both a Frequentist and a Bayesian perspective.
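As a rough sketch of what that comparison might look like, suppose (hypothetically) the acceptance criterion is “all 20 units must pass” and that 58 of 59 units passed during earlier development testing; the numbers and the Beta(1, 1) prior below are made up purely for illustration.

```r
# Hypothetical scenario: acceptance criterion is "all 20 units pass";
# earlier development testing showed 58 of 59 units passing
n_prior_pass  <- 58
n_prior_total <- 59
n_test        <- 20

# Frequentist-style plug-in estimate: point estimate of the per-unit pass
# rate, then the binomial probability that all n_test units pass
p_hat       <- n_prior_pass / n_prior_total
p_pass_freq <- p_hat^n_test

# Bayesian: Beta(1, 1) prior updated with the prior data, then the posterior
# predictive probability that all n_test units pass
a <- 1 + n_prior_pass
b <- 1 + (n_prior_total - n_prior_pass)
draws <- rbeta(1e5, a, b)
p_pass_bayes <- mean(draws^n_test)

c(frequentist = p_pass_freq, bayesian = p_pass_bayes)
```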
Frequentist statistical methods, despite their flaws, are generally serviceable for a large suite of practical problems faced by engineers during product development of medical devices. But even in domains where simple models usually do the trick, there remain instances where a Bayesian approach is the best (and perhaps only logical) way to tackle a problem.
In the rest of this post, I will lay out a technical use-case and associated modeling workflow that is based on a real business problem encountered in the medical device industry.
Tolerance intervals are used to specify coverage of a population of data. In the frequentist framework, the width of the interval depends on the desired coverage proportion and the specified confidence level. They are widely used in the medical device industry because they can be compared directly against product specifications, allowing the engineer to judge what percentage of parts would meet the spec while accounting for sampling uncertainty.
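As a quick illustration of the mechanics, here is a minimal sketch using the tolerance package; the simulated measurements and the spec limits are made up.

```r
library(tolerance)

# Hypothetical sample of n = 30 measurements
set.seed(2015)
measurements <- rnorm(30, mean = 10.0, sd = 0.3)

# Two-sided tolerance interval covering 95% of the population with 95% confidence
normtol.int(measurements, alpha = 0.05, P = 0.95, side = 2)

# If the interval falls inside the spec limits (say 9.0 to 11.0), we can claim,
# with 95% confidence, that at least 95% of parts meet the spec
```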
Measurements taken from patient anatomy are often correlated. For example, larger blood vessels might tend to have less curvature. Additionally, data are rarely Gaussian, favoring skewed shapes with some very large values and a lower bound of zero. These properties can make simulation and inference hard. In this post I will walk through a workflow for an engineering problem that might be presented in my industry.
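To make that concrete, here is a minimal sketch of one way to simulate such data: draw from a multivariate normal on the log scale and exponentiate, which yields correlated, right-skewed values bounded below by zero. All parameter values here are made up for illustration.

```r
library(MASS)

set.seed(123)
mu_log    <- c(vessel_diam = log(6), curvature = log(30))  # log-scale means
sigma_log <- c(0.25, 0.40)                                 # log-scale sds
rho       <- -0.5   # larger vessels tend to have less curvature

Sigma <- matrix(c(sigma_log[1]^2,        rho * prod(sigma_log),
                  rho * prod(sigma_log), sigma_log[2]^2),
                nrow = 2)

# Correlated lognormal draws: multivariate normal on the log scale, then exp()
sim <- exp(mvrnorm(n = 1000, mu = mu_log, Sigma = Sigma))
head(sim)
cor(sim)   # the correlation is (approximately) preserved on the original scale
```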
I've heard it said that common statistical tests are just linear models.1 It turns out that Gage R&R, a commonly used measurement system analysis (MSA) method, is no different. In this post I'll attempt to provide some background on Gage R&R, describe the underlying model, and then walk through a method for simulation that can be useful for things like power analysis or visualization of uncertainty.
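As a preview of that model, here is a minimal sketch of the crossed random-effects formulation fit with lme4 on simulated data; the design size and variance components are made up for illustration, and the classic Gage R&R model would also add a (1 | part:operator) interaction term.

```r
library(lme4)

# Simulate a small crossed design: 10 parts x 3 operators x 3 replicates
set.seed(42)
d <- expand.grid(part = factor(1:10), operator = factor(1:3), rep = 1:3)
part_eff <- rnorm(10, sd = 2.0)   # part-to-part variation
oper_eff <- rnorm(3,  sd = 0.5)   # reproducibility
d$measurement <- 10 +
  part_eff[as.integer(d$part)] +
  oper_eff[as.integer(d$operator)] +
  rnorm(nrow(d), sd = 0.3)        # repeatability

# Crossed random effects for part and operator; the residual is repeatability
fit <- lmer(measurement ~ 1 + (1 | part) + (1 | operator), data = d)
as.data.frame(VarCorr(fit))       # estimated variance components
```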
What is Gage R&R?
Whether you are building bridges, baseball bats, or medical devices, one of the most basic rules of engineering is that the thing you build must be strong enough to survive its service environment. Although the concept is simple in principle, variation in use conditions, material properties, and geometric tolerances all introduce uncertainty that can doom a product. Stress-Strength analysis formalizes a more rigorous approach to evaluating the overlap between the stress and strength distributions.
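To make the idea concrete, here is a minimal Monte Carlo sketch of stress-strength interference under assumed normal distributions; the means and standard deviations are made up for illustration.

```r
set.seed(10)
n <- 1e5
stress   <- rnorm(n, mean = 40, sd = 5)   # service stress (e.g., MPa)
strength <- rnorm(n, mean = 60, sd = 6)   # device strength (e.g., MPa)

# Estimated reliability: probability that strength exceeds stress
mean(strength > stress)

# For two independent normals this is also available in closed form
pnorm((60 - 40) / sqrt(5^2 + 6^2))
```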
It’s time to get our hands dirty with some survival analysis! In this post, I’ll explore reliability modeling techniques that are applicable to Class III medical device testing. My goal is to expand on what I’ve been learning about GLMs and get comfortable fitting Weibull distributions to data. I don’t have a ton of experience with Weibull analysis, so I’ll be taking this opportunity to ask questions, probe assumptions, run simulations, explore different libraries, and develop some intuition about what to expect.
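As a starting point, here is a minimal sketch of fitting a Weibull model to right-censored data with the survival package; the simulated failure times and the censoring cutoff are arbitrary.

```r
library(survival)

# Simulate failure times from a known Weibull, censoring anything past 120
set.seed(7)
true_shape <- 2; true_scale <- 100
time   <- rweibull(50, shape = true_shape, scale = true_scale)
status <- as.numeric(time <= 120)   # 1 = observed failure, 0 = censored
time   <- pmin(time, 120)

fit <- survreg(Surv(time, status) ~ 1, dist = "weibull")

# survreg uses a location-scale parameterization; convert back to Weibull terms
scale_hat <- exp(coef(fit))   # characteristic life (Weibull scale)
shape_hat <- 1 / fit$scale    # Weibull shape
c(shape = shape_hat, scale = unname(scale_hat))
```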
Like many engineers, my first models were based on Designed Experiments in the tradition of Cox and Montgomery. I hadn’t seen anything like a causal diagram until I picked up The Book of Why, which explores all sorts of experimental relationships and structures I never imagined.1 Colliders, confounders, causal diagrams, M-bias - these concepts are all relatively new to me and I want to understand them better. In this post I will attempt to create some simple structural causal models (SCMs) for myself using the dagitty and ggdag packages and then show the potential effects of confounders and colliders on a simulated experiment adapted from here.
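As a small taste, here is a minimal sketch of a three-node SCM with a confounder z, built with dagitty and ggdag; the variable names are placeholders rather than anything from the simulated experiment.

```r
library(dagitty)
library(ggdag)
library(ggplot2)

# x is the exposure, y the outcome, and z a common cause (confounder)
dag <- dagify(y ~ x + z,
              x ~ z,
              exposure = "x",
              outcome  = "y")

ggdag(dag) + theme_dag()   # draw the DAG
adjustmentSets(dag)        # -> { z }: adjust for the confounder to estimate x -> y
```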
In this post I’ll be attempting to leverage the parsnip package in R to run through some straightforward predictive analytics/machine learning. Parsnip provides a flexible and consistent interface to apply common regression and classification algorithms in R. I’ll be working with the Cleveland Clinic Heart Disease dataset which contains 13 variables related to patient diagnostics and one outcome variable indicating the presence or absence of heart disease.1 The data was accessed from the UCI Machine Learning Repository in September 2019.
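The basic parsnip pattern is to specify a model, set an engine, and then fit. Here is a minimal sketch on a toy stand-in for the heart disease data; the two predictors and the simulated values are placeholders, not the real Cleveland variables.

```r
library(parsnip)

# Toy stand-in for the cleaned Cleveland data (two predictors only)
set.seed(1)
heart_df <- data.frame(
  age     = rnorm(100, 55, 9),
  chol    = rnorm(100, 240, 40),
  disease = factor(sample(c("no", "yes"), 100, replace = TRUE))
)

# Specify a logistic regression, choose the glm engine, and fit
log_reg_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(disease ~ age + chol, data = heart_df)

# Class probabilities for each observation
predict(log_reg_fit, new_data = heart_df, type = "prob")
```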
I’ve never really worked much with Poisson data and wanted to get my hands dirty. I thought that for this project I might combine a Poisson data set with the simple Bayesian methods that I’ve explored before, since it turns out the Poisson rate parameter lambda also has a nice conjugate prior (more on that later). Poisson-distributed data are counts per unit time or space - events that arrive at random intervals but have a characteristic rate parameter, which also equals the variance.
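For reference, the conjugate update is simple enough to write out directly: with a Gamma(a0, b0) prior on lambda and n observed counts, the posterior is Gamma(a0 + sum(y), b0 + n). Here is a minimal sketch with made-up prior parameters and data.

```r
set.seed(3)
y <- rpois(12, lambda = 4)   # e.g., 12 months of observed event counts

# Gamma(a0, b0) prior on lambda; posterior is Gamma(a0 + sum(y), b0 + n)
a0 <- 2; b0 <- 1
a_post <- a0 + sum(y)
b_post <- b0 + length(y)

a_post / b_post                           # posterior mean of lambda
qgamma(c(0.025, 0.975), a_post, b_post)   # 95% credible interval for lambda
```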
When doing comparative testing it can be tempting to stop when we see the result that we hoped for. In the case of null hypothesis significance testing (NHST), the desired outcome is often a p-value below .05. In the medical device industry, benchtop testing can cost a lot of money. Why not just recalculate the p-value after every test and stop as soon as it dips below .05? The reason is that the confidence statement attached to your testing is only valid for a specific stopping rule.
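A quick simulation makes the problem visible: here is a minimal sketch in which two groups are drawn from the same distribution, the p-value is recomputed as data accumulate, and we stop at the first p < .05. The sample sizes and number of simulations are arbitrary.

```r
set.seed(99)
n_max  <- 100    # maximum sample size per group
n_sims <- 1000   # number of simulated experiments

false_positive <- replicate(n_sims, {
  a <- rnorm(n_max)   # no true difference between groups
  b <- rnorm(n_max)
  # Recompute the p-value at each interim look; flag if it ever drops below .05
  any(sapply(10:n_max, function(n) t.test(a[1:n], b[1:n])$p.value) < 0.05)
})

mean(false_positive)   # well above the nominal 5% type I error rate
```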
Suppose our team is preparing to freeze a new implant design. In order to move into the next phase of the product development process (PDP), it is common to perform a suite of formal “Design Freeze” testing. If the results of the Design Freeze testing are acceptable, the project can advance from Design Freeze (DF) into Design Verification (DV). DV is an expensive and resource-intensive phase culminating in formal reports that are included in the regulatory submission.
As engineers, we are often asked to determine whether or not two different configurations of a product perform the same. Perhaps we are asked to compare the durability of a next-generation prototype to that of the current generation. Sometimes we are testing the flexibility of our device versus a competitor’s for marketing purposes. Maybe we identify a new vendor for a raw material but must first understand whether the resulting finished product will perform any differently than one built using material from the standard supplier.