Method-induced uncertainty can occur during each phase of the data analysis cycle:
Identification: To ascertain the origins of uncertainty in data, domain knowledge is indispensable. Domain experts can assess technical and documentation errors, explain the experimental design, distinguish erroneous outliers from meaningful extreme values, and outline the data-generating processes.
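Such expert judgement can be supported, though not replaced, by simple screening tools. As a minimal sketch (with hypothetical sensor readings), an interquartile-range rule can flag candidate outliers for expert review:

```python
import statistics

def flag_outlier_candidates(values, k=1.5):
    """Flag values outside the Tukey fences (quartiles +/- k * IQR).

    Flagged points are only *candidates*: a domain expert must still
    decide whether each is an erroneous outlier or a meaningful
    extreme value.
    """
    qs = statistics.quantiles(values, n=4)
    q1, q3 = qs[0], qs[2]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# A hypothetical sensor series with one extreme reading:
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 42.0]
candidates = flag_outlier_candidates(readings)
```

Whether the flagged reading of 42.0 is a measurement error or a genuine event is exactly the question only domain knowledge can settle.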
Modeling: Once identified and characterized, the data-generating processes are modelled, often mathematically. Such models may describe, e.g., whether missing values occur systematically, whether measurement noise is correlated with the observed signal, how uncertainty in parameters can be accounted for, and how uncertainty is propagated within a system. Models should be kept simple, yet complex enough to capture the relevant characteristics. Uncertainty is often represented by probability distributions (the choice of which again induces uncertainty). If there are several competing models, the most appropriate candidate is chosen via objective (but themselves uncertain) model selection criteria.
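As an illustration of such (itself uncertain) model selection, one might compare a normal and a Laplace model for the same data via the Akaike information criterion; both admit closed-form maximum likelihood estimates. The data below are hypothetical, and the criterion is one choice among several:

```python
import math
import statistics

def aic_normal(xs):
    # MLE for the normal model: mean and (biased) standard deviation;
    # k = 2 parameters. AIC = 2k - 2 * log-likelihood.
    mu = statistics.fmean(xs)
    sigma = math.sqrt(statistics.fmean([(x - mu) ** 2 for x in xs]))
    ll = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
             - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)
    return 2 * 2 - 2 * ll

def aic_laplace(xs):
    # MLE for the Laplace model: median and mean absolute deviation;
    # also k = 2 parameters.
    m = statistics.median(xs)
    b = statistics.fmean([abs(x - m) for x in xs])
    ll = sum(-math.log(2 * b) - abs(x - m) / b for x in xs)
    return 2 * 2 - 2 * ll

# Hypothetical heavy-tailed data: the Laplace model often attains the
# lower (better) AIC here, but the criterion remains an uncertain proxy.
data = [0.1, -0.2, 0.05, 3.5, -0.1, 0.0, -3.8, 0.15]
scores = {"normal": aic_normal(data), "laplace": aic_laplace(data)}
```

Note that AIC ranks candidates but does not quantify how confident that ranking is, which is precisely the residual uncertainty mentioned above.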
Analysis: Complex statistical analyses can often be performed with stochastic or simulation-based principles and algorithms, e.g. Monte Carlo estimation and Bayesian computing, especially Markov chain Monte Carlo methods. In numerically complex settings, stochastic search algorithms aim to identify global optima. Asymptotic properties are known for all of these approaches; in real applications with finite sample sizes, however, uncertainty remains concerning, e.g., sampling biases or convergence to local optima.
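A minimal sketch of Monte Carlo estimation illustrates this point: even with a correct algorithm, a finite sample size leaves residual uncertainty, summarised here by the standard error of the estimate (the target quantity and sample size are chosen purely for illustration):

```python
import math
import random

def mc_estimate(f, sampler, n, seed=0):
    """Monte Carlo estimate of E[f(X)] together with its standard error.

    The standard error shrinks at rate 1/sqrt(n), so uncertainty
    decreases with sample size but never vanishes in practice.
    """
    rng = random.Random(seed)
    ys = [f(sampler(rng)) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, math.sqrt(var / n)

# Estimate E[X^2] for X ~ N(0, 1); the true value is 1.
est, se = mc_estimate(lambda x: x * x,
                      lambda rng: rng.gauss(0.0, 1.0),
                      10_000)
```

Reporting the estimate together with its standard error is one simple way to carry the analysis-phase uncertainty forward into the representation phase.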
Representation: Once the core data analysis is completed, results are communicated, e.g. to cooperation partners from different research fields or to the public. Such reports should include a comprehensive assessment of uncertainty, but this aspect is often neglected due to a lack of capacity, of a common language, or of data literacy competencies. Different understandings of the reported results can in turn be considered uncertain data, which leads back to the first phase of the cycle.