Robust Genomic Modelling Using Expert Knowledge about Additive, Dominance and Epistasis Variation

A major challenge with modelling non-additive genetic variation is that it is hard to separate it from additive and environmental variation. In this paper, we describe how to alleviate this issue and improve genomic modelling of additive and non-additive variation by leveraging expert knowledge about the relative magnitude of the sources of phenotypic variation. The method is Bayesian and uses the recently introduced penalized complexity and hierarchical decomposition prior frameworks, where priors can be specified and visualized in an intuitive way and can be used to induce parsimonious modelling. We evaluate the potential impact of these priors in plant breeding through a simulated and a real case study of wheat breeding. We compare different models and different priors with varying amounts of expert knowledge. The simulated case study showed that the proposed priors and expert knowledge improved the robustness of the genomic modelling and the selection of the genetically best individuals in a breeding program. We observed this improvement in both variety selection on genetic values and parent selection on additive values, but the variety selection benefited the most. In the real case study expert knowledge increased prediction accuracy in certain cases; in these cases the standard maximum likelihood approach did not find an optimal estimate for the variance components. Finally, we discuss the importance of expert-knowledge priors for genomic modelling and breeding, and point to future research areas of easy-to-use and parsimonious priors in genomic modelling.


Introduction
Plant breeding programs are improving the productivity of a range of crops and are thereby addressing the global and rising hunger problem that impacts 820 million people across the world (FAO et al., 2019). One of the most important food sources in the world is wheat (Shewry and Hey, 2015); however, recent improvements in wheat yield are smaller than the projected requirements (Ray et al., 2013) and might become more variable or even decrease due to climate change (Asseng et al., 2015). This trend is in stark contrast to the United Nations' Sustainable Development Goals that aim to end hunger and malnutrition by 2030 (General Assembly of the United Nations, 2015). Breeding has contributed significantly to the improvement of wheat yields in the past (e.g., Mackay et al., 2011; Rife et al., 2019), and the recent adoption of genomic selection could enable further significant improvements (Gaynor et al., 2017; Belamkar et al., 2018; Sweeney et al., 2019).
Breeding programs generate and evaluate new genotypes with the aim to improve key characteristics such as plant height, disease resistance and yield. Nowadays, a key component in breeding is genomic modelling, where we aim to reduce environmental noise in phenotypic observations and associate the remaining variation with variation in individual genomes. We use these associations to estimate genetic values for phenotyped or even non-phenotyped individuals and with this identify the genetically best individuals (Meuwissen et al., 2001). Improving this process involves improving the methods for disentangling genetic variation from environmental variation.
Genetic variation can be decomposed into additive and non-additive components (Fisher, 1918; Falconer and Mackay, 1996; Lynch et al., 1998; Mäki-Tanila and Hill, 2014). Additive variation is defined as variation of additive values, which are sums of allele substitution effects over the unobserved genotypes of causal loci. Statistically, the allele substitution effects are coefficients of multiple linear regression of phenotypic values on causal genotypes coded in an additive manner. Non-additive variation is defined as the remaining genetic variation not captured by the additive values. It is commonly partitioned into dominance and epistasis values. Dominance values capture deviations from additive values at individual loci. Epistasis values capture deviations from additive and/or dominance values at combinations of loci. Statistically, dominance and epistasis values capture deviations due to allele interactions at individual loci and combinations of loci, respectively. Modelling interactions between two loci at a time gives additive-by-additive, additive-by-dominance and dominance-by-dominance epistasis. Modelling interactions between a larger number of loci increases the number of interactions.
Estimates of genetic values and their additive and non-additive components have different applications in breeding (Acquaah, 2009). Breeders use estimates of additive values to identify parents of the next generation, because additive values indicate the expected change in mean genetic value in the next generation under the assumption that allele frequencies will not change. Breeders use estimates of genetic values to identify individuals for commercial production, because genetic values indicate the expected phenotypic value. Estimates of genetic values are particularly valuable in plant breeding where individual genotypes can be effectively cloned. While genomic modelling currently focuses on additive values (Meuwissen et al., 2001;Varona et al., 2018), the literature on modelling non-additive variation is growing (Oakey et al., 2006;Wittenburg et al., 2011;Muñoz et al., 2014;Bouvet et al., 2016;Martini et al., 2017;Vitezica et al., 2017;Varona et al., 2018;de Almeida Filho et al., 2019;Santantonio et al., 2019;Tolhurst et al., 2019;Martini et al., 2020). Notably, modelling non-additive variation has been shown to improve the estimation of additive values in certain cases (Varona et al., 2018).
A challenge with modelling non-additive variation is that it is difficult to separate non-additive variation from additive and environmental variation even when large datasets are available (e.g., Misztal, 1997; Zhu et al., 2015; de los Campos et al., 2019). Further, pervasive linkage and linkage disequilibrium challenge the decomposition of genetic variance into its components (Gianola et al., 2013). This suggests that genomic models should be robust; in this paper, a robust model is defined as one that does not estimate spurious non-additive values. One way to handle partially confounded sources of variation is to include expert knowledge on their absolute or relative sizes. This might be particularly important for small datasets, as the data at hand will not enable sufficient separation of effects. Information about the relative magnitude of the sources of phenotypic variation has been collated since the seminal work of Fisher (1918). The magnitude of genetic variation for a range of traits is well known (e.g., Houle, 1992; Falconer and Mackay, 1996; Lynch et al., 1998). Data and theory indicate that the majority of genetic variation is captured by additive values (Hill et al., 2008; Mäki-Tanila and Hill, 2014), while the magnitude of variation in dominance and epistasis values varies considerably due to a range of factors. For example, there is no dominance variation between inbred individuals by definition. Further, model specification has a strong effect on the resulting estimates (e.g., Huang and Mackay, 2016; Martini et al., 2017; Vitezica et al., 2017; Martini et al., 2020). With common model specifications, additive values capture most of the genetic variation because they capture the main effects (in the statistical sense), while dominance and epistasis values capture interaction deviations from the main effects (Hill et al., 2008; Mäki-Tanila and Hill, 2014; Hill and Mäki-Tanila, 2015; Huang and Mackay, 2016).
A natural way of including expert knowledge in analyses is through Bayesian modelling. In a Bayesian setting, expert knowledge on the different sources of variation can be included in the model through prior distributions; see Gianola and Fernando (1986) and Sorensen and Gianola (2007) for an introduction to Bayesian methods in animal breeding and quantitative genetics, respectively. Prior distributions reflect beliefs and uncertainties about unknown quantities of a model and should be elicited from an expert in the field of interest (O'Hagan et al., 2006; Farrow, 2013). However, the ability to use external information with the Bayesian approach is seldom exploited, or is even discouraged. One reason is the lack of intuitive methods that could utilize the elicited information in an analysis. Here, we show how to use the recently developed penalized complexity (PC) prior framework (Simpson et al., 2017) to facilitate robust modelling, and the hierarchical decomposition (HD) prior framework (Fuglstad et al., 2020) to incorporate the wealth of knowledge about the magnitude of genetic and environmental variation, as well as of additive and non-additive variation, in an intuitive way. An advantage of these prior frameworks over the standard priors used in quantitative genetics (e.g., Sorensen and Gianola, 2007) is that both use simple rules to choose priors on the scale of the expert knowledge.
The aim of this paper is to introduce the penalized complexity and hierarchical decomposition prior frameworks for robust genomic modelling and prediction, and to evaluate the potential impact of using these priors along with expert knowledge in plant breeding. We first describe the genomic model and how to incorporate the expert knowledge in this model using the penalized complexity and hierarchical decomposition prior frameworks. To test our proposed priors, we use a simulated case study of wheat yield. The simulated case study has 100 individuals, and we evaluate robustness as well as parameter and genetic value estimation compared to a prior implementing opposite (i.e., wrong) prior beliefs, default priors omitting expert knowledge, and standard maximum likelihood estimation. Then we apply the method to a publicly available wheat yield dataset with 1,739 individuals from 11 different trials in 6 locations in Germany with varying amounts of observed phenotypes from Gowda et al. (2014) and Zhao et al. (2015). We use cross-validation in the analysis of this dataset to assess phenotype prediction accuracy when using the proposed priors in the model. A description of the simulated and real wheat breeding case studies, model fitting and analysis follows. Our key focus is to demonstrate how an analyst can take advantage of expert knowledge from literature or domain experts to enable robust genomic modelling of additive and non-additive variation. This focus involves specifying and visualizing the expert knowledge in an intuitive way. We then present the results and discuss the relevance of our work.

Genomic model
We model observed phenotypic values of n individuals, y = (y_1, . . . , y_n), with the aim to estimate their genetic values and their additive and non-additive components. To this end we use the genomic information about the individuals contained in the observed single nucleotide polymorphism (SNP) matrix Z, where row i contains the SNP marker genotypes for individual i coded additively as 0, 1 or 2. We let Z_a be the column-centered Z from which we have removed markers with low minor allele frequency, and let Z_d be the column-centered matrix obtained from Z after setting heterozygous genotypes to 1 and homozygous genotypes to 0.
We model the phenotypic observation y_i of individual i as

y_i = µ + g_i + e_i, (1)

where µ is an intercept, g_i is the genetic value and e_i the environmental residual. We model the environmental residuals as independently and identically distributed Gaussian random variables, e = (e_1, . . . , e_n) ∼ N(0, σ²_e I_n), where σ²_e is the environmental variance and I_n is the n × n identity matrix. Let a = (a_1, . . . , a_n), d = (d_1, . . . , d_n) and x = (x_1, . . . , x_n) be vectors of the additive, dominance and epistasis values for the individuals, respectively. We consider the simple additive model where g_i = a_i (Model A) and the comprehensive additive and non-additive model where g_i = a_i + d_i + x_i (Model ADX). We model these values as a ∼ N(0, σ²_a A), d ∼ N(0, σ²_d D) and x ∼ N(0, σ²_x X), where σ²_a, σ²_d and σ²_x are the additive, dominance and epistasis variances, respectively. We specify the covariance matrices as A = Z_a Z_a^T / S_a, D = Z_d Z_d^T / S_d and X = (A ∘ A) / S_x (we consider only additive-by-additive epistasis), where ∘ is the Hadamard product (Henderson, 1985; Horn, 1990; Gianola and de los Campos, 2008; Vitezica et al., 2017). To incorporate our expert knowledge in a unified way, we scale the covariance matrices with S_a, S_d and S_x according to Sørbye and Rue (2018). The idea of such scaling is not new; see Legarra (2016), Vitezica et al. (2017) and Fuglstad et al. (2020) for details. Finally, the phenotypic variance is

σ²_p = σ²_a + σ²_d + σ²_x + σ²_e.
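As a sketch, the covariance matrices above can be computed from the SNP matrix with NumPy. The function names are hypothetical, and the scaling constants S_a, S_d and S_x are taken here as the mean diagonal element of each matrix, one possible "typical variance" choice in the spirit of Sørbye and Rue (2018); the exact scaling used by the authors may differ.

```python
import numpy as np

def scaled(K):
    # Divide by the mean diagonal so the typical variance on the
    # diagonal is 1 (an assumed stand-in for S_a, S_d, S_x).
    return K / np.mean(np.diag(K))

def genomic_covariances(Z):
    """Z: n x m SNP matrix coded 0/1/2 (additive genotype counts)."""
    Za = Z - Z.mean(axis=0)              # column-centered additive coding
    Zd = (Z == 1).astype(float)          # heterozygotes -> 1, homozygotes -> 0
    Zd = Zd - Zd.mean(axis=0)            # column-centered dominance coding
    A = scaled(Za @ Za.T)                # additive covariance matrix
    D = scaled(Zd @ Zd.T)                # dominance covariance matrix
    X = scaled(A * A)                    # additive-by-additive epistasis (Hadamard product)
    return A, D, X

rng = np.random.default_rng(1)
Z = rng.integers(0, 3, size=(8, 50)).astype(float)
A, D, X = genomic_covariances(Z)
```

Note that X is built from the (already scaled) A via an element-wise product, matching the Hadamard-product construction of additive-by-additive epistasis.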

Expert knowledge about variance components
As highlighted in the introduction, there is prior information about the relative magnitude of the genetic and environmental variation and the relative magnitude of the additive, dominance and epistasis variation that can guide the construction of prior distributions. We can summarize this expert knowledge in a hierarchical manner:

EK1 The proportion of genetic to phenotypic variation is R^g_{g+e} = 1 − R^e_{g+e}.

EK2 The proportion of non-additive to genetic variation is R^{d+x}_g = 1 − R^a_g.

EK3 The proportion of epistasis to non-additive variation is R^x_{d+x} = 1 − R^d_{d+x}.

Values for the relative magnitudes R will vary between study systems and traits in line with the expert knowledge. In this study we follow the fact that many complex traits in agriculture are under sizeable environmental effect and that additive effects capture most genetic variation. With this in mind we assume R^g_{g+e} = 0.25, R^{d+x}_g = 0.15 and R^x_{d+x} ≈ 0.33 (so R^d_g ≈ 0.15 · 0.67 ≈ 0.10 and R^x_g ≈ 0.15 · 0.33 ≈ 0.05). We emphasize that we use this information to construct prior distributions, i.e., these relative magnitudes are only taken as a guide and not as the exact magnitude of variance components. We further emphasize that these are our assumptions based on the cited literature in the introduction and practical experience with analysis of a range of datasets. These assumptions will likely vary between study systems and should be developed via a prior elicitation process between an analyst and an expert. Further, Fuglstad et al. (2020) show that the prior for the first partition (R^g_{g+e}) is not very influential. We present two approaches for constructing a prior that includes EK1, EK2 and EK3: independently for each variance parameter, the component-wise (comp) prior, and jointly for all variance parameters, the tree-based (tree) model-wise prior. We contrast these two priors, including expert knowledge (denoted with a *), against default priors that do not use any expert knowledge. The default priors are priors from the literature, which are not specific to the problem at hand.
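The hierarchical knowledge EK1 to EK3 multiplies down to per-component fractions of phenotypic variance. The small sketch below shows that arithmetic; `variance_fractions` is a hypothetical helper, not part of any package, and the default values are the ones assumed in this study.

```python
# Translate the hierarchical expert knowledge EK1-EK3 into fractions of
# phenotypic variance attributed to each model component.
def variance_fractions(R_g=0.25, R_dx=0.15, R_x=0.33):
    return {
        "environment": 1 - R_g,                 # EK1: R^e_{g+e} = 1 - R^g_{g+e}
        "additive":    R_g * (1 - R_dx),        # EK2: R^a_g = 1 - R^{d+x}_g
        "dominance":   R_g * R_dx * (1 - R_x),  # EK3: R^d_{d+x} = 1 - R^x_{d+x}
        "epistasis":   R_g * R_dx * R_x,
    }

frac = variance_fractions()
```

With the assumed values, the environment receives 0.75 of the phenotypic variance and the genetic fractions recover R^d_g ≈ 0.10 · R^g_{g+e}/0.25 and R^x_g ≈ 0.05 · R^g_{g+e}/0.25 as stated in the text.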
For the component-wise expert knowledge prior we need additional information on the expected magnitude of the phenotypic variance V_p. We summarize the resulting priors in Table 1 and describe them in detail in the next sections.
We do not include expert knowledge about the intercept because it is typically well-identified from the data, and we specify the nearly translation-invariant prior µ ∼ N(0, 1000). For the same reason, we do not include expert knowledge about the total variance (corresponding to the phenotypic variance) in the model-wise priors, and we use the non-informative and scale-invariant Jeffreys' prior π(σ²_p) ∝ 1/σ²_p. We use the penalized complexity (PC) prior framework of Simpson et al. (2017), in which the prior is intuitive and shrinks the parameters of a complex model towards a simpler (base) model unless the data provides other information. In this framework, the prior on a parameter is controlled by two elements: a preferred parameter value and an idea of how strongly we believe in this preferred value. The parameter could be a standard deviation or variance, a proportion of variances, or another parameter such as a correlation (Guo et al., 2017).
In our component-wise prior we use a PC prior for each variance parameter in the model. These PC priors shrink towards a variance equal to 0, i.e., towards a simpler model without the corresponding effect. The PC prior for a standard deviation is denoted σ ∼ PC-SD_0(√V, α), and the analyst has to specify an upper variance value V and a tail probability α such that the upper-tail probability is P(σ > √V) = α. We use an expert-expected value for V and set α = 0.25, so the prior distributions are weakly informative towards the expert knowledge but prefer a variance equal to zero unless the data informs otherwise (see examples below). The PC prior for a variance parameter corresponds to an exponential distribution on the standard deviation.

Table 1: Summary of the prior distributions on standard deviations (a) and proportions (b, c) for the component-wise and tree-based (model-wise) settings; the dependence of the tree-based priors on the tree structure is suppressed in the notation. (a) σ ∼ PC-SD_0(√V, α) denotes a PC prior for a standard deviation where P(σ > √V) = α, with a preference for the standard deviation being 0. (b) p ∼ PC-P_0(R*) denotes a PC prior for a variance proportion with median R* and a preference for the proportion being 0. (c) p ∼ PC-P_M(R*) denotes a PC prior for a variance proportion with median R*, a preference for the proportion being equal to the median R*, and 75% probability in [logit(R*) − 1, logit(R*) + 1] around the median on logit scale. (d) The default values √V = 0.968 and α = 0.01 are taken from Simpson et al. (2017).
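Because the PC prior for a variance is an exponential distribution on the standard deviation, the rate follows directly from the pair (√V, α): solving exp(−λ√V) = α gives λ = −ln(α)/√V. A minimal sketch, using the simulated-study value V_p = 1.86 and the Model A genetic fraction as an example:

```python
import math

# Rate of the exponential PC prior on sigma such that
# P(sigma > sqrt(V)) = alpha.
def pc_sd_rate(V, alpha):
    return -math.log(alpha) / math.sqrt(V)

V_p = 1.86                                      # empirical phenotypic variance
V_genetic = 0.25 * V_p                          # EK1: R^g_{g+e} * V_p
lam = pc_sd_rate(V_genetic, alpha=0.25)         # rate for sigma_a in Model A
tail = math.exp(-lam * math.sqrt(V_genetic))    # exponential upper-tail probability
```

The recovered tail probability at σ = √V equals α by construction, which is the defining calibration of PC-SD_0(√V, α).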
In our model-wise prior we use the hierarchical decomposition (HD) prior framework of Fuglstad et al. (2020). The HD prior is constructed bottom-up following a tree structure, where we use expert knowledge to attribute the total variance in the model top-down to the different effects. We do this with splits at one or more levels in the tree structure, where we attribute a proportion of variance p to the different effects; see Fuglstad et al. (2020) for examples. We can choose to be ignorant about a split or to provide expert knowledge. We specify ignorance in a split among m model components by using a symmetric Dirichlet prior p ∼ Dirichlet(m).
In the HD prior framework, we can use expert knowledge in two different ways. We can prefer one effect to the others, or we can prefer a specific proportion of variance attributed to each effect. The use of PC priors will then shrink the model towards the preferred effect or preferred attribution of variance to the effects.
For a variance proportion with shrinkage towards one of the two split components, we denote the PC prior as p ∼ PC-P_0(R), where R encodes the expert knowledge about the relative magnitude of variances such that R is the median (P(p > R) = 0.5). For a variance proportion with shrinkage towards a specific proportion of variance assigned to each component, we denote the PC prior as p ∼ PC-P_M(R), where R has the same interpretation as for PC-P_0(R) but in addition encodes the preferred variance attribution (i.e., shrinkage towards the median; see examples below). For PC-P_M(R), we need to specify how concentrated the distribution is on logit scale in the interval [logit(R) − 1, logit(R) + 1] around the median (see Fuglstad et al. (2020) for details); we allocated 75% probability to this interval.
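The 75% interval on logit scale can be made concrete with a rough stand-in. The exact PC-P_M prior has no standard closed form, so the sketch below instead uses a logit-normal centred at logit(R) whose spread places 75% probability in [logit(R) − 1, logit(R) + 1]; this is an illustrative assumption, not the prior actually used in the paper.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1 - p))

# Logit-normal stand-in for PC-P_M(R): median at R on the proportion
# scale, and sigma chosen so that 75% of the mass lies within +/- 1 of
# logit(R) on the logit scale.
def logit_normal_stand_in(R, prob=0.75):
    sd = 1.0 / NormalDist().inv_cdf(0.5 + prob / 2.0)
    return NormalDist(mu=logit(R), sigma=sd)

nd = logit_normal_stand_in(0.33)  # e.g. epistasis-to-non-additive split (EK3)
mass = nd.cdf(logit(0.33) + 1) - nd.cdf(logit(0.33) - 1)
```

Because the logit transform is monotone, the median of the stand-in on the proportion scale is exactly R, matching the median interpretation of PC-P_M(R).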
The PC prior for a variance proportion depends on the structure of the two components through their covariance matrices. We omit this in the notation for simplicity, and to emphasize that we chose to make the marginal priors on the proportions independent of each other. As the PC prior on proportions depends on the covariance matrix structure, it is application specific, and the priors do not correspond to common families of distributions such as the exponential or normal distributions (see Fuglstad et al. (2020) for more details).

Component-wise prior
We first calculate the empirical phenotypic variance (total variance) V_p from a separate dataset, that is, a trial study or a study believed to exhibit similar phenotypic variance as the study at hand. In the simulated case study, we use V_p = 1.86. This empirical phenotypic variance is combined with the expert knowledge in EK1, EK2 and EK3 to compute expected variances for the model effects. Following EK1 we expect a genetic variance around R^g_{g+e} V_p and an environmental variance around R^e_{g+e} V_p. Following EK2 and EK3 we expect an additive variance around R^a_g R^g_{g+e} V_p, a dominance variance around R^d_g R^g_{g+e} V_p and an epistasis variance around R^x_g R^g_{g+e} V_p. We use this expert knowledge in the PC prior by setting the upper variance values. For example, for Model A we get σ_a ∼ PC-SD_0(√(R^g_{g+e} V_p), 0.25) and σ_e ∼ PC-SD_0(√(R^e_{g+e} V_p), 0.25) (Table 1). Combining the above expert knowledge procedure with the two different genomic models (Model A and Model ADX) gave us settings we denote as A-comp* and ADX-comp*. We contrasted these settings with a default PC prior proposed by Simpson et al. (2017) with √V = 0.968 and α = 0.01 on all variances, which gave us settings denoted as A-comp and ADX-comp. Preliminary analyses showed that the inference for ADX-comp is not robust in the sense that it did not avoid estimating spurious non-additive effects, and we do not present results from this setting. The priors for A-comp* and A-comp are plotted in Figure 1 and for ADX-comp* in Figure 2, using R^g_{g+e} = 0.25, R^{d+x}_g = 0.15, R^x_{d+x} ≈ 0.33 and V_p = 1. If V_p takes another value, we simply rescale the x- and y-axes; the shape of the prior stays the same.

Tree-based model-wise prior
To incorporate the expert knowledge in a model-wise way we utilize the expert knowledge in a hierarchical manner following the tree structure in Figure 3. We can now assume the scale of the phenotype is unknown. The arrows in the tree structure indicate how the variance is distributed in the model, meaning that the amount of variance attributed to, e.g., dominance and epistasis depends on the variance partitioning at the higher levels. To construct the hierarchical decomposition (HD) prior we follow the tree structure from the bottom and continue in a conditional fashion upwards in the tree (Fuglstad et al., 2020). We put a marginal prior on the lower splits in the tree and condition on them when we move upwards. This means that the priors depend on choices made further down in the tree, e.g., the prior on the broad-sense heritability at the top will depend on the priors further down. However, in this bottom-up approach, the prior for each split can only employ knowledge from the current split and below, e.g., the choices of priors on the partitioning of the genetic effects (lower in the tree) do not, by design, use any knowledge about broad-sense heritability or its prior (higher in the tree). The prior on the total variance is conditionally independent of the priors on the proportions. Note that in the joint distribution, we have dependencies between all the splits in the tree. In this study we assumed that at the lower levels the model shrinks towards the expert knowledge in EK2 and EK3 (PC-P_M). Further, at the top level the model shrinks the genetic effect towards zero unless the data indicates otherwise, to reduce overfitting (PC-P_0). Note that we could have chosen different assumptions.

Figure 3: Tree structure describing the expert knowledge on the decomposition of phenotypic variance used in the HD prior. Edge labels indicate the expert-motivated relative magnitude of variances, used as the median proportion of the variance in the parent node attributed to each child node. For the top split we use the PC-P_0 prior (shrinking to environmental noise), while for splits two and three we use the PC-P_M prior (shrinking to a combination of the effects).

To obtain a prior fulfilling our assumptions, we follow four steps: 1. we use a PC-P_M prior for the proportion of epistasis to non-additive variance with median R^x_{d+x} (EK3), 2. we use a PC-P_M prior for the proportion of non-additive to genetic variance with median R^{d+x}_g (EK2), 3. we use a PC-P_0 prior for the proportion of genetic to phenotypic variance with median R^g_{g+e} (EK1), and 4. we achieve scale-independence through a scale-invariant prior for the phenotypic variance (defined above).
This differs from the component-wise prior in that we do not need a separate dataset to inform about the scale of the variances, and in that we now use parameters on the same scale as the expert knowledge. This construction gives the joint prior π(σ²_p, p^g_{g+e}, p^{d+x}_g, p^x_{d+x}) = π(σ²_p) π(p^g_{g+e}) π(p^{d+x}_g) π(p^x_{d+x}). We omit the conditioning because we chose to use independent priors for the proportions, where only the expert knowledge from the lower levels is used and not the values of the proportions, following Fuglstad et al. (2020). We show this prior in Figure 4 for R^g_{g+e} = 0.25, R^{d+x}_g = 0.15 and R^x_{d+x} ≈ 0.33, and denote this setting as ADX-tree*. Note that the model-wise priors with expert knowledge depend on the covariance matrices of the modelled effects and are therefore dataset specific (Fuglstad et al., 2020); the plots of these priors thus pertain to one specific dataset. The spike at p = 1 for p^g_{g+e} in Figure 4 is an artifact of the parameterization chosen for visualization and does not induce overfitting; see Fuglstad et al. (2020) for details.
We explored the influence of alternative expert knowledge. In addition to the previously stated values for EK1, EK2 and EK3 we also tested R^g_{g+e} = 0.25 (so R^e_{g+e} = 0.75), R^{d+x}_g = 0.95 (so R^a_g = 0.05), and R^x_{d+x} ≈ 0.89 (so R^d_{d+x} ≈ 0.11, R^d_g ≈ 0.95 · 0.11 ≈ 0.10 and R^x_g ≈ 0.95 · 0.89 ≈ 0.85). The construction follows the description above but with these medians instead. We denote this setting as ADX-tree-opp, as it expresses almost opposite or "wrong" beliefs compared to the ADX-tree* setting, and show the prior in Figure S1 in File S1 in the Supplemental materials.

Figure 4: The HD prior used in the ADX-tree* setting (the additive and non-additive model with the model-wise expert knowledge prior), with the proportion of genetic to phenotypic variance p^g_{g+e}, non-additive to genetic variance p^{d+x}_g, and epistasis to non-additive variance p^x_{d+x}. We use R^g_{g+e} = 0.25, R^{d+x}_g = 0.15 and R^x_{d+x} ≈ 0.33. This is a dataset-specific prior.

Figure 5: The prior for the proportion of genetic to phenotypic variance p^g_{g+e} for the A-tree* (the additive model with the model-wise expert knowledge prior; left) and A-tree (the additive model with the model-wise default prior; right) settings. We use R^g_{g+e} = 0.25. A-tree* is a dataset-specific prior.

For Model A the genetic variance is not decomposed into different sources, and the distribution of the phenotypic variance can be visualized using the top split in Figure 3. We use the expert knowledge in EK1 to construct a prior for the proportion of genetic to phenotypic variance and denote this setting as A-tree*. We show this prior in Figure 5. We compared the model-wise prior with expert knowledge to a prior with no expert knowledge by constructing an HD prior using the exchangeable Dirichlet prior on the proportion of phenotypic variance attributed to each of the sources of variance, following Fuglstad et al. (2020). For Model A we use a uniform prior, a special case of the symmetric Dirichlet(m) prior with m = 2, on the proportion of genetic to phenotypic variance p^g_{g+e}, and denote this setting as A-tree (see Figure 5). For Model ADX we use a Dirichlet(4) prior on the proportions and denote this setting ADX-tree. This prior does not induce shrinkage towards any of the effects and assumes that each model effect contributes equally to the phenotypic variance; due to the lack of expert knowledge this setting was not robust (in the sense that spurious non-additive values were estimated), and we do not show results from it. The tree structure and prior for ADX-tree can be seen in Figures S2 and S3 in File S1, respectively.

Simulated case study
We applied the described genomic model (1) with different priors to a simulated case study that mimics a wheat breeding program to investigate the properties of the different models and prior distributions. We simulated the breeding program using the R package AlphaSimR (Faux et al., 2016;Gaynor, 2019) and closely followed breeding program descriptions of Gaynor et al. (2017) (see their Figure 3) and Selle et al. (2019).
Specifically, we simulated a wheat-like genome with 21 chromosomes, selected at random 1,000 SNP markers and 1,000 causal loci from each chromosome, and used this genome to initiate a breeding program with breeding individuals. Every year we used 50 fully inbred parents to initiate a new breeding cycle with 100 random crosses. In each cross we generated 10 progenies and selfed them to generate 1,000 F2 individuals, which were selfed again to generate 10,000 F3 individuals. We reduced the 10,000 F3 individuals in four successive selection stages (headrow, preliminary yield trial, advanced yield trial and elite yield trial) with 10% selection intensity in each stage. In the headrow stage, we simulated visual selection on a phenotype with a heritability of 0.03. For the preliminary, advanced and elite yield trials we simulated selection on phenotypes with heritabilities of 0.25, 0.45 and 0.62, respectively. We used the 50 individuals with the highest phenotypic values from the last three selection stages as parents for the next breeding cycle.
We ran the simulation for 30 years. In year 1, we set the following variances for the population of the 50 parents: an additive variance of 1.0, a dominance variance of 0.5 and an epistasis variance of 0.1. We set the environmental variance to 6.0 for all stages and years. We ran the simulation for 20 years as a "burn-in" to obtain realistic breeding data under selection. We then ran the simulation for another 10 years with selection on phenotype and evaluated the fit of the models at the third selection stage; we did not use the models for selection. At this stage we had 100 individuals, each with five replicates, and the goal was to select the 10 genetically best individuals for the fourth and last stage. Prior to fitting the models we removed the SNP markers with minor allele frequency below 5%.
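A single phenotypic selection stage of the kind described above can be sketched as truncation selection on a noisy phenotype. This is a hedged illustration, not the AlphaSimR implementation: `select_on_phenotype` is a hypothetical helper, and the noise variance is scaled so the realized heritability matches the target h².

```python
import numpy as np

# One phenotypic selection stage: phenotype = genetic value + noise,
# with noise variance chosen so that var(g) / var(pheno) = h2,
# then keep the top fraction by phenotype.
def select_on_phenotype(g, h2, top_frac=0.10, rng=None):
    rng = rng or np.random.default_rng()
    var_e = np.var(g) * (1 - h2) / h2            # noise giving heritability h2
    pheno = g + rng.normal(0.0, np.sqrt(var_e), size=g.shape)
    n_keep = max(1, int(round(top_frac * len(g))))
    keep = np.argsort(pheno)[::-1][:n_keep]      # indices of highest phenotypes
    return keep, pheno

rng = np.random.default_rng(7)
g = rng.normal(0.0, 1.0, size=1000)              # stand-in genetic values
keep, pheno = select_on_phenotype(g, h2=0.25, rng=rng)
```

With 10% selection intensity and h² = 0.25 (the preliminary-yield-trial stage), the mean genetic value of the kept individuals exceeds the population mean, which is the selection response the burn-in years accumulate.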

Real case study
In addition we applied the described genomic model (1) to the publicly available Central European wheat grain yield data from Gowda et al. (2014) and Zhao et al. (2015). In short, the data consists of 120 female and 15 male parent lines, which were crossed to create 1,604 hybrids. The parents and hybrids were phenotyped for grain yield in 11 different trials in 6 locations in Germany. The number of observed phenotypes for the parents and hybrids vary between the trials, i.e., some datasets have more observed phenotypes than others, ranging from 834 to 1,739 (see Table S1 in File S1 in the Supplemental materials). The parents and hybrids have genotype data for 17,372 high-quality SNP markers.
In the real case study we analyzed the performance of the model-wise priors using expert knowledge (tree*) for the additive model (A) and the additive and non-additive model (ADX). The component-wise expert knowledge priors require intuition about the total variance, and we do not include them in the real case study. We used the same expert knowledge about the magnitude of the genetic variance components in ADX-tree* as in the simulation study: R^x_{d+x} ≈ 0.33 and R^{d+x}_g ≈ 0.15. We did, however, use a higher value for the broad-sense heritability, R^g_{g+e} = 0.75, in line with Reif et al. (2011), as later-stage trials tend to have higher heritability than early-stage trials. Again, we emphasize that these values are only used to construct prior distributions and are not taken as literal proportions. The resulting priors can be seen in Figure S4 in File S1.

Model fit and analysis
We fitted the models with a Bayesian approach through the R package RStan (Carpenter et al., 2017; Stan Development Team, 2019), which uses the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. Sampling methods such as Markov chain Monte Carlo and Hamiltonian Monte Carlo have guaranteed asymptotic accuracy as the number of drawn samples goes to infinity. However, in an applied context with finite computational resources, it is hard to assess this accuracy. Betancourt (2016) developed a diagnostic metric for Hamiltonian Monte Carlo, called divergence, that indicates the robustness of a model fit to a specific dataset. This metric indicates whether the sampler is able to transition through the posterior space effectively or not; in the latter case the results might be biased (we show an example of this in the results).
We also fitted Model A and Model ADX with the maximum likelihood (ML) approach using the low-storage BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm through the R package nloptr (Nocedal, 1980;Liu and Nocedal, 1989;Johnson, 2020). These approaches do not use priors. We denote them as A-ML and ADX-ML and use them as a baseline for comparison because maximum likelihood is a common approach in the literature. Inference for ADX-ML was, just like for ADX-tree and ADX-comp, not robust, and we do not present results from this setting. Lastly, we compared the model results to the performance of selection based solely on phenotype.
For the simulated case study, we ran the breeding program simulation 4,000 times and fitted the model and prior settings in each of the last 10 years of simulation (40,000 model fits in total), and evaluated the fits against the true (simulated) values. For each model fit we evaluated: (i) robustness, (ii) the accuracy of selecting the genetically best individuals, (iii) the accuracy of estimating the different genetic values and (iv) the accuracy of estimating the variance parameters.
We measured robustness of inference, i.e., the degree to which a model avoids estimating spurious non-additive effects. For the robustness of inference of the Bayesian approach with Stan we used the proportion of analyses that had robust inference (which we define as at least 99% of samples without divergent transitions) for each model and prior setting. For the robustness of inference of the maximum likelihood approach we used the proportion of analyses where the optimization algorithm converged.
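The robustness summary above can be sketched as follows; `robust_fraction` is a hypothetical helper name (not from the paper's code), and the 99% threshold mirrors the definition in the text.

```python
import numpy as np

def robust_fraction(divergent_counts, n_samples, threshold=0.99):
    """Proportion of analyses whose share of non-divergent samples is at
    least `threshold` (here 99%), i.e., analyses with robust inference."""
    counts = np.asarray(divergent_counts, dtype=float)
    non_divergent_share = (n_samples - counts) / n_samples
    return float(np.mean(non_divergent_share >= threshold))
```

For example, with 1,000 posterior draws per analysis and divergent-transition counts of 0, 5 and 100 across three analyses, the first two analyses count as robust.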
For the accuracy of selecting the genetically best individuals we ranked the best 10 individuals based on the estimated genetic value or estimated additive value, and counted how many were among the true genetically best 10 individuals based on the true genetic value or true additive value. We used the posterior mean of the effects as estimated values for ranking. Selection on the genetic value indicated selection of new varieties, while the selection on the additive value indicated selection of new parents.
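A minimal sketch of this selection-accuracy metric; `selection_accuracy` is a hypothetical helper, and the estimated values stand in for the posterior means of genetic or additive values.

```python
import numpy as np

def selection_accuracy(estimated, true_values, n_top=10):
    """Count how many of the n_top individuals ranked highest by the
    estimated values are among the n_top truly best individuals."""
    est_top = set(np.argsort(estimated)[-n_top:])
    true_top = set(np.argsort(true_values)[-n_top:])
    return len(est_top & true_top)
```

A perfect ranking scores 10 out of 10; swapping the best and worst individuals in the estimates drops the score to 9.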
For the accuracy of estimating the different genetic values (total genetic, additive, dominance and epistasis values) we used Pearson correlation and continuous rank probability score (CRPS, Gneiting and Raftery, 2007). With the correlation we measured how well posterior means of genetic values correlated with true values (high value is desired). This metric works with point estimates and ignores uncertainty of inferred posterior distributions of each individual genetic value. With the CRPS we measured a combination of bias and sharpness of the posterior distribution compared to true values (low value is desired). Specifically, CRPS integrates squared difference between the cumulative estimated posterior distribution and the true value over the whole posterior distribution (Gneiting and Raftery, 2007). In the case of phenotypic selection, we have a phenotype value for selection candidates, which is a point estimate of the genetic value, and its CRPS then reduces to the mean absolute error between the true genetic values and the phenotype.
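The CRPS can be estimated directly from posterior samples via the energy-form identity CRPS(F, y) = E|X − y| − 0.5 E|X − X′| (Gneiting and Raftery, 2007); the sketch below is a hypothetical Monte Carlo implementation, not the paper's code. For a degenerate point-estimate "posterior" it reduces to the absolute error, matching the phenotypic-selection case described above.

```python
import numpy as np

def crps_from_samples(samples, y):
    """Monte Carlo CRPS: E|X - y| - 0.5 * E|X - X'| with X, X'
    independent draws from the posterior; lower is better."""
    x = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2
```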
For the accuracy of estimating the variance parameters we compared estimated variance components to true values. We collected posterior medians and divided them by true genetic variances for each of the 10 years from the simulated breeding program (a value close to 1 is desired). This is not done for phenotype selection.
We analyzed the real case study with the same models and the model-wise priors. Here, we focused on the ability to predict observed phenotypes based on cross-validation. We performed 5-fold cross-validation five times for each of the 11 trials independently. For each fold in each cross-validation, we predicted the held-out phenotypes (their posterior distribution involving the linear predictor and environmental variation), and calculated the correlation between the point predictions and the observed phenotypes, and the CRPS using the phenotype posterior prediction distributions and the observed phenotypes available for each trial. We note that phenotype posterior predictions involve environmental variation, which does not affect point predictions and correlations, but affects the CRPS because the whole distribution of the prediction enters the calculations. We also looked at the posterior medians of the model variances. Of note, in contrast to the simulation study, the genetic effects are unknown for real data, so we cannot assess their estimation accuracy.
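The repeated 5-fold cross-validation loop can be sketched as follows; `predict_fn` is a placeholder for refitting the genomic model on the training records and predicting the held-out phenotypes, and the function names are hypothetical.

```python
import numpy as np

def five_fold_indices(n, n_folds=5, seed=0):
    """Shuffle indices 0..n-1 and split into n_folds disjoint held-out sets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), n_folds)

def cross_validate(y, predict_fn, n_folds=5, seed=0):
    """Collect the per-fold Pearson correlation between held-out
    observations and their predictions."""
    correlations = []
    for test_idx in five_fold_indices(len(y), n_folds, seed):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        y_hat = predict_fn(train_idx, test_idx)  # model fit + prediction
        correlations.append(np.corrcoef(y_hat, y[test_idx])[0, 1])
    return correlations
```

Running this five times with different seeds corresponds to the five 5-fold cross-validations per trial.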

Data and code availability
We provide code to simulate the data described in the simulation case study (File S2 in the Supplemental materials). We also provide example code to fit the models presented in this paper together with an example dataset (File S3). In the real case study we used data from Gowda et al. (2014) and Zhao et al. (2015).

Simulated case study
In the simulated case study the model-wise priors and expert knowledge improved the robustness of the non-additive models and the selection of the genetically best individuals, but did not improve the accuracy of estimating different genetic values for all individuals or for variance components.
The model-wise priors combined with expert knowledge improved robustness of inference of the non-additive model to the level of robustness of the additive model and phenotypic selection. We show this in Table 2 with the proportion of robust model fits by model and prior setting. Phenotypic selection does not depend on a model fit to a dataset and therefore had the highest inference robustness by definition. This maximum level of robustness was matched by fitting the simple additive model and model-wise prior with and without expert knowledge (A-tree* and A-tree). This high model robustness was followed closely by fitting the more complicated non-additive model with the model-wise prior and expert knowledge (ADX-tree*), as well as fitting the simple additive model with the standard maximum likelihood approach (A-ML) and with the Bayesian approach using component-wise priors with and without expert knowledge (A-comp* and A-comp). The proportion of robust model fits then started to decrease for the other settings. For the non-additive model, the standard maximum likelihood approach (ADX-ML) was improved upon by the Bayesian approach with either the component-wise prior with expert knowledge (ADX-comp*) or the model-wise prior with wrong/opposite expert knowledge (ADX-tree-opp). Literature default priors used with the non-additive model (ADX-tree and ADX-comp) had the lowest robustness of inference.

The reason for the deteriorated inference robustness of some model and prior settings is that genetic and environmental effects can be partially confounded, which limits the exploration of the posterior when using the Bayesian approach, or limits convergence of mode-seeking algorithms when using the maximum likelihood approach. We show the partial confounding with images of the covariance matrices for additive, dominance, epistasis and environmental sources of variation for one dataset in Figure S5 in File S1 in the Supplemental materials.
In Figure S6 we show joint posterior samples for the epistasis and environmental variance for model ADX with the model-wise priors with and without expert knowledge (ADX-tree* and ADX-tree) for one dataset. We see that without a robust model (this includes both the likelihood and the prior), the posterior becomes difficult to explore, and this is also supported by the divergence diagnostics (Table 2). The posterior of the model-wise prior without expert knowledge (ADX-tree) is bimodal, and the sampler was not able to sample with convergence due to confounding.
Due to the lack of robust inference with the ADX-comp, ADX-tree and ADX-ML (see Table 2) settings we do not present their results in the following. Note that Table 2 includes all model abbreviations used.
The model-wise priors exploiting expert knowledge improved the selection of the genetically best individuals significantly, and the model choice was important for different breeding aims. We show this in Figure 6, where we display the accuracy of selecting individuals with the highest genetic value (variety selection, Figure 6a) and with the highest additive value (parent selection, Figure 6b). The settings with the non-additive model and expert knowledge (ADX-comp* and ADX-tree*) performed significantly better for the selection of new varieties than the others, which followed in order from A-tree, A-tree*, A-comp*, A-comp, A-ML to ADX-tree-opp (see Table 2 for abbreviations). The differences between these settings were small, but they all performed significantly better than sole phenotype selection, which is sensitive to environmental noise. For the selection of new parents the simpler additive model performed the best, and model-wise priors improved its performance (A-tree, A-tree* and A-comp*). Wrong expert knowledge harmed the parent selection (ADX-tree-opp), but it still outperformed sole phenotype selection. In the following we focus only on results for the non-additive model with the model-wise expert knowledge prior (ADX-tree*) and the component-wise expert knowledge prior (ADX-comp*), the additive model with the model-wise default prior (A-tree), the maximum likelihood approach (A-ML), and phenotype selection, and show the results of the remaining settings in File S1 of the Supplemental materials (Figures S7-S9).

While using the model-wise priors and expert knowledge significantly improved the selection of the genetically best individuals compared to the maximum likelihood approach, it did not significantly improve the accuracy of estimating the different genetic values. We show this in Figure 7 with the correlation between the true values and the corresponding posterior means, and in Figure 8 with the continuous rank probability score (CRPS) between the true values and the corresponding posterior distributions.
We show this separately for the genetic, additive, dominance and epistasis values. While there was a tendency of more favourable correlation and CRPS for certain model and prior settings, the variation between replicates was much larger than the variation between the model and prior settings. The model-wise prior tended to perform better than the component-wise prior, expert knowledge tended to perform better than the default non-informative priors, and the use of prior knowledge via the Bayesian approach tended to perform better than the maximum likelihood approach. All models performed better than phenotype selection in estimating the genetic and additive values, especially in terms of CRPS.
Variance component estimates varied considerably around the true values for all models and prior settings, but the estimates from the Bayesian inference showed slightly larger biases and smaller variances than maximum likelihood estimates. We show this in Figure 9 with the ratio of estimated to true variances (value close to 1 is desired and values below/above 1 denote underestimation/overestimation). Of the model and prior settings in Figure 9, A-ML was the closest to the true value on average in estimating the genetic variance, but also had the largest variation between replicates. Bayesian analysis with A-tree reduced variance between replicates, but did not improve bias. ADX-comp* and ADX-tree* further increased the bias (underestimation) compared to A-ML and A-tree, and the model-wise prior (ADX-tree*) tended to perform better than the component-wise prior (ADX-comp*). When estimating dominance and epistasis variance, ADX-tree* tended to perform better than ADX-comp*. Estimates for epistasis variance were considerably more underestimated than for the dominance variance.

Real case study
The Bayesian approach with model-wise expert knowledge priors performed at least as well as or better than the maximum likelihood (ML) approach. We show this in Figure 10 with the predictive ability for phenotypes measured with correlation and continuous rank probability score (CRPS) from three trials in Seligenstadt (Sel13 and Sel12) and Hadmersleben (Had12) over the five 5-fold cross-validations. These trials had phenotype observations for 1,739 (Sel13), 834 (Sel12) and 1,738 (Had12) parents and hybrids. We include correlation and CRPS for all 11 trials in Figures S10 and S11 in File S1 in the Supplemental materials. The maximum likelihood approach was equally good as the Bayesian approach in the Sel13 trial, where all phenotypes were observed for the parents and hybrids, but in the Sel12 trial, which consists of only 834 out of 1,739 observed phenotypes, the maximum likelihood approach had worse predictive ability for the additive model (A), and slightly worse for the non-additive model (ADX). In the Had12 trial, with practically no unobserved phenotypes, the maximum likelihood approach was outperformed by the Bayesian approach for the non-additive model due to overfitting (see Figure 11).
We explored reasons for the bad performance of A-ML in the Sel12 trial. The maximum likelihood optimizer returned a convergence error message for two of the total 25 folds (we removed these model fits from all the results). However, Figure 10 indicates that the optimizer also diverged without giving an error message in other folds. An investigation of the variance estimates showed that the optimizer got "stuck" at the lower boundary values (−20 for the environmental and −50 for the other variances on a logarithmic scale). We used 0 as the initial value for the intercept and log-variances for both the Bayesian and maximum likelihood approaches; however, the latter did not converge. Figure 11 shows the variance estimates from the Sel13, Sel12 and Had12 trials. We see that for Sel13, the approaches agree on the variance estimates. With a dataset with many unobserved phenotypes (Sel12), the additive model fitted with the maximum likelihood approach (A-ML) estimated the environmental log-variance at −20, and in compensation severely overestimated the additive variance. The non-additive model fitted with maximum likelihood (ADX-ML) had the same underestimation of the environmental variance for some folds, but compensated with non-additive effects. This indicates overfitting and means that predictions from such fits are based solely on additive or non-additive effects, with no environmental effects. ADX-ML also underestimated the environmental variance for the data from Had12, and compensated this variance with the dominance and epistasis effects. We reran the maximum likelihood optimizer with initial values set to posterior medians from the corresponding Bayesian models. In this case, the maximum likelihood approach was not outperformed by the Bayesian approach (see Figures S10 and S11 in File S1).
Looking at Figures S10 and S11, we see that the trend is the same for most of the trials: for datasets where most of the phenotypes for the parents and hybrids are observed, the maximum likelihood and Bayesian approaches in general perform equally, and we gain predictive accuracy by including non-additive effects; but as soon as there are many unobserved phenotypes, such as for Boh12 and Sel12 (see Table S1 for information about all trials), the maximum likelihood approach deteriorates. For the Had12, Had13 and Hhof13 trials, which have few unobserved phenotypes but still show poor predictive ability for the non-additive model (ADX), the maximum likelihood approach has problems with overfitting (see Figure S12). The model underestimates the environmental variance and attributes this variation to the dominance and epistasis effects. In addition to the additive (A) and non-additive (ADX) models, we looked at a model with additive and dominance effects only (AD), and the results did not differ from the full non-additive model (Figures S10-S12).

Discussion
In this study we have introduced new priors for robust genomic modelling of additive and non-additive variation based on the penalized complexity prior and hierarchical decomposition prior (Fuglstad et al., 2020) frameworks. In the simulated case study, the new priors enabled straightforward use of expert knowledge, which in turn improved the robustness of genomic modelling and the selection of the genetically best individuals in a wheat breeding program. However, they did not improve the overall accuracy of estimating genetic values for all individuals or of estimating variance components. In the real case study, the new priors improved the predictive ability, especially for trials with fewer observations, and they reduced overfitting. These results highlight three points for discussion: (i) expert-knowledge priors for genomic modelling and prediction, (ii) the importance of priors for breeding and (iii) limitations of our work.

Expert-knowledge priors for genomic modelling and prediction
Genomic modelling is challenging due to inherent high-dimensionality and pervasive correlations between loci, and therefore requires substantial amounts of information for robust estimation. Most genomes harbour millions of segregating loci that are highly or mildly correlated. While estimating additive effects at these loci is a challenging task in itself (e.g., Visscher et al., 2017;Young, 2019), estimating dominance and epistasis effects is an even greater challenge (e.g., Misztal, 1997;Zhu et al., 2015;de los Campos et al., 2019). One challenge in estimating the interactive dominance and epistasis effects is that they are correlated with the main additive effects, and all these effects are further correlated across nearby loci (Mäki-Tanila and Hill, 2014;Hill and Mäki-Tanila, 2015;Vitezica et al., 2017). Information to estimate all these locus effects and the corresponding individual values has to come inherently from the data, but could also come, to a limited extent, from expert knowledge. There is a wealth of expert knowledge in genetics (e.g., Houle, 1992;Falconer and Mackay, 1996;Lynch et al., 1998); however, this expert knowledge is seldom used because it is not clear how to use it in a credible and consistent manner.
We showed how to use expert knowledge about the magnitude of different sources of variation by leveraging two recently introduced prior frameworks (Fuglstad et al., 2020). While the penalized complexity priors are parsimonious and intuitive, they require absolute prior statements when used in a component-wise approach, which are challenging to elicit for multiple effects. The hierarchical decomposition framework imposes a tree structure according to a domain model, and the intuitive penalized complexity prior can be used in the hierarchical decomposition prior framework to ensure robust modelling. This model-wise approach enables the use of relative prior statements, which are less challenging to elicit than absolute prior statements. The presented priors therefore pave the way for a fruitful elicitation dialogue between a geneticist and a statistician (Farrow, 2013). In particular, the hierarchical decomposition prior framework provides both a method for prior construction and a platform for communication between geneticists and statisticians. The model-wise expert knowledge prior must naturally be adapted to each model, as it depends on the model structure, but using the tree structures makes this adaptation intuitive and should as such help with prior elicitation (O'Hagan et al., 2006;Farrow, 2013). Further, the graphical representation allows a description of a joint prior in a visual way with minimal statistical jargon (Figure 3).
An example of using such expert knowledge was the choice of a median for the broad-sense heritability of 0.25 in the simulated and 0.75 in the real case study. However, as Figures 4 and S4 show, the priors do not differ tremendously. This shows that the prior proposed in this study is vague and not strongly restricted by the value chosen for the median. Perhaps there is even scope for more concentrated priors, should such information be available.
The hierarchical decomposition prior framework enabled us to use expert knowledge on relative additive and non-additive variation. In the simulated case study this information improved the robustness of inference of the Bayesian approach over the maximum likelihood approach and improved the selection of the genetically best individuals. This improvement was due to the additional information, which alleviated the strong confounding between the non-additive (particularly epistasis) and environmental variation.
The hierarchical decomposition prior framework is also useful when expert knowledge is only available on parts of the model. For example, an expert may not have a good intuition about the level of broad-sense heritability, say for a new trait, but will likely have a good intuition on how the genetic variance decomposes into additive, dominance and epistasis components, simply due to the model specification (Hill et al., 2008;Mäki-Tanila and Hill, 2014;Hill and Mäki-Tanila, 2015;Huang and Mackay, 2016). In those cases, we can use weakly-informative default priors on the parts of the model where expert knowledge is missing, and priors based on expert knowledge for the rest of the model. The component-wise specification of expert knowledge with the standard (Sorensen and Gianola, 2007) or the penalized complexity priors is infeasible in this context, and does not admit a simple visualization of the implied assumptions on the decomposition of the phenotypic variance. Further, the component-wise specification of expert knowledge is particularly challenging when the phenotypic variance is unknown or when the collected observations are influenced by a range of effects that can inflate the sample phenotypic variance. The model-wise approach with the hierarchical decomposition prior can address these situations.
There exists previous work on penalized estimation of genetic covariances (e.g., Meyer et al., 2011;Meyer, 2016, 2019) that also uses Bayesian principles and scale-free penalty functions to reduce variation of the estimates from small datasets and for large numbers of traits. Our proposed priors and expert knowledge also reduced variation of estimates in the simulated case study. However, our estimates were biased, which is expected given the small sample size and that the Bayesian approach induced bias towards a lower variance (e.g., Sorensen and Gianola, 2007). It is worth noting that the maximum likelihood estimates of genetic variance were also largely underestimated, which we believe is due to the small sample size and a large number of parameters to estimate. Further, for some datasets we could not obtain the maximum likelihood estimates, while priors robustified the modelling. The real case study also showed that using expert knowledge increases the inference robustness in datasets with a large amount of unobserved phenotypes, and reduces overfitting. We saw this improvement both in the Bayesian approach and in the maximum likelihood approach where we used the results from the Bayesian models as initial values for the optimization algorithm. However, the latter approach requires specific expert knowledge on the size of the variances, which, in the same way as the component-wise expert knowledge priors, is difficult to elicit from experts in the field. We note, however, that genomic models are inherently misspecified by trying to estimate the effect of causal loci through correlated marker loci (Gianola et al., 2009;de los Campos et al., 2015). Also, linkage and linkage disequilibrium challenge the decomposition of genetic variance into its components (Gianola et al., 2013). Indeed, our variance estimates were not very accurate in the simulated case study. Future research could expand the hierarchical decomposition prior framework to other settings.
For example, it could be extended to multiple traits or to modelling genotype-by-environment interactions, which are notoriously noisy and for which we aim to find parsimonious estimates (e.g., Meyer, 2016, 2019;Tolhurst et al., 2019). It could also be extended to model macro- and micro-environmental effects (e.g., Selle et al., 2019) and to model multiple layers of sparse, yet high-dimensional, "omic" data from modern biological experiments using network-like models (Damianou and Lawrence, 2013).

Importance of priors for breeding
Robust genomic modelling of non-additive variation is important for breeding programs. There is substantial literature indicating sizeable non-additive genetic variation (e.g., Oakey et al., 2006;Muñoz et al., 2014;Bouvet et al., 2016;Varona et al., 2018;de Almeida Filho et al., 2019;Santantonio et al., 2019;Tolhurst et al., 2019), but robust modelling of this variation is often challenging. We have shown how to achieve this robust modelling with the proposed priors and expert knowledge. We evaluated this approach with a simulated wheat breeding program where we assessed the ability to select the genetically best individuals on their genetic value (variety selection) and additive value (parent selection). The results showed that the proposed priors and the expert knowledge improved variety and parent selection. We observed more improvement in the variety selection, which is expected because there is more variation in genetic values than in their first-order approximation, the additive values. However, this additional non-additive variation is hard to model due to a small signal from the data relative to environmental variation and confounding with the environmental variation. This confounding is expected. As pointed out by one of the reviewers, we obtain the epistasis covariance matrix using the Hadamard product of the additive covariance matrix with itself, and such repeated Hadamard multiplication converges to an identity matrix, i.e., to the covariance matrix of the environmental effect. Both the simulated and real case studies showed that including non-additive effects requires some sort of penalization to avoid overfitting environmental noise as non-additive genetic effects. The proposed priors and the expert knowledge helped us to achieve this.
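The reviewers' point can be illustrated numerically: for a covariance matrix with unit diagonal and off-diagonal entries strictly inside (−1, 1), repeated Hadamard (element-wise) multiplication shrinks the off-diagonal entries geometrically toward zero. The matrix below is a toy example, not real genomic data.

```python
import numpy as np

# Toy additive relationship matrix: unit diagonal, |off-diagonal| < 1.
K = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

def hadamard_power(K, p):
    """Element-wise (Hadamard) p-th power, as used to build epistasis
    covariance matrices from the additive covariance matrix."""
    return K ** p

# The diagonal stays at 1 while off-diagonal entries decay geometrically,
# so high Hadamard powers approach the identity matrix, i.e., the
# covariance structure of independent environmental effects.
```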
Importantly, all models improved upon sole phenotypic selection in the simulated case study, which shows the overall importance of genomic modelling. While the differences between the different models and priors were small, improved genomic modelling can contribute to the much needed improvements in plant breeding (Ray et al., 2013;Asseng et al., 2015). Also, even a small improvement in variety selection has a huge impact on production, because varieties are used extensively (Acquaah, 2009). In terms of model complexity, the answer to whether to use the additive model or the non-additive model depended on the aim of the analysis. The non-additive model was the best at selecting the genetically best individuals on genetic value, whereas the additive model performed best at selecting the genetically best individuals on additive value. The reason for this is likely the small sample size and the large number of parameters to estimate with the non-additive model (Varona et al., 2018). In the real case study, adding non-additive effects to the model improved the phenotypic prediction accuracy beyond the additive model, and the expert knowledge helped us to avoid overfitting, which shows the advantage of the expert knowledge.
Of note is the observation that the proposed priors and the expert knowledge improved the selection of the genetically best individuals, but not the estimation of the different genetic values. We did not expect this difference. In principle both of these metrics are important, but for breeding the ability to select the genetically best individuals is more important (de los Campos et al., 2013). A possible explanation for the difference between the two metrics is that the top individuals deviated more from the overall distribution and that the overall metrics do not capture this tail behaviour well.
The importance of the proposed priors and the expert knowledge will likely vary with the stage and size of a breeding program, and as the real case study shows, the importance of priors increases with the decreasing amount of observations. Prior importance is known to decrease as the amount of data increases (Sorensen and Gianola, 2007), but the required amount of data for accurate estimation of non-additive effects is huge compared to the size of most breeding programs. Therefore the proposed penalized complexity and hierarchical decomposition priors could be helpful also in large breeding programs as they enforce shrinkage according to the expert knowledge unless the data indicates otherwise, reducing the risk of estimating spurious effects.

Limitations of our work
The aim of this paper was to describe the use of expert knowledge to improve genomic modelling, which we achieved through two recently introduced prior frameworks (Fuglstad et al., 2020), and we demonstrated their use in a simulated and a real case study of wheat breeding. There are multiple other possible uses of the proposed priors in genomic modelling and prediction. The simulated case study is small, with only 100 individuals in the advanced stages of a wheat breeding program. A small number of individuals and limited genetic variation at this stage made a good case study to test the importance of priors, and we show that using our approach can be beneficial beyond the standard genomic model. We also chose this stage for computational simplicity and speed because we evaluate the robustness of estimation over many replicates. Studies with more than 100 individuals may also be interesting, but are beyond the scope of this paper. Finally, we could have tested more prior options, in particular the shrinkage of the non-additive values towards the additive values, i.e., the PC-P0 versus the PC-PM prior. More research is needed in the future to see how expert knowledge can improve genetic modelling further.
Interesting areas for future research are also in other breeding domains with the recent rise in volumes of individual genotype and phenotype data, which provide power for estimating dominance and epistasis values (e.g., Alves et al., 2020;Joshi et al., 2020). The ability to estimate the non-additive values would be very beneficial in breeding programs that aim to exploit biotechnology (e.g., Gottardo et al., 2019). Finally, an exciting area for estimating non-additive individual values is in the area of personalized human medicine (de los Campos et al., 2010;Mackay and Moore, 2014;Sackton and Hartl, 2016;de los Campos et al., 2018;Begum, 2019).
The proposed priors are novel and require further computational work to facilitate widespread use. The penalized complexity priors are increasingly used in the R-INLA software (Rue et al., 2009), while hierarchical decomposition priors (Fuglstad et al., 2020) have been implemented with the general purpose Bayesian software Stan (Carpenter et al., 2017;Stan Development Team, 2019).
However, this implementation is technical and Stan is slow for genomic models. We have used it simply as a standard Bayesian engine and note that there is active development on speeding it up (Margossian et al., 2020). We are currently developing an R package that facilitates the prior specification and estimation using Stan and INLA. This will ease the use of the prior, and INLA will speed up the computations. The goal is to provide a simple graphical interface that can be used both by non-statisticians and statisticians. However, further work is needed to enable Bayesian treatment of large genomic models fitted to datasets with hundreds of thousands of individuals.

Conclusion
In conclusion, we have presented how to use the expert knowledge on relative magnitude of genetic variation and its additive and non-additive components in the context of a Bayesian approach with two novel prior frameworks. We believe that when modelling a phenomenon for which there exists a lot of knowledge, we should employ methods that are able to take advantage of this resource. We have demonstrated with a simulated and a real case study that such methods are important and helpful in the breeding context, and they might have potential also in other areas that use genomic modelling.