Understand posterior estimates • Pv3Rs

Summary of contents

In this vignette we document the output of compute_posterior() when the statistical model underpinning it is well specified and misspecified. Specifically, we document the output of compute_posterior() when the model is well specified but

data are missing,
data are uninformative,
data are limited to only one episode,
data are incomparable across episodes,
data are limited to only one marker.

When the model is well specified and there are data on many markers, we show how the maximum probability of recrudescence / reinfection depends on

per-episode genotype counts (i.e., multiplicities of infections, MOIs),
episode count,
the position of a recurrence in a sequence of episodes,

and thus comment on how the above dependencies impact the

interpretation of uncertainty.

Finally, we summarise results on the output of compute_posterior() when the model is misspecified because of

meiotic siblings,
half siblings,
parent child-like siblings,
genotyping errors and de novo mutations [To-do].

For the more detailed result, see [development files of the Pv3Rs source package] (https://github.com/aimeertaylor/Pv3Rs/tree/main/DevFiles/RelationshipStudy). We have not characterised model misspecification due to inbred parasites. Allele frequency estimates encode population-level relatedness locus-by-locus (Mehra et al. 2025). Allele frequency estimates are plugged into the statistical model underpinning Pv3Rs. Providing they are computed from a sample drawn from the parasite population from which trial participants also draw, there is no need to compensate further for elevated relatedness locus-by-locus on the population level. However, inter-locus dependence (linkage disequilibrium) is liable to generate overconfident posterior probabilities. Population structure (e.g., household effects) could lead to the misclassification of reinfection as relapse / recrudescence. In our view, recurrence classification in the presence of population structure is best understood using complementary population genetic and sensitivity analyses.

Missing data

When data are entirely missing, compute_posterior() returns the prior.

fs = list(m1 = c("A" = 0.5, "B" = 0.5)) # Allele frequencies
y = list(enrol = list(m1 = NA), recur = list(m1 = NA)) # Missing data
suppressMessages(compute_posterior(y, fs))$marg # Posterior
#>               C         L         I
#> recur 0.3333333 0.3333333 0.3333333

When data are missing but the user provides MOIs that are incompatible with recrudescence, compute_posterior() returns the prior re-weighted to the exclusion of recrudescence.

suppressMessages(compute_posterior(y, fs, MOIs = c(1,2)))$marg 
#>       C   L   I
#> recur 0 0.5 0.5

Uniformative data

When data are entirely uninformative because there is no genetic diversity, compute_posterior() returns the prior.

fs = list(m1 = c("A" = 1)) # Unit allele frequency: no genetic diversity
y = list(list(m1 = "A"), recur = list(m1 = "A")) # Data: not missing
suppressMessages(compute_posterior(y, fs))$marg # Posterior
#>               C         L         I
#> recur 0.3333333 0.3333333 0.3333333

Data on only one episode

Homoallelic data without user-specified MOIs: prior return

When only one episode has homoallelic data and the user does not specify an MOI > 1, compute_posterior() returns the prior.

fs = list(m1 = c("A" = 0.5, "B" = 0.5)) # Allele frequencies
y <- list(enrol = list(m1 = NA), recur1 = list(m1 = "A")) # No enrolment data
suppressMessages(compute_posterior(y, fs))$marg # Posterior
#>                C         L         I
#> recur1 0.3333333 0.3333333 0.3333333

Heteroallelic data: prior proximity

When only one episode has heteroallelic data, the MOI under the model necessarily exceeds one because the model assumes no false positivity. The posterior is close to the prior but not equal to it because the data are slightly informative: for relationship graphs with intra-episode siblings, heteroallelic data limit summation over identity-by-descent partitions to partitions with at least two cells for the episode with data. The lower bound on the cell count increases with the number of distinct alleles observed.

# Allele frequencies 
fs = list(m1 = c('A' = 0.25, 'B' = 0.25, 'C' = 0.25, 'D' = 0.25)) 

# MOI at enrolment increases with number of observed alleles:  
yMOI2 <- list(enrol = list(m1 = c('A','B')), recur = list(m1 = NA))
yMOI3 <- list(enrol = list(m1 = c('A','B','C')), recur = list(m1 = NA))
yMOI4 <- list(enrol = list(m1 = c('A','B','C','D')), recur = list(m1 = NA))
ys <- list(yMOI2, yMOI3, yMOI4)
do.call(rbind, lapply(ys, function(y) {
  suppressMessages(compute_posterior(y, fs))$marg
})) 
#>               C         L         I
#> recur 0.3292683 0.3414634 0.3292683
#> recur 0.3260870 0.3478261 0.3260870
#> recur 0.3248359 0.3503282 0.3248359

# MOI increases with external input:
MOIs <- list(c(2,1), c(2,2), c(3,2)) # user-specified MOIs
y <- list(enrol = list(m1 = c('A','B')), recur = list(m1 = NA)) # Data 
do.call(rbind, lapply(MOIs, function(x) {
  suppressMessages(compute_posterior(y, fs, MOIs = x))$marg
})) 
#>               C         L         I
#> recur 0.3292683 0.3414634 0.3292683
#> recur 0.3250000 0.3500000 0.3250000
#> recur 0.3334508 0.3330983 0.3334508

Rare homoallelic data with user-specified MOIs exceeding 1: prior departure

If the data are homoallelic, but the user specifies MOIs greater than one, and the observed allele is rare, the posterior departs from the prior because rare intra-episode allelic repeats are more probable given relationship graphs with intra-episode relatedness.

y <- list(enrol = list(m1 = 'A'), recur = list(m1 = NA)) # Homoallelic data
MOIs <- list(c(2,1), c(3,2), c(5,1)) # Different MOIs with first MOI > 1
fs = list(m1 = c("A" = 0.01, "B" = 0.99)) # Rare observed allele

#> NULL

The per-graph likelihood is only appreciable for relationship graphs where all distinct parasite genotypes in the first episode are siblings

Data are incomparable across episodes

When there are data on multiple episodes but no comparable data across episodes, compute_posterior() behaves similarly to when data are limited to one episode:

it returns the prior when data are homoallelic without user-specified MOIs > 1,
its output remains close to the prior when data are heteroallelic,
its output departs from the prior when data are homoallelic and rare with user-specified MOIs > 1.

# Allele frequencies
fs = list(m1 = c("A" = 0.01, "B" = 0.99), 
          m2 = c("A" = 0.01, "B" = 0.99)) 

# Data with an incomparable homoallelic call
y_hom <- list(enrol = list(m1 = "A", m2 = NA), 
              recur = list(m1 = NA, m2 = "A"))

# Data with an incomparable heteroallelic call
y_het <- list(enrol = list(m1 = c("A", "B"), m2 = NA), 
              recur = list(m1 = NA, m2 = c("A", "B")))


suppressMessages(compute_posterior(y_hom, fs))$marg # Prior return 
#>               C         L         I
#> recur 0.3333333 0.3333333 0.3333333
suppressMessages(compute_posterior(y_het, fs))$marg # Prior proximity
#>               C         L         I
#> recur 0.3370787 0.3595506 0.3033708
suppressMessages(compute_posterior(y_hom, fs, MOIs = c(2,2)))$marg # Prior departure
#>               C         L        I
#> recur 0.5142862 0.2183909 0.267323

Data on only one marker

When data on a single marker are comparable across episodes, the output of compute_posterior() depends on the observation type and the frequencies of the observed alleles. For example,

A rare match is informative: it quashes the posterior probability of reinfection.
A partial rare match is also informative, quashing the posterior probability of reinfection.
A match with a common allele is not very informative: all states are possible a posteriori.
A partial match with a common allele is not very informative: all states are possible a posteriori.
A mismatch is informative: it quashes the posterior probability of recrudescence.

The latter demonstrates the sensitivity of recrudescence inference to the assumption under the model that there are no genotyping errors: if a genotyping error generates a single mismatch, the posterior probability of recrudescence is quashed.

#> NULL

Data on only two markers

When data on a second marker are added and the second observation is consistent with the first the strength of inference generally increases (see arrows in or entering regions of posterior probability greater than 0.5); an exception being the addition of a common match to a common partial match, which is not very informative (central arrow). When a mismatch is added to a rare match (diamond), relapse becomes the most probable state. This demonstrates the requirement of multiple markers for relapse inference.

#> NULL

Data on many markers

Given non-zero prior probabilities, as the data increase with the number of markers genotyped, posterior probabilities for a given recurrence converge to either

relapse with posterior probability one when the data suggest the episode of interest is linked to previous episodes by regular sibling relationships
recrudescence with posterior probability less than one when the data suggest the episode of interest is linked to the previous episode by clonal relationships
reinfection with posterior probability less than one when the data suggest the episode or interest is not linked to previous episodes by regular sibling or clonal relationships

Recrudescence / reinfection probabilities necessarily converge to uncertain values because genetic data compatible with recrudescence / reinfection are also compatible with relapse.

Posterior bounds

Because we assume a priori that relationship graphs compatible with a given recurrence state sequence are uniformly distributed, and because relationship graphs compatible with sequences of recrudescence / reinfection are a subset of those compatible with sequences of relapse, bounds induced by the prior on non-marginal posterior probabilities can be computed a priori as a function of the prior on the recurrence state sequences and the prior on the relationship graphs; see vignette("understand-graph-prior", "Pv3Rs").

Because the probability mass function of a uniformly distributed discrete random variable depends on the size of its support, the prior on uniformly distributed graphs (and thus bounds on the posterior induced by the prior) depends on the size of the graph space. Graph space size depends on constitute graph size (all graphs within a space are the same size). Graphs have as many vertices as parasite genotypes within and across episodes. As such, graph size depends on episode counts and per-episode MOIs, which vary across trial participants.

Using rare matched and mismatched data on 100 markers, we demonstrate the dependency on per-episode MOIs of prior-induced posterior bounds for a single recurrence. We also demonstrate how maximum marginal probabilities of recrudescence / relapse depend on episode counts by adding recurrences with no recurrent data. However, these maxima are based on knowledge that there are no recurrent data on all but the first recurrence. In other words, they are not computable a priori.

marker_count <- 100 # Number of markers
ms <- paste0("m", 1:marker_count) # Marker names 
all_As <- sapply(ms, function(t) "A", simplify = F) # As for all markers
all_Bs <- sapply(ms, function(t) "B", simplify = F) # Bs for all markers
no_data <- sapply(ms, function(t) NA, simplify = F) # NAs for all markers
fs <- sapply(ms, function(m) c("A" = 0.01, "B" = 0.99), simplify = FALSE)

Increasing per-episode MOIs

MOIs <- list(c(1,1), c(2,1), c(2,2)) 
y_match <- list(enrol = all_As, recur = all_As)
y_mismatch <- list(enrol = all_As, recur = all_Bs)

#> NULL

The posterior probabilities of recrudescence increase with increasing graph size when the MOIs are balanced. The posterior probabilities of reinfection increase with increasing graph size.

Digression: counts of intra-episode siblings

To increase per-episode MOIs, we can do as above and increase user-specified MOIs without changing the data, or we can increase the allelic diversity of the data, s.t. MOIs modelled under Pv3Rs increase without user-specified MOIs. Either way, we get the same maximum probabilities:

all_ACs <- sapply(ms, function(t) c("A", "C"), simplify = F) # A&C for all 
all_BEs <- sapply(ms, function(t) c("B", "E"), simplify = F) # A&C for all 
all_ACDs <- sapply(ms, function(t) c("A", "C", "D"), simplify = F) # A&C for all 
fs <- sapply(ms, function(m) c("A" = 0.01, "B" = 0.01, "C" = 0.01, 
                               "D" = 0.01, "E" = 0.96), simplify = FALSE)

hom_match_data <- list(enrol = all_As, hom_match_data = all_As)
het_match_data <- list(enrol = all_ACDs, het_match_data = all_ACs)
hom_mismatch_data <- list(enrol = all_As, hom_mismatch_data = all_Bs)
het_mismatch_data <- list(enrol = all_ACDs, het_mismatch_data = all_BEs)

rbind(suppressMessages(compute_posterior(hom_match_data, fs, MOIs = c(3,2))$marg), 
      suppressMessages(compute_posterior(het_match_data, fs)$marg), 
      suppressMessages(compute_posterior(hom_mismatch_data, fs, MOIs = c(3,2))$marg),
      suppressMessages(compute_posterior(het_mismatch_data, fs)$marg))
#>                           C          L             I
#> hom_match_data    0.8514851 0.14851485 9.084162e-231
#> het_match_data    0.8514851 0.14851485  0.000000e+00
#> hom_mismatch_data 0.0000000 0.05494505  9.450549e-01
#> het_mismatch_data 0.0000000 0.05494505  9.450549e-01

But the graph likelihoods change:

For the homoallelic data with elevated user-specified MOIs, all intra-episode parasites in maximum likelihood graphs are siblings. For the heteroallelic data, all intra-episode parasites in maximum likelihood graphs are strangers. Both have the same maximum probabilities of recrudescence.

One upshot of this is that maximum probabilities could be increased by increasing counts of intra-episode siblings. Arguably, maximum probabilities should not increase with more than two intra-episode siblings because siblings are not independent. In practice, when MOIs under the Pv3Rs model are based on realistic MOI estimates, groups of three or more siblings are modelled as groups of two; see vignette("intra-episode-siblings", "Pv3Rs").

END OF DIGRESSION

Increasing episode counts

Recurrences can be added without adding data

ys_match <- list("1_recurrence" = list(enrol = all_As, 
                                       recur1 = all_As),
                 "2_recurrence" = list(enrol = all_As, 
                                       recur1 = all_As,
                                       recur2 = no_data),
                 "3_recurrence" = list(enrol = all_As, 
                                       recur1 = all_As,
                                       recur2 = no_data,
                                       recur3 = no_data))

ys_mismatch <- list("1_recurrence" = list(enrol = all_As, 
                                          recur1 = all_Bs),
                    "2_recurrence" = list(enrol = all_As, 
                                          recur1 = all_Bs,
                                          recur2 = no_data),
                    "3_recurrence" = list(enrol = all_As, 
                                          recur1 = all_Bs,
                                          recur2 = no_data,
                                          recur3 = no_data))

The marginal probability that the first recurrence is a recrudescence / reinfection depends on the total number of recurrences:

#> NULL

An example of how recrudescence probabilities converge to their maxima over 1, 2, 5 and 100 markers:

#> NULL

Note that the above maxima are not bounds imposed by the prior: they are based on knowledge that there are no recurrent data on all but the first recurrence; see vignette("understand-graph-prior", "Pv3Rs").

Position in a sequence

Because the distribution over relationship graphs is not invariant to different orderings of states in a sequence (more graphs are compatible with a reinfection at the beginning versus the end of a sequence of three episodes, for example), the position of a recurrence in a sequence has a bearing on posterior estimates.

This effect is negligible when all episodes have the same data on many markers:

y <- list(enrol = all_As, 
          recur1 = all_As, 
          recur2 = all_As, 
          recur3 = all_As)
suppressMessages(compute_posterior(y, fs))$marg
#>                C         L            I
#> recur1 0.7720588 0.2279412 5.585675e-23
#> recur2 0.7720588 0.2279412 5.316220e-23
#> recur3 0.7720588 0.2279412 5.044004e-23

Instead, consider sequences of episodes with observations (Os) and episodes with no data (Ns):

ys_match <- list("NOO" = list(enrol = no_data, 
                              recur1 = all_As,
                              recur2 = all_As),
                 "ONO" = list(enrol = all_As, 
                              recur1 = no_data,
                              recur2 = all_As),
                 "OON" = list(enrol = all_As, 
                              recur1 = all_As,
                              recur2 = no_data))

ys_mismatch <- list("NOO" = list(enrol = no_data, 
                                 recur1 = all_As,
                                 recur2 = all_Bs),
                    "ONO" = list(enrol = all_As, 
                                 recur1 = no_data,
                                 recur2 = all_Bs), 
                    "OON" = list(enrol = all_As, 
                                 recur1 = all_Bs,
                                 recur2 = no_data))

#> NULL

Highly informed recurrences

The first recurrence in the sequence OON (upward purple 1 triangle) has a slightly lower probability of recrudescence than second recurrence in the sequence NOO (upward green 2 triangle) despite both recurrences having the same rare observations that match the directly preceding episode.

The first recurrence in the sequence OON (downward purple 1 triangle) has a slightly higher probability of reinfection than the second recurrence in the sequence NOO (downward green 2 triangle) despite both recurrences having the same observations that mismatch the directly preceding episode.

Weakly and uninformed recurrences

Unsurprisingly, the posterior of the first recurrence in the sequence NOO is close to the prior (it has no preceding data), likewise for the second recurrence in the sequence of OON (it has no data). More surprisingly, the posterior of the first recurrence of the sequence ONO is not close to the prior, despite having no data. On closer inspection, this is not so surprising for reasons explained below.

For match data (upward triangles), strong evidence for a clonal edge between episodes one and three in ONO is incompatible with all recurrence sequences ending with reinfection and reinfection followed by recrudescence, informing strongly the second recurrence and weakly the first recurrence. By comparison, strong evidence for a clonal edge between episodes two and three in NOO is incompatible with all sequences of recurrences ending with reinfection, informing the second recurrence only; likewise, strong evidence for a clonal edge between episodes one and two in OON is incompatible with all sequences of recurrences starting with reinfection, informing the first recurrence only.

epsilon <- .Machine$double.eps # Very small probability
names(which(suppressMessages(compute_posterior(y = ys_match[["ONO"]], fs))$joint < epsilon))
#> [1] "IC" "CI" "LI" "II"
names(which(suppressMessages(compute_posterior(y = ys_match[["NOO"]], fs))$joint < epsilon))
#> [1] "CI" "LI" "II"
names(which(suppressMessages(compute_posterior(y = ys_match[["OON"]], fs))$joint < epsilon))
#> [1] "IC" "IL" "II"

On close inspection the upward orange 1 triangle makes intuitive sense: if you know a first recurrence is followed by a second MOI = 1 recurrence that is a clone of the MOI = 1 enrolment episode, it is very unlikely the second recurrence is a clone drawn from a new mosquito as it would be if it were a recrudescence of a reinfection. As such, the data on the enrolment episode and the second recurrence tell us something about the first recurrence, and thus the posterior estimate of the first recurrence deviates from the prior even though the first recurrence has no data.

The explanation for mismatch data (downward triangles) in the sequence ONO is similar to that for match data: strong evidence for a stranger edge between episodes one and three is incompatible with a double recrudescence, so the posterior of the first recurrence (downward orange 1 triangle) deviates from the prior despite the first recurrence having no data.

epsilon <- .Machine$double.eps # Very small probability
names(which(suppressMessages(compute_posterior(y = ys_mismatch[["ONO"]], fs))$joint < epsilon))
#> [1] "CC"
names(which(suppressMessages(compute_posterior(y = ys_mismatch[["NOO"]], fs))$joint < epsilon))
#> [1] "CC" "LC" "IC"
names(which(suppressMessages(compute_posterior(y = ys_mismatch[["OON"]], fs))$joint < epsilon))
#> [1] "CC" "CL" "CI"

Interpreting uncertainty

Uncertain posterior estimates can be uncertain for two reasons that are not mutually exclusive:

more data are needed
the states are not fully identifiable

It is also important to bear in mind the fact that, under the Pv3Rs model, trial participants with different epside counts and per-episode MOIs have different maximum probabilities of recrudescence / reinfection. For example, if we use a common threshold of 0.8 to classify probable reinfection assuming recurrence states are equally likely a priori, we can discount a priori all trail participants with a single monoclonal recurrence following a monoclonal enrolment episode because their posterior reinfection probabilities will never exceed 0.75, even if their data are highly informative of reinfection:

y <- list(enrol = all_As, recur = all_Bs)
fs <- sapply(ms, FUN = function(m) c("A" = 0.5, "B" = 0.5), simplify = FALSE)

# Using the default uniform prior on recurrence states
suppressMessages(compute_posterior(y, fs))$marg
#>       C    L    I
#> recur 0 0.25 0.75

# Using a non-uniform prior on recurrence states
prior <- as.matrix(data.frame("C" = 0.25, "L" = 0.25, "I" = 0.5))
suppressMessages(compute_posterior(y, fs, prior))$marg
#>       C         L         I
#> recur 0 0.1428571 0.8571429

And that, for a given trial participant, the probability of recrudescence / reinfection depends on its position within a sequence, especially if one or more episodes has no data.

Meiotic siblings

Methods

We simulated data and generated results for an initial infection containing two or three meiotic siblings and a recurrent

stranger
clone
regular sibling
meiotic sibling

Posterior probabilities are computed assuming recurrence states are equally likely a priori.

Results

When the initial infection contains two meiotic siblings compute_posterior() is well behaved with maximum likelihood on the true relationship graph (not shown). Posterior probabilities converge to

probable reinfection when the recurrent parasite is a stranger,
probable recrudescence when the recurrent parasite is a clone,
certain relapse when the recurrent parasite is a regular sibling,
certain relapse when the recurrent parasite is a meiotic sibling.

When the initial infection contains three meiotic siblings and the recurrent parasite is either a stranger or a clone, posterior probabilities converge correctly to probable reinfection and probable recrudescence, respectively, but with maximal values given by graphs over relationships between two, not three, parasites in the initial infection.

Relationship graphs (not shown) are wrong for two reasons:

all the relationship graphs have two, not three, parasites in the initial infection because there are at most two alleles per marker. Although technically incorrect, this behaviour is arguably desirable (see digression above).
the highest likelihood relationship graphs have stranger parasites in the initial episode because prevalence data from three or four meiotic siblings are identical to bulk data from the parents who are strangers.

We can force the graphs to have three distinct parasites in the initial episode by specifying external MOIs. Doing so recovers maximum likelihood on true relationship graphs. However, the correct MOIs are unknowable in practice: a collection of siblings from two parents can only ever be as diverse as the two parents.

When the initial infection contains three meiotic siblings and the recurrent parasite is a regular or meiotic sibling, posterior probabilities converge to probable recrudescence with maximum likelihood on graphs with a clonal edge to the sibling relapse, and either two stranger parasites in the initial episode when no external MOIs are specified, or three sibling parasites in the initial episode when the correct MOIs (unknowable in practice) are provided externally.

Half siblings

Methods

We simulated data for three half siblings: two in an initial episode, the third in a recurrence:

child of parents 1 and 2 in the initial episode
child of parents 1 and 3 in the initial episode
child of parents 2 and 3 in the recurrence

We explored two scenarios: one where all parental parasites draw from the same allele distribution. Another with admixture where parent 1 draws alleles disproportionally to parents 2 and 3. The admixture scenario is improbable. We explore it because it is a worse case scenario: it is contrived to maximally hamper relapse classification.

Results

In general, when parental parasites draw from the same allele distribution, the frequency of certain relapse increases erratically with the number of markers genotyped.

For this particular case where there are three equifrequent alleles per marker, we can show theoretically that the system behaves erratically because a small perturbation to the ratio of observations (all alleles match across the three half siblings, all alleles are different, intra-episode alleles match, inter-episode alleles match) can lead to a large deviation in the odds of relapse versus reinfection; see vignette("understand-half-sibs", "Pv3Rs"). The purple, red, and green trajectories below have higher than the expected 0.5 intra-to-inter match ratios; moreover, the purple trajectory’s intra-to-inter match ratio consistently exceeds $0.5\times \text{log}_2(\dfrac{2}{5})$ , a condition found theoretically to concentrate posterior probability on reinfection under certain conditions; again, see vignette("understand-half-sibs", "Pv3Rs").

When intra-episode parasites systematically share rare alleles (admixture AND imbalanced allele frequencies), the frequency of probable reinfection increases erratically with the number of markers genotyped.

In both scenarios, the likelihood of the true graph with siblings within and across episodes is quashed as soon as distinct alleles from all three parents are observed at a given marker. In general,

when all parasites draw from the same allele distribution, the maximum likelihood graphs are the two with one inter-episode sibling edge.
when intra-episode parasites systematically share rare alleles, the maximum likelihood graph is the one with one intra-episode sibling edge.

Parent-child like siblings

Methods

We simulated data for three parent-child like siblings:

child of selfed parent 1 in the initial episode
child of parents 1 and 2 in the initial episode
child of selfed parent 2 in the recurrence

Aside: the alternative with both children of selfed parents in the initial episode is equivalent to the meiotic case above with three mieotic siblings in the initial episode because prevalence data from three meiotic siblings is equivalent to prevalence data from two stranger parents and leads to probable recrudescence rather than certain relapse with the number of markers genotyped.

As for half-siblings, we explored two scenarios: one where all parental parasites draw from the same allele distribution. Another with admixture where parent 1 draws alleles disproportionally to parents 2 and 3. The admixture scenario is improbable. We explore it because it is a worse case scenario: it is contrived to maximally hamper relapse classification.

Results

When parental parasites draw from the same allele distribution, the frequency of certain relapse increases erratically with the number of markers genotyped.

In general, when all parasites draw from the same allele distribution, the maximum likelihood graph is the true graph with siblings within and across episodes. Meanwhile, when intra-episode parasites systematically share rare alleles, the maximum likelihood graph is the one with one intra-episode sibling edge.