On the equivalence of conglomerability and disintegrability for unbounded random variables

We extend a result of Dubins (Ann Probab 3:89–99, 1975) from bounded to unbounded random variables. Dubins showed that a finitely additive expectation over the collection of bounded random variables can be written as an integral of conditional expectations (disintegrability) if and only if the marginal expectation is always within the smallest closed interval containing the conditional expectations (conglomerability). We give a sufficient condition to extend this result to collections of random variables that have finite expected value and whose conditional expectations are finite and have finite expected value.

that, for a specific pair X and X ′ , the sets of all x and x ′ values that lead to this paradox form sets of probability 0. However, Kadane et al. (1986) illustrates how one can make the paradox occur with positive probability by considering more than countably many random variables at a time.
In contrast to the countably additive theory, finitely additive conditional probabilities (see Krauss 1968; Dubins 1975) can be fully defined given each non-empty subset of Ω, while satisfying the following generalization of the product rule: For all A, B, and C such that B ∩ C ≠ ∅, P(A ∩ C|B) = P(C|B)P(A|B ∩ C).
In addition, for every subset B of Ω and every finite partition A_1, ..., A_n of Ω, P(B) is equal to the finitely additive expectation of the random variable that takes the value P(B|A_i) for each ω ∈ A_i and all i = 1, ..., n. However, this property (called disintegrability and defined precisely in Definition 9) does not extend to all infinite partitions. Here is an elementary illustration, due to Dubins (1975) and discussed further by Kadane et al. (1996).

Define the partition
for all i . So if X(j, i) = P(B|A i ) for all (j, i) ∈ Ω, the finitely-additive expectation of X is 0, which differs from P(B) = 1/2.
The concept of disintegrability is relevant to understanding some otherwise anomalous features of Bayesian statistical inference that arise when using so-called improper priors. These are instances of the so-called marginalization paradoxes of Dawid et al. (1973). As Kadane et al. (1996, [Section 5]) explains, an improper prior, e.g. Lebesgue measure over the whole real line, corresponds to a merely finitely additive prior probability on the real line. Each unit interval has equal probability, i.e. probability 0. Even when the formal posterior computed from the improper prior turns out to be countably additive, the joint (finitely additive) probability may fail to be disintegrable in the partition determined by the data.
In Example 1, we also see that the conditional probabilities P(B|A_i) have the property that there exists ε > 0 such that

$$P(B) > P(B|A_i) + \epsilon \text{ for every } i. \tag{3}$$

De Finetti (1930) says that such a probability fails conglomerability in the partition. (See Definition 9 for a more precise definition.) Schervish et al. (1984) show that each merely finitely additive probability fails conglomerability in some denumerable partition, which is not possible for countably additive probabilities. As we saw above, for nondenumerable partitions, the countably additive theory imposes disintegrability on the definition of conditional probability. Consequently the theory is not able to define P(A|B) for arbitrary sets A and B, as we saw in the Borel Paradox.

Dubins (1975) established an important equivalence between conglomerability and disintegrability of finitely additive expectations, which de Finetti (1974) calls (coherent) previsions (see Definition 2 below). Dubins showed that, with respect to the collection of bounded random variables, when countably additive probability and conditional probability are replaced by the more general concepts of (finitely additive) expectations and conditional expectations, a finitely additive expectation function is disintegrable in a partition if and only if its conditional expectations are conglomerable in that partition. It follows easily from (3) that, when a finitely additive expectation fails conglomerability in a partition π, then it fails to be disintegrable in π. The converse inference is the heart of Dubins' result.
In this paper we extend Dubins' result to particular collections of unbounded random variables. In Sect. 2 we review de Finetti's concept of coherent previsions and conditional previsions, and describe a theory of finitely additive integrals/expectations for unbounded random variables. In Sect. 3 we review Dubins' result and discuss how to extend conglomerability to unbounded random variables. In Sect. 4 we give conditions under which the equivalence between disintegrability and conglomerability extends to unbounded random variables. In particular, we restrict attention to random variables for which previsions and conditional previsions are finite. Example 5 illustrates how the equivalence can fail if our conditions are not met. We offer a concluding discussion in Sect. 6.

Background
Let Ω be a fixed non-empty set, and define a random variable to be a real-valued function X defined on Ω. We require that X (ω) be finite for all ω ∈ Ω, but not necessarily bounded. All of the collections of random variables discussed in the definitions and results of this paper are allowed to include unbounded random variables unless explicitly stated otherwise.
The concept of coherent prevision on a collection of random variables was introduced by de Finetti (1974).
Definition 2 Let U be a collection of random variables defined on Ω. A function P : U → R is called a prevision. We say that P is incoherent if there exist a finite subset {X_1, ..., X_n} of U, scalars α_1, ..., α_n, and ε > 0 such that, for all ω ∈ Ω,

$$\sum_{i=1}^n \alpha_i [X_i(\omega) - P(X_i)] < -\epsilon. \tag{4}$$

If P is not incoherent, we say that P is coherent.
An equivalent, and sometimes more convenient, way to define coherent prevision is to say that P is coherent if, for every finite subset {X_1, ..., X_n} of U and all scalars α_1, ..., α_n,

$$\sup_{\omega \in \Omega} \sum_{i=1}^n \alpha_i [X_i(\omega) - P(X_i)] \ge 0. \tag{5}$$

It is not difficult to see that this is equivalent to Definition 2. As defined above, a coherent prevision P must assume only finite values, otherwise (5) would be impossible and/or undefined. There are ways to generalize the concept of coherent prevision to allow infinite values. (See Berti et al. 2001; Crisma and Gigante 2001; Crisma et al. 1997; Schervish et al. 2014b for some of these generalizations.) Such generalizations play no role in the results of this paper. Our theorems apply only to sets of random variables for which all previsions (and conditional previsions) are finite.
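As a small illustration of the coherence condition, the following Python sketch evaluates the supremum of a gamble of the form sum_i α_i [X_i(ω) − P(X_i)] over a finite space. The two-point space, the indicator variables, and the function name `sup_gamble` are hypothetical choices made only for this illustration; a negative supremum exhibits a sure loss, while a nonnegative value for one choice of scalars does not by itself establish coherence.

```python
from fractions import Fraction

def sup_gamble(outcomes, variables, previsions, alphas):
    """Return the sup over outcomes of sum_i alpha_i * (X_i(w) - P(X_i))."""
    return max(
        sum(a * (X(w) - p) for X, p, a in zip(variables, previsions, alphas))
        for w in outcomes
    )

# Hypothetical two-point space with the indicators of its two atoms.
outcomes = ["a", "b"]
A = lambda w: 1 if w == "a" else 0
B = lambda w: 1 if w == "b" else 0

# Previsions that sum to more than 1: the gamble with alphas (1, 1) is a sure loss.
bad = sup_gamble(outcomes, [A, B], [Fraction(3, 5), Fraction(3, 5)], [1, 1])

# Previsions that sum to 1: the same gamble is no longer a sure loss.
good = sup_gamble(outcomes, [A, B], [Fraction(1, 2), Fraction(1, 2)], [1, 1])

print(bad, good)
```

Exact rational arithmetic is used so that the comparison with 0 is not affected by rounding.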
There are many coherent previsions on the set $\mathcal{X}$ of bounded random variables, and each of them is a finitely additive probability when restricted to the collection of indicator functions of subsets of Ω. That is, using the standard notation of letting the name of an event stand for its indicator function, P(Ω) = 1, P(A ∪ B) = P(A) + P(B) when A ∩ B = ∅, and P(A) ≥ 0 for all A ⊆ Ω. By finite additivity and linearity of coherent prevision, if $X = \sum_{i=1}^n \alpha_i A_i$ is a simple function (one that assumes only finitely many distinct values), the prevision of X equals $\sum_{i=1}^n \alpha_i P(A_i)$. This resembles the formula for the integral of a simple function in the usual measure theoretic derivation. To carry the resemblance further, the value of P for every bounded X is uniquely determined from the finitely additive probability by means of the fact that

$$P(X) = \sup_{\text{simple } Z \le X} P(Z) = \inf_{\text{simple } Z \ge X} P(Z). \tag{6}$$

The first equation in (6) is the same way that the Lebesgue integral of a nonnegative function X is defined in terms of the integrals of simple functions. Indeed P can be expressed as a finitely additive integral. Generalizing from the definition of Daniell integral in Royden (1968, Chapter 13), we can call a coherent prevision a finitely additive Daniell integral. Definition 3 applies to both bounded and unbounded functions. It also applies to functions defined on arbitrary spaces and to finitely additive set functions that are not probabilities.
Definition 3 Let $\mathcal{L}$ be a linear space of functions defined on a common space Γ such that $\mathcal{L}$ contains all constants. Let L be a linear functional defined on $\mathcal{L}$ that satisfies L(X) ≥ 0 whenever X(ω) ≥ 0 for all ω. Then L is called a nonnegative linear functional or a finitely additive Daniell integral over $\mathcal{L}$. We can write $L(X) = \int_\Omega X(\omega)\,L(d\omega)$. If L is a nonnegative linear functional such that L(1) = 1, we call L a finitely additive expectation.
The finitely additive Daniell integral is equivalent to the integral as developed by Dunford and Schwartz (1958, [Chapter III]) for bounded random variables while remaining equivalent to the notion of coherent prevision (finitely additive expectation) for unbounded random variables. (See Proposition 1 below.) For unbounded random variables, not all real-valued coherent previsions admit an integral representation in the sense of Dunford and Schwartz (1958). On the other hand, all real-valued coherent previsions admit a representation as a finitely additive Daniell integral. For discussion of the problem, see Berti et al. (2001), Berti and Rigo (2000), Berti and Rigo (2002), Schervish et al. (2008), Seidenfeld et al. (2009).
Here is an example of a finitely additive expectation that is not countably additive. This example will be used later to illustrate our main result.
Example 2 Let P be a countably additive probability on a measurable space (Ω, B) such that Ω has infinitely many elements, and let $\mathcal{X}$ be the collection of all bounded measurable random variables. It is not difficult to show that the only coherent prevision for each bounded random variable X is P(X) equal to its countably additive expected value. Also, let Y be a random variable that is unbounded above and bounded below. Suppose that Y has finite expected value y by the usual countably additive definition. We will show that P(Y) = p is coherent with the previsions of all of the bounded random variables if and only if p ≥ y. For the "only if" direction, note that $y = \sup_{\text{bounded } X \le Y} P(X)$, and coherence requires that P(X) ≤ P(Y) for every bounded X ≤ Y. It follows that y ≤ P(Y) is necessary for coherence. For the "if" direction, suppose that p ≥ y. We prove that P extends to a nonnegative linear functional L on the span $\mathcal{L}$ of the bounded random variables and Y with L(1) = 1. Every element of $\mathcal{L}$ has a unique representation as αY + X for some real α and some bounded random variable X. Define L(αY + X) = αp + P(X), which satisfies L(1) = 1 and is clearly linear and well-defined. To see that L is nonnegative, note that αY + X ≥ 0 only if α ≥ 0, in which case L(αY + X) ≥ αy + L(X), which equals the countably additive integral of αY + X, which in turn is nonnegative.
To see that P is not countably additive when p > y, let $\{X_n\}_{n=1}^\infty$ be a countable sequence of bounded random variables that increase monotonically to Y, such as $X_n = \min\{Y, n\}$. Countable additivity would require

$$P(Y - X_1) = \sum_{n=1}^\infty P(X_{n+1} - X_n).$$

But the left-hand side is p − P(X_1) while the right-hand side is y − P(X_1), which are not equal.
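The failure of countable additivity can be made concrete with a numerical sketch. The code below assumes, purely for illustration, that Y has the geometric distribution P(Y = y) = 2^{-y} used later in Example 4 (so y = 2), that the truncations are X_n = min(Y, n), and that p = 4; none of these choices is imposed by Example 2 itself. It shows that the previsions of the bounded truncations converge to y, so they stay a fixed distance away from any p > y.

```python
from fractions import Fraction

def prevision_of_truncation(n):
    """Countably additive expectation of X_n = min(Y, n) when P(Y = y) = 2**-y."""
    head = sum(Fraction(y, 2 ** y) for y in range(1, n + 1))  # outcomes with Y <= n
    tail = n * Fraction(1, 2 ** n)                            # P(Y > n) = 2**-n
    return head + tail

y_bar = 2          # countably additive mean of Y
p = Fraction(4)    # assumed prevision of Y, any value p >= y_bar is coherent

values = [prevision_of_truncation(n) for n in range(1, 8)]
gaps_to_mean = [y_bar - v for v in values]       # shrink to 0
gaps_to_prevision = [p - v for v in values]      # stay above p - y_bar = 2

print([str(g) for g in gaps_to_mean])
```

Under these assumptions the gap to the countably additive mean halves at each step, while the gap to p never falls below p − y.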
Although the space $\mathcal{L}$ in Definition 3 may contain unbounded functions, L must assume only finite values since it is a linear functional. De Finetti (1974, Section 3.9) proves the following result, which makes clear the connection between finitely additive expectation and coherent prevision.

Proposition 1 P is a finite coherent prevision on a set U of random variables if and only if there exists a finitely additive expectation L on a linear space $\mathcal{L}$ that contains U and all constants such that L(X) = P(X) for every X ∈ U.
Coherent conditional prevision can be defined in a manner similar to coherent prevision.

Definition 4 Let Q be a collection of pairs (X, h) where X is a random variable and h is a nonempty subset of Ω. A function P(·|·) : Q → R is called a conditional prevision. We say that P is coherent if, for every finite subset {(X_1, h_1), ..., (X_n, h_n)} of Q and all scalars α_1, ..., α_n,

$$\sup_{\omega \in \Omega} \sum_{i=1}^n \alpha_i h_i(\omega) [X_i(\omega) - P(X_i|h_i)] \ge 0.$$

If P is not coherent, we say that P is incoherent.
By comparing Definitions 2 and 4, it is easy to see that P(X ) must be the same as P(X |Ω) for every random variable X . This makes coherent prevision the special case of coherent conditional prevision in which, for every pair (X, h) ∈ Q, we have h = Ω. Hence, whenever we refer to a prevision P(X ) we mean that (X, Ω) ∈ Q and P(X ) = P(X |Ω).
When P(h) = 0, coherence (as defined in Definition 4) is insufficient to ensure that P(X|h) has even the most basic intuitive properties. For example, if P(h) = 0 then P(h|h) ≠ 1 is coherent. To ensure that conditional probabilities behave as much like probabilities as possible, de Finetti (1974, Appendix 16), Regazzini (1987), and Crisma and Gigante (2001) introduce an additional assumption. These authors choose a stronger definition of coherent conditional prevision, which we call DRCG-coherence in Definition 5.

Definition 5 Let Q be a collection of pairs (X, h) where X is a random variable and h is a nonempty subset of Ω. A conditional prevision P defined on Q is called DRCG-coherent if, for every finite subset {(X_1, h_1), ..., (X_n, h_n)} of Q and all scalars α_1, ..., α_n,

$$\sup_{\omega \in H} \sum_{i=1}^n \alpha_i h_i(\omega) [X_i(\omega) - P(X_i|h_i)] \ge 0,$$

where $H = \bigcup_{i=1}^n h_i$.

On the other hand, Dubins (1975) makes a weaker assumption that we generalize here for use with unbounded random variables.
Definition 6 Let π be a partition of Ω, and let P be a finite coherent conditional prevision defined on a set Q. We say that P contains a π -strategy if, for each h ∈ π : 1. L h = {X : (X, h) ∈ Q} is a nonempty linear space that contains h and all constants, 2. P(·|h) is a finitely additive expectation on L h , and 3. P(X |h) = c for each random variable X that equals the constant c on the whole set h.
Instead of the third condition in Definition 6, Dubins (1975) assumes P(h|h) = 1, which is equivalent to the third condition when all random variables are bounded. Theorem 1, our main result, assumes that P contains a π-strategy. It is straightforward to show that, for every partition π, every finite DRCG-coherent prevision either contains a π-strategy or can be extended so as to contain a π-strategy. Rather than strengthen the definition of coherence, we prefer to add assumptions to theorems as needed. In Schervish et al. (2014b, Example 2) we illustrate our reason for this preference.

Conglomerability and disintegrability
We turn now to precise definitions of conglomerability and disintegrability. Let π be a partition of Ω. That is, π is a collection {h : h ∈ π } of mutually disjoint subsets of Ω such that their union is Ω. The conditional prevision of each random variable X given each element h of π is denoted P(X |h). In order to make sense out of the loose phrase "the integral of conditional expectations," we need to be precise about what it means to integrate a conditional prevision. In this section, all integrals are intended in the sense of Definition 3.

The integral of a conditional expectation
Nonnegative linear functionals behave in many ways like the countably additive Lebesgue integral. One property that they share is the following transformation property that we use.

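Lemma 1 Let $\mathcal{L}$ be a linear space of real-valued functions on Ω that includes all constants, let Γ be a set, and let γ : Ω → Γ be a function. Let L be a nonnegative linear functional defined on $\mathcal{L}$. Let $\mathcal{V}$ be a linear space of real-valued functions on Γ that includes all constants and such that, for every $f \in \mathcal{V}$, $f(\gamma) \in \mathcal{L}$. For $f \in \mathcal{V}$, define $L_\gamma(f) = L[f(\gamma)]$. Then $L_\gamma$ is a nonnegative linear functional on $\mathcal{V}$, and

$$\int_\Gamma f(t)\,L_\gamma(dt) = \int_\Omega f(\gamma(\omega))\,L(d\omega).$$

Definition 7 We call $L_\gamma$ in Lemma 1 the integral induced from L by γ.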
Our primary use of Lemma 1 is to give meaning to the concept of integrating over a partition. In particular, we will give a precise meaning to "integral of conditional expectations." To that end, let π be a partition and let X be a random variable such that P(X|h) is defined and finite for all h ∈ π. Define

$$X_\pi(h) = P(X|h), \text{ for all } h \in \pi, \tag{8}$$

$$\gamma(\omega) = \text{that unique } h \in \pi \text{ such that } \omega \in h, \tag{9}$$

$$P(X|\pi)(\omega) = X_\pi(\gamma(\omega)) = P(X|h), \text{ for that unique } h \in \pi \text{ such that } \omega \in h, \tag{10}$$

so that X_π is a real-valued function on π, P(X|π) is a real-valued function on Ω, and P(X|π) = X_π(γ). Let $\mathcal{L}$ be a linear space of random variables that contains all constants, the domain of P, and P(X|π). Assuming that it is possible to extend P to $\mathcal{L}$ in such a way that all previsions are finite, we can apply Lemma 1 with Γ = π, $\mathcal{V}$ equal to the linear span of the constants and the random variable X_π, L equal to the extension of P, and γ as defined in (9). Let P_π denote the integral induced from P by γ, denoted $L_\gamma$ in Lemma 1. With the notation just introduced, Lemma 1 implies that

$$P[P(X|\pi)] = \int_\Omega P(X|\pi)(\omega)\,P(d\omega) = \int_\pi X_\pi(h)\,P_\pi(dh) = P_\pi(X_\pi), \tag{11}$$

if any of the four terms in (11) is finite. In summary, we have the following:

Definition 8 Let π be a partition of Ω, and let X be a random variable such that P(X|h) is defined and finite for all h ∈ π and P[P(X|π)] is defined and finite. Then (11) is called the integral of the conditional expectations given π.

Precise definitions
Definition 9 Let W be a collection of random variables. Let π be a partition, and let P be a finite coherent conditional prevision on a set Q that contains both W × π and W × {Ω}. We say that P is conglomerable in π with respect to W if, for each X ∈ W,

$$\inf_{h \in \pi} P(X|h) \le P(X) \le \sup_{h \in \pi} P(X|h). \tag{12}$$

We say that P is disintegrable in π with respect to W if, for each X ∈ W,

$$P(X) = \int_\pi P(X|h)\,P_\pi(dh). \tag{13}$$

In view of (11), there is an alternative way to express that P is disintegrable in a partition.
Proposition 2 P is disintegrable in π with respect to W if and only if, for each X ∈ W, P(X ) = P[P(X |π)] = P π (X π ).
When the conclusion of Proposition 2 holds, Schervish et al. (2014a) says that P satisfies the Law of Total Previsions in π .
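For a finite space with every cell of positive probability, the Law of Total Previsions can be checked directly. The following sketch uses a hypothetical six-point space, a hypothetical three-cell partition, and the random variable X(ω) = ω²; all of these are assumptions made only for illustration. It builds the random variable P(X|π), integrates it, and compares the result with P(X), and it also checks that P(X) lies between the smallest and largest conditional previsions.

```python
from fractions import Fraction

# Hypothetical finite space: outcomes 0..5 with the probabilities below.
prob = {0: Fraction(1, 12), 1: Fraction(1, 4), 2: Fraction(1, 6),
        3: Fraction(1, 6), 4: Fraction(1, 4), 5: Fraction(1, 12)}
partition = [{0, 1}, {2, 3, 4}, {5}]
X = lambda w: w * w

def prevision(f, event=None):
    """P(f | event) under prob; event=None conditions on the whole space."""
    event = set(prob) if event is None else event
    mass = sum(prob[w] for w in event)
    return sum(f(w) * prob[w] for w in event) / mass

def given_partition(f):
    """Return the random variable w -> P(f | h), with h the cell containing w."""
    cell_value = {w: prevision(f, h) for h in partition for w in h}
    return lambda w: cell_value[w]

P_X_given_pi = given_partition(X)                  # plays the role of P(X | pi)
marginal = prevision(X)                            # P(X)
integral_of_conditionals = prevision(P_X_given_pi) # P[P(X | pi)]
cell_values = [prevision(X, h) for h in partition] # P(X | h) for each cell h

print(marginal, integral_of_conditionals)
```

In this countably additive, finite setting both properties hold automatically; the interest of the paper lies in the infinite partitions where they can fail.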
Readers of Dubins (1975) will note that the definition of conglomerable in Definition 9 looks different from the corresponding definition given there. Specifically, the definition in Dubins (1975) is that P is conglomerable in π with respect to the collection W if, for all X ∈ W,

$$P(X|h) \ge 0 \text{ for all } h \in \pi \text{ implies } P(X) \ge 0. \tag{14}$$
Definition 9 is a straightforward generalization of the definition that de Finetti (1974, p. 143) gives for indicators of events. Definition 9 and (14) are equivalent when W is the collection of all bounded random variables. The proof relies on the fact that this collection is a linear space and contains all constants. The two definitions are not necessarily equivalent for every collection that is not a linear space and/or does not contain all constants.
Example 3 Consider the same situation as Example 1. Let W be the collection of all nonnegative bounded random variables. Hence, W is not a linear space. Because each X ∈ W is nonnegative, it follows that P(X|A_i) ≥ 0 for all i and P(X) ≥ 0. Hence, (14) holds. On the other hand, let X((j, i)) = j for j = 0, 1 and i = 1, 2, .... Then P(X|A_i) = 1 for all i while P(X) = 1/2, and P is not conglomerable by Definition 9.
In order to maintain the spirit of Dubins' definition when W is not a linear space or does not contain all constants, we need to strengthen (14).
Definition 10 Let W be a collection of random variables. Let π be a partition, and let P be a finite coherent conditional prevision on a set Q that contains both W × π and W × {Ω}. We say that P is D-conglomerable in π with respect to W if the following is true. For all X ∈ W and all real c,
• P(X|h) ≤ c for all h ∈ π implies P(X) ≤ c, and
• P(X|h) ≥ c for all h ∈ π implies P(X) ≥ c.
We now show that Definition 10 is equivalent to conglomerability from Definition 9 for finite previsions.

Lemma 2 Let W be a collection of random variables and let π be a partition. Let P be a finite coherent conditional prevision on a set Q that contains both W × {Ω} and W × π . Then P is conglomerable in π with respect to W if and only if P is D-conglomerable in π with respect to W.
Proof For the "if" direction, suppose that P is D-conglomerable in π with respect to W. Let X ∈ W, and let c 1 = inf h∈π P(X |h) and c 2 = sup h∈π P(X |h). Finiteness of P implies that c 1 < ∞ and c 2 > −∞. If c 1 = −∞, the second bullet in Definition 10 is vacuous. If c 1 is finite, then P(X |h) ≥ c 1 for all h ∈ π and Definition 10 says that P(X ) ≥ c 1 . Similarly if c 2 = ∞, the first bullet in Definition 10 is vacuous. If c 2 is finite, then P(X |h) ≤ c 2 for all h ∈ π so that P(X ) ≤ c 2 . Hence (12) holds.
For the "only if" direction, suppose that P is conglomerable in π with respect to W. Let X ∈ W. Then (12) holds. Let c be a real number. If P(X |h) ≥ c for all h ∈ π , then c ≤ inf h∈π P(X |h) ≤ P(X ) by (12). Similarly, if P(X |h) ≤ c for all h ∈ π , then c ≥ sup h∈π P(X |h) ≥ P(X ).

Extending the equivalence of conglomerability and disintegrability to unbounded variables
Lemma 3 shows that disintegrability implies conglomerability for arbitrary collections.
Lemma 3 Let W be a collection of random variables. Let π be a partition, and let P be a finite coherent conditional prevision on a set Q that contains both W × π and W × {Ω}. Suppose that P is disintegrable in π with respect to W. Then P is conglomerable in π with respect to W.
Proof Let X ∈ W, and define X_π by (8). Because P_π is coherent,

$$\inf_{h \in \pi} X_\pi(h) \le P_\pi(X_\pi) \le \sup_{h \in \pi} X_\pi(h). \tag{15}$$

By disintegrability, P(X) = P_π(X_π), hence (15) implies (12). Since the above argument applies to all X ∈ W, P is conglomerable in π with respect to W.
In light of Lemma 3, every set W of random variables falls into one of three classes relative to P and a partition π.
Definition 11 Let W be a collection of random variables. Let π be a partition, and let P be a finite coherent conditional prevision on a set Q that contains both W × π and W × {Ω}. We say that
- W is of Class 0 relative to P and π if P is neither conglomerable nor disintegrable in π with respect to W.
- W is of Class 1 relative to P and π if P is conglomerable in π with respect to W but P is not disintegrable in π with respect to W.
- W is of Class 2 relative to P and π if P is both conglomerable and disintegrable in π with respect to W.

Theorem 1 of Dubins (1975) can be reexpressed as saying that, for each partition π and each coherent conditional prevision P, the collection of bounded random variables is either of Class 0 or of Class 2 but never of Class 1 relative to P and π. In Example 5, we give an example of a coherent prevision P, a partition π, and a collection of random variables that contains the bounded random variables and is of Class 1 relative to P and π. The following result is a straightforward consequence of the class definitions.
Proposition 3 Let W be a collection of random variables. Let π be a partition, and let P be a finite coherent conditional prevision on a set Q that contains both W × π and W × {Ω}. If W is of Class 0 relative to P and π , then every superset of W is also of Class 0. If W is of Class 2 relative to P and π , then every subset of W is also of Class 2.
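The three classes, together with the implication of Lemma 3, can be summarized in a few lines of Python. The function name `prevision_class` and its boolean arguments are hypothetical conveniences for this sketch, which only encodes the case analysis of Definition 11.

```python
def prevision_class(conglomerable: bool, disintegrable: bool) -> int:
    """Return the class (0, 1, or 2) of a collection, per Definition 11."""
    if disintegrable and not conglomerable:
        # Lemma 3: disintegrability implies conglomerability, so this cannot occur.
        raise ValueError("excluded by Lemma 3")
    if disintegrable:
        return 2
    return 1 if conglomerable else 0

print(prevision_class(False, False), prevision_class(True, False), prevision_class(True, True))
```

The fourth combination is rejected explicitly, which is why only three classes are needed.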
Our extension of Dubins' theorem gives a sufficient condition for a collection W of random variables to be not of Class 1. Berti and Rigo (1992, [Theorem 3.1]) prove a similar theorem for bounded random variables.

Theorem 1 Let W be a collection of random variables. Let π be a partition, and let P be a finite conditional prevision on a set Q. Assume that P contains a π-strategy and that Q contains both W × π and W × {Ω}. Assume that, for every X ∈ W,

$$(P(X|\pi), \Omega) \in \mathcal{Q} \quad \text{and} \quad X - P(X|\pi) \in \mathcal{W}. \tag{16}$$

Then, with respect to the collection W, P is conglomerable in π if and only if P is disintegrable in π.
Proof Because P(X|π)(ω) = P(X|h) for all ω ∈ h, the third condition in Definition 6 gives P[P(X|π)|h] = P(X|h) for all h ∈ π and all X ∈ W. By linearity of P(·|h), we get P[X − P(X|π)|h] = 0, hence

$$\inf_{h \in \pi} P[X - P(X|\pi) \mid h] = \sup_{h \in \pi} P[X - P(X|\pi) \mid h] = 0.$$

We have assumed that X − P(X|π) ∈ W. If P is conglomerable in π with respect to W, then P[X − P(X|π)] = 0, from which it follows that P(X) = P[P(X|π)], so that P is disintegrable in π with respect to W. If P is disintegrable in π then P is conglomerable in π by Lemma 3.
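The key assumption in Theorem 1 is (16). For an arbitrary collection W, define

$$\mathcal{W}^- = \{X - P(X|\pi) : X \in \mathcal{W}\}, \qquad \mathcal{W}^+ = \mathcal{W} \cup \mathcal{W}^-.$$

The following results (the second of which is trivial) help to distinguish some collections of random variables by their classes.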
Lemma 4 Let W be a collection of random variables. Let π be a partition, and let P be a finite conditional prevision on a set Q. Assume that P contains a π -strategy and that Q contains both W × π and W × {Ω}. Then 1. W + satisfies (16), 2. W is of Class 2 relative to P and π if and only if W + is also of Class 2, and 3. If W is not of Class 2 relative to P and π , then W + is of Class 0.
Proof For part 1, let X ∈ W so that X − P(X|π) ∈ W^+. Also X_π = [P(X|π)]_π, hence [X − P(X|π)]_π is identically 0, and

$$[X - P(X|\pi)] - P[X - P(X|\pi) \mid \pi] = X - P(X|\pi) \in \mathcal{W}^+.$$

For part 2, the "if" direction is immediate from Proposition 3. For the "only if" direction, note that for every Y ∈ W^−, Y_π is identically 0 and P(Y) = 0 if W is of Class 2. For part 3, Theorem 1 says that W^+ is either of Class 0 or Class 2. If W is not of Class 2, then no superset of it, such as W^+, can be of Class 2. Hence W^+ must be of Class 0.
Proposition 4 Let W be a collection of random variables. Let π be a partition, and let P be a countably additive prevision on a set Q that contains both W × π and W × {Ω}. If every element of π has positive probability, then W is of Class 2 relative to P and π .
One subtle point concerning Proposition 4 is that P can be a countably additive prevision on the collection of all bounded random variables but fail to be countably additive on a collection that includes unbounded random variables. Examples 2, 4, and 5 illustrate this circumstance. As such, these examples illustrate how a probability can be both conglomerable and disintegrable in every partition relative to the class of bounded random variables, but not so with respect to larger classes that include unbounded random variables.

Examples
In this section, we extend Example 2 to illustrate both situations in which the conditions of Theorem 1 are satisfied and situations in which the conditions are not satisfied. Example 4 contains examples of Theorem 1 in which a collection W of random variables is of Class 0 relative to P and π 1 for a partition π 1 , while W is of Class 2 relative to P and π 2 for a different partition π 2 . Example 5 contains three different collections of random variables (one of which is the W from Example 4) that are of Classes 0, 1, and 2 relative to the same partition. In particular, the conditions of Theorem 1 fail for the collection that is of Class 1.
Example 4 We will make use of the construction described in Example 2. Let Ω be the set of ordered pairs of integers from 1 on up. Define $P(\{(x, y)\}) = 2^{-x-y}$ for all x, y ≥ 1. Then P is countably additive as a probability on Ω. Let Y be the unbounded random variable Y(x, y) = y for all x, y, whose countably additive expected value is 2. We saw in Example 2 that a necessary and sufficient condition for P(Y) = p to be coherent with the previsions for the bounded random variables is p ≥ 2. In this example, we choose P(Y) = 4, and extend P to the linear span of Y and bounded random variables as in Example 2. Consider the following two partitions:

$$\pi_1 = \{h_x : x \ge 1\}, \qquad \pi_2 = \{g_y : y \ge 1\},$$

where $h_x = \{(x, y) : y \ge 1\}$ for each x ≥ 1, and $g_y = \{(x, y) : x \ge 1\}$ for each y ≥ 1. It is straightforward to show that $P(h_x) = 2^{-x}$ for all x ≥ 1, and $P(g_y) = 2^{-y}$ for all y ≥ 1. Hence, for all x and y, $P(g_y|h_x) = 2^{-y}$ and $P(h_x|g_y) = 2^{-x}$. For each bounded random variable X, P(X) must equal its countably additive expected value.
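The probabilities quoted in Example 4 can be checked with exact partial sums. The sketch below truncates each series at a level N, an arbitrary choice made for illustration, and the function names are hypothetical. The partial sums differ from the limits 2^{-x} and 2 only by tails that vanish as N grows.

```python
from fractions import Fraction

def point_mass(x, y):
    """P({(x, y)}) = 2**-(x + y), as in Example 4."""
    return Fraction(1, 2 ** (x + y))

def row_mass(x, n_terms):
    """Partial sum over y <= n_terms of the mass on h_x = {(x, y) : y >= 1}."""
    return sum(point_mass(x, y) for y in range(1, n_terms + 1))

def mean_of_Y(n_terms):
    """Partial sum over y <= n_terms of y * P(Y = y), where P(Y = y) = 2**-y."""
    return sum(y * Fraction(1, 2 ** y) for y in range(1, n_terms + 1))

N = 40  # truncation level, an arbitrary choice for this sketch
print(float(row_mass(3, N)), float(mean_of_Y(N)))
```

The value 2 obtained here is the countably additive mean of Y; the example deliberately assigns the larger prevision P(Y) = 4.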
In order to satisfy the conditions of Theorem 1, we will assign finite coherent previsions and finite coherent conditional previsions given h_x, g_y for all x and y and given π_1 and π_2 to all random variables in the linear span $\mathcal{W}$ of the set $\mathcal{Y} = \mathcal{X} \cup \{Y, P(Y|\pi_1), P(Y|\pi_2)\}$, where $\mathcal{X}$ is the set of all bounded random variables. Because each nonempty subset of Ω has positive probability, P will contain both a π_1-strategy and a π_2-strategy so long as P is coherent.
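For each x ≥ 1,

$$P(Y|h_x) = \frac{P(Y h_x)}{P(h_x)} = 2^x P(Y h_x),$$

so that P(Y|π_1) will be defined as soon as we choose coherent values for P(Y h_x) for all x ≥ 1. Note that $P(Y h_x) 2^x$ might be a bounded or an unbounded sequence depending on what values we choose for P(Y h_x). We don't need Y h_x ∈ W for each x, but we do need P(Y|π_1) ∈ W. So we need to choose coherent values for P(Y h_x) for all x.

An argument similar to the one given in Example 2 shows that, for each x, P(Y h_x) must be assigned a value greater than or equal to $2^{-x+1}$. Let $r_x = P(Y h_x) - 2^{-x+1}$ for all x, and let u denote the value assigned to P[P(Y|π_1)], so that the assignments are

$$P(Y h_x) = 2^{-x+1} + r_x, \qquad P(Y|h_x) = 2 + r_x 2^x, \qquad P[P(Y|\pi_1)] = u, \tag{17}$$

for all x. Since $\sum_{x=1}^n Y h_x \le Y$ for all n, P(Y) = 4, and Y h_x ≥ 0 for all x, it follows that the following are necessary conditions for (17) to be coherent assignments:

$$r_x \ge 0 \text{ for all } x, \qquad \sum_{x=1}^\infty r_x \le 2. \tag{18}$$

Because $2 + r_x 2^x \ge 0$ for all x, we have

$$u \ge \sum_{x=1}^\infty 2^{-x} (2 + r_x 2^x) = 2 + \sum_{x=1}^\infty r_x, \tag{19}$$

with equality in (19) required if $r_x 2^x$ is bounded. To verify that $r_x 2^x$ can be either bounded or unbounded while satisfying all of the other conditions above, consider the following two examples: for all x, either $r_x = 2^{-x}$ or $r_x = 0.6^x$.

We show next that (18) and (19) are also sufficient for coherence. Assume that (18) and (19) hold. Let

$$v = u - 2 - \sum_{x=1}^\infty r_x.$$

Note that v ≥ 0 with v = 0 if $r_x 2^x$ is bounded. The most general gamble allowed by (17) and the previsions from Example 2 is

$$\beta [Y - 4] + \delta [P(Y|\pi_1) - u] + \sum_{i=1}^n \gamma_i [Y h_{x_i} - 2^{-x_i+1} - r_{x_i}] + \sum_{x=1}^\infty \sum_{y=1}^\infty \alpha_{x,y} [\{(x, y)\} - 2^{-x-y}], \tag{20}$$

where $x_1, \ldots, x_n \ge 1$ are distinct integers, and $\{\alpha_{x,y} : x, y \ge 1\}$ is a bounded collection of numbers. When ω = (x, y), the value of (20) is

$$Z(x, y) = \beta (y - 4) + \delta (2 + r_x 2^x - u) + y \sum_{i=1}^n \gamma_i I(x = x_i) - \sum_{i=1}^n \gamma_i 2^{-x_i+1} - t + \alpha_{x,y} - q, \tag{21}$$

where $t = \sum_{i=1}^n \gamma_i r_{x_i}$, and $q = \sum_{x=1}^\infty \sum_{y=1}^\infty 2^{-x-y} \alpha_{x,y}$, both finite numbers. If, for every choice of n, β, δ, γ_1, ..., γ_n, and $\{\alpha_{x,y} : x, y \ge 1\}$, there exists a pair (x, y) such that (21) is nonnegative, then the previsions are coherent. If β > 0, then (21) is unbounded above along sequences with fixed x and unbounded y. Similarly, if $r_x 2^x$ is unbounded, then δ > 0 makes (21) unbounded above along sequences with fixed y and unbounded x. Hence, for the rest of the proof we will assume that β ≤ 0 and either $r_x 2^x$ is bounded (in which case v = 0) or $r_x 2^x$ is unbounded and δ ≤ 0. The weighted average of (21) with weights $2^{-x-y}$ is

$$\sum_{x=1}^\infty \sum_{y=1}^\infty 2^{-x-y} Z(x, y) = -2\beta - v\delta - t. \tag{22}$$

Since either δ ≤ 0 or v = 0 implies that −vδ ≥ 0, the right-hand side of (22) is at least −2β − t. If t ≤ −2β, then at least one value of Z(x, y) must be nonnegative, because a weighted average of the Z(x, y) values is nonnegative. For the remainder of the proof, assume that t > −2β.

Since all $r_{x_i} \ge 0$, $\sum_{i=1}^n r_{x_i} \le 2$, and $t = \sum_{i=1}^n \gamma_i r_{x_i}$, there must be at least one γ_i that is greater than −β. Let x_0 be an x_i such that γ_0 ≡ γ_i > −β. Then

$$Z(x_0, y) = (\gamma_0 + \beta) y - 4\beta + \delta (2 + r_{x_0} 2^{x_0} - u) - \sum_{i=1}^n \gamma_i 2^{-x_i+1} - t + \alpha_{x_0,y} - q. \tag{23}$$

Now, (γ_0 + β)y is unbounded above as y → ∞ because γ_0 + β > 0, and everything else in (23) is bounded as a function of y. Hence Z(x_0, y) is nonnegative for all sufficiently large y, and the previsions are coherent.

We will choose examples of r_x that satisfy the above conditions while keeping $r_x 2^x$ bounded. This makes P(Y|π_1) bounded and $\mathcal{W}$ the linear span of $\mathcal{Y} = \mathcal{X} \cup \{Y\}$.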
For i = 1, 2, it follows from Theorem 1 that P is conglomerable in π i if and only if P is disintegrable in π i . To see whether P is disintegrable in π i , we need to check whether (13) holds for every random variable in W. Because P is countably additive on the indicator functions, (13) holds for each bounded X in every partition. If (13) holds for X 1 and X 2 , it holds for α 1 X 1 + α 2 X 2 for all real α 1 , α 2 because P(·|h) is linear. So, P is disintegrable in a partition if and only if (13) holds for Y .
Since Y = P(Y|π_2), Y satisfies (13) in π_2 so that P is disintegrable in π_2 with respect to W and W is of Class 2 relative to π_2. In order for Y to satisfy (13) in π_1, we need

$$4 = P(Y) = P[P(Y|\pi_1)] = u = 2 + \sum_{x=1}^\infty r_x,$$

i.e., $\sum_{x=1}^\infty r_x = 2$. For example, if $r_x = 3.5 \times 3^{-x}$, then u = 3.75 and P is not disintegrable in π_1. If $r_x = 4 \times 3^{-x}$, then P is disintegrable in both partitions.
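The two numerical cases at the end of Example 4 follow from a geometric series. The sketch below assumes r_x = c × 3^{-x}, so that r_x 2^x is bounded and the integral of the conditional previsions equals 2 plus the sum of the r_x; the function names are hypothetical.

```python
from fractions import Fraction

def sum_of_r(c):
    """Closed form of sum_{x>=1} c * 3**-x, a geometric series with ratio 1/3."""
    ratio = Fraction(1, 3)
    return c * ratio / (1 - ratio)

def integral_of_conditionals(c):
    """u = 2 + sum_x r_x, valid here because r_x * 2**x = c * (2/3)**x is bounded."""
    return 2 + sum_of_r(c)

prevision_of_Y = 4  # the value P(Y) = 4 chosen in Example 4

for c in (Fraction(7, 2), Fraction(4)):
    u = integral_of_conditionals(c)
    print(c, u, u == prevision_of_Y)
```

With c = 3.5 the value of u falls short of P(Y), so (13) fails for Y in π_1; with c = 4 the two agree.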
We conclude this section with an example that involves three collections of random variables X ⊂ Y ⊂ W such that X and W satisfy the conditions of Theorem 1 while Y does not. Also, X is of Class 2, Y is of Class 1, and W is of Class 0.
Example 5 Let $\mathcal{X}$, $\mathcal{Y}$, π_1, Y, and $\mathcal{W}$ be as in Example 4. Define $r_x = P(Y h_x) - 2^{-x+1}$ as in Example 4 so that $P(Y|h_x) = 2 + r_x 2^x$. We saw in Example 4 that r_x ≥ 0 and $\sum_{x=1}^\infty r_x \le 2$ were necessary and sufficient for the stated previsions to be coherent. Let $r_x = 3.5 \times 3^{-x}$ as in one of the cases at the end of Example 4. Then P(Y|π_1) is bounded, $\mathcal{W}$ satisfies the conditions of Theorem 1, and P[P(Y|π_1)] = 3.75. Here, $\mathcal{W}$ is of Class 0 relative to P and π_1. Because P is countably additive on $\mathcal{X}$, we know that $\mathcal{X}$ is of Class 2 relative to P and π_1. Because we have not included Y − P(Y|π_1) in $\mathcal{Y}$, we see that $\mathcal{Y}$ does not satisfy (16). To see that $\mathcal{Y}$ is of Class 1, note that

$$\inf_{x} P(Y|h_x) = 2 < P(Y) = 4 < \frac{13}{3} = \sup_{x} P(Y|h_x), \qquad \text{while} \qquad P[P(Y|\pi_1)] = 3.75 \ne P(Y).$$

Hence, $\mathcal{Y}$ is of Class 1 relative to P and π_1.
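The Class 1 claim can be checked numerically under the stated choice r_x = 3.5 × 3^{-x}, for which P(Y|h_x) = 2 + 3.5 × (2/3)^x. The sketch below, with a hypothetical function name, confirms that these conditional previsions decrease from 13/3 toward 2, so P(Y) = 4 lies inside their range even though the integral of the conditional previsions is 3.75.

```python
from fractions import Fraction

def conditional_prevision(x, c=Fraction(7, 2)):
    """P(Y | h_x) = 2 + r_x * 2**x with r_x = c * 3**-x."""
    return 2 + c * Fraction(2, 3) ** x

prevision_of_Y = 4
integral_of_conditionals = Fraction(15, 4)   # 2 + sum_x r_x = 3.75

largest = conditional_prevision(1)           # the sequence is decreasing in x
far_out = conditional_prevision(60)          # close to the infimum 2, never equal to it

conglomerable_for_Y = 2 <= prevision_of_Y <= largest
disintegrable_for_Y = integral_of_conditionals == prevision_of_Y
print(largest, conglomerable_for_Y, disintegrable_for_Y)
```

Conglomerability holds for Y while disintegrability fails, which is the defining pattern of Class 1.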

Discussion
Conglomerability and disintegrability are familiar concepts in the countably additive theory of probability, although the names may not be as familiar as the concepts. The law of total probability or "tower property" of conditional expectations is essentially disintegrability, namely that the mean of a conditional mean is the marginal mean. With disintegrability taken for granted, conglomerability is simply an instance of the property of countably additive expectations that the mean of a random variable lies in the closed convex hull of its range. Of course, the countably additive theory guarantees disintegrability by allowing the conditional probabilities of events to change with the partition on which one conditions. The well-known Borel paradox is a classic example of how this happens. In the countably additive theory Kadane et al. (1996) illustrates how pervasive the Borel paradox is. If one insists on P(X |h) having a meaning for every random variable X and every nonempty event h, then not even the countably additive theory can guarantee disintegrability in every partition. The finitely additive theory of probability avoids the Borel paradox, but at the price of having its conditional probabilities fail conglomerability. Dubins (1975) shows that for coherent previsions over the linear space of all bounded random variables, conglomerability in a partition is equivalent to disintegrability in that same partition. In this paper we extend the concept of a coherent prevision to a collection of unbounded random variables with finite previsions and finite conditional previsions. We show that a finitely additive version of the Daniell integral gives an extension of disintegration to this collection of unbounded variables. When coherent previsions are defined for a linear space of such random variables and they contain a π -strategy, conglomerability and disintegrability of previsions in π are equivalent conditions. 
As a final note, it is important to keep in mind that the concepts of conglomerability and disintegrability are defined with respect to a collection of random variables. The larger the collection of random variables, the more conditions of the form (12) and (13) each concept requires. That is, in order for P to be conglomerable in π with respect to a collection W, (12) must hold for every X ∈ W. Similarly, for P to be disintegrable in π with respect to W, (13) must hold for every X ∈ W. Consider the collection X of all bounded random variables and a larger collection W that satisfies the conditions of Theorem 1 (such as the W in Example 5). Let Y be an intermediate collection (such as the Y in Example 5) so that X ⊂ Y ⊂ W. If P is conglomerable in π with respect to W, then W is of Class 2 relative to P and π and so are Y and X. Similarly, if P is disintegrable in π with respect to W, then all three collections are of Class 2. However, the equivalence of conglomerability and disintegrability does not carry over from larger collections to smaller collections. The reason is that W might be of Class 0 while Y is of Class 1 and X is of Class 2. Indeed, this is precisely what occurs in Example 5.