# Interpretations of Probability

*First published Mon Oct 21, 2002; substantive revision Wed May 7, 2003*

‘Interpreting probability’ is a commonly used but
misleading name for a worthy enterprise. The so-called
‘interpretations of probability’ would be better called
‘analyses of various concepts of probability’, and
‘interpreting probability’ is the task of providing such
analyses. Normally, we speak of interpreting *a formal system*,
that is, attaching familiar meanings to the primitive terms in its
axioms and theorems, usually with an eye to turning them into true
statements about some subject of interest. However, there is no single
formal system that is ‘probability’, but rather a host of
such systems. To be sure, Kolmogorov's axiomatization, which we will
present shortly, has achieved the status of orthodoxy, and it is
typically what philosophers have in mind when they think of
‘probability theory’. Nevertheless, several of the leading
‘interpretations of probability’ fail to satisfy all of
Kolmogorov's axioms, yet they have not lost their title for
that. Moreover, various other quantities that have nothing to do with
probability *do* satisfy Kolmogorov's axioms, and thus are
interpretations of it in a strict sense: normalized mass, length,
area, volume, and indeed anything that falls under the scope of
measure theory, the abstract mathematical theory that generalizes such
quantities. Nobody seriously considers these to be
‘interpretations of probability’, however, because they do
not play the right role in our conceptual apparatus. Instead, we will
be concerned here with various probability-like concepts that
purportedly do. Be all that as it may, we will follow common usage and
drop the cringing scare quotes in our survey of what philosophers have
taken to be the chief interpretations of probability.

Whatever we call it, the project of finding such interpretations is an important one. Probability is virtually ubiquitous. It plays a role in almost all the sciences. It underpins much of the social sciences -- witness, for example, the prevalence of the use of statistical testing, confidence intervals, regression methods, and so on. It finds its way, moreover, into much of philosophy. In epistemology, the philosophy of mind, and cognitive science, we see states of opinion being modeled by subjective probability functions, and learning being modeled by the updating of such functions. Since probability theory is central to decision theory and game theory, it has ramifications for ethics and political philosophy. It figures prominently in such staples of metaphysics as causation and laws of nature. It appears again in the philosophy of science in the analysis of confirmation of theories, scientific explanation, and in the philosophy of specific scientific theories, such as quantum mechanics, statistical mechanics, and genetics. It can even take center stage in the philosophy of logic, the philosophy of language, and the philosophy of religion. Thus, problems in the foundations of probability bear at least indirectly, and sometimes directly, upon central scientific, social scientific, and philosophical concerns. The interpretation of probability is one of the most important such foundational problems.

- 1. Kolmogorov's Probability Calculus
- 2. Criteria of adequacy for the interpretations of probability
- 3. The Main Interpretations
- 4. Conclusion: Future Prospects?
- Bibliography
- Other Internet Resources
- Related Entries

## 1. Kolmogorov's Probability Calculus

Probability theory was inspired by games of chance in
17th-century France and inaugurated by the Fermat-Pascal
correspondence. However, its axiomatization had to wait until
Kolmogorov's classic *Foundations of the Theory of
Probability* (1933). Let Ω be a non-empty set (‘the universal
set’). A *field* (or *algebra*) on Ω is a set **F** of subsets
of Ω that has Ω as a member, and that is closed under
complementation (with respect to Ω) and union. Let *P* be a
function from **F** to the real numbers obeying:

1. (Non-negativity) *P*(*A*) ≥ 0, for all *A* ∈ **F**.
2. (Normalization) *P*(Ω) = 1.
3. (Finite additivity) *P*(*A* ∪ *B*) = *P*(*A*) + *P*(*B*) for all *A*, *B* ∈ **F** such that *A* ∩ *B* = Ø.

Call *P* a *probability function*, and
(Ω, **F**, *P*) a
*probability space*.

The assumption that *P* is defined on a field guarantees that
these axioms are non-vacuously instantiated, as are the various
theorems that follow from them. The non-negativity and normalization
axioms are largely matters of convention, although it is non-trivial
that probability functions take at least the two values 0 and 1, and
that they have a maximal value (unlike various other measures, such as
length, volume, and so on, which are unbounded). We will return to
finite additivity at a number of points below. We may now apply the
theory to various familiar cases. For example, we may represent the
results of tossing a single die once by the set
Ω={1, 2, 3, 4, 5, 6},
and we could let **F** be the set of all subsets of
Ω.
Under the natural assignment of probabilities to members of
**F**, we obtain such welcome results as *P*({1})
= 1/6, *P*(even) = *P*({2}
∪ {4}
∪ {6}) = 3/6, *P*(odd or less than 4) =
*P*(odd) +
*P*(less than 4) −
*P*(odd
∩ less than 4) = 1/2 + 1/2
− 2/6 = 4/6, and so on.
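
The die example can be checked mechanically. The following is a minimal sketch in Python, assuming the "natural assignment" means weight 1/6 for each face; the names `prob`, `power_set` and `satisfies_axioms` are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import combinations

# Sample space for one toss of a die, with the assumed equal weights.
OMEGA = frozenset({1, 2, 3, 4, 5, 6})
WEIGHT = {outcome: Fraction(1, 6) for outcome in OMEGA}

def prob(event):
    """P(event): sum of the weights of the outcomes in the event."""
    return sum((WEIGHT[o] for o in event), Fraction(0))

def power_set(s):
    """All subsets of s, i.e. the field F used in the example."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def satisfies_axioms(field):
    """Check non-negativity, normalization and finite additivity on the field."""
    non_negative = all(prob(a) >= 0 for a in field)
    normalized = prob(OMEGA) == 1
    additive = all(prob(a | b) == prob(a) + prob(b)
                   for a in field for b in field if not (a & b))
    return non_negative and normalized and additive

even = frozenset({2, 4, 6})
odd = frozenset({1, 3, 5})
less_than_4 = frozenset({1, 2, 3})

print(prob(frozenset({1})))       # 1/6
print(prob(even))                 # 1/2, i.e. 3/6
print(prob(odd | less_than_4))    # 2/3, i.e. 4/6
print(satisfies_axioms(power_set(OMEGA)))
```

Exact fractions are used rather than floating-point numbers so that the additivity check is a genuine equality test.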

We could instead attach probabilities to members of a collection
**S** of *sentences* of a formal language, closed
under (countable) truth-functional combinations, with the following
counterpart axiomatization:

1. *P*(*A*) ≥ 0 for all *A* ∈ **S**.
2. If *T* is a logical truth (in classical logic), then *P*(*T*) = 1.
3. *P*(*A* ∨ *B*) = *P*(*A*) + *P*(*B*) for all *A* ∈ **S** and *B* ∈ **S** such that *A* and *B* are logically incompatible.

Now let us strengthen our closure assumptions regarding
**F**, requiring it to be closed under complementation
and *countable* union; it is then called a *sigma field*
(or *sigma algebra)* on
Ω. It is controversial whether we should
strengthen finite additivity, as Kolmogorov does:

3′. (Countable additivity) If {*A*_{i}} is a countably infinite collection of (pairwise) disjoint sets, each of which is an element of **F**, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$

Kolmogorov comments that infinite probability spaces are idealized models of real random processes, and that he limits himself arbitrarily to only those models that satisfy countable additivity. This axiom is the cornerstone of the assimilation of probability theory to measure theory.

*The conditional probability of A given B* is then given by
the ratio of unconditional probabilities:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0.$$

This is often taken to be the *definition* of conditional
probability, although it should be emphasized that this is a
technical usage of the term that may not align perfectly with a
pretheoretical concept that we might have (see Hájek,
forthcoming). Indeed, some authors take conditional probability to be
the primitive notion, and axiomatize it directly (e.g. Popper 1959b,
Renyi 1970, van Fraassen 1976, Spohn 1986 and Roeper and Leblanc
1999).
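
As a small illustration of the ratio formula, here is a hedged Python sketch that reuses the fair-die setting from the earlier example. The function names are hypothetical, and raising an error when *P*(*B*) = 0 is just one way of representing the fact that the ratio leaves the conditional probability undefined in that case.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))

def prob(event):
    """Equal-weight assignment on a fair die: |event| / |OMEGA|."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def conditional(a, b):
    """P(a | b) = P(a & b) / P(b); raises when P(b) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) = 0: the ratio formula leaves P(A|B) undefined")
    return prob(a & b) / prob(b)

even = frozenset({2, 4, 6})
less_than_4 = frozenset({1, 2, 3})

print(conditional(even, less_than_4))   # 1/3
```

Approaches that take conditional probability as primitive would replace the `ValueError` branch with a directly specified value; the sketch does not attempt that.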

There are other axiomatizations that give up normalization; that give up countable additivity, and even additivity; that allow probabilities to take infinitesimal values (positive, but smaller than every positive real number); that allow probabilities to be vague (interval-valued, or more generally sets of numerical values). For now, however, when we speak of ‘the probability calculus’, we will mean Kolmogorov's approach, as is standard.

Given certain probabilities as inputs, the axioms and theorems allow
us to compute various further probabilities. However, apart from the
assignment of 1 to the universal set and 0 to the empty set, they are
silent regarding the initial assignment of
probabilities.^{[1]}
For guidance with that, we need to turn to the interpretations of
probability. First, however, let us list some criteria of adequacy
for such interpretations.

## 2. Criteria of adequacy for the interpretations of probability

What criteria are appropriate for assessing the cogency of a
proposed interpretation of probability? Of course, an interpretation
should be precise, unambiguous, and use well-understood
primitives. But those are really prescriptions for good
philosophizing generally; what do we want from our interpretations
*of probability*, specifically? We begin by following Salmon
(1966, 64), although we will raise some questions about his criteria,
and propose some others. He writes:

*Admissibility.* We say that an interpretation of a formal system is admissible if the meanings assigned to the primitive terms in the interpretation transform the formal axioms, and consequently all the theorems, into true statements. A fundamental requirement for probability concepts is to satisfy the mathematical relations specified by the calculus of probability…

*Ascertainability.* This criterion requires that there be some method by which, in principle at least, we can ascertain values of probabilities. It merely expresses the fact that a concept of probability will be useless if it is impossible in principle to find out what the probabilities are…

*Applicability.* The force of this criterion is best expressed in Bishop Butler's famous aphorism, “Probability is the very guide of life.”…

It might seem that the criterion of admissibility goes without
saying: ‘interpretations’ of the probability calculus that
assigned to *P* the interpretation ‘the number of hairs on
the head of’ or ‘the political persuasion of’ would
obviously not even be in the running, because they would render the
axioms and theorems so obviously false. The word
‘interpretation’ is often used in such a way that
‘admissible interpretation’ is a pleonasm. Yet it turns out
that the criterion is non-trivial, and indeed if taken seriously would
rule out several of the leading interpretations of probability! As we
will see, some of them fail to satisfy countable additivity; for
others (certain propensity interpretations) the status of at least
some of the axioms is unclear. Nevertheless, we regard them as genuine
candidates. It should be remembered, moreover, that Kolmogorov's is
just one of many possible axiomatizations, and there is not universal
agreement on which is ‘best’ (whatever that might
mean). Indeed, Salmon's preferred axiomatization differs from
Kolmogorov's.^{[2]}
Thus, there is no such thing as admissibility *tout court*,
but rather admissibility with respect to this or that
axiomatization. It would be unfortunate if, perhaps out of an
overdeveloped regard for history, one felt obliged to reject any
interpretation that did not obey the letter of Kolmogorov's laws and
that was thus ‘inadmissible’. In any case, if we found an
inadmissible interpretation that did a wonderful job of meeting the
criteria of ascertainability and applicability, then we should surely
embrace it.

So let us turn to those criteria. It is a little unclear in the
ascertainability criterion just what “in principle” amounts
to, though perhaps some latitude here is all to the good. Understood
charitably, and to avoid trivializing it, it presumably excludes
omniscience. On the other hand, understanding it in a way acceptable
to a strict empiricist or a verificationist may be too
restrictive. ‘Probability’ is apparently, among other
things, a *modal* concept, plausibly outrunning that which
actually occurs, let alone that which is actually observed.

Most of the work will be done by the applicability criterion. We
must say more (as Salmon indeed does) about what *sort* of a
guide to life probability is supposed to be. Mass, length, area and
volume are all useful concepts, and they are ‘guides to life’ in
various ways (think how critical distance judgments can be to
survival); moreover, they are admissible and ascertainable, so
presumably it is the applicability criterion that will rule them
out. Perhaps it is best to think of applicability as a cluster of
criteria, each of which is supposed to capture something of
probability's distinctive conceptual roles; moreover, we should not
require that all of them be met by a given interpretation. They
include:

*Non-triviality:* an interpretation should make non-extreme probabilities at least a conceptual possibility. For example, suppose that we interpret ‘*P*’ as the *truth* function: it assigns the value 1 to all true sentences, and 0 to all false sentences. Then trivially, all the axioms come out true, so this interpretation is admissible. We would hardly count it as an adequate *interpretation of probability*, however, and so we need to exclude it. It is essential to probability that, at least in principle, it can take *intermediate* values. All of the interpretations that we will present meet this criterion, so we will discuss it no more.

*Applicability to frequencies:* an interpretation should render perspicuous the relationship between probabilities and (long-run) frequencies. Among other things, it should make clear why, by and large, more probable events occur more frequently than less probable events.

*Applicability to rational belief:* an interpretation should clarify the role that probabilities play in constraining the degrees of belief, or *credences*, of rational agents. Among other things, knowing that one event is more probable than another, a rational agent will be more confident about the occurrence of the former event.

*Applicability to ampliative inference:* an interpretation will score bonus points if it illuminates the distinction between ‘good’ and ‘bad’ ampliative inferences, while explicating why both fall short of deductive inferences.

The next criterion may be redundant, given our list so far, but including it will do no harm:

*Applicability to science:* an interpretation should illuminate paradigmatic uses of probability in science (for example, in quantum mechanics and statistical mechanics).

Perhaps there are further *metaphysical* desiderata that we
might impose on the interpretations. For example, there appear to be
connections between probability and *modality.* Events with
positive probability *can* happen, even if they don't. Some
authors also insist on the converse condition that *only*
events with positive probability can happen, although this is more
controversial -- see our discussion of ‘regularity’ in
Section 4. (Indeed, in uncountable probability spaces this condition
will require the employment of infinitesimals, and will thus take us
beyond the standard Kolmogorov theory -- ‘standard’ both in
the sense of being the orthodoxy, and in its employment of standard,
as opposed to ‘non-standard’ real numbers. See Skyrms
1980.) In any case, our list is already long enough to help in our
assessment of the leading interpretations on the market.

## 3. The Main Interpretations

### 3.1 Classical Probability

The classical interpretation owes its name to its early and august pedigree. Championed by Laplace, and found even in the works of Pascal, Bernoulli, Huygens, and Leibniz, it assigns probabilities in the absence of any evidence, or in the presence of symmetrically balanced evidence. The guiding idea is that in such circumstances, probability is shared equally among all the possible outcomes, so that the classical probability of an event is simply the fraction of the total number of possibilities in which the event occurs. It seems especially well suited to those games of chance that by their very design create such circumstances -- for example, the classical probability of a fair die landing with an even number showing up is 3/6. It is often presupposed (usually tacitly) in textbook probability puzzles.

Here is a classic statement by Laplace:

The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible. (1814, 1951 6-7)

There are numerous questions to be asked about this formulation. When are events of the same kind? Intuitively, ‘heads’ and ‘tails’ are equally likely outcomes of tossing a fair coin; but if their kind is ‘ways the coin could land’, then ‘edge’ should presumably be counted alongside them. The “certain number of cases” and “that of all the cases possible” are presumably finite numbers. What, then, of probabilities in infinite spaces? Apparently, irrational-valued probabilities such as 1/√2 are automatically eliminated, and thus theories such as quantum mechanics that posit them cannot be accommodated. (We will shortly see, however, that Laplace's theory has been refined to handle infinite spaces.)

Who are “we”, who “may be equally undecided”?
Different people may be equally undecided about different things,
which suggests that Laplace is offering a subjectivist interpretation
in which probabilities vary from person to person depending on
contingent differences in their evidence. This is not his
intention. He means to characterize the objective probability
assignment of a rational agent in an epistemically neutral position
with respect to a set of “equally possible” cases. But
then the proposal risks sounding empty: for what is it for an agent
to *be* “equally undecided” about a set of cases,
other than assigning them equal probability?

This brings us to one of the key objections to Laplace's account. The
notion of “equally possible” cases faces the charge of
either being a category mistake (for ‘possibility’ does not
come in degrees), or circular (for what is meant is really
‘equally probable’). The notion is finessed by the so-called
‘principle of indifference’, a coinage due to Keynes. It
states that whenever there is no evidence favoring one possibility
over another, they have the same probability. Thus, it is claimed,
there is no circularity in the classical definition after
all. However, this move may only postpone the problem, for there is
still a threat of circularity, albeit at a lower level. We have two
cases here: outcomes for which we have *no evidence at all*,
and outcomes for which we have *symmetrically balanced
evidence*. There is no circularity in the first case unless the
notion of ‘evidence’ is itself probabilistic; but artificial
examples aside, it is doubtful that the case ever arises. For example,
we have a considerable fund of evidence on coin tossing from the
results of our own experiments, the testimony of others, our knowledge
of some of the relevant physics, and so on. In the second case, the
threat of circularity is more apparent, for it seems that some sort of
*weighing* of the evidence in favor of each outcome is
required, and it is not obvious that this can be done without
reference to probability. Indeed, the most obvious characterization of
symmetrically balanced evidence is in terms of equality of conditional
probabilities: given evidence *E* and possible outcomes
*O*_{1}, *O*_{2}, …,
*O*_{n}, the evidence is symmetrically balanced iff

$$P(O_1 \mid E) = P(O_2 \mid E) = \dots = P(O_n \mid E).$$

Then it seems that probabilities reside at the base of the interpretation after all. Still, it would be an achievement if all probabilities could be reduced to cases of equal probability.

As we have seen, Laplace's classical theory is restricted to
finite sample spaces. When the spaces are countably infinite, the
spirit of the classical theory may be upheld by appealing to the
information-theoretic principle of *maximum entropy*, a
generalization of the principle of indifference. Entropy is a measure
of the lack of ‘informativeness’ of a probability
distribution. The more concentrated is the distribution, the less is
its entropy; the more diffuse it is, the greater is its entropy. For a
discrete distribution *P* = (*p*_{1},
*p*_{2}, …), the entropy of *P* is defined
as:

$$H(P) = -\sum_{i} p_i \log p_i.$$

The principle of maximum entropy enjoins us to select from the family of all distributions consistent with our background knowledge the distribution that maximizes this quantity. In the special case of choosing the most uninformative prior over a finite set of possible outcomes, this is just the familiar ‘flat’ classical distribution discussed previously. Things get more complicated in the infinite case, since there cannot be a flat distribution over denumerably many outcomes, on pain of violating the standard probability calculus (with countable additivity). Rather, the best we can have are sequences of progressively flatter distributions, none of which is truly flat. We must then impose some *further* constraint that narrows the field to a smaller family in which there *is* a distribution of maximum entropy.^{[3]} This constraint has to be imposed from outside as background knowledge, but there is no general theory of which external constraint should be applied when.
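
The behaviour of entropy described above can be sketched numerically. The Python below is an illustration only: it assumes the standard Shannon form of entropy with natural logarithms (the base is a free choice) and the convention that 0 log 0 = 0; the example distributions and the helper `truncated_flat` are invented for the purpose.

```python
import math

def entropy(dist):
    """Shannon entropy -sum p_i log p_i, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Three distributions over four outcomes, from most diffuse to most concentrated.
flat = [1 / 4] * 4
peaked = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

def truncated_flat(n):
    """Flat distribution over the first n outcomes of a countable space."""
    return [1 / n] * n

for name, dist in [("flat", flat), ("peaked", peaked), ("certain", certain)]:
    print(name, round(entropy(dist), 4))

# Progressively flatter distributions over more and more outcomes:
# the entropy (log n) keeps growing, so no member of this sequence is maximal.
for n in (10, 100, 1000):
    print(n, round(entropy(truncated_flat(n)), 4))
```

On a finite set the flat distribution comes out on top, while the second loop shows why the countably infinite case needs a further constraint: the entropies of the truncated flat distributions increase without bound.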

Let us turn now to uncountably infinite spaces. It is easy -- all too easy -- to assign equal probabilities to the points in such a space: each gets probability 0. Non-trivial probabilities arise when uncountably many of the points are clumped together in larger sets. If there are finitely many clumps, Laplace's classical theory may be appealed to again: if the evidence bears symmetrically on these clumps, each gets the same share of probability.

Enter Bertrand's paradoxes. They all arise in uncountable spaces and
turn on alternative parametrizations of a given problem that are
non-linearly related to each other. Some presentations are needlessly
arcane; length and area suffice to make the point. The following
example (adapted from van Fraassen 1989) nicely illustrates how
Bertrand-style paradoxes work. A factory produces cubes with
side-length between 0 and 1 foot; what is the probability that a
randomly chosen cube has side-length between 0 and 1/2 a foot? The
tempting answer is 1/2, as we imagine a process of production that is
uniformly distributed over side-length. But the question could have
been given an equivalent restatement: A factory produces cubes with
face-area between 0 and 1 square-feet; what is the probability that a
randomly chosen cube has face-area between 0 and 1/4 square-feet? Now
the tempting answer is 1/4, as we imagine a process of production
that is uniformly distributed over face-area. This is already
disastrous, as we cannot allow the same event to have two different
probabilities (especially if this interpretation is to be
admissible!). But there is worse to come, for the problem could have
been restated equivalently again: A factory produces cubes with
volume between 0 and 1 cubic feet; what is the probability that a
randomly chosen cube has volume between 0 and 1/8 cubic-feet? Now
the tempting answer is 1/8, as we imagine a process of production
that is uniformly distributed over volume. And so on for all of the
infinitely many equivalent reformulations of the problem (in terms of
the fourth, fifth, … power of the length, and indeed in terms of
every non-zero real-valued exponent of the length). What, then, is
*the* probability of the event in question?

The paradox arises because the principle of indifference can be used in incompatible ways. We have no evidence that favors the side-length lying in the interval [0, 1/2] over its lying in [1/2, 1], or vice versa, so the principle requires us to give probability 1/2 to each. Unfortunately, we also have no evidence that favors the face-area lying in any of the four intervals [0, 1/4], [1/4, 1/2], [1/2, 3/4], and [3/4, 1] over any of the others, so we must give probability 1/4 to each. The event ‘the side-length lies in [0, 1/2]’ is the same event as ‘the face-area lies in [0, 1/4]’, yet it receives a different probability when merely redescribed. And so it goes, for all the other reformulations of the problem. We cannot meet any pair of these constraints simultaneously, let alone all of them.
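
The conflict can be made explicit with a short calculation. The sketch below assumes, for each parametrization in turn, that the quantity side-length raised to a given power is uniformly distributed on [0, 1]; the function name `prob_side_at_most` is hypothetical.

```python
from fractions import Fraction

def prob_side_at_most(side, power):
    """P(side-length <= side) if side**power is taken to be uniform on [0, 1].

    power = 1: uniform over side-length; 2: over face-area; 3: over volume.
    The event 'length <= side' is the event 'length**power <= side**power',
    which has probability side**power under that uniform distribution.
    """
    return Fraction(side) ** power

half = Fraction(1, 2)
for name, power in [("side-length", 1), ("face-area", 2), ("volume", 3)]:
    print(name, prob_side_at_most(half, power))
# side-length 1/2
# face-area 1/4
# volume 1/8
```

One and the same event receives three different values, one for each choice of the variable over which indifference is applied.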

How does the classical theory of probability fare with respect to
our criteria of adequacy? Let us begin with admissibility. It is
claimed that (Laplacean) classical probabilities are only finitely
additive (see, e.g., de Finetti 1974). It would be more correct to
say that classical probabilities are countably additive, but
trivially so. As we have seen, classical probabilities are only
defined on finite sample spaces. The statement 3′ of countable
additivity, recall, is a conditional; its antecedent,
“{*A*_{i}} is a countably infinite collection of (pairwise)
disjoint sets,” is never satisfied in such spaces. Thus, the
conditional is vacuously true.

Classical probabilities are ascertainable, assuming that the space
of possibilities can be determined in principle. They bear a
relationship to the credences of rational agents; the concern, as we
saw above, is that the relationship is vacuous, and that rather than
*constraining* the credences of a rational agent in an
epistemically neutral position, they merely record them.

Without supplementation, the classical theory makes no contact with frequency information. However the coin happens to land in a sequence of trials, the possible outcomes remain the same. Indeed, even if we have strong empirical evidence that the coin is biased towards heads with probability, say, 0.6, it is hard to see how the unadorned classical theory can accommodate this fact -- for what now are the ten possibilities, six of which are favorable to heads? Laplace does supplement the theory with his Rule of Succession: "Thus we find that an event having occurred successively any number of times, the probability that it will happen again the next time is equal to this number increased by unity divided by the same number, increased by two units." (1951, 19) That is:

$$\Pr(\text{success on the } (N+1)\text{st trial} \mid N \text{ consecutive successes}) = \frac{N+1}{N+2}$$

Thus, inductive learning is possible. We must ask, however, whether such learning can be captured once and for all by such a simple formula, the same for all domains and events. We will return to this question when we discuss the logical interpretation below.
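
A brief sketch of the Rule of Succession in Python follows. The closed form is the one quoted above; the second function shows one standard way of obtaining it, under the assumption (not spelled out in the passage) of a uniform prior over the unknown chance *p* of success. All function names are illustrative.

```python
from fractions import Fraction

def rule_of_succession(n):
    """Laplace's rule: (n + 1) / (n + 2) after n consecutive successes."""
    return Fraction(n + 1, n + 2)

def integral_of_power(k):
    """Exact value of the integral of p**k over [0, 1], namely 1 / (k + 1)."""
    return Fraction(1, k + 1)

def predictive_from_uniform_prior(n):
    """P(success on trial n+1 | n successes), assuming a uniform prior on p.

    Numerator: integral of p**(n+1), the chance of n+1 successes in a row.
    Denominator: integral of p**n, the chance of n successes in a row.
    """
    return integral_of_power(n + 1) / integral_of_power(n)

for n in (0, 1, 9, 99):
    print(n, rule_of_succession(n), predictive_from_uniform_prior(n))
```

The two functions agree for every *n*, and the value climbs toward 1 as successes accumulate, which is the sense in which the rule permits learning from experience.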

Science apparently invokes at various points probabilities that look
classical. Bose-Einstein statistics, Fermi-Dirac statistics, and
Maxwell-Boltzmann statistics each arise by considering the ways in
which particles can be assigned to states, and then applying the
principle of indifference to different subdivisions of the set of
alternatives, Bertrand-style. The trouble is that Bose-Einstein
statistics apply to some particles (e.g. photons) and not to others,
Fermi-Dirac statistics apply to different particles (e.g.
electrons), and Maxwell-Boltzmann statistics do not apply to any
known particles. None of this can be determined *a priori*, as
the classical interpretation would have it. Moreover, the classical
theory purports to yield probability assignments in the face of
ignorance. But as Fine (1973) writes:

If we are truly ignorant about a set of alternatives, then we are also ignorant about combinations of alternatives and about subdivisions of alternatives. However, the principle of indifference when applied to alternatives, or their combinations, or their subdivisions, yields different probability assignments (170).

This brings us to one of the chief points of controversy regarding the classical interpretation. Critics accuse the principle of indifference of extracting information from ignorance. Proponents reply that it rather codifies the way in which such ignorance should be epistemically managed -- for anything other than an equal assignment of probabilities would represent the possession of some knowledge. Critics counter-reply that in a state of complete ignorance, it is better to assign vague probabilities (perhaps vague over the entire [0, 1] interval), or to eschew the assignment of probabilities altogether.
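
The dependence on how the alternatives are subdivided, which drives both the particle-statistics example and Fine's objection, can be illustrated with a toy case. The sketch below assumes two particles and two states (labels `s1`, `s2` are invented) and applies the principle of indifference to three different ways of individuating the alternatives.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, product

STATES = ("s1", "s2")
N_PARTICLES = 2

# Maxwell-Boltzmann counting: particles are distinguishable, so each
# assignment of a state to each particle is a separate alternative.
mb = list(product(STATES, repeat=N_PARTICLES))

# Bose-Einstein counting: only occupation numbers matter.
be = list(combinations_with_replacement(STATES, N_PARTICLES))

# Fermi-Dirac counting: as Bose-Einstein, but no state holds two particles.
fd = list(combinations(STATES, N_PARTICLES))

def prob_same_state(alternatives):
    """Indifference over the given alternatives: favorable / total."""
    favorable = [alt for alt in alternatives if len(set(alt)) == 1]
    return Fraction(len(favorable), len(alternatives))

print("Maxwell-Boltzmann:", prob_same_state(mb))   # 1/2
print("Bose-Einstein:", prob_same_state(be))       # 2/3
print("Fermi-Dirac:", prob_same_state(fd))         # 0
```

The same event, both particles occupying the same state, gets probability 1/2, 2/3 or 0 depending on the partition, and nothing in the principle of indifference itself selects among them.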

### 3.2 Logical probability

Logical theories of probability retain the classical
interpretation's idea that probabilities can be determined a
priori by an examination of the space of possibilities. However, they
generalize it in two important ways: the possibilities may be assigned
*unequal* weights, and probabilities can be computed whatever
the evidence may be, symmetrically balanced or not. Indeed, the
logical interpretation, in its various guises, seeks to encapsulate in
full generality the degree of support or confirmation that a piece of
evidence *E* confers upon a given hypothesis *H*, which
we may write as *c*(*H*, *E*). In doing so, it
can be regarded also as generalizing deductive logic and its notion of
implication, to a complete theory of inference equipped with the
notion of ‘degree of implication’ that relates *E* to
*H*. It is often called the theory of ‘inductive
logic’, although this is a misnomer: there is no requirement that
*E* be in any sense ‘inductive’ evidence for
*H*. ‘Non-deductive logic’ would be a better name,
yet even this overlooks the fact that deductive logic's relations
of implication and incompatibility are also accommodated as extreme
cases in which the confirmation function takes the values 1 and 0
respectively. Nevertheless, what is significant is that the logical
interpretation provides a framework for induction.

Early proponents of logical probability include Johnson (1921),
Keynes (1921), and Jeffreys (1939). However, by far the most
systematic study of logical probability was by Carnap. His formulation
of logical probability begins with the construction of a formal
language. In (1950) he considers a class of very simple languages
consisting of a finite number of logically independent monadic
predicates (naming properties) applied to countably many individual
constants (naming individuals) or variables, and the usual logical
connectives. The strongest (consistent) statements that can be made in
a given language describe all of the individuals in as much detail as
the expressive power of the language allows. They are conjunctions of
complete descriptions of each individual, each description itself a
conjunction containing exactly one occurrence (negated or unnegated)
of each predicate of the language. Call these strongest statements
*state descriptions*.

Any probability measure *m*(−) over the
state descriptions automatically extends to a measure over all
sentences, since each sentence is equivalent to a disjunction of state
descriptions; *m* in turn induces a confirmation function
*c*(−, −):

$$c(h, e) = \frac{m(h \,\&\, e)}{m(e)}$$

There are obviously infinitely many candidates for *m*, and
hence *c*, even for very simple languages. Carnap argues for
his favored measure “*m**” by insisting that the only
thing that significantly distinguishes individuals from one another is
some qualitative difference, not just a difference in labeling. Call a
*structure description* a maximal set of state descriptions,
each of which can be obtained from another by some permutation of the
individual names. *m** assigns each structure description equal
measure, which in turn is divided equally among its constituent
state descriptions. It gives greater weight to homogeneous state
descriptions than to heterogeneous ones, thus ‘rewarding’
uniformity among the individuals in accordance with putatively
reasonable inductive practice. The induced *c** allows
inductive learning from experience.

Consider, for example, a language that has three names, *a*,
*b* and *c*, for individuals, and one predicate *F*. For
this language, the state descriptions are:

1. *Fa* & *Fb* & *Fc*
2. ¬*Fa* & *Fb* & *Fc*
3. *Fa* & ¬*Fb* & *Fc*
4. *Fa* & *Fb* & ¬*Fc*
5. ¬*Fa* & ¬*Fb* & *Fc*
6. ¬*Fa* & *Fb* & ¬*Fc*
7. *Fa* & ¬*Fb* & ¬*Fc*
8. ¬*Fa* & ¬*Fb* & ¬*Fc*

There are four structure descriptions:

- {1}, “Everything is *F*”;
- {2, 3, 4}, “Two *F*s, one ¬*F*”;
- {5, 6, 7}, “One *F*, two ¬*F*s”; and
- {8}, “Everything is ¬*F*”.

The measure *m** assigns numbers to the state descriptions as
follows: first, every structure description is assigned an equal
weight, 1/4; then, each state description belonging to a given
structure description is assigned an equal part of the weight assigned
to the structure description:

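| State description | Structure description | Weight | *m** |
| --- | --- | --- | --- |
| 1. Fa.Fb.Fc | I. Everything is F | 1/4 | 1/4 |
| 2. ¬Fa.Fb.Fc | II. Two Fs, one ¬F | 1/4 | 1/12 |
| 3. Fa.¬Fb.Fc | II. Two Fs, one ¬F | 1/4 | 1/12 |
| 4. Fa.Fb.¬Fc | II. Two Fs, one ¬F | 1/4 | 1/12 |
| 5. ¬Fa.¬Fb.Fc | III. One F, two ¬Fs | 1/4 | 1/12 |
| 6. ¬Fa.Fb.¬Fc | III. One F, two ¬Fs | 1/4 | 1/12 |
| 7. Fa.¬Fb.¬Fc | III. One F, two ¬Fs | 1/4 | 1/12 |
| 8. ¬Fa.¬Fb.¬Fc | IV. Everything is ¬F | 1/4 | 1/4 |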

Notice that *m** gives greater weight to homogeneous state
descriptions than to heterogeneous ones, thus ‘rewarding’
uniformity among the individuals, in accordance with putatively
reasonable inductive practice. This will manifest itself in the
inductive support that hypotheses can gain from appropriate evidence
statements. Consider the hypothesis statement *h* =
*Fc*, true in 4 of the 8 state descriptions, with *a
priori* probability *m**(*h*) = 1/2. Suppose we
examine individual “*a*” and find it has property
*F* -- call this evidence *e*. Intuitively, *e*
is favorable (albeit weak) inductive evidence for *h*. We have:
*m**(*h* & *e*) = 1/3,
*m**(*e*) = 1/2, and hence

$$c^*(h, e) = \frac{m^*(h \,\&\, e)}{m^*(e)} = \frac{1/3}{1/2} = \frac{2}{3}.$$

This is greater than the *a priori* probability
*m**(*h*) = 1/2, so the hypothesis has been confirmed.
It can be shown that in general *m** yields a degree of
confirmation *c** that allows learning from experience.
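The arithmetic of this example can be checked mechanically. The following Python sketch is only an illustration for the three-name, one-predicate language above; the function names (`m_star`, `measure`, `confirmation`) and the encoding of state descriptions as tuples of truth values are assumptions of the sketch, not Carnap's notation. It relies on the fact that, with a single predicate, two state descriptions belong to the same structure description exactly when they contain the same number of *F*s.

```python
from fractions import Fraction
from itertools import product

NAMES = ("a", "b", "c")  # individual names of the toy language

# A state description is encoded as a tuple of truth values: is F true of a, b, c?
STATES = list(product([True, False], repeat=len(NAMES)))


def m_star(state):
    """m*: equal weight per structure description, split equally among its members."""
    n = len(state)
    k = sum(state)  # number of individuals that are F
    # With one predicate, structure descriptions are indexed by the count of Fs: 0..n.
    structure_weight = Fraction(1, n + 1)
    # State descriptions with the same count are permutations of one another.
    size = sum(1 for s in STATES if sum(s) == k)
    return structure_weight / size


def measure(sentence):
    """Extend m* to a sentence, given as a truth function on state descriptions."""
    return sum((m_star(s) for s in STATES if sentence(s)), Fraction(0))


def confirmation(h, e):
    """c*(h, e) = m*(h & e) / m*(e)."""
    return measure(lambda s: h(s) and e(s)) / measure(e)


def h(state):
    return state[2]  # hypothesis: Fc


def e(state):
    return state[0]  # evidence: Fa


print(measure(h))          # a priori probability of h
print(confirmation(h, e))  # degree of confirmation of h given e
```

Replacing `m_star` with a measure that gives every state description the same weight, 1/8, would make `confirmation(h, e)` equal to the prior of *h*, so that nothing is learned from *e*; this is one way to see what the weighting by structure descriptions contributes.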

Note, however, that infinitely many confirmation functions, defined
by suitable choices of the initial measure, allow learning from
experience. We do not yet have a reason to think that *c** is
the right choice. Carnap claims nevertheless that *c** stands
out for being simple and natural.

He later generalizes his confirmation function to a continuum of
functions
*c*_{λ}.
Define a *family* of predicates to be a set of predicates
such that, for each individual, exactly one member of the set
applies, and consider first-order languages containing a finite
number of families. Carnap (1963) focuses on the special case of a
language containing only one-place predicates. He lays down a host of
axioms concerning the confirmation function *c*, including
those induced by the probability calculus itself, various axioms of
symmetry (for example, that *c*(*h*, *e*)
remains unchanged under permutations of individuals, and of
predicates of any family), and axioms that guarantee undogmatic
inductive learning, and long-run convergence to relative
frequencies. They imply that, for a family
{*P*_{n}}, *n* = 1, …, *k*
(*k* > 2):

$$c_{\lambda}(\text{individual } s + 1 \text{ is } P_j,\ s_j \text{ of the first } s \text{ individuals are } P_j) = \frac{s_j + \lambda/k}{s + \lambda},$$

where λ is a positive real number.

The higher the value of
λ, the less impact evidence has: induction
from what is observed becomes progressively more swamped by a
classical-style equal assignment to each of the *k* possibilities
regarding individual *s* + 1.
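The effect of λ can be seen numerically. The sketch below simply evaluates the formula above; the function name `c_lambda` and the evidence figures (7 of the first 10 individuals are *P*_{j}, with *k* = 4) are hypothetical choices made for illustration.

```python
from fractions import Fraction


def c_lambda(s_j, s, k, lam):
    """Degree of confirmation that individual s + 1 is P_j,
    given that s_j of the first s individuals are P_j."""
    lam = Fraction(lam)
    return (s_j + lam / k) / (s + lam)


# Hypothetical evidence: 7 of the first 10 individuals are P_j, in a family of k = 4.
for lam in (Fraction(1, 100), 4, 100, 10**6):
    print(lam, float(c_lambda(7, 10, 4, lam)))
```

For very small λ the value stays close to the observed relative frequency 7/10; as λ grows it approaches the equal assignment 1/*k* = 1/4, whatever the evidence.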

The problem remains: what is the correct setting of λ, or said another way, how ‘inductive’ should the confirmation function be? Also, it turns out that for any such setting, a universal statement in an infinite universe always receives zero confirmation, no matter what the (finite) evidence. Many find this counterintuitive, since laws of nature with infinitely many instances can apparently be confirmed. Earman (1992) discusses the prospects for avoiding the unwelcome result.

Significantly, Carnap's various axioms of symmetry are hardly logical truths. More seriously, we cannot impose further symmetry constraints that are seemingly just as plausible as Carnap's, on pain of inconsistency -- see Fine (1973, 202). Goodman taught us: that the future will resemble the past in some respect is trivial; that it will resemble the past in all respects is contradictory. And we may continue: that a probability assignment can be made to respect some symmetry is trivial; that one can be made to respect all symmetries is contradictory. This threatens the whole program of logical probability.

Another Goodmanian lesson is that inductive logic must be sensitive to the meanings of predicates, strongly suggesting that a purely syntactic approach such as Carnap's is doomed. Scott and Krauss (1966) use model theory in their formulation of logical probability for richer and more realistic languages than Carnap's. Still, finding a canonical language seems to many to be a pipe dream, at least if we want to analyze the “logical probability” of any argument of real interest -- either in science, or in everyday life.

Logical probabilities are admissible. Countable additivity is vacuously satisfied in any finite language, since there cannot be an infinite sequence of pairwise incompatible sentences. Given a choice of language, the values of a given confirmation function are ascertainable; thus, if this language is rich enough for a given application, the relevant probabilities are ascertainable. The whole point of the theory of logical probability is to explicate ampliative inference, although given the apparent arbitrariness in the choice of language and in the setting of λ -- thus, in the choice of confirmation function -- one may wonder how well it achieves this. The problem of arbitrariness of the confirmation function also hampers the extent to which the logical interpretation can truly illuminate the connection between probabilities and frequencies.

The arbitrariness problem, moreover, stymies any compelling
connection between logical probabilities and rational credences. And a
further problem remains even after the confirmation function has been
chosen: if one's credences are to be based on logical probabilities,
they must be relativized to an evidence statement, *e*. But
which is it to be? Carnap's recommendation is that *e* should be
one's *total evidence*, that is, the maximally specific
information at one's disposal, the strongest proposition of which one
is certain. However, when we go beyond toy examples, it is not clear
that this is well-defined. Suppose I have just watched a coin toss,
and thus learned that the coin landed heads. Perhaps ‘the coin
landed heads’ is my total evidence? But I also learned a host of
other things: as it might be, that the coin landed at a certain time,
bouncing in a certain way, making a certain noise as it did so …
Call this long conjunction of facts *X*. I also learned a
potentially infinite set of *de se* propositions: ‘I
learned that *X*’, ‘I learned that I learned that
*X*’ and so on. Perhaps, then, my total evidence is the
infinite intersection of all these propositions, although this is
still not obvious -- and it is not something that can be represented
by a sentence in one of Carnap's languages, which is finite in
length. More significantly, the total evidence criterion goes hand in
hand with positivism and a foundationalist epistemology according to
which there are such determinate, ultimate deliverances of
experience. But perhaps learning does not come in the form of such
‘bedrock’ propositions, as Jeffrey (1992) has argued --
maybe it rather involves a shift in one's subjective probabilities
across a partition, without any cell of the partition becoming
certain. Then it may be the case that the strongest proposition of
which one is certain is expressed by a tautology *T* -- hardly
an interesting notion of ‘total
evidence’.^{[4]}

In connection with the ‘applicability to science’
criterion, a point due to Lakatos is telling. By Carnap's lights, the
degree of confirmation of a hypothesis depends on the language in
which the hypothesis is stated and over which the confirmation
function is defined. But scientific progress often brings with it a
change in scientific language (for example, the addition of new
predicates and the deletion of old ones), and such a change will bring
with it a change in the corresponding *c*-values. Thus, the
growth of science may overthrow any particular confirmation
theory. There is something of the snake eating its own tail here,
since logical probability was supposed to explicate the confirmation
of scientific theories.

### 3.3 Frequency Interpretations

Gamblers, actuaries and scientists have long understood that relative
frequencies bear an intimate relationship to probabilities. Frequency
interpretations posit the most intimate relationship of all:
identity. Thus, we might identify the probability of ‘heads’
on a certain coin with the number of heads in a suitable sequence
of tosses of the coin, divided by the total number of tosses. A simple
version of frequentism, which we will call *finite
frequentism*, attaches probabilities to events or attributes in a
finite reference class in such a straightforward manner:

the probability of an attribute A in a finite reference class B is the relative frequency of actual occurrences of A within B.

Thus, finite frequentism bears certain structural similarities to the
classical interpretation, insofar as it gives equal weight to each
member of a set of events, simply counting how many of them are
‘favorable’ as a proportion of the total. The crucial
difference, however, is that where the classical interpretation
counted all the *possible* outcomes of a given experiment,
finite frequentism counts *actual* outcomes. It is thus
congenial to those with empiricist scruples. It was developed by Venn
(1876), who in his discussion of the proportion of births of males and
females, concludes: “probability *is* nothing but that
proportion” (p. 84, his emphasis).

Finite frequentism gives an operational definition of probability,
and its problems begin there. For example, just as we want to allow
that our thermometers could be ill-calibrated, and could thus give
misleading measurements of temperature, so we want to allow that our
‘measurements’ of probabilities via frequencies could be
misleading, as when a fair coin lands heads 9 out of 10 times. More
than that, it seems to be built into the very notion of probability
that such misleading results can arise. Indeed, in many cases,
misleading results are guaranteed. Starting with a degenerate case:
according to the finite frequentist, a coin that is never tossed, and
that thus yields no actual outcomes whatsoever, lacks a probability
for heads altogether; yet a coin that is never measured does not
thereby lack a diameter. Perhaps even more troubling, a coin that is
tossed exactly once yields a relative frequency of heads of either 0
or 1, whatever its bias. Famous enough to merit a name of its own,
this is the so-called ‘problem of the single case’. In fact,
many events are most naturally regarded as not merely unrepeated, but
in a strong sense *unrepeatable* -- the 2000 presidential
election, the final game of the 2001 NBA play-offs, the Civil War,
Kennedy's assassination, certain events in the very early history of
the universe. Nonetheless, it seems natural to think of non-extreme
probabilities attaching to some, and perhaps all, of them. Worse
still, some cosmologists regard it as a genuinely chancy matter
whether our universe is open or closed (apparently certain quantum
fluctuations could, in principle, tip it one way or the other), yet
whatever it is, it is ‘single-case’ in the strongest
possible sense.

The problem of the single case is particularly striking, but we
really have a sequence of related problems: ‘the problem of the
double case’, ‘the problem of the triple case’ …
Every coin that is tossed exactly twice can yield only the relative
frequencies 0, 1/2 and 1, whatever its bias… A finite reference
class of size *n*, however large *n* is, can only
produce relative frequencies at a certain level of ‘grain’,
namely 1/*n*. Among other things, this rules out irrational
probabilities; yet our best physical theories say
otherwise. Furthermore, there is a sense in which any of these
problems can be transformed into the problem of the single
case. Suppose that we toss a coin a thousand times. We can regard this
as a *single* trial of a thousand-tosses-of-the-coin
experiment. Yet we do not want to be committed to saying that
*that* experiment yields its actual result with probability
1.
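The point about ‘grain’ can be made concrete with a short Python sketch. The function names are hypothetical, and the target value 1/√2 is an arbitrary irrational number chosen only for illustration, not a value drawn from any physical theory.

```python
from fractions import Fraction
import math


def possible_frequencies(n):
    """All relative frequencies that a reference class of size n can produce."""
    return [Fraction(k, n) for k in range(n + 1)]


def closest_frequency(p, n):
    """The achievable frequency k/n that lies nearest to a target value p."""
    return min(possible_frequencies(n), key=lambda f: abs(float(f) - p))


print(possible_frequencies(2))  # the 'double case'

target = 1 / math.sqrt(2)  # an irrational value, chosen only for illustration
for n in (10, 1000):
    f = closest_frequency(target, n)
    print(n, f, abs(float(f) - target))
```

Increasing *n* narrows the gap between the target and the nearest achievable frequency, but for any finite *n* every achievable value is a ratio *k*/*n*, so an irrational target is never reached exactly.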

The problem of the single case is that the finite frequentist fails
to see intermediate probabilities in various places where others
do. There is also the converse problem: the frequentist sees
intermediate probabilities in various places where others do not. Our
world has myriad different entities, with myriad different attributes.
We can group them into still more sets of objects, and then ask with
which relative frequencies various attributes occur in these
sets. Many such relative frequencies will be intermediate; the finite
frequentist automatically identifies them with intermediate
probabilities. But it would seem that whether or not they are genuine
*probabilities*, as opposed to mere tallies, depends on the
case at hand. Bare ratios of attributes among sets of disparate
objects may lack the sort of modal force that one might expect from
probabilities. I belong to the reference class consisting of myself,
the Eiffel Tower, the southernmost sandcastle on Santa Monica Beach,
and Mt Everest. Two of these four objects are less than 7 ft. tall, a
relative frequency of 1/2; moreover, we could easily extend this
class, preserving this relative frequency (or, equally easily,
not). Yet it would be odd to say that my *probability* of being
less than 7 ft. tall, relative to this reference class, is 1/2, even
though it is perfectly acceptable (if uninteresting) to say that 1/2
of the objects in the reference class are less than 7 ft. tall.

Some frequentists (notably Venn 1876, Reichenbach 1949, and von Mises
1957), partly in response to some of the problems above,
have gone on to consider *infinite* reference classes,
identifying probabilities with *limiting* relative frequencies
of events or attributes therein. Thus, we require an infinite
sequence of trials in order to define such probabilities. But what if
the actual world does not provide an infinite sequence of trials of a
given experiment? Indeed, that appears to be the norm, and perhaps
even the rule. In that case, we are to identify probability with a
*hypothetical* or *counterfactual* limiting relative
frequency. We are to imagine hypothetical infinite extensions of an
actual sequence of trials; probabilities are then what the limiting
relative frequencies *would be* if the sequence were so
extended. Note that at this point we have left empiricism behind. A
modal element has been injected into frequentism with this invocation of
a counterfactual; moreover, the counterfactual may involve a radical
departure from the way things actually are, one that may even require
the breaking of laws of nature. (Think what it would take for the
coin in my pocket, which has only been tossed once, to be tossed
infinitely many times -- never wearing out, and never running short
of people willing to toss it!) One may wonder, moreover, whether
there is always -- or ever -- a fact of the matter of what such
counterfactual relative frequencies are.

Limiting relative frequencies, we have seen, must be relativized to a
sequence of trials. Herein lies another difficulty. Consider an
infinite sequence of the results of tossing a coin, as it might be H,
T, H, H, H, T, H, T, T, … Suppose for definiteness that the
corresponding relative frequency sequence for heads, which begins 1/1,
1/2, 2/3, 3/4, 4/5, 4/6, 5/7, 5/8, 5/9, …, converges to 1/2. By
suitably reordering these results, we can make the sequence converge
to any value in [0, 1] that we like. (If this is not obvious, consider
how the relative frequency of even numbers among positive integers,
which intuitively ‘should’ converge to 1/2, can instead be
made to converge to 1/4 by reordering the integers with the even
numbers in every fourth place, as follows: 1, 3, 5, 2, 7, 9, 11, 4,
13, 15, 17, 6, …) To be sure, there may be something natural
about the ordering of the tosses as given -- for example, it may be
their *temporal* ordering. But there may be more than one
natural ordering. Imagine the tosses taking place on a train that
shunts backwards and forwards on tracks that are oriented
west-east. Then the *spatial* ordering of the results from west
to east could look very different. Why should one ordering be
privileged over others?
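The reordering of the integers described above is easy to simulate. In this Python sketch the generator and function names are hypothetical; it compares the relative frequency of even numbers in the natural ordering with their frequency in the ordering that places an even number in every fourth position.

```python
from fractions import Fraction
from itertools import count, islice


def evens_every_fourth():
    """Yield 1, 3, 5, 2, 7, 9, 11, 4, ...: three odd numbers, then one even number."""
    odds = count(1, 2)
    evens = count(2, 2)
    while True:
        for _ in range(3):
            yield next(odds)
        yield next(evens)


def even_frequency(sequence, n):
    """Relative frequency of even numbers among the first n terms of a sequence."""
    prefix = list(islice(sequence, n))
    return Fraction(sum(1 for x in prefix if x % 2 == 0), len(prefix))


print(list(islice(evens_every_fourth(), 12)))
for n in (10, 1000, 100000):
    natural = even_frequency(count(1), n)
    reordered = even_frequency(evens_every_fourth(), n)
    print(n, float(natural), float(reordered))
```

Both orderings eventually list every positive integer exactly once, yet the frequency of even numbers settles near 1/2 in one and near 1/4 in the other. Finite prefixes can of course only suggest the limiting behavior, not establish it.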

A well-known objection to any version of frequentism is that
*relative* frequencies must be *relativized* to a
reference class. Consider a probability concerning myself that I care
about -- say, my probability of living to age 80. I belong to the
class of males, the class of non-smokers, the class of philosophy
professors who have two vowels in their surname, … Presumably the
relative frequency of those who live to age 80 varies across (most
of) these reference classes. What, then, is my probability of living
to age 80? It seems that there is no single frequentist
answer. Instead, there is my probability-qua-male, my
probability-qua-non-smoker, my probability-qua-male-non-smoker, and
so on. This is an example of the so-called *reference class
problem* for frequentism (although it can be argued that
analogues of the problem arise for the other interpretations as
well^{[5]}).
And as we have seen in the previous paragraph, the problem is only
compounded for limiting relative frequencies: probabilities must be
relativized not merely to a reference class, but to a sequence within
the reference class. We might call this the *reference sequence
problem.*

The beginnings of a solution to this problem would be to restrict
our attention to sequences of a certain kind, those with certain
desirable properties. For example, there are sequences for which the
limiting relative frequency of a given attribute does not exist;
Reichenbach thus excludes such sequences. Von Mises (1957) gives us a
more thoroughgoing restriction to what he calls *collectives*
-- hypothetical infinite sequences of attributes (possible
outcomes) of specified experiments that meet certain
requirements. Call a *place selection* an effectively
specifiable method of selecting indices of members of the sequence,
such that the selection or not of the index *i* depends at most on the
first *i* − 1 attributes. The axioms are:

Axiom of Convergence: the limiting relative frequency of any attribute exists.

Axiom of Randomness: the limiting relative frequency of each attribute in a collective ω is the same in any infinite subsequence of ω which is determined by a place selection.

The probability of an attribute *A*, relative to a collective
ω, is then defined as the limiting
relative frequency of *A* in
ω. Note that a constant sequence such as H,
H, H, …, in which the limiting relative frequency is the same in
*any* infinite subsequence, trivially satisfies the axiom of
randomness. This puts some strain on the terminology -- offhand, such
sequences appear to be as *non*-random as they come -- although
to be sure it is desirable that probabilities be assigned even in such
sequences. Be that as it may, there is a parallel between the role of
the axiom of randomness in von Mises' theory and the principle of
maximum entropy in the classical theory: both attempt to capture a
certain notion of disorder.
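A rough Python sketch may help to show what the axiom of randomness excludes. It works with finite prefixes only, so it can illustrate but never establish facts about limiting relative frequencies. The helper names and the particular place selection (‘select a position whenever the preceding outcome was heads’) are assumptions made for the example.

```python
from itertools import cycle, islice


def select(sequence, rule):
    """Apply a place selection: keep a term iff rule(earlier outcomes) is true.
    The rule never sees the outcome at the position it is deciding about."""
    chosen = []
    for i, outcome in enumerate(sequence):
        if rule(sequence[:i]):
            chosen.append(outcome)
    return chosen


def heads_frequency(outcomes):
    """Relative frequency of 'H' in a finite list of outcomes."""
    return outcomes.count("H") / len(outcomes)


def after_heads(earlier):
    """Select a position exactly when the immediately preceding outcome was heads."""
    return bool(earlier) and earlier[-1] == "H"


alternating = list(islice(cycle("HT"), 1000))  # H, T, H, T, ...
constant = ["H"] * 1000                        # H, H, H, ...

print(heads_frequency(alternating), heads_frequency(select(alternating, after_heads)))
print(heads_frequency(constant), heads_frequency(select(constant, after_heads)))
```

In the alternating sequence the frequency of heads is 1/2 overall but 0 in the selected subsequence, so a sequence continuing in this pattern would not count as a collective. In the constant sequence the selection makes no difference, which is the sense in which such a sequence satisfies the axiom trivially.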

Collectives are abstract mathematical objects that are not empirically instantiated, but that are nonetheless posited by von Mises to explain the stabilities of relative frequencies in the behavior of actual sequences of outcomes of a repeatable random experiment. Church (1940) renders precise the notion of a place selection as a recursive function. Nevertheless, the reference sequence problem remains: probabilities must always be relativized to a collective, and for a given attribute such as ‘heads’ there are infinitely many. Von Mises embraces this consequence, insisting that the notion of probability only makes sense relative to a collective. In particular, he regards single case probabilities as nonsense: “We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase ‘probability of death’, when it refers to a single person, has no meaning at all for us” (11). Some critics believe that rather than solving the problem of the single case, this merely ignores it. And note that von Mises understates the commitments of his theory: by his lights, the phrase ‘probability of death’ also has no meaning at all when it refers to a million people, or a billion, or any finite number.

Let us see how the frequentist interpretations fare according to our
criteria of adequacy. Finite relative frequencies of course satisfy
finite additivity. In a finite reference class, only finitely many
events can occur, so only finitely many events can have positive
relative frequency. In that case, countable additivity is satisfied
somewhat trivially: all but finitely many terms in the infinite sum
will be 0. Limiting relative frequencies violate countable additivity
(de Finetti 1972, §5.22). Indeed, the domain of definition of
limiting relative frequency is not even a field, let alone a sigma
field (de Finetti 1972, §5.8). So such relative frequencies do
not provide an admissible interpretation of Kolmogorov's
axioms. Finite frequentism has no trouble meeting the
ascertainability criterion, as finite relative frequencies are in
principle easily determined. The same cannot be said of limiting
relative frequencies. On the contrary, any finite sequence of trials
(which, after all, is all we ever see) puts literally no constraint
on the limit of an infinite sequence; still less does an
*actual* finite sequence put any constraint on the limit of an
infinite *hypothetical* sequence, however fast and loose we
play with the notion of ‘in principle’ in the ascertainability
criterion.

It might seem that the frequentist interpretations resoundingly meet
the applicability to frequencies criterion. Finite frequentism meets
it all too well, while limiting relative frequentism meets it in the
wrong way. If anything, finite frequentism makes the connection
between probabilities and frequencies *too* tight, as we have
already observed. A fair coin that is tossed a million times is very
*unlikely* to land heads *exactly* half the time; one
that is tossed a million and one times is even less likely to do so!
Facts about finite relative frequencies should serve as evidence, but
not *conclusive* evidence, for the relevant probability
assignments. Limiting relative frequentism fails to connect
probabilities with finite frequencies. It connects them with limiting
relative frequencies, of course, but again too tightly: for even in
infinite sequences, the two can come apart. (A fair coin could land
heads forever, even if it is highly unlikely to do so.) To be sure,
science has much interest in finite frequencies, and indeed working
with them is much of the business of statistics. Whether it has any
interest in highly idealized, hypothetical extensions of actual
sequences, and relative frequencies therein, is another matter. The
applicability to rational opinion goes much the same way: it is clear
that such opinion is guided by finite frequency information, unclear
that it is guided by information about limits of hypothetical
frequencies.
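The earlier remark about a fair coin tossed a million times can be checked with a short calculation. The sketch below assumes independent tosses of a fair coin, so that the number of heads is binomially distributed; the function name is hypothetical, and logarithms of factorials are used to avoid handling very large integers.

```python
import math


def prob_exactly_half(n):
    """Probability that a fair coin lands heads exactly n/2 times in n tosses."""
    if n % 2:
        return 0.0  # an odd number of tosses cannot split evenly
    k = n // 2
    # log of C(n, k) / 2**n, via lgamma(m + 1) = log(m!)
    log_p = math.lgamma(n + 1) - 2 * math.lgamma(k + 1) - n * math.log(2)
    return math.exp(log_p)


for n in (10, 1000, 1_000_000, 1_000_001):
    print(n, prob_exactly_half(n))
```

The probability of an exact half shrinks roughly in proportion to 1/√*n* for even *n*, and it is zero for a million and one tosses, since an odd number of tosses cannot be divided evenly between heads and tails.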

### 3.4 Propensity Interpretations

Like the frequency interpretations, *propensity*
interpretations locate probability ‘in the world’ rather
than in our heads or in logical abstractions. Probability is thought
of as a physical propensity, or disposition, or tendency of a given
type of physical situation to yield an outcome of a certain kind, or
to yield a long run relative frequency of such an outcome. This view
was motivated by the desire to make sense of single-case probability
attributions such as ‘the probability that this radium atom
decays in 1600 years is 1/2’. Indeed, Popper (1957) advances his
propensity theory as an account of such quantum mechanical
probabilities.

Popper develops the theory further in (1959a). For him, a
probability *p* of an outcome of a certain type is a propensity of a
repeatable experiment to produce outcomes of that type with limiting
relative frequency *p*. For instance, when we say that a coin has
probability 1/2 of landing heads when tossed, we mean that we have a
repeatable experimental set-up -- the tossing set-up -- that has a
propensity to produce a sequence of outcomes in which the limiting
relative frequency of heads is 1/2. With its heavy reliance on
limiting relative frequency, this position risks collapsing into von
Mises-style frequentism according to some critics. Giere (1973), on
the other hand, explicitly allows single-case propensities, with no
mention of frequencies: probability is just a propensity of a
repeatable experimental set-up to produce sequences of
outcomes. This, however, creates the opposite problem to Popper's:
how, then, do we get the desired connection between probabilities and
frequencies?

It is thus useful to follow Gillies (2000) in distinguishing
*long-run* propensity theories and *single-case*
propensity theories:

A long-run propensity theory is one in which propensities are associated with repeatable conditions, and are regarded as propensities to produce in a long series of repetitions of these conditions frequencies which are approximately equal to the probabilities. A single-case propensity theory is one in which propensities are regarded as propensities to produce a particular result on a specific occasion (822).

Hacking (1965) and Gillies offer long-run (though not infinitely
long-run) propensity theories; Fetzer (1982, 1983) and Miller (1994)
offer single-case propensity theories. Note that
‘propensities’ are categorically different things depending
on which sort of theory we are considering. According to the long-run
theories, propensities are tendencies to produce relative
frequencies with particular values, but the propensities are not the
probability values themselves; according to the single-case theories,
the propensities *are* the probability values. According to
Popper, for example, a fair die has a propensity -- an *extremely
strong* tendency -- to land ‘3’ with long-run relative
frequency 1/6. The small value of 1/6 does *not* measure this
tendency. According to Giere, on the other hand, the die has a
*weak* tendency to land ‘3’. The value of 1/6
*does* measure this tendency.

It seems that those theories that tie propensities to frequencies do
not provide an admissible interpretation of the (full) probability calculus,
for the same reasons that relative frequencies do not. It is *prima
facie* unclear whether single-case propensity theories obey the
probability calculus or not. To be sure, one can *stipulate*
that they do so, perhaps using that stipulation as part of the
implicit definition of propensities. Still, it remains to be shown
that there really are such things -- stipulating what a witch is does
not suffice to show that witches exist. Indeed, to claim, as Popper
does, that an experimental arrangement has a tendency to produce a
given limiting relative frequency of a particular outcome, presupposes
a kind of stability or uniformity in the workings of that arrangement
(for the limit would not exist in a suitably *unstable*
arrangement). But this is the sort of ‘uniformity of nature’
presupposition that Hume argued could not be known either *a
priori*, or empirically. Now, appeals can be made to limit
theorems -- so-called ‘laws of large numbers’ -- whose
content is roughly that under suitable conditions, such limiting
relative frequencies almost certainly exist, and equal the single-case
propensities. Still, these theorems make assumptions (e.g., that the
trials are independent and identically distributed) whose truth again
cannot be known, and must merely be postulated.

Part of the problem here, say critics, is that we do not know enough
about what propensities are to adjudicate these issues. There is
*some* property of this coin tossing arrangement such that this
coin would land heads with a certain long-run frequency, say. But as
Hitchcock (2002) points out, “calling this property a
‘propensity’ of a certain strength does little to indicate
just what this property is.” Said another way, propensity
accounts are accused of giving empty accounts of probability, à
la Molière's ‘dormitive virtue’ (Sober 2000,
64). Similarly, Gillies objects to single-case propensities on the
grounds that statements about them are untestable, and that they are
“metaphysical rather than scientific” (825). Some might
level the same charge even against long-run propensities, which are
supposedly *distinct from* the testable relative
frequencies.

This suggests that the propensity account has difficulty meeting the applicability to science criterion. Some propensity theorists (e.g., Giere) liken propensities to physical magnitudes such as electrical charge that are the province of science. But Hitchcock observes that the analogy is misleading. We can only determine the general properties of charge -- that it comes in two varieties, that like charges repel, and so on -- by empirical investigation. What investigation, however, could tell us whether or not propensities are non-negative, normalized and additive?

More promising, perhaps, is the idea that propensities are to play
certain theoretical roles, and that these place constraints on the way
they must behave, and hence what they could be (in the style of the
Ramsey/Lewis/‘Canberra plan’ approach to theoretical terms
-- see Lewis 1970 or Jackson 2000). The trouble here is that these
roles may pull in opposite directions, *overconstraining* the
problem. The first role, according to some, constrains them to obey
the probability calculus (with finite additivity); the second role,
according to others, constrains them to violate it.

On the one hand, propensities are said to constrain the degrees of
belief, or *credences*, of a rational agent. We will have more
to say in the next section about what credences are and what makes
them rational, but for now recall the ‘applicability to rational
belief’ criterion: an interpretation should clarify the role that
probabilities play in constraining the credences of rational
agents. One such putative role for propensities is codified by Lewis'
‘Principal Principle’ (1980). Roughly, the principle is that
rational credences strive to ‘track’ propensities --
sometimes called “chances” -- so that if a rational agent
knows the propensity of a given outcome, her degree of belief will be
the same. More generally, where ‘*P*’ is the
subjective probability function of a rational agent, and
‘*ch*’ is the propensity (chance) function,

$$P(A \mid ch(A) = x) = x,$$

for all *A* and for all *x* such that *P*(*ch*(*A*) = *x*) > 0.^{[6]}

For example, my degree of belief that this coin toss lands heads, given that its propensity of landing heads is 3/4, is 3/4. The Principal Principle underpins an argument (Lewis 1980) that whatever they are, propensities must obey the usual probability calculus (with finite additivity). After all, it is argued, rational credences, which are guided by them, do -- see the next section.
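A toy model may make the principle more vivid. In the Python sketch below, an agent divides her credence between two hypotheses about the chance of heads; all of the numbers and names are hypothetical, and the joint credence is constructed so as to conform to the principle rather than derived from it.

```python
from fractions import Fraction

# Hypothetical credences in two chance hypotheses for 'heads': ch(A) = 1/4 or 3/4.
credence_in_chance = {Fraction(1, 4): Fraction(2, 5), Fraction(3, 4): Fraction(3, 5)}

# Joint credence in (ch(A) = x, A is true or false), built to respect the principle.
joint = {}
for x, p in credence_in_chance.items():
    joint[(x, True)] = p * x
    joint[(x, False)] = p * (1 - x)


def conditional_on_chance(x):
    """P(A | ch(A) = x), computed from the joint credence."""
    return joint[(x, True)] / (joint[(x, True)] + joint[(x, False)])


def unconditional():
    """P(A): the credence-weighted average of the candidate chance values."""
    return sum(p for (x, truth), p in joint.items() if truth)


print(conditional_on_chance(Fraction(3, 4)))
print(unconditional())
```

An agent of this kind who is uncertain about the chance ends up with an unconditional credence equal to her expectation of the chance, which is one way of seeing how rational credences are supposed to ‘track’ propensities.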

On the other hand, Humphreys (1985) gives an influential argument
that propensities do *not* obey the probability calculus. The
idea is that the probability calculus implies *Bayes'
theorem*, which allows us to ‘invert’ a conditional
probability:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Yet propensities seem to be measures of ‘causal
tendencies’, and much as the causal relation is asymmetric, so these
propensities supposedly do not invert. Suppose that we have a test
for an illness that occasionally gives false positives and false
negatives. A given sick patient may have a (non-trivial) propensity
to give a positive test result, but it apparently makes no sense to
say that a given positive test result has a (non-trivial) propensity
to have come from a sick patient. ‘Humphreys' paradox’, as it is
known, has prompted Fetzer and Nute (in Fetzer 1981) to offer a
“probabilistic causal calculus” which looks quite different
from Kolmogorov's calculus. Thus, we have an argument that whatever
they are, propensities must *not* obey the usual probability
calculus.^{[7]}

Perhaps all this shows that the notion of ‘propensity’ bifurcates: on the one hand, there are propensities that bear an intimate connection to relative frequencies and rational credences, and that obey the probability calculus (with finite additivity); on the other hand, there are causal propensities that behave rather differently. In that case, there would be still more interpretations of probability than have previously been recognized.
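To make the inversion at issue in Humphreys' example concrete, here is a minimal sketch of Bayes' theorem applied to a diagnostic test. The prevalence, sensitivity, and false-positive rate are hypothetical figures chosen only for illustration, and the function name `invert` is our own. The calculation is unproblematic when the quantities are read as probabilities; the paradox concerns whether the inverted quantity can be read as a propensity.

```python
from fractions import Fraction

def invert(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) from P(B|A), P(A) and P(B|not-A) via Bayes' theorem."""
    # Law of total probability gives the denominator P(B).
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical figures, assumed purely for illustration.
prevalence = Fraction(1, 100)       # P(sick)
sensitivity = Fraction(9, 10)       # P(positive | sick)
false_positive = Fraction(5, 100)   # P(positive | not sick)

p_sick_given_positive = invert(sensitivity, prevalence, false_positive)
print(p_sick_given_positive)
```

On these assumed figures the 'forward' conditional probability is 9/10, while the inverted one is much smaller, because positive results from the many healthy patients swamp those from the few sick ones.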

### 3.5 Subjective probability

#### 3.5.1 Probability as degree of belief

We may characterize *subjectivism* (also known as
*personalism* and *subjective Bayesianism*) with the
slogan: ‘Probability is degree of belief’. We identify
probabilities with degrees of confidence, or credences, or
“partial” beliefs of suitable agents. Thus, we really have
*many* interpretations of probability here, as many as there
are doxastic states of suitable agents: we have Aaron's degrees of
belief, Abel's degrees of belief, Abigail's degrees of
belief, … , or better still, Aaron's degrees of
belief-at-time-*t*_{1}, Aaron's degrees of
belief-at-time-*t*_{2}, Abel's degrees of
belief-at-time-*t*_{1}, … . Of course,
we must ask what makes an agent ‘suitable’. What we might
call *unconstrained subjectivism* places no constraints on the
agents -- anyone goes, and hence anything goes. Various studies by
psychologists (see, e.g., several articles in Kahneman et al. 1982)
show that people commonly violate the usual probability calculus in
spectacular ways. We clearly do not have here an admissible
interpretation (with respect to any probability calculus), since
there is no limit to what agents might assign. Unconstrained
subjectivism is not a serious proposal.

More interesting, however, is the claim that the suitable agents
must be, in a strong sense, *rational*. Beginning with Ramsey
(1926), various subjectivists have wanted to assimilate probability
to logic by portraying probability as the logic of partial belief. A
rational agent is required to be logically consistent, now taken in a
broad sense. These subjectivists argue that this implies that the
agent obeys the axioms of probability (although perhaps with only
finite additivity), and that subjectivism is thus (to this extent)
admissible. Before we can present this argument, we must say more
about what degrees of belief are.

#### 3.5.2 The betting interpretation and the Dutch Book argument

Subjective probabilities are traditionally analyzed in terms of betting behavior. Here is a classic statement by de Finetti (1980):

Let us suppose that an individual is obliged to evaluate the rate *p* at which he would be ready to exchange the possession of an arbitrary sum *S* (positive or negative) dependent on the occurrence of a given event *E*, for the possession of the sum *pS*; we will say by definition that this number *p* is the measure of the degree of probability attributed by the individual considered to the event *E*, or, more simply, that *p* is the probability of *E* (according to the individual considered; this specification can be implicit if there is no ambiguity). (62)

This boils down to the following analysis:

Your degree of belief in *E* is *p* iff *p* units of utility is the price at which you would buy or sell a bet that pays 1 unit of utility if *E*, 0 if not *E*.

The analysis presupposes that, for any *E*, there is exactly
one such price -- let's call this the agent's *fair
price* for the bet on *E*. This presupposition may
fail. There may be no such price -- you may refuse to bet on
*E* at all (perhaps unless coerced, in which case your genuine
opinion about *E* may not be revealed), or your selling price
may differ from your buying price, as may occur if your probability
for *E* is vague. There may be more than one fair price -- you
may find a range of such prices acceptable, as may also occur if your
probability for *E* is vague. For now, however, let us waive
these concerns, and turn to an argument that uses the betting
interpretation purportedly to show that rational degrees of belief
must conform to the probability calculus (with at least finite
additivity).

A *Dutch book* (against an agent) is a series of bets, each
acceptable to the agent, but which collectively guarantee her loss,
however the world turns out. Ramsey notes, and it can be easily proven
(e.g., Skyrms 1984), that if your subjective probabilities violate the
probability calculus, then you are susceptible to a Dutch book. For
example, suppose that you violate the additivity axiom by assigning
*P*(*A*
∪ *B*) < *P*(*A*) +
*P*(*B*), where *A* and *B* are mutually
exclusive. Then a cunning bettor could buy from you a bet on
*A* ∪ *B* for *P*(*A*
∪ *B*) units, and sell you bets on
*A* and *B* individually for *P*(*A*) and
*P*(*B*) units respectively. He pockets an initial
profit of *P*(*A*) + *P*(*B*)
−
*P*(*A* ∪ *B*), and retains it
whatever happens. Ramsey offers the following influential gloss: “If
anyone's mental condition violated these laws [of the probability
calculus], his choice would depend on the precise form in which the
options were offered him, which would be absurd.” (1980, 41)

Equally important, and often neglected, is the converse theorem that
establishes how you can avoid such a predicament. If your subjective
probabilities conform to the probability calculus, then no Dutch book
can be made against you (Kemeny 1955); your probability assignments
are then said to be *coherent*. In a nutshell, conformity to
the probability calculus is necessary and sufficient for
coherence.^{[8]}
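The additivity example above can be checked numerically. The following is a minimal sketch, assuming hypothetical betting prices and our own function name `bettor_gain`: it computes the cunning bettor's net gain in each possible world when he buys the bet on *A* ∪ *B* from the agent and sells her separate bets on *A* and on *B*, all at the agent's own prices.

```python
from fractions import Fraction

# A and B are mutually exclusive, so there are three possible worlds.
WORLDS = ("A", "B", "neither")

def bettor_gain(p_a, p_b, p_a_or_b, world):
    """Bettor's net gain from buying the bet on A-or-B from the agent and
    selling her bets on A and on B (each bet pays 1 unit if it wins)."""
    premiums = p_a + p_b - p_a_or_b                 # received for A, B; paid for A-or-B
    collects = 1 if world in ("A", "B") else 0      # his bet on A-or-B pays off
    pays_on_a = 1 if world == "A" else 0            # her bet on A pays off
    pays_on_b = 1 if world == "B" else 0            # her bet on B pays off
    return premiums + collects - pays_on_a - pays_on_b

# Hypothetical prices: the first set violates additivity, the second obeys it.
incoherent = dict(p_a=Fraction(3, 10), p_b=Fraction(4, 10), p_a_or_b=Fraction(1, 2))
coherent = dict(p_a=Fraction(3, 10), p_b=Fraction(4, 10), p_a_or_b=Fraction(7, 10))

incoherent_gains = [bettor_gain(world=w, **incoherent) for w in WORLDS]
coherent_gains = [bettor_gain(world=w, **coherent) for w in WORLDS]
print(incoherent_gains, coherent_gains)
```

Against the non-additive prices the bettor's gain is the same positive amount, *P*(*A*) + *P*(*B*) − *P*(*A* ∪ *B*), in every world. Against the additive prices this particular strategy yields nothing; that illustrates, but of course does not prove, the converse theorem.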

But let us return to the betting analysis of credences. It is an
attempt to make good on Ramsey's idea that probability “is a
measurement of belief *qua* basis of action” (34). While
he regards the method of measuring an agent's credences by her
betting behavior as “fundamentally sound” (34), he recognizes
that it has its limitations.

The betting analysis gives an operational definition of subjective probability, and indeed it inherits some of the difficulties of operationalism in general, and of behaviorism in particular. For example, you may have reason to misrepresent your true opinion, or to feign having opinions that in fact you lack, by making the relevant bets (perhaps to exploit an incoherence in someone else's betting prices). Moreover, as Ramsey points out, placing the very bet may alter your state of opinion. Trivially, it does so regarding matters involving the bet itself (e.g., you suddenly increase your probability that you have just placed a bet). Less trivially, placing the bet may change the world, and hence your opinions, in other ways (betting at high stakes on the proposition ‘I will sleep well tonight’ may suddenly turn you into an insomniac). And then the bet may concern an event such that, were it to occur, you would no longer value the pay-off the same way. (During the August 11, 1999 solar eclipse in the UK, a man placed a bet that would have paid a million pounds if the world came to an end.)

These problems stem largely from taking literally the notion of
entering into a bet on *E*, with its corresponding payoffs. The
problems may be avoided by identifying your degree of belief in a
proposition with the betting price you regard as fair, whether or not
you enter into such a bet; it corresponds to the betting odds that
you believe confer no advantage or disadvantage to either side of the
bet (Howson and Urbach 1993). There is something of the Rawlsian
‘veil of ignorance’ reasoning here: imagine that you are to
set the price for the bet, but you do not yet know which side of the
bet you are to take. At your fair price, you should be indifferent
between taking either
side.^{[9]}

De Finetti speaks of “an arbitrary sum” as the prize of the
bet on *E*. The sum had better be potentially infinitely
divisible, or else probability measurements will be precise only up to
the level of ‘grain’ of the potential prizes. For example, a
sum that can be divided into only 100 parts will leave probability
measurements imprecise beyond the second decimal place, conflating
probabilities that should be distinguished (e.g., those of a logical
contradiction and of ‘a fair coin lands heads 8 times in a row’).
More significantly, if utility is not a linear function of such sums,
then the size of the prize will make a difference to the putative
probability: winning a dollar means more to a pauper than it does
to Bill Gates, and this may be reflected in their betting behaviors in
ways that have nothing to do with their genuine probability
assignments. De Finetti responds to this problem by suggesting that
the prizes be kept small; that, however, only creates the opposite
problem that agents may be reluctant to bother about trifles, as
Ramsey points out.

Better, then, to let the prizes be measured in utilities: after all, utility is infinitely divisible, and utility is a linear function of utility.

#### 3.5.3 Probabilities and utilities

Utilities (desirabilities) of outcomes, their probabilities, and
rational preferences are all intimately linked. The *Port Royal
Logic* (Arnauld, 1662) showed how utilities and probabilities
together determine rational preferences; de Finetti's betting
interpretation derives probabilities from utilities and rational
preferences; von Neumann and Morgenstern (1944) derive utilities from
probabilities and rational preferences. And most remarkably, Ramsey
(1926) (and later, Savage 1954 and Jeffrey 1966) derives *both*
probabilities *and* utilities from rational preferences
alone.

First, he defines a proposition to be *ethically neutral* --
relative to an agent and an outcome -- if the agent is indifferent
between having that outcome when the proposition is true and when it
is false. The idea is that the agent doesn't care about the ethically
neutral proposition as such -- it is a means to an end that he might
care about, but it has no intrinsic value. Now, there is a simple test
for determining whether, for a given agent, an ethically neutral
proposition *N* has probability 1/2. Suppose that the agent
prefers *A* to *B*. Then *N* has probability 1/2
iff the agent is indifferent between the gambles:

*A* if *N*, *B* if not

*B* if *N*, *A* if not.

Ramsey assumes that it does not matter what the candidates for
*A* and *B* are. We may assign arbitrarily to
*A* and *B* any two real numbers *u*(*A*)
and *u*(*B*) such that *u*(*A*) >
*u*(*B*), thought of as the desirabilities of *A*
and *B* respectively. Having done this for the one arbitrarily
chosen pair *A* and *B*, the utilities of all other
propositions are determined.

Given various assumptions about the richness of the preference space,
and certain ‘consistency assumptions’, he can define a
real-valued utility function of the outcomes *A*, *B*,
etc -- in fact, various such functions will represent the agent's
preferences. He is then able to define equality of differences in
utility for any outcomes over which the agent has preferences. It
turns out that ratios of utility-differences are invariant -- the same
whichever representative utility function we choose. This fact allows
Ramsey to define degrees of belief as ratios of such differences. For
example, suppose the agent is indifferent between *A*, and the
gamble “*B* if *X*, *C* otherwise.”
Then it follows from considerations of expected utility that her
degree of belief in *X*, *P*(*X*), is given
by:

*P*(*X*) = (*u*(*A*) − *u*(*C*)) / (*u*(*B*) − *u*(*C*))

Ramsey shows that degrees of belief so derived obey the probability calculus (with finite additivity). He calls what results “the logic of partial belief,” and indeed he opens his essay with the words “In this essay the Theory of Probability is taken as a branch of logic….”
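As a worked illustration of the ratio formula, the sketch below computes the degree of belief from three utilities and checks that it is unchanged when the utility scale is shifted and stretched. The utility values and the function name `degree_of_belief` are hypothetical, chosen only for the example.

```python
from fractions import Fraction

def degree_of_belief(u_a, u_b, u_c):
    """P(X) for an agent indifferent between A and the gamble
    'B if X, C otherwise', as a ratio of utility differences."""
    return (u_a - u_c) / (u_b - u_c)

# Hypothetical utilities on one representative scale.
u = {"A": Fraction(7), "B": Fraction(10), "C": Fraction(2)}
p = degree_of_belief(u["A"], u["B"], u["C"])

# Another representative scale: v = 3u + 4 (a positive linear transformation).
v = {name: 3 * value + 4 for name, value in u.items()}
p_rescaled = degree_of_belief(v["A"], v["B"], v["C"])

# Indifference means u(A) equals the expected utility of the gamble.
gamble_value = p * u["B"] + (1 - p) * u["C"]
print(p, p_rescaled, gamble_value)
```

Both scales yield the same ratio, which is the invariance that lets the ratio serve as a definition of degree of belief, and the expected utility of the gamble comes out equal to *u*(*A*), as indifference requires.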

Ramsey avoids some of the objections to the betting interpretation,
but not all of them. Notably, the essential appeal to gambles again
raises the concern that the wrong quantities are being measured. And
his account has new difficulties. It is unclear what facts about
agents fix their preference rankings. It is also dubious that
*consistency* requires one to have a set of preferences as
rich as Ramsey requires, or that one can find ethically neutral
propositions of probability 1/2. This in turn casts some doubt on
Ramsey's claim to assimilate probability theory to logic.

Savage (1954) likewise derives probabilities and utilities from
preferences among options that are constrained by certain putative
‘consistency’ principles. For a given set of such preferences, he
generates a class of utility functions, each a positive linear
transformation of the other (i.e. of the form *U*_{1} =
*aU*_{2} + *b*, where *a* > 0), and a
unique probability function. Together these are said to
‘represent’ the agent's preferences. Jeffrey (1966) refines
the method further. The result is a theory of decision according to
which rational choice maximizes ‘expected utility’, a certain
probability-weighted average of utilities. Some of the difficulties
with the behavioristic betting analysis of degrees of belief can now
be resolved by moving to an analysis of degrees of belief that is
functionalist in spirit. According to Lewis (1986a, 1994a), an
agent's degrees of belief are represented by the probability function
belonging to a utility function/probability function pair that best
rationalizes her behavioral dispositions, rationality being given a
decision-theoretic analysis.
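The notion of expected utility, and the sense in which utility functions related by a positive linear transformation represent the same preferences, can be sketched as follows. The states, options, probabilities, and utilities are hypothetical, as are the function names; this is an illustration under those assumptions, not a rendering of Savage's or Jeffrey's formal systems.

```python
from fractions import Fraction

def expected_utility(probabilities, utilities):
    """Probability-weighted average of the utilities of an option's outcomes."""
    return sum(p * u for p, u in zip(probabilities, utilities))

def best_option(probabilities, options):
    """Name of the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(probabilities, options[name]))

# Hypothetical agent: two states (rain, dry) and two options,
# each option listing its utility in the rain state and in the dry state.
probabilities = (Fraction(3, 10), Fraction(7, 10))
options = {"umbrella": (5, 3), "no umbrella": (0, 6)}

# A positive linear transformation U1 = a*U2 + b, here with a = 2 and b = -1.
rescaled = {name: tuple(2 * u - 1 for u in us) for name, us in options.items()}

choice = best_option(probabilities, options)
choice_rescaled = best_option(probabilities, rescaled)
print(choice, choice_rescaled)
```

The recommended option is the same on both utility scales, which is why the utility function is fixed only up to such transformations while the probability function is unique.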

There is a deep issue that underlies all of these accounts of subjective probability. They all presuppose the existence of necessary connections between desire-like states and belief-like states, rendered explicit in the connections between preferences and probabilities. In response, one might insist that such connections are at best contingent, and indeed can be imagined to be absent. Think of an idealized Zen Buddhist monk, devoid of any preferences, who dispassionately surveys the world before him, forming beliefs but no desires. It could be replied that such an agent is not so easily imagined after all -- even if the monk does not value worldly goods, he will still prefer some things to others (e.g., truth to falsehood).

Once desires enter the picture, they may also have unwanted consequences. For example, how does one separate an agent's enjoyment of, or disdain for, gambling from the value she places on the gamble itself? Ironically, a remark that Ramsey makes in his critique of the betting interpretation seems apposite here: “The difficulty is like that of separating two different co-operating forces” (1980, 35).

The betting interpretation makes subjective probabilities ascertainable to the extent that an agent's betting dispositions are ascertainable. The derivation of them from preferences makes them ascertainable to the extent that his or her preferences are known. However, it is unclear that an agent's full set of preferences is ascertainable even to himself or herself. Here a lot of weight may need to be placed on the ‘in principle’ qualification in the ascertainability criterion. The expected utility representation makes it virtually analytic that an agent should be guided by probabilities -- after all, the probabilities are her own, and they are fed into the formula for expected utility in order to determine what it is rational for her to do.

#### 3.5.4 Orthodox Bayesianism, and further constraints on rational credences

But do they function as a *good* guide? Here it is useful to
distinguish different versions of subjectivism. *Orthodox
Bayesians* in the style of de Finetti recognize no rational
constraints on subjective probabilities beyond:

- conformity to the probability calculus, and
- a rule for updating probabilities in the face of new evidence, known as *conditioning*. An agent with probability function *P*_{1}, who becomes certain of a piece of evidence *E*, should shift to a new probability function *P*_{2} related to *P*_{1} by:

(Conditioning) *P*_{2}(*X*) = *P*_{1}(*X* | *E*) (provided *P*_{1}(*E*) > 0).

This is a permissive epistemology, licensing doxastic states that we
would normally call crazy. Thus, you could assign probability 1 to
this sentence ruling the universe, while upholding such extreme
subjectivism -- provided, of course, that you assign probability 0 to
this sentence *not* ruling the universe, and that your other
probability assignments all conform to the probability calculus.
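The conditioning rule stated above can be sketched over a finite set of possible worlds. The representation (a dictionary from worlds to probabilities, with propositions as sets of worlds), the die example, and the function names are our own assumptions for illustration.

```python
from fractions import Fraction

def prob(distribution, proposition):
    """Probability of a proposition, taken as the set of worlds where it is true."""
    return sum(p for world, p in distribution.items() if world in proposition)

def condition(prior, evidence):
    """Return P2 with P2(X) = P1(X | E); undefined when P1(E) = 0."""
    p_e = prob(prior, evidence)
    if p_e == 0:
        raise ValueError("conditioning is undefined when P1(E) = 0")
    # Worlds outside E drop to 0; worlds inside E are renormalized by P1(E).
    return {world: (p / p_e if world in evidence else Fraction(0))
            for world, p in prior.items()}

# Hypothetical example: a die the agent regards as fair.
p1 = {face: Fraction(1, 6) for face in range(1, 7)}
even = {2, 4, 6}            # the evidence E
at_least_four = {4, 5, 6}   # the proposition X

p2 = condition(p1, even)
print(prob(p1, at_least_four), prob(p2, at_least_four))
```

The rule says only how the new function is obtained from the old one. It places no constraint on the prior *P*_{1} itself, which is the source of the permissiveness just noted.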

Some otherwise extreme subjectivists impose the further rationality
requirement of *regularity:* only *a priori* falsehoods
get assigned probability 0. This is sometimes also called ‘strict
coherence’, and it is advocated by authors such as Kemeny (1955),
Jeffreys (1961), Edwards et al. (1963), Shimony (1970), and Stalnaker
(1970). It is meant to capture a form of open-mindedness and
responsiveness to evidence. But then, perhaps unintuitively, someone
who assigns probability 0.999 to this sentence ruling the universe can
be judged rational, while someone who assigns it probability 0 is
judged irrational. Note also that the requirement of regularity seems
to afford a new argument for the non-existence of God as traditionally
conceived: an omniscient agent, who gives probability 1 to all truths,
would be convicted of irrationality. Thus regularity seems to require
ignorance, or false modesty. See, e.g., Levi (1978) for further
opposition to regularity.

Probabilistic coherence plays much the same role for degrees of
belief that *consistency* plays for ordinary, all-or-nothing
beliefs. What an extreme subjectivist, even one who demands
regularity, lacks is an analogue of *truth*, some yardstick for
distinguishing the ‘veridical’ probability assignments from
the rest (such as the 0.999 one above), some way in which probability
assignments are answerable to the world. It seems, then, that the
subjectivist needs something more.

And various subjectivists offer more. Having isolated the
“logic” of partial belief as conformity to the probability
calculus, Ramsey goes on to discuss what makes a degree of belief in a
proposition *reasonable*. After canvassing several possible
answers, he settles upon one that focuses on *habits* of
opinion formation -- “e.g. the habit of proceeding from the
opinion that a toadstool is yellow to the opinion that it is
unwholesome” (50). He then asks, for a person with this habit,
what probability it would be best for him to have that a given yellow
toadstool is unwholesome, and he answers that “it will in general
be equal to the proportion of yellow toadstools which are in fact
unwholesome” (50). This resonates with more recent proposals
(e.g., van Fraassen 1984, Shimony 1988) for evaluating degrees of
belief according to how closely they match the corresponding relative
frequencies -- in the jargon, how well *calibrated* they
are. Since relative frequencies obey the axioms of probability (up to
finite additivity), it is thought that rational credences, which
strive to track them, should do so
also.^{[10]}
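One simple way to picture calibration is to group an agent's judgments by the probability assigned and compare each assigned value with the relative frequency of truths in that group. The sketch below does this for a small, made-up record of judgments; the data and the function name `calibration_table` are hypothetical, and the cited authors may well measure calibration differently.

```python
from collections import defaultdict
from fractions import Fraction

def calibration_table(judgments):
    """Map each assigned probability to the relative frequency of truths
    among the propositions that were given that probability."""
    counts = defaultdict(lambda: [0, 0])   # credence -> [number true, total]
    for credence, outcome in judgments:
        counts[credence][0] += int(outcome)
        counts[credence][1] += 1
    return {credence: Fraction(true, total)
            for credence, (true, total) in counts.items()}

# Made-up record: (probability assigned to 'this toadstool is unwholesome', truth).
high, low = Fraction(3, 4), Fraction(1, 4)
judgments = [(high, True), (high, True), (high, True), (high, False),
             (low, True), (low, True), (low, False), (low, False)]

table = calibration_table(judgments)
print(table)
```

On this record the agent is well calibrated at 3/4, since three of the four propositions given that value are true, but poorly calibrated at 1/4, where half of the propositions turn out true.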

However, rational credences may strive to track various things. For example, we are often guided by the opinions of experts. We consult our doctors on medical matters, our weather forecasters on meteorological matters, and so on. Gaifman (1988) coins the terms “expert assignment” and “expert probability” for a probability assignment that a given agent strives to track: “The mere knowledge of the [expert] assignment will make the agent adopt it as his subjective probability” (193). This idea may be codified as follows:

(Expert) *P*(*A* | *pr*(*A*) = *x*) = *x*, for all *x* such that *P*(*pr*(*A*) = *x*) > 0

where ‘*P*’ is the agent's subjective probability
function, and ‘*pr*(*A*)’ is the assignment
that the agent regards as expert. For example, if you regard the local
weather forecaster as an expert on your local weather, and she assigns
probability 0.1 to it raining tomorrow, then you may well follow
suit:

*P*(*rain* | *pr*(*rain*) = 0.1) = 0.1

More generally, we might speak of an entire probability function as
being such a guide for an agent over a specified set of
propositions. Van Fraassen (1989, 198) gives us this definition:
“If *P* is my personal probability function, then
*q* is an *expert function for me concerning* family
*F* of propositions exactly if *P*(*A* |
*q*(*A*) = *x*) = *x* for all propositions
*A* in family *F*.”

Let us define a *universal expert function* *for* a
given rational agent as one that would guide *all* of that
agent's probability assignments in this way: an expert function
for the agent concerning all propositions. Van Fraassen (1984, 1995a),
following Goldstein (1983), argues that an agent's *future
probability functions* are universal expert functions for that
agent. He enshrines this idea in his Reflection Principle, where
*P*_{t} is the agent's probability function at time *t*, and
*P*_{t+Δ} is her function at a later time *t*+Δ:

*P*_{t}(*A* | *P*_{t+Δ}(*A*) = *x*) = *x*, for all *A* and for all *x* such that *P*_{t}(*P*_{t+Δ}(*A*) = *x*) > 0.

The principle encapsulates a certain demand for ‘diachronic coherence’ imposed by rationality. Van Fraassen defends it with a ‘diachronic’ Dutch Book argument (one that considers bets placed at different times), and by analogizing violations of it to the sort of pragmatic inconsistency that one finds in Moore's paradox.

We may go still further. There may be universal expert functions for
*all* rational agents. Let us call such a function a
*universal expert function*, without any relativization to an
agent. The *Principle of Direct Probability* regards the
*relative frequency* function as a universal expert
function; we have already seen the importance that proponents of
calibration place on it. Let *A* be an event-type, and let
*relfreq*(*A*) be the relative frequency of *A*
(in some suitable reference class). Then for any rational agent with
probability function *P*, we have

*P*(*A* | *relfreq*(*A*) = *x*) = *x*, for all *A* and for all *x* such that *P*(*relfreq*(*A*) = *x*) > 0. (Cf. Hacking 1965.)

Lewis, as we have seen, posits a similar universal expert role for
the *objective chance function, ch*, in his *Principal
Principle*:

*P*(*A* | *ch*(*A*) = *x*) = *x*, for all *A* and for all *x* such that *P*(*ch*(*A*) = *x*) > 0.

A frequentist who thinks that chances just *are* relative
frequencies would presumably think that the Principal Principle just
*is* the Principle of Direct Probability; but Lewis' principle
may well appeal to those who have a very different view about chances
-- e.g., propensity theorists. The argument that we saw in the
previous section, using the Principal Principle to show that
propensities (chances) must obey the probability calculus, can now be
turned on its head: assuming that they *do* obey it, rational
degrees of belief, which aim to track these propensities, must do so
too.

The ultimate expert, presumably, is the *truth* function --
the function that assigns 1 to all the true propositions and 0 to all
the false ones. Knowledge of its values should surely trump knowledge
of the values assigned by human experts (including one's future
selves), frequencies, or chances. Note that for any putative expert
*q*,

*P*(*A* | *q*(*A*) = *x* ∩ *A*) = 1, for all *A* and for all *x* such that *P*(*q*(*A*) = *x* ∩ *A*) > 0

-- the truth of *A* overrides anything the expert might
say. So all of the proposed expert probabilities above should really
be regarded as defeasible. Joyce (1998) portrays the rational agent as
estimating truth values, seeking to minimize a measure of distance
between them and her probability assignments. He argues that for any
measure of distance that satisfies certain intuitive properties, any
agent who violates the probability axioms could serve this epistemic
goal better by obeying them instead, however the world turns out.
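The flavor of this dominance claim can be seen in a toy case. The sketch below uses squared distance from the truth values (a Brier-style score) as the measure of inaccuracy; that choice, the particular credences, and the function name `inaccuracy` are assumptions made for illustration, since the argument is stated for a whole class of measures rather than for this one in particular.

```python
from fractions import Fraction

def inaccuracy(credences, truth_values):
    """Squared distance between credences and truth values (Brier-style)."""
    return sum((c - t) ** 2 for c, t in zip(credences, truth_values))

# The two possible worlds, given as truth values of (A, not-A).
WORLDS = {"A true": (1, 0), "A false": (0, 1)}

# Hypothetical credences in (A, not-A): the first pair sums to more than 1.
incoherent = (Fraction(3, 5), Fraction(3, 5))
coherent = (Fraction(1, 2), Fraction(1, 2))

scores = {world: (inaccuracy(incoherent, tv), inaccuracy(coherent, tv))
          for world, tv in WORLDS.items()}

# The incoherent credences are dominated if they are less accurate in every world.
dominated = all(bad > good for bad, good in scores.values())
print(scores, dominated)
```

In both worlds the incoherent assignment lies further from the truth values than the coherent one does, so on this measure the agent would do better by obeying the axioms, however the world turns out.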

There are some unifying themes in these approaches to subjective probability. An agent's degrees of belief determine her estimates of certain quantities: the values of bets, or the desirabilities of gambles more generally, or the probability assignments of various ‘experts’ -- humans, relative frequencies, objective chances, or truth values. The laws of probability then are claimed to be constraints on these estimates: putative necessary conditions for minimizing her ‘losses’ in a broad sense, be they monetary, or measured by distances from the assignments of these experts.

## 4. Conclusion: Future Prospects?

It should be clear from the foregoing that there is still much work to be done regarding the interpretation of probability. Each interpretation that we have canvassed seems to capture some crucial insight into it, yet falls short of doing complete justice to it. Perhaps the full story about probability is something of a patchwork, with partially overlapping pieces. In that sense, the above interpretations might be regarded as complementary, although to be sure each may need some further refinement. My bet, for what it is worth, is that we will retain at least three distinct notions of probability: one quasi-logical, one objective, and one subjective.

There are already signs of the rehabilitation of classical and logical probability, and in particular the principle of indifference and the principle of maximum entropy, by authors such as Stove (1986), Bartha and Johns (2001), Festa (1993), Paris and Vencovská (1997), and Maher (2000, 2001). Relevant here may also be advances in information theory and complexity theory (see Fine 1973, Li and Vitanyi 1997). These theories have already proved to be fruitful in the study of randomness (Kolmogorov 1965, Martin-Löf 1966), which obviously is intimately related to the notion of probability. Refinements of our understanding of randomness, in turn, should have a bearing on the frequency interpretations (recall von Mises' appeal to randomness in his definition of ‘collective’), and on propensity accounts (especially those that make explicit ties to frequencies). Given the apparent connection between propensities and causation adumbrated in Section 3, powerful causal modeling techniques by authors such as Spirtes, Glymour and Scheines (1993) and Pearl (2000), and recent work on causation more generally (e.g., Hall 2003, Woodward forthcoming) should also prove fruitful here.

An outgrowth of frequentism is Lewis' (1986b, 1994b) account of
chance. It runs roughly as follows. The laws of nature are those
regularities that are theorems of *the best theory:* the true
theory of the universe that best balances simplicity, strength, and
likelihood (that is, the probability of the actual course of history,
given the theory). If any of the laws are probabilistic, then the
chances are whatever these laws say they are. Now, it is somewhat
unclear exactly what ‘simplicity’ and ‘strength’
consist in, and exactly how they are to be balanced. Perhaps insights
from statistics and computer science may be helpful here: approaches
to statistical model selection, and in particular the
‘curve-fitting’ problem, that attempt to characterize
simplicity, and its trade-off with strength -- e.g., the Akaike
Information Criterion (see Forster and Sober 1994), the Bayesian
Information Criterion (see Kieseppä 2001), Minimum Description
Length theory (see Rissanen 1999) and Minimum Message Length theory
(see Wallace and Dowe 1999).

State-of-the-art contributions to the subjectivist theory of probability include Schervish, Seidenfeld and Kadane's (2000) research on degrees of incoherence (measuring the extent of departures from obedience to the probability calculus) and on the aggregation of the opinions of multiple agents (Seidenfeld et al. 1989; see also Hild forthcoming). These promise to be fertile areas of future research. We may expect that further criteria of adequacy for subjective probabilities will be developed -- perhaps refinements of ‘scoring rules’ (Winkler 1996), and more generally, candidates for playing a role for subjective probability analogous to the role that truth plays for belief. Here we may come full circle. For belief is answerable both to logic and to objective facts. A refined account of degrees-of-belief may be answerable both to a refined quasi-logical and a refined objective notion of probability.

Well may we say that probability is a guide to life; but the task of understanding exactly how and why it is has still to be completed, and will surely prove to be a guide to future theorizing about it.

### Suggested Further Reading

Kyburg (1970) contains a vast bibliography of the literature on probability and induction pre-1970. Also useful for references before 1967 is the bibliography for “Probability” in the Macmillan *Encyclopedia of Philosophy*. Earman (1992) and Howson and Urbach (1993) have more recent bibliographies, and give detailed presentations of the Bayesian program. Skyrms (2000) is an excellent introduction to the philosophy of probability. Von Plato (1994) is more technically demanding and more historically oriented, with another extensive bibliography that has references to many landmarks in the development of probability theory in the last century. Fine (1973) is still a highly sophisticated survey of and contribution to various foundational issues in probability, with an emphasis on interpretations. Billingsley (1995) and Feller (1968) are classic textbooks on the mathematical theory of probability.^{[11]}

## Bibliography

- Arnauld, A.
*Logic, or, The Art of Thinking*("The Port Royal Logic"), 1662, tr. J. Dickoff and P. James, Indianapolis: Bobbs-Merrill, 1964 - Bartha, P. and Johns, R., 2001, “Probability and
Symmetry”,
*Philosophy of Science*68 (Proceedings), S109-S122 - Billingsley, P., 1995,
*Probability and Measure*, 3rd ed., New York: John Wiley & Sons - Carnap, R., 1950,
*Logical Foundations of Probability*, Chicago: University of Chicago Press - -----, 1952,
*The Continuum of Inductive Methods*, Chicago: University of Chicago Press - -----, 1963, “Replies and Systematic Expositions”
in
*The Philosophy of Rudolf Carnap*, P. A. Schilpp, (ed.), Open Court, Illinois: La Salle - Church, A., 1940, “On the Concept of a Random Sequence”,
*Bulletin of the American Mathematical Society*46: 130-135 - De Finetti, B., 1937, “La Prévision: Ses Lois
Logiques, Ses Sources Subjectives”,
*Annales de l”Institut Henri Poincaré*, 7: 1-68; translated as “Foresight. Its Logical Laws, Its Subjective Sources”, in*Studies in Subjective Probability*, H. E. Kyburg, Jr. and H. E. Smokler (eds.), Robert E. Krieger Publishing Company, 1980 - -----, 1972,
*Probability, Induction and Statistics*, New York: Wiley - -----, 1990, (originally published 1974),
*Theory of Probability*, Vol. 1, Wiley Classics Library, John Wiley & Sons - Earman, J., 1992,
*Bayes or Bust*, Cambridge: MIT Press - Edwards, W., Lindman, H., and Savage, L. J., 1963, “Bayesian
Statistical Inference for Psychological Research”,
*Psychological Review*LXX: 193-242 - Feller, W., 1968,
*An Introduction to Probability Theory and Its Applications*, New York: John Wiley & Sons - Festa, R., 1993,
*Optimum Inductive Methods: A Study in Inductive Probability, Bayesian Statistics, and Verisimilitude*, Dordrecht: Kluwer (Synthese Library 232) - Fetzer, J. H., 1981,
*Scientific Knowledge: Causation, Explanation, and Corroboration, Boston Studies in the Philosophy of Science*, Vol, 69, Dordrecht: D. Reidel - -----, 1982, “Probabilistic Explanations”,
*PSA*, 2: 194-207 - -----, 1983, “Probability and Objectivity in
Deterministic and Indeterministic Situations”,
*Synthese*57: 367-386 - Fine, T., 1973,
*Theories of Probability*, Academic Press
- Forster, M. and Sober, E., 1994, “How to Tell when Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions”, *British Journal for the Philosophy of Science* 45: 1-35.
- Gaifman, H., 1988, “A Theory of Higher Order Probabilities”, in *Causation, Chance, and Credence*, B. Skyrms and William L. Harper (eds.), Dordrecht: Kluwer Academic Publishers
- Giere, R. N., 1973, “Objective Single-Case Probabilities and the Foundations of Statistics”, in *Logic, Methodology and Philosophy of Science* IV, P. Suppes, et al., (eds.), New York: North-Holland
- Gillies, D., 2000, “Varieties of Propensity”, *British Journal for the Philosophy of Science*, 51: 807-835
- Goldstein, M., 1983, “The Prevision of a Prevision”, *Journal of the American Statistical Association*, 78: 817-819
- Hacking, I., 1965,
*The Logic of Statistical Inference*, Cambridge: Cambridge University Press
- Hájek, A., 1997, “'Mises Redux' -- Redux: Fifteen Arguments Against Finite Frequentism”, *Erkenntnis*, 45: 209-227
- -----, forthcoming, “What Conditional Probability Could Not Be”, *Synthese*
- Hall, N., 2003, “Two Concepts of Causation”, in J. Collins, N. Hall, and L. Paul (eds.), *Counterfactuals and Causation*, MIT Press
- Hild, M., forthcoming, “Stable Aggregation of Preferences”, *Econometrica*
- Hintikka, J., 1965, “A Two-Dimensional Continuum of Inductive Methods”, in *Aspects of Inductive Logic*, J. Hintikka and P. Suppes, (eds.), Amsterdam: North-Holland
- Hitchcock, C., 2002, “Probability and Chance”, in the *International Encyclopedia of the Social and Behavioral Sciences*, vol. 18, 12,089-12,095, London: Elsevier
- Howson, C. and Urbach, P., 1993, *Scientific Reasoning: The Bayesian Approach*, Open Court, 2nd edition
- Humphreys, P., 1985, “Why Propensities Cannot Be Probabilities”, *Philosophical Review*, 94: 557-70
- Jackson, F., 2000,
*From Metaphysics to Ethics: A Defence of Conceptual Analysis*, Oxford: Oxford University Press
- Jaynes, E. T., 1968, “Prior Probabilities”, *Institute of Electrical and Electronic Engineers Transactions on Systems Science and Cybernetics*, SSC-4: 227-241
- Jeffrey, R., 1966, *The Logic of Decision*, Chicago: University of Chicago Press; 2nd ed. 1983.
- -----, 1992, *Probability and the Art of Judgment*, Cambridge: Cambridge University Press
- Jeffreys, H., 1939, *Theory of Probability*; reprinted in Oxford Classics in the Physical Sciences series, Oxford University Press, 1998.
- Johnson, W. E., 1921, *Logic*, Cambridge: Cambridge University Press
- Joyce, J., 1998, “A Nonpragmatic Vindication of Probabilism”, *Philosophy of Science*, 65 (4) 575-603
- Kahneman, D., Slovic, P. and Tversky, A., (eds.), 1982, *Judgment Under Uncertainty. Heuristics and Biases*, Cambridge: Cambridge University Press
- Kemeny, J., 1955, “Fair Bets and Inductive Probabilities”, *Journal of Symbolic Logic*, 20: 263-273
- Keynes, J. M., 1921, *A Treatise on Probability*, Macmillan and Co
- Kieseppä, I. A., 2001, “Statistical Model Selection Criteria and Bayesianism”, *Philosophy of Science* (Supplemental volume)
- Kolmogorov, A. N., 1933, *Grundbegriffe der Wahrscheinlichkeitsrechnung*, Ergebnisse Der Mathematik; translated as *Foundations of Probability*, Chelsea Publishing Company, 1950.
- -----, 1965, “Three Approaches to the Quantitative Definition of Information”, *Problemy Peredachi Informatsii*, 1: 4-7
- Kyburg, H. E., 1970, *Probability and Inductive Logic*, New York: Macmillan
- Kyburg, H. E. and Smokler, H. E., (eds.), 1980,
*Studies in Subjective Probability*, 2nd ed., Huntington, New York: Robert E. Krieger Publishing Co.
- Laplace, P. S., 1814, English edition 1951, *A Philosophical Essay on Probabilities*, New York: Dover Publications Inc.
- Levi, I., 1978, “Coherence, Regularity and Conditional Probability”, *Theory and Decision* 9: 1-15.
- Lewis, D., 1970, “How to Define Theoretical Terms”, *Journal of Philosophy* 67: 427-446
- -----, 1980, “A Subjectivist's Guide to Objective Chance”, in *Studies in Inductive Logic and Probability*, Vol. II, University of California Press, reprinted in Lewis 1986b.
- -----, 1986a, “Probabilities of Conditionals and Conditional Probabilities II”, *Philosophical Review* 95: 581-589
- -----, 1986b, *Philosophical Papers Volume II*, Oxford: Oxford University Press
- -----, 1994a, “Reduction of Mind”, in *A Companion to the Philosophy of Mind*, S. Guttenplan (ed.), Blackwell
- -----, 1994b, “Humean Supervenience Debugged”, *Mind*, 103: 473-490
- Li, M. and Vitányi, P., 1997, *An Introduction to Kolmogorov Complexity and Its Applications*, 2nd ed., New York: Springer
- Maher, P., 2000, “Probabilities for Two Properties”, *Erkenntnis* 52: 63-91
- -----, 2001, “Probabilities for Multiple Properties: The Models of Hesse and Carnap and Kemeny”, *Erkenntnis* 55: 183-216
- Martin-Löf, P., 1966, “The Definition of Random Sequences”, *Information and Control*, 9: 602-619
- Miller, D. W., 1994,
*Critical Rationalism: A Restatement and Defence*, Chicago and La Salle, IL: Open Court.
- Paris, J. and Vencovská, A., 1997, “In Defence of the Maximum Entropy Inference Process”, *International Journal of Approximate Reasoning*, 17: 77-103
- Pearl, J., 2000, *Causality*, Cambridge: Cambridge University Press
- Popper, Karl R., 1957, “The Propensity Interpretation of the Calculus of Probability and the Quantum Theory”, in S. Körner (ed.), *The Colston Papers*, 9: 65-70
- -----, 1959a, “The Propensity Interpretation of Probability”, *British Journal for the Philosophy of Science* 10: 25-42
- -----, 1959b, *The Logic of Scientific Discovery*, Basic Books; reprint edition 1992, Routledge
- Ramsey, F. P., 1926, “Truth and Probability”, in *Foundations of Mathematics and other Essays*, R. B. Braithwaite (ed.), Routledge & P. Kegan, 1931, 156-198; reprinted in *Studies in Subjective Probability*, H. E. Kyburg, Jr. and H. E. Smokler (eds.), 2nd ed., R. E. Krieger Publishing Company, 1980, 23-52; reprinted in *Philosophical Papers*, D. H. Mellor (ed.), Cambridge: Cambridge University Press, 1990.
- Reichenbach, H., 1949, *The Theory of Probability*, Berkeley: University of California Press
- Renyi, A., 1970, *Foundations of Probability*, Holden-Day, Inc
- Rissanen, J., 1999, “Hypothesis Selection and Testing by the MDL Principle”, *Computer Journal* 42 (4) 260-269
- Roeper, P. and Leblanc, H., 1999, *Probability Theory and Probability Logic*, Toronto: University of Toronto Press
- Salmon, W., 1966, *The Foundations of Scientific Inference*, University of Pittsburgh Press
- Savage, L. J., 1954,
*The Foundations of Statistics*, John Wiley
- Schervish, M. J., Seidenfeld, T., and Kadane, J. B., 2000, “How sets of coherent probabilities may serve as models for degrees of incoherence”, *Journal of Uncertainty, Fuzziness, and Knowledge-based Systems* 8, No. 3 (June), 347-356
- Scott, D. and Krauss, P., 1966, “Assigning Probabilities to Logical Formulas”, in *Aspects of Inductive Logic*, J. Hintikka and P. Suppes, (eds.), Amsterdam: North-Holland
- Seidenfeld, T., Kadane, J. and Schervish, M., 1989, “On the Shared Preferences of Two Bayesian Decision Makers”, *Journal of Philosophy*, 86: 225-244
- Shimony, A., 1970, “Scientific Inference”, in *The Nature and Function of Scientific Theories*, R. Colodny (ed.), Pittsburgh: University of Pittsburgh Press
- -----, 1988, “An Adamite Derivation of the Calculus of Probability”, in *Probability and Causality*, J.H. Fetzer (ed.), Dordrecht: D. Reidel
- Skyrms, B., 1980, *Causal Necessity*, New Haven: Yale University Press
- -----, 1984, *Pragmatics and Empiricism*, New Haven: Yale University Press
- -----, 2000, *Choice and Chance*, 4th ed, Wadsworth, Inc.
- Sober, E., 2000, *Philosophy of Biology*, 2nd ed, Westview Press
- Spirtes, P., Glymour, C. and Scheines, R., 1993, *Causation, Prediction, and Search*, New York: Springer-Verlag
- Spohn, W., 1986, “The Representation of Popper Measures”, *Topoi* 5
- Stalnaker, R., 1970, “Probabilities and Conditionals”, *Philosophy of Science* 37: 64-80
- Stove, D. C., 1986,
*The Rationality of Induction*, Oxford: Oxford University Press
- van Fraassen, B., 1984, “Belief and the Will”, *Journal of Philosophy* 81: 235-256
- -----, 1989, *Laws and Symmetry*, Oxford: Clarendon Press
- -----, 1995a, “Belief and the Problem of Ulysses and the Sirens”, *Philosophical Studies* 77: 7-37
- -----, 1995b, “Fine-grained Opinion, Conditional Probability, and the Logic of Belief”, *Journal of Philosophical Logic* 24: 349-377
- Venn, J., 1876, *The Logic of Chance*, 2nd ed., Macmillan and Co; reprinted, New York, 1962.
- von Mises, R., 1957, *Probability, Statistics and Truth*, revised English edition, New York: Macmillan
- von Neumann, J. and Morgenstern, O., 1944, *Theory of Games and Economic Behavior*, Princeton: Princeton University Press; New York: John Wiley and Sons, 1964.
- von Plato, J., 1994, *Creating Modern Probability*, Cambridge: Cambridge University Press
- Wallace, C. S. and Dowe, D. L., 1999, “Minimum Message Length and Kolmogorov Complexity”, *Computer Journal* (special issue on Kolmogorov complexity), 42 (4) 270-283
- Winkler, R. L., 1996, “Scoring Rules and the Evaluation of Probabilities”, *Test*, 5 (1) 1-60
- Woodward, J., forthcoming, *A Theory of Explanation: Causation, Invariance and Intervention*, Oxford: Oxford University Press

## Other Internet Resources

- "Probability", short entry by Michael Cohen (U. Wales) in *The Oxford Companion to Philosophy*, hosted at xrefer.com.
- "Difficulties of the standard interpretations of probability", by Lazlo E. Szabo (History and Philosophy of Science, Eötvös University, Budapest)
- "Probability" (in PDF), lectures by Paul Bartha (Philosophy, University of British Columbia).