User-friendly probability calculus & Bayesian confirmation theory
Recorded at Logical Methods, Bristol (2005), featuring Branden Fitelson. From the Michael Wright Collection, held by the Archive Trust for Research in Mathematical Sciences & Philosophy.
- Identifier
- mw0000653-cc-b
- Format
- Audio recording
- Collection
- Michael Wright Collection
- Repository
- Archive Trust for Research in Mathematical Sciences & Philosophy
- Rights
- Made available for personal scholarly use. Rights in recordings are generally held by the speakers or their estates. If you believe this recording infringes your rights, please contact [email protected].
Read the automatically generated transcript
This transcript was generated by speech-recognition software from an archival recording and has not been hand-corrected. It will contain recognition errors — particularly for proper names and technical terminology — so please verify against the audio before quoting. Timestamps play the recording from that moment.
0:00 And he's going to talk about a decision procedure for the probability calculus, with applications. Thank you. I'll start by thanking the organizers. This has really been a wonderful experience, and a pleasure to be invited and to participate. My talk is going to be something completely different from a lot of the other talks. It's a really interesting project, though. Basically it has two parts. I'm going to begin with a philosophical motivation for the project before I get into the more technical side of things. And it isn't going to be a very technical talk; really, high school algebra is probably all you'll need to understand things. The motivation comes from a problem in Bayesian confirmation theory, and I'll tell you what that is in the first part of the talk. I'll give you all the background. I'm going to give you an overview of Bayesian confirmation theory, which is basically just an application of the probability calculus in philosophy of science, and it's very simple. There's a problem that I discovered, or popularized, with this theory. I'll tell you what that problem is. There's a mathematical aspect of the problem, and on the mathematical side it leads to a whole bunch of questions about the validity of arguments in the probability calculus. Okay, so basically there are tons of arguments that we want to know whether they're valid or not, and it's not entirely trivial to determine whether they're valid or not. Okay, and so what you get is a bunch of validity problems in a certain algebraic theory, basically. And so we need a way to solve these. And the solution is going to be a decision procedure for the probability calculus. It's a completely general procedure, even to generate models. Okay, so I'm going to talk about the probability calculus in the second part of the talk, which is more technical and historical. I'll tell you about the probability calculus from both an axiomatic and an algebraic point of view.
And then I'll be focusing on the algebraic representation, because that's where the decision procedure comes in. And this is something that goes back to Tarski, of course, who gave a decision procedure for a much more general algebraic theory, the theory of real closed fields. So I'll describe how you can embed the probability calculus in the theory of real closed fields
2:30 and back to them. So the problem is we're interested in a very weak fragment of that theory, a simple part of it. And then I'll show how to use Mathematica, in which this decision method has been implemented very recently, only in the last couple of years. That part is joint work with Jason Alexander, who helped me get this into a really polished Mathematica package this year. And then I'll demonstrate the program. I'll actually run it on lots of examples. I'll show you how you can find models and verify theorems. A lot of them are non-trivial. In fact, I've found some models that have changed people's minds about philosophical views. I'll talk about that. So there are some open questions that have been solved with it. It's real. It works, and it actually solves problems. I'm not going to, but they will be computed. And then I'll talk a little bit about scaling this up. I have a graduate student in the logic group at Berkeley, Galen Huntington, who's working on that. There's some interesting theoretical computer science there. Okay, so basically there's a philosophical problem. And that philosophical problem leads to a bunch of mathematical problems. In order to solve them, I implemented this decision method, and I'll show you how it solves those problems and other problems. So that's the topic. So in the first part I have to give you the philosophical background. Okay, so Bayesian confirmation theory. What's that? Well, here's a very quick crash course on it. According to orthodox personalistic Bayesianism, sometimes called Bayesian epistemology, rational agents have degrees of belief in propositions. Degrees of credence, if you want to call it that. And Bayesians assume that those degrees of belief are probabilities. In fact, that they're Kolmogorov probabilities. And I'll get a lot more specific about what that means later, when we get to the technical part. Basically, they satisfy certain axioms.
And it's important that we're talking about Kolmogorov probabilities. There are other ways of doing probability theory, Popper functions and so on. If you do those, then a lot of the things I say are false. So it's important we're using standard probability theory.
5:00 Okay. So just a tiny bit of notation. This is pretty much all the notation we'll need, except for basic algebraic notation. Pr of H given K: the way we're going to interpret it, it's the degree of belief that a rational agent with background knowledge K assigns to H. Call that the agent's prior probability of H. The probability of H given E and K, that's the degree of belief that this agent with background knowledge K assigns to H on the supposition that E is true; or, were they to learn E with certainty, it's the probability that they would assign to H. That's a somewhat contentious way of characterizing it, but it doesn't matter for our purposes really. Those interpretive issues aren't going to be important, although we can talk about that in as much detail as you like later. Okay, so we'll call that the posterior probability. Now here's a simple toy example, just to get our probabilistic bearings; we'll use this one here. We won't need examples more complicated than this to understand what's going on. Let H be the hypothesis that a card drawn at random from a standard 52-card deck is a spade, and let E be the proposition that the card is the ace of spades. And imagine that you observe that the card is the ace of spades. Okay, now given the standard model of random card draws, the probability of H is one fourth, because a fourth of the cards are spades and we assume the cards are equiprobable, but the probability of H given E is 1. Given that the card is the ace of spades, it must be a spade, so that probability is one. Okay, so in this case, learning E raises the probability of H. In fact, it verifies that H is true. Simple example, but we won't need any examples more complicated than that for the basic structure. Okay, that leads us to the definition of confirmation. Contemporary Bayesian confirmation theory differs from Carnap's, of course, and it also differs from Keynes's somewhat.
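The card example above can be reproduced by simple counting. The following is a minimal sketch, not anything shown in the talk; the names `DECK`, `pr`, `is_spade` and `is_ace_of_spades` are illustrative assumptions, and the only modelling assumption is that all 52 cards are equally likely.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely outcomes (assumption)


def pr(event, given=lambda card: True):
    """Probability of `event` conditional on `given`, by counting cards."""
    pool = [card for card in DECK if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))


def is_spade(card):          # hypothesis H
    return card[1] == "spades"


def is_ace_of_spades(card):  # evidence E
    return card == ("A", "spades")


prior = pr(is_spade)                              # Pr(H)
posterior = pr(is_spade, given=is_ace_of_spades)  # Pr(H | E)
print(prior, posterior)
```

Exact fractions are used so that the comparison between prior and posterior involves no rounding.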
Evidence E will confirm or support a hypothesis H if learning E raises the probability of H. In other words, if E and H are positively correlated under the agent's credence function. Okay, if learning E lowers the probability of H, if they're anti-correlated, then E disconfirms H. If E and H are independent under the agent's credence function, then they're confirmationally neutral. They're confirmationally irrelevant. Okay, so this is a probabilistic relevance theory of confirmation. It's based on this basic notion of probabilistic relevance, positive probabilistic relevance, just probability
7:30 raising. Okay, so, again, I'm assuming Kolmogorov probability theory; these claims are false for Popper functions, but that doesn't matter for us. Assuming Kolmogorov probability theory, which I'll define later, there are actually many logically equivalent ways of saying that E confirms H under a certain probability function. Now I'm just dropping the K's, because they don't matter for what I'm doing anyway. So one way of saying E confirms H is just to say that the posterior is greater than the prior. The conditional probability is greater than the unconditional probability. But that's true if and only if the probability of E given H is greater than the probability of E given the denial of H. And that's true if and only if the probability of H given E is greater than the probability of H given not-E. Okay. These are all logically equivalent. That will become a little clearer later when we look at the axiomatization. So, that's the qualitative theory of confirmation. It's kind of a unified theory, because there are lots of different ways of writing down that E and H are correlated under a probability function, but each of these is true if and only if the others are, so it's a unified theory in that sense. Okay, now that's a qualitative account of confirmation, but what if you wanted a quantitative, or comparative, or ordinal kind of notion of confirmation? Well, if you want a quantitative notion, intuitively what you can do is, for instance, take the difference between the posterior and the prior, and then you get a measure of how much the probability is raised. Okay. Or, intuitively, you could take the difference, say, between the left and right-hand sides of the second inequality. These are called likelihoods. Or you could take the difference between the left and right-hand sides of this inequality. And intuitively, these are all going to be measures, in a sense, of degree of confirmation, since all of these inequalities are logically equivalent. Okay.
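The claimed equivalence of the three inequalities can be spot-checked numerically. The sketch below is an illustration, not the speaker's procedure: it draws random joint distributions over the four combinations of H and E (every state strictly positive, so all conditional probabilities are defined) and confirms that the three tests always return the same verdict. The function names and the sampling scheme are assumptions.

```python
import random
from fractions import Fraction


def random_model(rng):
    """Random strictly positive joint distribution over the four H/E states."""
    weights = [rng.randint(1, 20) for _ in range(4)]
    total = sum(weights)
    # Order: H&E, H&~E, ~H&E, ~H&~E
    return tuple(Fraction(w, total) for w in weights)


def three_tests(model):
    he, h_not_e, not_h_e, _neither = model
    p_h, p_e = he + h_not_e, he + not_h_e
    return (
        he / p_e > p_h,                     # Pr(H|E) > Pr(H)
        he / p_h > not_h_e / (1 - p_h),     # Pr(E|H) > Pr(E|~H)
        he / p_e > h_not_e / (1 - p_e),     # Pr(H|E) > Pr(H|~E)
    )


rng = random.Random(0)
results = [three_tests(random_model(rng)) for _ in range(10_000)]
all_agree = all(len(set(verdicts)) == 1 for verdicts in results)
print(all_agree)
```

A random search like this cannot prove the equivalence; it only fails to find a counterexample. Each inequality reduces algebraically to Pr(H & E) > Pr(H) Pr(E), which is why exact rational arithmetic never shows a disagreement.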
So you could take differences, you could take ratios, you could take log ratios. There are all kinds of functions you could use here. And what they're all going to have in common is that they're all going to be relevance measures. That's the class of measures I'm interested in. And by relevance measure, I just mean a measure that's greater than zero if E and H are correlated, if E confirms H; less than zero if E disconfirms H; and zero if there's neutrality or independence. Okay,
10:00 and there are lots of ways you can take differences. You can take logarithms of ratios, and there are various other functions you can take that would be relevance measures. Okay, so on the question of different measures: certainly, there are lots of numerically distinct measures. Okay? And in fact, if you look at the literature, dozens of measures have been proposed and defended for various purposes by various Bayesians over the years, dating back at least to the turn of the 20th century. W.E. Johnson and his student John Maynard Keynes talk about some of these measures, and then there have been lots of other people who talk about them. I'm going to focus on the four most popular measures in the contemporary literature. That's just a sheer empirical, sociological fact. These are the four most frequently used in the contemporary literature. So the first one, well, that's just the simplest one, the difference between the posterior and the prior. Lots of philosophers use that one. There are a lot of references at the end; I gave references to various people who used these at the end of the paper. Okay. Or you could take the ratio, and take the logarithm only because I want it to be positive when the ratio is greater than one, that is, when there's confirmation; negative when it's less than one; and when they're equal, I want it to be zero. But the logarithm doesn't change the ordinal structure; it's an increasing function, so that's just a convention for ease of use. We could drop the log; it wouldn't really affect anything I'm going to say. So these are both based on the first inequality from the previous slide, comparing the posterior and the prior, but you could work with the second inequality instead and, say, take the logarithm of the ratio of the likelihood of H and the likelihood of the denial of H. That's called the likelihood ratio. Turing used that in the war, and I.J. Good has used that measure in lots of interesting applications.
Or you could use the third inequality and, say, take the difference between the left and right-hand sides of the third inequality. That has an intimate relationship with the difference measure, but that's not important. I just thought I'd point that out. And these are just the four most popular, as I said. You can do virtually every combination, and almost any function that is a relevance
12:30 measure has been defended in the literature. So the first part of the story, and what motivates the need for a decision procedure here, is the nature of the disagreement between these measures. Of course they're numerically different, they assign different numbers, but that's not really important. So what kind of disagreement is important here? Well, ordinal disagreements, comparative disagreements. By analogy, think about temperature. You know, the Celsius and Fahrenheit scales differ numerically, but they're both temperature scales, because if something's warmer than something else Celsius-wise, it's warmer than that thing Fahrenheit-wise. And that's what's important, getting the ordinal structure right. So what we want to look at is ordinal differences between the measures, not mere numerical differences. Okay. If there were such ordinal differences, that would have an effect on lots of arguments in the literature. I'm going to briefly survey some of those. Most famously, for instance, it's part of Bayesian lore that the observation of a black raven, call it E1, confirms the hypothesis H, that all ravens are black, more strongly than the observation of a red herring or a white shoe, call it E2, does, given what we think we actually know about the world; call that K. That's part of Bayesian lore, and that's really how Bayesians resolve the famous raven paradox. That's the contemporary way of doing it. Given the standard background assumptions that Bayesians make about K, about the background corpus, this conclusion, that E1 confirms H more strongly than E2 does, follows for some relevance measures C, but it fails to follow for some other choices of C, even holding fixed all the other assumptions in the arguments. When that happens, when you have an argument which is such that it's valid for some choices of measure but not for others, holding fixed all other assumptions, we say the argument is sensitive to choice of measure.
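To make the idea of an ordinal disagreement concrete, here is a small sketch of the four measures described above, evaluated on two made-up probability models. The labels `d`, `r`, `l`, `s` follow the letters used later in the talk, the two models `model_a` and `model_b` are hypothetical numbers chosen for illustration, and none of this is taken from the speaker's slides.

```python
from fractions import Fraction
from math import log


def measures(he, h_not_e, not_h_e, neither):
    """Four relevance measures from a joint distribution over H and E.

    Arguments are Pr(H&E), Pr(H&~E), Pr(~H&E), Pr(~H&~E).
    """
    p_h, p_e = he + h_not_e, he + not_h_e
    posterior = he / p_e
    return {
        "d": float(posterior - p_h),                           # Pr(H|E) - Pr(H)
        "r": log(float(posterior / p_h)),                      # log[Pr(H|E) / Pr(H)]
        "l": log(float((he / p_h) / (not_h_e / (1 - p_h)))),   # log[Pr(E|H) / Pr(E|~H)]
        "s": float(posterior - h_not_e / (1 - p_e)),           # Pr(H|E) - Pr(H|~E)
    }


F = Fraction
model_a = (F(1, 20), F(1, 20), F(4, 20), F(14, 20))  # prior 1/10, posterior 1/5
model_b = (F(4, 10), F(1, 10), F(1, 10), F(4, 10))   # prior 1/2, posterior 4/5

a, b = measures(*model_a), measures(*model_b)
for name in "drls":
    stronger = "A" if a[name] > b[name] else "B"
    print(name, round(a[name], 3), round(b[name], 3), "stronger case:", stronger)
```

In this toy pair the difference measure ranks case B above case A while the log-ratio measure ranks A above B, which is exactly the kind of disagreement that a mere rescaling (Celsius versus Fahrenheit) could never produce.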
So the problem that we're interested in is what kind of arguments are measure sensitive, and how many of them are there in the literature? It turns out they all are. And I mean this in a very strong sense. I don't just mean that you could gerrymander a measure such that according to this gerrymandered
15:00 measure, it gets the opposite result from the one everyone wants. That's not what I mean. What I mean is: take the four most popular measures in use, and every argument is such that it's valid for some of those and not for others, holding fixed all of the premises. Okay, so that's what this table shows. This is just a small table. I can give you a much bigger table with 35 measures and, you know, 100 arguments if you want. But this is sort of my top, whatever, 8 or 9 it is. So, you've got the raven paradox, the standard approach to that. That's going to be valid for D, R, and L, but not for S. Various resolutions of the grue paradox will be valid for some measures, but not for R, sometimes not for L, depending on how you do it. The irrelevant conjunction problem, and I can talk about any of these problems in the discussion period if you want, in as much detail as you want. That's not really important for the talk, but if you want, we can go into this in the discussion period. They're all sort of interesting. Accounts of the variety of evidence, like Horwich's, are measure sensitive. There's the old evidence problem; there are resolutions of that that are measure sensitive. And it isn't just arguments within Bayesian confirmation theory, it isn't just applications of the theory that are measure sensitive, it's criticisms of the theory that are measure sensitive. So Popper and Miller have a famous critique in their paper on the impossibility of inductive probability, and the argument that they give is measure sensitive. It goes through for a couple of these four, but not for the other two. So, you have to be careful about this if you're a Bayesian. You also have to be careful about it if you're criticizing Bayesians, because it could be a straw man. I mean, you have to make sure that you're criticizing a theory that someone's defending. Okay. Now, as I said, we can talk about these later in as much detail as you want, but it's not really important.
What's important for our purposes is this. Notice, these are all questions about the validity of arguments in the probability calculus, from a logical point of view. And I have hundreds; I mean, I've got a table with a thousand, you know, some huge number of cells. That's a lot of problems that you need to solve, and not all of them are trivial, as we'll see. There have been some surprising answers to these, especially the first one. With this one, people were very surprised and changed their minds about things as a result of the model that I found with the program that I'm going to talk about. Okay, so that's part one of the talk.
17:30 That's the problem, okay? And so it was purely instrumental. I had all these problems I needed to solve, and I couldn't do them, and I asked people, and they're like, well, I don't know, you know, and it's hard to get people to try to solve problems like this, and some of them, as I said, are not trivial. So, necessity is the mother of invention; that was the motivation. Okay, so that's part one of the talk. How are we doing so far? Is this clear? All right. Okay, so now I'm going to get a little bit more technical, but we won't really need to get that technical, okay? I mean, I'm just going to do the minimum, and I'm not going to get into all the gory details of the computations. I'll give you enough to understand what's going on. That's all you need. Okay, so first I'll talk a little bit about the axiomatic probability calculus. Here we'll get a little bit more precise about what's going on. And then I'm going to go algebraic, and then I'm going to stay algebraic, because that's where the decision procedures are. Okay, so all problems in Bayesian confirmation theory can be represented in small, finite Kolmogorov probability models. Okay, now, what's a Kolmogorov probability model? Very simple. It's just a finite Boolean algebra of propositions together with a function that maps elements of that algebra onto the unit interval, such that that function satisfies three axioms, and they're very simple. Probabilities are non-negative. Tautologies get probability 1. And if two propositions are mutually exclusive, then the probability of their disjunction is just the sum of the probabilities of the disjuncts. In other words, it's just an additive measure function on a finite Boolean algebra. Pretty simple. Now, conditional probabilities are not axiomatized in the standard approach; they're defined in terms of unconditional probabilities, and that's actually important.
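Written out, the three axioms just stated, together with the standard ratio definition of conditional probability, look as follows. The symbols (p, q for propositions of the finite Boolean algebra, ⊤ for a tautology, ⊥ for a contradiction) are notational assumptions rather than the speaker's slide notation, and the proviso Pr(q) > 0 is the usual one for the ratio definition.

```latex
\begin{align*}
  &\text{(1)}\quad \Pr(p) \ge 0 \\
  &\text{(2)}\quad \Pr(\top) = 1 \\
  &\text{(3)}\quad \Pr(p \lor q) = \Pr(p) + \Pr(q)
      \quad\text{whenever } p \land q \equiv \bot \\[4pt]
  &\text{(Def.)}\quad \Pr(p \mid q) = \frac{\Pr(p \land q)}{\Pr(q)}
      \quad\text{provided } \Pr(q) > 0
\end{align*}
```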
As I said, other people like Popper don't do it that way, but decision methods for them are a lot harder, so it's simpler to do it for this. So, the conditional probability is just defined as the ratio of the unconditional probability of the conjunction of the antecedent and the consequent, divided by the unconditional probability of the antecedent. This is very standard; this is the Kolmogorov axiomatization. Okay, so because this is just a finitely additive measure over a finite Boolean algebra, and I'm thinking of atomic Boolean algebras here, we can interpret this all in terms of areas in Venn diagrams. And in fact, almost all the problems that I'm going to discuss today, I think, can be
20:00 done with just three atomic propositions. So all you need are Boolean algebras with three events. Okay? That's all you need for the problems that I'm talking about today. I think the most I've seen in the literature is five or six. For grue, say, I think you probably need five to do it right. Okay, so here's how you might visualize this. It's just a Venn diagram of three events. I put little variable names here; just think of these as the probabilities of each of these, if you will, elements of the partition, or, if you will, the states of the Boolean algebra. Each of these represents the probability of one of the two to the n, in this case eight, states. Okay, so that means you can effect a translation of probability statements into simple algebra: either sums of real numbers on the unit interval or ratios of sums, okay? So, well, h: since the probability of the tautology has to be 1, h is just 1 minus the sum of all the other numbers, okay? So, really, we only have 7 basic variables here, not counting h. And the probability of X is just the sum of all the things in X, and so on. And then the conditional probabilities, those are just ratios of the unconditional probabilities, exactly as you would expect. What's Z? These are just propositions. Now, Z, the letter Z stands for? It's the third event. Oh, this third circle. Oh, I see. Yeah, it's just a Venn diagram. X, Y, Z, yeah. Just a Venn diagram, simple thing, three propositions. You know, the usual thing. I'm going to use truth tables, actually, because it's easier when you generalize. It's hard to do Venn diagrams with more than three events. You can do it, but you can't do it symmetrically. It's annoying. But you don't have three events. You have eight events. Well, the word event, I mean, yeah. That should be the members of the Boolean algebra. Yeah. You know, probabilists are ambiguous on this. Sometimes they'll say events for the atoms, and sometimes they'll say elementary events.
Yeah, I mean, let's just call these the atomic propositions. And then there are the states, or state descriptions, and that's what these more fine-grained things are. So, when you look at a truth table, it's a little bit easier to see. In a truth table, we've got eight states or state descriptions, and this is how you translate it into a truth table. So, in a normal truth table, you lay out all the cases, and here, the stochastic truth table just has an extra column where you put the probabilities of each of the states. So, you'll have eight real numbers, and they have
22:30 to add up to one. In fact, that's another characterization of a probability model: a variable assigned to each of the states, or state descriptions, such that they add up to one. I prefer this truth table, and the program spits out truth tables for models. It's easier to do that. It's hard to draw Venn diagrams for larger cases. Okay, so that's the basic structure that we'll need. We won't need spaces with more than three atomic propositions. So, this is all we'll need. Okay. So, that's the basic setup. So, now that we've got this translation from the probability calculus into simple algebra, we can see that all the problems we've been interested in, in terms of the validity of arguments, can be cashed out in terms of the satisfiability of sets of statements in simple algebra. All these things will be questions of the form: do there exist seven real numbers, a through g, satisfying all the members of some set of equations, inequations, and inequalities involving sums and ratios of sums? Of course, the first eight statements in this thing will always be that the numbers characterize a probability model. In other words, they have to add up to 1, and they all have to be non-negative. Those are the first eight constraints, and then you'll have the other, substantive constraints: the premises of the argument and the denial of the conclusion of the argument. If that set is satisfiable, then that means there's a model showing that the argument is not valid. It's a model in which all the premises are true, and it's a probability model, and the conclusion is false. Here's an example, and this is a very famous example. It's sort of the example you learn if you take probability theory courses, the first kind of non-trivial probability model you look at.
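The stochastic truth table and the "premises true, conclusion false" test described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the Mathematica package discussed in the talk: it checks a candidate model supplied by hand rather than searching for one, and the names `STATES`, `pr`, `is_probability_model` and `is_countermodel`, as well as the sample argument, are assumptions.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("X", "Y", "Z")
# The 2**3 = 8 state descriptions, in truth-table order (TTT, TTF, ..., FFF).
STATES = list(product([True, False], repeat=len(ATOMS)))


def is_probability_model(weights):
    """One non-negative number per state, summing to one."""
    return (len(weights) == len(STATES)
            and all(w >= 0 for w in weights)
            and sum(weights) == 1)


def pr(weights, prop, given=None):
    """Pr(prop), or Pr(prop | given) as a ratio of sums over states."""
    def mass(test):
        return sum(w for w, s in zip(weights, STATES) if test(dict(zip(ATOMS, s))))
    if given is None:
        return mass(prop)
    return mass(lambda v: prop(v) and given(v)) / mass(given)


def is_countermodel(weights, premises, conclusion):
    """True if the model makes every premise true and the conclusion false."""
    return (is_probability_model(weights)
            and all(premise(weights) for premise in premises)
            and not conclusion(weights))


# Hypothetical argument: "Y confirms X, therefore Z confirms X".
premises = [lambda w: pr(w, lambda v: v["X"], given=lambda v: v["Y"]) > pr(w, lambda v: v["X"])]
conclusion = lambda w: pr(w, lambda v: v["X"], given=lambda v: v["Z"]) > pr(w, lambda v: v["X"])

q = Fraction(1, 4)
weights = [q, q, 0, 0, 0, 0, q, q]  # X and Y always agree; Z is an independent fair coin
print(is_countermodel(weights, premises, conclusion))
```

Finding such a model automatically is the hard part that the decision procedure addresses; verifying a given model, as here, is just arithmetic.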
It's the first kind of example you'll see, and what it involves is the following fact: it's possible for three events, X, Y, and Z, or three propositions, to be such that they're pairwise independent, that is, neither positively nor negatively correlated, but they're not independent simpliciter. Another way of saying that is: the probability of the conjunction of X and Y is the same as the product of the probabilities of the conjuncts X and Y, that's just to say X and Y are independent under Pr; X and Z are independent; Y and Z are independent; but X is not independent
25:00 of the conjunction of Y and Z. That's a well-known fact, that that's possible. And on the right I've got the algebraic translations. Each independence claim just says that one sum of state variables is the product of two other sums of state variables, just doing the translation. So if you want to know, is it possible for X, Y, and Z to be pairwise independent but not independent simpliciter, that's equivalent to asking: is there an assignment of values to these real variables, such that they're all in the interval from zero to one, and they add up to one, okay, such that these equations are true, and this inequation is true? Okay, does that all make sense? Okay, and in fact, the answer is yes, there is, and I'll use the program to very quickly find the model. It's trivial for the program, an easy thing. Okay, here's a less trivial example, and one related to Bayesian confirmation theory. The one that I just talked about is kind of related to Bayesian confirmation theory because it's about independence, so you could cash it out in terms of confirmation. But here's one that is really relevant to the problem of measure sensitivity. Remember, I mentioned that the typical approach to the raven paradox only works if you choose certain measures of confirmation, but it doesn't work if you choose, for instance, S, which is the difference between the probability of H given E and the probability of H given not-E. The reason for that is this. This is the property that everyone relies on in their arguments about the raven paradox. This property says: if the posterior probability of H is greater on E1 than it is on E2, then E1 confirms H more strongly than E2 does. That's actually a very plausible principle, one that I would accept. Almost everyone does accept that. It turns out not every measure of confirmation has that intuitive property, and in particular S doesn't.
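Returning to the independence example posed above: one standard textbook model (two fair coin flips X and Y, with Z true exactly when they differ) already answers the question. The sketch below checks it with exact arithmetic. It is offered as an illustration and is not necessarily the model the speaker's program returns; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import product

# States are (x, y, z) truth values. X and Y are fair coin flips and
# Z is true exactly when X and Y differ, so only four states carry mass.
model = {state: Fraction(0) for state in product([True, False], repeat=3)}
for x, y in product([True, False], repeat=2):
    model[(x, y, x != y)] = Fraction(1, 4)


def pr(prop):
    """Probability of a proposition given as a function of (x, y, z)."""
    return sum(p for state, p in model.items() if prop(*state))


p_x = pr(lambda x, y, z: x)
p_y = pr(lambda x, y, z: y)
p_z = pr(lambda x, y, z: z)

pairwise_independent = (
    pr(lambda x, y, z: x and y) == p_x * p_y
    and pr(lambda x, y, z: x and z) == p_x * p_z
    and pr(lambda x, y, z: y and z) == p_y * p_z
)
# X versus the conjunction Y & Z: independence would require equality here.
x_independent_of_yz = (
    pr(lambda x, y, z: x and y and z) == p_x * pr(lambda x, y, z: y and z)
)
print(pairwise_independent, x_independent_of_yz)
```

In this model Pr(X & Y & Z) is 0 while Pr(X) Pr(Y & Z) is 1/8, so the three pairwise equations hold and the final inequation holds as well.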
This is somewhat embarrassing, because Jim Joyce, in his book The Foundations of Causal Decision Theory, defends the measure S because he thinks it's useful for the old evidence problem. In his entry in the Stanford Encyclopedia, in an early draft of it, he had these desiderata for measures of confirmation. And then he goes on to defend S. But it turns out S doesn't satisfy this desideratum. Well, that's very surprising that it doesn't, and he wasn't the only one who thought it did. Nobody seemed to be aware of that until I discovered the model with the program.
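A hand-built model is enough to see that S can violate the principle just described. The numbers below are hypothetical and were chosen for this illustration; they are not the model found by the speaker's program. States are triples (H, E1, E2), and for simplicity E1 and E2 are taken to be mutually exclusive.

```python
from fractions import Fraction

# Hypothetical weights out of 1000 for states (H, E1, E2).
counts = {
    (True,  True,  False): 9,
    (False, True,  False): 1,
    (True,  False, True):  400,
    (False, False, True):  100,
    (True,  False, False): 50,
    (False, False, False): 440,
}


def pr(prop, given=lambda h, e1, e2: True):
    """Pr(prop | given), computed by summing state weights."""
    numerator = sum(c for s, c in counts.items() if prop(*s) and given(*s))
    denominator = sum(c for s, c in counts.items() if given(*s))
    return Fraction(numerator, denominator)


def negate(prop):
    return lambda h, e1, e2: not prop(h, e1, e2)


def s_measure(hyp, ev):
    """S(H, E) = Pr(H | E) - Pr(H | not-E)."""
    return pr(hyp, ev) - pr(hyp, negate(ev))


H = lambda h, e1, e2: h
E1 = lambda h, e1, e2: e1
E2 = lambda h, e1, e2: e2

post1, post2 = pr(H, E1), pr(H, E2)
s1, s2 = s_measure(H, E1), s_measure(H, E2)
print(post1, post2, s1, s2)
```

Here the posterior of H is higher on E1 than on E2 (9/10 against 4/5), yet S ranks E2 as the stronger confirmer (341/500 against 49/110). The difference measure, which subtracts the same prior from both posteriors, cannot reverse the ordering in this way.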
27:30 So, to represent this problem, what we do is, well, we can easily represent this inequality; we just translate the probabilities. This inequality is this inequality where you substitute S in for this metavariable C, and then you get this sort of nasty-looking inequality there. So the question is, can one be true and two false? The answer is yes. No one knew this until we discovered a model with the program. Okay, I'll show you that in a bit. And that forced Jim Joyce to abandon the measure. Now he has a little footnote saying that he no longer defends that measure. Because the desideratum is correct. It's just the measure that's bad. As we'll see, the program is a very good tool, because it was an interesting dialectic. I came up with a model, and he said, okay, yeah, but maybe there's something weird about this model. Can you add the following properties as well, and find a model that also satisfies these other conditions? And I'll do this when I demonstrate it. And I said, sure. And then he sent me some more conditions, and eventually realized, yeah, the measure's got to go. There's nothing weird about the examples. Okay, so the answer to this is yes, there is a counterexample, and I'll show you that. Okay, so now, just a little bit of theory here, but the details don't matter too much. Okay, these problems can be expressed in the theory of real closed fields. In fact, the theory of real closed fields is really way overkill for this. It's like a sledgehammer when you just need a tweezer. Okay, so what is a real closed field? A real closed field is a structure like this: the first part, which is a field, and then you have this less-than relation, which satisfies these properties. And also it's such that every positive element of F has a square root in F, and every odd-degree polynomial in F[x] has a root in F. Now, it doesn't matter. What matters is that the set of reals forms a real closed field.
That's all you need to know, okay? In fact, this is going to be overkill, because all of our problems just involve existentially quantified simple algebraic statements over the reals. It's just a string of existentials, right? You just want to know whether a certain set of equations, or inequalities, is satisfiable. So we don't need the whole theory of real closed fields, because in the whole theory you can have arbitrary quantification and a lot more complicated structure. We don't need that.
30:00 That's actually important, by the way. And at the end of the talk, I'll explain that that makes the problem simpler than you might have thought, in a useful way. Okay. All right. So now I have to plug a book. Sol and Anita Feferman have written this biography of Tarski. It's really terrific. And one of the great things about it is that it's a combination of, on the one hand, salacious details involving Tarski's sex life and other things, and on the other hand, these logical interludes that Sol wrote. So you get the life of Tarski, and then the logic, including the decision methods. The entry on decision methods for algebra and geometry is wonderful. It's so simple, you could actually teach it to grade school kids. I mean, that's how good Sol is. So I highly recommend the book. And the reason I mention this is because now I need to talk about Tarski, who was the founder of the group that I'm now in, so it's kind of a nice historical thing. Tarski, and in fact apparently this was in the 20s, when he was still in Poland, described a decision procedure for the theory of real closed fields, for the full theory. So in principle, this would give us a method for determining whether arbitrary arguments in the probability calculus are valid; in fact, a much broader class of arguments, of which this is just a special case. You may know about this, you may not. I'll just quickly go through it, and I'm not going to get into technical details, but the basic idea is the elimination of quantifiers. So, what Tarski showed is that a formula of the form "there exists x such that P(x, a)", where P(x, a) is quantifier-free, is equivalent to a quantifier-free formula Q(a) that just depends on a. And, of course, a here could be a vector of variables. Simple example.
P(x, a) might have the form f(x, a) = 0 and g(x, a) = 0, where f and g are polynomials, so that would be a case where we're looking for conditions on a under which the polynomials have a common root. It's a classical result of algebra that there is a polynomial, Q, called the resultant of f and g, which vanishes exactly when f and g have a common zero, and Tarski's method is a vast generalization of that result. He not only showed that Q exists, but he showed how to compute Q from P. Applying this procedure again and again, one quantifier after another, you can eliminate all quantifiers in a formula with nested quantifiers. And in our case, we only have existential quantifiers anyway. So the crux of the matter, really, is just eliminating a single existential quantifier. This is a piece of algebra that goes back to Sturm's theorem, which counts the
32:30 number of roots of a polynomial in an interval in terms of the alternations of sign of coefficients. Okay, I won't get into the details of the method, but let me just say something about its complexity. Tarski's method is very complex. Its complexity is not bounded by any tower of exponentials; that's bad. Even for our simple class of problems where we just have seven variables, it's really not tractable to use Tarski's method. It's not really feasible, even with today's computers. The intuitive reason it's so inefficient is because it eliminates one quantifier at a time, and each time you eliminate a quantifier, the formula expands doubly exponentially. Okay, some years later, oh, so that was in the 20s, he never published it, it kind of sat in a drawer for many years, and in the 50s, in the late 50s, McKinsey, who was at the RAND Corporation, decided finally to get this together and publish it, and it came out in the book, and that's what the reference is, but the method is much older. Now, this method just sat around since the 50s for 20 years, and it never got implemented because it was just not feasible, especially back then on the computers. Then in the 70s, George Collins came along and invented an improved quantifier elimination method called cylindrical algebraic decomposition. I'm not going to get into details, but basically, this method decomposes a set definable in Tarski's language into a disjoint union of finitely many cells, called a CAD, such that the polynomials involved in the definition of the original set do not change sign on any cell. Geometrically, existential quantification corresponds to projection onto a subspace of the dual variables. The projections of a CAD are also defined by a CAD, and the complement of a set is defined by a CAD as well.
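As an aside on the root-counting step mentioned above: the following is a minimal Python sketch of Sturm's theorem for a single polynomial with rational coefficients. It is not Tarski's or Collins' procedure, only the classical ingredient the talk refers to. The function names are illustrative assumptions, and the sketch assumes a polynomial of degree at least 1 whose interval endpoints are not multiple roots. It counts distinct real roots in a half-open interval from the sign alternations of the Sturm sequence at the two endpoints.

```python
from fractions import Fraction

def poly_eval(p, x):
    """Evaluate p (coefficients, highest degree first) at x by Horner's rule."""
    result = Fraction(0)
    for c in p:
        result = result * x + c
    return result

def strip(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def remainder(a, b):
    """Remainder of the polynomial division a / b (b has a nonzero leading term)."""
    a = list(a)
    while len(a) >= len(b) and any(a):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a = a[1:]  # the leading coefficient is now zero
    return strip(a) if a else [Fraction(0)]

def sturm_sequence(p):
    """p, p', then negated remainders until the remainder vanishes."""
    seq = [strip(p), strip(derivative(p))]
    while True:
        r = remainder(seq[-2], seq[-1])
        if not any(r):
            return seq
        seq.append([-c for c in r])

def sign_changes(seq, x):
    """Number of sign alternations in the sequence evaluated at x (zeros skipped)."""
    values = [v for v in (poly_eval(q, x) for q in seq) if v != 0]
    return sum(1 for s, t in zip(values, values[1:]) if (s > 0) != (t > 0))

def count_real_roots(p, a, b):
    """Number of distinct real roots of p in the half-open interval (a, b]."""
    seq = sturm_sequence([Fraction(c) for c in p])
    return sign_changes(seq, Fraction(a)) - sign_changes(seq, Fraction(b))

# x^3 - x has the three real roots -1, 0 and 1.
cubic = [1, 0, -1, 0]
roots_in_wide_interval = count_real_roots(cubic, -2, 2)
roots_right_of_zero = count_real_roots(cubic, 0, 2)
```

Exact rational arithmetic is used so that the sign tests are never affected by rounding, which is the same reason the talk stresses arbitrary-precision output.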
Sorry, does the complexity go up with adjacent existential quantifiers, or it's only when you've got negations, which is first? Can you take those strings in a... Oh, yeah, I'm going to get to that. no that's just that no that's when you have negations in yeah that's and that's crucial i mean and that's simpler simpler oh yeah pure the purely existential case it's not it's not a time you just got seven existentials next oh no it's not i'm going to get to that that was just for the general case no that's just but but remember this this is the method i mean this is the method i'm i just want to talk about the method for the general case and then i'll talk
35:00 about improvements of the special case. We'll get to that later. Okay, so Collins' algorithm is only double exponential in the number of variables. That's still pretty bad. And you have to be careful. When I say number of variables, I mean the underlying real variables. But now remember, the number of those variables is exponential in the number of propositions. So when a Bayesian adds one variable to their probability model, it's not just a double exponential increase in complexity using this general algorithm. It's a lot worse, because you go from seven variables to 15, and it's double exponential in those. Okay, so it's exponential, and then double exponential on that. Okay, so from a probabilist's point of view, it's still pretty complex. And it turns out this is a lower bound on the complexity of quantifier elimination for the general theory of real closed fields. You can't do any better than double exponential in the number of variables. That was proved in the 90s, pretty recent. In a sense, Collins' method eliminates all the quantifiers at once, and that's why it's so much more efficient. Okay, Michael Beeson has recently remarked, in a paper for a Turing Festschrift that recently appeared, talking about these methods: there's some hope of solving interesting, even open, problems that are too hard for humans before the exponential behavior of the algorithm takes its toll. And, in part, this research is a vindication of that remark. Since Collins, Hoon Hong has improved further on the algorithm; he came up with this partial CAD algorithm. Of course, it hasn't improved on it in principle, but for certain kinds of cases, it's slightly better. And it's been refined and implemented in a public domain program, which you can download and compile on your machine. This is not the program that I'm using. There's a nice tutorial about quantifier elimination there as well. I have a bunch of references to that.
Okay, now, some of the functionality of this partial CAD algorithm has been implemented by Strzeboński in Mathematica. This was in version 4.1. This is only a few years ago now. But that was an experimental thing which was kind of buggy. This year, only this year, it's become part of the main kernel of Mathematica in version 5. And it's pretty good on our class of problems. So Mathematica 5 comes built in with a suite of functions for CAD: CylindricalDecomposition,
37:30 which computes the CADs; Resolve, which eliminates quantifiers using the CAD method; and FindInstance, and this is the one we're going to have to use, which finds assignments to variables that satisfy some set of formulas in the theory of real closed fields, but really just simple algebra is all we're going to do. Okay, so FindInstance is the real focus here. What it does is it takes as an argument a finite set S of equations, inequations, or inequalities over some finite set of real variables, and it outputs an assignment of numbers, if one exists, to the variables which satisfies all the members of S. If S is unsatisfiable, then it outputs the empty set, and it's correct, and it's complete on these problems that we're talking about. So, using the translation procedure above and this FindInstance, basically what I did was I wrote a front end for FindInstance, okay? And I'm going to demo it in a minute. You put in some formulas in standard notation from probability theory, and some other optional arguments, which I'll talk about, to optimize things. And if there's an assignment, it spits it out, gives you a model, and it's arbitrary precision, so provided that the polynomials aren't too high in degree, it'll give you, say, rational numbers or, say, roots, or in the really complicated cases, it'll just say, here's a root, and then you can get a numerical approximation to a model. Okay, so that's basically what I did. Got a front end, so I have this function PrSAT, and basically it's just a front end that feeds into FindInstance. I used an early prototype of this for my dissertation, but now it's much more robust, and now it's been turned into a Mathematica package that you can download from my website, and Jason Alexander was very helpful this year in making this into a real package that's robust, that's usable, not just by me doing weird kludgy things.
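To make the input/output contract just described concrete, here is a small Python sketch of a satisfiability search over probability models. It is emphatically not FindInstance, CAD, or the speaker's front end: it is a naive grid search, and the name `grid_find_instance`, the constraint-as-function convention, and the `max_denominator` parameter are all assumptions made for illustration. Each state description gets a positive rational weight (so only regular models are considered), and constraints are checked exactly.

```python
from fractions import Fraction
from itertools import product

def compositions(total, parts):
    """Yield every tuple of `parts` positive integers that sums to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def grid_find_instance(constraints, n_props, max_denominator=8):
    """Search regular models whose weights share a small common denominator.

    Returns a dict mapping state descriptions to weights, or None if the
    grid contains no model. Unlike a complete decision procedure, None here
    does NOT prove that the constraints are unsatisfiable.
    """
    states = list(product([True, False], repeat=n_props))
    for denom in range(len(states), max_denominator + 1):
        for counts in compositions(denom, len(states)):
            weights = dict(zip(states, (Fraction(c, denom) for c in counts)))

            def pr(event):
                # Probability of an event = sum of weights of the states where it holds.
                return sum((w for s, w in weights.items() if event(*s)), Fraction(0))

            if all(constraint(pr) for constraint in constraints):
                return weights
    return None

# Example query: two propositions X and Y that are positively correlated.
positively_correlated = lambda pr: (
    pr(lambda x, y: x and y) > pr(lambda x, y: x) * pr(lambda x, y: y)
)
model = grid_find_instance([positively_correlated], n_props=2)
```

The design point worth noticing is the asymmetry: a returned model is a genuine witness that can be checked independently, whereas an empty result from an incomplete search like this one carries no proof, which is exactly the gap a complete procedure closes.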
Okay, now, it doesn't generate proofs, but it will give you models, or tell you there are none, and they're both very useful. So I'm going to look at examples of both kinds. Okay, so that leads me to the demo. Okay, so first thing you do, if you were to download this from my website, you would just load in this package, which has all the functions we need already in it. Okay, so that's loaded. First example, which is typical of standard probability theory: the pairwise-independent, but not-independent
40:00 So here we've got the conjunction of X and Y; the probability of that is the same as the product of the probabilities of the conjuncts, and the same for X and Z, and Y and Z. But the probability of X and Y and Z is not the product of the probabilities of X, Y, and Z. Now, whatever this outputs, this is going to output a model. And I'm just going to assign that to a variable called model1, so I can call it later. There's a couple of optional arguments that I want to explain. What the first one does is: if there is a model, then there are probably lots of models. And so this allows you to put in some equational constraints, which reduces the number of variables and makes the search much faster. Now, of course, if you don't find a model, having added additional constraints, you can't infer much, although you'd be surprised, you actually can infer a lot more. If you know a lot about probability theory, and you know what kinds of constraints are going to beg certain questions and what aren't, it's very informative. I'll say a little bit more about that in some of the later examples. So this is a simple example. Now, I can let this run with no constraints, and it would find a model in a few minutes, but I don't have time for that. So this is like a cooking show. I've already found a model, and I know there's one subject to these constraints. In fact, you can put almost any marginal constraints you want on the priors, and it'll find a model. The second optional argument is regularity. Bayesians typically assume their probability functions are regular, which means that you don't assign probability zero to things unless they're contradictions. And so that just forces it to be positive. All the numbers in the vector are positive, so you don't have any zeros or ones. So I'll run this and it'll pretty quickly find a regular model.
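For readers who want to see what such a model can look like, here is a hand-built regular model of the pairwise-independent-but-not-independent kind, checked in Python with exact fractions. This is an illustrative construction, not the model the program produced in the demo: it perturbs the uniform distribution over the eight state descriptions by an assumed amount of 1/16, up or down according to the parity of the number of true propositions. The names `model`, `pr`, and `EPS` are hypothetical.

```python
from fractions import Fraction
from itertools import product

EPS = Fraction(1, 16)  # assumed perturbation size; any 0 < EPS < 1/8 keeps the model regular

# Weight 1/8 + EPS when an even number of X, Y, Z are true, 1/8 - EPS otherwise.
model = {
    (x, y, z): Fraction(1, 8) + (EPS if (x + y + z) % 2 == 0 else -EPS)
    for x, y, z in product([True, False], repeat=3)
}

def pr(event):
    """Probability of an event given as a predicate on (x, y, z)."""
    return sum((w for s, w in model.items() if event(*s)), Fraction(0))

X = lambda x, y, z: x
Y = lambda x, y, z: y
Z = lambda x, y, z: z

def conj(*events):
    """Conjunction of events."""
    return lambda *s: all(e(*s) for e in events)

pairwise_independent = all(
    pr(conj(a, b)) == pr(a) * pr(b) for a, b in [(X, Y), (X, Z), (Y, Z)]
)
mutually_independent = pr(conj(X, Y, Z)) == pr(X) * pr(Y) * pr(Z)
regular = all(w > 0 for w in model.values())
```

Substituting the weights back into the constraints, as done here, is the "checking models is trivial" step: verifying a witness takes only arithmetic, however hard it was to find.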
And it outputs a data structure that's kind of hard to visualize, but we've got another function called truth table, which puts it in a more familiar form. So here's your truth table, and here are your probabilities. Okay, pretty nice, I think. And you can, of course, verify models trivially. You just substitute it back in. True, true, true, true. I mean, checking models is trivial. So once you've got them, you can check them very easily. Okay, here's another example, and this is the one that surprised everyone. So here I'm just going to define the measure S. Remember, it's the probability of H given E minus the probability of H given not E. Once that's defined, then I can go in and use it. And so I want a model in which
42:30 the posterior of H is greater on E1 than on E2. And in fact, I'm even going to throw these in because Jim Joyce wanted me to: E1 and E2 both confirm H. So it's not some weird case where E1 confirms, but E2 disconfirms. No, no. They both confirm. And in fact, according to S, E1 confirms less strongly than E2 does, even though E1 confers a higher probability on H than E2 does. And S is one of the few measures that violates that. Now, here I've got some non-trivial constraints because it took a lot of cooking to find it. This model takes a while to find if you just let it run without constraints. But even with the constraints, it's not totally immediate. And it's not a trivial model. 768, that's a large urn. Yeah, that's not a deck of cards. It's not obvious. So now you can visualize it in the standard way with the truth table. And you can, of course, check that it violates that dagger property very easily. And you can throw in the other conditions if you want, but I didn't do it. Now, okay, so far we've been doing all negative results. Okay, what about positive results? What about theorems? Well, the difference measure satisfies dagger. Okay, so the difference measure, which is just the difference between the posterior and the prior. If you plug that in here, well, you can't have E1 conferring as high or higher conditional probability on H than E2 does, and have D reverse the order. That can't happen, and that's pretty easy for the program to prove. That's fast. There are no such models. Okay? Yeah. So that's one that's not too hard. Here's an example that Igor Douven sent me last year involving four events. Okay?
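The S-measure counterexample to dagger discussed above can be checked by hand on a small model. The sketch below uses a hypothetical urn of 1000 balls constructed for this illustration; it is not the 768-ball model shown in the demo, and the names `counts`, `s_measure`, and `d_measure` are assumptions. It works because S equals the difference measure divided by the probability of not-E, so an improbable E1 and a very probable E2 can reverse the ordering.

```python
from fractions import Fraction

# Hypothetical urn; keys are the truth values of (H, E1, E2), values are ball counts.
counts = {
    (True, True, True): 88,    (True, True, False): 2,
    (True, False, True): 407,  (True, False, False): 3,
    (False, True, True): 5,    (False, True, False): 5,
    (False, False, True): 400, (False, False, False): 90,
}
total = sum(counts.values())

def pr(event):
    """Probability of an event given as a predicate on (h, e1, e2)."""
    return Fraction(sum(n for s, n in counts.items() if event(*s)), total)

def cond(a, b):
    """Conditional probability Pr(a | b)."""
    return pr(lambda *s: a(*s) and b(*s)) / pr(b)

def neg(a):
    return lambda *s: not a(*s)

H = lambda h, e1, e2: h
E1 = lambda h, e1, e2: e1
E2 = lambda h, e1, e2: e2

def s_measure(h, e):
    """S(H, E) = Pr(H | E) - Pr(H | not-E), as defined in the talk."""
    return cond(h, e) - cond(h, neg(e))

def d_measure(h, e):
    """Difference measure: posterior minus prior."""
    return cond(h, e) - pr(h)

both_confirm = cond(H, E1) > pr(H) and cond(H, E2) > pr(H)
higher_posterior_on_e1 = cond(H, E1) > cond(H, E2)
s_reverses_order = s_measure(H, E1) < s_measure(H, E2)
d_preserves_order = d_measure(H, E1) > d_measure(H, E2)
```

In this model the posterior of H is 9/10 on E1 and 11/20 on E2, yet S assigns 4/9 to E1 and 1/2 to E2, while the difference measure keeps the ordering of the posteriors, as the talk says it must.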
Given any finite set of propositions S, such that each proposition H in S is correlated with the conjunction of the members of any non-empty subset of S that doesn't contain H itself, does it follow that the conjunction of the members of any non-empty subset S prime of S is correlated with the conjunction of the members of any non-empty subset S double prime not overlapping with S prime? Okay, so the answer is no. Okay, but you need four events to see it. So the first set of conditions is, well, they're each pairwise correlated, and each one's correlated with all the conjunctions of two and of three of the others. But that's consistent with, in fact, a conjoined pair being independent of another disjoint conjoined pair. In
45:00 fact all three of these can be true while the first thing is true. So this is a much stronger result than Douven wanted to prove. And so I just take the union of those. Now this one's kind of hard, and, okay, but again I'm just going to show you what, okay, so that's not easy, and here's what it looks like. That's true. So how long did that take to do? Oh, that takes a while. I mean, if you just let it run, it will probably take a couple of days. Yeah, I'm going to address these complexity issues in a minute. Yeah, but you get good at, I'll say a... Am I doing pretty good on time? When did I start? 16.45, you're talking for one hour. Oh, good. Good. Let me just look at a few more examples that I kind of like under the rubric of proving theorems. Bayes' theorem. Trivial. That's a joke. The Popper-Miller theorem. Trivial. It's a joke. Popper and Miller had this theorem that the degree to which E confirms H decomposes into a sum of the degree to which E confirms H or E, plus the degree to which E confirms H or not E. They call this the deductive part, and this is the inductive part. That doesn't matter, but that theorem was essential. It's trivial for the program. It's a joke. Michael Redhead, in response, noted that, well, that's true for the difference measure, but not for the ratio measure.
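The decomposition just described, and Redhead's observation about the ratio measure, can both be checked on a toy distribution. The sketch below is a numerical illustration on one hypothetical model over H and E (the counts are made up for this example), not a proof and not the program's output. It takes the difference measure to be d(H, E) = Pr(H | E) - Pr(H) and the ratio measure to be r(H, E) = Pr(H | E) / Pr(H); the function names are assumptions.

```python
from fractions import Fraction

# Hypothetical distribution; keys are the truth values of (H, E), values are counts.
counts = {(True, True): 3, (True, False): 1, (False, True): 2, (False, False): 4}
total = sum(counts.values())

def pr(event):
    """Probability of an event given as a predicate on (h, e)."""
    return Fraction(sum(n for s, n in counts.items() if event(*s)), total)

def cond(a, b):
    """Conditional probability Pr(a | b)."""
    return pr(lambda *s: a(*s) and b(*s)) / pr(b)

H = lambda h, e: h
E = lambda h, e: e
H_or_E = lambda h, e: h or e
H_or_not_E = lambda h, e: h or not e

def d(h, e):
    """Difference measure: posterior minus prior."""
    return cond(h, e) - pr(h)

def r(h, e):
    """Ratio measure: posterior over prior."""
    return cond(h, e) / pr(h)

# Additive decomposition holds for d ...
d_decomposes = d(H, E) == d(H_or_E, E) + d(H_or_not_E, E)
# ... but not for r, additively ...
r_decomposes = r(H, E) == r(H_or_E, E) + r(H_or_not_E, E)
# ... nor for log r, since log-additivity would require the ratios to multiply.
log_r_decomposes = r(H, E) == r(H_or_E, E) * r(H_or_not_E, E)
```

The identity for d holds in every model, since d(H or E, E) = 1 - Pr(H or E) and d(H or not-E, E) = Pr(H | E) - Pr(H or not-E), and the two disjunction probabilities sum to 1 + Pr(H). For r, a single failing model like this one is enough to show the decomposition is not general.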
That's one of the few cases in the literature where someone checked how general it was, and that's easy to show with the program, right? And so there's a model in which it's false for R. R doesn't decompose in that way, which isn't surprising. Even if you take the logarithm of R, it doesn't decompose in that way, so no, that doesn't help. Carl Wagner has an interesting paper on what he calls probabilistic modus, actually modus tollens is what that should be, and modus ponens, he talks about both, but the modus tollens result's the new one. And I won't explain it, but basically these are the two central theorems in that paper. They're not trivial, but for the program they're easy, right? So that's that. Here's one of my favorites. Eells and Sober prove that if you have a causal chain that's Markovian, that is to say, if you have a chain of events such that conditional
47:30 on y, x and z are independent, and also conditional on not y, then the dependence relation is transitive on this chain. If x and y are dependent and y and z are dependent, then x and z are dependent. That's called the transitivity of probabilistic causation, and that's very easy for the program to prove as well. So there's that. It's not a necessary condition, and they give kind of a model involving zeros and ones. Here's a regular model, and that's very quick, just assuming a uniform prior. Okay, so let me just make a few, if you go back to the presentation here, make a few remarks. Okay, so this is highly effective for problems involving less than or equal to three propositions. I haven't seen any problems of that size that I can't solve with this thing. Even with four events, I've been pretty good at solving several, though they're considerably harder, with this prototype. Now, one of my projects is to implement a more efficient version of this, and there's two aspects to doing this. One is, Mathematica is an interpreted language. It's not really designed for speed. It's nice for prototyping, and it's got good notational capability and stuff, so it's a nice user interface, but it's not really designed for speed. So it'd be nice to do this in, like, Caml or ML or C or some compiled language that's faster. But there's a more important issue, which is this. In the early 90s, Canny, who's an engineer at Berkeley, proved that there's a PSPACE algorithm for the purely existential case, and that's the case we're interested in. So that's PSPACE. Now, that's asymptotically; no one has ever implemented the algorithm, and the constants are pretty large. So it's not clear whether it's really going to help with the problems that are small that we're talking about. We have a student who's doing a dissertation on that now in the logic group, implementing that and testing it on problems.
It's not been done, and it's not trivial, and it's not even clear whether it can be done. When you publish these results, then when you actually try to do it, it's a different story. Sometimes it turns out it can't be done. Now, my interest in these things is largely instrumental. Of course, it grew out of an ability to just better understand arguments in the literature and under what conditions they're valid.
50:00 And this is a great area for collaborative research. I mean, it involves philosophy of science, computer science, mathematics, logic, lots of different things. And let me just say a little bit about the instrumental side. I'll spend the last couple of minutes just saying how useful this thing is. While it's true that if you add constraints and you don't find a model, that doesn't mean the thing's a theorem, the following is also true in my experience: when you get experience empirically working with these models and different sets of conditions, you get pretty good at picking the right constraints such that they don't beg the question. Okay, so here's a good example: the raven paradox. The traditional approach in Bayesianism makes very, very strong independence assumptions. In fact, Peter Vranas has a recent paper in BJPS where he criticizes those assumptions. Jim Hawthorne and I, using this program, discovered a much weaker set of conditions that suffice for the Bayesian approach. How did we do it? Well, I just wrote down a bunch of weaker conditions and ran the program. It found a couple, and we thought it was wrong. That's too weak. How could that be? No, it's impossible. It turns out it wasn't. It took us a month to find an axiomatic proof, but in fact, it was right. This involved me adding some, say, marginal constraints. We didn't know from the searches that it really was a theorem, but we suspected it with high probability. This thing is very reliable once you get good at using it, even in the case where you're not doing purely exhaustive search. If you pick the right constraints, it's very reliable. I don't really care that much, from the point of view of applications, about either soundness or completeness. My mental algorithms are neither sound nor complete. I don't hold that against them. All I care about is having a higher probability of solving the problem than I would if I were in my shower in the morning.
That's pretty much all that matters. So this thing is very good in that sense. It's allowed us to solve problems we couldn't solve, even though in some of those cases we found the solutions with methods that aren't correct or complete. Okay, so that's all I want to say. Thanks.
52:30 Can any lessons about or do you just say, Yeah. This talk was just about the validity question, but what you're asking is what about the soundness question, right? So you have all these arguments and applications in Bayesian confirmation theory, and this is just a question about validity, purely a mathematical question. What about soundness? All right? Which of these arguments is sound? I mean, provided that they're valid, right? I have an answer to that. I'm not a Bayesian, I should just start by saying that, for various reasons, but this work has been very helpful for me to figure out what I think is the right theory of confirmation from a purely mathematical, purely formal point of view. Okay, and the theory I defend is based on the likelihood ratio, which I'll just call L now. So it's incumbent upon me in the book, and this is what most of the book is about, to argue that the yeses are good when they're yeses for L, and the noes are good when they're noes for L. But when you get disagreement with L, that's bad. What I do in the book is I take all these, maybe not all of them, but the important ones, and I show when they disagree with L, why they're fallacious, that is to say, why they're not sound arguments. And I have to do that, it's incumbent upon me to do that.
on fl it's not a basic theory because it doesn't interpret probabilities at all and so it doesn't interpret them as degrees of belief okay um but it's a relevance theory it's a theory based on probabilistic relevance and it's a theory based on its likelihood ratio um so yes this stuff's been very useful for me to get ahead on you know which properties are satisfied by which measures and that's important to know if you're going to base a theory of confirmation or inductive logic on a relevance concept and you want to have some quantitative account of that which i do so i think the right answer is l for various reasons which i mean i could get into if you want um but but just to answer your question again this for me was very very helpful and for others too i mean i mentioned jim joyce uh and there are other people who have picked up on this and uh since a few years ago when i was first publishing this stuff there's been a lot of also applications in statistics uh various principles and statistics depend
55:00 there's been a lot of stuff on that now. And so, yeah, it's been very helpful for me, and not just for measures, but when I teach the course, it's so great to have a program: if you want to know some of the theorems, you just plug it in. John Mayberry? No, I don't know what... The cylindrical algebraic decomposition, it smacks very much of phenomena that occur in o-minimal theories, as I say. Yeah. So, I mean, things definable in structures which are models of o-minimal theories, the definable sets can usually be broken up into finitely many cells. Right, right, decomposed, yeah. So that was why I was interested to find out what you said this lower bound was. Yeah, yeah, in the general case. I don't know if you've talked to any model theorists. I haven't much yet, although I will, I'm sure. Because it might be for them that they would say fairly quickly that the PSPACE bound that you've got from the other person, this might come out rather easily from some of the considerations of the... I don't know. Okay, that would surprise me a little. Because the CAD thing looks... It does. And I'm just wondering whether the o-minimality arguments also tend to give this double exponential. Yeah, I should talk to them about this, but it would surprise me a little, since Galen's been working on this and he's talked to most of those guys. I mean, he knows more about the real nitty-gritty of the complexity issues, because that's what he's doing in his dissertation. I've been more interested in the applications, but I should definitely talk to all of them about that. I'll chase it up. Thank you for that reference. It'd be nice, because if there was an alternative way to do that, then you might be able to come up with it, because Canny's algorithm is very messy and involves going through weird, sort of infinitesimal, nonstandard constructions. It's bizarre.
So it'd be nice if there were sort of a cleaner way come up with an algorithm and that would be very helpful yeah they may have i mean they may have
57:30 a way of doing it i should definitely thank you for that had anybody tried to prove this dagger no no no so yeah they just assumed it they assumed it was true yeah surprising to me i mean afterwards that uh people haven't tried to prove it and it failed and then becomes No, no, no. See, that's the thing. It's one of those, this is an interesting thing about human beings. It's one of those properties that just seems so central to the notion of sort of probability raising, right, that you wouldn't even think that it could be false. I mean, if you think a little bit more about what it is, you've got a fixed hypothesis, age, you've got two evidential propositions. And what you want to know is you're measuring in a sense the degree of probability change right but there's only one prior so the all that should be relevant as a comparison of posteriors there's only one hypothesis so how could it be that just looking at these doesn't tell you doesn't respect the ordering imposed by um that the ordering imposed by the posterior doesn't respect it by the ordering imposed by degree of probability increase because there's only one prior i mean what could be relevant to degree of probability increase except for the the two posteriors that's the answer and i asked people that they thought it was so obvious that they didn't even bother to try through another another problem here is remember when i the way i developed the thing is very much the way people think about it you have all these logically equivalinating qualities you wouldn't think that there'd be such a radical disagreement that wasn't merely conventional i mean there's this way to the slippage you sort of slip into just thinking oh they must all these they're just conventionally different there's got to be some measure there's some order preserving transformation that gets you all the back that's false that wasn't obvious right if you didn't have really thought about you wouldn't see the need to even but this one 
especially. I mean, even when you think about it, it seems bizarre that any relevance measure would violate it. It just seems crazy. There's something really weird about measures that do that, which is why everyone assumed it, going back all the way, I mean, Popper and Kemeny and Oppenheim, I mean, everyone assumes it. It's embarrassing that Jim Joyce's own measure violated it. I mean, you just didn't check. That's irresponsible in a way. I don't know. But these are philosophers. You've got to remember that. These are philosophers. I mean, so yeah, so maybe you're right. It might sound somewhat disingenuous for me to call them open problems.
1:00:00 I mean, what's an open problem? I mean, it just means a problem that people are wrong about or they have false beliefs research i mean you might you have a sort of different notion of a problem which is people really trying to work on it and they can't do it that's not what that that that wouldn't be appropriate because these are philosophers i mean in other words the mathematical problems are not what's open is the philosophical application i mean it's not the mathematical you know you're assuming that's part of what makes of the open problem but it's not the entire that's what so what i meant by problem is a broader thing wasn't just the mathematical I'm familiar with a few of these arguments, but you gave an example of the last one, the critique of Bayesonism, which I call the excess content. I'll try to remember it. And you gave a little illustration of the calculation. I don't know. The, the, um... Yeah. Here's... You're talking about the popular... Uh... Now, I thought... Right, but that's not the way I remembered it. I remembered the argument as depending on the idea that the probability of the excess content was counter... Is it just reduced to this? Yeah. The argument was that the probability of the excess content was counter-submitted by the evidence itself. This property is just one premise in a larger argument. It's the basic mathematical result. What gets the philosophical thing going is that this thing is always negative, and this is always positive. So in other words, it's the deductive part that's really, there's no inductive support. That's the philosophical thing. The mathematical result that they used to explain that is this result, the additivity of the D-measure, in this sense, the D-composition. Right, but so for the other measure, you don't get. The main idea was that the excess content of a theory, the stuff that goes beyond the evidence, is actually counter-supported. Yeah, and that's not true for all of us. 
That's not true for all of us. that's not true either but the reason it's not true is you don't always have this decompression reduces to that right yeah more or less i mean i mean um the main feature of the measures that does the work is this and this fails but you know that other property also fails there'll be other
1:02:30 men there'll be lots of measures for which this can be positive so yeah i mean that's it really does depend on picking a certain way of carving up yeah the degree of probability it's just the way you expressed it with equation eight uh my the way that i had it in my head was this counter supporting the fact that the degree of confirmation of the excess content yeah to the evidence was negative that's the way they thought yes and the way the way they do it is they assume that d is the way you measure confirmation and they argue that this is the inductive content this is the inductive part of the confirmation that's always negative yes and since you can decompose it linearly that means there's no inductive support the linear there wasn't nothing was followed okay and that's important that's crucial but not only is this false but this can also be positive absolutely now that relates to the next question uh because when you mentioned that there was a you like the which uh which uh you like the log like likelihood ratio yeah what happens to the because i'd rather like the problem i'm not an active inductivist i mean i'd rather like their argument so does that actually come out positive on your measure no it doesn't it doesn't but i was a pity no no it's not though because you there's no way i mean part of their ability to identify this as the inductive part is this linear decomposition right that's a fails for l so there's no so the argument doesn't go through right because yes that's negative but that can't be identified with the inductive contribution because there isn't such a thing, you have to be able to decompose it linearly. You can't do that throughout. So there's no way to neatly decompose it to deduct from the inductance. That's what fails. That's what's wrong with their argument, really, in my opinion. That's how I would respond. 
Well, there was a standard criticism of the argument, which was the breaking it down to E, and if E, then H, which is equivalent to H or not E. That's right. Was that an inadequate way of representing the excess content? Well, I think the argument is pretty forceful if you do use D. Because then there And you really could argue that this is the deductive, and this is exactly what goes beyond the deductive. Yeah. But without that decomposition, you can't argue that. There'll be some extra, with L, there'll be a lot of extra terms. And you know, is that partially inducted, partially deducted? What do you say about the extra terms? Yeah. And they don't have an answer to that. Yeah. I mean, there isn't really. I mean, what do you say about that? Well, they express it logically. They say that, so you've got the evidence, and then if you want to get from the evidence to the hypothesis, you take the conditional of the evidence, then the hypothesis. Yeah.
1:05:00 The evidence, and "if the evidence then the hypothesis", gives you the hypothesis, and that was the logical motivation for the claim. Oh, yeah, but you've got to translate it into there being a decomposition in terms of the measure of confirmation, otherwise it doesn't go through, right? Because you can't do it purely deductively; that would be question-begging, right? Yeah. You can't just use deductive logic to argue there's no inductive support. You need there to be the decomposition in terms of the measure as well, otherwise, I mean, right, it would beg the question. Leon? Yes, I have, sorry, the surprising thing about what you say is that the answers to validity questions concerning simple probability arguments depend on the choice of underlying measures. Is there anything... You mean choice of confirmation measure? Yes, confirmation measure. Is there anything systematic known about under which class of transformations of confirmation measures the answers to the validity questions are going to be invariant? Yeah. And the answer is none. No, not even one. That's a result that I prove in the book. But, I mean, for that you need a little bit more structure. So, I mean, really what you want to know in a more general case, and I've omitted this, is that confirmation is really a three-place relation. It's H, E, and some background K. So really everything is conditionalized on K. Okay. So the real ordinal structure for a measure c is going to be something like this. And what you can show is, for instance, take the difference and the likelihood ratio. If you assume that the difference and the likelihood ratio agree in a given case, just in a token case, agree on a judgment of this general form, then the probability function has to be trivial. And there's lots of classes of problems like that.
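To make the kind of disagreement described here concrete, the following is a minimal Python sketch, not something presented in the talk. It assumes the standard definitions of the difference measure, d(H, E) = Pr(H|E) - Pr(H), and the log-likelihood-ratio measure, l(H, E) = log[Pr(E|H) / Pr(E|not-H)]. The two cases and their probability values are hypothetical numbers chosen only for illustration.

```python
import math

def d(prior, posterior):
    """Difference measure: Pr(H|E) - Pr(H)."""
    return posterior - prior

def l(prior, posterior):
    """Log-likelihood-ratio measure, log[Pr(E|H) / Pr(E|~H)].

    By Bayes' theorem the likelihood ratio equals posterior odds
    divided by prior odds, so it can be computed from the two
    probabilities of H alone.
    """
    def odds(p):
        return p / (1.0 - p)
    return math.log(odds(posterior) / odds(prior))

# Hypothetical case A: a large proportional jump from a low prior.
case_a = (0.01, 0.10)
# Hypothetical case B: a larger absolute jump from a middling prior.
case_b = (0.50, 0.90)

# The two measures rank the same pair of cases in opposite orders.
d_prefers_b = d(*case_b) > d(*case_a)
l_prefers_a = l(*case_a) > l(*case_b)
print(d_prefers_b, l_prefers_a)
```

With the priors held fixed in each case, the two measures still order the cases differently, which is the sense in which the choice of measure, and not only the choice of prior, affects comparative judgments.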
There's lots of cases where if they agree, even in a token case, the probability function will be trivial. There's lots of results like that that you can prove. So in fact, if you take a larger class of measures, then it's really trivial, because now if you latch up 20 measures then there really will be no conditions in which they
1:07:30 all agree in any interesting cases. But even with just the four measures it's very hard to find cases where they agree. As I said, in every argument that I ever found they disagree, and you can prove some very simple kinds of results about pairwise comparisons: that if they do agree, then you have a trivial probability function, for certain kinds of comparisons, right? So there are a bunch of results like that which I will explain in the book. So the disagreement really is radical. It's not at all conventional. It's a very radical kind of disagreement. I mean, I got into it as an objection to Bayesianism, because to me this is an extra element of subjectivity, basically, or conventionality in Bayesianism that goes beyond the subjectivity of priors. Because the priors are subjective, if you want, or some people think they are. But even if you fix the priors, go ahead and fix the distribution, you're still going to have to resolve this disagreement. How are you going to do that? Bayesian principles aren't going to... I mean, you need some kind of principled way. And so let me just tell you what my theory is. It's very simple. I just have a couple of considerations. One is that it should be a relevance measure. And why? Because I think judgments of evidence are relevance judgments. When you say that E is evidence for H, I think intuitively that entails relevance, probabilistic relevance. And I just take that as a basic given, that it be a probabilistic relevance theory. So that's one assumption, that is, relevance. And you just need basically one more assumption to get you almost uniquely the likelihood ratio, and that is this: that c of H, E (you can forget about K for now, that's not important), if E entails H, that is, the evidence guarantees that H is true, then you should have... If the evidence guarantees the truth of the conclusion, then the degree of confirmation should be
maximal. That's just a deductively valid argument. If E refutes H, and here I'm assuming that E and H are contingent, let's not worry about paradoxes in general, because that's not really an issue here. We're talking about really a deductive case, that's what's interesting. If E refutes H, then you should have a minimal value, and it should be negative, because I'm just going to impose that for relevance. So this should be maximal, greater than zero; minimal, less than zero. If you have independence then you should have zero, and then here you should have greater than zero when you have correlation, less than zero when you
1:10:00 have anticorrelation. Okay, well, that already gives you relevance. Okay, so really it's just this one that basically gets you l uniquely, at least, not mathematically unique, but unique out of all historical proposals. If you add that it respects, for a fixed hypothesis, the posterior probability order of that hypothesis, then given very weak continuity assumptions, you get mathematical uniqueness. So it doesn't take much to get down to the likelihood ratio. But really, out of the historical proposals, all you need is this, in the middle. That's it. So in a way, Carnap's mistake, I mean, one of the mistakes he made was he thought of confirmation as posterior probability. And what that does get you is it gets you these endpoints, right? If you have entailment, then the conditional probability is one; refutation is zero. But you don't get relevance. And in fact, if you use the posterior probability, if you just use that as your measure, then you get the absurd result that evidence which raises the probability of H can actually confirm H less strongly than evidence that disconfirms it, that lowers it. So evidence that indicates it's false can be more strongly confirming than evidence that indicates it's true, and that's just because posterior probability ignores relevance. So this sort of gets you what Carnap wanted, which was a generalization of entailment, and relevance, which is I think intuitively correct for inductive support, and that gets you l. So that's the justification for that column. But I still have to explain why all the other arguments are wrong. That's why the book is going to become long. There's a lot of arguments out there. I would like to add a question. Sure. Absolutely. I was, at first, a little surprised when you answered Leon's question by saying, well, l, that's the thing to look for.
I would have thought you would say something like: what's the scale level of questions of confirmation? And the proper scale level would be the ordinal level.
1:12:30 Or probably the ordinal level. Yeah. Yeah. And so, I was thinking, could one redo what you just said in the theory of measurement? Yes, absolutely. So there you could have an empirical structure. And this is really something that you just described. If you wanted to use that model, then you shouldn't say something like maximal or minimal in the sense of numerical values, but in the sense of, say, a partial order? Yeah, sure. That's what I do in the book. Yeah, yeah. I mean, I identify measures according to their ordinal structure, officially. Here, I mean, this is just convenient to present in terms of quantitative measures and how they distribute. That's a convenient way of presenting, but I really think that's a heuristic. Once you realize what you want, I mean, what I want is sort of a logical generalization of entailment that's sensitive to relevance, sort of an inductive logical measure. You basically get l, I mean, out of all the historical proposals, you basically get l, if you're going to pick any quantitative measure. I don't. I would say anything ordinally equivalent to l is equally good. Yeah, that is my point. But anything ordinally equivalent to l will satisfy these, of course. Because basically you can do those as ordinal kinds of constraints if you do it right, if you set it up right, as you said. You have to be careful about how you do that. But in the book, I do. I'm just being kind of sloppy because the traditional way of doing inductive logic is quantitative. Of course, yeah. I'm happy to do that as a heuristic, kind of sloppy way of talking about it. When you really do it, yeah, it's got to be all in terms of partial orders, or even quasi-orders. I mean, you might not even think that everything's commensurable. I mean, that's why I talk about that. So the idea is to define an empirical structure that you need? The type of empirical structure that you need.
The numerical structure is clear, it's kind of given by these numerical measures, something like that. Then you have a representation result? And, for example, l will give you a representation, that could be easy, and then you have invariance results, because l is not uniquely fixed by that principle. Absolutely. That's how I do it. That's the way it's done in the book. Then you could say the question is empirically significant if it's invariant under every possible representation. Yeah, that's right. There are different kinds of things you can say about that, different kinds of invariance. Yeah, and I do that sort of the way I said it. I mean, that's the way I really want to do it. In fact, ultimately, what I really want to do is kick away the ladder altogether and just use ordinal ranking functions or qualitative probabilities underneath.
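The point about ordinal equivalence, and the endpoint conditions discussed earlier (maximal when E entails H, minimal when E refutes H, zero at independence), can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not the speaker's construction: it compares the log-likelihood ratio with a bounded rescaling of it, (a - b) / (a + b), where a = Pr(E|H) and b = Pr(E|not-H). The rescaled form and all the numbers are choices made for this example, and E is assumed contingent so that a and b are never both zero.

```python
import math

def log_lr(a, b):
    """l = log[Pr(E|H) / Pr(E|~H)], with a = Pr(E|H), b = Pr(E|~H)."""
    if b == 0:
        return math.inf    # E entails H: no not-H possibility yields E
    if a == 0:
        return -math.inf   # E refutes H: no H possibility yields E
    return math.log(a / b)

def normalized(a, b):
    """(a - b) / (a + b): a strictly increasing function of a / b."""
    return (a - b) / (a + b)

# Hypothetical (Pr(E|H), Pr(E|~H)) pairs, one per qualitative case.
cases = {
    "E entails H": (0.3, 0.0),
    "positive relevance": (0.8, 0.2),
    "independence": (0.5, 0.5),
    "negative relevance": (0.1, 0.6),
    "E refutes H": (0.0, 0.4),
}

# Rank the cases from least to most confirming under each measure.
by_log_lr = sorted(cases, key=lambda k: log_lr(*cases[k]))
by_normalized = sorted(cases, key=lambda k: normalized(*cases[k]))
print(by_log_lr == by_normalized)
```

The two functions assign different numbers but induce the same ordering, so any comparative judgment made with one is reproduced by the other. The bounded version simply places the entailment and refutation endpoints at +1 and -1 instead of at plus and minus infinity.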
1:15:00 You don't even need quantitative probabilities at all. Look, all you need, at least what I ultimately will argue in the book, is all you need to do is compare conditional plausibilities, with the ranking functions, where you look at the likelihoods. You mean ordinal ranking functions? Yeah, for instance. You could use that. And that's really all you need. You avoid the problem of priors and everything, because all you're doing is conditional plausibility judgments and comparisons of them. That's all you need. If you use d you need more than that, okay? Whereas this structure that I'm setting up here really just depends on comparing the likelihoods, that's really essentially it, or, if not likelihoods, the ranking functions, where it's the analog. So really I want to kick away the ladder altogether and do away with numbers. But when you talk to people, this is how people talk, right? And when I teach my course I do it this way, because that's the way everyone talks. Just related to Hannah's question: what do the axioms look like then? I mean, you can write down, presumably, the intrinsic axioms for the confirmation relation and prove a representation theorem. It should be possible to do something. You can do that, but I actually steer away from it. I do that in the book, but I don't endorse that, and I'll say why. I don't do it as my official view. My official view is just these desiderata, and whatever classes of orderings are consistent with that, that's what I'm trying to explicate. So the reason I don't do it is this. If you look at someone like Peter Milne, who has an argument that there's a unique measure, and actually he argues for an unfortunate choice, but he gives really a mathematical uniqueness argument. Now, it's a misleading paper. In the main text of the paper he lists just intuitive desiderata, sort of like these. They're different, they involve different ways of
looking at likelihoods, so you get a disagreement, but that's not enough. In the appendix you get all these very strong Archimedean principles and other kinds of assumptions that you need to get the mathematical uniqueness, and I don't like that, and I'll tell you why: because that has nothing to do with the explicandum. The explicandum is a generalization of entailment that's sensitive to relevance. It's inductive support. That's what the explicandum is. Anything that's not essentially obviously intuitive about that, like these things, that I can really motivate intuitively, I don't want to have any truck with officially. This idea of doing a really rigorous representation theorem, I do it,
1:17:30 but I say with a caveat, look, if you... But what did he get out? Right, yeah, yeah, very simple. He likes the following thing. The latest paper on my website is about this issue, and I show, I think, he's wrong. But he wants this to be the case. This is a good thing to finish on. He wants the confirmation ordering where you have a fixed hypothesis. I'm sorry. That's wrong. It should be two hypotheses and one piece of evidence. It's the inverse of my condition. Right. I'll be styled. Yeah, yeah, yeah. It's the inverse thing. Is this an eraser? What am I looking for? So, in fact, what I mean is, you have two hypotheses and one piece of evidence. It's an untrusted argument. Yeah, yeah. That's exactly right. So, two hypotheses. Right. Okay. So, he wants to say, E favors H1 over H2, it confirms H1 more strongly than H2, just in case the likelihood of H1 is greater than the likelihood of H2. Okay? That's called the law of likelihood. This is a false principle. Okay? Simple counterexample, based just on logic, having nothing to do with priors. Okay? Take the following kind of case, where H1 entails E and E entails H2, but the reverse entailments don't hold. Okay, well, the likelihood of H1 is going to be greater than the likelihood of H2. Why? Because this is 1, because H1 entails E. And this one is not 1, if you assume regularity, which Peter is happy with. But this is absurd. E guarantees the truth of H2. It doesn't guarantee the truth of H1. In fact, it can say very little about H1. And yet he wants to say that E is better evidence for H1 than it is for H2. It's absurd. And all I'm appealing to here is this. And I just think he's wrong. So that's, I mean, again, that's one of the examples where I have to go in the literature and say why I think he's wrong. Well, he knows. He knows. Don't worry. Peter knows that. But I think it would be great to talk to him about it.
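The structure of this counterexample can be checked on a toy model. The following Python sketch is an illustration only: the four possible worlds, the uniform prior, and the set names are all assumptions made for the example. What matters is the logical shape described above, namely that H1 entails E, E entails H2, neither converse holds, and the prior is regular (every world has positive probability).

```python
from fractions import Fraction

# Four hypothetical possible worlds with a regular (all-positive) prior.
prior = {
    "w1": Fraction(1, 4),
    "w2": Fraction(1, 4),
    "w3": Fraction(1, 4),
    "w4": Fraction(1, 4),
}

# Propositions as sets of worlds: H1 is a proper subset of E,
# and E is a proper subset of H2.
H1 = {"w1"}
E = {"w1", "w2"}
H2 = {"w1", "w2", "w3"}

def pr(a):
    """Probability of a proposition (a set of worlds)."""
    return sum(prior[w] for w in a)

def cond(a, b):
    """Conditional probability Pr(a | b)."""
    return pr(a & b) / pr(b)

lik_h1 = cond(E, H1)   # Pr(E|H1): equals 1 because H1 entails E
lik_h2 = cond(E, H2)   # Pr(E|H2): below 1 because H2 does not entail E
post_h1 = cond(H1, E)  # Pr(H1|E): below 1, E does not guarantee H1
post_h2 = cond(H2, E)  # Pr(H2|E): equals 1 because E entails H2

print(lik_h1, lik_h2, post_h1, post_h2)
```

In this model the likelihood of H1 exceeds the likelihood of H2, so a principle that ranks by likelihoods alone says E favors H1, even though E guarantees H2 and leaves H1 uncertain. Any regular prior over worlds with this subset structure yields the same qualitative pattern; the uniform prior is used here only to keep the arithmetic simple.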
1:20:00 Tell him. Email me. I'd like to talk to him about it. Yeah, I have a whole paper on this issue, by the way, on my website. It's called Likelihoodism, Bayesianism, and Relational Confirmation. So you can tell Peter about that. He'd get a kick out of it. Okay, so unless we have a break, we can have advice and let's say this. Thank you. Oh, yeah, no, that's all I, no, no, no, no, right, that's all I, no, no, you don't have No, he said half possible. Twenty minutes. Twenty minutes. That's a little pop-up stance that you just have. Well, you could have a comment. You could have a door. Yeah, but even now, I think.