Chris Timpson Philosophy of Physics Discussion Group, Queens College Oxford 2002

Recorded at Philosophy of Physics Discussion Group, Queens College Oxford (2002), featuring Chris Timpson. From the Michael Wright Collection, held by the Archive Trust for Research in Mathematical Sciences & Philosophy.

Identifier
mw0001628-cc-a_p
Format
Audio recording
Collection
Michael Wright Collection
Repository
Archive Trust for Research in Mathematical Sciences & Philosophy
Rights
Made available for personal scholarly use. Rights in recordings are generally held by the speakers or their estates. If you believe this recording infringes your rights, please contact [email protected].
Transcript

This transcript was generated by speech-recognition software from an archival recording and has not been hand-corrected. It will contain recognition errors — particularly for proper names and technical terminology — so please verify against the audio before quoting. Timestamps play the recording from that moment.

0:00 Thank you. Yeah, hang on, you've gone past it. You've gone past it. Oh, sorry, no, I haven't gone past it. It's just coming up, isn't it? Sorry. My apologies, I wasn't, my brain was not in All right. I'll leave any links. Thank you. Cheers. Damage. Thank you very much indeed. Cheers, thank you. Thank you. Excuse me, I'm sorry to bother you. Can you, the old Capadar's room, can you remind me?

2:30 It's somewhere in this quad, I know. It's okay, thanks. Hello? Yeah, hello. Yeah, sorry. I know it's somewhere in this quad. Oh, thank you. to ask this information from certain nations, which is related to the nuclear virus medicine. And it's turned out that you can't do that using the Shannon information. This is a big problem for the Shannon information. So why don't you go out to here? In summary, I'm going to lay out their arguments against the Shannon information, and I'm going to argue that none of them work. And since they don't work, I'm going to suggest that it's because they have this foundational principle in the back of their mind, along with certain other assumptions about the nature of quantum mechanics, sort of Copenhagen-influenced assumptions, that they produce these arguments against the Shannon information. So I'm going to try to explain how that works. And then finally, I want to talk about this foundational principle and whether it could actually serve as a foundational principle for quantum mechanics. I mean, lots of you have probably heard of the Shannon information before, so we might run through this quite quickly. Right, this quantity H is a function of a probability distribution. It's called the Shannon information: minus the sum over i of p_i log p_i, with the logarithm to base 2 to give the amount of information in binary bits. This quantity finds its home in the context of communication theory, and we have to beware of the word information here, because it doesn't have anything to do with our normal concept of information.
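As a small illustration of the quantity just defined, the Shannon information of a discrete distribution can be computed directly from its definition. This is only a sketch: the function name `shannon_information` and the example distributions are illustrative choices, not something taken from the talk or its slides.

```python
import math


def shannon_information(probs):
    """Return H(p) = -sum_i p_i * log2(p_i), measured in bits.

    Outcomes with zero probability contribute nothing to the sum.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


# A fair coin carries one bit per toss; a biased coin carries less.
fair = shannon_information([0.5, 0.5])
biased = shannon_information([0.9, 0.1])
print(fair, biased)
```

Nothing in the function refers to what the outcomes mean; it depends only on the probabilities, which is the point the speaker goes on to stress.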

5:00 It's rather concerned with the notion of quantity of information, and not just any old notion of quantity of information. The notion of quantity is made precise by talking about the amount of resources required to transmit messages. This is a very limited sense of quantity. Of course, we have no sense of how useful any of these messages we might be transmitting are, nor do we have any sense of completeness to do with quantity. We can't say that if we have this much information, then we have complete information; notions of completeness, of how much information there might be about something, just don't enter into the picture at all. The first way of understanding the Shannon information, the most important one, is that deriving from Shannon's original 1948 paper, based on the noiseless coding theorem. I think this quotation from that paper of Shannon's is very instructive. It says, the fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point. Frequently, these messages have meaning. These semantic aspects of communication are irrelevant to the engineering problem. Now, this is a communication system. It has some sort of information source. It has an encoder, which encodes messages from the source and transmits them down the channel, which may or may not be noisy. And at the other end, we try to decode them. And we're hoping to get the same out that we put in. Now the reason why it's blindingly obvious that meaning has got absolutely nothing whatsoever to do with it is that all we're interested in is reproducing at the far end whatever went in at the end near to us, whatever came out of the information source. And since we must be able to deal with whatever the source produces, it's not relevant whether the source is actually producing anything which has any meaning.
Now, the other important point to bear in mind about understanding the Shannon information, so as not to get carried away with this teasing word, information, is that in Shannon's theory, information is not primarily associated with individual messages, but rather characterises the source of the messages. The point of this is that really what the whole business is all about is trying to understand the capacity required to transmit all the messages that a given source produces. And we're hoping that the statistical nature of the source will allow us to reduce the capacity of the channel we need,

7:30 the physical resources we need to transmit all the messages. Now, so we start with this model of an information source. We can think of it as being an ensemble X of letters, x_1 to x_n, which occur within the ensemble with probabilities p_1 to p_n. We consider drawing our messages from this ensemble. We draw capital N letters, where we're going to take N to be a very large number. Now we know that for a very large N, typical sequences of letters will have N p_i occurrences of the letter x_i, N p_j of the letter x_j, and so on. And then we know the number of distinct typical sequences because of this will be given by this standard expression. We can then use Stirling's approximation. So the number of typical sequences will actually be 2 to the N H(X), where in H(X), X is the name of the ensemble rather than being a probability distribution. You know, I slip a bit between having H as a function of the ensemble and as a function of the probability distribution. We have 2 to the N H, where H is the Shannon information, minus the sum of p_i log p_i. Right, now, as N gets even larger, as it tends to infinity, the probability that we're actually going to get an atypical sequence appearing when we draw it out of our information source tends to zero. So we only ever really need to consider as real messages the typical sequences, of which we know there are 2 to the N H(X); they're the only possible messages we're going to get. So, rather than actually sending big N letters, we could just produce the binary code for the number of which typical sequence we have instead, and then we would have compressed the message, which was originally capital N letters, which in bits is N log n bits, to N H(X) bits, which in general is less than N log n bits, unless the probabilities of all the letters are equal. Now, I mean, this is a very rough sketch of the noiseless coding theorem. Shannon's, the sort of meat of the noiseless coding theorem,

10:00 is to say that this amount of compression is optimal, that this is the best possible compression that we can do. So, that's what we can say from there. The Shannon information is a perfectly good measure of information, as it represents the maximum amount that messages drawn from an ensemble X can be compressed. A little caveat here, a little warning. Just because the Shannon information tells us the maximum amount that a message can be compressed, we mustn't somehow think that it actually tells us the irreducible meaning content of a message, which is specified in bits which are somehow supposed to possess their own intrinsic meaning. I hope that sounds odd. The thought that that might be the case, I think, is just due to the possibility of confusion between a language and a code. A language obviously is concerned with meaning, and a code has no concern with meaning whatsoever. So that's one thing that might tempt us to take information in the wrong sense here. But the noiseless coding theorem isn't telling us that there's anything like irreducible meaning content going on. There's a minor way in which meaning is that there's a technical problem. As much as if the sequences I'm transmitting are highly non-random, there's sometimes likely to be a highly compressed description at a level above the compression I can get randomly. It might, next is a long string of twos because it's got structure rather than I'm usually going to compress it by describing individual description. for the algorithmic, you know, you're describing using an algorithm on a computer. Yeah, and I want to say a good term in two, I would say. Yeah. I don't see that that's anything to do with meaning, I think. No, I guess not. It's just another way of talking about different structures within a thing. And also, I think it turns out that for large n, the algorithmic information and the Shannon information end up being approximately equal.
I need to think about questions about algorithmic information in a bit more detail. This is more to do with how you can beat the apparently optimal, according to the theorem, for individual messages or small groups of messages, but the theorem is about optimality for all possible, and a single-speech.
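The counting argument sketched earlier for the noiseless coding theorem can be checked numerically. The sketch below assumes that the "standard expression" for the number of typical sequences is the multinomial coefficient N! / ((N p_1)! ... (N p_n)!), which is not shown in the transcript; the function names and the example distribution are illustrative. It compares the bits per letter needed to index a typical sequence with the Shannon information and with the uncompressed cost of log n bits per letter.

```python
import math


def shannon_information(probs):
    """H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def log2_typical_sequences(probs, n_letters):
    """log2 of the multinomial coefficient N! / prod_i (N p_i)!.

    Assumes n_letters is chosen so that every N * p_i is a whole number.
    lgamma(k + 1) equals ln(k!), which avoids computing huge factorials.
    """
    counts = [round(p * n_letters) for p in probs]
    if sum(counts) != n_letters:
        raise ValueError("choose n_letters so that every N * p_i is whole")
    ln_count = math.lgamma(n_letters + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln_count / math.log(2)


probs = [0.5, 0.25, 0.25]
entropy = shannon_information(probs)      # 1.5 bits per letter
uncompressed = math.log2(len(probs))      # log n bits per letter
for n_letters in (8, 80, 800, 8000):
    bits_per_letter = log2_typical_sequences(probs, n_letters) / n_letters
    print(n_letters, round(bits_per_letter, 4), entropy, round(uncompressed, 4))
```

The bits per letter approach H from below as N grows, while staying under log n, which is the sense in which N H(X) bits suffice for long messages.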

12:30 You said that you're suggesting that it's the assumption that it's worth a single-speech. Chris, could I ask, is it hard to prove, that theorem? Well, it's not terribly hard to prove. It's quite intuitive, the way that Shannon does it in his paper, and then people make it a bit more precise and so on, but the basic idea is reasonably simple. It's about 2, 3, 3 years. But that's the main, possibly the most important way of understanding the Shannon information to bear in mind. The second is to do with the relationship between information and uncertainty. Because the Shannon measure is actually a measure of the uncertainty of the probability distribution, as well as being a measure of information. Now, what do I mean by measure of uncertainty? A measure of uncertainty is a quantitative measure of the lack of concentration of the probability distribution. And it's called uncertainty because it's supposed to measure uncertainty about what the outcome of an experiment with the given distribution will be. Now, Uffink has derived a general class of measures of uncertainty, of which the Shannon information is one. The most general form is this one here, where chi is a continuous decreasing function and phi is a convex function. This form here is interesting essentially because if we take a sum of convex functions we get a Schur convex function, and then, with chi decreasing, we're going to end up with a Schur concave function overall. I'm going to come back to that later on. Then if we impose a couple of scaling conventions, we get this thing, a slightly more restricted class of measures of uncertainty. The mu here is a background measure for our probability distribution. We can often just take this as being the counting measure and just ignore it, like I have done here.
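The slide showing the restricted class of measures is not reproduced in the transcript. As a hedged sketch, assume it is the one-parameter family usually written U_r(p) = (sum_i p_i^(1+r))^(-1/r) for r > -1, with the counting measure as background measure and the r = 0 member defined by the limit. That formula, the function name `uncertainty_measure`, and the example distributions are assumptions made for illustration only.

```python
import math


def uncertainty_measure(probs, r):
    """Assumed form U_r(p) = (sum_i p_i**(1 + r)) ** (-1 / r), for r > -1.

    The r = 0 case is taken as the limit, exp(-sum_i p_i * ln p_i).
    """
    if r <= -1:
        raise ValueError("r must be greater than -1")
    positive = [p for p in probs if p > 0]
    if r == 0:
        return math.exp(-sum(p * math.log(p) for p in positive))
    return sum(p ** (1 + r) for p in positive) ** (-1.0 / r)


spread = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]
for r in (-0.5, 0, 1, 2):
    # A uniform distribution over 4 outcomes gives 4 for every r;
    # the more concentrated distribution always gives a smaller value.
    print(r, uncertainty_measure(spread, r), uncertainty_measure(peaked, r))
```

Under this assumed form, every member of the family counts an "effective number" of outcomes: it returns n for a uniform distribution over n outcomes and 1 for a certain outcome, and it decreases as the distribution becomes more concentrated.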

15:00 But also you can take it straight to relative entropy, if you wish. This is very, well, very closely related to the Rényi entropy that people are familiar with. I said that the Shannon information is one of these measures of uncertainty. If we take r equal to 0, we essentially get the exponential of the Shannon information. And if we want to impose an additive scale for uncertainty, we just take logs, and we end up with the Shannon information as the log of U_0. Now the conceptual link between uncertainty and information is basically that the more uncertainty there is in the probability distribution, the more we stand to gain from performing an experiment. Now we need to take a little bit of care when thinking about uncertainty and information, because in one sense the probability distribution can't tell us what the outcome of an experiment will be, simply because any of the outcomes with non-zero probability might occur. But clearly the probability distribution is telling us something, and one way we can get a handle on that is to use the probability distribution to put a value on any given outcome that we might get from the experiment. If an outcome is highly probable, it's not going to surprise us if it occurs. In a sense, the experiment isn't doing much for us. If we get a highly unlikely outcome, the experiment has given us the unexpected. We've learned something. We've learned more from it than we would have done from the high probability outcome. So we have a notion of the surprise information, the value that the experiment gives us. And a nice measure of the surprise information is minus the log of the probability of a given outcome; that's a decreasing function of the value of the probability, and it has certain nice properties.
So before the experiment, we don't know which of the outcomes it's going to be, but we can state the expected information gain from doing the experiment, which is just going to be the expectation of the surprise information, which of course is just the Shannon information back again. Now I can play a similar game for the whole general class of uncertainty measures.
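The step from surprise to the Shannon information is short enough to spell out. A minimal sketch, with illustrative function names and an arbitrary example distribution: the surprise of an outcome is minus the log of its probability, and averaging the surprise over the distribution gives back H.

```python
import math


def surprise(p):
    """Surprise information of an outcome with probability p, in bits."""
    return -math.log2(p)


def expected_surprise(probs):
    """Expectation of the surprise over the distribution.

    This is sum_i p_i * (-log2 p_i), i.e. the Shannon information.
    """
    return sum(p * surprise(p) for p in probs if p > 0)


probs = [0.5, 0.25, 0.25]
for p in probs:
    # Less probable outcomes are more surprising.
    print(p, surprise(p))
print(expected_surprise(probs))
```

The surprise is defined outcome by outcome, but the Shannon information only appears once we take the expectation, so it remains a property of the distribution rather than of any single result.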

17:30 Here what we do is consider doing a whole long string of experiments again. And the knowledge that the probability distribution gives us is about the typical sequences. It tells us that the outcome of this great long sequence of experiments is going to be one of the typical sequences. So how much we've learned from performing this great long sequence of experiments depends on the number of typical sequences. So you can get a handle on the value of doing these experiments from looking at that number. You learn more the more sequences there are. So you want something that's an increasing function of the number of sequences. And it turns out that a Schur concave function is going to do that job for you. And so you can interpret n times U_r(p), n times the uncertainty, as the amount we've actually gained from doing a long sequence, n, of experiments. We could then divide by n and talk about the average information gained per experiment. But you would need to bear in mind that that only makes sense in the context of doing this great long sequence of experiments. So the idea of the amount gained from one experiment on its own needs to be treated with great care; it's an average amount over a long sequence. We could do precisely the same thing if, rather than how much information we gain, we wanted to talk about how much we know given a probability distribution. There we're going to want a measure of the concentration of the probability distribution. How much we know just means how well we can predict the outcome. The more concentrated the distribution, the better we can predict. So we're going to want a measure of concentration, and that's going to be a Schur convex function. Now, I said there were two main ways of interpreting the Shannon information. Of course, there's a third, very popular one. But it's one that I don't think is actually an interpretation independent of the other two.
This is the minimum average number of questions needed to specify a sequence. Again, we have a long sequence of letters drawn from the ensemble, or equivalently a run of independent experiments. But this time, the outcomes are kept hidden from us. In fact, you can even imagine that the experiment was never performed,

20:00 a notional, notional hidden outcome. And our task is to determine the sequence by asking yes or no questions. And the stipulation is that we do this in such a way that the average number of questions needed to identify the sequence is minimized. We need an average in here because otherwise a lucky guess might identify the sequence for us. Now the optimal strategy is to strike out half the possibilities with each question, because then each time we're getting the maximum work from the question. We clearly can't do that all the time, but we can hope to do it on average. One way of trying to implement that is to divide the outcomes of each of the individual experiments into classes of equal probability, and then ask which class the result of that actual experiment belonged to. Then once we've written off one bunch of outcomes, we say, well, okay, we've got some left, and we try to divide those equally again. Now, clearly you're not going to be able to carry on doing that indefinitely if we're just asking about the individual experiment in isolation, simply because you can't carry on dividing up the probabilities of the outcomes in such a way that you get 50-50 probability each time. So using this strategy, the average number of questions needed is always greater than or equal to the Shannon information. That's quite interesting in and of itself, but we can do better. We can do better by asking about joint outcomes of experiments. Then we can carry on dividing the probabilities in the appropriate way, so that we can always strike out half the possibilities with each question on average. Given that's the case, and we know there are 2 to the N H(X) typical sequences, if we can strike out half with each question, then the minimum average number of questions is just going to be N H(X).
Now, the reason I don't like this as an interpretation of the Shannon information is that we've been given no reason why the minimum average number of questions needed to specify a sequence should be related to the notion of information.
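The claim that questions about single experiments cost at least H on average, and that questions about joint outcomes do better, can be illustrated numerically. The sketch below swaps in Huffman coding as the questioning strategy, since an optimal binary prefix code is equivalent to an optimal scheme of yes/no questions; the talk itself describes splitting outcomes into classes of equal probability, not Huffman's algorithm. The function names and the example distribution are illustrative assumptions.

```python
import heapq
import itertools
import math


def shannon_information(probs):
    """H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_average_length(probs):
    """Average number of yes/no questions under an optimal (Huffman) scheme.

    Each merge of the two least probable nodes adds one question for every
    outcome beneath them, so the average is the sum of the merged weights.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def questions_per_experiment(probs, block):
    """Average questions per experiment when asking about `block` joint outcomes."""
    joint = [math.prod(combo) for combo in itertools.product(probs, repeat=block)]
    return huffman_average_length(joint) / block


probs = [0.7, 0.2, 0.1]
print("H =", shannon_information(probs))
for block in (1, 2, 3, 4):
    print(block, questions_per_experiment(probs, block))
```

For any block length k the result lies between H and H + 1/k, so asking about longer joint outcomes pushes the average number of questions per experiment towards the Shannon information.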

22:30 Again, the temptation here is to say, well, it's because there's something irreducibly meaningful about the number of questions, you know, yes, no questions here. And it's the irreducible meaning that gives us the link to information. But, as I've already said, that's not the right way to think about it. These concepts of information have nothing to do with meaning in that sense. So it seems that, in order to make sense of this interpretation, we're going to say, well, it's just another way of talking about how much we can compress a message, and we're back to the interpretation from the noiseless coding theorem. Or the minimum average number of questions needed to specify a sequence is simply another way of characterising how much we stand to gain from learning a typical sequence, and we're back to the interpretation in terms of uncertainty. So that's the Shannon information. I'll go on now to Brukner and Zeilinger's arguments against the Shannon information. Now clearly this quantity is going to be well defined whenever we have a well-defined probability distribution. So Brukner and Zeilinger are going to argue that, although it's well defined, in fact this quantity doesn't correspond to the concept of information in the quantum case. The way they do this is, well, the position we have in mind is where we have a whole sequence of systems all prepared in the given state psi, where psi may not be an eigenstate of the observable we're measuring. Of course, the probability distribution is just given by the normal trace rule. The capital P_i are the operators corresponding to measurement outcomes; they could be projectors, if we did projective measurements, or POVM elements, if we wish. We'll mainly be thinking about projective measurements here. In this sort of scenario, Brukner and Zeilinger suggest that the Shannon information has no meaning, as it lacks an operational definition in terms of the number of binary questions needed to specify the sequence of outcomes.
And we have a couple of quotations. The non-existence of well-defined bit values prior to and independent of observation suggests that the Shannon measure, as defined by the number of binary questions needed to determine the particular observed sequence, noughts and ones, becomes problematic and

25:00 even untenable in defining our uncertainty as given before measurements are performed. They go on. No definite outcomes exist before measurements are performed, and therefore the number of different possible sequences of outcomes does not characterise our uncertainty about the individual system before measurements are performed. But you should be very worried by those statements. If you just recall the basic ideas in the interpretation of the Shannon information, the whole point, the whole way you get compression and everything, is that if you're given a long run of experiments, then we know that it's one of the typical sequences that will be instantiated. Given the probability distribution, we can say what the typical sequences are, and hence the number of bits, N H(p), needed to specify them. And that's quite independent of whether we actually have any concrete sequence or not, because we know that all sequences that will be produced, or may be produced, require the same number of bits to specify them, because they must always be one of the typical sequences. So the fact that we don't actually have a typical sequence sitting there in front of us makes no difference. Any sequence that will be produced must be one of the typical sequences; it doesn't matter which one. Here they say the number of different possible sequences doesn't characterise uncertainty. But I was just giving you earlier an explanation of how uncertainty could be measured by, or related to, just counting the number of typical sequences that could be produced. So it does seem that the number of different possible sequences precisely tells us how ignorant we are. And that's one of the ways in which we can relate uncertainty and information: just count the number of typical sequences. Finally, they want to distinguish between uncertainty before and after the experiment is performed.
But that distinction isn't going to help them at all: the Shannon information is a function of the probability distribution, and that's perfectly well defined before the experiment. So, their argument hasn't succeeded. It doesn't look as if pre-existing bits are necessary for the interpretation of the Shannon information. Now, the reason they make the claim is that they seem to think that the existence of a concrete string of values

27:30 is a necessary and sufficient condition for the interpretation of the Shannon information as a measure of information. I've just been pointing out, with regard to the way we actually interpret the Shannon information, that the existence of a concrete string is not a necessary condition at all. Another reason to point out that it's not a necessary condition is that we have the other two interpretations, the two prior interpretations of the Shannon information, neither of which requires a pre-existing string of values. So it's not a necessary condition, the existence of a concrete string. And in fact it's not a sufficient condition either. Imagine we're faced with a sequence of N values. On its own that's going to be no good to us. We need to know that that sequence is a typical sequence. We need to know that the relative frequencies of each of the outcomes in that sequence are representative of the probabilities of the outcomes in the ensemble from which it's drawn, because otherwise there's no sense in which you could have any compression going on. The compression arises from the statistical nature of the source from which the sequence is notionally drawn. And just the sequence on its own, if we don't make any probabilistic assumptions, isn't going to allow us any compression. So whether we have a classical case or a quantum case, we need to make the same assumption: that the probability distribution, whether known in advance or derived from observed relative frequencies, correctly describes the probabilities of the different possible outcomes. So a string of values on its own isn't a sufficient condition for the interpretation of the Shannon information. And then they reiterate their points in a later paper. They say, we require that the information gain be directly based on the observed probabilities, and not, for example, on the precise sequence of individual outcomes on which Shannon's measure of information is based.
But I find it's false that it's a necessary condition. Yeah. Shannon's information isn't based on the precise sequence of individual outcomes, and Shannon's information already is, and must be, based on the observed probabilities, because a string of values on its own is not sufficient.

30:00 Sorry, I don't want to hold you up, but in that slide I'm not quite sure that it's clear enough about the distinction between a single concrete string and the set of all of them. It does seem to me that, in all three of your interpretations of Shannon, which I endorse, of course, you did need the set of all the strings to get weaving, and it was necessary and sufficient, really, to get weaving, that you could think of that. But you're right, it could be a set of strings that were all post-facto measurement outcomes, without pre-existing possessed values prior to measurement. Is that fair? I mean, what they're all about, they don't talk about the pre-existing, and that's what's wrong with them. Well, I mean, the point here is that I'm just saying you're right that you do need the ensemble. Because when you say it's not sufficient, you're precisely focusing on a single string. And all hands would agree a single string was ruled in an interpretation. Unless you assume that the relative frequencies of outcomes in the string are representative of what's in the ensemble. And so the point being that, when they use their concept of information, they say, look, aren't we good? We base it directly on observed probabilities. We require the information gain to be directly based on observed probabilities. But, of course, that's necessary. It's necessary for everyone. We couldn't interpret the Shannon information without making a probabilistic assumption. So, what's important is the probability distribution, and that's available in the quantum case, not just in the classical case. So, the argument about pre-existing strings looks like a complete failure. So let's move on to the argument about the grouping axiom. Brukner and Zeilinger's argument suggests there's a problem with the grouping axiom, which is the famous third axiom of Shannon. In Shannon's original presentation, he introduces the requirements

32:30 which he suggests are natural for any measure of information. The quantity H should be continuous in the probabilities. And for equiprobable events, H should be a monotonically increasing function of the number of outcomes. The third requirement is the key one, securing the uniqueness of the form of the Shannon information. And he states that if a choice is broken down into successive choices, the original H should be the weighted sum of the individual values of H. Now that's not a screamingly intuitive thing; it's not actually that obvious what it means or why it's a natural requirement. And it's normally illustrated like this. What do we mean by breaking down a choice into two successive choices? The picture is something like this. We have a probability distribution like this. Rather than giving these values to the outcomes directly, we can consider first having a choice, a 50-50 choice, then a second, two-thirds, one-third choice, conditional on the second of the first two outcomes occurring. And this is an equivalent way of presenting the probability distribution. Shannon's third requirement says that, if we have these two experiments related like this, the uncertainty of the overall choice is equal to the uncertainty of the first stage of the choice, the 50-50 choice, plus the uncertainty of the second choice, the conditional choice, weighted by the probability of its occurring. And is that requirement imposed whatever weights we choose on the branches when you group, as in the second diagram? It's got to be a sum like that, whatever we choose. I mean, you just happen to choose half-half and then two-thirds. The requirement is just that we end up with the same probabilities, finally, as we did to begin with. So we could group the outcomes in any way we wish.
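The two-stage example can be checked by direct calculation. A 50-50 choice followed, on one branch, by a two-thirds, one-third choice is equivalent to the distribution (1/2, 1/3, 1/6), and the Shannon information of that distribution equals the uncertainty of the first stage plus one half times the uncertainty of the second. The sketch below verifies this and also checks a different grouping of the same outcomes; the function names and the index-list representation of groups are illustrative choices.

```python
import math


def shannon_information(probs):
    """H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def grouped_information(probs, groups):
    """Uncertainty about the group, plus the weighted uncertainty within groups.

    `groups` is a list of lists of indices into `probs` that together
    cover every outcome exactly once.
    """
    group_probs = [sum(probs[i] for i in group) for group in groups]
    total = shannon_information(group_probs)
    for group, weight in zip(groups, group_probs):
        if weight > 0:
            conditional = [probs[i] / weight for i in group]
            total += weight * shannon_information(conditional)
    return total


probs = [1 / 2, 1 / 3, 1 / 6]
direct = shannon_information(probs)
staged = shannon_information([1 / 2, 1 / 2]) + (1 / 2) * shannon_information([2 / 3, 1 / 3])
print(direct, staged)
# Grouping the outcomes differently gives the same total.
print(grouped_information(probs, [[0, 1], [2]]))
```

The check works for any grouping because the identity is a property of the logarithm; the open question raised later in the talk is whether it should be demanded of every measure of uncertainty.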

35:00 And then the equivalent of the axiom would apply. The axiom was given a more precise expression by Faddeev, as a sort of recursive axiom. We say that for every n greater than or equal to 2, where n depends on the number of possible outcomes, if we group two of them together, such that the probability here is given by the sum of these two, then the uncertainty in the first case equals the uncertainty in the new probability distribution here, plus the weighted conditional uncertainty. And that allows us to derive the form of the Shannon information. You just need that axiom and continuity, in fact, because this grouping axiom includes Shannon's second axiom as a special case. Again, just to try and explain it a little bit better, grouping axiom is quite a good name. More intuitively, we have a bunch of outcomes here. We can consider grouping the outcomes into composite events, A and B, say, whose probabilities will be given by the sum of the probabilities of the component events belonging to the composite event. And then given those events, we can specify the probabilities of the original outcomes x_1 to x_k, conditional on event A having happened, and of the outcomes x_{k+1} to x_m, conditional on event B having happened. And specifying the probability of A, the probability of B, and these conditional probabilities is precisely equivalent to specifying the original probability distribution. And the meaning of the grouping axiom is just that it's saying that uncertainty about which outcome will occur should be equal to uncertainty about which group it would belong to,

37:30 plus the expected value of the uncertainty which would remain if it's made known which group it belongs to. That's a bit easier to understand. So in the particular case of Faddeev's expression of the axiom that I gave, we could consider we had n plus 1 outcomes, a_1 to a_{n-1}, b_1, b_2. The probabilities are p_1 to p_{n-1}, q_1, q_2. You group two of them together, you group b_1 and b_2 together, b_1 and b_2 being just disjoint events. Then the probability for this grouped event would just be q_1 plus q_2. The conditional ones would be q_1 over p_n and q_2 over p_n. And Faddeev's form of the grouping axiom is saying that the uncertainty for the outcomes a_1 to b_2 is equal to the uncertainty for a_1 to a_n, plus the uncertainty for b_1, b_2 conditional on a_n, weighted by the probability that a_n should occur. That's just what that equation is saying. Now, Brukner and Zeilinger actually adopt a slightly different interpretation of the grouping axiom. They think it concerns joint experiments. We have an experiment A with the outcomes a_1 to a_n and the probabilities p_1 to p_n. And another experiment B with two outcomes, b_1, b_2. For the joint experiment, A and B, the a_n event is equal to the union of the two disjoint events, a_n b_1 and a_n b_2. And we assign the probabilities like this. And then the probability of the a_n event is again q_1 plus q_2, equal to p_n. And then their form of the grouping axiom is similar but slightly different. We have joint events here. This is a joint experiment, the joint events here. And this is the A experiment on its own. And then we have something that looks like a conditional B experiment. Then if we, if we imagine there actually, we look at the whole, here we've just, if we carry on applying there,

40:00 this form of the grouping axiom, we'll end up with this one. If we imagine, as well as having AN being given by AN and B1, AN and B2, that we give A1 as A1 and B1, A1 and B2, and so on, then we'll end up with this expression here. Now, these expressions worry Brukner and Zeilinger, because they say it looks like there are classical assumptions sneaking in here. It looks like we're assuming that attributes corresponding to all possible measurements can be assigned simultaneously. So here we're assuming we've got probabilities for the outcome of A, for the outcome of B, and for joint outcomes. And we know that's not going to be the case in general in quantum mechanics. And they say also that what we're expressing here is that measurements can be made ideally non-disturbing. For instance, imagine that to do the joint experiment we do experiment A followed by experiment B, and we assume that doing the first experiment doesn't disturb the B values. So we can update our information, assuming there's no disturbance, which is what this equation expresses. Then they're going to say, because these are classical assumptions, we're going to have inapplicability of the grouping axiom in the quantum case. Since we have non-commuting observables, the probabilities on the left-hand side of the axiom aren't going to be defined. And then they say this means that the Shannon information is simply not justified as a measure of uncertainty, because it essentially involves a classical assumption. So they say: only for the special case of commuting, i.e. simultaneously definite, observables is the Shannon measure applicable and the use of the Shannon information justified in defining uncertainty given before a quantum measurement is performed. But, again, there are a number of problems with this argument. First of all, the grouping axiom figures in an axiomatic derivation, a uniqueness derivation

42:30 for the form of the Shannon information. The failure of uniqueness doesn't imply that the Shannon information isn't a measure of uncertainty at all; it just says it's perhaps not a unique measure of uncertainty. And so their conclusion here, that the use of the Shannon information isn't justified to define uncertainty, wouldn't be justified by the simple fact that the grouping axiom wasn't applicable. Remember that we already know that the Shannon measure isn't a unique measure of uncertainty, because there's a whole class of which it's a member. I'm sorry, could I just say that? The other members of that class don't obey grouping, no. And there are reasons to deny grouping in certain classical situations. Well, it's... Says you or Uffink or Rényi or...? Well, Jos says it, and I agree with him. In fact, the grouping axiom is a highly unnatural constraint really. It's appealing mathematically because of its sort of recursive character. It can lead to infinities where you don't want them. You can't get the continuous case from it, because you effectively need to get to the relative entropy, and you can't get to the relative entropy from the grouping axiom, but you can get to it from Jos's axioms. And for the grouping axiom to seem naturally justified, it seems you're already assuming an interpretation of uncertainty in terms of the minimum average number of questions needing to be asked. It's not clear why uncertainty should be characterised in that way, even if we may agree that information should be. So, I mean, the grouping axiom is not actually that hot; it's just the one that Shannon came up with, and so it figured in the first bunch of axiomatic derivations. I mean, that being said, I think people would still be interested in the fact if it didn't apply in the quantum case, if it just made no sense at all.
But I'm actually going to say that, in fact, it's not the case that it doesn't make sense, because the Brukner-Zeilinger form of the grouping axiom is not in fact equivalent to the standard form which I gave you earlier.
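For reference, the standard form of the grouping axiom described in the talk can be written out as below. This is a reconstruction from the spoken description (n + 1 outcomes, of which the last two are grouped), not a copy of the slide; the symbols p_i, q_1, q_2 and p_n = q_1 + q_2 are labels assumed here, and H is the Shannon information defined at the start of the talk.

```latex
% Shannon information of a distribution (p_1, ..., p_n), in bits
H(p_1,\dots,p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i

% Standard (single-distribution) grouping axiom: group the last two outcomes
H(p_1,\dots,p_{n-1},q_1,q_2)
  = H(p_1,\dots,p_{n-1},p_n)
  + p_n\, H\!\left(\frac{q_1}{p_n},\frac{q_2}{p_n}\right),
\qquad p_n = q_1 + q_2 .
```

Every quantity here is a function of one probability distribution over the outcomes of a single experiment, which is the feature the subsequent argument relies on.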

45:00 And that can apply in both the classical and the quantum case. In Brukner and Zeilinger's notation, the first grouping axiom that we had looks like this. Here we have n plus 1 outcomes, A1 to A(N-1), B1, B2, and we group B1 and B2 together. We have the uncertainty in this probability distribution: the probabilities of A1 to A(N-1) and the probability of B1 or B2. And then we have the conditional distribution, the probability of B1 given that B1 or B2 occurred and the probability of B2 given that B1 or B2 occurred, weighted by the probability of their occurrence. And that's quite a different thing from the Brukner and Zeilinger case, because, well, I've got a diagram. Over here we have the standard form of the grouping axiom. We group the two events together; the event AN is defined as the union of these two events, and the events B1 and B2 cannot occur without AN occurring. In the joint experiment scenario we have joint experiments. This is the first experiment, then we do the second one, which on Brukner and Zeilinger's view is doing a coarse graining, and then you're recording B values for the A outcome AN. So these two are different cases: this is applied to a single probability distribution, this is applied to a joint probability distribution, and the two are mathematically distinct. The joint experiment formalism can't express the case in which the B outcomes don't occur unless AN occurs, because if we required that the probability of the AN outcome be given by the sum of the probabilities of B1 and B2, then the marginal distribution for the B outcomes wouldn't sum to unity in general, as is required for a well-defined joint probability distribution.
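The two points just made can be checked numerically. The following is a minimal sketch with made-up probabilities (the function name `shannon` and all the numbers are assumptions for illustration): it confirms that the Shannon information satisfies the single-distribution grouping axiom, and shows that if B1 and B2 only ever occur when AN occurs, the would-be marginal for the B experiment falls short of 1.

```python
import math

def shannon(probs):
    """Shannon information in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One experiment with outcomes A1, A2, B1, B2 (illustrative numbers only).
p = [0.5, 0.2]        # probabilities of A1 and A2
q1, q2 = 0.2, 0.1     # probabilities of B1 and B2
pn = q1 + q2          # probability of the grouped event "B1 or B2"

# Standard grouping axiom: ungrouped uncertainty = grouped uncertainty
# + weighted conditional uncertainty within the group.
lhs = shannon(p + [q1, q2])
rhs = shannon(p + [pn]) + pn * shannon([q1 / pn, q2 / pn])

# Reading the same numbers as a separate B experiment: its outcome
# probabilities would have to sum to 1, but they only sum to pn.
b_marginal = q1 + q2

print(lhs, rhs, b_marginal)
```

The check involves only one probability distribution, so nothing in it depends on a joint distribution over two experiments being available.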

47:30 So the Brukner-Zeilinger form of the grouping axiom simply isn't an expression of the grouping axiom. What it really is, is the application of the grouping axiom to an already well-defined joint probability distribution. So the lack of certain joint probability distributions in quantum mechanics doesn't mean there's any problem for the grouping axiom. The grouping axiom can still be meaningfully applied. In fact, consider this first experiment here and represent the outcomes there by one-dimensional projectors, say. They will be orthogonal. This outcome here that we group is just going to be a sum of two orthogonal one-dimensional projectors, and that just commutes with all the other projectors. So there's no problem with non-commutativity here if we have the correct, the standard form of the grouping axiom. And you can do the same with a POVM here; rather than commutativity you have coexistence. But you might think, well, isn't there still some intuitive argument remaining? We might agree that the grouping axiom can still be applied, but perhaps there's something fishy going on, in that it can only be applied to a single distribution. So they might like to come back to me and say, well, look, the Shannon information is still inadequate because it cannot tell us about the uncertainty of the full set of observables. It can tell us about uncertainty in mutually commuting observables, but no more than that. But if we remember that a measure of uncertainty is a measure of the spread of a probability distribution, then we can't have a spread of a probability distribution in the absence of the probability distribution. So there's no problem for the Shannon information; it's just that they're trying to get the Shannon information to do a job that it couldn't possibly do, namely be a measure of uncertainty for a non-existent probability distribution. It just doesn't make sense.

50:00 Now we already know that a function of a joint distribution can't tell us about uncertainty in general in quantum mechanics, simply because we know that joint distributions don't exist for all possible measurements. And that's why we go to all the trouble of introducing measures of mixedness, such as the von Neumann entropy, minus trace rho log rho, which are functions of the state rather than of a probability distribution. So it looks as if the intuitive argument that seems to be there was simply that we want the Shannon information to be a measure of mixedness; we want it to somehow tell us about our uncertainty in general when we know the state. But then that's a confused demand, because that's not the job that the Shannon information does. It tells us about uncertainty for a given probability distribution. If you want to know about uncertainty in general, then use a measure of mixedness; look at a function of the state. Right. I'll move on to the third argument. This one proceeds by comparing the Shannon information unfavourably to their preferred measure, which is given by this quantity here: we have the sum of squares of the probabilities of the outcomes, with a certain normalisation factor in there. Now they say this quantity IP, the Brukner and Zeilinger preferred information measure, has a nice property, namely that if we add up such measures for a complete set of mutually unbiased measurements then we end up with a unitarily invariant quantity. I said briefly what mutually unbiased measurements were before. If we have the sets of projectors P and Q associated with the particular measurements, then if they satisfy this condition the measurements are mutually unbiased. The reason they're talking about complete sets of mutually unbiased measurements is because an unknown state can be completely determined by doing a complete set of mutually unbiased measurements.
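The three quantities mentioned here can be written down explicitly. The following is a reconstruction rather than a copy of the slides: the form of the Brukner-Zeilinger measure and the mutual unbiasedness condition follow the standard definitions, and the symbol \mathcal{N} for the normalisation factor and the labels P_i, Q_j for the two sets of one-dimensional projectors are assumptions made here.

```latex
% von Neumann entropy: a function of the state \rho
S(\rho) = -\operatorname{Tr}\,\rho \log \rho

% Brukner-Zeilinger measure for a measurement with n outcomes and
% outcome probabilities p_1, ..., p_n (\mathcal{N} is a normalisation factor)
I(\vec{p}) = \mathcal{N} \sum_{i=1}^{n} \left( p_i - \frac{1}{n} \right)^{2}

% Two measurements with one-dimensional projectors \{P_i\}, \{Q_j\}
% on an n-dimensional Hilbert space are mutually unbiased when
\operatorname{Tr}\,(P_i Q_j) = \frac{1}{n} \quad \text{for all } i, j .
```

For spin one half, measurements along three orthogonal directions give Tr(P_i Q_j) = 1/2 for projectors belonging to different directions, which is the example referred to in the talk.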
And then they say, well, by analogy with determining the state of a classical system: if we learn the position of a point in phase space, then we acquire the amount of information associated with that

52:30 given by the probability distribution over phase space. In the quantum case, you can't determine the states by single measurement. So what we do is we do a complete set which determines its position in the state. And so that's what the total information content is. Sorry Chris, I'm sorry to say that, but I'm a little confused about the green at the top now. in that when we first heard this phrase mutually unbiased and you gave the spin example it's the kind of formula we had like the trace formula seemed to have a product of two projectors in there well actually we saw a Keknabra multiply but one was therefore labelling the state in which we prepared and we were performing only one measurement, as it might be, we prepared spin Z up, and we measured spin X. So, I mean, maybe it doesn't matter, but maybe you could just say it more slowly with, so to speak, the Qs being the eigenstates we prepare, the Ps being what we measure, the unknown state row being specified, therefore, by a complete set of one-dimensional projectors, Is that what you're saying? Is that the green there and the black that falls there? Sorry, I haven't said I forgot to say about a complete set of unbiased measurements. Yeah, because we talked about a complete set of variables being mutually commuting. This is just a minus and unbiased measurement. These two sets of projectors are associated with observables which are mutually unbiased with respect to one another. And that's the definition, the green is the definition of it. Yeah. And they're both of them being measured, rather than one of them being prepared and the other... Well, one could be prepared. Well, one is prepared and the other one is measured. But then we can think about doing... We have an ensemble of similarly prepared systems. You know, I have said this as well. We have an ensemble of similarly prepared systems. We divide up into sub-ensembles. One in each sub-ensemble, we do a measurement of a... One of these... 
Either the P's or the Q's. You can find n plus 1 sets of projectors that have this property, and that's called a complete set of projectors,

55:00 n being the dimension of the Hilbert space. So spin one half is the paradigm example. You have three orthogonal components of spin, the orthogonal space directions. Er, OK, well, this is all new to me, but I mean it does seem a bit odd to then say that a state in the normal sense is determined by three spin components. OK, normally we say it's determined by, you know, a complete set of variables, but you're now saying, no, you can get one of these. And that's quite general in any finite dimension. You can get rho out, given the probabilities for them, I take it. Well, that's what does the work for you. But I just wanted to introduce it schematically to start off with and say that there's something special about a complete set of mutually unbiased measurements that allows us to determine the state, and say that they think we should be able to add up the amount of information gained from each type of measurement to give us the total information. And it turns out that their quantity has this nice property that if we add up n plus 1 information measures, then we end up with this quantity which is unitarily invariant, being a function of the density matrix. So this leads them to impose the total information constraint, which is that in order to be a satisfactory measure of information, a measure must sum to a unitarily invariant total information content for a complete set of mutually unbiased measurements. Now, this is a problem for the Shannon information, as the Shannon information won't satisfy this. So the worry is then about the total information you've gained: this is supposed to be the total information you've gained, made up by a sum of individual measures of information, but if an equation like this won't hold for the Shannon information,

57:30 then the Shannon information is somehow failing to tell us how much total information you've gained in measurement. And it also seems to cast a bit of doubt, they suggest, on the von Neumann entropy, because, unlike their preferred total information measure, Itot, the von Neumann entropy doesn't have this sort of relation to an individual measure of information. So it's simply a measure of mixedness, they suggest, rather than being anything to do with information. Now, I want to remark that their preferred measure, IP, the sum of squares of the probabilities, is different from the Shannon information. The first difference is that it's a measure of the concentration of a probability distribution rather than a measure of spread. Another difference, of course, is that because you've got squares in there and no logs, it's not going to be an additive information measure. So that's perhaps the first warning that this equation doesn't have a straightforward interpretation in information-theoretic terms. I'd like to get a sense of why they would want this: why do they think that it's good that the information gained in a measurement is irrespective of which measurement you do? Because surely that's what happens if you've got this unitary invariance in there. Isn't that potentially going to mean that it's the same whichever basis, or whichever type of measurement, you do? No, because these are the ones that depend on the measurement you do, and you perform n plus 1 of them. You have a set of n plus 1 measurements, and although the actual values that you add up here might be different, so that depending on the basis we measure in we get a different amount of information gained, still, if we choose a set of n plus 1 mutually unbiased measurements, then we're always going to get the sum to be the same quantity.
What then is the problem? Because it doesn't seem to be pretending to do the same job that the Shannon information does.
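The invariance property under discussion can be illustrated for a single spin-half system. The sketch below is an illustration under stated assumptions, not anything shown in the talk: the state is represented by its Bloch vector, the complete set of mutually unbiased measurements is taken to be spin along x, y and z, and the normalisation n/(n-1) in `bz_measure` is an assumed choice that makes a certain outcome score 1. All function names are hypothetical.

```python
import math

def shannon(probs):
    """Shannon information in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bz_measure(probs):
    """Sum-of-squares measure; the normalisation n/(n-1) is an assumption."""
    n = len(probs)
    return (n / (n - 1)) * sum((p - 1 / n) ** 2 for p in probs)

def spin_probs(bloch, axis):
    """Outcome probabilities for a spin measurement along coordinate axis 0, 1 or 2."""
    r = bloch[axis]
    return [(1 + r) / 2, (1 - r) / 2]

def totals(bloch):
    """Sum each measure over the three mutually unbiased spin measurements."""
    dists = [spin_probs(bloch, axis) for axis in range(3)]
    return sum(bz_measure(d) for d in dists), sum(shannon(d) for d in dists)

# Two pure states related by a unitary (a rotation of the Bloch vector).
s = 1 / math.sqrt(3)
bz_a, h_a = totals((0.0, 0.0, 1.0))   # spin up along z
bz_b, h_b = totals((s, s, s))         # same state rotated off the axes

print(bz_a, bz_b)   # the two sum-of-squares totals agree
print(h_a, h_b)     # the two Shannon totals differ
```

The sum-of-squares totals coincide for the two states while the Shannon totals do not, which is the formal fact the total information constraint appeals to. Whether that fact counts against the Shannon information is the question taken up in the discussion.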

1:00:00 So, is this a false opposition that they're setting up? Because they're saying, you know, the Shannon information is not good because this is better, but the Shannon information is not required to do this job; it doesn't do that. Yeah, that's essentially it. That's basically what it was. Good. I'll stop there. You asked what you asked. That's essentially it. I was trying to make it sound as convincing as possible. The other remark I was going to make is that information content isn't a univocal concept. There are many different things you might mean by information content. We might mean just a measure of mixedness: the more mixed the state, the less we know about what the outcomes of measurements in general will be. We might want to talk about the specification information: if we imagine encoding classical bits into quantum systems, we get a sequence of quantum systems, and the amount of information necessary to specify that sequence will be measured by the Shannon information of the original classical source, and that will be greater than or equal to the von Neumann entropy. Again, that's a quantum information concept, talking about the amount specified for a bunch of quantum systems. And we also have how much is encoded, and the Holevo bound tells us that the maximum amount of information encoded in quantum systems is bounded above by the von Neumann entropy. So those are all perfectly good senses of information content that are different from Brukner and Zeilinger's notion. So you can immediately see that there's something wrong with the total information constraint.
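The relations between these senses of information content can be summarised in one line. The following is a sketch under an assumption not stated in the talk, namely that the classical source with probabilities p_i is encoded in pure states |\psi_i\rangle; the symbol I_acc for the maximum information recoverable by measurement is a label chosen here.

```latex
% Source: letter i with probability p_i, encoded as the pure state |\psi_i\rangle
\rho = \sum_i p_i \, |\psi_i\rangle\langle\psi_i|

% Specification information >= von Neumann entropy >= accessible information
H(\vec{p}) \;\ge\; S(\rho) \;\ge\; I_{\mathrm{acc}}
```

The first inequality is the statement that the Shannon information of the source is at least the von Neumann entropy, with equality when the encoding states are orthogonal. The second is the Holevo bound for this case.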

1:02:30 It looks like it's too strong. So it may seem to just not be reasonable to acquire the total information content or information content in individual measures of information, such as IP or Shannon information. have the relation that's expressed in this information constraint. Even further, we might say, well actually, hang on, why should we actually expect a sum like this to, or sum like that, to sum to a unitarily invariant quantity for a compute-setting nutrient bias measurement? Britain has already asserted that the information measure should have this property. Chris, can I ask, you said it was a measure of the, IRO was a measure of concentration, not spread that trace of rho minus identity over n all squared didn't obviously have a single distribution because no quantity or observable had been chosen but it also seems very crudely to me to be something that smaller the closer being the suitably normalised identity? No, is it smaller or larger? It gets larger than the further I can get from the maximum mixed state. So for a pure state it's equal to minus one. So for all pure states it's large and the same? Yes. It's a measure of purity rather than being a measure of mixtures. Yes, right. OK. It's just a function of the state, much more there's no observable in the... So perhaps I-row isn't really a measure of the concentration of probability distribution, it's much more a measure of purity. Well, sorry, this should be IP rather than I-row. Oh, I'm sorry, you've written IP actually, it's got an arrow on top of it, in the black below.

1:05:00 It has to do with probability distributions. It turns out that certain combinations of probability distributions end up giving us something about the state. There's a key way of representing the states, the density matrices, as a generalisation of the Bloch sphere representation, in which we define an inner product by the trace, that's called the Hilbert-Schmidt inner product, and the norm is given by the square root of the trace of the operator squared, the Hilbert-Schmidt norm. Are you sure there shouldn't be an adjoint in there somewhere? I don't think so. Well, they're Hermitian. Yeah. I see, so. Yeah. And to grasp what's going on here, every time you see a trace, think of a scalar product. We're talking about the angles that vectors make with each other. Right, density matrices, of course. The density matrix of an n-dimensional system is an n by n complex Hermitian matrix. It also has a couple more constraints on it. Namely, it's got to have trace 1, normalisation, and it's got to be positive. These two constraints allow us to gain a picture of the geometry of density matrices, or the vectors representing density matrices, in this space. From the fact that the eigenvalues have to add to one, and from the fact that all of the eigenvalues have to be positive, it follows that the square of any eigenvalue must be less than or equal to the eigenvalue itself. If we stick in a sum, that shows that the sum of the squares of the eigenvalues is going to be less than or equal to 1. This of course is just trace rho squared. It's going to be less than or equal to 1. See a trace, think of a scalar product of two vectors.
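Written out, the step from the two constraints to the bound on trace rho squared goes as follows. The notation \lambda_i for the eigenvalues of \rho and \langle A, B \rangle for the Hilbert-Schmidt inner product is assumed here.

```latex
% Hilbert-Schmidt inner product and norm on Hermitian operators
\langle A, B \rangle = \operatorname{Tr}(AB), \qquad
\lVert A \rVert = \sqrt{\operatorname{Tr}(A^{2})}

% Positivity and normalisation of the eigenvalues \lambda_i of \rho
\lambda_i \ge 0, \quad \sum_i \lambda_i = 1
\;\Longrightarrow\; 0 \le \lambda_i \le 1
\;\Longrightarrow\; \lambda_i^{2} \le \lambda_i

% Summing over i
\lVert \rho \rVert^{2} = \operatorname{Tr}\rho^{2}
  = \sum_i \lambda_i^{2} \;\le\; \sum_i \lambda_i = 1
```

Equality holds exactly when one eigenvalue is 1 and the rest are 0, that is, for pure states, so the pure states are the density matrices of unit length.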

1:07:30 You've got the equation of a sphere. So from the normalisation constraint and positivity, it follows that the vector representing the density matrix has to lie on or within the surface of the sphere in this n-squared-dimensional space. The trace condition gives us another condition. Trace rho is equal to 1; trace of rho times the identity is equal to 1. Again, that's a scalar product between vectors. This is the equation of a plane, namely this plane. That's a plane at distance one on root n from the origin. So our vectors representing density matrices must all lie in this green dotted region here. Of course, we have to take care with these diagrams because they're a bit low-dimensional, being drawn in three dimensions. I've tried to draw this a bit better in case that was a very bad circle. Right, so this green area here is formed of the extremal points of the convex set of density matrices. Now, this is a nice simple vector space, easy to understand. Let's introduce some basis operators. We're going to want n squared linearly independent Hermitian operators; it might be useful to have them orthonormal. And then we can represent the state rho in this fashion. We've chosen as one of our basis operators the identity over root n, and we multiply it by one on root n, and that takes care of the normalisation condition. Again, this is telling us something about the projection of the state onto these little vectors UI. It's also, of course, the expectation value of the operator UI in the state rho. So we could determine the density matrix just by finding a suitable set of linearly independent Hermitian operators and doing appropriate experiments giving the expectation values. It's more interesting if you look at how much we can get out of doing individual measurements,

1:10:00 Take a maximal observable, a non-degenerate observable, with eigenprojectors PI. The post-measurement state is given by a sum of the probabilities of the outcomes times the respective projectors. In our space, these are specifying points within the plane spanned by these projectors. So the projectors associated with the observable, they of course also live in this space. There are n of them, and they span a subspace like this. And the probabilities associated with each of the outcomes specify a point in this plane, which is effectively the projection of the state rho onto that hyperplane. If we include, as we have, the identity in the basis set, each maximal observable is actually only going to give us n minus 1 linearly independent operators, because the nth is always given by the identity minus the sum of the others. So you can get at most n minus 1 linearly independent operators from a given maximal observable, and the expectation values specify the projection of the state into the hyperplane defined by those projectors, which was extremely clever. And so if we actually choose mutually unbiased measurements, then the hyperplanes associated with different mutually unbiased measurements will be mutually orthogonal in the space in which density matrices are constrained to lie in virtue of the trace condition. So if we constrain our attention to this space here, then in this space mutually unbiased measurements will give us orthogonal subspaces. So if we can find n plus 1 mutually unbiased measurements, n plus 1 times n minus 1 gives us n squared minus 1, and that's going to span

1:12:30 the T hyperplane, the n squared minus one dimensional hyperplane; including the identity we've got n squared independent operators. So we can determine the state completely by doing n plus 1 mutually unbiased measurements. And then we write down the state in this form here. Again, this is just the generalisation of the Bloch sphere representation. These P-bars here are the projections of the projectors into the T hyperplane, and the QIJ are just given by the expectation values, so they're determined just by the experimental outcomes. For fixed J, then, one of these sets of projected projectors spans an orthogonal subspace. Orthogonality is nice, and it tells us the length squared of any vector expressed in this form, when we've chosen these mutually unbiased projectors as our basis operators. The length squared expressed in this form will just be given by the sum of the QIJ squared. Now, lo and behold, this is just IP, Brukner and Zeilinger's information measure. So why does Brukner and Zeilinger's information measure sum to a unitarily invariant quantity? It's because it's telling us the length squared of the component of the density matrix in the hyperplane that we're doing a measurement in. And if we add up all those components for a complete set of mutually unbiased measurements, then we just get the original length squared of the whole density matrix. So it's because the Brukner-Zeilinger measure is a measure of, well, squared length, that it satisfies the total information constraint. But now we've seen why IP satisfies the total information constraint,

1:15:00 we can see very obviously why there's no problem that the Shannon information doesn't. It's just the simple fact that the Shannon information isn't such a measure of length. It doesn't tell us the length of the component of the density matrix lying in the hyperplane defined by the measurement you're doing. But that doesn't stop it being a measure of information at all. It just means it's not a measure of length. Sorry, is this the J. Phys. A paper that you've cited? That's... Right, because I thought he did his paper... Well, it was 1981. Yeah. Oh, what's that one? That's a lie, it's not as much as you want. Just trying to see who's awake. Oh dear, we've got masses to go here. Well, since Claire's actually already explained what the rest of the point is, basically, I might skip the rest of it. I'll skip the rest of this and go straight on to the foundational principle. But I just want to say: I explain how Itot is to be interpreted as a measure of mixedness, and explain that essentially, as measures of information, the Shannon information bears precisely the same relation to the von Neumann entropy as IP does to Itot. That might seem a bit opaque. So the upshot of that is that the fact that the Shannon information doesn't sum to a unitarily invariant quantity is not a problem for it as a measure of information. The total information constraint is unreasonable. It's not necessary that every meaningful measure of information should sum to a unitarily invariant quantity for a complete set of mutually unbiased measurements, nor is it necessary that every viable notion of total information content be given by such a sum. Right, on to the foundational principle. These arguments have arisen because Brukner and Zeilinger were trying to get something out of this: an elementary system represents the truth value of one proposition; an elementary system carries one bit of information. Those two are supposed to be equivalent; I think they are, to all intents and purposes.
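The geometric claim made above, that the sum-of-squares terms for a complete set of mutually unbiased measurements add up to the squared Hilbert-Schmidt length of the state's displacement from the maximally mixed state, can be checked directly for spin one half. This is a minimal sketch using plain nested lists for 2 by 2 matrices; the helper names and the chosen Bloch vector are assumptions for illustration, and the normalisation factor of the Brukner-Zeilinger measure is left out.

```python
# Pauli matrices and the identity as nested lists
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def projectors(sigma):
    """The two eigenprojectors (I + sigma)/2 and (I - sigma)/2 of a Pauli matrix."""
    return [[[(I2[i][j] + s * sigma[i][j]) / 2 for j in range(2)]
             for i in range(2)] for s in (+1, -1)]

def rho_from_bloch(r):
    """Density matrix (I + r.sigma)/2 for a Bloch vector r."""
    return [[(I2[i][j] + r[0] * X[i][j] + r[1] * Y[i][j] + r[2] * Z[i][j]) / 2
             for j in range(2)] for i in range(2)]

rho = rho_from_bloch((0.3, -0.4, 0.5))   # an arbitrary mixed state

# Mutual unbiasedness: Tr(P Q) = 1/2 for projectors from different axes.
overlaps = [trace(matmul(P, Q)).real
            for P in projectors(X) for Q in projectors(Z)]

# Sum of (p - 1/n)^2 over all outcomes of the three measurements.
components = 0.0
for sigma in (X, Y, Z):
    for P in projectors(sigma):
        p = trace(matmul(rho, P)).real
        components += (p - 0.5) ** 2

# Squared Hilbert-Schmidt length of rho - I/2.
d = [[rho[i][j] - I2[i][j] / 2 for j in range(2)] for i in range(2)]
length_sq = trace(matmul(d, d)).real

print(overlaps, components, length_sq)
```

The two numbers agree because the three measurement directions pick out mutually orthogonal components of the displacement vector, so their squared lengths add by Pythagoras. Nothing comparable is required of a measure of spread such as the Shannon information.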
Now, in the paper where he presents this foundational principle,

1:17:30 Zeilinger seems to propose an interesting, well, not an interesting, a sort of form of idealism. It starts with this non sequitur of an argument. The physical description of the world is expressed in propositions. Any physical object can be described by a set of true propositions. These propositions are arrived at by observation and verified by future observation. Therefore, an object is a useful construct for connecting observations. But we've just been talking about the basis on which propositions are asserted rather than what they mean. We have knowledge of an object only through observation, thus any concept of an existing reality has to be based on observation. It's not really quite clear what that means anyway, but if you read it in the right way, it looks like a form of idealism. But then he says reality is not simply a subjective human construct, there's also intersubjective agreement. So it just looks like a sort of Berkeleian idealism: God makes everyone agree about what exists, hence intersubjective agreement. Then he's an instrumentalist about physics, partly, and explicitly about the quantum state. This is important. The initial state represents all our information as obtained by earlier observation. The time-evolved state is just a shorthand way of representing the outcomes of all possible future observations. You often come across statements like that. I think Brukner and Zeilinger are unique in actually relating them to the sort of geometric representation of density matrices I was talking about earlier. I'm not sure to what extent they understand that that's what they're doing. Because they say we describe a photon by a catalogue of information, a vector, about mutually complementary propositions. Such propositions are, for example, the polarisation of the photon is vertical or horizontal. To me, that's as if they're trying to talk about the Hilbert-Schmidt representation, where we have probabilities times projectors.
Here they have, well, functions of

1:20:00 probabilities times projectors. Here they have functions of probabilities times projectors. So when they say the state is information about measurements, the information is what's contained in that function of those probability distributions. Just then to explain what they mean by a system carrying or representing some information: that a system represents the truth value of a proposition, or that it carries one bit of information, only implies a statement concerning what can be said about possible measurement results. So the state is only an amount of information about mutually complementary observables. We will have partial information about a number of observables. We can either imagine all the information being encoded in one observable, say it's spin up, and the vector points fully along the spin-up projector; or if we rotate the axes in our n-squared-dimensional space, then we'll have shorter vectors along different observables. But if they want the state to contain a certain amount of information, then the amounts of partial information need to add up to one. That the state contains a certain amount of information comes from the foundational principle. If they want to give that any teeth, consistent with this idea of what it is for the state to be information, they're forced to say that the Shannon information can't be the information measure, it can't measure the information contained in the state, because the Shannon information won't have this property that the partial informations must add up to one. So, again, quickly, in case that's not clear: partial information is supposed to be the amount of information associated with a particular type of measurement. The state consists of those amounts of partial information associated with certain propositions, in fact mutually unbiased propositions.

1:22:30 So the amount of information contained in a state has to add up to 1, from the foundational principle, so the sum of the partial informations has to add up to 1. So if the partial information is given by just the function of the probability distribution that appears in the Hilbert-Schmidt representation of the density matrices, then we can't have the Shannon information. So that's why they try and find problems with the Shannon information. They think they've found a more fundamental way of finding information in the state. But to be consistent with the foundational principle, the idea that there's only one bit contained in it, they've got to somehow get rid of the Shannon information. So they generate two spurious arguments against it, which are clearly false, and then this other one, which has to do with the total information constraint. You've all been very patient. I have something a bit more realist about the quantum state that describes the situation. The translation of the foundational principle, that the most elementary system represents the truth value of one proposition only, into something a bit more substantial about the state, is this. Any projective measurement other than in the eigenbasis of rho results in a shorter vector, i.e. a more spread distribution, i.e. we no longer have a 1-D projector. So it's not the answer to a yes-no question. One-dimensional projectors are, of course, just experimental questions. So that's the realist foundational principle. It actually explains what's going on, I think, most clearly. Does that mean, though, that the instrumentalist about the quantum state should think that the Brukner-Zeilinger information measure is the correct measure of information, then? I think the realist... But perhaps the instrumentalist should say that, well, there is something more fundamental about the Brukner and Zeilinger quantity, because it is telling us about the information in the state.
But given the explanation, why the explanation I haven't given you of the relationship, how

1:25:00 similar the Brukner and Zeilinger measure and the Shannon information measure are as functions of, as measures of, the spread or concentration of a probability distribution, the idea that the Brukner-Zeilinger measure must be the correct one is clearly looking artificial. The two have slightly different mathematical properties, interestingly different mathematical properties, but as measures of information they are sufficiently similar in the way they function, based on the more fundamental property of the majorisation relations between probability distributions, that saying that I(p) must be the correct one must be really artificial. But you can still be an instrumentalist and say that the state is just information about probability distributions; it's just that I see the realist foundational principle as being more explanatory of what's going on with Zeilinger's foundational principle. It's just that you don't have this particular idea of how the state is related to the information you have about measurement. So when you're saying artificiality because of the earlier relation, you mean it would be artificial to choose I(p) over H(p)? That they should both be given, you know, they both obviously have merit... That's essentially it. And basically that translation I gave of the foundational principle is more clear about what's going on, rather than having to talk about partial information encoded, where you say, well, how are you measuring information, and the way you measure the information in the system generates problems for how you understand the state. Just say, well, look, you're just talking about the fact that if you project onto a measurement line, you get a shorter vector. That's what's going on, that's what the foundational principle means. If you can bear another few minutes, these are the last two slides.
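
The appeal to majorisation can be made concrete. When one distribution majorises another, it is at least as concentrated in a measure-independent sense, and both the Shannon information and the sum of squared probabilities (on which the quadratic measure is based) respect that ordering. The sketch below is illustrative only: the two distributions are arbitrary examples chosen for the purpose, and the function names are not from the talk.

```python
import math

def majorises(p, q, tol=1e-12):
    """True if p majorises q: each partial sum of p, sorted in decreasing
    order, is at least the matching partial sum of q.
    Assumes p and q have the same length and each sums to 1."""
    running_p = running_q = 0.0
    for a, b in zip(sorted(p, reverse=True), sorted(q, reverse=True)):
        running_p += a
        running_q += b
        if running_p < running_q - tol:
            return False
    return True

def shannon_bits(probs):
    """Shannon information in bits; smaller means more concentrated."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

def sum_squares(probs):
    """Sum of squared probabilities; larger means more concentrated."""
    return sum(x * x for x in probs)

p = [0.7, 0.2, 0.1]   # more concentrated example
q = [0.5, 0.3, 0.2]   # more spread example

print("p majorises q:", majorises(p, q))
print("H(p) < H(q):", shannon_bits(p) < shannon_bits(q))
print("sum p^2 > sum q^2:", sum_squares(p) > sum_squares(q))
```

Both measures agree on which of the two distributions is more concentrated, which is the sense in which singling out one of them as the uniquely correct measure looks artificial.
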
But has the foundational principle got any merits as a foundational principle? I said right at the beginning it looked like it probably hasn't, because it doesn't seem

1:27:30 to be able to distinguish a classical case from a quantum case. It's interesting to look at how Zeilinger arrives at the foundational principle. The information content of elementary systems: we now decompose a system which may be represented by numerous propositions into constituent systems. It is natural to assume that each constituent system will be represented by fewer propositions. Well, that's just not the case. We've always found it harder and harder to describe smaller and smaller systems, and had to use more and more propositions to describe them. He's assuming, already assuming, a certain language within which we're describing the system. We need just to draw the distinction between describing a system within a certain set of propositions, a certain set of concepts, and encoding information in it. What do you mean when you say we found it harder to describe the smaller systems? Well, it was a hard job to get to quantum physics from classical physics. Well, sure, but I mean it's not harder to describe a DNA molecule than just to describe... Yeah, but that's really when you... I'm just wanting to draw the distinction between... well, the point I want to make is that the difficulty of description depends on the conceptual resources available in the language. And so if you have a certain language, then you can say, well, okay, relative to that description, this description is longer than that description. Some concepts are more simple and you need fewer of them to describe these systems. So it's a purely conventional matter that the British Library is more complicated than the quote-un? I wouldn't say it's conventional, I just said that it depends on the conceptual resources of the language. Conceptual resources isn't a conventional matter.
Chris, I think you've got a valid point there, but I would read the black quotation as also making a valid point, which I think David is saying without the jargon, but in the jargon it would be: composite systems ipso facto have more degrees of freedom than one of their components. That's more what the black is getting at. But so he's trying to say that an elementary system contains only one... well, he's trying to get something substantial out of it. But the only point I actually wanted to draw here is the difference between describing a system and encoding information, and it's plausible

1:30:00 there's a certain smallest system into which information can be encoded, and that could be an empirically available fact, but the length of description of the system is just dependent on the conceptual resources you use to describe it. So simply from saying that, yes, if you have a few degrees of freedom that you need to describe a system, you can describe a system or something... that's already, you know, the whole language there is already within one certain conceptual framework. Yeah, isn't it that, say you choose a language, one in which the number of edges of a jigsaw puzzle piece is going to be a degree of freedom or a proposition or whatever, and say the pieces compose a big square thing, a plane, whatever. Each piece is going to have more of those edges, it's going to need more description to individuate each piece than just the plain square when they're all put together, if you're just looking at the outer edges of whatever you're making. So they are wrong in the black thing, if you read it like that. But then David is more generally concerned about whether there's some more fundamental level of description and so on. But so they would come back and say, would you throw away the, which you can see all lines inside the button and show it's simple of the future. The point here is that they've already assumed a certain language in which they're describing the system: essentially, proposition here means a one-dimensional projector, an experimental question. And then that's a problem. This is where they get to the elementary system, the foundational principle. How far can we carry on this process? Obviously the limit is reached when an individual system represents the truth value of one proposition only. Such a system we call an elementary system. So he's already chosen a set of concepts in which we're describing the system. The reason I'm saying that this is a problem shows up when we actually try and get the principle to do anything.
For instance, can we use it to explain randomness, as Zeilinger suggests it can? The randomness in the quantum case is supposed to be irreducible since an

1:32:30 elementary system cannot carry enough information to provide definite answers to all questions that could be asked experimentally. We might criticise that. We say, OK, well, the information specifies one proposition; we say, what's the proposition? That's a one-dimensional projector. So the question in fact is: why do there actually exist non-equivalent measurements in that case? Why does the proposition that we specified not rule out any other possible experimental outcome? Because they've already assumed that the propositions are quantum, it doesn't look as if the foundational principle is going to be able to do anything for you, because what needs to be explained is: why are the propositions with which we represent the state one-dimensional projectors in a complex separable Hilbert space? Why does there exist, when we specify the state completely, the possibility of doing non-equivalent measurements to which we don't have an answer determined? Just saying that the system doesn't carry enough information, i.e. it's represented by a single projector, a single vector state, doesn't explain why there's randomness, it just says that there is randomness. We're just saying there's no element in the formalism which corresponds to a determinate outcome of all measurements. That's precisely what we're saying when we say the proposition with which we describe a system is a one-dimensional projector. Great. Can I make a point that Harvey would make if he were here? This is a Harvey Brown point, right? Bell wrote down this model for spin-half, a hidden-variable model. So, even when you go down to qubits, the obvious candidate for an elementary system, in the spin-half case, in fact, you do get the possibility of having a definite answer to any question that could be asked experimentally. So, I agree with what you said, that it's just an assertion that there is randomness, it's not an explanation of why,

1:35:00 but when we in fact get down to the spin-half case, you can't any longer say this randomness is irreducible, in the sense that there is this way of modelling it. Well, yeah. Thank you.
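
The point about a hidden-variable model for spin-half can be illustrated with a small numerical sketch. This is a simplified toy in the spirit of the model Bell gave for a single spin-half system, not a transcription of anything shown in the talk. The assumptions are: the hidden variable lam is uniformly distributed on [-1/2, 1/2], the measurement axis makes an angle theta with the prepared spin direction, and each value of lam fixes a definite outcome of +1 or -1. Averaging over lam should reproduce the quantum expectation value cos(theta). All names are illustrative.

```python
import math

def outcome(lam, cos_theta):
    """Definite outcome (+1 or -1) fixed by the hidden variable lam in [-1/2, 1/2]
    and by the cosine of the angle between measurement axis and prepared spin."""
    sign_axis = 1 if cos_theta >= 0 else -1
    sign_lam = 1 if lam + 0.5 * abs(cos_theta) >= 0 else -1
    return sign_axis * sign_lam

def expectation(cos_theta, n=20000):
    """Average outcome over an evenly spaced grid of lam values,
    standing in for the uniform distribution on [-1/2, 1/2]."""
    total = 0
    for k in range(n):
        lam = -0.5 + (k + 0.5) / n
        total += outcome(lam, cos_theta)
    return total / n

for degrees in (0, 60, 90, 120, 180):
    c = math.cos(math.radians(degrees))
    print(degrees, "model:", round(expectation(c), 3), "quantum:", round(c, 3))
```

Every individual run has a definite answer for every measurement direction, yet the statistics match the quantum ones. That is the sense in which, for a single spin-half system, the formalism alone does not force the randomness to be irreducible.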