Tuesday, December 29, 2015

entropy - Ignorance in statistical mechanics


Consider this penny on my desk. It is a particular piece of metal, well described by statistical mechanics, which assigns to it a state, namely the density matrix $\rho_0=\frac{1}{Z}e^{-\beta H}$ (in the simplest model). This is an operator in a space of functions depending on the coordinates of a huge number $N$ of particles.


The ignorance interpretation of statistical mechanics, the orthodoxy to which all introductions to statistical mechanics pay lip service, claims that the density matrix is a description of ignorance, and that the true description should be one in terms of a wave function; any pure state consistent with the density matrix should produce the same macroscopic result.


However, it would be very surprising if Nature changed its behavior depending on the extent of our ignorance. Thus the talk about ignorance must have an objective, formalizable basis, independent of anyone's particular ignorance.


On the other hand, statistical mechanics always works exclusively with the density matrix (except at the very beginning, where it is motivated). Nowhere (except there) does one make any use of the assumption that the density matrix expresses ignorance. Thus it seems to me that the whole concept of ignorance is spurious, a relic of the early days of statistical mechanics.


Thus I'd like to invite the defenders of orthodoxy to answer the following questions:



(i) Can the claim be checked experimentally that the density matrix (a canonical ensemble, say, which correctly describes a macroscopic system in equilibrium) describes ignorance? - If yes, how, and whose ignorance? - If not, why is this ignorance interpretation assumed though nothing at all depends on it?


(ii) In a thought experiment, suppose Alice and Bob have different amounts of ignorance about a system. Thus Alice's knowledge amounts to a density matrix $\rho_A$, whereas Bob's knowledge amounts to a density matrix $\rho_B$. Given $\rho_A$ and $\rho_B$, how can one check in principle whether Bob's description is consistent with that of Alice?


(iii) How does one decide whether a pure state $\psi$ is adequately represented by a statistical mechanics state $\rho_0$? In terms of (ii), assume that Alice knows the true state of the system (according to the ignorance interpretation of statistical mechanics a pure state $\psi$, corresponding to $\rho_A=\psi\psi^*$), whereas Bob only knows the statistical mechanics description, $\rho_B=\rho_0$.


Presumably, there should be a kind of quantitative measure $M(\rho_A,\rho_B)\ge 0$ that vanishes when $\rho_A=\rho_B$ and tells how compatible the two descriptions are. Otherwise, what can it mean that two descriptions are consistent? However, the mathematically natural candidate, the relative entropy (= Kullback-Leibler divergence) $M(\rho_A,\rho_B)=\mathrm{tr}\,\rho_A(\log\rho_A-\log\rho_B)$ [edit: I corrected a sign mistake pointed out in the discussion below], does not work. Indeed, in the situation (iii), $M(\rho_A,\rho_B)$ equals the expectation of $\beta H+\log Z$ in the pure state; this is minimal in the ground state of the Hamiltonian. But this would say that the ground state is most consistent with the density matrix of any temperature, an unacceptable conclusion.
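To make this concrete, here is a small numerical sketch (a randomly generated toy Hamiltonian, purely illustrative and not part of the original argument) checking that for a pure $\rho_A=\psi\psi^*$ and a canonical $\rho_B=\frac1Z e^{-\beta H}$ the relative entropy reduces to $\beta\langle\psi|H|\psi\rangle+\log Z$, which is indeed smallest for the ground state:

```python
# Toy check: for pure rho_A = |psi><psi| and canonical rho_B, the relative entropy
# tr rho_A (log rho_A - log rho_B) equals beta <psi|H|psi> + log Z, minimal for the
# ground state. (Random 6x6 Hamiltonian; numbers are illustrative only.)
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 6))
H = (H + H.T) / 2                      # a random real symmetric "Hamiltonian"
beta = 2.0

E, V = np.linalg.eigh(H)
Z = np.exp(-beta * E).sum()
log_rho_B = V @ np.diag(-beta * E - np.log(Z)) @ V.T

def rel_entropy_pure(psi):
    # tr rho_A log rho_A = 0 for a pure state, so only the cross term remains
    return -psi @ log_rho_B @ psi

for k in range(3):                     # lowest three eigenstates of H
    psi = V[:, k]
    print(k, rel_entropy_pure(psi), beta * E[k] + np.log(Z))
# The two columns agree, and the k = 0 (ground state) value is the minimum,
# which is exactly the objection raised above.
```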


Edit: After reading the paper http://bayes.wustl.edu/etj/articles/gibbs.paradox.pdf by E.T. Jaynes pointed to in the discussion below, I can make more precise the query in (iii): In the terminology of p.5 there, the density matrix $\rho_0$ represents a macrostate, while each wave function $\psi$ represents a microstate. The question is then: When may (or may not) a microstate $\psi$ be regarded as a macrostate $\rho_0$ without affecting the predictability of the macroscopic observations? In the above case, how do I compute the temperature of the macrostate corresponding to a particular microstate $\psi$ so that the macroscopic behavior is the same - if it is, and which criterion allows me to decide whether (given $\psi$) this approximation is reasonable?


An example where it is not reasonable to regard $\psi$ as a canonical ensemble is if $\psi$ represents a composite system made of two pieces of the penny at different temperatures. Clearly no canonical ensemble can describe this situation macroscopically correctly. Thus the criterion sought must be able to decide between a state representing such a composite system and the state of a penny at uniform temperature, and in the latter case must give a recipe for how to assign a temperature to $\psi$, namely the temperature that Nature allows me to measure.


The temperature of my penny is determined by Nature, hence must be derivable from any microstate that claims to be a complete description of the penny.


I have never seen a discussion of such an identification criterion, although it is essential if one wants to make sense of the idea - underlying the ignorance interpretation - that a completely specified quantum state must be a pure state.


Part of the discussion on this is now at: http://chat.stackexchange.com/rooms/2712/discussion-between-arnold-neumaier-and-nathaniel


Edit (March 11, 2012): I accepted Nathaniel's answer as satisfying under the given circumstances, though he forgot to mention a fourth possibility that I prefer: namely that the complete knowledge about a quantum system is in fact described by a density matrix, so that microstates are arbitrary density matrices and a macrostate is simply a density matrix of a special form by which an arbitrary microstate (density matrix) can be well approximated when only macroscopic consequences are of interest. These special density matrices have the form $\rho=e^{-S/k_B}$ with a simple operator $S$ - in the equilibrium case a linear combination of 1, $H$ (and various number operators $N_j$ if conserved), defining the canonical or grand canonical ensemble. This is consistent with all of statistical mechanics, and has the advantage of simplicity and completeness, compared to the ignorance interpretation, which needs the additional qualitative concept of ignorance and with it all sorts of questions that are too imprecise or too difficult to answer.
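As a concrete illustration of this special form (standard grand canonical notation, with chemical potential $\mu$ and grand partition function $Z$; this worked case is added here only for definiteness): taking
$$S = k_B\bigl(\beta(H-\mu N) + (\log Z)\,1\bigr)$$
gives
$$\rho = e^{-S/k_B} = \frac{1}{Z}\,e^{-\beta(H-\mu N)},$$
the grand canonical ensemble; dropping the $\mu N$ term recovers the canonical $\rho_0=\frac{1}{Z}e^{-\beta H}$ from the beginning of the question.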




Answer



I wouldn't say the ignorance interpretation is a relic of the early days of statistical mechanics. It was first proposed by Edwin Jaynes in 1957 (see http://bayes.wustl.edu/etj/node1.html, papers 9 and 10, and also number 36 for a more detailed version of the argument) and proved controversial up until fairly recently. (Jaynes argued that the ignorance interpretation was implicit in the work of Gibbs, but Gibbs himself never spelt it out.) Until recently, most authors preferred an interpretation in which (for a classical system at least) the probabilities in statistical mechanics represented the fraction of time the system spends in each state, rather than the probability of it being in a particular state at the present time. This old interpretation makes it impossible to reason about transient behaviour using statistical mechanics, and this is ultimately what makes switching to the ignorance interpretation useful.


In response to your numbered points:


(i) I'll answer the "whose ignorance?" part first. The answer to this is "an experimenter with access to macroscopic measuring instruments that can measure, for example, pressure and temperature, but cannot determine the precise microscopic state of the system." If you knew precisely the underlying wavefunction of the system (together with the complete wavefunction of all the particles in the heat bath if there is one, along with the Hamiltonian for the combined system) then there would be no need to use statistical mechanics at all, because you could simply integrate the Schrödinger equation instead. The ignorance interpretation of statistical mechanics does not claim that Nature changes her behaviour depending on our ignorance; rather, it claims that statistical mechanics is a tool that is only useful in those cases where we have some ignorance about the underlying state or its time evolution. Given this, it doesn't really make sense to ask whether the ignorance interpretation can be confirmed experimentally.


(ii) I guess this depends on what you mean by "consistent with." If two people have different knowledge about a system then there's no reason in principle that they should agree on their predictions about its future behaviour. However, I can see one way in which to approach this question. I don't know how to express it in terms of density matrices (quantum mechanics isn't really my thing), so let's switch to a classical system. Alice and Bob both express their knowledge about the system as a probability density function over $x$, the set of possible states of the system (i.e. the vector of positions and velocities of each particle) at some particular time. Now, if there is no value of $x$ for which both Alice and Bob assign a positive probability density then they can be said to be inconsistent, since every state that Alice accepts the system might be in Bob says it is not, and vice versa. If any such value of $x$ does exist then Alice and Bob can both be "correct" in their state of knowledge if the system turns out to be in that particular state. I will continue this idea below.
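A minimal sketch of this support-overlap criterion (a toy discretized state space with made-up numbers, just to fix the idea; not part of the original answer):

```python
# Alice and Bob each assign probabilities over a toy set of discrete "microstates".
# They are mutually consistent if at least one state gets positive probability from both.
import numpy as np

states = np.arange(10)                      # a toy discretized state space
p_alice = np.where(states < 6, 1/6, 0.0)    # Alice: uniform over states 0..5
p_bob   = np.where(states >= 4, 1/6, 0.0)   # Bob: uniform over states 4..9

def consistent(p, q):
    # "consistent" here means: some state has positive probability under both
    return np.any((p > 0) & (q > 0))

print(consistent(p_alice, p_bob))           # True: states 4 and 5 are shared
```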


(iii) Again I don't really know how to convert this into the density matrix formalism, but in the classical version of statistical mechanics, a macroscopic ensemble assigns a probability (or a probability density) to every possible microscopic state, and this is what you use to determine how heavily represented a particular microstate is in a given ensemble. In the density matrix formalism the pure states are analogous to the microscopic states in the classical one. I guess you have to do something with projection operators to get the probability of a particular pure state out of a density matrix (I did learn it once but it was too long ago), and I'm sure the principles are similar in both formalisms.


I agree that the measure you are looking for is $D_\textrm{KL}(A||B) = \sum_i p_A(i) \log \frac{p_A(i)}{p_B(i)}$. (I guess this is $\mathrm{tr}(\rho_A (\log \rho_A - \log \rho_B))$ in the density matrix case, which looks like what you wrote apart from a change of sign.) In the case where A is a pure state, this just gives $-\log p_B(i)$, the negative logarithm of the probability that Bob assigns to that particular pure state. In information theory terms, this can be interpreted as the "surprisal" of state $i$, i.e. the amount of information that must be supplied to Bob in order to convince him that state $i$ is indeed the correct one. If Bob considers state $i$ to be unlikely then he will be very surprised to discover it is the correct one.
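A small numerical illustration of this (toy probabilities, not from the original discussion), showing that when $A$ is a point mass on state $i$ the divergence reduces to Bob's surprisal $-\log p_B(i)$:

```python
# Discrete KL divergence D_KL(A||B) and its pure-state (point-mass) special case.
import numpy as np

def kl(p, q):
    mask = p > 0                              # terms with p = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p_bob = np.array([0.7, 0.2, 0.1])             # Bob's distribution over three microstates
p_alice = np.array([0.0, 1.0, 0.0])            # Alice knows the system is in state 1

print(kl(p_alice, p_bob), -np.log(p_bob[1]))   # both equal the surprisal -log 0.2
```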


If B assigns zero probability to state $i$ then the measure will diverge to infinity, meaning that Bob would take an infinite amount of convincing in order to accept something that he was absolutely certain was false. If A is a mixed state, this will happen as long as A assigns a positive probability to any state to which B assigns zero probability. If A and B are the same then this measure will be 0. Therefore the measure $D_\textrm{KL}(A||B)$ can be seen as a measure of how "incompatible" two states of knowledge are. Since the KL divergence is asymmetric I guess you also have to consider $D_\textrm{KL}(B||A)$, which is something like the degree of implausibility of B from A's perspective.


I'm aware that I've skipped over some things, as there was quite a lot to write and I don't have much time to do it. I'll be happy to expand it if any of it is unclear.


Edit (in reply to the edit at the end of the question): The answer to the question "When may (or may not) a microstate $\psi$ be regarded as a macrostate $\rho_0$ without affecting the predictability of the macroscopic observations?" is "basically never." I will address this in classical mechanics terms because it's easier for me to write in that language. Macrostates are probability distributions over microstates, so the only time a macrostate can behave in the same way as a microstate is if the macrostate happens to be a fully peaked probability distribution (with entropy 0, assigning $p=1$ to one microstate and $p=0$ to the rest), and to remain that way throughout the time evolution.



You write in a comment "if I have a definite penny on my desk with a definite temperature, how can it have several different pure states?" But (at least in Jaynes' version of the MaxEnt interpretation of statistical mechanics), the temperature is not a property of the microstate but of the macrostate. Its reciprocal, $1/T$, is the partial derivative of the entropy with respect to the internal energy. Essentially what you're doing is (1) finding the macrostate with the maximum (information) entropy compatible with the internal energy being equal to $U$, then (2) finding the macrostate with the maximum entropy compatible with the internal energy being equal to $U+dU$, then (3) taking the difference and dividing by $dU$, which gives $1/T$. When you're talking about microstates instead of macrostates the entropy is always 0 (precisely because you have no ignorance) and so it makes no sense to do this.
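Here is a minimal numerical sketch of that max-entropy procedure (a toy four-level spectrum with $k_B=1$; the spectrum and numbers are illustrative assumptions, not from the original discussion): the maximum-entropy distribution with mean energy $U$ is a Boltzmann distribution with some $\beta$, and the numerical derivative $dS/dU$ recovers that same $\beta = 1/T$.

```python
# Max-entropy distribution at fixed mean energy U is Boltzmann with some beta;
# the numerical derivative dS/dU then reproduces beta = 1/T (k_B = 1, toy spectrum).
import numpy as np
from scipy.optimize import brentq

E = np.array([0.0, 1.0, 2.0, 3.0])            # toy energy levels

def boltzmann(beta):
    w = np.exp(-beta * E)
    return w / w.sum()

def entropy_at(U):
    # find beta such that <E> = U, then return the entropy of that distribution
    beta = brentq(lambda b: boltzmann(b) @ E - U, -50, 50)
    p = boltzmann(beta)
    return -np.sum(p * np.log(p)), beta

U, dU = 1.2, 1e-4
S1, beta = entropy_at(U)
S2, _ = entropy_at(U + dU)
print((S2 - S1) / dU, beta)                   # the two numbers agree: dS/dU = 1/T
```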


Now you might want to say something like "but if my penny does have a definite pure state that I happen to be ignorant of, then surely it would behave in exactly the same way if I did know that pure state." This is true, but if you knew precisely the pure state then you would (in principle) no longer have any need to use temperature in your calculations, because you would (in principle) be able to calculate precisely the fluxes in and out of the penny, and hence you'd be able to give exact answers to the questions that statistical mechanics can only answer statistically.


Of course, you would only be able to calculate the penny's future behaviour over very short time scales, because the penny is in contact with your desk, whose precise quantum state you (presumably) do not know. You will therefore have to replace your pure-state-macrostate of the penny with a mixed one pretty rapidly. The fact that this happens is one reason why you can't in general simply replace the mixed state with a single "most representative" pure state and use the evolution of that pure state to predict the future evolution of the system.


Edit 2: the classical versus quantum cases. (This edit is the result of a long conversation with Arnold Neumaier in chat, linked in the question.)


In most of the above I've been talking about the classical case, in which a microstate is something like a big vector containing the positions and velocities of every particle, and a macrostate is simply a probability distribution over a set of possible microstates. Systems are conceived of as having a definite microstate, but the practicalities of macroscopic measurements mean that for all but the simplest systems we cannot know what it is, and hence we model it statistically.


In this classical case, Jaynes' arguments are (to my mind) pretty much unassailable: if we lived in a classical world, we would have no practical way to know precisely the position and velocity of every particle in a system like a penny on a desk, and so we would need some kind of calculus to allow us to make predictions about the system's behaviour in spite of our ignorance. When one examines what an optimal such calculus would look like, one arrives precisely at the mathematical framework of statistical mechanics (Boltzmann distributions and all the rest). By considering how one's ignorance about a system can change over time one arrives at results that (it seems to me at least) would be impossible to state, let alone derive, in the traditional frequentist interpretation. The fluctuation theorem is an example of such a result.


In a classical world there would be no reason in principle why we couldn't know the precise microstate of a penny (along with that of anything it's in contact with). The only reasons for not knowing it are practical ones. If we could overcome such issues then we could predict the microstate's time-evolution precisely. Such predictions could be made without reference to concepts such as entropy and temperature. In Jaynes' view at least, these are purely macroscopic concepts and don't strictly have meaning on the microscopic level. The temperature of your penny is determined both by Nature and by what you are able to measure about Nature (which depends on the equipment you have available). If you could measure the (classical) microstate in enough detail then you would be able to see which particles had the highest velocities and thus be able to extract work via a Maxwell's demon type of apparatus. Effectively you would be partitioning the penny into two subsystems, one containing the high-energy particles and one containing the lower-energy ones; these two systems would effectively have different temperatures.


My feeling is that all of this should carry over on to the quantum level without difficulty, and indeed Jaynes presented much of his work in terms of the density matrix rather than classical probability distributions. However there is a large and (I think it's fair to say) unresolved subtlety involved in the quantum case, which is the question of what really counts as a microstate for a quantum system.


One possibility is to say that the microstate of a quantum system is a pure state. This has a certain amount of appeal: pure states evolve deterministically like classical microstates, and the density matrix can be derived by considering probability distributions over pure states. However the problem with this is distinguishability: some information is lost when going from a probability distribution over pure states to a density matrix. For example, there is no experimentally distinguishable difference between the mixed states $\frac{1}{2}(\mid \uparrow \rangle \langle \uparrow \mid + \mid \downarrow \rangle \langle \downarrow \mid)$ and $\frac{1}{2}(\mid \leftarrow \rangle \langle \leftarrow \mid + \mid \rightarrow \rangle \langle \rightarrow \mid)$ for a spin-$\frac{1}{2}$ system. If one considers the microstate of a quantum system to be a pure state then one is committed to saying there is a difference between these two states, it's just that it's impossible to measure. This is a philosophically difficult position to maintain, as it's open to being attacked with Occam's razor.
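A quick numerical check of this indistinguishability claim (standard qubit basis vectors assumed; the check itself is added here, not part of the original answer): both mixtures yield exactly the same density matrix, $\tfrac12 I$.

```python
# The equal up/down mixture and the equal left/right mixture give the same density matrix.
import numpy as np

up    = np.array([1, 0], dtype=complex)
down  = np.array([0, 1], dtype=complex)
right = (up + down) / np.sqrt(2)              # +x eigenstate
left  = (up - down) / np.sqrt(2)              # -x eigenstate

rho_z = 0.5 * (np.outer(up, up.conj()) + np.outer(down, down.conj()))
rho_x = 0.5 * (np.outer(right, right.conj()) + np.outer(left, left.conj()))

print(np.allclose(rho_z, rho_x))              # True: both equal identity / 2
```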


However, this is not the only possibility. Another possibility is to say that even pure quantum states represent our ignorance about some underlying, deeper level of physical reality. If one is willing to sacrifice locality then one can arrive at such a view by interpreting quantum states in terms of a non-local hidden variable theory.



Another possibility is to say that the probabilities one obtains from the density matrix do not represent our ignorance about any underlying microstate at all, but instead they represent our ignorance about the results of future measurements we might make on the system.


I'm not sure which of these possibilities I prefer. The point is just that on the philosophical level the ignorance interpretation is trickier in the quantum case than in the classical one. But in practical terms it makes very little difference - the results derived from the much clearer classical case can almost always be re-stated in terms of the density matrix with very little modification.

