How does one prove the formula for entropy $S=-\sum p\ln p$? Obviously systems on the microscopic level are fully determined by the microscopic equations of motion. So if you want to introduce a law on top of that, you have to prove consistency, i.e. entropy cannot be a postulate. I can imagine that it is derived from probability theory for a general system. Do you know such a line of reasoning?
Once you have such a reasoning, what are its assumptions? Can these assumptions be invalid for special systems? Would such systems then not obey thermodynamics or statistical mechanics, and not have any sort of temperature, no matter how general the framework?
If thermodynamics/stat. mech. are completely general, how would you apply them to a system where one point particle orbits another?
Answer
The theorem is called the noiseless coding theorem, and it is often proven in clunky ways in information theory books. The point of the theorem is to calculate the minimum number of bits per variable you need to encode the values of $N$ independent copies of a random variable taking values in $1...K$, where the probability of value $i$ is $p_i$. The minimum number of bits you need on average per variable in the large $N$ limit is defined to be the information in the random variable. It is the minimum number of bits of information per variable you need to record in a computer so as to remember the values of the $N$ copies with perfect fidelity.
If the variables are uniformly distributed, the answer is obvious: there are $K^N$ possibilities for $N$ throws, and $2^{CN}$ possibilities for $CN$ bits, so $C=\log_2(K)$ for large $N$. Any less than $CN$ bits, and you will not be able to encode the values of the random variables, because they are all equally likely. Any more than this, and you will have extra room. This is the information in a uniform random variable.
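A quick sanity check of this counting (a minimal sketch; the values of $K$ and $N$ below are made up): the total number of bits needed for $K^N$ equally likely sequences is $\lceil\log_2(K^N)\rceil$, and the bits per throw approach $\log_2(K)$ as $N$ grows.

```python
# Bits needed to record N draws of a uniform random variable on {1,...,K}.
# There are K**N equally likely sequences, so you need ceil(log2(K**N)) bits
# in total, and the bits-per-draw approach log2(K) for large N.
import math

K = 6  # number of equally likely values (e.g. a die)
for N in (1, 10, 100, 1000):
    total_bits = math.ceil(N * math.log2(K))   # = ceil(log2(K**N))
    print(N, total_bits / N, "->", math.log2(K))
```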
For a general distribution, you can get the answer with a little bit of the law of large numbers. If you have many copies of the random variable, the probability of observing any particular sequence of values $n_1, n_2, \dots, n_N$ is the product
$$ P(n_1, n_2, \dots , n_N) = \prod_{j=1}^N p_{n_j}$$
This probability is dominated for large $N$ by those configurations where the number of occurrences of value $i$ is close to $Np_i$, since this is the mean number of $i$'s. So the value of $P$ on any typical configuration is:
$$ P(n_1,\dots,n_N) = \prod_{i=1}^K p_i^{Np_i} = e^{N\sum_i p_i \log(p_i)}$$
So for those possibilities where the probability is not extremely small, the probability is more or less constant and equal to the above value. The total number M(N) of these not-exceedingly unlikely possibilities is what is required to make the sum of probabilities equal to 1.
$$M(N) \propto e^{ - N \sum p_i \log(p_i)}$$
To encode which of the $M(N)$ possibilities is realized in a run of $N$ picks, you therefore need a number of bits $B(N)$ large enough to encode all these possibilities:
$$2^{B(N)} \propto e^{ - N \sum p_i \log(p_i)}$$
which means that
$${B(N)\over N} = - \sum p_i \log_2(p_i)$$
And all subleading constants are washed out by the large N limit. This is the information, and the asymptotic equality above is the Shannon noiseless coding theorem. To make it rigorous, all you need are some careful bounds on the large number estimates.
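To see the asymptotic equality numerically, here is a minimal sketch (the distribution $p$ is a made-up example): by the law of large numbers, $-\log_2 P(\text{sequence})/N$ for a sampled sequence concentrates around $-\sum_i p_i \log_2(p_i)$ as $N$ grows.

```python
# Numerical check of the typical-set estimate: for long i.i.d. sequences,
# -log2(P(sequence))/N concentrates around the Shannon entropy
# -sum_i p_i log2(p_i).
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.125, 0.125])   # made-up distribution over K = 4 values
shannon = -np.sum(p * np.log2(p))          # = 1.75 bits per symbol

for N in (10, 100, 10_000):
    draws = rng.choice(len(p), size=N, p=p)        # one sequence of N symbols
    bits_per_symbol = -np.sum(np.log2(p[draws])) / N
    print(N, round(bits_per_symbol, 4), "->", shannon)
```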
Replica coincidences
There is another interpretation of the Shannon entropy in terms of coincidences which is interesting. Consider the probability that you pick two values of the random variable, and you get the same value twice:
$$P_2 = \sum p_i^2$$
This is clearly an estimate of how many different values there are to select from. If you ask what the probability is that you get the same value $k$ times in $k$ throws, it is
$$P_k = \sum_i p_i\, p_i^{k-1} = \sum_i p_i^k$$
If you ask what the probability of a coincidence is after $k=1+\epsilon$ throws, you get the Shannon entropy: to first order in $\epsilon$, $P_{1+\epsilon} = \sum_i p_i^{1+\epsilon} \approx 1 + \epsilon \sum_i p_i \log(p_i) = 1 - \epsilon S$. This is like the replica trick, so I think it is good to keep in mind.
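A numerical check of that limit (my sketch, with another made-up distribution): $(1-P_{1+\epsilon})/\epsilon \to -\sum_i p_i \log(p_i)$ as $\epsilon \to 0$.

```python
# Coincidence / replica-style check: P_k = sum_i p_i^k, and
# (1 - P_{1+eps}) / eps tends to the Shannon entropy -sum_i p_i ln p_i.
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])
shannon = -np.sum(p * np.log(p))            # natural log here

for eps in (1e-1, 1e-3, 1e-6):
    P_k = np.sum(p ** (1 + eps))            # "coincidence probability" at k = 1 + eps
    print(eps, (1 - P_k) / eps, "->", shannon)
```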
Entropy from information
To recover statistical mechanics from the Shannon information, you are given:
- the values of the macroscopic conserved quantities (or their thermodynamic conjugates): energy, momentum, angular momentum, charge, and particle number
- the macroscopic constraints (or their thermodynamic conjugates): volume, positions of macroscopic objects, etc.
Then the statistical distribution of the microscopic configuration is the maximum-entropy distribution on phase space (the one that assumes as little information as possible beyond what you are given), subject to the constraint that the conserved quantities match their macroscopic values. The entropy $S=-\sum p\ln p$ is then just the Shannon information of this distribution, measured in natural-log units rather than bits (times Boltzmann's constant if you want conventional thermodynamic units).
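As a concrete sketch of this prescription with a single constraint (the energy levels and target mean energy below are made up, not from the original answer): maximizing $-\sum_i p_i \ln p_i$ at fixed $\langle E\rangle$ gives the Boltzmann distribution $p_i \propto e^{-\beta E_i}$, where the Lagrange multiplier $\beta$ is fixed by the energy constraint.

```python
# Maximum-entropy distribution over assumed discrete energy levels with a
# fixed mean energy: the result is Boltzmann, p_i ∝ exp(-beta * E_i), with
# the Lagrange multiplier beta chosen so <E> matches the macroscopic value.
# Here beta is found by bisection, since <E> decreases monotonically in beta.
import numpy as np

E = np.array([0.0, 1.0, 2.0, 3.0])   # made-up energy levels
E_target = 1.2                        # made-up macroscopic mean energy

def mean_energy(beta):
    w = np.exp(-beta * E)
    p = w / w.sum()
    return np.dot(p, E)

lo, hi = -50.0, 50.0                  # bracket: mean_energy(lo) > E_target > mean_energy(hi)
for _ in range(200):                  # bisection on beta
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > E_target:
        lo = mid
    else:
        hi = mid

beta = 0.5 * (lo + hi)
p = np.exp(-beta * E); p /= p.sum()
print("beta =", beta)
print("p =", p, "  <E> =", np.dot(p, E))
print("S = -sum p ln p =", -np.sum(p * np.log(p)))
```

Here $\beta$ appears purely as the Lagrange multiplier enforcing the energy constraint; identifying it with inverse temperature is the standard statistical-mechanics step.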