Wednesday, May 2, 2018

mathematical physics - Is there a Bayesian theory of deterministic signal? Prequel and motivation for my previous question


This is a prequel to my question:


What's the probability distribution of a deterministic signal or how to marginalize dynamical systems?


Admittedly, my question looks fairly elementary yet at the same time completely unexpected, even crazy or out of nowhere. That is probably why it received no answer or comment after more than 200 views, only one upvote (thanks!).


Therefore, I'd like to explain the motivation behind it and why it might be an important and fundamental question.


Indeed, the starting point is to acknowledge that the signal theory we know well is in fact a theory of random signals, as developed by Shannon and many others, whereas a theory of deterministic signals is far less well known. It is not even clear, at least to me, whether such a theory exists at all!


To see this, let's consider a basic, practical problem in signal processing, taken from J. D. Scargle, Bayesian estimation of time series lags and structure:


http://arxiv.org/abs/math/0111127



Consider the problem of estimating the time delay between two discrete-time signals corrupted by additive noise.


Hence we record some samples from a first signal ${X_m}$ corrupted by additive noise ${B_X}$


${X_m} = {S_m} + {B_X}$


where ${S_m}$ is the theoretical signal, together with a second, time-delayed signal ${Y_m}$, also corrupted by additive noise ${B_Y}$,


${Y_m} = {S_{m - \tau }} + {B_Y}$


and we want to estimate the time delay $\tau $ between the two theoretical signals.
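
To fix ideas, here is a minimal numerical sketch of this generative model (my own toy setup, not Scargle's: the waveform, sample size, delay and noise levels are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

M = 256            # number of recorded samples
tau_true = 7       # true discrete time delay (to be estimated)
sigma_x = 0.3      # noise level on X
sigma_y = 0.3      # noise level on Y

# Hypothetical theoretical signal S_m; any waveform would do here.
t = np.arange(M + tau_true)
S = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 13)

# X_m = S_m + B_X   and   Y_m = S_{m - tau} + B_Y
X = S[tau_true:] + sigma_x * rng.normal(size=M)
Y = S[:M] + sigma_y * rng.normal(size=M)
```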


Let $D$ be our experimental data. Assuming both noises are zero-mean Gaussian with standard deviations ${\sigma ^X}$ and ${\sigma ^Y}$, Bayes' rule reads


$p\left( {\left. {\tau ,{S_m},{\sigma ^X},{\sigma ^Y}} \right|D} \right) \propto p\left( {\tau ,{S_m},{\sigma ^X},{\sigma ^Y}} \right)p\left( {\left. D \right|\tau ,{S_m},{\sigma ^X},{\sigma ^Y}} \right)$


so that $\tau $ has marginal posterior probability distribution


$p\left( {\left. \tau \right|D} \right) = \int\limits_{{S_m}} {\int\limits_{{\sigma ^X}} {\int\limits_{{\sigma ^Y}} {p\left( {\left. {\tau ,{S_m},{\sigma ^X},{\sigma ^Y}} \right|D} \right){\text{d}}{\sigma ^X}{\text{d}}{\sigma ^Y}{{\text{d}}^M}{S_m}} } } $



from which we can derive standard Bayesian estimators such as the maximum a posteriori (MAP) estimator, since $\tau $ is discrete.


So it remains to assign the prior probability distribution


$p\left( {\tau ,{S_m},{\sigma ^X},{\sigma ^Y}} \right) = p\left( \tau \right)p\left( {{S_m}} \right)p\left( {{\sigma ^X}} \right)p\left( {{\sigma ^Y}} \right)$


in particular the prior probability distribution for the samples of the theoretical signal


$p\left( {{S_m}} \right)$


Assigning, for instance, following Scargle, a uniform prior distribution on ${\mathbb{R}^M}$ or on some hypercube ${\left[ {{S_0},{S_1}} \right]^M}$ (eq. 26), we finally prove (probabilistically) that the classical cross-correlation function


${\gamma _{X,Y}}\left( \tau \right) = \sum\limits_{m = 1}^M {{x_m}{y_{m + \tau }}} $ (eq. 70)


is a sufficient statistic for the problem of estimating the time delay/lag between two random signals corrupted by Gaussian noise.
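
Continuing the sketch above, the MAP estimate of $\tau $ under flat priors then reduces, in this illustrative setting, to maximizing the cross-correlation over the candidate lags:

```python
def cross_correlation(x, y, lags):
    """gamma_{X,Y}(tau) = sum_m x_m * y_{m+tau}, restricted to valid indices."""
    return np.array([np.dot(x[:len(x) - tau], y[tau:]) for tau in lags])

lags = np.arange(0, 20)
gamma = cross_correlation(X, Y, lags)
tau_map = lags[np.argmax(gamma)]   # under flat priors, MAP = argmax of gamma
print(tau_map)                     # recovers tau_true = 7
```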


Now, consider the very same problem but with a deterministic theoretical signal $S\left( m \right)$ instead of a random one.


To my mind, even before doing the calculation, a sufficient statistic for this second problem should not be expected to be the usual cross-correlation function, because the cross-correlation or the covariance between two samples from two deterministic signals and, a fortiori, the cross-correlation or cross-covariance functions built from those samples, do not make much sense for deterministic signals, for a simple and, I believe, good reason.



Indeed, the cross-correlation ${\gamma _{X,Y}}\left( 0 \right)$ of two sequences of samples ${x_1},{x_2},...,{x_M}$ and ${y_1},{y_2},...,{y_M}$


${\gamma _{X,Y}}\left( 0 \right) = \sum\limits_{m = 1}^M {{x_m}{y_m}}$


is, by definition, invariant under permutation of the time points/indices $m$: for any permutation $\sigma$ over $\left\{ {1,2,...,M} \right\}$, we have


$\sum\limits_{m = 1}^M {{x_m}{y_m}} = \sum\limits_{m = 1}^M {{x_{\sigma \left( m \right)}}{y_{\sigma \left( m \right)}}} $


Hence the order of the samples does not matter at all, as expected if they are assumed to be i.i.d. in the frequentist framework or De Finetti-exchangeable in the Bayesian framework, as in Scargle's analysis.
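
This invariance is trivial to check numerically, continuing the same sketch:

```python
perm = rng.permutation(M)           # an arbitrary reshuffling of the time indices
g0 = np.dot(X, Y)                   # gamma_{X,Y}(0)
g0_perm = np.dot(X[perm], Y[perm])  # same pairs (x_m, y_m), shuffled order
assert np.isclose(g0, g0_perm)      # the order of the time points is irrelevant
```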


But obviously, for deterministic signals the order of the samples does and should matter: it defines the chronological order, i.e. time itself. Without a time/chronological order, there is no deterministic signal. So time is not expected to disappear from our statistics (more precisely, for a given delay $\tau $, time disappears from the cross-correlation or covariance, since they are invariant under permutation, but then it reappears in the cross-correlation or covariance functions, since these are functions of the lag!?).


Hence, for a deterministic signal, the sufficient statistic for our time delay estimation problem, a hypothetical "deterministic cross-correlation function", is expected to be something quite different from the classical cross-correlation function. In particular, it is not expected to be invariant under permutation of the time points for a given delay $\tau $.


Moreover, it is well known that cross-correlations or covariances (and the corresponding functions) are in general not suitable statistics for quantifying dependencies between (nonlinear) deterministic signals: in some cases they are completely blind to nonlinear effects. More suitable statistics do exist (e.g. nonlinear dependence measures), but as far as I know they may lack rock-solid theoretical foundations and are not derived from probability theory.
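
A textbook illustration of this blindness (my own example, not Scargle's): take $y = x^2$ with $x$ symmetric about zero. The dependence is perfectly deterministic, yet the correlation vanishes.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = x**2                         # perfectly deterministic dependence on x
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))               # approx 0.0: the correlation sees nothing
```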


It is worth observing that standard mathematical notation already captures the difference between the two problems:





  • If the signal is random, then time plays essentially no role. So we have a stochastic process, i.e. a collection of random variables indexed by time that we denote ${S_m}$;




  • If the signal is deterministic, then it is a function of the time that we denote $S\left( m \right)$.




So, consider $M$ evenly spaced samples of a real, discrete-time deterministic signal


$ s\left( {1} \right),s\left( {2} \right),...,s\left( {M} \right) $


By the standard definition of a discrete-time deterministic dynamical system, there exist:




  • a phase space $\Gamma$, e.g. $\Gamma \subset {\mathbb{R}^d}$

  • an initial condition ${z_1} = z\left( 1 \right) \in \Gamma $

  • a state-space equation $f:\Gamma \to \Gamma $ such that $z\left( {m + 1} \right) = f\left[ {z\left( m \right)} \right]$

  • an output or observation equation $g:\Gamma \to \mathbb{R}$ such that $s\left( m \right) = g\left[ {z\left( m \right)} \right]$


Hence, by definition we have


$\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \left[ {g\left( {{z_1}} \right),g\left[ {f\left( {{z_1}} \right)} \right],...,g\left[ {{f^{M - 1}}\left( {{z_1}} \right)} \right]} \right]$


or, in probabilistic notation,


$p\left[ {\left. {s\left( {1} \right),s\left( {2} \right),...,s\left( {M} \right)} \right|{z_1},f,g} \right] = \prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {{z_1}} \right)} \right] - s\left( {m} \right)} \right\}} $
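
To make this definition concrete, here is a minimal toy instance, with the logistic map as a hypothetical choice of $f$ on $\Gamma = \left[ {0,1} \right]$ and the identity as $g$:

```python
import numpy as np

def f(z, r=3.9):
    return r * z * (1.0 - z)   # state-space equation on Gamma = [0, 1]

def g(z):
    return z                   # observation equation g: Gamma -> R (identity here)

def sample_signal(z1, M):
    """Return s(1), ..., s(M) with s(m) = g[f^(m-1)(z_1)]."""
    s, z = [], z1
    for _ in range(M):
        s.append(g(z))
        z = f(z)
    return np.array(s)

s = sample_signal(z1=0.2, M=64)
```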



Therefore, by total probability and the product rule, the marginal prior probability distribution of $M$ samples from a deterministic signal, should it ever exist, is formally given by


$p\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \int\limits_{{\mathbb{R}^\Gamma }} {\int\limits_{{\Gamma ^\Gamma }} {\int\limits_\Gamma {{\text{D}}g{\text{D}}f{{\text{d}}^d}{z_1}\prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {{z_1}} \right)} \right] - s\left( m \right)} \right\}p\left( {{z_1},f,g} \right)} } } } $


"Of course", should it be unknown a priori, we may also need to marginalize the phase space $\Gamma$ itself! But I told to myself that marginalizing the dynamical system/state-space equation $f$ and the output/observation equation $g$ was enough in a first step!


Should we be able to define and compute this joint prior probability, we could derive, at least in principle, our "deterministic cross-correlation function" for our time delay estimation problem by applying the rules of probability theory.
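
I do not know how to define these functional integrals in general, but a crude finite-dimensional surrogate already shows what is at stake: restrict $f$ to a parametric family (say, logistic maps indexed by $r$), fix $g$ to the identity, put uniform priors on $\left( {r,{z_1}} \right)$, and approximate expectations under the marginal prior by Monte Carlo. The prior second moments $E\left[ {s\left( m \right)s\left( {m'} \right)} \right]$ then depend on $\left( {m,m'} \right)$ jointly, not only on the lag $m' - m$: time does not drop out.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 20_000, 16                    # Monte Carlo draws, samples per signal

# Finite-dimensional surrogate priors: p(f) -> uniform over logistic maps,
# p(z_1) -> uniform on Gamma = [0, 1], g fixed to the identity.
r = rng.uniform(3.6, 4.0, size=N)
z = rng.uniform(0.0, 1.0, size=N)

samples = np.empty((N, M))
for m in range(M):
    samples[:, m] = z                # s(m) = g[z(m)], with g = identity
    z = r * z * (1.0 - z)            # z(m+1) = f[z(m)]

# Prior second moments E[s(m) s(m')]: they depend on (m, m') jointly,
# not only on m' - m, i.e. the marginal prior is not stationary.
C = samples.T @ samples / N
print(np.round(C[:4, :4], 3))
```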


To sum up,




  • either such marginal prior probability distributions are the usual i.i.d. or exchangeable joint probability distributions, such as Scargle's uniform distribution, in which case the classical theory of signal would work for both random and deterministic signals;





  • or such marginal prior probability distributions, once computed from the joint prior distribution $p\left( {{z_1},f,g,\Gamma } \right)$, are something quite different from the usual joint probability distributions, because time still plays an essential role. In this case there would exist two different theories of signal: one for random signals, which we know well, and another for deterministic signals, still waiting, to the best of my knowledge, to be developed, if those unusual functional integrals can ever be defined and computed.




So, the following questions arise:




  • (Conditionally on phase space $\Gamma$,) Can we define functional probability distributions over the set of all dynamical systems/state-space equations/functions $f$ acting on $\Gamma$? If so, how do we integrate over/marginalize them?




  • (Conditionally on phase space $\Gamma$,) Can we define functional probability distributions over the set of all output/observation equations/functions $g$ from $\Gamma$ to e.g. $\mathbb{R}$? If so, how do we integrate over/marginalize them?





  • (Conditionally on phase space $\Gamma$ and both previous questions,) Can we compute the marginal prior probability distribution of $M$ samples from a discrete-time deterministic signal from default, basic joint prior probability distributions $p\left( {{z_1},f,g } \right)$ such as the uniform distribution on $\Gamma \times {\Gamma ^\Gamma } \times {\mathbb{R}^\Gamma }$?




  • Can we define probability distributions over the set of all phase spaces in, say, the set of all Cartesian powers of $\mathbb{R}$?




Already asked on MO:


https://mathoverflow.net/questions/236527/is-there-a-bayesian-theory-of-deterministic-signal-prequel-and-motivation-for-m



but without much success so far.



