Motivation for tensor product in Physics

This question is about a mathematical object (the tensor product), but it asks for the motivation that comes from Physics. Algebraists motivate the tensor product like this: "given $k$ vector spaces $V_1,\dots,V_k$ over the same field $\Bbb K$ we want to find a new space $S$ and a universal multilinear map $T : V_1\times\cdots\times V_k\to S$ such that for every vector space $W$ and multilinear mapping $g : V_1\times\cdots\times V_k\to W$ we have a unique linear map $f : S\to W$ such that $g = f\circ T$".

Then, they prove this thing exists by constructing it. They take the free vector space $\mathcal{M}=F(V_1\times\cdots\times V_k)$ and consider the subspace $\mathcal{M}_0$ spanned by all elements of the form

$$(v_1,\dots,v_i'+av_i'',\dots,v_k)-(v_1,\dots,v_i',\dots,v_k)-a(v_1,\dots,v_i'',\dots,v_k),$$

and define $S=\mathcal{M}/\mathcal{M}_0$, written $S=V_1\otimes\cdots\otimes V_k$, together with the map $T(v_1,\dots,v_k)=(v_1,\dots,v_k)+\mathcal{M}_0$, written $T(v_1,\dots,v_k)=v_1\otimes\cdots\otimes v_k$.

That's fine, but tensors appear a lot in Physics: in General Relativity, in Electrodynamics, in Classical Mechanics, in Quantum Mechanics, etc. So, if someone asked me "what's the motivation for that definition of the tensor product?" and I wished to motivate it through Physics, what should the motivation be?

How would I convince myself that the tensor product as defined like that is useful in Physics?

I know that one can define tensors as multilinear maps, and that is far more intuitive; however, I'm interested in seeing how one would motivate this definition.

Answer

It is essentially impossible to answer the general question of "how does multilinearity come up naturally in physics?" because of the myriad of possible examples that make up the total answer. Instead, let me describe a situation that very loudly cries out for the use of tensor products of two vectors.

Consider the problem of conservation of momentum for a continuous distribution of electric charge and current, which interacts with an electromagnetic field, under the action of no other external force. I will describe it more or less along the lines of Jackson (Classical Electrodynamics, 3rd edition, §6.7) but depart from it towards the end. This will get very electromagneticky for a while, so if you want to skip to the tensors, you can go straight to equation (1).

The rate of change of the total mechanical momentum of the system is the total Lorentz force, given by $$ \frac{ d\mathbf{P}_{\rm mech}}{dt} =\int_V(\rho\mathbf{E}+\mathbf{J}\times \mathbf{B})\,d\mathbf{x}. $$ To simplify this, one can take $\rho$ and $\mathbf{J}$ from Maxwell's equations: $$ \rho=\epsilon_0\nabla\cdot\mathbf{E} \ \ \ \text{ and }\ \ \ \mathbf{J}=\frac1{\mu_0}\nabla\times \mathbf{B}-\epsilon_0\frac{\partial \mathbf{E}}{\partial t}.$$ (In particular, this means that what follows is only valid "on shell": momentum is only conserved if the equations of motion are obeyed. Of course!)

One can then substitute these expressions back in and, after a fair amount of vector calculus, arrive at the following relation: $$ \begin{aligned} \frac{ d\mathbf{P}_{\rm mech}}{dt} +&\frac{d}{dt}\int_V\epsilon_0\mathbf{E}\times \mathbf{B}\,d\mathbf{x} \\ &= \epsilon_0\int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) + c^2 \mathbf{B} (\nabla \cdot \mathbf{B})- c^2 \mathbf{B} \times (\nabla \times \mathbf{B}) \right]d\mathbf{x}. \end{aligned} $$

The integral on the left-hand side can be identified as the total electromagnetic momentum, and differs from the integral of the Poynting vector by a factor of $1/c^2$. To get this into the proper form for a conservation law, though, such as the one for energy in this setting, $$ \frac{dE_{\rm mech}}{dt} +\frac{d}{dt}\frac{\epsilon_0}{2}\int_V(\mathbf{E}^2 +c^2\mathbf{B}^2)\,d\mathbf{x} = -\oint_S \mathbf{S}\cdot d\mathbf{a}, $$ we need to reduce the huge, ugly volume integral on the right-hand side to a surface integral.

The way to do this is, of course, the divergence theorem. However, that theorem applies to a scalar integrand (the divergence of a vector field), and what we have so far is a vector equation. To work further, then, we need to (at least temporarily) work in some specific orthonormal basis $\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3\}$, and write $\mathbf{E}=\sum_i E_i \mathbf{e}_i$. Let's work with the electric field term first; after that the results also apply to the magnetic term. Thus, to start with, $$ \int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) \right]d\mathbf{x} = \sum_i \mathbf{e}_i \int_V \left[ E_i(\nabla\cdot \mathbf{E})-\mathbf{e}_i\cdot\left(\mathbf{E} \times(\nabla \times \mathbf{E})\right) \right]d\mathbf{x}. $$ These terms should be simplified using the vector calculus identities $$ E_i(\nabla\cdot \mathbf{E}) = \nabla\cdot\left(E_i \mathbf{E}\right) - \mathbf{E}\cdot \nabla E_i $$ and $$ \mathbf{E} \times(\nabla \times \mathbf{E}) = \frac12\nabla(\mathbf{E}\cdot\mathbf{E})-(\mathbf{E}\cdot\nabla)\mathbf{E}, $$ which mean that the whole combination can be simplified as $$ \int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) \right]d\mathbf{x} = \sum_i \mathbf{e}_i \int_V \left[ \nabla\cdot\left(E_i \mathbf{E}\right) - \mathbf{e}_i\cdot\left( \frac12\nabla(\mathbf{E}\cdot\mathbf{E}) \right) \right]d\mathbf{x}, $$ since the terms in $\mathbf{E}\cdot \nabla E_i$ and $\mathbf{e}_i\cdot\left( (\mathbf{E}\cdot\nabla)\mathbf{E}\right)$ cancel.
This means we can write the whole integrand as the divergence of some vector field, and use the divergence theorem: $$ \begin{align}{} \int_V \left[ \mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) \right]d\mathbf{x} &= \sum_i \mathbf{e}_i \int_V \nabla\cdot\left[ E_i \mathbf{E} - \frac12 \mathbf{e}_i E^2 \right]d\mathbf{x} \\ & = \sum_i \mathbf{e}_i \oint_S\left[ E_i \mathbf{E} - \frac12 \mathbf{e}_i E^2 \right]\cdot d\mathbf{a}. \tag 1 \end{align} $$
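For readers who want to check the two steps used on the way to equation (1), here is a short index-notation sketch. It assumes, as in the text, that the basis vectors $\mathbf{e}_i$ are constant and orthonormal (Cartesian), and it writes $\partial_j$ for $\partial/\partial x_j$, which is notation introduced only for this check: $$ \mathbf{e}_i\cdot\left((\mathbf{E}\cdot\nabla)\mathbf{E}\right) = \sum_j E_j\,\partial_j E_i = \mathbf{E}\cdot\nabla E_i, \qquad \mathbf{e}_i\cdot\nabla\left(E^2\right) = \partial_i\left(E^2\right) = \nabla\cdot\left(\mathbf{e}_i E^2\right). $$ The first equality is why the two leftover terms cancel, and the second is what lets the gradient term be written as a divergence.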

In terms of conservation law structure, we're essentially done, as we've reduced the rate of change of momentum to a surface term. However, it is crying out for some simplification. In particular, this expression is basis-dependent, but it is so close to being basis independent that it's worth a closer look.

The first term, for instance, would simplify to something like $$ \sum_i \mathbf{e}_i \oint_S E_i \mathbf{E}\cdot d\mathbf{a} = \oint_S \mathbf{E}\, \mathbf{E}\cdot d\mathbf{a} $$ if we could only make sense of an object like $\mathbf{E}\, \mathbf{E}$. Even better, if we could make sense of such a combination, then the seemingly basis-dependent combination that comes up in the second term, $\sum_i \mathbf{e}_i\,\mathbf{e}_i$, turns out to be basis independent: one can prove that for any two orthonormal bases $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ and $\{\mathbf{e}_1', \mathbf{e}_2', \mathbf{e}_3'\}$, those combinations are the same, $$ \sum_i \mathbf{e}_i\,\mathbf{e}_i = \sum_i \mathbf{e}_i'\,\mathbf{e}_i' $$ as long as the product $\mathbf{u}\,\mathbf{v}$ of two vectors, whatever it ends up being, is linear in each factor, which is definitely a reasonable assumption.
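The basis-independence claim can be checked numerically in one concrete model. The sketch below assumes that the product $\mathbf{u}\,\mathbf{v}$ is represented by the $3\times3$ array of components $u_i v_j$ (one possible bilinear realization, chosen here only for illustration), and compares the standard basis with a basis rotated about the third axis by an arbitrary angle. The helper names `outer`, `sum_outer` and `close` are made up for this example.

```python
import math

def outer(u, v):
    """Component array of the product: entry (i, j) is u_i * v_j."""
    return [[ui * vj for vj in v] for ui in u]

def sum_outer(basis):
    """Sum of e_i e_i over all vectors e_i of the given basis."""
    total = [[0.0] * 3 for _ in range(3)]
    for e in basis:
        term = outer(e, e)
        total = [[t + x for t, x in zip(row_t, row_x)]
                 for row_t, row_x in zip(total, term)]
    return total

def close(a, b, tol=1e-12):
    """Entry-by-entry comparison of two 3x3 arrays up to rounding error."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

standard = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

theta = 0.7  # arbitrary rotation angle about the third axis
c, s = math.cos(theta), math.sin(theta)
rotated = [(c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0)]

# Both sums should agree (and equal the identity array).
print(close(sum_outer(standard), sum_outer(rotated)))
```

A single rotation is of course not a proof; the point is only that bilinearity plus orthonormality is all the check relies on.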

So what, then, should this new vector multiplication be? One key to realizing what we really need is noticing the fact that we haven't yet assigned any real physical meaning to the combination $\mathbf{E}\,\mathbf{E}$; instead, we're only ever interacting with it by dotting "one of the vectors of the product" with the surface area element $d\mathbf{a}$, and that leaves a vector $\mathbf{E}\,\mathbf{E}\cdot d\mathbf{a}$ which we can integrate to get a vector, and that requires no new structure.

Let's then write a list of how we want this new product to behave. To keep things clear, let's give it some fancy new symbol like $\otimes$, mostly to avoid unseemly combinations like $\mathbf{u}\,\mathbf{v}$. We want, then:

A function $\otimes:V\times V\to W$, which takes pairs of Euclidean vectors in $V=\mathbb R^3$ into some vector space $W$ in which we'll keep our fancy new objects.

Combinations of the form $\mathbf{u}\otimes \mathbf{v}$ should be linear in both $\mathbf{u}$ and $\mathbf{v}$.

For all vectors $\mathbf{w}$ in $V$, and all combinations $(\mathbf{u},\mathbf{v})\in V\times V$, we want the combination $(\mathbf{u}\otimes \mathbf{v})\cdot\mathbf{w}$ to be a vector in $V$. More specifically, we want it to be the vector $(\mathbf{v}\cdot\mathbf{w})\mathbf{u}\in V$.

That last one actually looks pretty strong, but there's evidently room for improvement. For one, it depends on the Euclidean structure, which is not actually necessary: we can make an equivalent statement that uses the vector space's dual.

For all $(\mathbf{u},\mathbf{v})\in V\times V$ and all $f\in V^\ast$, we want $f_\to(\mathbf{u}\otimes \mathbf{v})=f(\mathbf{v})\mathbf{u}\in V$ to hold, where $f_\to$ simply means that $f$ acts on the factor on the right.

Finally, if we're doing stuff with the dual, we can reformulate that in a slightly prettier way. Since two vectors $\mathbf{u},\mathbf{v}\in V$ are equal if and only if $f(\mathbf{u})=f(\mathbf{v})$ for all $f\in V^\ast$, we can give yet another equivalent form of the same requirement:

For all $(\mathbf{u},\mathbf{v})\in V\times V$ and all $f,g\in V^\ast$, we want $g_\leftarrow f_\to(\mathbf{u}\otimes \mathbf{v})=g(\mathbf{u})f(\mathbf{v})\in \mathbb R$, where $g_\leftarrow$ means that $g$ acts on the factor on the left.

[Note, here, that this last rephrasing isn't really that fancy. Essentially, it is saying that the vector equation (1) is really to be interpreted as a component-by-component equality, and that's not really off the mark of how we actually do things.]
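To see that the wish list is not empty, here is a minimal sketch of one concrete model that satisfies it, under the assumption that $V=\mathbb R^3$, that $\mathbf{u}\otimes\mathbf{v}$ is stored as the array of components $u_i v_j$, and that a dual vector $f$ is stored as its list of components. The function names (`outer`, `contract_right`, `contract_both`) are hypothetical and exist only for this illustration; integer components are used so that the comparisons are exact.

```python
def dot(u, v):
    """Euclidean dot product, also used to apply a dual vector given by components."""
    return sum(a * b for a, b in zip(u, v))

def outer(u, v):
    """Model of u ⊗ v: the array with entries u_i * v_j."""
    return [[ui * vj for vj in v] for ui in u]

def contract_right(T, w):
    """Pair the right-hand factor with w, leaving a vector in V."""
    return [dot(row, w) for row in T]

def contract_both(g, T, f):
    """g acts on the left factor and f on the right; the result is a scalar."""
    return dot(g, contract_right(T, f))

u, v, w = [1, 2, 3], [4, 5, 6], [7, 8, 9]
T = outer(u, v)

# (u ⊗ v) · w should equal (v · w) u.
print(contract_right(T, w) == [dot(v, w) * ui for ui in u])

# g and f acting on both factors should give g(u) f(v).
g, f = [1, 0, -1], [2, 1, 0]
print(contract_both(g, T, f) == dot(g, u) * dot(f, v))

# Linearity in the left factor: (u + a u2) ⊗ v = u ⊗ v + a (u2 ⊗ v).
u2, a = [0, 1, -2], 3
combo = [x + a * y for x, y in zip(u, u2)]
expected = [[p + a * q for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(outer(u, v), outer(u2, v))]
print(outer(combo, v) == expected)
```

This only exhibits one realization for a fixed basis; the statement that every realization is canonically isomorphic to it is the part that needs the universal property.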

I could keep going, but it's clear that this requirement can be rephrased into the universal property of the tensor product, and that rephrasing is a job for the mathematicians. Thus, you can see the story like this: Upon hitting equation (1), we give to the mathematicians this list of requirements. They go off, think for a bit, and come back telling us that such a structure does exist (i.e. there exist rigorous constructions that obey those requirements) and that it is essentially unique, in the sense that multiple such constructions are possible, but they are canonically isomorphic. For a physicist, what that means is that it's OK to write down objects like $\mathbf{u}\otimes \mathbf{v}$ as long as one does keep within the rules of the game.

As far as electromagnetism goes, this means that we can write our conservation law in the form $$ \frac{ d\mathbf{P}_{\rm mech}}{dt} +\frac{d}{dt}\int_V\epsilon_0\mathbf{E}\times \mathbf{B}\,d\mathbf{x} = \oint_S \mathcal T\cdot d\mathbf{a} $$ where $$ \mathcal T = \epsilon_0\left[ \mathbf{E}\otimes\mathbf{E}+c^2\mathbf{B}\otimes\mathbf{B} -\frac12\sum_i\mathbf{e}_i\otimes\mathbf{e}_i\left(E^2+c^2 B^2\right) \right] $$ is, of course, the Maxwell stress tensor.
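As a quick sanity check of this expression, the sketch below evaluates $\mathcal T$ in components, $\mathcal T_{ij} = \epsilon_0\left[E_iE_j + c^2B_iB_j - \tfrac12\delta_{ij}(E^2+c^2B^2)\right]$, for a uniform electric field along the third axis and no magnetic field. The function names `maxwell_stress` and `traction` and the field strength are assumptions made up for this example; the constants are the usual SI values.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (SI)
C = 299792458.0          # speed of light in m/s (SI)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxwell_stress(E, B):
    """Components T_ij = eps0 [E_i E_j + c^2 B_i B_j - (1/2) delta_ij (E^2 + c^2 B^2)]."""
    energy_like = dot(E, E) + C * C * dot(B, B)
    return [[EPS0 * (E[i] * E[j] + C * C * B[i] * B[j]
                     - (0.5 * energy_like if i == j else 0.0))
             for j in range(3)]
            for i in range(3)]

def traction(T, n):
    """T · n: pair the right-hand factor of T with the unit normal n."""
    return [dot(row, n) for row in T]

E0 = 1.0e3  # illustrative field strength in V/m
T = maxwell_stress((0.0, 0.0, E0), (0.0, 0.0, 0.0))

# Across a surface whose normal is along the field, the integrand T · n
# points along the field with magnitude eps0 * E0^2 / 2.
print(traction(T, (0.0, 0.0, 1.0)))

# Across a surface whose normal is perpendicular to the field, it has the
# same magnitude but points against the normal.
print(traction(T, (1.0, 0.0, 0.0)))
```

The sign pattern (tension along the field lines, pressure across them) is the standard textbook picture, and it falls straight out of the $\mathbf{E}\otimes\mathbf{E}$ and $\sum_i\mathbf{e}_i\otimes\mathbf{e}_i$ terms.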

I could go on and on about this, but I think this really captures the essence of how and where it happens in physics that a situation is really begging for the use of a tensor product. There are other such situations, of course, but this is the clearest one I know.


Saturday, August 10, 2019
