As requested by @j_bertolotti, a thread on covariance/contravariance as found in linear algebra & differential geometry.

Thing is, these have nothing to do with vectors or geometry; rather, they come from a fundamental property of functions.

(1/n)
Say we have two sets X and Y. We think of elements of X as some kind of objects, while elements of Y provide some representation of these objects. The exact correspondence between an object and its representation is given by a bijection R: X→Y (we'll need to invert it).

(2/n)
Now, suppose we want to change the representation from R₁ to R₂. Given y₁∈Y, the representation of some object x∈X, how do we find the new representation? From y₁ = R₁(x) we get x = R₁⁻¹(y₁), where R₁⁻¹ is the inverse of R₁, so the new representation is R₂(R₁⁻¹(y₁)).

(3/n)
That is, the function T: Y→Y defined as T = R₂∘R₁⁻¹ implements the change of representation from R₁ to R₂. Note that its inverse T⁻¹ = R₁∘R₂⁻¹ changes the representation back from R₂ to R₁.
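A tiny finite example can make T = R₂∘R₁⁻¹ concrete. This is a sketch under made-up assumptions: X is three animals, Y is three one-letter codes, both representations are arbitrary bijections, and maps are stored as Python dicts with hypothetical helpers `inverse` and `compose`.

```python
# Hypothetical toy model: objects are animals, representations are one-letter codes.
# Each representation R: X -> Y is a bijection stored as a dict.
X = ["cat", "dog", "owl"]
R1 = {"cat": "a", "dog": "b", "owl": "c"}
R2 = {"cat": "c", "dog": "a", "owl": "b"}


def inverse(m):
    """Invert a bijection stored as a dict."""
    return {v: k for k, v in m.items()}


def compose(outer, inner):
    """Return outer ∘ inner for dict-based maps (apply inner first)."""
    return {k: outer[v] for k, v in inner.items()}


T = compose(R2, inverse(R1))      # T = R2 ∘ R1⁻¹ : Y -> Y, from R1 to R2
T_inv = compose(R1, inverse(R2))  # T⁻¹ = R1 ∘ R2⁻¹ : Y -> Y, back from R2 to R1

for x in X:
    y1 = R1[x]
    assert T[y1] == R2[x]       # T turns the old representation into the new one
    assert T_inv[T[y1]] == y1   # T⁻¹ undoes it
```

Note that T only ever sees codes in Y; it never needs to know which animal a code stands for.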

(4/n)
Now suppose we are interested in functions f: X→Z. What would a representation of such a function be? A function g: Y→Z would do. To construct it from R, we set g(y) = f(R⁻¹(y)).

(5/n)
That is, if objects are represented by x ↦ R(x), then functions are represented by f ↦ f∘R⁻¹. Note that evaluating a function doesn't depend on the representation: with y = R(x), g(y) = (f∘R⁻¹)(R(x)) = f(R⁻¹(R(x))) = f((R⁻¹∘R)(x)) = f(x).
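The construction g = f∘R⁻¹ and the invariance g(R(x)) = f(x) can be checked on a small finite example. The animals, their codes, and the function f (number of legs) are made-up illustrations, and `represent_function` is a hypothetical helper name.

```python
# Hypothetical toy model: objects X are animals, Y holds one-letter codes,
# and f: X -> Z returns the number of legs.
R = {"cat": "a", "dog": "b", "owl": "c"}   # representation R: X -> Y
R_inv = {y: x for x, y in R.items()}       # its inverse R⁻¹: Y -> X
LEGS = {"cat": 4, "dog": 4, "owl": 2}


def f(x):
    """A function on objects, f: X -> Z."""
    return LEGS[x]


def represent_function(func, r_inv):
    """Representation of func: the function g = func ∘ R⁻¹ acting on Y."""
    return lambda y: func(r_inv[y])


g = represent_function(f, R_inv)

# Evaluating through representations changes nothing: g(R(x)) == f(x).
for x in R:
    assert g(R[x]) == f(x)
```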

(6/n)
Now let's see what happens if we change the representation from R₁ to R₂: we go from g₁ = f∘R₁⁻¹ to g₂ = f∘R₂⁻¹, which can be computed from g₁ alone, since g₂ = f∘R₁⁻¹∘R₁∘R₂⁻¹ = g₁∘(R₁∘R₂⁻¹) = g₁∘T⁻¹.
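The point of g₂ = g₁∘T⁻¹ is that the new representation of the function can be built from the old representation and T⁻¹ alone, without touching f. A minimal sketch, assuming made-up finite data (animals, one-letter codes, f = number of legs):

```python
# Hypothetical toy data: two representations of three objects.
R1 = {"cat": "a", "dog": "b", "owl": "c"}
R2 = {"cat": "c", "dog": "a", "owl": "b"}
LEGS = {"cat": 4, "dog": 4, "owl": 2}   # the function f: X -> Z

R1_inv = {y: x for x, y in R1.items()}
R2_inv = {y: x for x, y in R2.items()}


def g1(y):
    """Representation of f under R1: g1 = f ∘ R1⁻¹."""
    return LEGS[R1_inv[y]]


# T⁻¹ = R1 ∘ R2⁻¹ as a map Y -> Y
T_inv = {y: R1[R2_inv[y]] for y in R2_inv}


def g2(y):
    """Representation of f under R2, computed from g1 alone: g2 = g1 ∘ T⁻¹."""
    return g1(T_inv[y])


# It agrees with building g2 = f ∘ R2⁻¹ directly from f.
for y in "abc":
    assert g2(y) == LEGS[R2_inv[y]]
```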

(7/n)
So, when changing the representation for objects, we use some transformation T, while the corresponding change of representation for functions is implemented by the inverse T⁻¹. The preceding discussion hides the fundamental duality between objects and functions, though.

(8/n)
To restore the duality in full, we should've used functions x: {⦁}→X instead of elements x∈X (where {⦁} is any one-element set). This way, the representation of an object is y = R∘x: {⦁}→Y, and the change of representation is y₂ = T∘y₁.

(9/n)
Let's repeat the crucial observation once more: for objects, we act by T on the left; for functions, we act by T⁻¹ on the right.
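Once objects are maps {⦁}→X, both changes of representation are plain compositions, and the left/right asymmetry becomes visible in code. A sketch with made-up data: the one-element set is represented by the single key "•", all maps are dicts, and `compose`/`inverse` are hypothetical helpers.

```python
# Objects as maps {•} -> X and functions as maps X -> Z, all stored as dicts.
POINT = "•"
R1 = {"cat": "a", "dog": "b", "owl": "c"}
R2 = {"cat": "c", "dog": "a", "owl": "b"}
LEGS = {"cat": 4, "dog": 4, "owl": 2}   # a function f: X -> Z


def compose(outer, inner):
    """outer ∘ inner for dict-based maps: apply inner, then outer."""
    return {k: outer[v] for k, v in inner.items()}


def inverse(m):
    """Invert a bijection stored as a dict."""
    return {v: k for k, v in m.items()}


T = compose(R2, inverse(R1))       # T = R2 ∘ R1⁻¹
T_inv = compose(R1, inverse(R2))   # T⁻¹ = R1 ∘ R2⁻¹

x = {POINT: "owl"}                 # an object, as a map {•} -> X
y1, y2 = compose(R1, x), compose(R2, x)
g1, g2 = compose(LEGS, inverse(R1)), compose(LEGS, inverse(R2))

assert y2 == compose(T, y1)        # objects: T acts on the left
assert g2 == compose(g1, T_inv)    # functions: T⁻¹ acts on the right
assert compose(g2, y2) == compose(g1, y1) == compose(LEGS, x)  # f(x) is unchanged
```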

(10/n)
What happens in linear algebra is just an algebraization of this, although messed up a little for historical reasons. Here, X = V is an abstract n-dimensional vector space, and Y = Fⁿ is the space of column vectors, which serve as a representation for V.

(11/n)
F is the base field of scalars (say, the real numbers). A representation R: V → Fⁿ is a linear isomorphism that maps a vector to the column of its coordinates in some specific basis. Choosing a representation amounts to choosing a basis, so a change of representation is a change of basis.
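To compute with an "abstract" V at all, something has to be assumed. The sketch below assumes vectors of V are stored through some fixed internal description as pairs, so each representation R: V → F² becomes an invertible 2×2 matrix acting on that description. The matrices R1, R2, the vector v, and the helpers `matmul`, `matvec`, `inv2` are made up for illustration; exact `Fraction` arithmetic avoids rounding issues.

```python
from fractions import Fraction as Fr

# Assumption for illustration: V is stored via a fixed internal description in F²,
# and each representation R: V -> F² is an invertible 2×2 matrix on that description.


def matmul(A, B):
    """Product of two 2×2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(A, v):
    """Matrix times column vector."""
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]


def inv2(A):
    """Inverse of an invertible 2×2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


R1 = [[Fr(1), Fr(1)], [Fr(0), Fr(1)]]
R2 = [[Fr(2), Fr(1)], [Fr(1), Fr(1)]]

T = matmul(R2, inv2(R1))     # change of coordinates, T = R2 R1⁻¹

v = [Fr(3), Fr(-1)]          # some vector of V, in the internal description
x1, x2 = matvec(R1, v), matvec(R2, v)
assert matvec(T, x1) == x2   # new coordinates = T · old coordinates
```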

(12/n)
We already know that the change of coordinates is given by T = R₂R₁⁻¹ (which is now a matrix!). Let's see how the basis changes!

There are n vectors {e₁, ..., eₙ} in Fⁿ with 1 in one coordinate and 0 in all the others. They form the canonical basis of Fⁿ.

(13/n)
For a representation R: V → Fⁿ, the corresponding basis in V is {R⁻¹(e₁), ..., R⁻¹(eₙ)}. To see this, apply the representation to a basis vector: R(R⁻¹(eᵢ)) = eᵢ. So, the representation of R⁻¹(eᵢ) is all zeros except a 1 in the i-th coordinate, as expected.
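The claim that R⁻¹(eᵢ) is the basis belonging to R can be checked numerically, together with the fact that the coordinates R(v) are the coefficients of v in that basis. This assumes, for illustration only, that V is stored through a fixed internal description in F² and that R is an arbitrary invertible 2×2 matrix; the helper names are hypothetical.

```python
from fractions import Fraction as Fr

# Assumption for illustration: vectors of V are stored via a fixed internal
# description in F², and the representation R is an invertible 2×2 matrix.


def matvec(A, v):
    """Matrix times column vector."""
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]


def inv2(A):
    """Inverse of an invertible 2×2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


R = [[Fr(1), Fr(1)], [Fr(0), Fr(1)]]
E = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]        # canonical basis e1, e2 of F²

basis = [matvec(inv2(R), e) for e in E]     # b_i = R⁻¹(e_i)

# Representing b_i gives back e_i: a 1 in the i-th place, 0 elsewhere.
for b, e in zip(basis, E):
    assert matvec(R, b) == e

# The coordinates R(v) are exactly the coefficients of v in this basis.
v = [Fr(3), Fr(-1)]
coords = matvec(R, v)
recon = [sum(coords[i] * basis[i][k] for i in range(2)) for k in range(2)]
assert recon == v
```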

(14/n)
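When changing the representation (i.e. changing the basis), we go from R₁⁻¹(eᵢ) to R₂⁻¹(eᵢ) = (R₂⁻¹R₁)(R₁⁻¹(eᵢ)). In the old coordinates, the map R₂⁻¹R₁ is the matrix R₁(R₂⁻¹R₁)R₁⁻¹ = R₁R₂⁻¹ = T⁻¹, so the new basis vectors, written in old coordinates, are the columns of T⁻¹. Again: if the coordinates change by applying T, the basis vectors change by applying T⁻¹.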
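Here is a numerical check that the new basis vectors, expressed in the old coordinates, are the columns of T⁻¹ while coordinates change by T. As an illustrative assumption, V is stored through a fixed internal description in F², R1 and R2 are arbitrary invertible 2×2 matrices, and the helpers are hypothetical.

```python
from fractions import Fraction as Fr

# Assumption for illustration: V is stored via a fixed internal description in F²,
# and the representations R1, R2 are invertible 2×2 matrices.


def matmul(A, B):
    """Product of two 2×2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(A, v):
    """Matrix times column vector."""
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]


def inv2(A):
    """Inverse of an invertible 2×2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


R1 = [[Fr(1), Fr(1)], [Fr(0), Fr(1)]]
R2 = [[Fr(2), Fr(1)], [Fr(1), Fr(1)]]
T = matmul(R2, inv2(R1))        # coordinates change by T = R2 R1⁻¹
T_inv = inv2(T)

E = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
old_basis = [matvec(inv2(R1), e) for e in E]   # R1⁻¹(e_i)
new_basis = [matvec(inv2(R2), e) for e in E]   # R2⁻¹(e_i)

# The map R2⁻¹R1 on V sends each old basis vector to the corresponding new one.
M = matmul(inv2(R2), R1)
assert [matvec(M, b) for b in old_basis] == new_basis

# In old coordinates, the new basis vectors are the columns of T⁻¹.
for i, b in enumerate(new_basis):
    assert matvec(R1, b) == [T_inv[0][i], T_inv[1][i]]
```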

(15/n)
For linear functionals (elements of the dual space V*), the same thing happens as for functions in general: their coordinates, written as a row, transform through T⁻¹ acting on the right. The basis of V*, however, transforms through T (for the same reason that vectors and the basis of V transform differently).
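Both halves of this claim can be checked with small matrices. The sketch assumes, for illustration, that V is stored through a fixed internal description in F², that a functional is given there as a row vector `phi` (so f(v) = phi·v), and that R1, R2 are arbitrary invertible matrices. The dual basis of a representation R consists of the functionals "i-th coordinate of R(v)", which in this description are simply the rows of R.

```python
from fractions import Fraction as Fr

# Assumption for illustration: V is stored via a fixed internal description in F²,
# representations are invertible 2×2 matrices, and a functional f is given in that
# description as a row vector phi, so that f(v) = phi · v.


def matmul(A, B):
    """Product of two 2×2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rowvec(r, A):
    """Row vector times matrix: r · A."""
    return [sum(r[k] * A[k][j] for k in range(2)) for j in range(2)]


def inv2(A):
    """Inverse of an invertible 2×2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


R1 = [[Fr(1), Fr(1)], [Fr(0), Fr(1)]]
R2 = [[Fr(2), Fr(1)], [Fr(1), Fr(1)]]
T = matmul(R2, inv2(R1))      # coordinates of vectors change by T
T_inv = inv2(T)

phi = [Fr(5), Fr(2)]           # hypothetical functional
g1 = rowvec(phi, inv2(R1))     # coordinates of f under R1: the row f ∘ R1⁻¹
g2 = rowvec(phi, inv2(R2))     # coordinates of f under R2
assert g2 == rowvec(g1, T_inv)   # coordinates of f: T⁻¹ on the right

# Dual basis of R = rows of R. Row i of R2 equals sum_j T[i][j] * (row j of R1),
# i.e. the dual basis transforms through T.
assert R2 == matmul(T, R1)
```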

(16/n)
In linear algebra, if the coordinates of an object change in the same way as the basis of the main vector space, this object is called covariant. If the change is represented by an inverse matrix, the object is called contravariant.

(17/n)
We've seen that usual vectors are contravariant: their coordinates change by the inverse of how the basis changes. Linear functionals are covariant: their coordinates change using the same matrix.

(18/n)
Note that this depends on what we call the "main" vector space: choose V* instead of V, and suddenly elements of V* have contravariant coordinates and elements of V have covariant coordinates.

(19/n)
The whole discussion can be distilled into the following identity: f∙x = (f∙T⁻¹)∙(T∙x), which means, again, that if you change the object by T, you should change the function by T⁻¹ to preserve the value of f(x).
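The identity holds because T⁻¹T is the identity matrix; a quick numerical check with an arbitrary invertible T, a row vector f, and a column vector x (all values made up, helpers hypothetical) looks like this:

```python
from fractions import Fraction as Fr


def dot(f, x):
    """Row vector times column vector."""
    return sum(a * b for a, b in zip(f, x))


def matvec(A, x):
    """Matrix times column vector: T · x."""
    return [dot(row, x) for row in A]


def rowvec(f, A):
    """Row vector times matrix: f · A."""
    return [dot(f, col) for col in zip(*A)]


def inv2(A):
    """Inverse of an invertible 2×2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


T = [[Fr(2), Fr(-1)], [Fr(1), Fr(0)]]   # any invertible change of coordinates
f = [Fr(5), Fr(-3)]                      # a functional, as a row
x = [Fr(2), Fr(-1)]                      # a vector, as a column

lhs = dot(f, x)                                  # f · x
rhs = dot(rowvec(f, inv2(T)), matvec(T, x))      # (f · T⁻¹) · (T · x)
assert lhs == rhs
```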

(20/n)
All this might have felt like an unnecessarily abstract exposition, so let's get down to earth and talk about a simple & intuitive way of thinking about co/contravariance: units of measure.

(21/n)
As before, let V be a vector space, but this time we think of its vectors as representing lengths specifically, with coordinates that are not just numbers but meters. What units should the coordinates of linear functionals have?

(22/n)
The value of a functional on a vector is a scalar f(x), and scalars are dimensionless, i.e. they don't depend on any choice of units (like π, and unlike the electron charge). Expressed in coordinates, f(x) = Σᵢ fᵢ∙xᵢ.

(23/n)
If the xᵢ are in meters and f(x) is a dimensionless scalar, we are forced to accept that the fᵢ have units of 1/meter, that is, meter⁻¹. Now, the analogue of a basis change is changing the scale of a unit of measure (this actually is a change of basis in a 1-dimensional space).

(24/n)
Say we go from meters to centimeters. All vectors' coordinates are multiplied by 100. The coordinates of linear functionals, on the other hand, get divided by 100, so that the value of f(x) stays unchanged, and their unit is now 1/centimeter.
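The meters-to-centimeters switch can be checked with exact fractions. The 2-dimensional displacement and the functional's coefficients below are made-up values, and `pair` is a hypothetical helper computing Σᵢ fᵢ∙xᵢ.

```python
from fractions import Fraction as Fr

# Hypothetical example: a displacement with coordinates in meters and a
# functional f whose coefficients are in 1/meter.
x_m = [Fr(3), Fr(4)]            # meters
f_per_m = [Fr(1, 2), Fr(-1)]    # 1/meter


def pair(f, x):
    """Value of the functional: sum of f_i * x_i (dimensionless)."""
    return sum(fi * xi for fi, xi in zip(f, x))


scale = 100                                 # 1 m = 100 cm
x_cm = [xi * scale for xi in x_m]           # vector coordinates multiplied by 100
f_per_cm = [fi / scale for fi in f_per_m]   # functional coordinates divided by 100

assert pair(f_per_m, x_m) == pair(f_per_cm, x_cm)
```

Scaling both by 100 instead would change f(x) by a factor of 10000, which is exactly what the opposite transformation rules prevent.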

(25/25)
You can follow @lisyarus.