What Exactly is a Vector?

A brief introduction to various definitions and their applications in physics


Introduction

One of the key concepts we learn early on in physics is the vector. We already have an idea of what vectors are, from the arrows to the component representations in high school. However, those representations can only take us so far in our study of physics. With this article, it is our hope to present a summarized version of our experience with vectors and how they are used in physics.


Vectors are quantities with both magnitude and direction

One of our first introductions to vectors involves its definition as a quantity with both a magnitude and a direction. An arrow is used to represent a vector, with its length corresponding to the magnitude and its direction, well, the direction. Vectors are written with an arrow over them, such as \(\vec{a}\); this is common practice when handwriting vectors. Printed physics texts instead use boldface letters such as \(\mathbf{a}\) to denote vectors.

The simplest definition of a vector

This simple definition provides the groundwork for understanding more advanced ideas in math and physics, since the arrow representation is easy to understand and visualize. One notable example is position, a vector describing an object’s location relative to a chosen point we pick as our origin. The distance between the object and the origin is the vector’s magnitude. Its time derivatives, or rates of change, namely velocity and acceleration, are also vectors.

Thinking of vectors as arrows with magnitude and direction is important for our intuition. We understand what arrows represent. We understand that when we move, we have a speed and we have a direction that we move in. It is convenient to represent objects and quantities as vectors when we want to understand how objects move as well as make predictions of where they are going.

Vectors are more than just arrows, however; they are also mathematical objects, and operations can be performed on them, one of the simplest being addition. Even with just arrows, we already have a method to add two vectors together. We can think of it like this: if we first go north, then go west, we end up somewhere to the northwest of our original position. When we add vectors, we can just think of it as connecting them. This is the head-to-tail method, which involves attaching the tail (the non-pointy end) of one vector to the head (the pointy end) of the other without changing their orientations; the result is a new vector drawn from the free tail to the free head.

Vector addition using the head-to-tail method

This is good enough to get us started in analyzing behaviors of systems; all we need is a ruler and a protractor, and we can work with vectors geometrically by representing them as arrows.
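
As a quick worked example (with numbers chosen purely for illustration): suppose we walk 3 km north and then 4 km west. Placing the second arrow’s tail at the first arrow’s head, the resulting vector points from our starting position to where we end up. Since the two legs are perpendicular, the Pythagorean theorem gives the magnitude \begin{equation} \sqrt{(3\ \text{km})^2 + (4\ \text{km})^2} = 5\ \text{km}, \end{equation} and a protractor (or a bit of trigonometry, \(\tan^{-1}(4/3) \approx 53^\circ\)) gives its direction west of north.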

Vectors are collections of components relative to a set of basis vectors

While our current definition works excellently for visualizing vectors, it is difficult to use in more rigorous calculations. In practice, we can’t always draw vectors perfectly, and this can lead to serious problems if we have to deal with too many of them at once.

What we can do instead is represent them as a set of numbers which we call the components of a vector. For clarity, we also need to use what are known as basis vectors. These basis vectors are a group of vectors which are essentially used to build the coordinate system we want to use.

For instance, in 2D Cartesian coordinates, we can choose one basis vector to represent the \(x-\)axis and one basis vector to represent the \(y-\)axis. To get the components of a vector under these basis vectors, we project our vector onto the basis vectors and measure how much we need to scale each basis vector. Intuitively, we can think of it as the shadow produced by the vector on the \(x-\) or \(y-\)axis when a light is shone upon it. We then measure how much of each basis vector we need to match the shadow, and this is the component along the \(x-\) or \(y-\)axis.

Projecting using light to get vector components

We can write the components of vectors in multiple ways. The most unambiguous way is to explicitly write down the basis vectors we used. We usually write basis vectors with a caret (\(\hat{\,\,}\)) on top. From our example, we will call the unit (meaning a length of 1 unit) basis vectors \(\mathbf{\hat{x}}\) and \(\mathbf{\hat{y}}\) for the basis vectors along the \(x-\)axis and \(y-\)axis respectively. In our example, the horizontal projection is three times as long as \(\mathbf{\hat{x}}\) and the vertical projection is two times as long as \(\mathbf{\hat{y}}\), so we can write this vector as \begin{equation} 3\mathbf{\hat{x}} + 2\mathbf{\hat{y}}. \end{equation} Here, the \(x-\)component of the vector is 3, while its \(y-\)component is 2. If the basis vectors were specified beforehand, an author may choose to omit them and use any of the following notations to describe a vector in two dimensions: \begin{equation} (3,2), \quad \left\langle 3, 2 \right \rangle, \quad \begin{bmatrix} 3 \\ 2 \end{bmatrix}. \end{equation} The first one represents vectors as points in space, assuming the tail is always at the origin and the head is at the specified point. The second one is usually preferred so as not to confuse vectors with points, which is important when both vectors and points appear in the same context. The third one writes them down as column matrices. All of these notations are great when we already know what basis vectors we are using. The Cartesian basis vectors of our example above are the default assumption, but confusion may arise when we work with other coordinate systems. We’ll worry about that later on when we start transforming our coordinate system.

So why does this make calculations easier? Well, let’s say we have a vector \(\mathbf{v}\) in 2D space with components \(a\) and \(b\) with respect to the basis vectors \(\mathbf{\hat{x}}\) and \(\mathbf{\hat{y}}\) as before. We can now write \(\mathbf{v} = a\mathbf{\hat{x}} + b\mathbf{\hat{y}}\). Then, if we have another vector, say \(\mathbf{w}\), which has some other components \(c\) and \(d\) under the same basis vectors, we have \(\mathbf{w} = c\mathbf{\hat{x}} + d\mathbf{\hat{y}}\). Using our component definition, we can add up the respective \(x-\) and \(y-\)components without having to draw anything: \begin{equation} \mathbf{v} + \mathbf{w} = a\mathbf{\hat{x}} + b\mathbf{\hat{y}} + c\mathbf{\hat{x}} + d\mathbf{\hat{y}} = (a + c)\mathbf{\hat{x}} + (b + d)\mathbf{\hat{y}}. \end{equation} Some people may not be convinced that this is any easier, and admittedly it doesn’t immediately appear to be simpler, but note that we did not even have to draw a single vector. All we had to do was break the vectors down into components, which is much easier when we have a coordinate system, and then add them up accordingly. This saves us a lot of time measuring and drawing the vectors, which we are not even guaranteed to do perfectly.
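
To make the bookkeeping concrete, here is a minimal sketch in Python with NumPy (the numbers are our own, chosen purely for illustration) that adds two vectors component by component, exactly as in the equation above.

import numpy as np

# Components of v = a*x_hat + b*y_hat and w = c*x_hat + d*y_hat,
# both written with respect to the same Cartesian basis.
v = np.array([3.0, 2.0])    # a = 3, b = 2
w = np.array([1.0, -4.0])   # c = 1, d = -4 (illustrative values)

# Component-wise addition gives (a + c, b + d); no ruler or protractor needed.
total = v + w
print(total)  # [ 4. -2.]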

Addition of vectors through geometry and through components

Additionally, now that we have access to components, we can do a lot more with vectors. This component form is useful for calculating quantities such as dot products and cross products, which in turn give physical quantities such as the work done by a force on an object and the force exerted by a magnetic field on a moving charge, to give a few examples.
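
As a rough illustration of how little work this takes once we have components, here is a short Python sketch using NumPy; the numerical values for the force, displacement, charge, velocity, and magnetic field are invented for demonstration only.

import numpy as np

F = np.array([2.0, 0.0, 1.0])    # a force, in newtons (illustrative values)
d = np.array([3.0, 4.0, 0.0])    # a displacement, in metres

# Work done by the force along the displacement: W = F . d
W = np.dot(F, d)                 # 2*3 + 0*4 + 1*0 = 6 J

q = 1.6e-19                      # charge in coulombs
v = np.array([1.0e5, 0.0, 0.0])  # velocity in m/s
B = np.array([0.0, 0.0, 0.5])    # magnetic field in teslas

# Magnetic (Lorentz) force on the charge: F = q v x B
F_mag = q * np.cross(v, B)
print(W, F_mag)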

In our discussion we only worked with two dimensions, but it is fairly common to work with 3D vectors since we live in 3D space. When working with relativity, we may encounter what are known as four-vectors, which are 4D vectors composed of three spatial components and one time component.

Representing vectors in this component form also unlocks the tools of linear algebra. For this, we think of vectors as matrices, conventionally column matrices. This change in mindset allows us to use tools like linear transformations (which we will get into later) or even the theory of eigenvalues and eigenvectors. These come in handy in, say, handling rotations, as well as in quantum mechanics.
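
Below is a minimal sketch, assuming NumPy, of the kind of linear algebra this unlocks: a rotation written as a matrix acting on the column of components, and an eigenvalue computation of the sort that reappears in quantum mechanics. The particular matrices are our own toy examples.

import numpy as np

theta = np.pi / 2  # rotate by 90 degrees

# A rotation is a linear transformation, represented here as a 2x2 matrix
# acting on the column of components.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([3.0, 2.0])   # the vector 3*x_hat + 2*y_hat from before
print(R @ v)               # [-2.  3.]: the same arrow rotated by 90 degrees

# Eigenvalues and eigenvectors of a symmetric matrix, the kind of
# calculation that shows up again in quantum mechanics.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)         # 3 and 1 (ordering may differ)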

While we may lose the intuition and graphical representation offered by arrow vectors whenever we define vectors as having components along some basis vectors, we gain an impressive amount of practicality, which both makes simple operations more convenient and gives us access to new and more powerful operations.


Vectors have components that transform in a specific manner

So far, we have worked with the components of vectors using the standard Cartesian basis. However, several problems are easier to work with in different coordinate systems, and sometimes we may even be forced to use non-Cartesian bases.

But before we get into that, we have to first talk about what it means to transform something in a mathematical context. A transformation is essentially a method of changing one thing into another, which can be as simple as a rotation or as complex as a series of algorithms. In our situation, we can choose to transform, or change, our coordinate system. This constitutes a passive transformation, since we change everything but the vector itself. We can also directly change our vector; this is known as an active transformation. Both appear in physics, but we will focus on passive transformations, since we are free to change coordinate systems as we see fit.

Passive transforms change the coordinate system, while active transforms change the vector

Now suppose that we apply a simple linear transformation—intuitively a combination of rotation, scaling, and shearing—to a 2D Cartesian coordinate system. This linear transformation changes the size and orientation of our basis vectors, which in turn affects the grid lines of the entire coordinate system. After applying this transformation, even though the vector itself is unchanged, its components are now different.

Changing coordinates means changing basis vectors means changing components

Besides the change in the components due to a change in the basis used to describe the vector, this linear transformation gives rise to an interesting behavior: a passive transformation of the coordinate system has the same effect on the components as applying the inverse transformation actively to the vector itself. If that sounds complicated, take the example of a rotation: rotating our coordinates counterclockwise has the same effect as directly rotating our vector clockwise. Alternatively, if we double the size of our grid while keeping the vector the same size, then its components necessarily shrink by that same factor of two.

Active and passive transformations are inverse of each other

In this sense, we can see that for a given coordinate transformation, the components transform in the opposite way to the basis vectors. Some people call this property contravariance: the components transform with the inverse of the basis transformation.
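
Here is a small numerical sketch of that statement (using NumPy, with an angle chosen arbitrarily): rotating the basis by a matrix \(R\) changes the components of a fixed vector by \(R^{-1}\), which is exactly what an active rotation in the opposite direction would do.

import numpy as np

theta = np.pi / 6  # rotate the *coordinate system* by 30 degrees (passive)

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v_old = np.array([3.0, 2.0])   # components in the original basis

# Passive transformation: the basis vectors are rotated by R, so the
# components of the *same* vector are obtained with the inverse of R...
v_passive = np.linalg.inv(R) @ v_old

# ...which is exactly what an *active* rotation by -theta does to the vector.
R_active = np.array([[np.cos(-theta), -np.sin(-theta)],
                     [np.sin(-theta),  np.cos(-theta)]])
v_active = R_active @ v_old

print(np.allclose(v_passive, v_active))  # True: the two viewpoints agree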

This property is not unique to linear transformations. Any transformation can be concisely expressed by examining how a unit grid changes under it. One common non-linear transformation is moving from Cartesian to polar coordinates. The two components change from being horizontal and vertical components into a radial length and an angle. While this radically changes how the components are represented, the behavior is predictable just by examining how our two axes change. In practice, we usually first work out how to transform the components and then derive how the basis vectors change, so that we can go in the opposite order if the need arises.

Conveniently, polar coordinates allow us to tie this idea back to the simple magnitude-and-direction definition, since the components of a position vector in polar form are its radius (which is the magnitude) and its angle (which describes its direction).
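
As a quick worked example using the vector from earlier, the Cartesian components \((3, 2)\) correspond to the polar components \begin{equation} r = \sqrt{3^2 + 2^2} = \sqrt{13} \approx 3.61, \qquad \theta = \tan^{-1}\!\left(\frac{2}{3}\right) \approx 33.7^\circ, \end{equation} which are exactly the magnitude and direction of the arrow we started with.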

If we can explicitly represent the transformation, linear or otherwise, we can use it directly to calculate the new basis vectors, or take its "inverse" to calculate the new components. This special transformation property of vectors makes them valuable under more general coordinate transformations, which come in handy when working with curved spaces in general relativity. Besides its utility, thinking of vectors based on how their components transform can help us identify objects which do not easily admit a geometric interpretation but behave in the same manner. In physics, we sometimes prefer to be more liberal by taking the duck test: if it looks like a duck and quacks like a duck, it basically is a duck; if it transforms in a specific way like a vector, then we might as well call it a vector.

Since the way vectors transform is important when working with generalized coordinate systems, we use it as a definition and identifier of vectors: if an object’s components change in an inverse manner to the bases, we call it a vector. It may seem strange to define an object based on how one aspect of it changes, but this definition is easy to extend and apply when we deal with generalized coordinate systems, where visualizing vectors and coordinate systems is often difficult.

A succinct and powerful but unhelpful way to summarize this is that a vector is anything that transforms like a vector, through the help of the duck test. This definition also extends to how physicists view tensors in general, but we’ll save that for another time.


Vectors are elements of vector spaces

Vectors are useful in analyzing systems because we have access to a wide variety of tools that we can use. But our definitions so far do not let us apply these tools more broadly. Fortunately, mathematicians have devised a general definition that allows us to use our tools on objects that we would not immediately think of as vectors.

In previous sections, we mentioned how we can represent vectors as arrows, but we can also represent them as column matrices or as points in space. The choice of representation is purely for convenience; it is more straightforward to use the arrow representation for visualizing vectors, while column matrices lend themselves to analysis thanks to linear algebra. We did not offer an explanation as to why we can represent vectors in multiple ways in the first place, so in this section we will examine why this is the case.

In the same way that we looked at how a vector can be described by how its components get transformed, we can describe a vector based on how it interacts with other vectors of similar nature. "Interact" is a loose term here; it refers to how we can apply certain operations such as stretching a vector by some amount, known as scalar multiplication (the stretching factor is known as the scalar), or adding two vectors, to produce a vector with the same properties. We can use these to formulate a more general and more abstract concept of a vector.

Simply put, a vector space must be closed under vector addition and scalar multiplication. That is, if we have a set of objects that can be manipulated using scalar multiplication and vector addition, however they may be defined, and taking linear combinations always produces something that is also part of the set, then we call that set a vector space and the objects themselves vectors. Equivalently, we say that a vector is an element of a vector space.

An effective but non-rigorous way of telling if a set \(S\) is a vector space is by checking if a linear combination of its elements is also an element of the set. Mathematically, this is satisfied if the object \begin{equation} \mathbf{u} = \alpha \cdot \mathbf{v} + \beta \cdot \mathbf{w} + \ldots \end{equation} belongs to \(S\), provided that \(\mathbf{v}\) and \(\mathbf{w}\) also belong to \(S\), and \(\alpha\) and \(\beta\) are any scalars. This simple check does not impose any requirement on what the scalar multiplication \((\cdot)\) or the addition \((+)\) operators mean; they could be conventional multiplication and addition, or something else entirely. Likewise, \(\alpha\) and \(\beta\) are scalars, but they do not have to come from a conventional set of scalars like the real numbers. If any allowed values of \(\alpha\), \(\beta\), \(\mathbf{v}\), and \(\mathbf{w}\), plugged into the above formula, produce something that belongs to \(S\), then \(S\) is what we call a vector space. It follows that, since all of its elements are necessarily vectors, \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\) are all vectors.
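
As a quick illustration of how this check can fail (an example of our own choosing): take \(S\) to be the set of 2D arrows whose components are both non-negative, with the usual addition and real scalars. Choosing \(\mathbf{v} = (1, 2)\), \(\alpha = -1\), and \(\beta = 0\) gives \begin{equation} (-1)\cdot(1,2) + 0\cdot\mathbf{w} = (-1,-2), \end{equation} which has negative components and therefore lies outside \(S\), so this set is not a vector space.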

A proper and rigorous test is to see whether the set \(S\) and its operations \((\cdot)\) and \((+)\) satisfy the vector space axioms, commonly listed as eight, or ten when the two closure properties are counted separately. These go beyond the quick check above and we will not discuss them here for brevity, but keep in mind that every axiom must be satisfied for the set to qualify as a vector space. In practice, many of these axioms are implicitly satisfied by the vector spaces we meet in physics, so we can use the condition above as a quick and dirty check that we are working with a vector space.

Interestingly, we can interpret this linear combination requirement as saying that we can express \(\mathbf{u}\) in terms of basis vectors \(\mathbf{v}\), \(\mathbf{w}\), \(\ldots\). Writing it as \begin{equation} \mathbf{u} = \alpha_1 \cdot \mathbf{v}_1 + \alpha_2 \cdot \mathbf{v}_2 + \ldots = \sum_i \alpha_i \cdot \mathbf{v}_i, \end{equation} we can think of \(\alpha_i\) as the components of the vector \(\mathbf{u}\) along the basis vectors \(\mathbf{v}_i\).

Using the concept of vector spaces, we can perform a non-rigorous proof that complex numbers are vectors. Suppose that we have some generic vector \(\mathbf{w} = a\mathbf{\hat{x}} + b\mathbf{\hat{y}}\), but instead we rename the unit vectors as \(1\) and \(i\) respectively. We then get \(\mathbf{w} = a(1) + b(i)\). If we omit the \(1\), we end up with \(a + ib\), which suspiciously looks like a complex number, because it is indeed a complex number. We can even verify this by taking a linear combination of two complex numbers \(a+ib\) and \(c+id\), and show that the result is a complex number: \begin{equation} \alpha(a+ib) + \beta(c+ id) = (\alpha a + \beta c) + i(\alpha b + \beta d). \end{equation} This is a complex number with real part \((\alpha a + \beta c)\) and imaginary part \((\alpha b + \beta d)\), so this confirms our assertion that complex numbers are indeed vectors. And we did so without even mentioning that \(i^2 = -1\).

Besides showing that complex numbers are vectors, we may encounter other exotic types of vectors not normally considered as such. Functions are one such example: they are elements of a Hilbert space, an infinite-dimensional vector space. In a Hilbert space, functions are vectors that can be represented as a linear combination of other functions, which serve as our basis vectors. We can express most functions \(f\) as a linear combination of an infinite set of basis functions \(g_n\): \begin{equation} f = c_1 g_1 + c_2 g_2 + \ldots = \sum_{n=1}^{\infty} c_n g_n. \end{equation}

Functions are indeed vectors in a Hilbert space

In this notation, we say that the \(g_n\) are the basis functions and the \(c_n\) act like the components of \(f\) along each \(g_n\). The ability to write a function \(f\) this way is called completeness; we say that the basis \(g_n\) is complete if we can do this for any of the normal functions we are familiar with. For instance, the \(g_n\) could be polynomials or trigonometric functions, which we know are complete because the Taylor series uses \(x^n\) as basis vectors while the Fourier series uses \(\sin(nx)\) and \(\cos(nx)\) as basis vectors.
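
To make this concrete, here is a small numerical sketch (a toy example of our own choosing) that approximates \(f(x) = x(\pi - x)\) on \([0, \pi]\) with a truncated Fourier sine series, treating \(\sin(nx)\) as the basis vectors and computing the components \(c_n\) as inner products.

import numpy as np

# Toy example: expand f(x) = x(pi - x) on [0, pi] in the basis g_n(x) = sin(n x).
x = np.linspace(0.0, np.pi, 2001)
dx = x[1] - x[0]
f = x * (np.pi - x)

N = 10
approx = np.zeros_like(x)
for n in range(1, N + 1):
    g_n = np.sin(n * x)
    # Component of f along g_n (a Fourier coefficient): the inner product
    # <f, g_n> = integral of f * g_n dx, divided by <g_n, g_n> = pi / 2.
    c_n = np.sum(f * g_n) * dx / (np.pi / 2)
    approx += c_n * g_n

# The truncated sum of (component) x (basis function) already tracks f closely,
# and the error keeps shrinking as more terms are added.
print(np.max(np.abs(f - approx)))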

The set of square-integrable functions is a noteworthy example of a Hilbert space in physics; these are functions whose absolute value squared (or modulus squared, if the function is complex) has a finite area on some region or interval \(I\): \begin{equation} \int_{I} |f(x)|^2\,dx = k < \infty. \end{equation} If the integral sign is a bit foreign, for now we can think of this equation as saying that the area under \(|f(x)|^2\) over \(I\) has a finite value \(k\).

The interval \(I\) can be finite, such as \([1,2]\), or infinite, such as \((-\infty, +\infty)\). Physically, we can interpret this interval as the region where the particle is allowed to be, as constrained by the system’s potential and the particle’s available energy.

In some systems, particles can only exist in a certain interval

Square-integrable functions are important because they represent physically realizable wavefunctions, \(\psi\), which encode the (non-collapsed) state of a particle, such as its position. \(\psi\) is important because \(|\psi|^2\) is the probability density of the particle’s position. In the quantum treatment, we can only describe a particle’s state in terms of probabilities. If we measure the particle’s position, we can get the probability of finding the particle in the interval \([a,b]\) from the position wavefunction by taking the area of the segment of \(|\psi|^2\) that runs from \(a\) to \(b\). More precisely, \begin{equation} \int_a^b |\psi|^2\, dx = P(a \leq x \leq b). \end{equation} If this wavefunction \(\psi\) is square-integrable on the interval \(I\) and normalized, then integrating over the entirety of \(I\) gives \(1\): the probability of finding the particle somewhere it is allowed to be. We may not know where it is exactly, but we know it exists, and we are guaranteed to find the particle somewhere if we measure it. If \(\psi\) were not square-integrable, we could not normalize it (by dividing \(|\psi|^2\) by \(k\), or equivalently dividing \(\psi\) by \(\sqrt{k}\), so that the total integral is 1), and so it could not possibly represent a real particle.

(Technically speaking, this must be our first measurement of the particle in the state \(\psi\), because after our first measurement, \(\psi\) collapses so that immediately repeated measurements return the same result; these technicalities are described in detail in undergraduate quantum mechanics.)
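
As a hedged numerical sketch of the normalization procedure just described (the wavefunction below is a toy example of our own choosing, not the solution to any particular problem), we can check square-integrability, normalize, and read off a probability.

import numpy as np

# Toy wavefunction psi(x) = exp(-x^2), on a grid wide enough to stand in
# for the interval I = (-inf, +inf).
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
psi = np.exp(-x**2)

# Square-integrability check: the area under |psi|^2 should be finite (= k).
k = np.sum(np.abs(psi)**2) * dx         # analytically sqrt(pi/2), about 1.2533

# Normalize by dividing psi by sqrt(k) so that |psi|^2 integrates to 1.
psi = psi / np.sqrt(k)
print(np.sum(np.abs(psi)**2) * dx)      # ~ 1.0

# Probability of finding the particle between a = 0 and b = 1.
mask = (x >= 0.0) & (x <= 1.0)
print(np.sum(np.abs(psi[mask])**2) * dx)   # ~ 0.48 for this toy psi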

Interestingly, because wavefunctions are vectors, they are also subject to the rules and properties of linear algebra, such as linear transformations, eigenvalues, and eigenvectors, which are fundamental to quantum mechanics. The eigenvalues in particular come into play when we observe the curious fact that a bound system can take on any position in the valid region, but its energy can only take certain discrete values.

To quickly summarize, we define vectors to be elements of a vector space, equipped with some notion of vector addition and scalar multiplication. Vector spaces must be closed under a linear combination of vector addition and scalar multiplication. With this definition, we first discovered that complex numbers could be treated as vectors. We also saw that we could define a vector space using square-integrable functions, which forms the foundation for wavefunctions in quantum physics.


Summary

We have discussed the following definitions of a vector: a quantity with both a magnitude and a direction; a collection of components relative to a set of basis vectors; an object whose components transform in a specific (contravariant) manner under a change of coordinates; and an element of a vector space.

Through these various representations we were able to use a variety of tools for applications in physics. With these different definitions in mind, we now return to the title’s question: what exactly is a vector? As cliche as it sounds, a vector is all of these things; we choose whichever definition is most convenient for the task at hand. The arrow representation is useful for getting started in physics; the component definition is more practical for calculations; the transformation-based definition generalizes to arbitrary coordinate systems; and the vector space definition helps us identify unfamiliar vectors and lets us use the tools we already have to analyze them.

To avoid making this article too long, we have decided to omit an even more abstract perspective on vectors, which is that they belong to an even more general class of objects known as tensors. Tensors are invaluable in advanced physics, but they are complicated enough to warrant their own article in the future.

