Lesson 2.4: The Chain Rule (by Carlo Angiuli)

Graph of y=f(x)=sin(3x)

We already know how to differentiate many functions, but there are some which still escape our grasp: among them are composite functions. Composite functions are those formed by plugging a function into another function. Let’s look at a simple example: f(x)=sin(3x). The graph of y=f(x) is shown.

We can logically deduce the derivative f'(x) by comparing f to sin x, whose derivative you will recall is cos x. Well, sin 3x is the same as sin x, except that it is compressed horizontally by a factor of 3.

Notice that at x=pi/2 we are at the bottom of the wave, not the top, just as sin x is at the bottom of its wave at x=3pi/2. This makes sense, of course, since you’re multiplying x by 3 before plugging it into the sine function. The point at x on sin 3x corresponds to the point at 3x on sin x, so we have to plug 3x into the derivative, too, so that it corresponds to the right location. Therefore our notion so far of the derivative of sin 3x is cos 3x.

But we’re not done yet. Because sin 3x is compressed by a factor of three, it’s three times as steep at every point: it goes through the same sequence of points three times as quickly. Or, if you’d prefer, Delta y is the same for a Delta x that’s a third as large (because y will go through the same points in that span of x values), and Delta y/(Delta x/3) = 3(Delta y/Delta x).

So since it’s three times as steep, we have to multiply its derivative by three. Now we think the derivative of sin 3x is 3cos 3x. This happens to be exactly correct.
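
As a quick sanity check on the reasoning above, here is a minimal Python sketch that compares a numerical slope of sin 3x with the claimed derivative 3cos 3x. The helper name central_difference, the step size h, and the sample points are illustrative choices, not part of the lesson.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient (assumed helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    # The composite function from the example: sin(3x).
    return math.sin(3 * x)

def claimed_derivative(x):
    # The derivative deduced in the text: 3cos(3x).
    return 3 * math.cos(3 * x)

# Arbitrary sample points, including x = pi/2 from the discussion above.
sample_points = [0.0, 0.5, math.pi / 2, 2.0]

# Largest gap between the numerical slope and 3cos(3x) over the samples.
max_error = max(
    abs(central_difference(f, x) - claimed_derivative(x)) for x in sample_points
)

# Gap at x = 0 if we had stopped at cos(3x) and forgotten the factor of 3.
error_without_factor = abs(central_difference(f, 0.0) - math.cos(3 * 0.0))

print(max_error, error_without_factor)
```

The first number should be tiny (limited only by floating-point error), while the second shows how far off cos 3x alone would be at x=0.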

Now, let’s generalize and figure out the derivative of f(g(x)), which you will sometimes see written as (f∘g)(x). (Note that in the previous example, f(x)=sin x and g(x)=3x.)

Think of g(x) as another variable; let’s call it u. Therefore we’re looking at the function f(u). f(u) behaves similarly to f(x), except that the value of x doesn’t determine where you are on the graph; rather, the value of u does. Just as f'(x) gives the slope of f at the point x, f'(u) gives the slope of f at the point u.

The difference is that, since you’re still modifying x directly rather than u, f(g(x)) might be getting compressed or expanded at certain points, depending on how g(x) acts at that point. (3x is linear, so it provides a consistent compression for the entire graph.) The amount by which f(g(x)) is getting compressed or expanded at a given x is proportional to how g(x) is changing at that point; in other words, it is proportional to du/dx. If du/dx>1, then f(u) is being compressed, because u is going through values faster than x is, just as 3x is going through values faster than x is. That means that the derivative of f(g(x)) with respect to x is larger than f'(u), by a factor of du/dx to be precise, as that is the ratio of how much faster u goes through values than x does.

Likewise, if du/dx is between 0 and 1, then f(u) is being expanded since u is going through values more slowly than x is, so the derivative of f(g(x)) is smaller than f'(u), again by a factor of du/dx. And if du/dx is negative, then f(u) is going backwards as compared to f(x), so the factor of du/dx also flips the sign of the derivative.

In all of these cases, the derivative of f(g(x)) differs from f'(u) by a factor of du/dx, which is the same as g'(x). Therefore, just as in the first example, the derivative of f(g(x)) is f'(g(x))g'(x). Or, if you prefer, we can write df/dx=(df/du)(du/dx). This latter equation for what we will now call the Chain Rule has the advantage of providing a convenient (albeit not strictly correct) mnemonic: you can simply “cancel” du.
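
To see the general rule at work on a case where the compression is not constant, the following Python sketch checks f'(g(x))g'(x) against a numerical slope. The choices f(u)=sin u and g(x)=x^2 are assumptions made purely for illustration, as are the helper central_difference and the sample points, which were picked so that du/dx is negative, between 0 and 1, and greater than 1.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient (assumed helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Illustrative inner function u = g(x) and its derivative du/dx.
def g(x):
    return x * x

def g_prime(x):
    return 2 * x

# Illustrative outer function f(u) and its derivative df/du.
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def composite(x):
    # f(g(x)) = sin(x^2)
    return f(g(x))

def chain_rule(x):
    # df/dx = (df/du)(du/dx) = f'(g(x)) * g'(x)
    return f_prime(g(x)) * g_prime(x)

# du/dx is -3 and -0.5 at the first two points, 0.5 and 3 at the last two.
points = [-1.5, -0.25, 0.25, 1.5]

# Gap between the numerical slope of f(g(x)) and the Chain Rule prediction.
errors = [abs(central_difference(composite, x) - chain_rule(x)) for x in points]

print(max(errors))
```

If the Chain Rule is right, the printed gap should be negligible at every sample point, whichever way g(x) is stretching or reversing the graph there.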

Some of you may be interested in a rigorous proof, but that is really a topic more suited to the study of analysis than that of calculus.
