7.9. The Mean Value Theorem
Extreme points are a basic concept in applied mathematics, as the examples in 7.11 will demonstrate. So we start this chapter by introducing this notion.
The main part, however, is devoted to the mean value theorem and its implications. The theorem itself is easily depicted (look at the sketch in [7.9.4]), but its proof is far from trivial. The actual key is the extreme value theorem ([6.6.5]) for continuous functions.
The mean value theorem proves to be the most powerful device in calculus.
Definition: For an arbitrary point $a\in A$ a function $f:A\to \mathbb{R}$ is said to have a

global maximum at a, if $f(a)\ge f(x)\text{ for all }x\in A$

global minimum at a, if $f(a)\le f(x)\text{ for all }x\in A$

local maximum at a, if there is a relative ε-neighbourhood ${A}_{a,\epsilon}=A\cap ]a-\epsilon ,a+\epsilon [$ such that $f(a)\ge f(x)\text{ for all }x\in {A}_{a,\epsilon}$

local minimum at a, if there is a relative ε-neighbourhood ${A}_{a,\epsilon}$ such that $f(a)\le f(x)\text{ for all }x\in {A}_{a,\epsilon}$

[7.9.1] 
In any case we speak of a global or local extremum. Occasionally the terms absolute extremum and relative extremum are used instead.
a is called an extreme point for the extremum (or the extreme value) $f(a)$.

To illustrate this new concept we look at the function $f:[-2,\infty [\to \mathbb{R}$ given by
$f(x)=\frac{1}{4}{x}^{4}-\frac{5}{6}{x}^{3}-\frac{1}{2}{x}^{2}+\frac{5}{2}x$.
The sketch to the right obviously shows that f has

a local minimum at −1 and at 2.5. The local minimum at −1 is even a global one.

a local maximum at −2 and at 1.

no global maximum as f is unbounded from above.
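These claims are easy to cross-check numerically. The following sketch (an added illustration, not part of the text) evaluates f and its derivative ${f}^{\prime}=(\mathrm{X}-2.5)({\mathrm{X}}^{2}-1)$ at the stated candidates:

```python
# Illustration (added): the example f(x) = x^4/4 - 5x^3/6 - x^2/2 + 5x/2 on [-2, inf[.
def f(x):
    return x**4/4 - 5*x**3/6 - x**2/2 + 5*x/2

def df(x):
    # f'(x) = x^3 - 5x^2/2 - x + 5/2 = (x - 2.5)(x^2 - 1)
    return x**3 - 2.5*x**2 - x + 2.5

# Horizontal tangents at the interior extreme points -1, 1 and 2.5 ...
for x in (-1.0, 1.0, 2.5):
    assert abs(df(x)) < 1e-12

# ... but not at the boundary point -2:
assert df(-2.0) != 0

# The local minimum at -1 is the global one: f(-1) < f(2.5).
assert f(-1.0) < f(2.5)
```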


Note that the differentiable function f has a horizontal tangent at each of the interior extreme points, but not at −2.
Consider:

The ability to have an extremum is not bound to differentiability. The absolute value function $|\mathrm{X}|$ for instance has a global minimum at 0, but fails to be differentiable there.

Each global extreme point is a local one as well, because if an estimate holds for the whole of A it will certainly hold for every subset of the type ${A}_{a,\epsilon}$, thus for instance for ${A}_{a,1}$. The sketch above shows that the reverse does not hold.

As we used the relations $\le$ and $\ge$ when stating [7.9.1], a constant function has a global maximum and a global minimum at each point simultaneously. This cannot happen however when strict extrema are involved.
It will be an important task to detect local extreme points. As noticed above, points with horizontal tangents will play a certain role in this quest, and in fact this observation leads to a first existence criterion for local extreme points.
Proposition (necessary criterion): Let $f:A\to \mathbb{R}$ be differentiable at $a\in A$. If a is an
interior point
$\begin{array}{rr}\hfill \text{i.e. there is an }\epsilon >0\text{ such that}& ]a-\epsilon ,a+\epsilon [\subset A\hfill \\ \hfill \iff & {A}_{a,\epsilon}=]a-\epsilon ,a+\epsilon [\hfill \end{array}$

of A the following implication holds:
If f has a local extremum at a its derivative vanishes at a: ${f}^{\prime}(a)=0$.

[7.9.2] 
Proof: Assume f has a local maximum at a. So there is a relative ε-neighbourhood ${A}_{a,\epsilon}$ such that
$f(x)-f(a)\le 0\text{ for all }x\in {A}_{a,\epsilon}=]a-\epsilon ,a+\epsilon [$.
This allows us to calculate the sign of the difference quotient function:
${m}_{a}(x)=\frac{f(x)-f(a)}{x-a}\text{\hspace{0.28em}}\{\begin{array}{l}\ge 0\text{, if }a-\epsilon <x<a\hfill \\ \le 0\text{, if }a<x<a+\epsilon \hfill \end{array}$.
With [6.9.1] and [6.9.4] we thus have:
$\underset{x\to a}{\mathrm{lim}}{m}_{a}(x)=\underset{x\to a}{\mathrm{lim}}{m}_{a}|]a,a+\epsilon [(x)\le 0\le \underset{x\to a}{\mathrm{lim}}{m}_{a}|]a-\epsilon ,a[(x)=\underset{x\to a}{\mathrm{lim}}{m}_{a}(x)$,
which finally results in: ${f}^{\prime}(a)=\underset{x\to a}{\mathrm{lim}}{m}_{a}(x)=0$.

Consider:

The necessary criterion confirms the presumed behavior: Tangents attached to interior local extreme points are always horizontal.

The reverse of [7.9.2] is not true (read: [7.9.2] is not sufficient). The function ${\mathrm{X}}^{3}$ for instance has no local extremum at 0 despite $({\mathrm{X}}^{3}{)}^{\prime}(0)=0$. Actually the necessary criterion only filters out points with horizontal tangents.
Looking for suitable sufficient criteria is thus worthwhile. [7.9.17] at the end of this part is a first example of such a criterion.

The necessary criterion is not valid at boundary points: The restriction $\mathrm{X}|{\mathbb{R}}^{\ge 0}$ has a local (even global) minimum at 0, but its derivative number at 0 is 1.

[7.9.2] is often read as " ${f}^{\prime}(a)=0$ is a necessary existence condition for interior local extreme points". Thus only the first derivative's zeros come into question when searching for local extreme points.
Theorem (Rolle's theorem): Let f be continuous on the closed interval $[a,b]$ and differentiable on its interior $]a,b[$, i.e. $f\in {\mathcal{C}}^{0}([a,b])\cap {\mathcal{D}}^{1}(]a,b[)$.
If $f(a)=f(b)$, then there is an $\tilde{x}\in ]a,b[$ such that
${f}^{\prime}(\tilde{x})=0$

[7.9.3] 
Proof: f has a global maximum and a global minimum due to the extreme value theorem [6.6.5]. (Note that f is continuous on a closed interval!) Thus there are two numbers $\underline{x},\overline{x}\in [a,b]$ such that
$f(\underline{x})\le f(x)\le f(\overline{x})\text{ for all }x\in [a,b]$. [0]
If one of these numbers is an interior point of $[a,b]$, it is certainly a zero of ${f}^{\prime}$ as a result of the necessary criterion [7.9.2].
Otherwise we would know that $\underline{x},\overline{x}\in \{a,b\}$, thus $f(\underline{x})=f(\overline{x})$ as $f(a)=f(b)$ due to the premise. [0] now forces f to be a constant function, so ${f}^{\prime}$ vanishes everywhere.

Consider:

Rolle's theorem is a pure existence theorem. It does not provide any information on uniqueness or on the precise location of $\tilde{x}$.

The continuity at a and at b is compulsory. As an example take the function $f:[0,1]\to \mathbb{R}$ defined by $f(x):=\{\begin{array}{l}x\text{, if }x\ne 1\hfill \\ 0\text{, if }x=1\hfill \end{array}$. We have ${f}^{\prime}(x)=1$ for all $x\in ]0,1[$, although $f(0)=f(1)$. But f is discontinuous at 1.

As $f(a)=f(b)$, the line segment joining $(a,f(a))$ and $(b,f(b))$ is horizontal. Thus Rolle's theorem is often stated in a more geometrical manner: There is an interior point $\tilde{x}$ with a tangent parallel to the line segment which joins the end points of f. The sketch to the right illustrates this for the function $\frac{2}{3}{\mathrm{X}}^{3}+\frac{4}{3}{\mathrm{X}}^{2}-\frac{2}{3}\mathrm{X}-\frac{5}{6}$ on the interval $[-2,1]$. Apart from the marked position $\tilde{x}=\frac{-2-\sqrt{7}}{3}\approx -1.55$ there is obviously another option in this case for a horizontal tangent, namely at $\frac{-2+\sqrt{7}}{3}\approx 0.22$.
It is tempting to ask if this geometric property is still valid once the restriction $f(a)=f(b)$ is lifted. The function $\frac{2}{3}{\mathrm{X}}^{3}+\frac{4}{3}{\mathrm{X}}^{2}+\frac{1}{2}$ on $[-2,1]$ depicted below suggests that the answer might be "yes". Luckily we are able to prove that "yes" is the actual answer.
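A quick numerical cross-check of the Rolle example (an added sketch; the coefficients are those of the cubic $\frac{2}{3}{\mathrm{X}}^{3}+\frac{4}{3}{\mathrm{X}}^{2}-\frac{2}{3}\mathrm{X}-\frac{5}{6}$ stated above):

```python
# Illustration (added): verifying the premise and the two horizontal tangents.
from math import sqrt

def f(x):
    return 2*x**3/3 + 4*x**2/3 - 2*x/3 - 5/6

def df(x):
    # f'(x) = 2x^2 + 8x/3 - 2/3
    return 2*x**2 + 8*x/3 - 2/3

# f(-2) = f(1), so Rolle's theorem applies on [-2, 1] ...
assert abs(f(-2) - f(1)) < 1e-12

# ... and both zeros of f' lie in the open interval ]-2, 1[:
for x in ((-2 - sqrt(7))/3, (-2 + sqrt(7))/3):
    assert -2 < x < 1
    assert abs(df(x)) < 1e-12
```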
Theorem (mean value theorem): There is an $\tilde{x}\in ]a,b[$ for each $f\in {\mathcal{C}}^{0}([a,b])\cap {\mathcal{D}}^{1}(]a,b[)$ such that
${f}^{\prime}(\tilde{x})=\frac{f(b)-f(a)}{b-a}$

[7.9.4] 
Proof: We apply Rolle's theorem [7.9.3] to a modification of f. The function


$g:=f-\frac{f(b)-f(a)}{b-a}(\mathrm{X}-a)$
is certainly continuous on $[a,b]$ and differentiable on $]a,b[$. As
$g(b)=f(b)-\frac{f(b)-f(a)}{b-a}(b-a)=f(a)=g(a)$
the special condition of Rolle's theorem is met, so that due to [7.9.3] there is an $\tilde{x}\in ]a,b[$ such that
$0={g}^{\prime}(\tilde{x})={f}^{\prime}(\tilde{x})-\frac{f(b)-f(a)}{b-a}$.
This however is the assertion.

Consider:

On the one hand the mean value theorem is clearly an implication of Rolle's theorem due to the structure of its proof. On the other hand Rolle's theorem comes as a special case of the mean value theorem, because if, in addition, $f(a)=f(b)$ we get ${f}^{\prime}(\tilde{x})=\frac{f(b)-f(a)}{b-a}=0$. Both theorems are thus equivalent:
Rolle's theorem$\text{\hspace{1em}}\iff \text{\hspace{1em}}$mean value theorem
With the mean value theorem one of the major results in calculus is now at our disposal. There are a lot of nontrivial applications supporting this rating. Our present formulation [7.9.4] however seldom proves suitable for direct application. We will thus benefit from another, equivalent version which is more tailored to application purposes. As
$\frac{f(b)-f(a)}{b-a}=\frac{f(a)-f(b)}{a-b}$
[7.9.4] is always valid, whether or not a is actually left of b. Thus we will subsequently use the phrase "$\tilde{x}$ lies in between a and b" as an abbreviation for
$\begin{array}{l}\tilde{x}\in ]a,b[\text{, if }a<b\hfill \\ \tilde{x}\in ]b,a[\text{, if }a>b\hfill \end{array}$
Solving [7.9.4] for $f(b)$ provides a new version of the mean value theorem:
Let I be an arbitrary
interval
We understand this as a common notation for open and closed intervals. In the open case we also allow the value $\infty$ for the right and $-\infty$ for the left boundary. Thus $\mathbb{R}$ and e.g. ${\mathbb{R}}^{>0}$ are regarded as intervals as well.

and let $a,b\in I$ be any two different points of I. If $f\in {\mathcal{D}}^{1}(I)$, then there is an $\tilde{x}$ in between a and b such that
$f(b)=f(a)+(ba)\cdot {f}^{\prime}(\tilde{x})$

[7.9.5] 
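As an added illustration of [7.9.5], take f = exp on [0, 1]; since exp′ = exp, the intermediate point promised by the theorem can even be written down in closed form:

```python
# Illustration (added): for f = exp on [0, 1], [7.9.5] promises an x~ with
# exp'(x~) = (exp(1) - exp(0))/(1 - 0) = e - 1. As exp' = exp and exp is
# injective, x~ = log(e - 1) is the unique such point.
from math import e, exp, log

a, b = 0.0, 1.0
slope = (exp(b) - exp(a)) / (b - a)   # = e - 1
x_tilde = log(e - 1)                  # solves exp(x) = e - 1

assert a < x_tilde < b                # x~ lies in between a and b
assert abs(exp(x_tilde) - slope) < 1e-12
```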
Consider:

The closed interval generated by a and b is a subset of I. Any function differentiable on I is differentiable, and thus also continuous, on that closed subinterval as well. The conditions of [7.9.4] are thus satisfied.

[7.9.5] extends (on intervals) the basic representation theorem [7.5.1] as all the values $r(x)$ now prove to be derivative numbers of f.

The mean value theorem is only valid for intervals. The
Heaviside step function H
$\mathrm{H}(x)=\{\begin{array}{l}1\text{, if }x\ge 0\hfill \\ 0\text{, if }x<0\hfill \end{array}$

for example is ${\mathcal{D}}^{1}$ on ${\mathbb{R}}^{\ne 0}$. But as ${\mathrm{H}}^{\prime}(x)=0$ for all $x\ne 0$ there will be no $\tilde{x}$ such that
$\mathrm{H}(1)=\mathrm{H}(-1)+(1-(-1))\cdot {\mathrm{H}}^{\prime}(\tilde{x})$,
because that would imply: $0=1$.
Our first application goes back to a promise made in the context of [7.5.3/4]. Now we are able to show that regular functions on intervals are always injective.
The way the mean value theorem is used in the subsequent proof is a characteristic one: The equation [7.9.5] allows us to access properties of f as soon as global features of its derivative are available. In this case for example we know that the derivative values ${f}^{\prime}(x)$ are nonzero everywhere and thus certainly nonzero as well at the normally unknown point $\tilde{x}$ provided by the mean value theorem.
Proposition: If $f\in {\mathcal{C}}^{0}(I)$ is differentiable at each interior point of I we have:
${f}^{\prime}(x)\ne 0$ for all interior points x of I$\text{\hspace{1em}}\Rightarrow \text{\hspace{1em}}$f is injective on I.

[7.9.6] 
Proof: If x and y are any two different points of I there is an $\tilde{x}$ in between x and y according to [7.9.5] such that
$f(x)=f(y)+\underset{\ne 0}{\underbrace{(x-y)}}\cdot \underset{\ne 0}{\underbrace{{f}^{\prime}(\tilde{x})}}$.
With $(x-y)\cdot {f}^{\prime}(\tilde{x})\ne 0$ we now see: $f(x)\ne f(y)$.

A second example will classify the constant functions using only their derivative behaviour.
Proposition: For any function $f\in {\mathcal{D}}^{1}(I)$ the following holds:
$f(x)=c\text{for all}x\in I\text{\hspace{1em}}\iff \text{\hspace{1em}}{f}^{\prime}(x)=0\text{for all}x\in I$

[7.9.7] 
Proof: Whereas the direction "$\Rightarrow$" is trivial (see [7.3.6]), the reverse one "$\Leftarrow$" turns out to be the actual task. And again this is a characteristic context for the mean value theorem. If we choose a fixed point $a\in I$, [7.9.5] guarantees an $\tilde{x}$ in between a and x for each $x\in I$ different from a such that
$f(x)=f(a)+(x-a)\cdot \underset{=0}{\underbrace{{f}^{\prime}(\tilde{x})}}=f(a)$
Taking $c:=f(a)$ now proves the assertion.

An appropriate statement for polynomials extends [7.9.7] considerably.
Proposition: Let $n\in \mathbb{N}$ be an arbitrary natural number. For any ${\mathcal{D}}^{n+1}$-function $f:\mathbb{R}\to \mathbb{R}$ we have:
f is a polynomial of degree $\le n\text{\hspace{1em}}\iff \text{\hspace{1em}}{f}^{(n+1)}=0$

[7.9.8] 
Proof: "$\Rightarrow$" is an immediate consequence of [7.8.14]. We prove "$\Leftarrow$" by induction. As the base step ($n=0$) is already done by [7.9.7] it remains to prove the induction step. To that end let f be a ${\mathcal{D}}^{n+2}$-function such that
$({f}^{(n+1)}{)}^{\prime}={f}^{(n+2)}=0$.
According to [7.9.7] the differentiable function ${f}^{(n+1)}$ is a constant one (the weird notation of c is due to [7.8.14]):
${f}^{(n+1)}=c=(\frac{c}{(n+1)!}{\mathrm{X}}^{n+1}{)}^{(n+1)}$.
Now we consider the ${\mathcal{D}}^{n+1}$-function $p:=f-\frac{c}{(n+1)!}{\mathrm{X}}^{n+1}$. As obviously
${p}^{(n+1)}=(f-\frac{c}{(n+1)!}{\mathrm{X}}^{n+1}{)}^{(n+1)}={f}^{(n+1)}-(\frac{c}{(n+1)!}{\mathrm{X}}^{n+1}{)}^{(n+1)}=0$
we know that p is a polynomial of degree $\le n$ due to the induction hypothesis. But that means: $f=p+\frac{c}{(n+1)!}{\mathrm{X}}^{n+1}$ is a polynomial of degree $\le n+1$.

[7.9.8] allows us to spot the polynomials among the ${\mathcal{C}}^{\infty}$-functions on $\mathbb{R}$ simply by checking if the zero function is one of their derivatives. The derivatives calculated in [7.8.11-13] thus prove that exp, sin and cos are not polynomials.
When dealing with continuity we introduced one of its special forms in [6.5.6], the so-called Lipschitz continuity. From the mean value theorem we now get the information that a differentiable function with a bounded derivative is automatically Lipschitz-continuous.
Task: If $f\in {\mathcal{D}}^{1}(I)$ and $|{f}^{\prime}(x)|\le c$ for all interior points x of I, then all $x,y\in I$ satisfy: $|f(x)-f(y)|\le c\cdot |x-y|$.
Proof:
Obviously we may assume that $x\ne y$. According to the mean value theorem there is an $\tilde{x}$ in between x and y such that $f(x)=f(y)+(x-y)\cdot {f}^{\prime}(\tilde{x})$. But this leads to
$|f(x)-f(y)|=\underset{\le c}{\underbrace{|{f}^{\prime}(\tilde{x})|}}\cdot |x-y|\le c\cdot |x-y|$.

Consider:

Every function $f\in {\mathcal{C}}^{1}([a,b])$ is Lipschitz-continuous, because now ${f}^{\prime}$ is a continuous function on a closed interval and thus bounded due to [6.6.4].

As the derivatives of sin and cos take only values between −1 and 1, we get for all $x,y\in \mathbb{R}$:
$\begin{array}{l}|\mathrm{sin}x-\mathrm{sin}y|\le |x-y|\hfill \\ |\mathrm{cos}x-\mathrm{cos}y|\le |x-y|\hfill \end{array}$
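These two Lipschitz estimates are easy to spot-check numerically (an added illustration, sampling a grid of points; the small tolerance only absorbs floating-point rounding):

```python
# Illustration (added): |sin x - sin y| <= |x - y| and likewise for cos,
# sampled on a grid of points in [-50/7, 50/7].
from math import sin, cos

points = [i / 7.0 for i in range(-50, 51)]
for x in points:
    for y in points:
        assert abs(sin(x) - sin(y)) <= abs(x - y) + 1e-15
        assert abs(cos(x) - cos(y)) <= abs(x - y) + 1e-15
```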
We are now going to extend the mean value theorem. Two options are at our disposal: We could try to find a version for two functions, as we did successfully with the intermediate value theorem (see [6.6.3]), and we could check if the mean value theorem reveals additional features if repeatedly differentiable functions are involved.
Proposition (second mean value theorem): For any two functions $f,g\in {\mathcal{C}}^{0}([a,b])\cap {\mathcal{D}}^{1}(]a,b[)$ there is an $\tilde{x}\in ]a,b[$ such that
$(f(b)f(a))\cdot {g}^{\prime}(\tilde{x})=(g(b)g(a))\cdot {f}^{\prime}(\tilde{x})$

[7.9.10] 
Proof: It is easily calculated that the function
$h:=(f(b)-f(a))\cdot g-(g(b)-g(a))\cdot f\in {\mathcal{C}}^{0}([a,b])\cap {\mathcal{D}}^{1}(]a,b[)$
satisfies $h(a)=f(b)\cdot g(a)-f(a)\cdot g(b)=h(b)$. According to Rolle's theorem [7.9.3] we thus find an $\tilde{x}\in ]a,b[$ such that
$0={h}^{\prime}(\tilde{x})=(f(b)-f(a))\cdot {g}^{\prime}(\tilde{x})-(g(b)-g(a))\cdot {f}^{\prime}(\tilde{x})$,
which in fact is the assertion.

Consider:

If we interchange a and b in [7.9.10], i.e. if we multiply the equation by −1, [7.9.10] is still valid. The second mean value theorem thus also does not depend on whether or not a is actually left of b.
With the second mean value theorem we will get L'Hôpital's rule, a very efficient technique for calculating certain limits. We start with the following observation:
Let $f,g:A\to \mathbb{R}$ be differentiable at a and assume ${g}^{\prime}(a)\ne 0$. If $f(a)=g(a)=0$ then $\frac{f}{g}$ is continuously continuable at a by the limit
$\underset{x\to a}{\mathrm{lim}}\frac{f(x)}{g(x)}=\frac{{f}^{\prime}(a)}{{g}^{\prime}(a)}$.

[7.9.11] 
Proof: At first we consider the representation (see [7.5.1] for details)
$g=g(a)+(\mathrm{X}-a)r=(\mathrm{X}-a)r$
to show that a is an
accumulation point
According to [6.4.4] we would manage this by creating a sequence $({a}_{n})$ in $\{x\in A\mid g(x)\ne 0\}$ such that ${a}_{n}\to a$. As a is already an accumulation point of A (otherwise there would be no function differentiable at a) it is sufficient to that end to find a relative ε-neighbourhood ${A}_{a,\epsilon}$ satisfying $g(x)\ne 0$ for all $x\in {A}_{a,\epsilon}\backslash \{a\}$.

of $\{x\in A\mid g(x)\ne 0\}$. As r is continuous at a the information $r(a)={g}^{\prime}(a)\ne 0$ yields a relative ε-neighbourhood ${A}_{a,\epsilon}$ such that $g(x)\ne 0$ for all $x\in {A}_{a,\epsilon}\backslash \{a\}$.
As ${f}^{\prime}(a)=\underset{x\to a}{\mathrm{lim}}\frac{f(x)-f(a)}{x-a}$ and ${g}^{\prime}(a)=\underset{x\to a}{\mathrm{lim}}\frac{g(x)-g(a)}{x-a}$, [7.9.11] now follows immediately with [6.9.8] from the equation
$\frac{f(x)}{g(x)}=\frac{f(x)-f(a)}{g(x)-g(a)}=\frac{\frac{f(x)-f(a)}{x-a}}{\frac{g(x)-g(a)}{x-a}}$
which holds for all $x\in {A}_{a,\epsilon}\backslash \{a\}$.
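A numerical illustration of [7.9.11] (added here): with f = sin, g = X and a = 0 we have f(0) = g(0) = 0 and g′(0) = 1, so the quotient should approach f′(0)/g′(0) = cos(0)/1 = 1:

```python
# Illustration (added): sin(x)/x approaches f'(0)/g'(0) = 1 as x -> 0.
from math import sin, cos

expected = cos(0.0)  # f'(0)/g'(0) = 1
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    # |sin h / h - 1| is of order h^2/6, so it is certainly below h
    assert abs(sin(h) / h - expected) <= h
```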
Proposition (L'Hôpital's rule): Let I be an interval, $a\in I$ and $f,g\in {\mathcal{C}}^{0}(I)\cap {\mathcal{D}}^{1}(I\backslash \{a\})$ such that ${g}^{\prime}(x)\ne 0$ for all $x\in I\backslash \{a\}$. If $\frac{{f}^{\prime}}{{g}^{\prime}}$ is continuously continuable at a the following holds:

$f(a)=g(a)=0\text{\hspace{1em}}\Rightarrow \text{\hspace{1em}}$
$\frac{f}{g}$ is continuously continuable at a by $\underset{x\to a}{\mathrm{lim}}\frac{f(x)}{g(x)}=\underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime}(x)}{{g}^{\prime}(x)}$.

[7.9.12] 

$\underset{x\to a}{\mathrm{lim}}\frac{1}{g(x)}=0\text{\hspace{1em}}\Rightarrow \text{\hspace{1em}}$
$\frac{f}{g}$ is continuously continuable at a by $\underset{x\to a}{\mathrm{lim}}\frac{f(x)}{g(x)}=\underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime}(x)}{{g}^{\prime}(x)}$.

[7.9.13] 
Proof: We go ahead with the sequence criterion [6.8.4] and take a sequence $({a}_{n})$ in $I\backslash \{a\}$ such that ${a}_{n}\to a$. We may assume that $g({a}_{n})\ne 0$ for all n, because: As ${g}^{\prime}(x)\ne 0$, [7.9.6] allows at most one zero for g left of a, i.e. within the interval $I\cap {\mathbb{R}}^{<a}$, and at most one zero within the right subinterval $I\cap {\mathbb{R}}^{>a}$. According to the second mean value theorem [7.9.10] there is now an ${\tilde{x}}_{n}$ in between a and ${a}_{n}$, that means $|{\tilde{x}}_{n}-a|\le |{a}_{n}-a|$, for each n such that
$(f({a}_{n})f(a))\cdot {g}^{\prime}({\tilde{x}}_{n})=(g({a}_{n})g(a))\cdot {f}^{\prime}({\tilde{x}}_{n})$. [1]
If $({a}_{n})$ converges to a the same is true for $({\tilde{x}}_{n})$ (see the nesting theorem [5.5.8]!) which guarantees the convergence $\frac{{f}^{\prime}({\tilde{x}}_{n})}{{g}^{\prime}({\tilde{x}}_{n})}\to \underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime}(x)}{{g}^{\prime}(x)}$ due to the premise. From [1] we thus get:
1. ►
$\frac{f({a}_{n})}{g({a}_{n})}=\frac{f({a}_{n})-f(a)}{g({a}_{n})-g(a)}=\frac{{f}^{\prime}({\tilde{x}}_{n})}{{g}^{\prime}({\tilde{x}}_{n})}\to \underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime}(x)}{{g}^{\prime}(x)}$.
2. ►
$\frac{f({a}_{n})}{g({a}_{n})}=\underset{\to 0}{\underbrace{\frac{f(a)}{g({a}_{n})}}}+(1-\underset{\to 0}{\underbrace{\frac{g(a)}{g({a}_{n})}}})\cdot \frac{{f}^{\prime}({\tilde{x}}_{n})}{{g}^{\prime}({\tilde{x}}_{n})}\to \underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime}(x)}{{g}^{\prime}(x)}$.

Consider:

Of course we are allowed to iterate the above rules: Take e.g. $f,g\in {\mathcal{C}}^{1}(I)\cap {\mathcal{D}}^{2}(I\backslash \{a\})$ such that $f(a)=g(a)=0$, ${f}^{\prime}(a)={g}^{\prime}(a)=0$ and ${g}^{\prime}(x),{g}^{\prime \prime}(x)\ne 0$ for all $x\in I\backslash \{a\}$. If the limit $\underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime \prime}(x)}{{g}^{\prime \prime}(x)}$ exists the following ones exist as well and are all the same:
$\underset{x\to a}{\mathrm{lim}}\frac{f(x)}{g(x)}=\underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime}(x)}{{g}^{\prime}(x)}=\underset{x\to a}{\mathrm{lim}}\frac{{f}^{\prime \prime}(x)}{{g}^{\prime \prime}(x)}$.
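A sketch of such an iteration (added illustration): f = 1 − cos and g = X² satisfy f(0) = g(0) = 0 and f′(0) = g′(0) = 0, while g′(x) = 2x ≠ 0 and g″ = 2 ≠ 0 away from 0, so two applications of the rule give lim (1 − cos x)/x² = lim (sin x)/(2x) = lim (cos x)/2 = 1/2:

```python
# Illustration (added): the iterated rule predicts (1 - cos x)/x^2 -> 1/2.
from math import cos

for h in (1e-1, 1e-2, 1e-3):
    # the deviation from 1/2 is of order h^2/24, well below h
    assert abs((1 - cos(h)) / h**2 - 0.5) < h
```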
Example: Many problems could already be solved using only [7.9.11]. The logarithm function ln from the third example is introduced in [8.7.1].



[7.9.14] 
This justifies the phrase "If $0<x\to 0$, then the logarithm function ln approaches $-\infty$ slower than every positive power of X".
An analogous statement is true for the exponential function exp. "If $x\to \infty$, then exp approaches $\infty$ quicker than every positive power of X":

$\underset{x\to \infty}{\mathrm{lim}}\frac{{x}^{a}}{\mathrm{exp}(x)}=0\text{ for all }a>0$
[7.9.15] 
Proof: L'Hôpital's rule is not necessary in this case. We just need the
series representation of exp
$\mathrm{exp}(x)=\sum _{i=0}^{\infty}\frac{{x}^{i}}{i!}$ See [5.9.18] for details

.
Choosing a $k\in \mathbb{N}$ such that $k>a+1$ will yield the assertion, as the following estimate is true for all $x\ge 1$:
$0\le \frac{{x}^{a}}{\mathrm{exp}(x)}=\frac{{x}^{a}}{\sum _{i=0}^{\infty}\frac{{x}^{i}}{i!}}\le k!\frac{{x}^{a}}{{x}^{k}}=k!\frac{1}{{x}^{k-a}}\le k!\frac{1}{x}$.
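The estimate from the proof can be sampled numerically (an added illustration with the concrete choices a = 3 and k = 5 > a + 1):

```python
# Illustration (added): 0 <= x^3/exp(x) <= 5!/x for x >= 1, since
# exp(x) >= x^5/5! by the series representation.
from math import exp, factorial

a, k = 3, 5
for x in (1.0, 2.0, 5.0, 10.0, 50.0):
    q = x**a / exp(x)
    assert 0 <= q <= factorial(k) / x
```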


We now turn to the second promised extension of the mean value theorem. This will, amongst others, result in a special method (Taylor polynomials) to study analytical functions.
Theorem (Taylor's theorem): Take $n\in \mathbb{N}$. For any function $f\in {\mathcal{C}}^{n}([a,b])\cap {\mathcal{D}}^{n+1}(]a,b[)$ there is an $\tilde{x}\in ]a,b[$ such that Taylor's formula holds:
$f(b)=\sum _{i=0}^{n}\frac{{f}^{(i)}(a)}{i!}{(b-a)}^{i}+\frac{{f}^{(n+1)}(\tilde{x})}{(n+1)!}{(b-a)}^{n+1}$

[7.9.16] 
Proof: It is Rolle's theorem again that will do the trick. As a start we find a real number c such that
$f(b)=\sum _{i=0}^{n}\frac{{f}^{(i)}(a)}{i!}{(b-a)}^{i}+\frac{c}{(n+1)!}{(b-a)}^{n+1}$ [2]
by simply solving the linear equation [2] for c. So it remains to find an $\tilde{x}$ such that $c={f}^{(n+1)}(\tilde{x})$. To that end we consider the function
$g:=\sum _{i=0}^{n}\frac{{f}^{(i)}}{i!}{(b-\mathrm{X})}^{i}+\frac{c}{(n+1)!}{(b-\mathrm{X})}^{n+1}$.
Due to the premise g is continuous on $[a,b]$ and differentiable on $]a,b[$. We use the product rule to calculate its derivative and observe that the resulting sum is a telescoping one, collapsing to a single difference:
$\begin{array}{ll}{g}^{\prime}\hfill & ={f}^{\prime}+\sum _{i=1}^{n}(\frac{{f}^{(i+1)}}{i!}{(b-\mathrm{X})}^{i}-\frac{{f}^{(i)}}{(i-1)!}{(b-\mathrm{X})}^{i-1})-\frac{c}{n!}{(b-\mathrm{X})}^{n}\hfill \\ \hfill & ={f}^{\prime}+\frac{{f}^{(n+1)}}{n!}{(b-\mathrm{X})}^{n}-\frac{{f}^{(1)}}{0!}{(b-\mathrm{X})}^{0}-\frac{c}{n!}{(b-\mathrm{X})}^{n}\hfill \\ \hfill & =\frac{{f}^{(n+1)}}{n!}{(b-\mathrm{X})}^{n}-\frac{c}{n!}{(b-\mathrm{X})}^{n}\hfill \end{array}$ [3]
$g(b)=f(b)$ is obvious and according to [2] $g(a)=f(b)$ is valid as well. Thus Rolle's theorem [7.9.3] provides an $\tilde{x}\in ]a,b[$ such that
$0={g}^{\prime}(\tilde{x})=\frac{{f}^{(n+1)}(\tilde{x})}{n!}{(b-\tilde{x})}^{n}-\frac{c}{n!}{(b-\tilde{x})}^{n}\text{\hspace{1em}}\iff \text{\hspace{1em}}c{(b-\tilde{x})}^{n}={f}^{(n+1)}(\tilde{x}){(b-\tilde{x})}^{n}$.
As $b\tilde{x}\ne 0$, we see that $c={f}^{(n+1)}(\tilde{x})$.

Consider:

If $n=0$ [7.9.16] and [7.9.5] coincide. Thus Taylor's theorem is indeed an extension of the mean value theorem.

And again we find that the actual order of a and b is of no relevance for the validity of Taylor's theorem. The proof however is a bit more tricky this time. We need to introduce the linear function
$g:=b+(a-\mathrm{X})$
and to apply Taylor's formula to the composite $f\circ g$. Noting that ${(f\circ g)}^{(i)}(x)={f}^{(i)}(g(x))\cdot {(-1)}^{i}$, especially $f\circ g(b)=f(a)$ and ${(f\circ g)}^{(i)}(a)={f}^{(i)}(b)\cdot {(-1)}^{i}$, we get
$\begin{array}{ll}f(a)\hfill & =\sum _{i=0}^{n}\frac{{f}^{(i)}(b)\cdot {(-1)}^{i}}{i!}{(b-a)}^{i}+\frac{{f}^{(n+1)}(g(\tilde{x}))\cdot {(-1)}^{n+1}}{(n+1)!}{(b-a)}^{n+1}\hfill \\ \hfill & =\sum _{i=0}^{n}\frac{{f}^{(i)}(b)}{i!}{(a-b)}^{i}+\frac{{f}^{(n+1)}(g(\tilde{x}))}{(n+1)!}{(a-b)}^{n+1}\hfill \end{array}$
Thus Taylor's formula holds for $f(a)$ as well when we take $g(\tilde{x})$ as the "new" $\tilde{x}$. So we may restate Taylor's theorem the following way:
For any function $f\in {\mathcal{D}}^{n+1}(I)$ and any two different points $a,b\in I$ there is an $\tilde{x}$ in between a and b that satisfies [7.9.16].

If $f\in {\mathcal{D}}^{n+1}(I)$ and $a\in I$ we call the polynomial
${T}_{a,n}:=\sum _{i=0}^{n}\frac{{f}^{(i)}(a)}{i!}{(\mathrm{X}-a)}^{i}$
the nth Taylor polynomial and the function ${R}_{a,n}:I\to \mathbb{R}$ defined by
${R}_{a,n}(x):=\{\begin{array}{l}\frac{{f}^{(n+1)}(\tilde{x})}{(n+1)!}{(x-a)}^{n+1}\text{, if }x\ne a\hfill \\ 0\text{, if }x=a\hfill \end{array}$
the nth remainder of f with respect to a. The function ${R}_{a,n}$ is well defined: There might be several $\tilde{x}$ for a fixed x satisfying Taylor's formula, but the value ${f}^{(n+1)}(\tilde{x})$ is the unique solution c of [2] belonging to x and a. ${R}_{a,n}$ is sometimes referred to as the Lagrange form of the remainder.
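A small computational sketch (added; it assumes f = exp and a = 0, and the helper name `taylor_exp` is ours): since exp is increasing, the Lagrange form yields $|{R}_{0,n}(x)|\le \frac{\mathrm{exp}(x)}{(n+1)!}{x}^{n+1}$ for $x>0$, because ${f}^{(n+1)}(\tilde{x})=\mathrm{exp}(\tilde{x})\le \mathrm{exp}(x)$ for $\tilde{x}$ in between 0 and x:

```python
# Illustration (added): Taylor polynomial of exp at 0 and the Lagrange
# remainder bound exp(x)/(n+1)! * x^(n+1) for x > 0.
from math import exp, factorial

def taylor_exp(x, n):
    # T_{0,n}(x) = sum_{i=0}^{n} x^i / i!
    return sum(x**i / factorial(i) for i in range(n + 1))

x, n = 1.5, 8
remainder = exp(x) - taylor_exp(x, n)          # R_{0,n}(x), positive here
bound = exp(x) / factorial(n + 1) * x**(n + 1)
assert 0 <= remainder <= bound
```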

If $f\in {\mathcal{C}}^{\infty}(I)$ and $a\in I$ we call the power series $(\sum _{i=0}^{n}\frac{{f}^{(i)}(a)}{i!}{(\mathrm{X}-a)}^{i})$ the Taylor series of f with respect to a. If the Taylor series is convergent with r as its radius of convergence and if $f|{I}_{a,r}$ is its limit function, i.e.
$f(x)=\sum _{i=0}^{\infty}\frac{{f}^{(i)}(a)}{i!}{(x-a)}^{i}$ for all $x\in ]a-r,a+r[\cap I$, [4]
the equation [4] is said to be the Taylor expansion of f at a. ${\mathcal{C}}^{\infty}$-functions which allow a Taylor expansion for each $a\in I$ are thus analytical.
In chapter 8.10 we will access the Taylor formula in a different way. On this occasion we will learn how to test if a ${\mathcal{C}}^{\infty}$-function is analytical.
We return to the quest for local extreme points. With Taylor's formula we are now able to state a first sufficient existence criterion for local extreme points that in many cases successfully tells apart the candidates provided by the necessary criterion [7.9.2]. It is however confined to high quality functions on intervals only. Criteria for weaker functions will follow in the next part.
Proposition (sufficient criterion for ${\mathcal{C}}^{n+1}$-functions): For a function $f\in {\mathcal{C}}^{n+1}(I)$ and an interior point a of I with
${f}^{\prime}(a)=\dots ={f}^{(n)}(a)=0\text{\hspace{1em}}\wedge \text{\hspace{1em}}{f}^{(n+1)}(a)\ne 0$

[7.9.17] 
it depends on the parity of n + 1 whether or not f has an extremum at a:

If n + 1 is odd, f has no local extremum at a.

If n + 1 is even, f has a strict local $\{\begin{array}{l}\text{maximum at }a\text{, if }{f}^{(n+1)}(a)<0\hfill \\ \text{minimum at }a\text{, if }{f}^{(n+1)}(a)>0\hfill \end{array}$
Proof: Let's say ${f}^{(n+1)}(a)>0$. As ${f}^{(n+1)}$ is continuous, there is an $\epsilon >0$ such that ${f}^{(n+1)}(x)>0$ for all $x\in {I}_{a,\epsilon}=I\cap ]a-\epsilon ,a+\epsilon [$. [7.9.16] now provides an $\tilde{x}$ in between x and a for all those x different from a such that
$f(x)=\sum _{i=0}^{n}\frac{{f}^{(i)}(a)}{i!}{(x-a)}^{i}+\frac{{f}^{(n+1)}(\tilde{x})}{(n+1)!}{(x-a)}^{n+1}=f(a)+\frac{{f}^{(n+1)}(\tilde{x})}{(n+1)!}{(x-a)}^{n+1}$.
As ${f}^{(n+1)}(\tilde{x})>0$ for all $x\in {I}_{a,\epsilon}\backslash \{a\}$, we may argue now as follows:
1. ► If n + 1 is odd the term ${(x-a)}^{n+1}$, and thence $f(x)-f(a)=\frac{{f}^{(n+1)}(\tilde{x})}{(n+1)!}{(x-a)}^{n+1}$ as well, is less than zero for all x left of a, and greater than zero for all x right of a. As a is an interior point of I both types of x actually occur in ${I}_{a,\epsilon}$. Thus f fails to have a local extremum at a.
2. ► Now, if n + 1 is even the term ${(x-a)}^{n+1}$ is greater than zero for all $x\ne a$ and so we have: $f(x)-f(a)>0$ for all $x\in {I}_{a,\epsilon},\text{\hspace{0.28em}}x\ne a$, which proves f to have a strict local minimum at a.
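Applied to the introductory example $f=\frac{1}{4}{\mathrm{X}}^{4}-\frac{5}{6}{\mathrm{X}}^{3}-\frac{1}{2}{\mathrm{X}}^{2}+\frac{5}{2}\mathrm{X}$ (an added illustration of the case n + 1 = 2), the sign of ${f}^{\prime \prime}$ sorts the three candidates delivered by the necessary criterion:

```python
# Illustration (added): second-derivative test for the introductory example.
def d2f(x):
    # f''(x) = 3x^2 - 5x - 1
    return 3*x**2 - 5*x - 1

# f'(x) = (x - 2.5)(x^2 - 1) vanishes at -1, 1 and 2.5.
assert d2f(-1.0) > 0   # f''(-1) = 7    -> strict local minimum at -1
assert d2f(1.0) < 0    # f''(1) = -3    -> strict local maximum at 1
assert d2f(2.5) > 0    # f''(2.5) = 5.25 -> strict local minimum at 2.5
```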

Consider:

If f is a ${\mathcal{C}}^{2}$-function [7.9.17] turns into the "classical" criterion
${f}^{\prime}(a)=0\text{\hspace{1em}}\wedge \text{\hspace{1em}}{f}^{\prime \prime}(a)\ne 0\text{\hspace{1em}}\Rightarrow \text{\hspace{1em}}$f has a local extremum at a.

The proof of [7.9.17] shows that 2. is also valid for boundary points of I. A similar result for 1. however does not hold, as is demonstrated by the restriction $\mathrm{X}|{\mathbb{R}}^{\ge 0}$.

The reverse of 2. is not true, the criterion is thus not a necessary one. The function $f:\mathbb{R}\to \mathbb{R}$ defined by
$f(x):=\{\begin{array}{l}{e}^{-\frac{1}{x}}\text{, if }x>0\hfill \\ 0\text{, if }x\le 0\hfill \end{array}$
is our counterexample in this case. In 9.12 we will prove that f is a ${\mathcal{C}}^{\infty}$-function with all its derivatives vanishing at 0, which is a local minimum point for f.
