I've never really understood the fundamental theorem of calculus. I mean, I passed Calc I, but I just memorized the proof and then forgot it. Since then I've felt alternately that it's either too obvious or too "analysis-y" to prove.
That's why I was happy to discover some intuition for it while designing an activity for my 12th grade physics students. Now I have a proof, or close enough to it, that makes sense to me and fills in the gaps I've always had.
The idea
The basic idea is using successive linear approximation to predict future values of a function. In terms of physics, if you know the position of an object and its velocity you can approximate its position in $0.1$ seconds via $x(0.1) = x(0)+v(0) \times 0.1$. This is only an approximation for $x(0.1)$, but if $0.1$ is small compared to the rate at which the velocity is changing then it can be a pretty good one.
If we want to predict the position at more distant times we can simply iterate the procedure. Take our approximation for $x(0.1)$ and we can predict $x(0.2) = x(0.1)+v(0.1)\times 0.1$. We can take this as far as we want.
And if the approximation isn't good, we can take intermediate steps. Go in steps of $0.01$ seconds instead of $0.1$ so that
$$x(0.1) = x(0)+0.01\times\left(v(0)+v(0.01)+\dots +v(0.09)\right)$$.
You can see how this procedure works to approximate a curve in the image below. You start at $(a,f(a))$, and then use the slope of $f$ at $a$ to move to an approximation for $f(a+\Delta x)$. You then use the slope there to move on to the next step, and so on.
Figure: An illustration of the step-forward technique for approximating $f(b)$ from $f(a)$ with two steps, five steps, and twenty steps. The approximation gets better as the number of steps increases.
The sum on the right-hand side of the equation for $x(0.1)$ is a Riemann sum with $\Delta t = 0.01$, so in the limit that the step size goes to zero, stepping from time $a$ to time $b$ gives $x(b) = x(a) + \int_a^b v(t)\, dt$. Identifying $v(t)$ with $dx/dt$ shows that this is the fundamental theorem of calculus.
The thing to prove is that this approximation procedure does actually work when we let the step-size go to zero. That part is non-obvious and is the real content of the fundamental theorem of calculus. That it should work has always felt obvious to me in physics and so I've often felt confused as to what is actually there to prove.
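The step-forward procedure described in this section can be sketched in a few lines of Python. This is only an illustration under assumptions of my own choosing: the function name `step_forward` is made up, and the example motion $x(t) = \sin t$ with $v(t) = \cos t$ is not from the activity itself.

```python
import math

def step_forward(f_a, slope, a, b, n_steps):
    """Approximate f(b) from f(a) using n_steps successive linear steps."""
    dx = (b - a) / n_steps
    estimate = f_a
    for i in range(n_steps):
        # Move forward using the slope at the left end of the current step.
        estimate += slope(a + i * dx) * dx
    return estimate

# Assumed example: x(t) = sin(t), so v(t) = cos(t). Predict x(1) from x(0) = 0.
for n in (2, 5, 20, 1000):
    approx = step_forward(0.0, math.cos, 0.0, 1.0, n)
    print(n, approx, abs(approx - math.sin(1.0)))
```

The printed error shrinks as the number of steps grows, which is the behavior the figure shows. Whether it shrinks all the way to zero is the part that still needs proving.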
The lemma
The proof I came up with relies on the fact that the derivative of $f$ at a point is the slope of the best affine approximation for $f$ around that point. The process I described uses successive affine approximations of $f$ at intermediate points. This process only converges if we use the best such approximation at each point.
To be formal, by "best affine approximation" I mean the $A,B$ such that in neighborhoods around $x_0$, $f(x) = A+B(x-x_0)+R(x-x_0)$ where $R(x-x_0)\in o(x-x_0)$. The little $o$ notation means that $R(x-x_0)/(x-x_0) \rightarrow 0$ as $x\rightarrow x_0$. (There is a slightly more formal definition, but this is good enough.)
This Math StackExchange answer proves that the best affine approximation has $A=f(x_0)$ and $B = f'(x_0)$. Thanks for doing my work for me. It also proves that $f$ has a best affine approximation at every point in an interval as long as it is differentiable on that interval.
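As a quick sanity check of the definition, here is a worked example with a function I picked purely for illustration, $f(x) = x^2$. Expanding around $x_0$ gives
$$x^2 = x_0^2 + 2x_0(x-x_0) + (x-x_0)^2$$
so $A = x_0^2 = f(x_0)$, $B = 2x_0 = f'(x_0)$, and the remainder is $R(x-x_0) = (x-x_0)^2$. Then $R(x-x_0)/(x-x_0) = x-x_0 \rightarrow 0$ as $x\rightarrow x_0$, so the remainder is in $o(x-x_0)$ as required. Any other choice of slope $B$ would leave a term proportional to $(x-x_0)$ in the remainder, which would not vanish after dividing by $(x-x_0)$.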
The proof
Consider a function $f$ that is differentiable on the interval $[a,b]$. Define $\Delta x_N = (b-a)/N$ and
$$f_N(b) = f(a)+\Delta x_N\times \sum_{i=0}^{N-1} f'(a+i\times\Delta x_N)$$.
The sum on the right is a Riemann sum, so, provided $f'$ is integrable on $[a,b]$, $\lim_{N\rightarrow \infty} f_N(b) = f(a) + \int_a^b f'(x) dx$.
Since $f'$ gives the best affine approximation at every point, $|f(b) - f(a+(N-1)\Delta x_N)-f'(a+(N-1)\Delta x_N)\Delta x_N| \in o( \Delta x_N)$. Likewise for $|f(a+(N-1)\Delta x_N)-f(a+(N-2)\Delta x_N) - f'(a+(N-2)\Delta x_N)\Delta x_N|$ and so on. We successively approximate $f$ at each intermediate step by the affine approximation for $f$ at the previous step. Each time we do so we pick up an error term that is in $o(\Delta x_N)$. Hence,
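$$|f(b)-f_N(b)| \le \sum_{i=0}^{N-1} |R_i|, \qquad R_i \in o(\Delta x_N)$$,
where $R_i$ is the error picked up at step $i$. The only catch is that there are $N$ of these error terms, so each one has to shrink faster than $1/N$ for the total to vanish. If each is bounded by $m\Delta x_N^2$ for some constant $m$ (which holds, for example, when $f''$ exists and is bounded on $[a,b]$), then $|f(b)-f_N(b)| \le m N\Delta x_N^2$.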
Since $N\Delta x_N^2 = (b-a)^2/N$, in the limit that $N\rightarrow \infty$ we have $\lim_{N\rightarrow \infty} |f(b)-f_N(b)|= 0$, and hence $f_\infty(b) = f(b) = f(a) +\int_a^b f'(x) dx$. QED.
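The $1/N$ scaling of the error can be checked numerically. The sketch below uses an example of my own choosing, $f(x) = e^x$ on $[0,2]$, and a made-up helper name `f_N` that mirrors the definition above. For this $f$, the second derivative is bounded by $e^2$ on the interval, so I assume $m = e^2/2$ (the Taylor remainder bound) when computing the predicted ceiling.

```python
import math

def f_N(f, f_prime, a, b, n):
    """f(a) plus the left Riemann sum of f' over [a, b] with n steps."""
    dx = (b - a) / n
    return f(a) + dx * sum(f_prime(a + i * dx) for i in range(n))

a, b = 0.0, 2.0
errors = {}
for n in (10, 100, 1000, 10000):
    # exp is its own derivative, so it is passed as both f and f'.
    errors[n] = abs(math.exp(b) - f_N(math.exp, math.exp, a, b, n))
    bound = 0.5 * math.exp(b) * (b - a) ** 2 / n  # m * N * dx^2 with m = e^2 / 2
    print(n, errors[n], n * errors[n], bound)
```

If the argument is right, the column `n * errors[n]` should settle near a constant while the error itself stays below the bound.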
Alternative proof
I originally came up with a slightly different proof using Taylor series. This is less mathematically elegant and limited to analytic functions, but it's slightly more intuitive for me.
Consider that
$$\begin{aligned}
f(b) &= f(b-\Delta x_N) + \sum_{m=1}^\infty \frac{\Delta x_N^m}{m!}f^{(m)}(b-\Delta x_N) \\
&= f(b-2\Delta x_N) + \sum_{m=1}^\infty \frac{\Delta x_N^m}{m!}\left(f^{(m)}(b-\Delta x_N)+f^{(m)}(b-2\Delta x_N)\right)\\
&= f(a) + \sum_{m=1}^\infty \frac{\Delta x_N^m}{m!}\sum_{i=0}^{N-1} f^{(m)}(a+i\Delta x_N)
\end{aligned}$$
In the limit that $N\rightarrow \infty$ the $m=1$ term in the sum becomes the integral from $a$ to $b$ of $f'$. The other terms in the sum vanish as long as all the other derivatives of $f$ are bounded on $[a,b]$. When they are bounded, $|f^{(m)}(a+i\Delta x_N)| < M$ for all $i$, and so $|\sum_{i=0}^{N-1} f^{(m)}(a+i\Delta x_N)| < M\times N$ for all $m > 1$. But then $\Delta x_N^m \times M \times N = M(b-a)^m/N^{m-1}$, which vanishes as $N\rightarrow \infty$, and hence all the sums vanish for $m \ge 2$.
This proof is slightly different because when we use the full Taylor series the calculation of $f(b)$ is exact at every $N$, but for every finite $N$ there are non-zero higher-derivative corrections. In the limit that $N\rightarrow \infty$ the calculation stays exact for $f(b)$ but the higher-derivative corrections vanish and only the integral term is necessary.
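A worked example makes the "exact at every $N$" claim concrete. Take $f(x) = e^x$ on $[0,1]$ (my choice of example, picked because every derivative is again $e^x$), so that $N\Delta x_N = 1$. The inner sum is the same geometric series for every $m$:
$$\sum_{i=0}^{N-1} e^{i\Delta x_N} = \frac{e-1}{e^{\Delta x_N}-1}$$
and the outer sum is $\sum_{m=1}^\infty \Delta x_N^m/m! = e^{\Delta x_N}-1$. Multiplying the two gives
$$f(0) + \left(e^{\Delta x_N}-1\right)\frac{e-1}{e^{\Delta x_N}-1} = 1 + (e-1) = e = f(1)$$
for every $N$, with no approximation. The $m=1$ term on its own is $\Delta x_N (e-1)/(e^{\Delta x_N}-1)$, which tends to $e-1 = \int_0^1 e^x dx$ as $N\rightarrow\infty$, while each term with $m\ge 2$ carries an extra factor of $\Delta x_N^{m-1}/m!$ and so vanishes in the same limit.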
