%% LyX 1.4.1 created this file.  For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[a4paper,english]{amsbook}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\setcounter{tocdepth}{2}
\setlength\parskip{\medskipamount}
\setlength\parindent{0pt}
\usepackage{graphicx}
\usepackage{amssymb}
\IfFileExists{url.sty}{\usepackage{url}}
                      {\newcommand{\url}{\texttt}}

\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
 \theoremstyle{plain}
\newtheorem{thm}{Theorem}[section]
  \theoremstyle{definition}
  \newtheorem{defn}[thm]{Definition}
  \theoremstyle{remark}
  \newtheorem{rem}[thm]{Remark}
 \theoremstyle{definition}
  \newtheorem{example}[thm]{Example}
  \theoremstyle{remark}
  \newtheorem{notation}[thm]{Notation}
  \theoremstyle{plain}
  \newtheorem{lem}[thm]{Lemma}
  \theoremstyle{plain}
  \newtheorem{cor}[thm]{Corollary}
  \theoremstyle{plain}
  \newtheorem{algorithm}[thm]{Algorithm}
\newenvironment{lyxcode}
{\begin{list}{}{
\setlength{\rightmargin}{\leftmargin}
\setlength{\listparindent}{0pt}% needed for AMS classes
\raggedright
\setlength{\itemsep}{0pt}
\setlength{\parsep}{0pt}
\normalfont\ttfamily}%
 \item[]}
{\end{list}}
  \theoremstyle{definition}
  \newtheorem{condition}[thm]{Condition}
  \theoremstyle{remark}
  \newtheorem{claim}[thm]{Claim}
  \theoremstyle{plain}
  \newtheorem{conjecture}[thm]{Conjecture}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
\hyphenation{satis-fying}
\sloppypar
\usepackage{fancyhdr}
\chead[]{blabla}
\pagestyle{fancy}

\usepackage{babel}
\makeatother
\begin{document}
\renewcommand\maketitle{
\parindent=0pt
\thispagestyle{empty}
\begin{center}
\renewcommand{\arraystretch}{1.8}
{
\Huge \sc 
Proof interpretations and applications \\
}




\vspace{0.1cm}
\vfill
\fbox{\rule[-4mm]{0cm}{1cm}
\newline{}
\begin{tabular}{lp{10.0cm}}
Document type & \Large Diploma thesis \\
Author & \Large Holger Blasum \\
Supervisor & \Large Prof. Dr. Helmut Schwichtenberg \\
Filing date & \Large 02 October 2006 \\
Department & \Large Mathematisches Institut \\
University & \Large Ludwig-Maximilians-Universit\"{a}t M\"{u}nchen \\
Summary & \Large Two different proof interpretations of intuitionistic logics (mr, Dialectica) are given and different systems are introduced to represent real numbers and functions. These notions are explored by discussing the portability from interval analysis to modulus analysis of the components of a heuristic algorithm (Moore's $k$-th order method) for approximating ordinary differential equations. It is confirmed that one-step methods for initial value problems are mr-realizable and concluded that the representability of Lipschitz continuity is sufficient for a change of representation that is lossless in the exponential term. \\
Electronic version & \url{http://www.blasum.net/holger/wri/math/an/constructive/}
\end{tabular}}


\end{center}

\newpage}


\title{Proof interpretations and applications}


\author{Holger Blasum}


\keywords{proof interpretations, mr-realizabilility, interval analysis, modulus
analysis, ordinary differential equations}

\maketitle
\tableofcontents{}

\fancyhead{}
\fancyfoot{}
\fancyhead[EC]{\footnotesize \rightmark}
\fancyhead[OC]{\footnotesize \rightmark}
\fancyhead[EL]{\footnotesize \thepage}
\fancyhead[OR]{\footnotesize \thepage}
\renewcommand{\headrulewidth}{0pt}
%\chead{\sc Hah \chaptername\ \thechapter.\ }


Introductory note: section \ref{sec:Notions-of-constructivity} is
a recapitulation about goals and methods of constructive mathematics,
section \ref{sec:Proof-interpretations} explains proof interpretations.
Then, within the modified realizability proof interpretation, we juxtapose
the constructions of interval analysis and modulus analysis in section
\ref{sec:Interval-and-modulus}. This will be the basis to discuss
the portability of a numerical interval analysis algorithm to modulus
analysis in section \ref{sec:Some-MR-realizable-numerical} (still
wholly contained in modified realizability). It is only in section
\ref{sec:Some-Dialectica-realizable-numerical} that we will break
out of the cage of modified realizability. The supplementary material
(section \ref{sec:Supplementary-Material}) serves for additional
illustration (but has been moved out the main body of the text).

The aim of the work is to view some applied mathematics from a logical
perspective, if one comes from the logical side, sections \ref{sec:Notions-of-constructivity},
\ref{sec:Proof-interpretations}, \ref{sec:Some-Dialectica-realizable-numerical},
\ref{sec:Interval-and-modulus}, \ref{sec:Some-MR-realizable-numerical}
might be a good reading plan, if one comes from the applied side,
it also should be possible to start out right from \ref{sec:Interval-and-modulus},
\ref{sec:Some-MR-realizable-numerical}.

It is impossible to hide that the layout of this work has been attempted
rather broad, as a result some sections are sketchy and lack the desired
rigor, this is not meant as an offense to the reader (with hindsight
a tighter exposition on a more limited scope would have been better).
I have tried to work at least some simple things out, in other cases
I have tried at least to give references to where others have done
it out much more neatly.


\section{Notions of constructivity\label{sec:Notions-of-constructivity}}

\begin{quote}
At all events, improving the internal logic of the Rules of Procedure
is no easy task, given that, as pointed out by the Legal Service in
a note drawn up in December 2000, there is no single binding logic
and several different categories of logics may give rise to widely
differing results. \cite{Committee2002}
\end{quote}
There are not only different categories of logics, but also many notions
of mathematical logics and constructivity, for a detailed overview
see e.g. \cite{Troelstra1988constructivism}. In this introductory
section we mainly clarify some notions for further use. The emphasis
on witnesses and predicativity are common to most schools of intuitionism,
however different sets of the admissible axioms are allowed.


\subsection{Witnesses for disjunction\label{sub:Witnesses-for-disjunction}}

The set $a=\{\frac{e}{\pi},\frac{\pi^{2}}{e}\}$ contains an irrational
number. Proof: $\frac{e}{\pi}\cdot\frac{\pi^{2}}{e}=\pi$. The product
of two rationals is a rational, so by irrationality of $\pi$ at least
one element of $a$ exists that is irrational. 

Unfortunately, that excluded middle kind of proof does not allow to
specify an element of $a$ that is irrational, a \emph{witness} for
the statement. A proof is constructive if for every disjunction and
existential statement a witness is provided. Otherwise, it is to be
considered non-constructive. (In a more satisfying way, for the specific
example, with much more effort it can be shown that both elements
of $a$ must be irrational \cite{Nesterenko1996modular}.) 


\subsection{Impredicativity and types\label{sub:Impredicativity-and-types}}

An impredicative definition of an object depends on a set the object
itself is member of. Types are an instrument to harness impredicativity
(e.g. to avoid the famous Russell paradox in set theory). 


\subsubsection{General house-holding}

By a type system I understand a \emph{stamping of objects according
to which operations are allowed on them.} Given that conventionally
one considers a finite set of operators on an infinite set of operands,
for each operation it seems better to encode such admissibility information
into the operands than into the operators (the other alternative).
We give some instances where type systems have been used in mathematics/computer
science.

\begin{itemize}
\item Russell's and Whitehead's Principia use types to avoid $\{ x|x\notin x\}$(Russell's
paradox).
\item Church introduced typed lambda calculus in order to avoid non-terminating
expressions like $(\lambda x.xx)(\lambda x.xx)$.
\item Imperative programming languages such as FORTRAN use types for the
different storage requirements of variables (eg \emph{complex} as
opposed to \emph{float}).
\item Theorem proof verifiers such as Automath have used types for house-holding
(eg to check types in lambda-calculus). Especially, normalized proof
terms will yield terms with the type of the formula to be proved.
This representation of proofs for formulas has been used in many theorem
provers, the earliest probably being Automath \cite{deBrujin1968themathematical,Wiedijk2002}.
While terms are hand-coded in Automath (which simply provides the
proof-checking machinery), more recent provers such as Coq and minlog
offer an interface that allows the user to code a proof tree by choosing
{}``tactics'' which can be either instantiations of single rules
of eg natural deduction or longer sequences of such single instantiations.
The idea is to keep the type-checking kernel of the prover small and
independent from the user interface. When it comes to program extraction,
minlog offers the additional functionality of delivering soundness
proofs for the extracted terms.
\end{itemize}

\subsubsection{Types and computability / effectivity}

Schwichtenberg \cite[p. 3]{Schwichtenberg2005constructive} suggests
obtaining witnesses of low type level, e.g. that a continuous function
can be determined by its values of rationals giving a type-one object
(rather than a type-two real-to-real function), that the Cauchy-Euler
approximation can be seen as a type-one process (as opposed to type-four
as in \cite[p. 9]{Longley}). In the same way, we want to define a
derivative not as a type-two object, but as a type-one object.


\subsection{Systems and axioms for constructive mathematics}

Generally, among vario us intuitionistic approaches, there is awareness
that excluded middle (section \ref{sub:Witnesses-for-disjunction})
and impredicativity (section \ref{sub:Impredicativity-and-types})
should be avoided or be limited to some extent. However, for different
means this is done to different extents. In this section, we will
superficially present several systems and approaches, exact definitions
will follow in the next chapter.


\subsubsection{Intuitionistic logic and arithmetic}


\paragraph{Logic}

Intuitionistic propositional logic is a classical propositional logic
without excluded middle, in natural deduction systems the difference
is just the omission of $\neg\neg A\rightarrow A$. There are no differences
concerning to the predicate rules. Care will however to be taken to
choose the right degree of predicativity in equational definitions.
Following \cite{Kohlenbach2006proof,Troelsta1973metamathematical},
we will denote the extensionality properties by prefix, e.g. WE-IL
for weakly extensionable intuitionistic logic.


\paragraph{Arithmetic}

Classical Peano arithmetic (PA) is obtained by adding the Peano axioms
to a system of classical logic, likewise (intuitionistic) Heyting
arithmetic (HA) is obtained by adding these.


\subsubsection{Constructivity fine-tuning by slightly non-constructive axioms}

We can add different degrees of non-constructivity to arithmetical
systems, such as Markov's principle (M), independence of premise (IP),
axiom of choice (AC).


\subsection{Practical constructive analysis}

Taking a standard textbook like \cite[p. 9]{Rudin1964principles},
impredicativity is built in straight into the definition of real numbers,
once it is proved that every set partition of the entire set of reals
either has an infimum in the upper part or a supremum in the lower
part (SUP). (In Rudin's book this is proved, because he follows Dedekind's
introduction of reals as cuts, alternatively a purely axiomatic treatment
just will add this to the list of axioms.)

Other theorems that are usually proved non-constructively:

\begin{itemize}
\item fundamental theorem of algebra (FTA);
\item intermediate value theorem (IMV);
\item Peano existence theorem for ODEs;
\item Hahn-Banach theorem
\end{itemize}

\subsubsection{Extraction of algorithms and error bounds\label{extr-error-bounds}}

Often non-constructive notions in classical mathematics can be replaced
by constructive versions (FTA) or more restrictions to less general
cases (IMV). Because e.g. IMV and SUP are fundamental for analysis,
making analysis constructive has made necessary a reformulation of
many concepts of analysis, which is however viable to a large extent
\cite{Bishop}. In addition to the foundational interest, finding
witnesses can have very practical motivations, e.g. in case of the
FTA a witness-based proof gives a numerical approximation to roots
of complex polynomials.


\subsubsection{Extraction of complexity bounds}

Instead of rewriting the proofs to make them constructive, alternatively
take the classical proofs as they are, and still extract some computational
content (eg error bounds instead of an algorithm) out of it. This
aspect ({}``proof mining'') of constructive mathematics has been
followed by e.g. Kreisel and Kohlenbach \cite{Kohlenbach2006proof}.


\subsection{Further outline of the work}

We will now concretize what has been very loosely said in this section
and introduce both the modified realizability and Dialectica interpretation
(section \ref{sec:Proof-interpretations}). In section \ref{sec:Some-MR-realizable-numerical}
we will discuss the modified realizability proof interpretation in
the light of a numerical application from interval analysis.


\section{Proof interpretations\label{sec:Proof-interpretations}}

\begin{quote}
\textit{\emph{\char`\"{}Realisieren\char`\"{} war ein beliebter Ausdruck
bei den Spielern, und als Weg vom Werden zum Sein, vom Möglichen zum
Wirklichen empfanden sie ihr Tun}}\emph{.} ({}``Realizing'' was
a popular expression among the players, and they considered their
effort to be a path from becoming to being, from the possible to the
actual. \emph{\cite[p. 35]{Hesse2001glasperlenspiel}})
\end{quote}
\begin{defn}
Let $T_{0}$, $T_{1}$ be logical systems, let $C_{0}$ be a formula
in $T_{0}$, $C_{1}$ be a formula in $T_{1}$. A \emph{realizability
interpretation} is defined as two mappings \cite{Oliva2005unifying}:
\end{defn}
\begin{itemize}
\item a \emph{formula interpretation} that maps formulas of $T_{0}$ to
terms of $T_{1}$, written as $C_{1}\ r\ C_{0}$ ($r$ to be read
as {}``realizes''),
\item and a \emph{soundness proof} that constructs a $T_{1}$-term $[\![M_{0}]\!]$
from a derivation $M_{0}$ of $C_{0}$ and shows that $[\![M_{0}]\!]\ r\ C_{0}$.
\end{itemize}
\begin{rem}
$T_{0}$, $T_{1}$ may be the same system (if it is powerful enough)
or different systems (language, axioms, rules of $T_{0}$, $T_{1}$
each may, but not need to be, subsets of each other). The soundness
proof may be executed in $T_{1}$ if $T_{1}$ is powerful enough but
also may be done in some metasystem.

Formulas: In the special case of mr-realizability for IL, $T_{1}$
must contain expressions for higher-order terms ($T_{0}$ does not
need to be higher-order), $T_{1}$ does not need to have existential
quantification. In the special case of Dialectica, again, $T_{0}$
does not need to be higher-order, and $T_{1}$ must contain expressions
for higher-order terms and its formulas are in the form $\exists x\forall yA_{qf}$,
where $A_{qf}$ is quantifier-free.
\end{rem}

\subsection{Systems}

\begin{rem}
We will define types, terms, logic, arithmetic, and some axioms.
\end{rem}

\subsubsection{Types}

\begin{defn}
The set of types $\mathcal{T}$ is defined inductively: let $U,T,B$
denote constants of ground type. Let $\kappa,\rho,\sigma,\tau,\kappa_{0},\rho_{0},\sigma_{0},\tau_{0},\kappa_{1},\rho_{1},\sigma_{1},\tau_{1},\ldots$
denote metavariables for any type in $\mathcal{T}$. If $\sigma,\tau\in\mathcal{T}$,
then the arrow type $\sigma\rightarrow\tau\in\mathcal{T}$ and the
pair type $\sigma\times\tau\in\mathcal{T}$. Types are indicated by
superscripts: {}``$t^{\tau}$'' reads {}``object $t$ is of type
$\tau$''. Conversely, let $\tau(t^{\tau}):=\tau$ where $\tau()$
is a mapping from a term to its type (not to be confused with the
$\tau$ without brackets which is just one of the types).
\end{defn}

\subsubsection{Lambda terms }

\begin{defn}
The language $\Lambda$ of \textit{lambda terms} is defined inductively:
let the set $X$ consist of countably infinitely many object variables
$a^{\alpha},p^{\pi},x^{\rho},a_{0}^{\alpha_{0}},p_{0}^{\pi_{0}},x_{0}^{\rho_{0}},a_{1}^{\alpha_{1}},p_{1}^{\pi_{1}},x_{1}^{\rho_{1}},\ldots$
where the superscripted $\alpha,\pi,\rho,\alpha_{0},\pi_{0},\rho_{0},\alpha_{1},\pi_{1},\rho_{1},\ldots$
denote types (the types do not need to be mutually different). Let
the set $K$ consist of countably many object constants $c^{\kappa},c_{0}^{\kappa_{0}},c_{1}^{\kappa_{1}},\ldots$
where the superscripted $\kappa,\kappa_{0},\kappa_{1},\ldots$ denote
types (again, the types do not need to be mutually different). Each
element of $X$ is in $\Lambda$. Let $s,t,u,s_{0},t_{0},s_{1},t_{1},\ldots$
denote terms. Terms may be annotated by types. If $x^{\rho}\in X$
and $t^{\tau}\in\Lambda$, then $(\lambda x^{\rho}.t^{\tau})^{\rho\rightarrow\tau}\in\Lambda$,
where $x$ is called a bound variable. Terms that differ only in the
names of the bound variables are identified. If $s^{\tau\rightarrow\sigma}\in\Lambda$
and $t^{\tau}\in\Lambda$, then $(st)^{\sigma}\in\Lambda.$ If $s^{\sigma}\in\Lambda$
and $t^{\tau}\in\Lambda$, then $\langle s,t\rangle^{\sigma\times\tau}\in\Lambda$.
If $s^{\sigma\times\tau}\in\Lambda$, then $(s0)^{\sigma}\in\Lambda$
and $(s1)^{\tau}\in\Lambda$. The function $type:\ \Lambda\rightarrow\mathcal{T}$
extracts the type of a lambda term. The \emph{substitution} of $x$
by $s$ on a lambda term $t$ (in short: $t[x:=s]$) is defined inductively:
base cases: $x[x:=s]:=s$, $x_{0}[x:=s]:=x_{0}$ if $x\ne x_{0}$,
step cases: $(t_{0}t_{1})[x:=s]:=t_{0}[x:=s]t_{1}[x:=s]$, $(\lambda x_{0}.t)[x:=s]:=\lambda x_{0}.(t[x:=s])$
provided (1) $x_{0}\ne x$ and (2) $x_{0}\notin FV(s)$, $(\lambda x.t)[x:=s]:=\lambda x.t$.
The provisions (1) and (2) for the bound variable $x_{0}$ can always
be satisfied by appropriate renaming (eg by de Bruijn's naming convention
\cite{deBrujin1968themathematical}). Define $\beta$-reduction to
be the axiom $(\lambda x.s)t=s[x:=t].$
\end{defn}

\subsubsection{Intuitionistic logic IL }

\begin{defn}
The symbols of IL are the object variables $x,x_{0},x_{1},\ldots$,
the propositional variables $A,B,A_{0},B_{0},A_{1},B_{1},\ldots$,
the propositional constants $\bot$, $\top$ and the logical connectives
$\wedge,\rightarrow,\forall,\exists$, and the equality symbol $=$.
Terms of IL are variables and applications of a variable to another
variable. Formulas of IL are $(x_{0}=x_{1}),\top,\bot$ and if $A$,
$B$ are formulas and $x$ is an object variable then $(A\wedge B)$,
$(A\rightarrow B)$, $\forall xA$, $\exists xA$ are formulas. Axioms
of IL: $\top$ is axiom. For existence, add $\exists^{+}:\forall x^{\rho}(A\rightarrow\exists x^{\rho})$,
$\exists^{-}:\exists x^{\rho}A\rightarrow\forall x^{\rho}(A\rightarrow B)\rightarrow B$.
For equality, add $=_{\rho}^{refl}:\forall x_{\rho}:x=_{\rho}x$,
$=_{\rho}^{symm}:\forall x_{\rho},y_{\rho}:x_{\rho}=_{\rho}y_{\rho}\rightarrow y_{\rho}=_{\rho}x_{\rho}$,
$=_{\rho}^{trans}:\forall x_{\rho},y_{\rho},z_{\rho}:x_{\rho}=_{\rho}y_{\rho}\rightarrow y_{\rho}=_{\rho}z_{\rho}\rightarrow x_{\rho}=_{\rho}z_{\rho}$.
\end{defn}

\paragraph{Derivations in $IL$}

\begin{rem}
In natural deduction \cite{Prawitz1965natural}, derivations are usually
represented as trees built from (1) top-formulas (no rule is applied
above it, a top-formula can be an assumption or an axiom), and application
of instance inference rules (2). Hence the (representation-independent)
definition of a derivation is inductive with (1) as base cases, (2)
as step cases.
\end{rem}
\begin{defn}
Inductively, let us define a \emph{derivation}: Let $FA(M)$ denote
the set of the free assumption variables. Let $u_{A}$ be the assumption
variable representing an assumption having the formula $A$, let $c_{A}$
be the term representing an axiom having the formula $A$.

Base definitions (axioms, assumptions): 

If we have an assumption variable $A$, then $M\colon A:=u_{A}$,
$FA(M\colon A):=\{ u_{A}\}$. 

If we have an axiom $A$, then $M\colon A:=c_{A}$, $FA(M\colon A):=\emptyset$.

Step definitions (rules):

(The step cases are the rules of natural deduction.)

$\wedge I_{IL}:=\frac{A\quad B}{A\wedge B}$. Let $M_{0}$ be a derivation
of $A$, let $M_{1}$ be a derivation of $B$. Then $M\colon A\wedge B:=\langle M_{0}\colon A,M_{1}\colon B)\rangle$,
$FA(M\colon A\wedge B):=FA(M_{0}\colon A)\cup FA(M_{1}\colon B)$.

$\wedge E_{IL}^{0}:=\frac{A\wedge B}{A}$. Let $M_{0}$ be a derivation
of $A\wedge B$. Then $M\colon A:=(M_{0}\colon A\wedge B)0$, $FA(M\colon A):=FA(M_{0}\colon A\wedge B))$.

$\wedge E_{IL}^{1}:=\frac{A\wedge B}{B}$. Let $M_{0}$ be a derivation
of $A\wedge B$. Then $M(B):=(M_{0}\colon A\wedge B)1$, $FA(M\colon B):=FA(M_{0}\colon A\wedge B))$.

$\rightarrow\! I_{IL}:=\frac{B}{A\rightarrow B}$ where $A$ is a
free assumptionwith assumption variable $u_{A}$ {[}that may occur
in the derivation of $B$]. Let $M_{0}$be a derivation of $B$. Then
$M\colon A\rightarrow B:=\lambda u_{A}.(M_{0}\colon B)$ where $u_{A}\in FA(M_{0}\colon B)$,
$FA(M\colon A\rightarrow B):=FA(M_{0}\colon B)-\{ u_{A}\}$.

$\rightarrow\! E_{IL}:=\frac{A\quad A\rightarrow B}{B}$. Let $M_{0}$
be a derivation of $A$, let $M_{1}$ be derivation of $A\rightarrow B$.
Then $M\colon B:=(M_{0}\colon A)(M_{1}\colon A\rightarrow B)$, $FA(M\colon B):=FA(M_{0}\colon A)\cup FA(M_{1}\colon A\rightarrow B)$. 

$\forall I_{IL}:=\frac{A}{\forall x^{\rho}A}$ satisfying $x^{\rho}\notin FA(M_{0}\colon A)$.
Let $M_{0}$ be derivation of $A$. Then $M\colon\forall x^{\rho}A:=\lambda x^{\rho}.M_{0}\colon A$,
$FA(M\colon\forall x^{\rho}A):=FA(M_{0}\colon A)$.

$\forall E_{IL}:=\frac{\forall x^{\rho}A}{A(t)}$. Let $M_{0}$ be
derivation of $\forall x^{\rho}A$. Then $M\colon A:=(M_{0}\colon\forall x^{\rho}A)t$,
$FA(M\colon A(t)):=FA(M_{0}\colon\forall x^{\rho}A).$
\end{defn}

\paragraph{Assigning programs to derivations \label{par:Assigning-programs-to}}

\begin{defn}
Let $\mathcal{U}^{U}$ be a term constant of type $U$. \label{def:u}To
each derivation $M$ of $A$ inductively assign a term from $\Lambda\cup\{ U^{U}\}$,
called \emph{program extracted from $M$}, written as $[\![M]\!]$
of $\tau(A)$.

Base cases: 

P0 For each assumption variable term $u_{A}$ assign the variable
$[\![u_{A}]\!]:=p_{A}^{\tau(A)}$. The subscript $A$ remembers the
uniqueness of the assignment from $u_{A}$ to $p_{A}^{\tau(A)}$. 

P1 For any axiom term $c_{A}$ assign $[\![c_{A}]\!]:=\mathcal{U}^{U}$. 

Step cases: 

P2 $[\![\langle M_{0},M_{1}\rangle]\!]:=\langle[\![M_{0}]\!],[\![M_{1}]\!]\rangle$.

P3 $[\![(M)0]\!]:=([\![M]\!])0$.

P4 $[\![(M)1]\!]:=([\![M]\!])1$.

P5 $[\![\lambda xM]\!]:=\lambda x.[\![M]\!]$.

P6 $[\![MN]\!]:=[\![M]\!][\![N]\!]$.
\end{defn}

\subsubsection{Heyting arithmetic}

\begin{defn}
We obtain Heyting arithmetic HA from IL by adding to the symbols of
language of IL the symbols $0,S$, for the axioms the Peano axioms
$Sx\ne0$, $Sx=Sy\rightarrow x=y$, $A0\wedge\forall x(Ax\rightarrow A(Sx))\rightarrow\forall xA(x)$,
and its (non-equality) deduction rules.

For equality, for objects $x,y,z$ of ground type, for objects of
the ground type, we add the axioms $=_{0}^{refl}:\forall x_{0}:x=_{0}x$,
$=_{0}^{symm}:\forall x_{0},y_{0}:x_{0}=_{0}y_{0}\rightarrow y_{0}=_{0}x_{0}$,
$=_{0}^{trans}:\forall x_{0},y_{0},z_{0}:x_{0}=_{0}y_{0}\rightarrow y_{0}=_{0}z_{0}\rightarrow x_{0}=_{0}z_{0}$.
Add $s^{\sigma}=_{\sigma}t^{\sigma}\rightarrow(rs)^{\tau}=_{\tau}(rt)^{\tau}$
to obtain E-HA, and add a restricted axiom to obtain WE-HA.
\end{defn}

\subsubsection{Some axioms}

\begin{defn}
We define the following axioms:
\end{defn}
\begin{itemize}
\item Markov principle (M): $M^{\omega}:$$(\forall yA_{0}\rightarrow B_{0})\rightarrow\exists y(A_{0}\rightarrow B_{0})$,
$y\notin FV(B_{0})$. 
\item Independence of premise ($IP_{\exists-free}^{\omega}$): $(A\rightarrow\exists x^{\rho}B)\rightarrow\exists x^{\rho}(A\rightarrow B(x))$
for $A$ $\exists$-free and $x\notin FV(A)$.
\item Axiom of choice (AC): $\forall x^{\rho}\exists y^{\sigma}A(x,y)\rightarrow\exists f^{\rho\rightarrow\sigma}\forall x^{\rho}A(x,f(x))$
for any $\rho,\sigma$.
\item $Ax_{\forall}$ denotes universally quantified axioms without constructive
content.
\end{itemize}
\begin{rem}
Some of these axioms are \emph{not} universally accepted by all constructive
mathematics, and where needed we will explicitly say this.
\end{rem}

\subsection{mr-Realizability\label{sub:Definition:-realizability-interpretation} }

\begin{defn}
Let $T_{0}$ be IL, We define the \emph{type of a $T_{0}$-formula}
to be for $\top$, $B$ for $\bot$, $U$ for equations, $\tau(A\wedge B):=\tau(A)\times\tau(B)$,
$\tau(A\rightarrow B):=\tau(A)\rightarrow\tau(B)$, $\tau(\forall x^{\rho}A):=\rho\rightarrow\tau(A)$,
$\tau(\exists x^{\rho}A):=\rho\times\tau(A)$.
\end{defn}

\subsubsection{The typed system $T_{1}$}

We now define $T_{1}$:


\paragraph{Terms}

\begin{defn}
$T_{1}:=\Lambda\cup\{\mathcal{U}^{U}\}\cup IL_{\exists\mbox{-}free}$
where $\mathcal{U}^{U}$ is the term constant (of \ref{def:u}).
\end{defn}

\paragraph{Formulas}

\begin{defn}
The ground formulas of $T_{1}$ are the formulas built from the atoms
$\top^{T}$,$\bot^{B}$, equations between terms of equal type ($t^{\tau}=_{\tau}s^{\tau}$). 

Composite formulas of $T_{1}$ are formulas built from the ground
formulas by (repeated) application of the propositional connectives
$\wedge,\rightarrow$ and the quantifiers $\forall$.
\end{defn}

\paragraph{Axioms, rules}

\begin{defn}
An axiom of $T_{1}$ is $\top^{U}$. Rules of $T_{1}$ are the rules
of intuitionistic propositional calculus. For equality, in the same
way as with $T_{0}$, add $=_{\rho}^{refl}:\forall x_{\rho}:x=_{\rho}x$,
$=_{\rho}^{symm}:\forall x_{\rho},y_{\rho}:x_{\rho}=_{\rho}y_{\rho}\rightarrow y_{\rho}=_{\rho}x_{\rho}$,
$=_{\rho}^{trans}:\forall x_{\rho},y_{\rho},z_{\rho}:x_{\rho}=_{\rho}y_{\rho}\rightarrow y_{\rho}=_{\rho}z_{\rho}\rightarrow x_{\rho}=_{\rho}z_{\rho}$.
\end{defn}

\subsubsection{$mr$-Realizability \label{sub:mr-Realizability}}

\begin{defn}
\emph{$mr$-realizability} is a partial recursive function: ($T_{1}$-term$\times$
$T_{0}$-formula)$\rightarrow T_{1}$-formula.

Inductive definition on the structure of $A$.

Base cases: R0: If $A$ is an equality between terms of identical
types, then $(t^{U}\ mr\ (t_{0}^{\tau}=t_{1}^{\tau})):=(t_{0}^{\tau}=t_{1}^{\tau})$.
If $A$ is $\top$ then $(t^{U}\ mr\ \top):=\top$. If $A$ is prime
formula of type $P$ then $(t^{P}\ mr\ A):=A$.

Step cases: Assume $A^{\sigma},B^{\tau}$; $s,t\in T_{1}$.

R1: $t^{\sigma\times\tau}\ mr\ A\wedge B:=(t0\ mr\ A)\wedge(t1\ mr\ B)$ 

R2: $t^{\sigma\rightarrow\tau}\ mr\ A\rightarrow B:=\forall s^{\sigma}((s\ mr\ A)\rightarrow(ts\ mr\ B))$

R3: $t^{\rho\rightarrow\tau}\ mr\ \forall xA:=\forall x^{\rho}(tx\ mr\ A)$

R4: $t^{\rho\times\tau}\ mr\ \exists xA:=t1\ mr\ A[x^{\rho}:=t0]$
\end{defn}
\begin{example}
$t^{\rho\rightarrow(\rho\times U)}\ mr\ \forall x_{0}^{\rho}\exists x_{1}^{\rho}:x_{0}=x_{1}$
(recursively) entails by R3 $\forall x_{0}^{\rho}((t^{\rho\rightarrow(\rho\times U)}x_{0})^{\rho\times U}\ mr\ \exists x_{1}:x_{0}=x_{1})\ $(recursively)
entails by R4 $(\forall x_{0}^{\rho}((tx_{0})^{\rho\times U}1)^{U}\ mr\ x_{0}=(tx_{0})0$
(recursively) entails by R0 $x_{0}=(tx_{0})0$.
\end{example}
\begin{rem}
The complete unfolding of the recursive definition of $mr$ as illustrated
by the example, shows that the 2-ary predicate $mr$, written in infix
notation, is part of the metalanguage.\label{rem:mr-in-metalang}
\end{rem}

\subsubsection{Soundness}

\begin{thm}
For every $T_{0}$-derivation $M$ of a formula $C$ with $FA(M)=\{ u_{0}^{A_{0}},u_{1}^{A_{1}},\ldots,u_{n-1}^{A_{n-1}}\}$
there is 
\end{thm}
\begin{itemize}
\item a $T_{1}$-term $t_{M}$ of type $[\![C]\!]$ with free variables
$\{ p_{0}^{[\![A_{0}]\!]},p_{1}^{[\![A_{1}]\!]},\ldots,p_{n-1}^{[\![A_{n-1}]\!]}\}$ 
\item a derivation of $t_{M}\ mr\ C$ that may use the base realization
formulas $\{ p_{0}\ mr\ A_{0},p_{1}\ mr\ A_{1},\ldots,p_{n-1}\ mr\ A_{n-1}\}$.
\end{itemize}
\begin{proof}
Proof by induction on the derivation $M$ of the formula $C$, distinction
by cases on the last rule that has been applied.

Base case: $M$ is an assumption variable $u_{i}^{A_{i}}$. Hence
$C$ is of the form $A_{i}$. Then $t_{M}$ is $p_{i}^{[\![A_{j}]\!]}$
and also of type $[\![C]\!]$. The free variables are $\{ p_{j}^{[\![A_{j}]\!]}\}$.
The derivation that $t_{M}\ mr\ C$ follows from the base realizations
$p_{i}\ mr\ A_{i}$. 

Step cases: 

$\wedge I$: $C$ is of the form $B_{0}\wedge B_{1}$. Hence $t_{M_{0}}$
of type $B_{0}$ with free variables $P_{0}=\{ p_{00}^{[\![A_{00}]\!]},p_{01}^{[\![A_{01}]\!]},\ldots,p_{0(n-1)}^{[\![A_{0(n-1)}]\!]}\}$
satisfies $t_{M_{0}}\ mr\ B_{0}$ from the base realization formulas
$\mathcal{P}_{0}=\{ p_{00}\ mr\ A_{00},p_{01}\ mr\ A_{01},\ldots,p_{0(n-1)}\ mr\ A_{0(n-1)}\}$
and $t_{M_{1}}$ of type $B_{1}$ with free variables $P_{1}=\{ p_{10}^{[\![A_{10}]\!]},p_{11}^{[\![A_{11}]\!]},\ldots,p_{1(m-1)}^{[\![A_{1(m-1)}]\!]}\}$
satisfies $t_{M_{1}}\ mr\ B_{1}$ from the base realization formulas
$\mathcal{P}_{1}=\{ p_{10}\ mr\ A_{10},p_{11}\ mr\ A_{11},\ldots,p_{1(m-1)}\ mr\ A_{1(m-1)}\}$.
For $t_{M}$ choose $\langle t_{M_{0}},t_{M_{1}}\rangle$ with the
free variables $P=P_{0}\cup P_{1}$. By $\wedge I_{IL}$ obtain $(t_{M_{0}}\ mr\ B_{0})\wedge(t_{M_{1}}\ mr\ B_{1})$
from $\mathcal{P}_{0}\cup\mathcal{P}_{1}$. (Strictly speaking we
use $((\vec{A}\rightarrow B)\wedge(\vec{C}\rightarrow D))\rightarrow((\vec{A}\wedge\vec{C})\rightarrow(B\wedge D))$.)
By R1 this is $\langle t_{M_{0}},t_{M_{1}}\rangle\ mr\ C$ from $\mathcal{P}_{0}\cup\mathcal{P}_{1}$.

$\wedge E_{0}$: $C$ was formed by $\wedge E_{0}$ from another formula
$B$. Hence there must be a $t_{M_{0}}$ of type $[\![B]\!]$ with
free variables $P_{0}=\{ p_{00}^{[\![A_{00}]\!]},p_{01}^{[\![A_{01}]\!]},\ldots,p_{0(n-1)}^{[\![A_{0(n-1)}]\!]}\}$
satisfying $t_{M_{0}}\ mr\ B$ from the base realization formulas
$\mathcal{P}_{0}=\{ p_{00}\ mr\ A_{00},p_{01}\ mr\ A_{01},\ldots,p_{0(n-1)}\ mr\ A_{0(n-1)}\}$.
Define $t_{M}:=t_{M_{0}}0$ and $t_{rest}:=t_{M_{0}}1$. We can apply
R1 on $t_{M_{0}}=\langle t_{M},t_{M_{rest}}\rangle$, thus $t_{M_{0}}\ mr\ B:=(t_{M}\ mr\ B0)\wedge(t_{M_{rest}}\ mr\ B1)$
from $\mathcal{P}_{0}$. By $\wedge E_{IL}^{0}$ (Strictly speaking:
$\vec{(A}\rightarrow(B\wedge C))\rightarrow(\vec{A}\rightarrow B)$.)
obtain $(t_{M}\ mr\ B0)$ from $\mathcal{P}_{0}$.

$\wedge E_{1}$: $C$ was formed by $\wedge E_{1}$ from another formula
$B$. Hence there must be a $t_{M_{0}}$ of type $[\![B]\!]$ with
free variables $P_{0}=\{ p_{00}^{[\![A_{00}]\!]},p_{01}^{[\![A_{01}]\!]},\ldots,p_{0(n-1)}^{[\![A_{0(n-1)}]\!]}\}$
satisfying $t_{M_{0}}\ mr\ B$ from the base realization formulas
$\mathcal{P}_{0}=\{ p_{00}\ mr\ A_{00},p_{01}\ mr\ A_{01},\ldots,p_{0(n-1)}\ mr\ A_{0(n-1)}\}$.
Define $t_{M}:=t_{M_{0}}0$ and $t_{rest}:=t_{M_{0}}1$. We can apply
R1 on $t_{M_{0}}=\langle t_{M_{rest}},t_{M}\rangle$, thus $t_{M_{0}}\ mr\ B:=(t_{M_{rest}}\ mr\ B0)\wedge(t_{M}\ mr\ B1)$
from $\mathcal{P}_{0}$. By $\wedge E_{IL}^{1}$ obtain $(t_{M}\ mr\ B1)$
from $\mathcal{P}_{0}$.

$\rightarrow\! I$: $C$ is of the form $B_{1}\rightarrow B_{0}$.
Hence $t_{M_{0}}$ of type $B_{0}$ with free variables $P_{0}=\{ p_{00}^{[\![A_{00}]\!]},p_{01}^{[\![A_{01}]\!]},\ldots,p_{0(n-1)}^{[\![A_{0(n-1)}]\!]}\}$
satisfies $t_{M_{0}}\ mr\ B_{0}$ from the base realization formulas
$\mathcal{P}_{0}=\{ p_{00}\ mr\ A_{00},p_{01}\ mr\ A_{01},\ldots,p_{0(n-1)}\ mr\ A_{0(n-1)}\}$.
Let $p_{1}^{[\![B_{1}]\!]}$ be the extracted program of $B_{1}$
and let $p_{1}\ mr\ B_{1}$. By $\rightarrow\! I$ get $(p_{1}\ mr\ B_{1})\rightarrow(t_{M_{0}}\ mr\ B_{0})$
from $\mathcal{P}_{0}\backslash(p_{1}\ mr\ B_{1})$. Now construct
$t_{M}$ as $\lambda p_{1}.t_{M_{0}}$, hence its set of free variables
is $P=P_{0}\backslash p_{1}$. Conversely $t_{M_{0}}=t_{M}p_{1}$
by $\beta$-reduction. Hence get $(p_{1}\ mr\ B_{1})\rightarrow(t_{M}p_{1}\ mr\ B_{0})$
with the set of free variables $P_{0}\backslash p_{1}$ from $\mathcal{P}_{0}\backslash(p_{1}\ mr\ B)$.
When by $\forall I$ obtaining $\forall p((p\ mr\ B_{1})\rightarrow(t_{M}p\ mr\ B_{0}))$
with the set of free variables $P_{0}\backslash p_{1}$ from $\mathcal{P}_{0}\backslash(p_{1}\ mr\ B)$
we can be assured that the free variable condition is not violated,
because it has been assured that $p_{1}$ does not occur as free variable
in the term. Application of R2 gives that $t_{M}\ mr\ C$ from $\mathcal{P}_{0}\backslash(p_{1}\ mr\ B)$.

$\rightarrow\! E:$$C$ was formed by an $\rightarrow\! E$. Assume
that it was derived from $(B\rightarrow C)$ and \textbf{$B$}. \textbf{}Hence
in the previous step there must have been a $t_{M_{0}}\ $ with free
variables $P_{0}=\{ p_{00}^{[\![A_{00}]\!]},p_{01}^{[\![A_{01}]\!]},\ldots,p_{0(n-1)}^{[\![A_{0(n-1)}]\!]}\}$
satisfying $t_{M_{0}}\ mr\ (B\rightarrow C)$ from the base realization
formulas $\mathcal{P}_{0}=\{ p_{00}\ mr\ A_{00},p_{01}\ mr\ A_{01},\ldots,p_{0(n-1)}\ mr\ A_{0(n-1)}\}$
and $t_{M_{1}}$ with with free variables $P_{1}=\{ p_{10}^{[\![A_{10}]\!]},p_{11}^{[\![A_{11}]\!]},\ldots,p_{1(m-1)}^{[\![A_{1(m-1)}]\!]}\}$
satisfying $t_{M_{1}}\ mr\ B$ from the base realization formulas
$\mathcal{P}_{1}=\{ p_{10}\ mr\ A_{10},p_{11}\ mr\ A_{11},\ldots,p_{1(m-1)}\ mr\ A_{1(m-1)}\}$.
Let $t_{M}:=t_{M_{0}}t_{M_{1}}$with free variables $P_{0}\cup P_{1}$.
Apply R2 in the form $(t_{M_{0}}\ mr\ B\rightarrow C)=\forall p((p\ mr\ B)\rightarrow(t_{M_{0}}p\ mr\ C))$.
Do $\forall E$ with $p:=t_{M_{1}}$. Then $(t_{M_{1}}\ mr\ B)\rightarrow(t_{M_{0}}t_{M_{1}}\ mr\ C)$.
Do $\rightarrow\! E$ to obtain $t_{M}\ mr\ C$ from $\mathcal{P}_{0}\cup\mathcal{P}_{1}$.

$\forall I:$ $C$ is of the form $\forall p_{1}B$. Already $t_{M_{0}}$
of type $B$ with free variables $P_{0}=\{ p_{00}^{[\![A_{00}]\!]},p_{01}^{[\![A_{01}]\!]},\ldots,p_{0(n-1)}^{[\![A_{0(n-1)}]\!]}\}$
satisfies $t_{M_{0}}\ mr\ B$ from the base realization formulas $\mathcal{P}_{0}=\{ p_{00}\ mr\ A_{00},p_{01}\ mr\ A_{01},\ldots,p_{0(n-1)}\ mr\ A_{0(n-1)}\}$.
Let $t_{M}:=\lambda p_{1}.B$ with free variables $P:=P_{0}\backslash p_{1}$.
Then by $\forall I$ we get $\forall p:t_{M_{0}}[p_{1}:=p]\ mr\ B$
from $\mathcal{P}_{0}$. The $\forall I$ is admissible because it
has been assured that $p_{1}$ is not a free variable in the derivation
of $t_{M_{0}}.$ By R3 obtain $t_{M}\ mr\ \forall p_{1}B$ from $\mathcal{P}_{0}$.

$\forall E:$ $C$ was formed by $\forall E$. Assume that it was
derived from $\forall xB$ and a term $t$ with free variables $P_{1}=\{ p_{10}^{[\![A_{10}]\!]},p_{11}^{[\![A_{11}]\!]},\ldots,p_{1(m-1)}^{[\![A_{1(m-1)}]\!]}\}$
from the base realization formulas $\mathcal{P}_{0}=\{ p_{10}\ mr\ A_{10},p_{11}\ mr\ A_{11},\ldots,p_{1(m-1)}\ mr\ A_{1(m-1)}\}$.
\textbf{}Hence in the previous step there must have been a $t_{M_{0}}\ $
with free variables $P_{0}=\{ p_{00}^{[\![A_{00}]\!]},p_{01}^{[\![A_{01}]\!]},\ldots,p_{0(n-1)}^{[\![A_{0(n-1)}]\!]}\}$
satisfying $t_{M_{0}}\ mr\ \forall xB$ from the base realization
formulas $\mathcal{P}_{0}=\{ p_{00}\ mr\ A_{00},p_{01}\ mr\ A_{01},\ldots,p_{0(n-1)}\ mr\ A_{0(n-1)}\}$.
Let $t_{M}:=t_{M_{0}}t$ with free variables $P_{0}\cup P_{1}$. Unfolding
R3 gives $(t_{M_{0}}\ mr\ \forall xC)=\forall x(t_{M_{0}}x\ mr\ C)$.
Defining $x:=t$ in $\forall E$ and $t_{M}:=t_{M_{0}}t$ with free
variables $P_{0}\cup P_{1}$ gives the desired $(t_{M}\ mr\ C)$ from
$\mathcal{P}_{0}\cup\mathcal{P}_{1}$. 
\end{proof}

\subsection{Functional interpretation }

\begin{rem}
This section closely follows \cite{HS2006proof}, for a more general
treatment see also \cite{Oliva2005unifying}. Gödel assigned to every
formula $A$ of $T_{0}$ a new one $A^{D}:=\exists A_{1}(x)$ of $T_{1}$
with $A_{1}(x)$ universal formula, of the form $\forall y|A_{2}|_{y}^{x}$
with $A_{D}:=A_{2}$ (of $T_{qf}$) quantifier-free. To denote witnesses
and challengers, assign the terms $\tau^{+}(A)$, $\tau^{-}(A)$ to
every formula $A$.
\end{rem}

\subsubsection{Theories involved}

\begin{rem}
The metatheory is $WE\mbox{-}HA^{\omega}+AC+IP_{\forall}^{\omega}+M^{\omega}+Ax_{\forall}$,
$T_{0}$ is $WE\mbox{-}HA^{\omega}+Ax_{\forall}$, $T_{1}$ is $T_{0}$
restricted to $A^{D}$-formulas, $T_{qf}$ is $WE\mbox{-}HA^{\omega}$
restricted to $A_{D}$-formulas, quantifier-free induction and substitution
schemes.
\end{rem}

\subsubsection{Cleaning}

\begin{defn}
In order to avoid case distinctions on the computational content of
the types we use the unit type $U$. Define \begin{eqnarray*}
 &  & \tau^{^{+}}(P(\vec{s}))=U,\\
 &  & \tau^{-}(P(\vec{s})):=U,\\
 &  & \tau^{+}(\forall x^{\rho}A):=\rho\rightarrow\tau^{+}(A),\\
 &  & \tau^{-}(\forall x^{\rho}A):=\rho\times\tau^{-}(A),\\
 &  & \tau^{+}(\exists x^{\rho}A):=\rho\times\tau^{+}(A),\\
 &  & \tau^{-}(\exists x^{\rho}A):=\tau^{-}(A),\\
 &  & \tau^{+}(A\rightarrow B):=(\tau^{+}(A)\rightarrow\tau^{+}(B))\times(\tau^{+}(A)\rightarrow\tau^{-}(B)\rightarrow\tau^{-}(A),\\
 &  & \tau^{-}(A\rightarrow B):=\tau^{+}(A)\times\tau^{-}(B).\end{eqnarray*}
If $\tau^{+}(A)\ne U$, then $A$ has positive computational content,
if $\tau^{-}(A)\ne U$, then $A$ has negative computational content. 
\end{defn}

\subsubsection{Gödel translation}

\begin{defn}
For every formula $A$ and terms $r$ of type $\tau^{+}(A)$ and $s$
of type $\tau^{-}(A)$ inductively define a new formula, the Gödel
kernel, $A_{D}:=|A|_{r}^{s}$ that is quantifier-free.

$|P(\vec{s})|_{s}^{r}:=P(\vec{s})$, $|\forall xA(x)|_{s}^{r}:=|A(s0)|_{s1}^{r(s0)}$,
$|\exists xA(x)|_{s}^{r}:=|A(r0)|_{s}^{r1}$, $|A\rightarrow B|_{s}^{r}:=|A|_{r1(s0)(s1)}^{s0}\rightarrow|B|_{s1}^{r0(s0)}$.
The Gödel translation is the closed form $\exists x\forall y|A|_{y}^{x}$.
\end{defn}

\subsubsection{Soundness}

\begin{thm}
For every $WE\mbox{-}HA^{\omega}+AC+IP_{\forall}^{\omega}+M^{\omega}+Ax_{\forall}$-derivation
$M$ of a formula $A$ with $FA(M)=\{ u_{i}^{C_{i}}\}$, where $i=0,\ldots,n-1$,
with $x_{i}$ of type $\tau^{+}(C_{i})$ (variables for realizers
of the assumptions), and $y$ of type $\tau^{-}(A)$ (variable for
the challenge of the goal), we can construct
\end{thm}
\begin{itemize}
\item terms $[\![M]\!]^{+}$ of type $\tau^{+}(A)$ with $y\notin FV(t)$,
terms $[\![M]\!]^{-}$ of type $\tau^{-}(A)$ 
\item $\mu(M)$ in $WE\mbox{-}HA^{\omega}+Ax_{\forall}$ for the formula
$|A|_{y}^{t}$ from assumptions $\bar{u}_{i}:|C_{i}|_{r_{i}}^{x_{i}}$.
\end{itemize}
\begin{proof}
Proof by induction on the derivation $M$ of the formula $C$, distinction
by cases on the last rule that has been applied.

Base case $u:A$. Let $x$ of type $\tau^{+}(A)$ be a variable for
a realizer of the assumption $u$. Define $[\![u]\!]^{+}:=x$ and
$[\![u]\!]^{-}:=y$.

Step case $\lambda u^{A}M^{B}$. By IH obtain a derivation of $|B|_{z}^{t}$
from $\bar{u}:|A|_{r}^{x}$ and $\bar{u}_{i}:|C_{i}|_{r_{i}}^{x_{i}}$,
where $\bar{u}:|A|_{r}^{x}$ may be absent. Substitute $y0$ for $x$
and $y1$ for $z$. By $(\rightarrow^{+})$ obtain $|A|_{r[x,z:=y0,y1]}\rightarrow|B|_{y1}^{t[x:=y0]}$,
which is (modulo $\beta$-conversion) $|A\rightarrow B|_{y}^{\lambda xt,\lambda x,zr}$
from $\bar{u}'_{i}:|C_{i}|_{r_{i}[x,z:=y0,y1]}^{x_{i}}$. Here the
canonical inhabitant of the type $\tau^{-}(A)$ in case $\bar{u}:|A|_{r}^{x}$
is absent. So, assuming that $u^{A}$ is $u_{1}$, we can define the
required terms by $[\![\lambda uM]\!]^{+}:=(\lambda x,z[\![M]\!]_{1}^{-}$),
$[\![\lambda uM]\!]_{i}^{-}:=[\![M]\!]_{i+1}^{-}[x,z:=y0,y1]$.

For other step cases and more detail see \cite{HS2006proof}.
\end{proof}

\section{Error containment by interval and modulus analysis\label{sec:Interval-and-modulus}}

\begin{quotation}
Consultum tamen non erit intervalla nimis magna constitui (Be advised
though not to build the intervals too large \cite{Euler1913institutio})
\end{quotation}
\begin{rem}
In the introduction we have mentioned that constructive representations
allow to give error bounds (section \ref{extr-error-bounds}). In
the last section we have introduced different proof interpretations.
On top of these proof interpretations, mathematical objects (such
as real numbers, continuous functions on the reals) have to be represented
by some structures. If (even within the same proof interpretation)
we want to {}``port'' computational content from one structure to
another, some understanding of the general notions of numbers and
functions in the chosen representation is needed. In this section
we will convert between different representations.
\end{rem}
\begin{notation}
\emph{Axiomatic analysis} (AA) will be the reals as axiomatically
represented in introductory textbooks of analysis (eg \cite{Rudin1964principles}),
\emph{modulus analysis} (MA) the modulus representation in \cite{Schwichtenberg2005constructive,Schwichtenberg2006program,Schwichtenberg2006inverting},
\emph{interval analysis} (IA) be the representation by intervals \cite{Jaulin2001applied,Moore1965theautomatic,Moore1966interval}.
\end{notation}
\begin{rem}
AA included non-constructive completeness principles, whereas care
will be taken to keep the definitions of MA mr-realizable \cite{Schwichtenberg2005constructive}.
As in section \ref{sec:Some-MR-realizable-numerical} an algorithm
presented in IA will be looked at, we are interested in the portability
of definitions in IA to MA. A restriction will be made to concepts
that will be really needed, for example one can integrate differential
equations without a definition of a derivative. However, if one has
the inclination, also the derivative and Taylor approximation can
be formulated in IA, some of this is reported in section \ref{sub:Some-calculus-in}.
\end{rem}

\subsection{Numbers/ranges}


\subsubsection{IA intervals}

\begin{defn}
An \emph{interval} $[a,b]$ is defined by two rationals $a\le b$
and denotes all real numbers between $a$ and $b$. Let $I=[a,b],\ J=[c,d]$
be intervals. $I$ is \emph{included} in $J$, $I\subseteq J$, if
$c\le a$ and $b\le d$. Define the \emph{width} of an interval $w(I)$
to be $b-a$. The interval $[a,a]$ of width $0$ is understood to
represent the rational $a$. Define $(a,b]:=\{ x\in[a,b]\backslash a,$
$[a,b):=\{ x\in[a,b]\}\backslash b$, $(a,b):=\{ x\in[a,b]\}\backslash\{ a,b\}$.
Moreover, define $|a,b|:=max(|a|,|b|)$, $[a,b]\le[c,d]:=b\le c$.
\label{def:IAintervals}
\end{defn}
\begin{lem}
Absolute bounds. $w(I)\le2|I|$; $w(IJ)\le|I|w(J)+|J|w(I)\le4|I||J|$.\label{lem:IA-abs}
\end{lem}
\begin{proof}
Let $I=[a,b],$ $J=[c,d]$. $w(I)=b-a\le\max(|2a|,|2b|)=2|I|$; $w(IJ)=bd-ac\le\max(|a|,|b|)(d-c)+\max(|c|,|d|)(b-a)=|I|w(J)+|J|w(I)\le4\cdot|I||J|$. 
\end{proof}
\begin{lem}
Absolute bounds for intervals not changing sign. $w(I)=|b-a|$.\label{lem:IA-abs-sign}
\end{lem}
\begin{proof}
$w(I)=b-a=|b-a|$.
\end{proof}

\subsubsection{MA reals}

\begin{rem}
Let $x$ be a real number. A straightforward approach \cite{Harrison1997introduction}
to express an approximation $x$ as modulus $M$:$int\rightarrow int$
is:$|M(k)-x\cdot2^{k+1}|\le1$. E.g. for $x=\pi$ this determines
a mapping: $\{0\rightarrow3,1\rightarrow6,2\rightarrow13,3\rightarrow25,4\rightarrow50,\ldots\}$
that also can be represented by the sequence $(3,6,13,25,50,\ldots)$.
The sequence would look more natural as $(3,\frac{6}{2}=3,\frac{13}{4}=3.25,\frac{25}{8}=3.125,\frac{50}{16}=3.125,\ldots)$
but that modulus would be of type $M:int\rightarrow rational$. This
justifies the following definition.
\end{rem}
\begin{defn}
Let $M$ be of type $int\rightarrow int$, let $(a_{n})$ be a Cauchy
sequence $(a_{n})$, and it holds:

\[
\forall k,m,n\colon m,n\ge M(k)\rightarrow a_{m}-a_{n}\le2^{-k}\]


We define a \emph{real} to be a pair $((a_{n})_{n\in\mathbb{N}},M)$
where $(a_{n})\in\mathbb{Q}$ and $M:\mathbb{N}\rightarrow\mathbb{N}$
that $\forall k,n,m:n,m\ge M(k)\rightarrow|a_{n}-a_{m}|\le2^{-k}$.
Two reals $a,b$ are equal if $\forall k\exists N\forall m,n\ge N:|a_{n}-b_{m}|\le2^{-k}$.
Define $x<_{k}y:=\forall n:n\ge max(M(k+2),N(k+2))\rightarrow2^{-k}\le b_{n}=a_{n}$.
\end{defn}
\begin{rem}
The $(a_{n})$ of the previous definition are a Cauchy sequence ($\forall\varepsilon\exists N:m,n>N\rightarrow|a_{n}-a_{m}|\le\varepsilon$),
because $2^{-k}$ can get arbitrarily small.
\end{rem}
\begin{example}
\label{exa:Euler's-number-e:}Euler's number $e$: A well-known $AA$
representation is $e:=\sum_{i=0}^{\infty}\frac{1}{i!}$, $i\in\mathbb{N}$.
For an $MA$ representation, pick $e_{MA}=(a_{n},M(k))$ where $a_{n}:=\sum_{i=0}^{n}\frac{1}{i!}$,
and $M(k):=k+2$. For an $IA$ representation, choose $e_{IA}=[1,3]$.
Without loss of generality, assume $m\le n$. We want to show that
the chosen representations in IA and MA contain AA rational approximations
of $e$, at first we observe:

\begin{eqnarray}
\forall m,n\ge0: &  & 0\le\sum_{i=0}^{n}\frac{1}{i!}-\sum_{i=0}^{m}\frac{1}{i!}\le2^{-(m-2)}\label{eq:e}\end{eqnarray}
The left hand side of inequality \ref{eq:e} is satisfied, because
the sum only contains positive terms, and the right hand side is satisfied
by: \begin{eqnarray*}
\sum_{i=0}^{n}\frac{1}{i!}-\sum_{i=0}^{m}\frac{1}{i!} & = & \sum_{i=m+1}^{n}\frac{1}{i!}\le\sum_{i=m+1}^{n}\frac{1}{2^{i-1}}\le2^{-(m-2)}\end{eqnarray*}


This justifies the choice of $e_{MA}$. For $e_{IA}$, observe that
$\sum_{i=0}^{0}\frac{1}{i!}=1$, and by instantiating \ref{eq:e}
with $m:=0$,$n:=\infty$ obtain $\sum_{i=0}^{\infty}\frac{1}{i!}-\sum_{i=0}^{0}\frac{1}{i!}=\sum_{i=1}^{\infty}\frac{1}{i!}\le2$.
\end{example}
\begin{rem}
It is not hard to think of better approximations: $e_{IA}^{'}:=[2,3]$,
$e_{MA}^{'}:=(a_{n}^{'},M'(k))$ where $a_{n}^{'}:=a_{n}$, $M'(k):=M(k)$
if $k<10$, $M'(k):=\frac{k}{2}+2$ for $k\ge10$. Proof: check the
definitions as in the previous example, with case distinction on $k<10$.
\end{rem}
\begin{defn}
Contracting sequence of intervals: A sequence of intervals $I_{n}$
is \emph{contracting} with contraction modulus $\alpha$ if the intersection
of all intervals has a common point $\cap(I_{n})$ and for all $I_{m}$,
$I_{n}$: $I_{m}\subseteq I_{n}$,$w(I_{m})\cdot\alpha\ge w(I_{n})$
if $m\le n$.

Dyadic logarithm: Let $ld(x):\mathbb{Q}\rightarrow\mathbb{N}$ for
a rational $x$ return the least $k\in\mathbb{N}$ such that $2^{k}\ge|x|$.\label{def:dyadiclog}

Dyadic interval: For $a\le b$ let the \emph{dyadic interval} be $[\frac{a-b}{2}-c,\frac{a-b}{2}+c]$,
with $c:=ld(b-a)$
\end{defn}
\begin{lem}
Every modulus real can be represented as a sequence of intervals,
every contracting sequence of intervals can be represented as a modulus
real.
\end{lem}
\begin{proof}
Modulus real to sequence of intervals: $([a_{n}-2^{-M(n)},a_{n}+2^{-M(n)}])$.
Contracting sequence of intervals to modulus reals: $((\cap(I_{n})),\lambda n.ld(\frac{1}{\alpha}n))$.
\end{proof}

\subsection{Functions: properties}

\begin{rem}
Let $f(x)$ be a real function (recall that this can be defined as
a mapping of elements $x$ from a domain to a range as e.g. in \cite{Rudin1964principles}). 
\end{rem}

\subsubsection{IA functions\label{sub:IA-functions}}

\begin{defn}
We define an interval-valued function as a mapping $X\mapsto F(X)$,
where $X$ is an interval and $F(X)$ is an interval such that $\forall x\in X:f(x)\in F(X)$.
The mapping $F$ thus has the property of being \emph{inclusion monotonic:}
if $\forall I,J:I\subseteq J\rightarrow F(I)\subseteq F(J)$.
\end{defn}
\begin{rem}
As a special case, by \ref{def:IAintervals}, $\forall x\in\mathbb{Q}\cap I\rightarrow f(x)\in F(I)$.\label{rem:Funonrat}
\end{rem}
\begin{example}
\label{exa:ia-fun}Define an approximation to the exponential function
to be $F[a,b]=[2^{a},3^{b}]$ for each interval $[a,b]$.
\end{example}
\begin{rem}
Alternatively, \cite{Moore1966interval} chooses the interval function
as basic definition, and then by definition of the $0$-length interval
\emph{$[x,x]$} for the rational \emph{$x$}, obtain \emph{$f(x)=F([x,x])$}
on $x\in\mathbb{Q}$.
\end{rem}
\begin{defn}
$F(X)$ is an \emph{interval inclusion} of $f(x)$ if $\forall x\in X:f(x)\in F(x)$.

In contexts where this is useful, a \emph{system of functions} $F_{0},\ldots,F_{n-1}$
can be designated $F$.

A \emph{rational function} is a function made up of the elementary
arithmetical operations $+$, $-$, $*$, $/$ (division by $0$ is
not allowed). \label{sub:IA-rational-interval}For the rational functions,
we define the \emph{arithmetic operations}: \begin{eqnarray*}
[a,b]+[c,d] & := & [a+c,b+d]\\
{}[a,b]-[c,d] & := & [a-d,b-c]\\
{}[a,b]*[c,d] & := & [min(ac,ad,bc,bd),max(ac,ad,bc,bd)]\\
{}[a,b]/[c,d] & := & [a,b]*[\frac{1}{c},\frac{1}{d}]\end{eqnarray*}

\end{defn}
\begin{lem}
If $F$ is defined by rational operations, $F$ is inclusion monotonic
(hence an interval function).\label{lem:F-rational} 
\end{lem}
\begin{proof}
Check the definitions of the rational operations (which have been
constructed to ascertain this property).
\end{proof}
\begin{rem}
Note that a representation of $f(x)$ by an $F(x)$ is not unique,
as different terms might be used for the same function, e.g. $f(x)=0$
or $f(x)=x-x$. 

Monotonically increasing functions \cite[p. 21]{Moore1965theautomatic}:
If $F$ is monotonically increasing on an interval, one can choose
a term $F(x,a_{0},\ldots,a_{n-1})$ for $f(x)$ containing the term
$f$, the bound variable $x$, and the free variables $a_{0},\ldots,a_{n-1}$.
Let $X,A_{0},\ldots,A_{n-1}$ be intervals. When $x\in X$, $a_{0}\in A_{0}$$,\ldots,$
$a_{n-1}\in A_{n-1}$, then the term $F=[\min(F(x,a_{0},\ldots a_{n-1})),\max(F(x,a_{0},\ldots a_{n-1}))]$
can be interpreted as an interval operation that maps intervals to
intervals. $F$ is defined if there is no valuation $(x,a,\ldots,a_{n-1})$
such that a division by $0$ in a subterm occurs. If the meaning of
the free variables is known from context, then we write $F(x)$ for
$F(x,a_{0},\ldots,a_{n-1})$. Usage of this definition for more general
functions requires that one identifies intervals where $F$ is monotonically
increasing/decreasing, e.g. $\cos([0,2\pi])$ should be evaluated
as $\cos([0,\pi])\cup\cos([\pi,2\pi])$.
\end{rem}

\subsubsection{IA continuous functions}

\begin{defn}
An IA function $F$ is continuous if $\forall\varepsilon\exists\delta:F(x+\delta)-F(x)\le\varepsilon$
where $x$ is an interval.
\end{defn}

\subsubsection{MA continuous functions}

\begin{defn}
A continuous function $f:D\rightarrow\mathbb{R}$ on a compact interval
with rational end points $a,b$ is given by: 
\end{defn}
\begin{itemize}
\item an approximating map $h_{f}:(D\cap\mathbb{Q})\times\mathbb{N}\rightarrow\mathbb{Q}$ 
\item a weakly increasing map $\alpha_{f}:\mathbb{N}\rightarrow\mathbb{N}$
such that $(h_{f}(a,n))_{n}$ is a Cauchy sequence with uniform modulus
$\alpha_{f}$
\item a weakly increasing map $\omega_{f}:\mathbb{N}\rightarrow\mathbb{N}$
(modulus of uniform continuity), which satisfies $|a-b|\le2^{-\omega_{f}(k)+1}\rightarrow|h_{f}(a,n)-h_{f}(b,n)|\le2^{-k}$
for $n\ge\alpha_{f}(k)$.
\end{itemize}
\begin{example}
\label{exa:IA-exp} From AA, we know that for $D:=\mathbb{R}$ the
exponential function can be represented as $exp(x):=\sum_{i=0}^{\infty}\frac{x^{i}}{i!}$.
We want to look at $exp(x)$ on the interval $[1,4]$. As approximating
map choose $h_{exp}(x,n):=\sum_{i=0}^{n}\frac{x^{i}}{i!}$, $\alpha_{exp}(k):=k+2$,
and $\omega_{exp}(k):=k+7$. We have to check that these choices satisfy
the specification: $h_{exp}(1,n))_{n}$ is indeed a Cauchy sequence
with modulus $\alpha_{exp}(k):=k+2$ as was shown in example \ref{exa:Euler's-number-e:}.
For the slope of $exp(x)$ we have $(\sum_{i=0}^{\infty}\frac{x^{i}}{i!})'=\sum_{i=0}^{\infty}\frac{x^{i}}{i!}$,
attaining its maximum $\sum_{i=0}^{\infty}\frac{4^{i}}{i!}$ at $4$.
$\sum_{i=0}^{\infty}\frac{4^{i}}{i!}=1+4+\frac{16}{2!}+\frac{64}{3!}+\frac{256}{4!}+...\le1+4+8+11+11\sum_{i=0}^{\infty}(\frac{4}{5})^{i}\le24+5*11\le2^{7}$.
\end{example}
\begin{rem}
Better moduli are possible, e.g. $\omega_{exp}(k):=k+7$ if $k<10$,
$\omega_{exp}(k):=k+4$ if $k\ge10$. (Mnemotechnical) $\omega$ originally
stood for {}``oscillation'' \cite{Wikipedia2006modulus}.
\end{rem}

\subsection{Application and boundedness of functions}


\subsubsection{IA application}

\begin{defn}
Application of a function to the rational endpoints of an interval:
$f([a,b]):=[f(a),f(b)]$.

(Generalized) application of a function to an interval: Let $X_{i}=[a_{i},b_{i}]$
and $X=[a,b]=\cup X_{i}$ then $f([a,b])=f(\cup[a_{i},b_{i}])=\cup f([a_{i},b_{i}])$.
\end{defn}
\begin{example}
Recall $F$ from example \ref{exa:ia-fun}. Then $F([2,3])=[4,27]$.
\end{example}
\begin{thm}
Theorem: Every rational function where each argument occurs at most
once is Lipschitz bounded, that is for each $a,b$ there exists an
$L$ such that $f(a,b)\le L*|a-b|$. 
\end{thm}
\begin{proof}
Define $L$ inductively to be 

\begin{eqnarray*}
a & if & f(a)=a\\
max(|a|,|b|) & if & f(a,b)=a+b\\
max(|a|,|b|) & if & f(a,b)=a-b\\
a*b & if & f(a,b)=a*b\\
a/b & if & f(a,b)=a/b.\end{eqnarray*}

\end{proof}
\begin{cor}
The argument carries over to functions where arguments occur multiple
times. 
\end{cor}
\begin{proof}
Whenever in the function term an argument $x$ occurs multiple times
(e.g. $f(x)=x*x$), index the argument by $x_{1},x_{2},\ldots,x_{n}$
to give a new function $\bar{f}$ and define $h(x_{1},x_{2},\ldots,x_{n})=\bar{f}(f(x))$.
\end{proof}
\begin{thm}
Rational approximations are bounded \cite[p. 69]{Moore1965theautomatic}:
Let $F$ be a rational interval function defined on an interval $A$.
Then for each subinterval $X\subseteq A$ there is a positive real
number $k$ independent of the method of subdivision, such that $f(X)\subseteq\cup_{i=1}^{n}F(X_{i})$
and $|w(\cup_{i=1}^{n}F(X_{i}))-w(f(X))|\le k\cdot max_{i}(w(X_{i}))$. 
\end{thm}
\begin{proof}
$f(X)\subseteq\cup_{i=1}^{n}F(X_{i})$ follows from the inclusion
property. For the bound on the approximation prove that for each $y_{1}\in F(X_{i})$
we have a $y_{2}\in f(X_{2})$ such that $|y_{1}-y_{2}|\le k\cdot max_{i}(w(X_{i}))$,
this follows from boundedness of rational functions on an interval.
\end{proof}

\subsubsection{MA application }

\begin{defn}
The application of a continuous function $f:I\rightarrow\mathbb{R}$
with $h_{f},\alpha_{f},\omega_{f}$ to a real $x:=((a_{n})_{n},M)$
is defined to be $(h_{f}(a_{n},n))_{n}$ with $M(fx):=max(\alpha_{f}(k+2),M(\omega_{f}(k+1)-1))$. 
\end{defn}
\begin{rem}
This is a modulus, proof see \cite[p. 27]{Schwichtenberg2005constructive}\cite[p. 9]{Schwichtenberg2006program}. 
\end{rem}
\begin{example}
We want to know $exp(e)$, with $e$ defined as in example \ref{exa:Euler's-number-e:}
and $exp(x)$ defined as in example \ref{exa:IA-exp}. Hence obtain
\begin{eqnarray*}
(a)_{n}=(h_{exp}(a_{n},n))_{n} & = & \sum_{i=0}^{n}\frac{\sum_{j=0}^{n}\frac{x^{j}}{j!}}{i!}\\
M(fx)=max(\alpha_{exp}(k+2),M(\omega_{exp}(k+1)-1)) & = & max(k+4,k+7)=k+7.\end{eqnarray*}

\end{example}

\subsection{Modulus of increase}

\begin{defn}
$l\in\mathbb{N}$ is a \emph{uniform modulus of increase} for $f:[a,b]\rightarrow\mathbb{R}$
if for all $c,d\in[a,b]$ and all $m\in\mathbb{N}$: $|c-d|\le2^{-m}\rightarrow|f(c)-f(d)|\le2^{-m-l}$. 
\end{defn}

\section{Some mr-realizable numerical analysis: IVP problems \label{sec:Some-MR-realizable-numerical}}

\begin{quotation}
By clever heuristics choose $h$ and $A$ so that $w(Y(x))$ is {}``as
small as possible as long as possible''. \cite{HS2006estimating}
\end{quotation}
\begin{rem}
The motivation for this section is to look at the translation between
mr-realizable methods. While numerical algorithms as such are by their
very nature constructively realizable, in numerical analysis one often
does not go for exact a priori error bounds but is rather content
with the order of the error (because the error terms become too pessimistic
or unwieldy or on is more interested in other properties of the system).
On the other hand, IA (see \ref{sec:Interval-and-modulus}) retains
the emphasis on explicit a priori bounds, and by design, MA is also
focussed on a priori bounds.

There are many methods to approximate ODE IVP problems \cite{Hairer1987solving,Hairer1991solving}.
After some definitions and algorithms, we will present a general framework
for one-step methods \ref{sub:Convergence-from-local}. We will then
look at different aspects of Moore's $k$-th order algorithm and discuss
its {}``portability'' as well as implementation aspects.

As a general remark, we will be most interested in convergence questions,
we will ignore errors in floating point arithmetic rounding. This
is justified because those are comparatively small in the examples
for IA, and on the other hand, the practical implementation of MA
in minlog uses (arbitrarily precise) symbolic rationals anyway.

Moreover, while for some example equations it could be meaningful
to discuss stability (such as in the sense of Axelsson's A-stability
\cite{Axelsson1969aclass}), for others this is not an applicable
concept (especially not for the explosion equation that Moore uses
to illustrate his examples), so it is not discussed (nor has Moore
in \cite{Moore1966interval}).
\end{rem}

\subsection{General notions}


\subsubsection{Notation\label{sub:General-notions}}

\begin{notation}
Let $f=y(x)$. For derivatives, we use Leibniz ($\frac{dy}{dx}$)
and Lagrange ($f'$) notation. The dependent variable is always $y$,
the independent variable is $x$. Dot notation ($\dot{x},\ddot{x}$)
is used for instances of variables, it does \emph{not} refer to derivatives.
\end{notation}

\subsubsection{Definitions\label{sub:diffeqdef}}

\begin{defn}
We begin with some definitions on the axiomatic reals in the theory
AA. \label{sub:Definition:-exact-solution}

Consider an \emph{ordinary differential equation} (ODE) $y'(x)=\frac{dy}{dx}=f(x,y)$
where $f:(A\subseteq\mathbb{R}^{2})\rightarrow\mathbb{R}$ is a continuous
function. When $A$ is a rectangle let $A=R\times S$. A \emph{solution}
on an interval $I$ is a continuous function $u:I\rightarrow\mathbb{R}$
with a continuous derivative $u'$ such that $\forall x\in I$: $(x,u(x))\in A$
and $u'=f(x,u(x)).$

An \emph{initial value problem} (IVP problem) is to find a solution
$u$ such that $u(x_{0})=y_{0}$.

A function is \emph{uniformly continuous} if $|f(\dot{x})-f(\ddot{x})|\le M|\dot{x}-\ddot{x}|$.
$M$ is called \emph{continuity constant}. (Hence $\omega_{f}(h)\le Mh$.)

A differential equation satisfies a \emph{Lipschitz condition} if
for different $\dot{y},\ddot{y}$ we have $f(x,\dot{y})-f(x,\ddot{y})\le L|\dot{y}-\ddot{y}|$.
$L$ is called \emph{Lipschitz constant}.

A differential equation of \emph{order} $n$ is a differential equation
with a continuous $n$-th derivative on the left-hand-side and continuous
derivatives up to $n-1$ on the right-hand side: $y^{(n)}(x)=f(x,y,y',\ldots,y^{(n-1)})$.
It can be replaced by a \emph{system of differential equations} $g'(x)=f(x,y_{0},y_{1},\ldots,y_{n-1})$,
$y'_{0}=y$, $y'_{1}=y_{0}$,$\ldots,y'_{n-2}=y_{n-1}$.

A \emph{partition} of the interval $H:=[a=x_{0},b]$ into $N$ subintervals
by $N+1$ points can be given by the points $\{ a=x_{0},x_{0}+h_{1},\ldots,x_{0}+h_{N-1},x_{0}+h_{N}=b\}$
and the intervals $\{[a=x_{0},x_{0}+h_{0}],[x_{0}+h_{0},x_{0}+h_{1}],\ldots,[x_{0}+h_{N-1},x_{0}+h_{N}]\}$.
If the partition is \emph{equidistant}, then $h_{n}=h\cdot n$ where
$h=\frac{b-a}{N}$. With respect to a method we speak of \emph{fixed
step width} if it has to operate on an equidistant partition, of \emph{variable
step width} otherwise. Where the domain of $f$ is a rectangle, and
a continuity constant $M$ has been given, it is assumed that is has
been ensured that $H$ is within the domain where $f$ is defined.

We denote the \emph{maximal error} after the $n$th iteration step
by $e_{n}$. $e_{0}$ refers to the error before the iteration starts.
However, by $e$ (no subscript) we denote Euler's constant.
\end{defn}

\subsection{Some iteration algorithms that do not use higher order\label{sub:First-order-methods}}

\begin{algorithm}
Classical Euler. Assume the initial value $y_{0}=x_{0}$. Then march
through a partitioned interval ($n$ partitions of equal distance
$h$) by approximating the partition by the tangent to the slope at
the starting point of each partition: $y_{n+1}=y_{n}+hf(y_{n},x_{n})$.\label{alg:Euler-iteration}
\end{algorithm}
\begin{rem}
The above-mentioned stepwise approximation of a differential equation
by tangents is natural when one considers that the concept of differential
equation can be seen as the inverse of the problem of differentiation:
the task of differentiation is to find a tangent for a function, the
task of integrating on ODE is to find a function for a tangent. Among
others, the algorithm has been described by Euler, the assumptions
for convergence are attributed to Cauchy and Lipschitz, hence in the
literature it is also denoted as {}``Cauchy-Lipschitz'' \cite[pp. 76-80]{Goursat1911cours,Ince1927ode},
{}``Cauchy-Euler'' in \cite{Moore1965theautomatic}. (For historical
details see \cite{Tournes1997lintegration}). 
\end{rem}
\begin{algorithm}
Modified Euler \cite[p. 67]{Henrici1962discrete}. Assume the initial
value $y_{0}=x_{0}$. Then march through a partitioned interval ($n$
partitions) by: $y_{n+1}=y_{n}+hf(x_{n}+\frac{h}{2},y_{n}+\frac{h}{2}f(x_{n},y_{n}))$.\label{alg:modif-euler}
\end{algorithm}
\begin{rem}
We can generalize the modified Euler to an interval version: replace
$x_{n}+\frac{h}{2}$ by $X_{n}$, $y_{n}$ by $Y_{n}$ and $\frac{h}{2}f(x_{n},y_{n})$
by $[0,h]F(A)$.
\end{rem}
\begin{algorithm}
Moore first order iteration. Assume the initial value $y_{0}=x_{0}$.
Choose $A$ such that $y_{0}\in A$. Then march through a partitioned
interval ($n$ partitions) by: $X_{n}=[0,h]+n\cdot h$, $Y_{n}(0)=y_{0}$,
$Y_{n+1}=Y_{n}+hF(X_{n},Y_{n}+[0,h]F(A))$.\label{alg:Moore-first-order}
\end{algorithm}

\subsubsection{Moore first order iteration: pseudocode version}

\begin{rem}
Pseudocode (Numbering of assignments (10-...) follows \cite{Moore1966interval}.
Implementation see \cite{Blasum2006moore}.)
\end{rem}
\begin{lyxcode}
function~approx($F,A,y_{0},h)$~\{~

~~~~~~$\sigma:=x+h\cdot F(A)$~~//10-10,~inner~evaluation

~~~~~~$Y:=x+h\cdot F(\sigma)$~//10-09

~~~~~~return~$Y$

\}



function~main~()~\{

~~~~~~~//$f$~denotes~the~ODE,~$A$~its~domain,~$y_{0}$~the~IC

~~~~~~~//$F$~the~discrete~approximation

~~~~~~~//$N$~the~number~of~partitions,~$H$~the~desired~width

~~~~~~~construct~$F$~from~$f$

~~~~~~~$h:=\frac{H}{N}$

~~~~~~~for~each~$i$~in~$0,\ldots,N-1$

~~~~~~~~~~~~~$y_{0}:=approx(F,D,y_{0},h)$

\}
\end{lyxcode}
\begin{rem}
When one wants to use the algorithm, in particular one has to select
an interval for approximation $H$. Moore does not give a pseudocode
version of the first-order method, but in the example \cite[p. 97]{Moore1966interval}
he selects the interval such that $\forall x:y_{0}+xF(A)\subseteq F(A)$.
The construction of $F$ is representing $f$ as an interval-valued
function on the upper and lower bound value of an interval, in the
examples we can do this by rational functions (cf section \ref{sub:IA-rational-interval}).
\end{rem}
\begin{example}
First-order method \cite[p. 97 (10-21)]{Moore1966interval} \label{par:Example-[Moore1966interval,-p.}
\end{example}
Take $A=[0,2]$ with $y(0)=y_{0}=1$ interior to $A$. For the interval
extension take $F(Y)=Y^{2}$, for the number of steps $N:=2$. An
appropriate choice for $H$ is $\frac{1}{4}$ satisfying $1+HF(A)=1+H[0,4]\subseteq[0,2]$.
For $x_{1}$ obtain $x_{0}+h=x_{0}+\frac{1}{8}=\frac{1}{8}$, for
$x_{2}$ obtain $x_{0}+2h=\frac{1}{4}$. Obtain $Y(x_{1})=1+[\frac{4}{32},\frac{9}{32}]=[1.125,1.28125]$
and $Y(x_{2})=1+[\frac{4}{32},\frac{9}{32}]+\frac{1}{8}[\frac{1296}{1024},\frac{3249}{1024}]\subseteq[1.283,1.678]$. 


\subsection{Algebraic foundation}

\begin{rem}
For further use for error bounds, we give a purely algebraic (hence
representation-independent) notion.
\end{rem}

\subsubsection{Recurrent inequality\label{sub:rec-ineq-our}}

\begin{rem}
The following inequality was given in \cite[p. 18]{Henrici1962discrete}.
We choose the representation $a^{0}+\ldots+a^{n-1}(=1+a+a^{2}+\ldots+a^{n-1})$
as opposed to the telescoped $\frac{a^{n}-1}{a-1}$ in order to avoid
the case distinction whether $a=1$ (for comparison, see section \ref{sub:recineq-henrici}). 
\end{rem}
\begin{thm}
If the \emph{recurrent inequality} $|\xi_{n+1}|\le a|\xi_{n}|+b$
holds for all $n\in1,\ldots,N-1$, then $|\xi_{n}|\le a^{n}|\xi_{0}|+(a^{0}+\ldots+a^{n-1})b$.\label{thm:rec-ineq}
\end{thm}
\begin{proof}
Induction on $n$. 

base: $|\xi_{1}|\le a|\xi_{0}|+1\cdot b$. 

Step: by IH $|\xi_{n}|\le a^{n}|\xi_{0}|+(a^{0}+\ldots+a^{n-1})b$.
apply assumption, then \begin{eqnarray*}
|\xi_{n+1}| & \le & a(a^{n}|\xi_{0}|+(a^{0}+\ldots+a^{n-1})b)+b\\
 & = & a^{n+1}|\xi_{0}|+(a(a^{0}+\ldots+a^{n-1})+1)b\\
 & = & a^{n+1}|\xi_{0}|+((a^{1}+\ldots+a^{n})+a^{0})b\\
 & = & a^{n+1}|\xi_{0}|+(a^{0}+\ldots+a^{n})b\end{eqnarray*}
 which is the IH with index $n$ increased by $1$.
\end{proof}
\begin{notation}
We denote $a$ as the \emph{dependent parameter}, $b$ as the \emph{independent
parameter}. (This is motivated by the distinction of the dependent
and the independent variable in section \ref{sub:General-notions}).
\end{notation}
\begin{lem}
The growing part of the error term depends only on the dependent parameter.\label{lem:The-exponential-part}
\end{lem}
\begin{proof}
$a^{n+1}|\xi_{0}|+(a^{0}+\ldots+a^{n})b=a^{n+1}|\xi_{0}|+\frac{a^{n+1}-1}{a-1}b\le a^{n+1}(|\xi_{0}|+\frac{b}{a-1})$.
\end{proof}

\subsection{Global bounds from the local discretization error\label{sub:Convergence-from-local}}

\begin{rem}
For each of the algorithms of the previous section, one can give bounds
by elementary proofs. (For an elementary proof of \ref{alg:Euler-iteration},
see e.g. \cite[p. 27]{Henrici1962discrete} or section \ref{sub:CLrem},
for an elementary proof of \ref{alg:Moore-first-order} see section
\ref{sub:IA-approximate-solution}. However, we think it is less cumbersome
(and more enlightening w.r.t. portings) to use a more general framework.
\end{rem}

\subsubsection{General one-step theory: Fixed stepsize\label{sub:Fixed-stepsize-variety}\label{sub:A-priori-bound}}

\begin{rem}
Mostly based on \cite[pp. 64-65,73-74]{Henrici1962discrete}, our
$M$ is his $N$, some terminology has been adopted from \cite{Chartres1972ageneral,Schaefer2006skriptum}.
\end{rem}
\begin{defn}
The following definitions are useful.
\end{defn}
\begin{itemize}
\item \emph{Region}: Let $A$ be a strip such that $a=x_{0}\le x\le b$,
$-\infty<y<\infty$.
\item \emph{Target IVP:} Let $y(x)$ be the solution of $y'=f(x,y)$, $y(x_{0})=y_{0}$.
\item \emph{Adjacent vector field IVPs}: If $(x,y)$ is an arbitrary point
in $A$, for $t\in[x,b]$ denote the IVP by $z'=f(t,z)$, $z(x)=y$.
(Remark: existence and uniqueness is guaranteed by the Picard-Lindelöf
theorem once a Lipschitz condition on the exact solution is satisfied,
see conditions below).
\item \emph{Exact increment}\index{exact increment}: Let $\Delta_{h}(x,y):=\frac{y(x+h)-y}{h}$,
for $h>0$, $f(x,y)$ if $h=0$. 
\item \emph{Approximated increment}\index{approximated increment}: Let
$\Phi_{h}(x_{n},y_{n}):=\frac{y_{n+1}-y_{n}}{h}$.
\end{itemize}
\begin{condition}
Moreover, we need the following:
\end{condition}
\begin{itemize}
\item \emph{Continuity}: Let $f(x,y)$ be defined and continuous in $A$.
Let $\Phi_{h}(x_{n},y_{n})$ be continuous (jointly as a function
of its three arguments) in the region defined by $A$, $0\le h\le H$
where $H>0$.
\item \emph{Asymptotic stability:} Let there exist a constant $L$ such
that $|\Phi_{h}(x,z)-\Phi_{h}(x,y)|\le L|z-y|$ for all $(x,y,h)$
and $(x,z,h)$ in $A$, as a special case this includes a \emph{Lipschitz
condition on the exact solution:} there is an $L_{1}\le L$ such that
for $x\in[a,b]$ any two $y,z$: $|f(x,z)-f(x,y)|\le L_{1}|z-y|$. 
\item \emph{Consistency condition on local discretization error}: Let there
exist constants $M\ge0$ and $p\ge0$ such that $\tau_{h}:=|\Phi_{h}(x,y(x))-\Delta_{h}(x,y(x))|\le Mh^{p}$
where $x\in[a,b];h\le h_{max}$. 
\end{itemize}
\begin{thm}
(Consistency + asymptotic stability $\rightarrow$ bound): In the
situation given by the definitions and where the conditions hold,
then for $x_{n}\in[a,b]$ and arbitrary $h\le H$, $|y_{n}-y(x_{n})|\le h^{p}ME_{L}(x_{n}-x_{0})$
where $E_{L}$ is a \emph{Lipschitz function}, that is $E_{L}(x)=\frac{e^{Lx}-1}{L}$
for $L>0$ and $E_{L}(x)=x$ for $L=0$. \label{thm:fmwk}
\end{thm}
\begin{proof}
We have (1) $y_{n+1}-y_{n}=h\Phi_{h}(x_{n},y_{n})$, (2) $y(x_{n+1})=y(x_{n})+h\Delta(x_{n},y(x_{n}))$,
(3) $e_{n}=y_{n}-y(x_{n})$, and (4) $e_{n+1}=y_{n+1}-y(x_{n}+1)$.
Subtract (2) from (1) to get $y_{n+1}-y_{n}-y(x_{n+1})=h\Phi_{h}(x_{n},y_{n})-y(x_{n})-h\Delta(x_{n},y(x_{n}))$,
plug in (3) and (4) to get 

\begin{eqnarray*}
e_{n+1} & = & e_{n}+h\Phi_{h}(x_{n},y_{n})-h\Delta_{h}(x_{n},y(x_{n}))\\
 & = & e_{n}+h[\Phi_{h}(x_{n},y_{n})-\Phi_{h}(x_{n},y(x_{n}))+\Phi_{h}(x_{n},y(x_{n}))-\Delta_{h}(x_{n},y(x_{n}))]\\
 & = & e_{n}+h[\Phi_{h}(x_{n},y_{n})-\Phi_{h}(x_{n},y(x_{n}))+\tau_{h}]\end{eqnarray*}
By $|\Phi_{h}(x,z)-\Phi_{h}(x,y)|\le L|z-y|$ and $\tau_{h}\le Mh^{p}$
obtain $|e_{n+1}|\le|e_{n}|+hL|e_{n}|+h^{p+1}M$. This inequality
is of the form $|e_{n+1}|\le a|e_{n}|+b$ where $a=1+hL$, $b=h^{p+1}M$. 

Now apply the recurrent inequality \ref{thm:rec-ineq}, and use that
$e_{0}=0$:

Case $L\ne0$:

\begin{eqnarray*}
|e_{n}| & \le & |e_{0}|(1+hL)^{n}+((1+hL)^{0}+\ldots+(1+hL)^{n-1})\cdot h^{p+1}M\\
 & = & ((1+hL)^{0}+\ldots+(1+hL)^{n-1})\cdot h^{p+1}M\\
 & \le & h^{p+1}M\frac{e^{nhL}-1}{Lh}\\
 & = & h^{p}M\frac{e^{nhL}-1}{L}\\
 & = & h^{p}M\frac{e^{(x_{n}-x_{0})L}-1}{L}\end{eqnarray*}


Case $L=0$:

\begin{eqnarray*}
|e_{n}| & \le & nh^{p+1}M\end{eqnarray*}

\end{proof}
\begin{rem}
Thus the order of convergence is the order of consistency: $|e_{n}|\in O(h^{p})$.
(In the case $L=0$ it is even better.) 

If, e.g. for constructive motivations or for the following corollary,
one wants to avoid the case distinction on $L=0$, then one always
can choose $L\ge2^{-k}>0$.
\end{rem}
\begin{cor}
Given a specified error limit $e_{max}$ and $L>0$, one can choose
\begin{eqnarray*}
N & \ge & (\frac{x_{n}-x_{0}}{(e_{max})^{\frac{1}{p}}})M\frac{e^{(x_{n}-x_{0})L}-1}{L}\end{eqnarray*}
 to create an equidistant partition $x_{0}<x_{1}=x_{0}+h<\ldots<x_{N}=x_{0}+N\cdot h=b$
such that for all $n=1,\ldots,N$ the total error after the $n$th
step $e_{n}$ is less than \textup{$e_{max}$.}
\end{cor}
\begin{proof}
We want that $h^{p}M\frac{e^{(x_{n}-x_{0})L}-1}{L}=(\frac{x_{n}-x_{0}}{N})^{p}M\frac{e^{(x_{n}-x_{0})L}-1}{L}\le(e_{max})^{\frac{1}{p}}$,
hence $(\frac{x_{n}-x_{0}}{N})M\frac{e^{(x_{n}-x_{0})L}-1}{L}\le(e_{max})^{\frac{1}{p}}$,
this is satisfied by $N\ge(\frac{x_{n}-x_{0}}{(e_{max})^{\frac{1}{p}}})M\frac{e^{(x_{n}-x_{0})L}-1}{L}$.
\end{proof}

\subsubsection{Application of \ref{thm:fmwk}.}

\begin{thm}
\label{thm:modeulerL} Let $L_{k}=\sup_{x\in[a,b],y\in[-\infty,\infty]}|\frac{\partial}{\partial y}f^{(k)}(x,y)$.
For the modified Euler method, $L\le(1+\frac{hL_{0}}{2})L_{0}$.
\end{thm}
\begin{proof}
\cite[p. 74-75]{Henrici1962discrete}. \begin{eqnarray*}
L & |\dot{y}-y|\le & |f(x+\frac{h}{2},\dot{y}+\frac{h}{2}f(x,\dot{y})-f(x+\frac{h}{2},y+\frac{h}{2}f(x,y))|\\
 & \le & L_{0}|\dot{y}-y+\frac{h}{2}(f(x,\dot{y})-f(x,y))|\\
 & \le & (1+\frac{hL_{0}}{2})L_{0}|\dot{y}-y|\end{eqnarray*}

\end{proof}
\begin{thm}
Let $L_{k}=\sup_{x\in[a,b],y\in[-\infty,\infty]}|\frac{\partial}{\partial y}F^{(k)}(x,y)$.
For Moore's first order method, $L\le(1+hA)L_{0}$.
\end{thm}
\begin{proof}
Adapt \ref{thm:modeulerL} to interval analysis: \begin{eqnarray*}
L & |Y_{n}|\le & |F(X_{n},[0,h]\cdot F(A))|\\
 & \le & L_{0}|Y_{n}+[0,h]\cdot F(A)|\\
 & \le & (1+hA)L_{0}|Y_{n}|\end{eqnarray*}

\end{proof}
\begin{thm}
For the modified Euler method, an $M$ can be found.
\end{thm}
\begin{proof}
Given in \cite[p. 76]{Henrici1962discrete}.
\end{proof}
\begin{claim}
We also can find an $M$ for the Moore first order method. Moreover,
any one step method that can be analysed by the framework can be ported
from $IA$ to $AA$ and $MA$.

Rationale: \cite{Henrici1962discrete} gives $L$s and $M$s for quite
a few similar methods of both low and higher order (although it is
tedious). We also see that the framework is mainly based on the algebraic
(representation-independent) property of \ref{thm:rec-ineq}. Conversion
of a proof in a similar form based on the fundamental lemma from $AA$
to $MA$ is demonstrated in \cite{Schwichtenberg2005constructive}.
\end{claim}

\subsection{Higher order methods}


\subsubsection{Taylor theorem\label{sub:Taylor's-theorem-kth}}

\begin{thm}
Taylor theorem with Lagrange remainder, one dimension: \label{thm:taylor1dim}Let
$f:I\rightarrow\mathbb{R}$ be $(k+1)$ times differentiable. Moreover
let $x_{0}\in I$ be fixed. Then for each $x\in I$ there is at least
one $\xi$ between $x_{0}$ and $x$ such that

$\varphi(x)=\sum_{i=0}^{k}\frac{\varphi^{(i)}(x_{0})}{i!}(x-x_{0})^{i}+R_{k+1}(x)$

where

$R_{k+1}(x)=\frac{\varphi^{(k+1)}(\xi)}{(k+1)!}(x-x_{0})^{k+1}$ is
the remainder.
\end{thm}
\begin{proof}
If $x=x_{0}$ then $R_{k+1}(x)=0$ and the theorem holds.

Let $x\ne x_{0}$, then we are looking for a $c\in\mathbb{R}$ such
that: 

$\varphi(x)=\sum_{i=0}^{k}\frac{\varphi^{(i)}(x_{0})}{i!}(x-x_{0})^{i}+c(x-x_{0})^{k+1}$. 

Now fix $x$ and $c$, then we get 

$\tilde{\varphi}(z)=\sum_{i=0}^{k}\frac{\varphi^{(i)}(z)}{i!}(x-z)^{i}+c(x-z)^{k+1}$
on $I$. Hence $\tilde{\varphi}(x)=\varphi(x)$ and $\tilde{\varphi}(x_{0})=\varphi(x)$,
hence $\tilde{\varphi}(x)=\tilde{\varphi}(x_{0})$. According to Rolle's
theorem there exists a $\xi$ between $x$ and $x_{0}$ with $\tilde{\varphi}'(\xi)=0$.
Then $\tilde{\varphi}'(z)$ for arbitrary $z\in I$ has the value
\begin{eqnarray*}
\tilde{\varphi}'(z) & = & (\frac{\varphi^{(0)}(z)}{0!}+\frac{\varphi^{(1)}(z)}{1!}(x-z)+\ldots+\frac{\varphi^{(k)}(z)}{k!}(x-z)^{k}+c(x-z)^{k+1})'\\
 & = & (\frac{\varphi^{(1)}(z)}{0!}+(\frac{\varphi^{(2)}(z)}{1!}(x-z)+\frac{\varphi^{(1)}(z)}{1!}(-1))\\
 &  & +(\frac{\varphi^{(3)}(z)}{2!}(x-z)^{2}+\frac{\varphi^{(2)}(z)}{2!}(-2)(x-z))\ldots\\
 &  & +(\frac{\varphi^{(n+1)}(z)}{n!}(x-z)^{n}+\frac{\varphi^{(k)}(z)}{k!}(-k)(x-z)^{k})\\
 &  & +(0+c(x-z)^{k+1}(-k-1)))\\
 & = & \frac{\varphi^{(k+1)}(z)}{k!}(x-z)^{k}-c(k+1)(x-z)^{k}\end{eqnarray*}
When $z=\xi$ we have $\tilde{\varphi}'(z)=0,$ hence $c=\frac{\varphi^{(k+1)}(z)}{(k+1)!}$.
\end{proof}
\begin{rem}
Proofs of the Taylor theorem are of course ubiquitous (although those
leading to the Cauchy integral remainder seem to be somewhat more
in fashion than the Lagrange variety), for the record the above version
has been based on \cite{Barwolff2006hohere}.
\end{rem}

\subsubsection{Runge lemma to compute Taylor coefficients $\varphi^{(k)}(x)$.}

\begin{rem}
At a first look, because $f(x,y)$ is a function of two variables,
a two-dimensional version of the Taylor theorem might be considered.
However, given that $y(x)$ is dependent on $x$, we define $\varphi(x)=f(x,y(x)),$
so that it is sufficient to look at how the system evolves depending
on $x$, and once we have the derivatives $\frac{d^{k}\varphi}{dx^{k}}$
we can directly apply the Taylor theorem for one dimension. (Incidentally,
proofs of the Taylor theorem for several dimensions such as in \cite{Corwin1979calculus}
follow a similar strategy: introduce a parametrization of the several
variables to get a dependence on a single variable, then also apply
the Taylor theorem for one dimension.)
\end{rem}
\begin{lem}
\label{lem:runge}(Runge) Let $\varphi(x)=f(x,y(x))$. Then $\frac{d\varphi}{dx}=\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f)$.
\end{lem}
\begin{proof}
Let $Y(x)=y(x)$ and $X(x)=x$. Then $\varphi(x)=f(Y(x),X(x))$. Hence
by the chain rule, $\frac{d}{dx}\varphi=\frac{\partial f}{\partial X}\frac{dX}{dx}+\frac{\partial f}{\partial Y}\frac{dY}{dx}=\frac{\partial f}{\partial x}\frac{dx}{dx}+\frac{\partial f}{\partial y}\frac{dy}{dx}=\frac{\partial f}{\partial x}+\frac{\partial f}{\partial y}f$. 
\end{proof}
\begin{cor}
Let $\varphi(x)=f(x,y(x))$. Then $\frac{d^{k+1}}{dx^{k+1}}\varphi=\frac{\partial}{\partial x}(\frac{d^{k}}{dx^{k}}\varphi)+(f)\cdot\frac{\partial}{\partial y}(\frac{d^{k}}{dx^{k}}\varphi)$.
\end{cor}
\begin{proof}
Induction, base case see \ref{lem:runge}. Step case: again, let $Y(x)=y(x)$
and $X(x)=x$. Let $\varphi(x)=\frac{d^{k}}{dx^{k}}\varphi$. Hence
by the chain rule, $\frac{d^{k+1}}{dx^{k+1}}\varphi=\frac{\partial}{\partial X}(\frac{d^{k}}{dx^{k}}\varphi)\frac{dX}{dx}+\frac{\partial}{\partial Y}(\frac{d^{k}}{dx^{k}}\varphi)\frac{dY}{dx}=\frac{\partial}{\partial x}(\frac{d^{k}}{dx^{k}}\varphi)\frac{dx}{dx}+\frac{\partial}{\partial y}(\frac{d^{k}}{dx^{k}}\varphi)\frac{dy}{dx}=\frac{\partial}{\partial x}(\frac{d^{k}}{dx^{k}}\varphi)+(f)\cdot\frac{\partial}{\partial y}(\frac{d^{k}}{dx^{k}}\varphi)$. 
\end{proof}
\begin{example}
Let us look at some examples of nested taylor coefficients:\label{exa:horner-terms}
\end{example}
\begin{itemize}
\item $\varphi(x)=f(x,y(x))$
\item $\frac{d\varphi}{dx}=\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f)$
\item $\frac{d^{2}\varphi}{dx^{2}}=\frac{\partial}{\partial x}(\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f))+(f)\cdot\frac{\partial}{\partial y}(\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f))$
\item $\frac{d^{3}\varphi}{dx^{3}}=\frac{\partial}{\partial x}(\frac{\partial}{\partial x}(\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f))+(f)\cdot\frac{\partial}{\partial y}(\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f)))+(f)\cdot\frac{\partial}{\partial y}(\frac{\partial}{\partial x}(\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f))+(f)\cdot\frac{\partial}{\partial y}(\frac{\partial}{\partial x}(f)+(f)\cdot\frac{\partial}{\partial y}(f)))$
\end{itemize}
\begin{algorithm}
Extracting the computational content of \ref{lem:runge} gives the
\emph{Runge operator}: $\mathcal{R}:=\frac{\partial}{\partial x}+f\frac{\partial}{\partial y}$.\label{alg:Runge-op}
\end{algorithm}
\begin{rem}
We have called \ref{lem:runge} the \emph{Runge lemma} because its
operator \ref{alg:Runge-op} is used in \cite[p. 287]{Runge1924vorlesungen}.
\end{rem}

\subsubsection{Detour: Implementation and higher-order Taylor coefficients as monomials\label{sub:Generation-of-Taylor}}

\begin{rem}
For \cite{Blasum2006moore}, we have first (taylor\_tower.py) generated
Taylor coefficients as strings (by repeated application of the Runge
operator \ref{alg:Runge-op}) and then used the GiNaC \cite{Bauer2006ginac}
computer algebra system (CAS) via its swiginac python interface \cite{Skavhaug2006}
for symbolic differentiation and floating-point evaluation of the
directly terms. Once the terms had been evaluated into floating point,
further calculation was done with standard python floating point IEEE-754
{}``double precision'' operations. Matplotlib \cite{Hunter2006matplotlib}
has been used for generation of plots.

One can see that the general terms become huge soon: The general one-and-half
pages long ninth derivative of which a special case is used in the
example by Moore \cite[p. 101]{Moore1966interval} is given in the
section \ref{taylor9}.

Moreover, \ref{exa:Direct-generation-of}we have also implemented
a representation of Taylor terms as a sum of monomials (taylor\_flat.py).
The expressions are even more unwieldy:
\end{rem}
\begin{example}
\label{exa:Direct-generation-of}Direct generation of a sum of monomials
($\varphi$ as in \ref{exa:horner-terms})
\end{example}
\begin{itemize}
\item $\frac{d\varphi}{dx}=(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot1)+(({\frac{\partial^{1}(f)}{\partial x^{1}\partial y^{0}}})\cdot1)$
(2 terms)
\item $\frac{d^{2}\varphi}{dx^{2}}=(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{0}\partial y^{2}}})\cdot1)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot1)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{1}\partial y^{1}}})\cdot2)+(({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{1}\partial y^{0}}})\cdot1)+(({\frac{\partial^{2}(f)}{\partial x^{2}\partial y^{0}}})\cdot1)$
(5 terms)
\item $\frac{d^{3}\varphi}{dx^{3}}=(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{3}(f)}{\partial x^{0}\partial y^{3}}})\cdot1)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{0}\partial y^{2}}})\cdot4)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{3}(f)}{\partial x^{1}\partial y^{2}}})\cdot1)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{3}(f)}{\partial x^{1}\partial y^{2}}})\cdot2)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot1)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{1}\partial y^{1}}})\cdot3)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{1}\partial y^{1}}})\cdot2)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{1}\partial y^{0}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{0}\partial y^{2}}})\cdot3)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{3}(f)}{\partial x^{2}\partial y^{1}}})\cdot1)+(({\frac{\partial^{0}(f)}{\partial x^{0}\partial y^{0}}})\cdot({\frac{\partial^{3}(f)}{\partial x^{2}\partial y^{1}}})\cdot2)+(({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{1}(f)}{\partial x^{1}\partial y^{0}}})\cdot1)+(({\frac{\partial^{1}(f)}{\partial x^{0}\partial y^{1}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{2}\partial y^{0}}})\cdot1)+(({\frac{\partial^{1}(f)}{\partial x^{1}\partial y^{0}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{1}\partial y^{1}}})\cdot1)+(({\frac{\partial^{1}(f)}{\partial x^{1}\partial y^{0}}})\cdot({\frac{\partial^{2}(f)}{\partial x^{1}\partial y^{1}}})\cdot2)+(({\frac{\partial^{3}(f)}{\partial x^{3}\partial y^{0}}})\cdot1)$
(15 terms)
\end{itemize}
\begin{rem}
For the fourth, fifth, sixth, seventh, eighth differential of $\varphi$
we counted 51,156,579,2175 and 8131 terms respectively, hence these
will not be reproduced here. We have not further followed this alley.
\end{rem}
The generation of Taylor coefficients as described by \cite{Barton1971taylor,Chavez2006alibrary,Gibbons1960aprogram,Moore1965theautomatic}
uses a different method: differentiation is directly applied to the
terms of the formula's parse tree. That method has the drawback that
for each elementary function (sin, cos, exp, log, $\ldots$) a derivation
has to be implemented. Of course, it runs on much less memory than
our wasteful approach where in many examples the higher derivatives
will just vanish, e.g. in the explosion equation $y'=y^{2}$ that
was chosen as an example by Moore (see below).

However, our brute-force approach also revealed a weakness of the
Taylor coefficient method where (when the ODE depends on a right-hand
side containing both the dependent as well as independent variable)
most of the terms will actually have to be produced.

In a practical implementation, instead of writing stand-alone code
for each elementary function, one also could attempt to use built-in
derivations and series expansions of a CAS (such as provided by GiNaC),
but these do not (yet) always allow a distinction between the dependent
and independent variable.


\subsubsection{General $k$-th order methods\label{sub:kmoore}}

\begin{defn}
$k$-th order Approximation of an IVP equation by Taylor's theorem:

$\Phi(x)=\sum_{i=0}^{k}\frac{f^{(i-1)}(x_{0})}{i!}(x-x_{0})^{i}+R_{k+1}(x)$
where $R_{k+1}(x)=\frac{f^{(k)}(\xi)}{(k+1)!}(x-x_{0})^{k+1}$ is
the remainder.
\end{defn}
\begin{rem}
Rationale for the definition: set $f^{(i)}=\varphi^{(i+1)}$ in \ref{thm:taylor1dim}. 

For the approximation of linear problems, the Taylor series approximation
is a generalization of the approximation by tangents (the approximation
by tangents can be seen as a Taylor development of order $1$). Already
Euler had considered using a Taylor approximation of order $k$ instead
of a tangent approximation in \cite{Euler1913institutio}. 
\end{rem}

\subsubsection{Runge-Kutta: Motivation and generalities \label{sub:Runge-Kutta-methods}}

\begin{rem}
We have seen that methods with high local accuracy also may carry
over to methods with high global accuracy. Therefore, one can use
Taylor methods to achieve high local accuracy.

However, in section \ref{sub:Generation-of-Taylor} it was shown that
Taylor expansions of high order soon become difficult to handle. For
motivation (which is not needed for the numerical analyst, but might
be of interest to the logician), we follow \cite{Hairer1987solving}:

When there is solely an independent variable (quadrature problem),
an estimation on an equidistant partition of an interval $x_{0},x_{1},x_{2},\ldots$
of accuracy in $O(h^{2})$ can be obtained by the midpoint rule:

\begin{eqnarray*}
y_{1} & = & y_{0}+hf(x_{0}+\frac{h}{2})\\
y_{2} & = & y_{1}+hf(x_{1}+\frac{h}{2})\\
 & \cdots\\
y_{n-1} & = & y_{n-2}+hf(x_{n-2}+\frac{h}{2})\end{eqnarray*}


When this is generalized to a differential equation, we would like
to include the dependent variable, to have:

\begin{eqnarray*}
y_{1} & = & y_{0}+hf(x_{0}+\frac{h}{2},y_{0}(x+\frac{h}{2}))\\
y_{2} & = & y_{1}+hf(x_{1}+\frac{h}{2},y_{1}(x+\frac{h}{2}))\\
 & \cdots\\
y_{n-1} & = & y_{n-2}+hf(x_{n-2}+\frac{h}{2},y_{n-2}(x+\frac{h}{2}))\end{eqnarray*}


By one small $(\frac{h}{2})$ Euler step obtain $y_{0}(x+\frac{h}{2}):=y_{0}+\frac{h}{2}f(x_{0},y_{0})$,
hence get 

\begin{eqnarray*}
k_{1} & = & f(x_{0},y_{0})\\
k_{2} & = & f(x_{1}+\frac{h}{2},y_{0}+\frac{h}{2}k_{1})\\
y_{1} & = & y_{0}+hk_{2}\end{eqnarray*}


This is also written as Butcher array as:

\begin{center} \renewcommand\arraystretch{2} 
\begin{tabular}{c|cc} $c_{1}=0$& & \tabularnewline $c_{2}=\frac{1}{2}$& $a_{21}=\frac{1}{2}$& \tabularnewline \hline & $b_{1}=0$& $b_{2}=1$\tabularnewline \end{tabular} 
\end{center} 

The left column entries of this Butcher array $(c_{1},c_{2})$ and
refer to the offset (times $h$) of $x$, the upper right square ($a_{21}$)
refers to the weight of the previous substeps in the next evaluation,
the bottom row entries $(b_{1},b_{2})$ refer to the weight to in
the next full step.

A popular higher-order method is RK4 where the Butcher array

\begin{center} \renewcommand\arraystretch{2} 
\begin{tabular}{c|cccc} $0$& & & & \tabularnewline $\frac{1}{2}$& $\frac{1}{2}$& & & \tabularnewline $\frac{1}{2}$& $0$& $\frac{1}{2}$& & \tabularnewline $1$& $0$& $0$& $1$& \tabularnewline \hline & $\frac{1}{6}$& $\frac{2}{6}$& $\frac{2}{6}$& $\frac{1}{6}$\tabularnewline \end{tabular}
\end{center} 

becomes:

\begin{eqnarray*}
\kappa_{1} & := & h\cdot f(x_{n},y_{n})\\
\kappa_{2} & := & h\cdot f(x_{n}+\frac{h}{2},y_{n}+\frac{1}{2}\kappa_{1})\\
\kappa_{3} & := & h\cdot f(x_{n}+\frac{h}{2},y_{n}+\frac{1}{2}\kappa_{2})\\
\kappa_{4} & := & h\cdot f(x_{n}+h,y_{n}+\kappa_{3})\\
y_{n+1} & := & y_{n}+\frac{\kappa_{1}+2\kappa_{2}+2\kappa_{3}+\kappa_{4}}{6}\end{eqnarray*}


The order of accuracy can be estimated by Taylor development of the
approximation versus the Taylor development of the exact formula (which
serves as construction guideline).
\end{rem}

\subsubsection{Taylor versus Runge-Kutta: $\frac{dy(x)}{dx}=y^{2}(x)$ and equations
from Hull test suite\label{sub:Hull}}

\begin{rem}
The aim of this subsection is to countercheck by implementation the
accuracy of our Taylor approximation, it is \emph{not} to provide
a serious benchmark. We take $\frac{dy(x)}{dx}=y^{2}(x)$ ({}``explosion
equation'') with the initial value $y(0)=1$ as an example.

In the diagram the line is the analytical solution $\frac{1}{1-x}$,
the circles are the fourth order Taylor approximation, the triangles
are from Runge-Kutta RK4. In the diagram given first the circles and
triangles overlap, only in the second diagram (error in comparison
to the analytical solution) they become more separated.
\end{rem}
\includegraphics[width=10cm]{ode/ode-plot-Mooreexplosion-4th_values}

\begin{lyxcode}
Error:
\end{lyxcode}
\includegraphics[width=10cm]{ode/ode-plot-Mooreexplosion-4th_error}

\begin{rem}
We see that our Runge-Kutta implementation is slightly more accurate
than our Taylor implementation (both are of order four).
\end{rem}
\begin{lyxcode}
\newpage{}
\end{lyxcode}
\begin{rem}
Moreover, Hull \cite{Hull1972comparing} gives examples for testing
ODE solvers, from which we have picked examples consisting of a single
equation (not of a system of equations). On most of the examples,
Runge-Kutta was slightly better, for the record the logistic curve
was an exception. (Again, as mentioned before, the aim of the implementation
was to check the correctness of Taylor coefficient generation, not
to do representative benchmarks - those should have also included
systems of equations.)

Logistic curve $y'=y/4(1-y/20)$

In the diagram the line is the analytical solution $\frac{20}{1+19\cdot e^{-x^{4}}}$,
the circles are the fourth order Taylor approximation, the triangles
are from Runge-Kutta RK4.
\end{rem}
\begin{lyxcode}
\includegraphics[width=10cm,keepaspectratio]{ode/ode-plot-Hulllogisticcurve-4th_values}

Error:

\includegraphics[width=10cm,keepaspectratio]{ode/ode-plot-Hulllogisticcurve-4th_error}
\end{lyxcode}
\begin{rem}
Henrici's framework \ref{thm:fmwk} also can be applied to Runge-Kutta
methods (see \cite{Henrici1962discrete}).

Interest in Runge-Kutta methods has been very broad. For surveys on
Taylor methods see \cite{Barton1971taylor,Moore1994numerical} and
\cite[pp. 146-151]{Butcher1987thenumerical}. Approximation of ODEs
by Taylor series is recommended by \cite{Fehlberg1963rungekutta,Fehlberg1964zurnumerischen}
if frequent change of step size in stiff equations makes Runge-Kutta
difference schemes infeasible. Also, more recently, physicists have
been using Taylor models in beam theory \cite{Chavez2006alibrary}.
\end{rem}
\begin{lyxcode}

\end{lyxcode}

\subsection{Variable stepsize}


\subsubsection{Runge-Kutta: Error estimation and step size adaption}

\begin{rem}
Calculating the exact Taylor development of a formula for error checking
is tedious. One can get an error estimate by calculating two approximate
solutions $y_{n}$ and $\hat{y}_{n}$ so that $y_{n}-\hat{y}_{n}$
gives an estimate of the less accurate of the two solutions (let $y_{n}$
designate less accurate one, $\hat{y_{n}}$ the more accurate one).
One may now specify a total error $e$, and choose a step size $\tilde{h}$
such that $(\frac{\tilde{h}}{h})^{k}|\hat{y}_{n}-y_{n}|\approx e$.
\end{rem}
\begin{defn}
The method of Dormand/Prince RK5(4)7M (also called: DOPRI 5(4)) \cite{Ascher1998computer,Dormand1980amethod}
is given by the following Butcher array. $\hat{b}_{i}$ is given in
the last row, $b_{i}$ in the preceding row. Use $\hat{b}_{i}$ for
carrying through calculations, $b_{i}$ only for error checking: evaluate
$k_{i}=h_{n}f(\hat{y}_{n}+\sum_{j=1}^{i-1}a_{ij}k_{j})$ and compare 

\begin{eqnarray*}
y_{n+1} & = & \hat{y}_{n}+\sum_{i=1}^{m}b_{i}k_{i}\\
\hat{y}_{n+1} & = & \hat{y}_{n}+\sum_{i=1}^{m}\hat{b}_{i}k_{i}\end{eqnarray*}


as defined in \begin{center} \renewcommand\arraystretch{2} \begin{tabular}{c|ccccccc} $0$& & & & & & & \tabularnewline $\frac{1}{5}$& $\frac{1}{5}$& & & & & & \tabularnewline $\frac{3}{10}$& $\frac{3}{10}$& $\frac{9}{40}$& & & & & \tabularnewline $\frac{4}{5}$& $\frac{44}{45}$& $-\frac{56}{15}$& $\frac{32}{9}$& & & & \tabularnewline $\frac{8}{9}$& $\frac{19372}{6561}$& $-\frac{25360}{2187}$& $\frac{64448}{6561}$& $-\frac{212}{729}$& & & \tabularnewline $1$& $\frac{9017}{3168}$& $-\frac{355}{33}$& $\frac{46732}{5247}$& $\frac{49}{176}$& $-\frac{5103}{18656}$& & \tabularnewline $1$& $\frac{35}{384}$& $0$& $\frac{500}{1113}$& $\frac{125}{192}$& $-\frac{2187}{6784}$& $\frac{11}{84}$& \tabularnewline \hline & $\frac{5179}{57600}$& $0$& $\frac{7571}{16695}$& $\frac{393}{640}$& $-\frac{92097}{339200}$& $\frac{187}{2100}$& $\frac{1}{40}$\tabularnewline \hline & $\frac{35}{384}$& $0$& $\frac{500}{1113}$& $\frac{125}{192}$& $-\frac{2187}{6784}$& $\frac{11}{84}$& $0$\tabularnewline \end{tabular} \end{center} 
to decide whether $y_{n+1}-\hat{y}_{n+1}$ is larger than a prescribed
error bound. If so, set $h_{n+1}=\frac{h_{n}}{2}$. On the other hand,
try double $h_{n+1}$ if within the error bound. More elaborate strategies
are possible.
\end{defn}
\begin{rem}
The $\hat{b}_{i}$ have order $4$, the $b_{i}$ have order $5$.
\end{rem}

\subsubsection{$k$-th order method by Moore \cite[p. 100]{Moore1966interval} \cite[p. 98]{Moore1965theautomatic}}

\begin{rem}
The $k$-th order method is a higher order variable stepsize method
with alterations on the interval $A$ where it is defined, so that
(unlike the first order method) even with $k=1$ it behaves like a
modified Euler method \ref{alg:modif-euler}.
\end{rem}
\begin{defn}
Define $F$ as in section \ref{sub:IA-approximate-solution}. In addition
assume differentiability and the Lipschitz condition $w(F^{(j)}(Y))\le L_{j}w(Y)$.
We now march through the interval:

Rather than subdividing the interval into equal sections as in \ref{sub:IA-approximate-solution}
we choose $h=(\frac{\epsilon|y_{o}|k!}{|F^{(k)}(y_{0})|})^{\frac{1}{k+1}}$
if $(\frac{\epsilon|y_{o}|k!}{|F^{(k)}(y_{0})|})^{\frac{1}{k+1}}<\frac{|y_{0}|}{|F^{(0)}(y_{0})|}$
or $h=(\frac{\epsilon|F^{(0)}(y_{0})|k!}{|F^{(k)}(y_{0})|})^{\frac{1}{k}}$
otherwise. 

We put $A^{(0)}=y_{0}+F^{(0)}(y_{0})[0,h]$.

If $Y([0,h])\subseteq A^{(0)}$, then set $A^{(1)}=Y([0,h])$ and
compute $\frac{F^{(K-1)}(A^{(1)})}{K!}\subseteq\frac{F^{(K-1)}(A^{(0)})}{K!}$.
We take as our interval solution in $[0,h]$ the interval-vector function
$Y(x)=y_{0}+\frac{F^{(K-1)}(A^{(1)})}{K!}x^{K}+\sum_{j=1}^{K-1}\frac{F^{(j-1)}(y_{0})}{j!}x^{j}$.

If $Y([0,h])\not\subseteq A^{(n)}$, then in alternating mode put
$A^{(n+1)}=Y([0,h])$ and (again) compute $Y([0,h])$ from $Y(x)=y_{0}+\frac{F^{(K-1)}(A^{(1)})}{K!}x^{K}+\sum_{j=1}^{K-1}\frac{F^{(j-1)}(y_{0})}{j!}x^{j}$;
otherwise replace $h$ by $\frac{h}{2}$ and replace $A^{(n+1)}$
by $Y([0,h])$ instead of $A^{(1)}$ until we reach quantities such
that $Y([0,h])\subseteq A^{(r)}$. (It is guaranteed that this condition
will be reached by the Lipschitz condition.) 

It may take several shrinkings of $h$ and increasements of $A$ until
a fitting pair of $h$,$A$ is selected.
\end{defn}
\begin{rem}
For the generation of Taylor coefficients, it suffices to have a notion
of derivatives of polynomials (such as given in \ref{sub:Derivative}).
\end{rem}

\paragraph{Pseudocode (Numbering of assignments (10-...) follows the description
(pp. 102-105) and the flow chart (p. 142) in \cite{Moore1966interval}.)}

\begin{lyxcode}
function~taylor($x,A,F,y_{0}$)~\{~

//$x$~is~point~of~approximation,~$A$~is~the~domain

//$F$~the~taylor~coefficients,~$y_{0}$~the~IC

~~~~$Y(x):=y_{0}+\frac{F^{(K-1)}(A)}{K!}x^{K}+\sum_{j=1}^{K-1}\frac{F^{(j-1)}(y_{0})}{j!}x^{j}$~~//10-40

~~~~return~$Y(x)$

\noindent \}



\noindent function~step($F,y_{0},\epsilon,h,p$)~\{

~~~~//~$F$~is~the~ODE,~$y_{0}$~the~IC,~$\epsilon$~the~error

~~~~//~$h$~the~progression~done,~$p$~are~the~desired~points

~~~~$alternate:=1$

~~~~//10-38

~~~~if~($(\frac{\epsilon\cdot|y_{0}|)}{|F^{(K)}(y_{0})/K!|})<\frac{|y_{0}|}{|F^{(0)}(y_{0})|}$)~then~

~~~~~~~~\{$h:=(\frac{\epsilon|y_{o}|}{|F^{(K)}(y_{0})/K!|})^{\frac{1}{K+1}}$\}

~~~~else

~~~~~~~\{$h:=(\frac{\epsilon|F^{(0)}(y_{0})|}{|F^{(K)}(y_{0})/K!|})^{\frac{1}{K}}$\}

~~~~$A:=F^{(0)}(y_{0})[0,h]$~//10-39

~~~~$Y:=taylor([0,h],A,F,y_{0})$~//10-40

~~~~while~$(Y\subseteq A)$~\{

~~~~~~~~//bad~luck,~$Y\not\subseteq A$,~try~correction~method

~~~~~~~~if~$(alternate=1)$~then~\{

~~~~~~~~~~~$A:=taylor([0,h],A,F,y_{0})$~//first~alternat.

~~~~~~~~~~~\textrm{$Y:=taylor$}$([0,h],A,F,y_{0})$

~~~~~~~~\}~else~\{

~~~~~~~~~~~~$h:=\frac{h}{2}$~//second~alternative

~~~~~~~~~~~~\textrm{$Y:=taylor$}$([0,h],A,F,y_{0})$

~~~~~~~~\}

~~~~~~~~$alternate:=3-alternate$~//swap~alternative~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//for~the~next~run

~~~~\}

~~~~for~each~point~$p_{i}$~in~$[0,h]$~\{~print~$(p_{j},Y(p_{i}))$~\}

~~~~$y_{0}:=taylor(h,A,F,y0,k)$

~~~~return~$(h,y_{0})$

~~~~\}

\}



\noindent function~main~()~\{

~~~~~//$f$~is~the~ODE,~$y_{0}$~the~IC,~$\epsilon$~the~allowed~error

~~~~~//$H$~the~desired~width,~$p$~vector~of~points~for~eval.

~~~~~//$k$~the~desired~width~of~Taylor~coefficients~

~~~~$F:=(F^{(0)},\ldots,F^{(K)}):=$taylor\_coefficients($f$,~$K$)~

~~~~~//$F$~is~the~approximated~ODE,~done~by~hand~in~exmpls

~~~~$h:=0$~

~~~~while~$(y_{0}[0]<H)$~\{

~~~~~~~$(progress,y_{0})$:=step~($F,y_{0},\epsilon,h$)

~~~~~$y_{0}:=y_{0}+progress$

~~~~~\}

\noindent \}
\end{lyxcode}
\begin{example}
(\cite{Moore1966interval}) Take $A=[0,2]$. Then $y(0)=y_{0}=1$
is interior to $A$. By the definition of the interval derivative
on polynomials (\ref{sub:Derivative}) we have $F(Y)=Y^{2}$, $F'(Y)=2!Y^{3},\ldots,F^{(n)}(Y)=(n+1)!Y^{n+2}$.
Then in 10-38 $h$ becomes $h=(\frac{\epsilon|y_{0}|}{|F^{(K)}(y_{0})/K!|})^{\frac{1}{K+1}}=(\frac{\epsilon*1}{|(K+1)!|/K!})^{\frac{1}{K+1}}=(\frac{\epsilon}{K+1})^{\frac{1}{K+1}}$
if $\epsilon<K+1$ (derived from $(\frac{\epsilon}{K+1})^{\frac{1}{K+1}}<1\rightarrow(\frac{\epsilon}{K+1})<1$).
Let $K>0$. Evaluation of $A$ (10-39) gives $A_{0}:=1+[0,h]=[1,1+h]$.
10-40 gives: \begin{eqnarray*}
Y_{0} & := & taylor(A_{0},([0,h]))\\
 & = & 1+[1,(1+h)^{K+1}][0,h]^{K}+\sum_{j=1}^{K-1}[0,h]^{j}\\
 & = & 1+[1,(1+h)^{K+1}h^{K}]+[0,h+h^{2}+\ldots+h^{K-1}]\\
 & = & [1,1+h+h^{2}+\ldots+h^{K-1}+h^{K}(1+h)^{K+1}]\\
 & > & A_{0}\end{eqnarray*}
Hence bad luck, $Y_{0}\not\subseteq A_{0}$. 

The correction method branches into the first alternative to set $A_{1}:=Y_{0}=[1,1+h+h^{2}+\ldots+h^{K-1}+h^{K}(1+h)^{K+1}]$.
\begin{eqnarray*}
Y_{1} & := & taylor(A_{1},([0,h]))\\
 & = & 1+[1,(1+h+h^{2}+\ldots+h^{K-1}+h^{K}(1+h)^{K+1})^{K+1}][0,h]^{K}+\sum_{j=1}^{K-1}[0,h]^{j}\\
 & > & 1+[1,(1+h+h^{2}+\ldots+h^{K-1}+h^{K}(1+h)^{K+1})^{2}h^{K}]\\
 & + & [0,h+h^{2}+\ldots+h^{K-1}]\\
 & > & 1+[1,(1+h+h^{2}+\ldots+h^{K-1}+h^{K}(1+h)^{K+1})h^{K}+h^{K+1}]\\
 & + & [0,h+h^{2}+\ldots+h^{K-1}]\\
 & = & A_{1}+[0,h^{K+1}].\end{eqnarray*}


Hence again bad luck, $Y_{1}\not\subseteq A_{1}$.

The correction method now branches into the second alternative to
set $A_{1}:=Y_{0}=[1,1+h+h^{2}+\ldots+h^{K-1}+h^{K}(1+h)^{K+1}]$.

Set $h_{1}:=\frac{h}{2}$ and step through the step loop again. Now
10-40 gives: \begin{eqnarray*}
Y & := & taylor(Y([0,\frac{h}{2}]))\\
 & = & 1+[1,(1+\frac{h}{2})^{K+1}][0,\frac{h}{2}]^{K}+\sum_{j=1}^{K-1}[0,\frac{h}{2}]^{j}\\
 & = & 1+[1,(1+\frac{h}{2})^{K+1}(\frac{h}{2})^{K}]+[0,\frac{h}{2}+(\frac{h}{2})^{2}+\ldots+(\frac{h}{2})^{K-1}]\\
 & = & [1,1+\frac{h}{2}+(\frac{h}{2})^{2}+\ldots+(\frac{h}{2})^{K-1}+(\frac{h}{2})^{K}(1+\frac{h}{2})^{K+1}]\\
 & = & [1,\frac{1-(\frac{h}{2})^{k}}{1-\frac{h}{2}}+(\frac{h}{2})^{K}(1+\frac{h}{2})^{K+1}]\end{eqnarray*}


Assume $h\in(0,2]$. Then $Y\le[1,\frac{1-(\frac{(0,2]}{2})^{k}}{\frac{(0,2]}{2}}+(\frac{(0,2]}{2})^{K}(1+\frac{(0,2]}{2})^{K+1}]\le[1,\frac{1-(\frac{2}{2})^{k}}{\frac{2}{2}}+(\frac{2}{2})^{K}(1+\frac{2}{2})^{K+1}]$. 

$A=1+[0,h]$. For $\epsilon=10^{-9}$, $K=9$, we have $h=(10^{-10})^{\frac{1}{10}}=0.1$.
$w(Y(x))$ for $x\in[0,0.05]$ satisfies $w(Y(x))\le10^{-11}$.
\end{example}
\begin{rem}
At this point it would be nice (also w.r.t. to check our understanding
of Moore's algorithm) to run Moore's $k$-th order against DOPRI (not
done for lack of time).
\end{rem}

\subsection{Observations and conclusions}


\subsubsection{General one-step theory: Variable stepsize}

\begin{rem}
Again, little adoptions have to be made to look at methods of variable
stepsize \cite[p. 83]{Henrici1962discrete} via \ref{thm:fmwk}.
\end{rem}
\begin{lem}
Slope-based halving variable width-methods on a function with a Lipschitz
constant always terminate.
\end{lem}
\begin{proof}
Look at the worst-case behavior. Due to the Lipschitz constant $L$,
$h\ge2^{-ld(L)}$ ($ld$ defined in \ref{def:dyadiclog}).
\end{proof}
\begin{rem}
However, Moore's explosion example is exactly an example that is \emph{not}
globally bounded by a Lipschitz constant. It has a singularity at
$1$. Hence the algorithm can be run as long as one desires. Near
the singularity the process of the Moore expansion is dominated by
the behavior of $A$, hence it becomes an approximation in the vertical
direction. (To approximate singularities by analyzing $\frac{dx}{dy}$
instead of $\frac{dy}{dx}$ already had been proposed by \cite{Runge1924vorlesungen}).
\end{rem}

\subsubsection{Representation, realizability}

\begin{rem}
We have seen that some of the apparatus in \ref{thm:fmwk} is representation-independent.
Moreover, we note that all AA theorems on the convergence of Euler's
first order method from \cite{Hurewicz1958lectures} have been smoothly
ported to MA in \cite{Schwichtenberg2005constructive}. Moreover,
because the version of the fundamental equality is mr-realizable in
\cite{Schwichtenberg2005constructive} and this is what goes into
\ref{thm:fmwk}, we also can assume mr-realizability for one-step
ODE IVP methods.
\end{rem}
\begin{lem}
Fundamental representation lemma for IVP estimations. The exponential
part of the IVP problem error by two different representations is
identical if the Lipschitz condition is represented in the same way.
\end{lem}
\begin{proof}
Apply lemma \ref{lem:The-exponential-part}.
\end{proof}

\paragraph{Lipschitz conditions in AA, IA and MA}

\begin{defn}
In section \ref{sub:diffeqdef} for AA we stated that differential
equation satisfies a \emph{Lipschitz condition} if for different $\dot{y},\ddot{y}$
we have $f(x,\dot{y})-f(x,\ddot{y})\le L_{AA}|\dot{y}-\ddot{y}|$.

In IA this reads there is an $L_{IA}>0$ such that for all $Y\subseteq A$,
we have $w(F(Y))\le L_{IA}w(Y)$. \cite[p. 93]{Moore1966interval}

The MA version is $|f(x,\dot{y})-f(x,\ddot{y})|\le L_{MA}|\dot{y}-\ddot{y}|$.
\cite[p. 56]{Schwichtenberg2005constructive}.
\end{defn}
\begin{lem}
For intervals bounded by rationals, in IA, MA and AA the Lipschitz
condition is represented in the same way.
\end{lem}
\begin{proof}
$L_{MA}$ and $L_{AA}$ are exactly the same. Moreover, $w(F(Y))\le L_{IA}w(Y)$
implies $w(f(x,\dot{y})-f(x,\ddot{y}))\le L_{IA}w(\dot{y}-\ddot{y})$
for all $\dot{y},\ddot{y}$ by \ref{rem:Funonrat}. $f(x,\dot{y})-f(x,\ddot{y})\le L_{IA}|\ddot{y}-\dot{y}|$
if $\ddot{y}\ge\dot{y}$ (lemma \ref{lem:IA-abs-sign}). It is always
possible to pick $\dot{y},\ddot{y}$ appropriately.
\end{proof}
\begin{cor}
Representability of Lipschitz condition is sufficient for a representation
that is lossless in the exponential term.
\end{cor}

\section{Some Dialectica-realizable numerical analysis\label{sec:Some-Dialectica-realizable-numerical}}

\begin{quote}
Peano's theorem is equivalent over $RCA_{0}$ to weak König's lemma.
\cite[p. 155]{Simpson1999subsystems}
\end{quote}
In the previous section, some techniques for IVP have been omitted,
e.g. all implicit varieties of the methods. The reason for this omission
was that we did not want to have to solve nonlinear systems of equations
and stability was not an issue. However, because even the fundamental
theorem of algebra can be constructivized \cite{Kneser1981erganzung},
finding the roots of such systems is still possible in MR-realizable
analysis (although in practice using the bounds generated by constructive
FTA proof terms can lead to quite unwieldy terms). In partial differential
equations, constructive constructions for the heat and wave equations
has been given by \cite{Scedrov1984differential}. We now want to
have a take at less constructive principles.

Kohlenbach \cite{Kohlenbach1996analyzing} points out that many theorems
in classical analysis are of the form $\forall x\in X(F(x)=0\rightarrow G(x)=0)$
which in an approximated form reads as $\forall x\in X,k\in\mathbb{N}\exists n\in\mathbb{N}(|F(x)|\le2^{-n}\rightarrow|G(x)|<2^{-k})$.
As example among others he gives IVT, maximum value principle, Peano's
ODE existence theorem. It is immediate that this expression is monotone
that is when $n$ is replaced by $m\ge n$, then still $\forall x\in X,k\in\mathbb{N}\exists m\in\mathbb{N}(|F(x)|\le2^{-m}\rightarrow|G(x)|<2^{-k})$.
Kohlenbach singles out uniqueness theorems as candidates for realization
by the Dialectica interpretation. Of course all bounds of the form
$\forall n\ge N\rightarrow e<2^{-k}$ of the last chapter can be formulated
in uniqueness terms. However, the advantage of this procedure is that
it also can be applied where MR-realizability cannot be achieved.
To get a better feeling for the results of the Dialectica interpretation,
some of the Kohlenbach results in approximation theory are presented.


\subsection{Kohlenbach's bounds for approximation theory}

Kohlenbach gives examples for:

\begin{itemize}
\item Constructive bounds for $L_{\infty}$-approximation extracted from
a non-constructive proof by de la Vallée-Poussin \cite{Kohlenbach1993effective}.
\item Constructive bounds for $L_{1}$-approximation extracted from Cheney's
version of Jackson's proof \cite{Kohlenbach2002proof-l1}.
\end{itemize}
We will have a short look at the $L_{\infty}$-approximation case.


\subsubsection{$L_{\infty}$-approximation}

The existence problem for $L_{\infty}$(Chebyshev)-approximation is
handled by the Remes algorithm: arbitrarily pick $n+2$ points, solve
the system $f(x{}_{i})-p(x_{i})=E(-1)^{i}$ for $i=1,\ldots,n-2$
where $E$ is an error bound, $f(x_{i})$ the desired function, $p(x_{i})$
the approximating polynomial. Evaluate the function at other points
$p(\xi)$, and replace the nearest $x_{i}$ of the same error sign
by $\xi$ where $|f(\xi)-p(\xi)|\ge E$.

The uniqueness problem is more interesting: to what extent is a found
approximation unique? It has been covered in both a constructive setting
\cite{Bridges1981aconstructive} as well as in the Dialectica interpretation
\cite{Kohlenbach1993effective}. Because even the most pathological
examples of Chebyshev approximation have to be covered, one cannot
give direct uniqueness proofs of the polynomial itself but one can
give bounds on its unicity constant.


\subsection{Other results in approximation theory and numerical analysis that
have been first presented non-constructively, then constructively}

At this place we quickly list some results of a search for other results
of approximation theory and numerical analysis that have first been
posed non-constructively; afterwards iterative algorithms have been
developed that - given on has the time - might be interesting candidates
for proof-mining.

\begin{itemize}
\item $M$-splines have first been presented non-constructively \cite{Lucas1972msplines},
then constructivized \cite{Schaback1973konstruktion}.
\item The spectra of symmetric Toeplitz matrices have first been investigated
non-constructively \cite{Delsarte1994spectral}, then constructively
\cite{Chu1989canreal}.
\item Polynomial approximation of functions in Sobolev spaces has first
been treated non-constructively (Bramble-Hilbert lemma), then constructively
\cite{Dupont1980polynomial,Jovanic1987convergence}.
\end{itemize}

\subsection{Boundary value problems and the Dirichlet principle}

Henrici \cite[p. 345]{Henrici1962discrete} points out that for the
boundary problem $y''=f(x,y)$, $y(a)=A$, $y(b)=B$ it may happen
that there are infinitely many solutions for $y''+\pi^{2}y=0$, $y(0)=0$,
$y(1)=0$ ($y(x)=C\cdot\sin\pi x)$, but no solution for $y''+\pi^{2}y=0$,
$y(0)=0$, $y(1)=1$. This seems to point out that the within the
field of differential equations mathematical theory for boundary value
problems is much richer than for initial value problems. In general,
for boundary value problems \cite[p. 168]{Ascher1998computer} suggests
not to use explicit IVP methods but rather implicit methods (better
stability behavior).

Much constructive and non-constructive work has been done on the Dirichlet
principle. Originally, Dirichlet's problem is whether given a region
$G$ where $\Delta u=0$ and a harmonic boundary $\partial G$ where
$u=F$ always yields an existence of $u$. After being treated for
self-evident during long time 19th century potential theory, this
was subsequently revoked and refined to the condition that the boundary
had to be piecewise analytic \cite{Monna1975dirichlets}. Because
a lot of fine-tuning in the boundary conditions is possible (see \cite{Kellogg1953foundations}),
the minimization of functions on a compact function space by variational
principles looks like a good testbed for realizability. \cite{Bridges1998constructive}
uses the equivalent form $\Delta u=f$ on $G$ and $u=0$ on $\partial G$
where $f$ is an $L^{2}$-mapping on $G$. On the interval side, in
\cite[p. 102]{Moore1979methods} elliptic differential operators are
discussed.

\begin{conjecture}
\label{con:border} The Dialectica interpretation might be useful,
once we have non-local principles. One should be able to demonstrate
this on simple boundary value problems, or make use of the large amount
of work that has been done on the Dirichlet problem.
\end{conjecture}

\section{Conclusion and outlook}

We have tried to gather some evidence that the components of a (quite
randomly selected) heuristic method from interval analysis are representable
in modulus arithmetic in section and that they are representation-independent
as long as a Lipschitz condition can be expressed in \ref{sec:Some-MR-realizable-numerical}.
This could be more rigorously proven by actual implementation in a
theorem prover (it is hoped that our investigations are useful for
that). We have also quickly looked at non-realizable results in section
\ref{sec:Some-Dialectica-realizable-numerical}. Conjecture \ref{con:border}
might be worth further exploitation. (See also section \ref{sub:Original-content}
for side-products of this work).


\section{Supplementary Material\label{sec:Supplementary-Material}}

\begin{quote}
But my role is to be on the bottom of things. \cite{Knuth1993computer}
\end{quote}

\subsection{Some calculus in MA and IA\label{sub:Some-calculus-in}}

\begin{rem}
One can well integrate differential equations without having a definition
of a derivative (like done in \cite{Moore1966interval}). However,
derivatives can also be defined in MA and IA.
\end{rem}

\subsubsection{Definition of the derivative}

\begin{defn}
Let $f$ be a continuous function, and $f'$ be its derivative. Let
$x,y=x+h$ be two points where $f$ is evaluated. The Fréchet derivative
is $f(x+h)-f(x)-f'(x)h=r(h)$ with $\lim_{h\rightarrow0}\frac{r(h)}{h}=0$;
a constructive version is given in \cite[p. 44]{Bishop} as $|f(y)-f(x)-f'(x)h|\le\varepsilon|x+h|$
if $h\le\delta(\varepsilon)$ and $\varepsilon>0$. 
\end{defn}

\paragraph{MA derivative \cite[p. 32]{Schwichtenberg2005constructive}}

\begin{defn}
Let $\delta_{f'}$ be the modulus of differentiability of $f'$. Then
$|f(x+h)-f(x)-f'(x)h|\le2^{-k}$ if $h\le2^{-\delta_{f'}(k)}$.
\end{defn}

\paragraph{IA derivative \label{sub:Derivative} \cite[p. 228]{Rall1983meanvalue}}

\begin{defn}
Again use the Fréchet derivative $f(x+h)-f(x)-f'(x)h=r(h)$ with $\lim_{h\rightarrow0}\frac{r(h)}{h}=0$.
Note that the thus defined derivative is non-linear \cite{Ratschek1971uberdie}. 
\end{defn}
\begin{rem}
\cite{Moore1969functional}: In practical calculations, it will often
suffice to only define the derivative of polynomials : if $P(x)=\sum_{i=0}^{n}[a_{i},b_{i}]x^{i}$,
then the first derivative is $P'(x)=\sum_{i=0}^{n}i[a_{i},b_{i}]x^{i-1}$
and the $n$-th derivative $P^{(n)}=\sum_{i=0}^{n}i(i-1)\ldots(i-n)[a_{i},b_{i}]x^{i-n}$.
\end{rem}

\subsubsection{Mean value theorem}


\paragraph{IA mean value theorem}

\begin{thm}
If $X$ is an interval where $f$ is differentiable, and $F'$ is
an interval inclusion of $f'$ on $X$, then $f(y)-f(x)\in F'(X)(X-x)\forall x,y\in X$. 
\end{thm}
\begin{proof}
Let $h=y-x$, let $r(h)\le\frac{e}{n}$, then $\forall n:f(x+h)-f(x)\in F'(X)(X-x)+\frac{e}{n}[-1,1]$,
hence $f(x+h)-f(x)\in F'(X)(X-x)$.
\end{proof}

\subsubsection{Taylor theorem}


\paragraph{IA Taylor theorem }

\begin{thm}
If $f$ defined on $X$ is $n$ times differentiable, $F^{(n)}$ is
an interval inclusion of $f^{(n)}$ on $X$, then for $x,y\in X$:
$f(y)-f(x)-\sum_{k=1}^{n-1}\frac{1}{k!}(x)(y-x)^{k}\in\frac{1}{n!}F^{(n)}(X)(X-x)^{n}$. 
\end{thm}
\begin{proof}
Induction on $n$ using MVT, details see \cite[p. 231]{Rall1983meanvalue}.
\end{proof}

\subsubsection{IA integration: $k$-th order methods \cite[p. 75]{Moore1966interval} }

\begin{defn}
Integrate by subdividing the interval $[a,b]$ into $n$ subintervals
and expand the integrand in each subinterval into a Taylor series
with reminder. The method can be made of arbitrarily high order, because
the width of the interval can be made of the order $O(h^{k})$ for
any $k\in\mathbb{N}.$ 

Define $I:=\int_{a}^{b}f(x)dx$. Let $a=x_{0}<x_{1}<\ldots<x_{n-1}=b$

Let $I_{n,K}=\sum_{i=0}^{n-1}\sum_{r=0}^{k-1}\frac{F^{(r)}(x_{i-1})}{(r+1)!}(w([x_{i-1},x]))+\frac{1}{(k+1)!}\sum_{i=0}^{n-1}F^{(k)}([x_{i-1},x_{i}])w([x_{i-1},x_{i}])^{k+1}$.
\end{defn}
\begin{thm}
For all $n,k\ge1:I\in I_{n,k}$ and for each $k\ge1$ there exists
a positive $L_{k}$ such that $w(I_{n,k})\le L_{k}\cdot(max_{i=0,1,\ldots,n}w([x_{i-1},x])^{k+1}$. 
\end{thm}
\begin{proof}
From the Taylor theorem we obtain for each $t\in[0,w([x_{i-1},x])]$:
$f(x_{i-1}+t)=f^{(0)}(x_{i-1})+f^{(1)}(x_{i-1})t+\ldots+\frac{f^{(k-1)}(x_{i-1})t^{k-1}}{(k-1)!}+R_{i-1}^{k}(t)$
with $R_{i-1}^{k}(t)=\frac{1}{k!}f^{(k)}(x_{i-1}+\xi_{t}t)t^{k}$
for some $\xi_{t}\in[0,1]$. ...
\end{proof}

\subsubsection{Metric spaces}

\begin{defn}
For a more general setting (\cite[p. 18]{Moore1966interval}) let
$\mathcal{I}$ be a metric space on intervals generated by the distance
function $d([a,b],[c,e])=max(|a-c|,|b-e|)$. Let $I$ be an interval.
It holds that $f:I\rightarrow\mathcal{I}$ is continuous at $x_{1}\in\mathcal{I}$
if for every $\varepsilon>0$ there is $\delta>0$ such that $\forall x_{2}\in\mathcal{I}:d(x_{1},x_{2})<\delta$
implies $d(f(x_{1}),f(x_{2}))<\varepsilon$. If $f:I\rightarrow\mathcal{I}$
is a continuous real-valued function or interval-valued function,
then the extension $\bar{f}:\mathcal{I}_{I}\rightarrow\mathcal{I}$
is defined as $f(X):=\cup_{x\in X}f(x)$ for $x\in\mathcal{I}_{I}$.
Note that $\mathcal{I}$ is not a linear space.
\end{defn}

\subsection{MA example: Intermediate value theorem }

This section gives a brief flavor of algorithm extraction from a modulus
analysis proof done in \cite{Schwichtenberg2006inverting} (the coding
is manual, hence bypassing theorem prover machinery). Let $a,b$ be
rational numbers, $a<b$. Specifically, tutoring on this by Klaus
Aehlig is acknowledged. 


\subsubsection{Approximate splitting lemma}

Theorem: Let $x,y,z$ be given and assume $x<y$. Then either $x\le z$
or $z\le y$.


\subsubsection{IVTAux}

Theorem: Let $f:[a,b]\rightarrow\mathbb{R}$ be continuous, with a
uniform modulus of increase. Assume $a\le c<d\le b$, say $2^{-n}<d-c$,
and $f(c)\le0\le f(d)$. Then we can construct $c_{1},d_{1}$ with
$d_{1}-c_{1}=\frac{2}{3}(d-c)$, such that $a\le c\le c_{1}\le d_{1}\le d\le b$
and $f(c_{1})\le0\le f(d_{1})$.

Proof: Let $c_{0}=c+\frac{d-c}{3}$, $d_{0}=d-\frac{d-c}{3}$. From
$2^{-n}<d-c$ obtain $2^{-n-2}<d_{0}-c_{0}$, so $f(c_{0})<_{n+2+l}f(d_{0})$.
Compare $0$ with the interval (by ApproxSplit). If $f(d_{0})\le0$
then let $c_{1}=c_{0}$, $d_{1}=d$, if $f(c_{0})>0$ then let $c_{1}=c,$
$d_{1}=d_{0}$. 


\subsubsection{IVTAuxQ (without modulus of increase)}

Let $f:\mathbb{Q}\rightarrow\mathbb{Q}$, $c,d$ rationals with $f(c)\le0$,
$f(d)>0$. Hence we can construct Cauchy sequences $(c_{n})_{n}$,
$(d_{n})_{n}$ of rationals such that $\forall n:f(c_{n})\le0$, $f(d_{n})>0$
and such that they converge to each other such that $\forall n:f(c_{n})-f(d_{n})\le2^{-n}(f(c)-f(d))$. 

Corollary: If the $f$ can be extended to $F$ on the reals, then
$(c_{n})_{n},$ $(d_{n})_{n}$ converge to a root of $F$.

Proof: Assume $a\le c_{n}<d_{n}\le b$. Define $av=:\frac{c_{n}+d_{n}}{2}$.
Define $(c_{n+1},d_{n+1})$ to be $(av,d_{n}$) if $av\le0$, $(c_{n},av)$
otherwise.


\subsubsection{IVTcds}

Theorem: From an initial pair of rationals $(c_{0},d_{0})$ a continous
function $f:[a,b]\rightarrow\mathbb{R}$ with uniform modulus of increase
for each $k$ by repeated application of IVTaux we can construct a
sequence of length $n$ pairs of rationals such that $(c_{n-1},d_{n-1})\le2^{-k}$.


\subsubsection{IVT}

If $f:[a,b]\rightarrow\mathbb{R}$ is continuous with $f(a)\le0\le f(b)$,
and with a uniform modulus of increase, then we can find $x\in[a,b]$
such that $f(x)=0$ .


\subsubsection{Inv}

Theorem: If $f:[a,b]\rightarrow\mathbb{R}$ is continuous with $f(a)\le a'<b'\le f(b)$
and a uniform modulus of increase $l$ then we can construct a $g:[a',b']\rightarrow\mathbb{R}$
such that $f(g(y))=y$ for every $y\in[a',b']$ and $g(f(x))=x$ for
every $x\in[a,b]$ such that $a'\le f(x)\le b'$.

Proof: Construct $g:[a',b']\rightarrow\mathbb{R}$ such that $g$
is continuous. Pick a rational $u\in[a',b']$. Using $f(a)-u\le0\le f(b)-u$,
IVTcds gives an $x$ such that $f(x)-u=0$, as a Cauchy sequence $c_{n}$.
Let $h_{g}(u,n):=c_{n}$, let $\alpha_{g}$ such that for $n\ge\alpha_{g}(k)$
we have $(\frac{2}{3})^{n}(b-a)\le2^{-\omega_{f}(k+l+2)}$, for the
uniform modulus $\omega_{g}$ of continuity pick $\omega_{g}(k):=k+l+2$.
Let $a'\le u<v\le b'$ and $n\ge\alpha_{g}(k)$. 

For $c_{n}^{u}:=h_{g}(u,n)$ and $c_{n}^{v}:=h_{g}(v,n)$ assume that
$|c_{n}^{u}-c_{n}^{v}|>2^{-k}$; to show $|u-v|>2^{-\omega_{g}(k)+1}$.
Show that $|u-v|\ge|f(c_{n}^{u})-f(c_{n}^{v})|-|f(c_{n}^{u})-u|-|f(c_{n}^{v})-v|\ge2^{-k-l-1}$.
$|f(c_{n}^{u})-u|$ is obtained from $|f(c_{n}^{u})-u|\le|f(d_{n}^{u})-u|=|f(d_{n}^{u})-f(c_{n}^{u})|\le2^{-k-l-2}$,
similarly $|f(c_{n}^{v})-v|$. By the modulus of increase $l$ we
have $|f(c_{n}^{u})-f(c_{n}^{v})|\ge2^{-k-l}$. By IVT $d_{n}^{u}-c_{n}^{u}\le(\frac{2}{3})^{n}(b-a)\le2^{-\omega_{f}(k+l+2)}$. 

From $\forall n,m\ge\alpha_{f}(k+1):|f(g(u))-u|=|h_{f}(c_{n},n)-u|\le|h_{f}(c_{n},m)-h_{f}(c_{n},m)|+|h_{f}(c_{n},m)-u|\le2^{-k}$
infer $f(g(u))=u$. For every $x\in[a,b]$ with $a'\le f(x)\le b'$,
from $g(f(x))<x$ obtain the contraction $f(x)=f(g(f(x)))<f(x)$ from
the hypothesis on the slope, and similarly for $>$. Using $u\not<v\leftrightarrow v\le u$
obtain $g(f(x))=x$.


\subsubsection{InvQ}

Theorem: If $f:[a,b]\cap\mathbb{Q}\rightarrow\mathbb{Q}$ is continuous
with $f(a)\le a'<b'\le f(b)$ and $f$ injective we can construct
a $g:[a',b']\rightarrow\langle(c_{n})_{n},(d_{n})_{n}\rangle$ such
that $f(g(y))=y$ for every $y\in[a',b']$ and $g(f(x))=x$ for every
$x\in[a,b]$ such that $a'\le f(x)\le b'$. 

Proof: Let $a'\le u<v\le b'$. By IVTAuxQ for each $k$ we can construct
sequences on the rationals such that for $u-c_{n}^{u}\le d_{n}^{u}-c_{n}^{u}<2^{-k}$
and $c_{n}^{v}-v\le d_{n}^{v}-c_{n}^{v}<2^{-k}$, hence if $f(v)-f(u)>2^{-m-1}$
by choosing $n$ large enough that $|c_{n}^{u}-u|<2^{-m-2}$ and $|d_{n}^{v}-v|<2^{-m-2}$
then we can find $\varepsilon$ such that $v-u>\varepsilon$. If given
a modulus of increase $l$, then this $\varepsilon$ is $2^{-(m+1+l)}$.

Since real extensions $F,G$ of continuous functions are determined
by their values on the rationals, we also have $F(G(y))=y$ for $y\in[a',b']$.

Remark: If $f:[a,b]\cap\mathbb{Q}\rightarrow\mathbb{Q}$ is convex
(concave), then we also can find maxima (minima) by using an interval-halving
method with the sign of the slope as decision criterion, hence all
real roots of polynomials can be found, hence the real part of the
algebraic numbers can be obtained. 


\subsubsection{Hand-coded realizing scheme program }

\begin{lyxcode}
(define~(f~x)~(-~2~({*}~x~x)))

;

(define~(next~f~cd)~

~~~~(display~\char`\"{}~\char`\"{})

~~~~(display~(car~cd))

~~~~(display~\char`\"{}-\char`\"{})

~~~~(display~(cadr~cd))

~~~~(let~((av~(/(+~(car~cd)~(cadr~cd))~2)))

~~~~(newline)

~~~~(if~(>~(f~av)~0)

~~~~~~~~(list~av~(cadr~cd))

~~~~~~~~(list~(car~cd)~av))))

;

(define~(run~counter~data)

~~~~(newline)

~~~~(display~\char`\"{}run\char`\"{})

~~~~(display~counter)

~~~~(if~(=~0~counter)

~~~~(display~\char`\"{}done!\char`\"{})

~~~~(run~(-~counter~1)~(next~f~data))))

;

(run~30~'(1~2))
\end{lyxcode}

\subsubsection{$\sqrt{2}$}

Theorem: We can find $x\in[1,2]$ such that $2-x^{2}=0$.


\subsection{Taylor coefficients of ninth order\label{taylor9}}

\begin{rem}
It is easy to demonstrate that the chain rule can generate unwieldy
terms by symbolic differentian, here we show Taylor coefficients of
the ninth order (as used by Moore).

${\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))))))))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f))+(f)*{\frac{\partial}{\partial y}}({\frac{\partial}{\partial x}}(f)+(f)*{\frac{\partial}{\partial y}}(f)))+(f)*{\f