\chapter{Algorithms}
\label{chap:algorithms}

In this section we present algorithms for training the model introduced in
\autoref{sec:model}, inference and prefetching based on the trained model, and
the main body of the preload daemon.  The inference algorithms are presented
as their probability equations to maintain readability.

In the algorithms, we access properties of the objects as defined in the
structures in the model (\autoref{sec:model}).  For example, for a Markov
object $m$, its transition probability from state 1 to state 3 may be written
as $m.\m{tp}_{1,3}$ or $m.\m{tp}[1][3]$ depending on the context.  We use the
former in equations, and the latter in algorithms.

\section{Main Algorithm}

When the preload daemon is started, it will periodically run the training and
prediction algorithms.  This is shown in \autoref{alg:main}.

\begin{algorithm}[t]
\sffamily
\label{alg:main}
\begin{algorithmic}[1]
	\STATE Load configuration
	\STATE Load $\state$
	\WHILE{not terminated}
		\STATE $timestamp \leftarrow$ current time
		\STATE GatherData
		\STATE Prefetch
		\STATE Train
		\STATE Sleep $(timestamp + \tau -$ current time$)$ seconds
	\ENDWHILE
	\STATE Store $\state$
\end{algorithmic}
\caption[Main algorithm of the preload daemon]
{Main algorithm of the preload daemon\newline\footnotesize
The daemon loads configuration and state.
Next, it gathers data, predict and prefetch, and train in a
loop until terminated.  Finally it saves data and exit.
The \textsf{GatherData} procedure is described in \autoref{sec:datagathering},
and the \textsf{Prefetch} and \textsf{Train} algorithms are
presented in \autoref{alg:prefetch}, and \autoref{alg:train} respectively.
\hfill}
\end{algorithm}

\section{Data Gathering}
\label{sec:datagathering}

Data gathering happens by scanning the list of currently running processes
from \proc, parsing their maps, and reading memory and I/O statistics from
\texttt{/proc/meminfo}.

The data gathering algorithm populates a list of the file-name of
currently-running applications, named \texttt{running\_applications}, and a
MemStat object named \texttt{current\_memstat}.  This information is used
during prefetching.

\subsection{Exe and Map Filtering}\label{exe-filtering}

We do a very simple filtering on the scanned processes and maps.  For both we
allow the user to black-list certain files by matching patterns on the file
names.  The details of this black-listing can be found in
\autoref{subsec:system-params}.  The only other
filtering we do is to require a minimum size for the sum of the size of the
maps for a process to consider it as an Exe object.  This parameter is
explained in \autoref{subsec:model-params}.

\section{Predictor}

The predictor is responsible for inferring probabilities that each map may be
needed during the next cycle, and to choose and prefetch the high ranking
ones.

\subsection{Inference}
\label{subsec:inference}

In the following sections, all the probability estimates of the form $\P{X}$
are implicitly conditioned on a given state and parameter $\tau$ (the length of one
cycle).  For example when we write $\P{X}$, we really mean is
$\P{X|\state,\tau}$.


\subsubsection{Inferring Exe Probabilities}
\label{subsec:infer-exes}

For every Exe object $E$, we are interested in finding $\P{E\starts}$ which
is the probability that $E$ is not currently running but will be started
during the next cycle ($\tau$ seconds).  For a running application, this is
obviously zero.  We can encode this observation as:
\begin{equation}
\P{E\starts} = (1 - E.\running) \P{E\needed}
\end{equation}
where $\P{E\needed}$ is the probability that the application $E$ will be
running at the next cycle.  Similarly for $\P{E\notneeded}$:
\begin{equation}
\P{E\starts} = (1 - E.\running) (1 - \P{E\notneeded})
\end{equation}

Now to find $\P{E\notneeded|\state,\tau}$, we observe that $\P{E\notneeded}$
is independent of all the variables in state except for the Markov chains that
it forms with other applications.  In other words:
\begin{eqnarray}
\P{E\notneeded}
&=& \P{E\notneeded|\state,\tau} \\
&=& \P{E\notneeded|E.\markovs,\tau} \\
&=& \prod_{m\in E.\markovs} \P{E\notneeded|m,\tau} \\
&=& \prod_{m\in E.\markovs} (1 - \P{E\needed|m,\tau})
\end{eqnarray}

Now consider the case of a single Markov $m$ that has $E$ as one of its Exe
links.  Assume without loss of generality that $m.\a = E$.  The case for
$m.\b = E$ is similar.  We are only interested in the case that $E$ is not
currently running, so $P(E\needed|m,\tau)$ is equal to the probability that
$m$ makes a transition in time $\tau$, and that the state it goes into has $E$
running (states 1 and 3):

\begin{eqnarray}
\P{E\needed|m,\tau}
&=& \P{m\mbox{ makes transition in time }\leq\tau} \\
&\times& \P{m\mbox{ goes into states 1 or 3}\thinspace|\thinspace
            m\mbox{ changes state}} \\
&=& (1 - \exp({\frac{-\tau}{m.tt_{m.\state}}}))
    (m.tp_{m.\state,1} + m.tp_{m.\state,3})
\end{eqnarray}

and we are done.  The probability that application $E$ is not currently
running but will be started during the next cycle is:

\begin{eqnarray}
\P{E\starts} &=& (1 - E.\running) \P{E\needed}\\
\P{E\needed} &=& 1 - \prod_{m\in E.\markovs} (1 - \P{E\needed|m,\tau})
\\
\P{E\needed|m,\tau}
&=& (1 - \exp({\frac{-\tau}{m.tt_{m.\state}}}))
    (m.tp_{m.\state,1} + m.tp_{m.\state,3})
\end{eqnarray}


\subsubsection{Inferring Map Probabilities}
\label{subsec:infer-maps}

For every map object $M$, we are interested in finding
%$\P{M\isread}$ which is the probability that $M$ is not currently in
%the main memory but will be read into main memory
$\P{M\needed}$ which is the probability that $M$
will be used by a process
during the next cycle ($\tau$ seconds).
This happens when an
already running application accesses the map, or if a new application that
uses the map is run.  The former case cannot be tracked easily, so we only
handle the latter.

%If the map is already in memory, $\P{M\isread}$ this is obviously zero.  So:
%\begin{equation}
%\P{M\isread} = (1 - M.\incore) \P{M\needed}
%\end{equation}
%where
$\P{M\needed}$ is the probability that at least one application using
map $M$ that is not already running will be started by the next cycle.
Similarly for $\P{M\notneeded}$.
%:
%\begin{equation}
%\P{M\isread} = (1 - M.\incore) (1 - \P{M\notneeded})
%\end{equation}
%
With this definition,
$\P{M\notneeded}$ can be computed using $\P{E\starts}$:
\begin{eqnarray}
\P{M\notneeded}
&=& \P{M\notneeded|\state,\tau} \\
&=& \P{M\notneeded|\{E: M\in E.\maps\},\tau} \\
&=& \prod_{E: M\in E.\maps} (1 - \P{E\starts})
\end{eqnarray}
where $\P{E\starts}$ is inferred in previous section.  Hence:
\begin{equation}
\P{M\needed} = (1 - \prod_{E: M\in E.\maps} (1 - \P{E\starts}))
\end{equation}

%The probability that application $M$ is not currently in memory
%but will be read during the next cycle is:
%\begin{eqnarray}
%\P{M\isread|\state,\tau} = (1 - M.\incore)
%(1 - \prod_{E: M\in E.\maps} (1 - \P{E\starts})) \\
%\end{eqnarray}


\subsection{Prefetching}

With the list of currently running applications and memory statistics
available and the inference equations, the prefetching algorithm simply sorts
all maps based on the probability that they will be needed during the next
cycle, cuts at a threshold depending on the memory conditions, and fetches.
The prefetching algorithm is listed in \autoref{alg:prefetch}.

%TODOOOOOOOOOOOOOO TODO handle pagein/out

\begin{algorithm}[t]
\sffamily
\label{alg:prefetch}
\begin{algorithmic}[1]
	\STATE \id{maps} $\leftarrow \state.\member{maps}$
	\STATE Sort \id{maps} descending using $\P{M\needed}$ as key
	\STATE Compute \id{available\_mem} for prefetching
	based on configuration parameters and \id{current\_memstat}

	\STATE \id{selected\_maps} $\leftarrow \emptyset$
	\FORALL{$M$ in \id{maps} in the sorted order}
		\STATE $E.\running \leftarrow 1\text{ if }E\in\text{\id{running\_exes}, 0 otherwise}$
		\IF{$M.\m{length} > $ \id{available\_mem}}
			\STATE Break out of loop
		\ENDIF
		\STATE \id{selected\_maps} $\leftarrow \id{selected\_maps} \cup M$
	\ENDFOR
	\STATE Fetch maps in \id{selected\_maps} into memory
	\STATE $\state.\m{memstat} \leftarrow$ \id{current\_memstat}
\end{algorithmic}
\caption[Prefetch]{Prefetch\newline\footnotesize
The prefetching algorithm first sorts maps based on their probability of being
needed during the next cycle, derived in \autoref{subsec:infer-maps}.
Then, it computes available memory for prefetching according to
\autoref{eqn:memory}.
And finally it prefetches maps from the most probable to the least, until it
exhausts the available memory.
\hfill}
\end{algorithm}

%\subsubsection{Approximating Maps in Core}
%\subsubsection{Throttling}
%\subsubsection{Fetching}

\section{Training}

The training algorithm is straightforward: It checks for any new applications
that are not known to preload currently and registers them.  Then, it updates
the running status of all Exe objects, and finally updates all Markov objects.
The training algorithm is listed in \autoref{alg:train} and uses the
UpdateMarkov algorithm.

The UpdateMarkov algorithm is listed in \autoref{alg:updatemarkov}.
It computes the new state of the Markov, and if it is different from the
old state, a transition has been occurred.  In that case, it updates all
different timestamps of the object, as well as the transition time and
probabilities.

The transition time of the previous state is updated to reflect the transition
happening.  It follows \autoref{eqn:decay}.  The transition time maintained in
the Markov object is an exponentially-fading mean of the sequence of times to
leave the state, over all transition events that ever happened from the previous
state.

All transition probabilities are also updated in a similar fashion.  The
transition probability of the current event is 1 for the arc connecting
previous state to the new state, and 0 for all other arcs.  This value is
combined with the exponentially-fading mean value, using \autoref{eqn:decay}
again.

\begin{algorithm}[t]
\sffamily
\label{alg:train}
\begin{algorithmic}[1]
	\STATE \id{known\_applications} $\leftarrow \{E.\path|E\in\state.\exes\}$
	\FORALL{\id{path} in (\id{running\_applications} - \id{known\_applications})}
		\STATE Create a new Exe object $E$ for \id{path}
		\STATE Populate $E.\member{maps}$ by scanning \proc; create
		new Map objects and add them to $\state.\member{maps}$ if
		necessary
		\FORALL{$E'$ in $\state.\exes$}
			\STATE Create a new Markov object $M$ for $E$ and $E'$
			\STATE Add $M$ to $E.\markovs$ and $E'.\markovs$
		\ENDFOR
		\STATE Add $E$ to $\state.\exes$
	\ENDFOR
	\STATE \id{running\_exes} $\leftarrow \{E|E\in\state.\exes\text{ and }E.\path\in \text{\id{running\_applications}}\}$
	\FORALL{$E$ in $\state.\exes$}
		\STATE $E.\running \leftarrow 1\text{ if }E\in\text{\id{running\_exes}, 0 otherwise}$
	\ENDFOR
	\STATE $\state.\m{running\_exes} \leftarrow$ \id{running\_exes}
	\FORALL{$E$ in $\state.\exes$}
		\FORALL{$M$ in $E.\exes$}
			\STATE UpdateMarkov $M$
		\ENDFOR
	\ENDFOR
\end{algorithmic}
\caption[Train]{Train\newline\footnotesize
To train the model after gathering data in each cycle, we first register all
applications never know to preload before.  This is achieved by creating an
Exe object for it, populating it with Map objects and Markov objects, and
adding it to the list of all applications.  After that, we update the running
status of all Exe objects.  Finally we update all Markov objects using the
\textsf{UpdateMarkov} algorithm presented in \autoref{alg:updatemarkov}.
\hfill}
\end{algorithm}

\begin{algorithm}[t]
\sffamily
\label{alg:updatemarkov}
\begin{algorithmic}[1]
	\REQUIRE $M$
	\STATE \id{new\_state} $\leftarrow M.\a.\running + 2 \times M.\b.\running$
	\IF{\id{new\_state} $\neq M.\state$}
		\STATE \id{time} $\leftarrow$ current time
		\STATE $t\leftarrow$ \id{time} $- M.\m{timestamp}\ind{M.\state}$
		\STATE	$
			M.\m{tt}\ind{M.\state}
			\leftarrow
			e^{-\lambda t}
			M.\m{tt}\ind{M.\state}
			+ 
			(1-e^{-\lambda t})
			(\text{\id{time}} - M.\m{time})
			$ 
		\STATE $t\leftarrow$ \id{time} $-\thinspace M.\m{time}$
		\FOR{$i=0$ to $3$}
			\FOR{$j=0$ to $3$ such that $i\neq j$}
				\STATE $p \leftarrow 1$ iff $i=M.\state$ and
				$j=$ \id{new\_state}, 0 otherwise
				\STATE	$
					M.\m{tp}\indd{i}{j}
					\leftarrow
					e^{-\lambda t}
					M.\m{tp}\indd{i}{j}
					+ 
					(1-e^{-\lambda t})
					p
					$ 
			\ENDFOR
		\ENDFOR
		\STATE $M.\m{timestamp}\ind{M.\state} \leftarrow$ \id{time}
		\STATE $M.\m{time} \leftarrow$ \id{time}
		\STATE $M.\state \leftarrow$ \id{new\_state}
	\ENDIF
\end{algorithmic}
\caption[UpdateMarkov]{UpdateMarkov\newline\footnotesize
To update a Markov object we determine the state it is making a transition to.
If no transition is happening there is not much to do.  Otherwise, we update
the transition time going out of the previous state to reflect the transition
happening.  This is done on line 5 using \autoref{eqn:decay}.
Then we update all transition probabilities in a similar way.
This is done on line 10 using \autoref{eqn:decay}.
Note that line 10 is mixing probability values linearly with
coefficients adding to 1, so, the result is still a probability value.
Finally we record the transition by updating timestamps and current state.
\hfill}
\end{algorithm}
