\newif\ifpaper
\newif\ifslide
\def\x#1#2#3\end{%
\paperfalse
\slidefalse
\ifx#20\papertrue\fi
%\ifx#2a\papertrue\fi
%\ifx#2l\slidetrue\fi
}
\expandafter\x\jobname\end

\ifpaper
\documentclass[12pt]{article}
\usepackage{fancyhdr}
\textwidth=6.5in
\textheight=9.5in
\pagestyle{fancy}
\oddsidemargin=0in
\topmargin=0in
\chead{\small Persian Computing with Unicode}
\cfoot{\hspace{2em}\thepage\hspace{-2em}}
\lfoot{\small$25^{\textrm{\scriptsize th}}$ Internationalization and Unicode Conference}
\rfoot{\small Washington, DC, March/April 2004}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
\def\textcolor#1#2{\textbf{#2}}
\def\pagecolor#1{}
\let\color\pagecolor
\def\endslide{}
\def\endoverlay{}
\def\overlay{}
\long\def\slide#1\endslide{%
\begin{center}
\hspace*{-.2em}
\fbox{
\hspace*{.2em}
\begin{minipage}[t]{.95\textwidth}
\parskip=1em
\vspace*{2ex}
\vbox to .60\textwidth{
\textheight=.60\textwidth
#1%
\endslide
}
\vspace*{1ex}
\end{minipage}
\hspace*{.2em}
}
\hspace*{-.2em}
\end{center}
}
\newenvironment{note}{\vspace{1ex}\par\small}{\newpage}
\else
\documentclass[landscape]{slides}
\ifslide\onlyslides\fi
\textheight=1.2\textheight
\oddsidemargin=1.5cm
\topmargin=-2.0cm
\usepackage{color}
% -- Background color blue?
\definecolor{backgroundcolor}{rgb}{0,0,.113}
% -- Or background color red?
%\definecolor{backgroundcolor}{rgb}{.5,0,0}
% -- Text color
\definecolor{textcolor}{rgb}{1,1,1}
\definecolor{headcolor}{rgb}{1,1,0}
\let\oldnote\note
\def\note{\oldnote\vspace{-3ex}\small\parindent=1em\relax\parskip=.3ex}
\fi

\usepackage{url}
\usepackage{graphicx}
\usepackage[kasre=off,yeh=small,tashdid=off,farsi456=on]{faanoos}
\usepackage{array}
\let\step\relax


\newcommand{\slt}[1]{\centering\textcolor{headcolor}{\Large\textbf{#1}}\hbox{}\vfill\raggedright\par}
\newcommand{\prog}[1]{\texttt{#1}}
\newlength\contboxwidth
\newcommand{\cont}{\setbox0=\hbox{\small\space(continued)}%
	\contboxwidth=\wd0\box0\hspace{-\contboxwidth}}
\def\abovecaptionskip{5ex}
\newcommand{\fa}[1]{\mbox{\Large\textfarsi{#1}}}

\let\saveendslide=\endslide
\let\saveendoverlay=\endoverlay
\renewcommand{\endslide}{\vfill\saveendslide}
\renewcommand{\endoverlay}{\vfill\saveendoverlay}
\let\saveslide=\slide
\renewcommand{\slide}{\saveslide\flushright}
\let\saveoverlay=\overlay
\renewcommand{\overlay}{\saveoverlay\flushright}

\newcommand{\unicode}[1]{{U+#1}}
\newcommand{\uniname}[1]{\emph{#1}}
\newcommand{\uni}[2]{\unicode{#1} \uniname{#2}}
\newcommand{\uniseq}[1]{$\langle$#1$\rangle$}
\newcommand{\unirange}[2]{\uniseq{\unicode{#1}..\unicode{#2}}}

\color{textcolor}
\begin{document}
\color{textcolor}
\pagecolor{backgroundcolor}

\slide
\ifpaper\else\thispagestyle{empty}\fi
\centering
\textcolor{headcolor}{\Huge\bfseries
\begin{tabular}{c}
	Persian Computing\\ {\Large with}\\ Unicode
\end{tabular}}\par
{\vspace{1em}\vfill\textcolor{textcolor}{\Large
$\stackrel%
{\mbox{Behdad Esfahbod}}%
{\mbox{\large\texttt{unicode@behdad.org}}}$\\[1em]
$\stackrel%
{\mbox{The FarsiWeb Project}}%
{\mbox{\large\texttt{http://www.farsiweb.info/}}}$
}}\par
% -- No date?
% -- Or presentation date?
%{\large March 31, 2004}
% --
\endslide
\begin{note}
\begin{center}
\Large Abstract
\end{center}
\begin{quotation}
In ef{}forts to internationalize software, the Persian language
has unfortunately merely been considered a variant of the
Arabic language. That is to say, people have considered Persian
to be Arabic plus a few extra letters.  While this is partially
true, it is only part of the story.  Persian has a modif{}ied form
of the Arabic script.  In addition to extra letters, it has
other modif{}ied letters as well as other stylistic complexities
unknown to Arabic.  While the script is an important
similarity, one must consider that the semantics and habits
behind the two languages are completely dif{}ferent. To
successfully incorporate Persian support, a certain degree of
knowledge relating to the semantics of the language is necessary.
This paper focuses on
providing developers and software engineers with no
Persian-language background an in-depth understanding of the
characteristics of Persian computing.

After briefly explaining the way Persian is handled in the
Unicode standard, we will focus on other concepts that are
critical for proper Persian support in internationalized
software.

The inherent complexity of the language and lack of customers,
and more importantly, customer feedback, have delayed Persian
computing in software internationalization ef{}forts. However, we
have seen many issues addressed and improved in recent years
in regard to Persian computing. In keeping up with these
improvements, this session is intended to give managers, software
engineers, and developers enough information and references to
add proper Persian support to their products.
\end{quotation}
\end{note}


\slide
\slt{What is Persian?}
\begin{center}
\color{white}
\ifpaper
\includegraphics[width=.9\textwidth]{black.mps}
\else
\includegraphics[width=.9\textwidth]{white.mps}
\fi
\end{center}
\endslide
\begin{note}
Persian, an Indo-European language, was once spoken from the
Middle East to India. Today it is spoken in Iran and several
neighboring countries.  It is important to distinguish between
the spoken and the written forms of the language.
The spoken forms of the various dialects of Persian are
generally mutually intelligible across modern political
borders. Being a member of the Indo-European family of
languages, Persian is related to English, French, and
German.  The script, however, has been borrowed from the Arabic
language.  Arabic is, like Hebrew, a Semitic language and bears
little resemblance to the Indo-European language family.

Today three major variations of the Persian language can be recognized, as
shown in the slide.  These are \emph{Dari}, the of{}f{}icial language
of Afghanistan; \emph{Farsi}, the of{}f{}icial language of Iran; and
\emph{Tajik}, the of{}f{}icial language of Tajikistan.

The name of the language is a source of much confusion.  In
English, the name of the language is ``Persian''
(ISO~639-1:2002).  In the Persian language itself, the name of
the language is ``Farsi.''  Unfortunately, a small group of
individuals outside of Iran have recently started using the
word ``Farsi'' even when speaking English.  This incorrect usage
has created much confusion and dif{}f{}iculty for everyone.  For
example, one must search under both ``Persian'' and ``Farsi'' when
looking for information about this language.  It is absolutely
necessary that professionals in the industry insist on use of
the name ``Persian'' in all products and documentation. A further
source of confusion is that the generic name Persian may either
refer to the Persian language in general, or it may refer to
the variant of Persian spoken only in the country of Iran.  In
the latter case, the Persian spoken outside of Iran may be
referred to by more local-sounding names. In Afghanistan,
Persian is usually called ``Dari'' and in Tajikistan, Persian
is called ``Tajik.'' In Iran, the default name, ``Persian,'' is
used or, as stated above, ``Farsi'' if one is speaking in the
language itself.  We have used the word ``Farsi'' in our diagram to
distinguish it from ``Dari'' and ``Tajik,'' but all three are, in
fact, Persian. The two-letter code for the Persian language
as used in Iran and Afghanistan is \textbf{fa}. Compare this to
the German language, with two-letter language code \textbf{de}.

The variations of the language should be identif{}ied in software
by presenting the territory, e.g.~``Persian (Iran)'', instead of
the \emph{unof{}f{}icial} name of the variation, e.g.~``Farsi''.

In the rest of this paper, the word ``Persian'' will be used to refer
only to the of{}f{}icial language of Iran.
\end{note}


\slide
\slt{Persian in Computers}
There are three relevant national standards:
\begin{itemize}
\item ISIRI 3342:1992  Farsi 8-bit Coded Character Set for
Information Interchange (deprecated)
\item ISIRI 2901:1994  Keyboard Layout for Farsi: Characters in
Computer
\item ISIRI 6219:2002  Information Technology -{}-
Persian Information Interchange and Display Mechanism, using Unicode
\end{itemize}
\endslide
\begin{note}

The infamous ISIRI 3342 character set standard is not the f{}irst
of its kind.  It is a sequel to the old ISIRI 2900 7-bit
information interchange standard.
But even ISIRI 3342, which tried to solve the problems of the old
document, never opened the way for the development of software in
Iran.

Until recently, every vendor in Iran used its own 8-bit character
set and its own keyboard layout.  Then Microsoft implemented the
Unicode standard in its Windows operating system, and by the time
Internet Explorer 5 was out, Microsoft Windows was the f{}irst
system widely used in Iran that implemented Unicode-based Persian support.
As a consequence, people turned to the new system, and f{}inally
ISIRI 6219 replaced the old character set standard with one based on the
Unicode standard.

Fortunately, ISIRI 3342 was never widely used, so
Persian is one of the lucky languages that do not have an 8-bit
character set still in use.  It is important to note that the
ISIRI 3342 standard is now deprecated and is \emph{not} the 8-bit
standard of Persian text encoding.  The author does not know of
any application that supports ISIRI 3342 natively, nor of
\emph{any} signif{}icant amount of text encoded in
this encoding.  So there is no advantage in supporting ISIRI 3342
in software anymore.

ISIRI 2901 is still the standard keyboard layout.  It needs to be
updated to go with the Unicode-based character set standard.  Because of
its signif{}icant improvements, the updated layout will be registered as a
separate standard, which will deprecate ISIRI 2901.

Like any other Persian speaker, I appreciate the ef{}forts that
brought about Persian support in Microsoft Windows.  But
like any other software system, the Persian support in Microsoft
products was not perfect.  Its problems have spread all
around the globe in the past f{}ive years and unfortunately have
af{}fected the common practice and user experience of some f{}ine
details of Persian computing to some degree.  In the following
sections I will point out these problems whenever
something has been handled in a wrong way in the Microsoft Windows
system.  Almost all of these problems have already been f{}ixed by
Microsoft or are being considered for a f{}ix.
\end{note}


\slide
\slt{Modern Persian Script}
\begin{itemize}
\item Based on Arabic Script (\unirange{0600}{06FF} block): With some extra letters, some
modif{}ied letters
\item But with completely dif{}ferent semantics and typographical
habits
\item Is a bidirectional script: Is written from right to left,
    except for numbers
\item Needs cursive joining: Two adjacent letters may be {\em joined},
    forming 1, 2, or 4 glyphs for each character: (for example
\fa{s}, \fa{"=s}, \fa{"=s"=}, \fa{s"=})
\end{itemize}
\endslide
\begin{note}
The modern Persian script is an extension and modif{}ication of the
Arabic script.  Before the 7th century CE, Persian was written in
a very dif{}ferent script known as the \emph{Pahlavi} script.
With the coming of Islam to Iran, the Arabic script was adopted for
writing Persian.  The use of the Arabic script then spread to other
lands, each language adding letters and making modif{}ications as needed.
This propagated to Central, South,
and even Southeast Asia, as well as North Africa, from Morocco
to Java, where the alphabet was extended
even more: from 29 basic Arabic letters to more than a hundred letters
in modern use (from Kurdish to Jawi).

Arabic is called a complex script.  This is mainly because it is
written from right to left.  Of course the text may be mixed with
Latin text and numbers that are written from left to right,
    adding to the complexity of rendering.  The Unicode Standard
    Annex \#9: \emph{The Bidirectional Algorithm} provides an
    \emph{exact} and \emph{explicit} mechanism for converting a
    logically stored stream of characters including some
characters of a right-to-left script, to a visually ordered one
suitable for display.  This algorithm is needed for Arabic
(incl.\ Persian, Urdu, Sindhi,~\dots), Hebrew (incl.\ Yiddish),
    Syriac, and Thaana scripts.
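
The algorithm works on the \emph{bidirectional class} of each character.
As a minimal illustration (added here as a sketch, not taken from the
Annex), the following Python fragment uses the standard
\prog{unicodedata} module to print that class for a few sample
characters.  The sample list is arbitrary.  Note that Persian digits have
class EN, like Western digits, while Arabic-Indic digits have class AN.

\begin{verbatim}
```python
import unicodedata

# A few sample characters; the selection is illustrative only.
SAMPLES = {
    "LATIN SMALL LETTER A": "a",
    "ARABIC LETTER SEEN": "\u0633",
    "HEBREW LETTER ALEF": "\u05d0",
    "DIGIT ONE": "1",
    "EXTENDED ARABIC-INDIC DIGIT ONE": "\u06f1",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
}

def bidi_classes(samples):
    """Map each label to the character's Bidi_Class value."""
    return {label: unicodedata.bidirectional(ch)
            for label, ch in samples.items()}

for label, cls in bidi_classes(SAMPLES).items():
    print(f"{cls:>2}  {label}")
```
\end{verbatim}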

The other complicated \emph{feature} of the language is
\emph{cursive joining}.  This simply means that a character does
not have a single shape and may \emph{join} adjacent
characters, taking up to four dif{}ferent shapes.  The Unicode
standard, which encodes characters rather than glyphs, has allocated one
character for each Arabic letter.  This means a render-time
process should select the proper shape for each character.  This
process is known as Arabic joining and is described in
Section 8.2 of the Unicode standard: Arabic Script.
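
The shape selection can be sketched as follows.  This is a
simplif{}ied illustration under stated assumptions, not the
algorithm as specif{}ied in the standard: the joining-type table
is written by hand for a handful of characters (a real implementation
would extract it from the Unicode data f{}iles), transparent
characters such as Harakat are not handled, and all names in the
code are hypothetical.

\begin{verbatim}
```python
# Joining types for a handful of characters.  This hand-written
# table is for illustration only.
DUAL, RIGHT, CAUSING, NONE = "D", "R", "C", "U"
JOINING_TYPE = {
    "\u0627": RIGHT,    # ALEF
    "\u0628": DUAL,     # BEH
    "\u0633": DUAL,     # SEEN
    "\u0644": DUAL,     # LAM
    "\u0645": DUAL,     # MEEM
    "\u0647": DUAL,     # HEH
    "\u200d": CAUSING,  # ZERO WIDTH JOINER
}

def joining_forms(text):
    """Return isol/init/medi/fina per character (logical order)."""
    def jt(i):
        if 0 <= i < len(text):
            return JOINING_TYPE.get(text[i], NONE)
        return NONE

    forms = []
    for i in range(len(text)):
        cur = jt(i)
        # Joins the preceding character (the one to its right).
        prev_join = (cur in (DUAL, RIGHT)
                     and jt(i - 1) in (DUAL, CAUSING))
        # Joins the following character (the one to its left).
        next_join = (cur == DUAL
                     and jt(i + 1) in (DUAL, RIGHT, CAUSING))
        if prev_join and next_join:
            forms.append("medi")
        elif prev_join:
            forms.append("fina")
        elif next_join:
            forms.append("init")
        else:
            forms.append("isol")
    return forms

# SEEN LAM ALEF MEEM (the word "salaam")
print(joining_forms("\u0633\u0644\u0627\u0645"))
```
\end{verbatim}

Characters missing from the table, including ZWNJ, are treated as
non-joining, so they break the joining between their neighbors.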
\end{note}

\slide
\slt{Arabic Script Rendering}
\begin{center}
\setlength{\extrarowheight}{1ex}
\begin{tabular}{|l|l|l|}
\hline
Input text & Logical order & \fa{s} \fa{l} \fa{=aa} \fa{m} \\
\hline
After Bidirectional Algorithm & Visual order & \hfill\fa{m} \fa{=aa} \fa{l} \fa{s} \\
\hline
After Arabic Joining Algorithm & Glyph list & \hfill\fa{m} \fa{"=aa} \fa{"=l"=} \fa{s"=} \\
\hline
After Ligation & Glyph list & \hfill\fa{m} \fa{"=laa} \fa{s"=} \\
\hline
When Rendered & Output & \hfill\fa{salaam} \\
\hline
\end{tabular}
\end{center}

With enough care, it is possible to apply the above algorithms in a
dif{}ferent order and get the same result.

\endslide
\begin{note}
The input text is said to be in \emph{logical} order.  This is
the order that one reads and types in the text.  After applying
the bidirectional algorithm the order of characters is called
\emph{visual} order.  This is the order that they should appear
on screen.  Then the Arabic joining algorithm determines which
shape of a character should be rendered.  After that some
ligatures may form.  And that is the f{}inal list of glyphs that
would appear on screen.

There is a chicken-and-egg problem here.  That is, to correctly
apply the bidirectional algorithm, paragraphs should be broken
into lines.  But to break lines in almost all modern rendering
engines, the f{}inal glyph widths should be known.  And those are not
known before applying the Arabic joining algorithm and ligation!
This simple argument adds to the complexity of the rendering
mechanism, such that the two algorithms cannot be separated from
each other and interact in some sense.  This again adds to the
complexity of the script when it comes to computers.

There is another feature of the Unicode standard hidden in the
Bidirectional algorithm.  That is the \emph{mirroring} property
of some characters.  For example the character \uni{0028}{Left
Parenthesis} ``\textbf{(}'' is def{}ined to actually be
	\textbf{Opening
	Parenthesis}.  This means that the same character is used
	in Arabic script too as an opening parenthesis, but because
	of the dif{}ferent direction of the script, it is
	\emph{mirrored} and is rendered like this:
	``\textbf{)}''.
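
As a small illustration (a sketch added here), the mirroring property
can be queried with \prog{unicodedata.mirrored} from the Python
standard library.  The helper name and the sample characters are
arbitrary choices.  The property only says \emph{whether} a character
is mirrored; which glyph to substitute is left to the rendering engine.

\begin{verbatim}
```python
import unicodedata

def is_mirrored(ch):
    """True if the character has the Bidi_Mirrored property."""
    return unicodedata.mirrored(ch) == 1

# Parentheses, brackets, angle quotation marks, two letters.
for ch in "()[]\u00ab\u00bbab":
    print(f"U+{ord(ch):04X} mirrored: {is_mirrored(ch)}")
```
\end{verbatim}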
\end{note}


\slide
\slt{Alphabet}
\vspace{-1ex}
\begin{itemize}
\item Extra letters:\\
\quad \uni{067E}{Peh} (\fa{p}),
      \uni{0686}{Tcheh} (\fa{ch}),\\
\quad \uni{0698}{Jeh} (\fa{zh}),
      \uni{06AF}{Gaf} (\fa{g})
\item Modif{}ied letters:\\
%\quad \uni{0643}{Kaf} (\fa{ك}) $\rightarrow$ \uni{06A9}{Keheh} (\fa{k}),\\
%\quad \uni{064A}{Yeh} (\fa{ي}) $\rightarrow$ \uni{06CC}{Farsi Yeh}
\begin{center}
\begin{tabular}{|l|c|c|c|c|}
\hline
\small Character & \small Isol & \small Fina & \small Medi &
\small Init \\
\hline
\uni{0643}{Arabic Letter Kaf} & \fa{ك} & \fa{"=ك} & \fa{"=ك"=} & \fa{ك"=} \\\small (Arabic Kaf)&&&&\\
\hline
\uni{06A9}{Arabic Letter Keheh} & \fa{k} & \fa{"=k} & \fa{"=k"=} & \fa{k"=} \\\small (Persian Kaf)&&&&\\
%\hline
%\uni{0649}{Alef Maksura} & \fa{ى} & \fa{"=ى} & \fa{"=ى"=} & \fa{ى"=} \\
\hline
\uni{064A}{Arabic Letter Yeh} & \fa{ي} & \fa{"=ي} & \fa{"=ي"=} & \fa{ي"=} \\
\small (Arabic Yeh)&&&&\\
\hline
\uni{06CC}{Arabic Letter Farsi Yeh} & \fa{y} & \fa{"=y} & \fa{"=y"=} & \fa{y"=} \\
\small (Persian Yeh)&&&&\\
\hline
\end{tabular}
\end{center}

\end{itemize}
\endslide
\begin{note}
The Persian alphabet shares most of its letters with the Arabic
alphabet.  There are four main extra letters that are neither
written nor pronounced in traditional Arabic.  Many rendering engines and
fonts whose joining tables were written by hand, instead of being
extracted from the Unicode data f{}iles, have problems joining these
extra letters properly.

The Unicode standard has identif{}ied a dif{}ferent character for the
Persian Kaf letter (Keheh is the Sindhi name).  This is basically
because of the dif{}ferent look of the f{}inal and isolated shapes of
the character as can be seen above.  Many font designers have
ignored the dif{}ference in the past.  As a consequence many people
do not dif{}ferentiate between the two shapes anymore and so use
Arabic Kaf in Persian context.  The \emph{Courier New} font even
mixes the appearance of the two, which adds to the confusion.
Many Persian web sites have text encoded using Arabic Kaf.

The situation for Arabic Yeh is worse than that.  It was
impossible to show the Persian letter Yeh (called Farsi Yeh in
the Unicode standard because of compatibility issues) with fonts
shipped by early Microsoft Windows products.  As you know, the
inability to type one letter is as good as not being able to type
at all!

Moreover, the Persian Yeh is also mapped incorrectly on the keyboard
layout in Microsoft Windows products.
To get around this problem, Persian webmasters (as well as
those doing Persian word-processing) resorted to use of the
Arabic Yeh to enable them to type Persian in one fashion or
another.  It was deemed preferable to see the two dots on the
f{}inal Yeh to the completely wrongly shaped Yeh in medial
position.

As a consequence, you see more than half of Persian web pages
use Arabic Yeh with two dots below instead of Persian Yeh.  It
is unfortunate that people do not even complain about the two
extra dots.  This has been further complicated by \emph{helpful}
software vendors in Iran selling fonts with the dots removed from
Arabic Yeh!  Others mixed both Arabic and Persian Yeh to achieve
a perfect visual presentation, while making it impossible to
search the content using any search engine.
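
One way for a search engine to cope with such mixed content is to fold
the Arabic letters to their Persian counterparts before comparing.  The
following is a minimal sketch of that idea, not an established
procedure: the function name is hypothetical, and the folding should be
applied only to a search key, since Arabic Kaf and Yeh may legitimately
appear in stored text.

\begin{verbatim}
```python
# Arabic letters that commonly stand in for Persian ones.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u0643": "\u06a9",  # ARABIC KAF -> KEHEH (Persian Kaf)
    "\u064a": "\u06cc",  # ARABIC YEH -> FARSI YEH
})

def search_key(text):
    """Fold Arabic Kaf and Yeh to their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)

# The same two-letter word typed with Persian and Arabic letters.
persian = "\u06cc\u06a9"
arabic = "\u064a\u0643"
print(persian == arabic)
print(search_key(persian) == search_key(arabic))
```
\end{verbatim}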
\end{note}

\slide
\slt{Alphabet\cont}
\begin{itemize}
\item Three shapes of composed Hamza Above:\\
\quad \uni{0623}{Alef with Hamza Above} (\fa{=a`}),\\
\quad \uni{0624}{Waw with Hamza Above} (\fa{v`}),\\
\quad \uni{0626}{Yeh with Hamza Above} (\fa{y`})
\item Never used characters:\\
%\quad \uni{0629}{Teh Marbuta} (\fa{ة}),\\
\quad \uni{0649}{Alef Maksura} (\fa{y}): Like Yeh, but no dots at all\\
\quad \uni{06C0}{Heh with Yeh Above} (\fa{ۀ}): Should
\textbf{never} be used
instead of \uniseq{\uniname{Heh, ZWNJ, Farsi Yeh}} or
\uniseq{\uniname{Heh, Hamza Above}}
sequence
\end{itemize}
\endslide
\begin{note}
Hamza is the most ambiguous letter in the Persian alphabet.  It
is essentially one letter, but appears in three dif{}ferent shapes
as can be seen in the slide.  Sometimes people with dif{}ferent
writing styles use one instead of the other.  Sometimes they drop
the Hamza sign and use an Alef (\fa{a}), Waw (\fa{v}), or Yeh
(\fa{y}) instead!  We will come back to this with examples when
discussing the loose searching problem.

There is another native Arabic letter that is not used in Persian:
\uniname{Alef Maksura}.  \uniname{Alef Maksura}
is like a Persian or Arabic Yeh letter but with no dots at all.
Add this to the confusion already discussed.  By the way, these
two letters, and the Arabic Kaf and Arabic Yeh letters are allowed
to appear in Persian documents when quoting Arabic text.

Among the modif{}ications to the Arabic script is the letter Heh
with a small Hamza above.  This sequence is unknown in Arabic
and has proven to be a major challenge to implement in Persian.
Unfortunately, this was f{}irst encoded as \uni{06C0}{Heh with Yeh
Above}, and the Heh in this sequence was def{}ined as
a certain Arabic variant of the Heh which is not used in
Persian.  In appearance, this was not a problem; however, when
search engines break down the sequence, they will not be able
to process this sequence correctly for Persian.  Therefore,
\unicode{06C0} was deprecated in the Persian subset and now it is
necessary to type the \uniname{Heh} and the \uni{0654}{Arabic Hamza Above}
as two separate characters.  The result is visually identical to the
deprecated \unicode{06C0}.  It should be mentioned that even in Windows XP,
\unicode{06C0} has been mapped on the Persian keyboard
(called ``Farsi'' there) and so the user unknowingly propagates this error.
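
The problem can be seen in the character's canonical decomposition.
The following sketch prints that decomposition with the Python
\prog{unicodedata} module and shows one possible clean-up step.  The
function name is hypothetical, and replacing the character in
existing text is an assumption made here for illustration.

\begin{verbatim}
```python
import unicodedata

HEH_WITH_YEH_ABOVE = "\u06c0"
HEH = "\u0647"
HAMZA_ABOVE = "\u0654"

def fix_heh_hamza(text):
    """Replace U+06C0 with the sequence <Heh, Hamza Above>."""
    return text.replace(HEH_WITH_YEH_ABOVE, HEH + HAMZA_ABOVE)

# The canonical decomposition of U+06C0 starts with U+06D5,
# not with the regular Heh U+0647.
print(unicodedata.decomposition(HEH_WITH_YEH_ABOVE))
print([f"U+{ord(c):04X}" for c in fix_heh_hamza("\u06c0")])
```
\end{verbatim}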

%The character \uniname{Heh with Yeh Above} is even more
%confusing!  In Persian some people instead of writing the sequence 
%\uniseq{Heh, ZWNJ, Farsi Yeh}, like in \fa{khaane-\kern.1em=y}, write it
%like this: \fa{khaane-ye}, which f{}its the name \uniname{Heh with
%Yeh Above}.  But this character has been registered for
%languages other than Persian and so has dif{}ferent semantics,
%most importantly, it is not canonically equivalent to the
%original sequence, and does not expand to anything
%containing the base letter Heh.  In Persian this 
%appearance should be achieved by the following sequence:
%\uniseq{\uniname{Heh}, \uni{0654}{Arabic Hamza Above}}.
\end{note}

\slide
\slt{Special Characters}
\begin{itemize}
\item \uni{0640}{Arabic Tatweel}, for a longer joining stem\\
\begin{center}
\fa{ketaab} $\rightarrow$ {\fa{ke"|taab}}
\end{center}
\item \uni{200C}{Zero Width Non-Joiner}, to prevent joining\\
\begin{center}
\fa{ketaabhaa} $\rightarrow$ \fa{ketaab-haa}
\end{center}
\item \uni{200D}{Zero Width Joiner}, to choose a joined glyph
when it would not join naturally\\
\begin{center}
\fa{h.\thinspace sh} $\rightarrow$ \fa{h"=.\thinspace sh}
% TBD, faanoos bug: \fa{h"=.sh}
\end{center}
\item \uni{200E}{Left-to-Right Mark}, \uni{200F}{Right-to-Left
    Mark}, and other bidirectional
control chars (\unirange{202A}{202E})
\end{itemize}
\endslide
\begin{note}
\emph{Tatweel} is used when the author wants to force a longer
joining stem as shown.  As an isolated character, it looks like a
dash character, perhaps thicker (\fa{="|=}).

\emph{ZWNJ} is one of the essential features for proper Persian
support.  It is widely used in Persian texts.  Think of it as the
hyphen character in phrases like ``home-brew'', or
``pseudo-random''.  One may drop the hyphen in the f{}irst example and
write it ``homebrew''.  Or one may replace it with a space in
second example and write it ``pseudo random''.
In Persian, when to use ZWNJ and when to use a space are governed
by complex rules of style and esthetics.

\emph{ZWJ} is not used as regularly as ZWNJ.
It forces a character into its joined form when it normally
would be in its isolated form.
This is usually used for two purposes:
\begin{itemize}
\item
To distinguish between \uni{0665}{Arabic-Indic Digit Five} (\fa{٥})
and \uni{0647}{Arabic Letter Heh} (\fa{h}), as shown in the
slide.
\item
To present a specif{}ic shape of a character.  For example
to draw this initial Yeh glyph: \fa{y"=}.  These shapes are
sometimes used in abbreviations.
\end{itemize}

Bidirectional control characters are used to control the
behavior of the Bidirectional Algorithm explicitly.
\emph{LRM} and \emph{RLM} are especially needed, for example,
to set the paragraph direction of a paragraph starting with Latin text
to right-to-left, and vice versa.
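
The ef{}fect of a leading RLM can be sketched as follows.  The code is a
simplif{}ied illustration of the f{}irst-strong-character rule, assuming
that no explicit embedding or isolate controls are present; the function
name and the sample text are hypothetical.

\begin{verbatim}
```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def paragraph_direction(text, default="ltr"):
    """Direction implied by the first strong character."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default

# A Persian sentence that starts with a Latin word.
text = "XML \u0686\u06cc\u0633\u062a\u061f"
print(paragraph_direction(text))
print(paragraph_direction(RLM + text))
```
\end{verbatim}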
\end{note}


\slide
\slt{Numbers}
\begin{itemize}
\item
\unirange{06F0}{06F9}
\uniname{Extended Arabic-Indic Digits}:
\begin{center}
\fa{9 8 7 6 5 4 3 2 1 0}
\end{center}
instead of
\unirange{0660}{0669} \uniname{Arabic-Indic Digits}:
\begin{center}
\fa{9 8 7 ٦ ٥ ٤ 3 2 1 0}
\end{center}
\item \uni{066C}{Arabic Thousands Separator} and
\uni{066B}{Arabic Decimal Separator}:
\begin{center}
\fa{9'876'543.210}
\end{center}
\item Western numerals in Latin context.  Persian numerals
everywhere else (page numbers, section numbers, \dots)
\end{itemize}
\endslide
\begin{note}
Three of the ten decimal digits used in Persian look dif{}ferent from
their Arabic counterparts.  For this and some other technical
reasons, the Unicode standard has allocated a set of ten
characters for Persian digits (called Extended Arabic-Indic
Digits), in addition to the ones used in Arabic (called Arabic-Indic
Digits).  Neither of these two sets should be confused with the
Western set of ASCII digits, also known as Arabic numerals.

Persian digits should be used with Arabic Thousands Separator and
Arabic Decimal Separator.  Using a comma and a dot-shaped decimal
separator is not allowed with Persian digits.  In Iran people
always read and write Persian digits.  This means that page
numbers, section numbers, monetary values, font sizes, and
spreadsheet cells are all supposed to be in Persian digits.
This level of support requires the host system to be able to parse
Persian digits as numerical data.
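
As a rough sketch of what such parsing and formatting could look like,
the following Python code converts between Persian-digit strings and
numeric values.  The function names are hypothetical, and grouping by
thousands is assumed.

\begin{verbatim}
```python
from decimal import Decimal

THOUSANDS = "\u066c"  # ARABIC THOUSANDS SEPARATOR
DECIMAL = "\u066b"    # ARABIC DECIMAL SEPARATOR

# Map U+06F0..U+06F9 to ASCII digits and back.
TO_ASCII = {0x06F0 + d: ord("0") + d for d in range(10)}
TO_PERSIAN = {ord("0") + d: 0x06F0 + d for d in range(10)}

def parse_persian_number(text):
    """Parse a number written with Persian digits."""
    plain = text.replace(THOUSANDS, "").replace(DECIMAL, ".")
    return Decimal(plain.translate(TO_ASCII))

def format_persian_number(value):
    """Format a number with Persian digits and separators."""
    plain = f"{value:,}"
    plain = plain.replace(",", THOUSANDS).replace(".", DECIMAL)
    return plain.translate(TO_PERSIAN)

# The number shown in the slide.
SLIDE_NUMBER = ("\u06f9\u066c\u06f8\u06f7\u06f6\u066c\u06f5\u06f4"
                "\u06f3\u066b\u06f2\u06f1\u06f0")
n = parse_persian_number(SLIDE_NUMBER)
print(n)
print(format_persian_number(n) == SLIDE_NUMBER)
```
\end{verbatim}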

In Iran people read and write Western digits in a Latin
context.  This is unlike most Arab countries, where people write
numbers with Arabic-Indic digits even in the middle of English
text.  So turning all Western digits into Persian digits
automatically is not an option.  Microsoft software does not
interpret Persian digit characters as numerical data yet.  As a
consequence, Microsoft has put Western digits on its Persian
keyboard (called ``Farsi'' in that context).  As a result, you
see Persian sites with Western digits typed in everywhere.

Arabic-Indic digits should not be used in Persian text.
\end{note}


\slide
\slt{Other Characters}
\begin{itemize}
\item Harakat (Vowel) Non-spacing Marks:
\fa{"|A} \fa{"|E} \fa{"|O} \fa{"|AN} \fa{"|EN} \fa{"|ON}
\fa{"|"} \fa{"||} \dots
\item Arabic Punctuation Marks:
\\\quad\uni{060C}{Arabic Comma} (\fa{,}),
\\\quad\uni{061B}{Arabic Semicolon} (\fa{;}),
\\\quad\uni{061F}{Arabic Question Mark} (\fa{?}),
\\\quad\uni{066A}{Arabic Percent Sign} (\fa{\%}),
\item\uniseq{\unicode{00AB}, \unicode{00BB}} \uniname{Double Angle Quotation Marks}
\fa{<< >>}
\item Shared Punctuation Marks:
\\\quad Latin full stop, exclamation mark, parenthesis, square
brackets, \dots
\end{itemize}
\endslide
\begin{note}
Almost all Arabic Harakat (vowels) are allowed in Persian text,
but they are rarely used.  The exception is
\uni{0650}{Arabic Kasra} (\fa{"|E}), which is widely used at the
end of words.

Arabic punctuation marks are used instead of their Latin
counterparts, because of the dif{}ference in shape.  Latin
quotation marks are not allowed in Persian text and Double Angle
Quotation Marks should be used instead.  In Persian fonts these
Double Angle glyphs usually have a rounded shape. Other Latin punctuation
marks are allowed.

The ISIRI 6219 standard contains the complete list of characters
that should be supported, are optional, or are forbidden in
Persian text.
\end{note}
% TBD, add about RIAL, ALLAH, FARSI SYMBOL, forbidden Pres Forms
% ...
%\slide
%\slt{That's a big lie!}
%\begin{itemize}
%\step\item Glyphs: there are four dif{}ferent presentation forms of ARABIC LETTER BEH
%(\fa{b"=}, \fa{"=b"=}, \fa{"=b}, \fa{b}),
%in addition to one general one, but\dots
%\step\item Ligatures: there \emph{is} an ARABIC LIGATURE LAM ALEF (\fa{laa}), among many others
%\step\item Logos and Emblems: FARSI SYMBOL (U+262B) is there, as well as playing cards suits.
%\end{itemize}
%\endslide
%\begin{note}
%\end{note}



\slide
\slt{Keyboard Layout}
ISIRI 2901:1994, features:
\begin{itemize}
\item All letters and punctuation marks
\item Reasonable placement
\item Persian digits
\item Regular and shifted keys only
\item Some empty slots
\end{itemize}
\endslide
\begin{note}
The ISIRI 2901:1994 national standard is an update to the old ISIRI
2901:1989 keyboard layout.  It is important to support the
second edition, not the old one.  Many companies in Iran provide
the old layout and claim to conform to the ISIRI 2901 standard.

This standard has the best design among the dif{}ferent layouts used by
dif{}ferent companies in Iran.  Unfortunately it is not supported
by Microsoft yet.  But free drivers are available in dif{}ferent
formats and for dif{}ferent environments.
Many sites are changing their keyboard layout to this
standard and many Persian sites provide this keyboard through
JavaScript code.
They say that once you've experienced the feel of this layout,
you'll never leave it!  Not that it is designed for fast typing
or for ergonomics; just that it \emph{is} better than the
alternatives.

The mapping to Unicode characters is available at
\url{http://www.farsiweb.info/table/2901-unicode.txt}
and the layout itself can be seen at 
\url{http://crl.nmsu.edu/~mleisher/keyboards/persian.html}
\end{note}


\slide
\slt{Keyboard Layout\cont}
Proposed update, features:
\begin{itemize}
\item Fully backward-compatible with ISIRI 2901:1994
\item Unicode 4.0 repertoire, and complete support for ISIRI 6219:2002
\item Support quoting Arabic text
\item Uses AltGr to add required but rarely used characters
\item Adds all ASCII punctuation marks, useful for editing XML,
    \dots
\item Adds bidirectional control characters
\end{itemize}
\endslide
\begin{note}
The layout has been available as an experimental driver for Microsoft
Windows for a few months now, and is passing through the f{}inal stages to
become the new national standard.  It features a wide range of
characters that are used in Iran.

For many years people have suf{}fered from the problem of not
having many ASCII marks on the keyboard, like the double quotation mark
or number sign, that are needed in editing markup languages like
HTML and XML.  While users still need to switch to the English
layout to type ASCII letters and digits, every other ASCII
character can be entered in this layout, at most by holding AltGr.

A preview of this layout is available at:\\
\url{http://www.farsiweb.info/standard/ir-kb-layout-preview.pdf}\\
And the Windows driver at:\\
\url{http://prdownloads.sourceforge.net/farsitools/persiankeyboard.zip?download}
\end{note}


\slide
\slt{Fonts}
\begin{itemize}
\item Microsoft fonts are Arabic
\item Tahoma is the best looking one
\item Persian fonts are not Unicode compatible yet
\item The only ligature: Lam-Alef
\item Nastaliq is desired, but not possible yet
\end{itemize}
\endslide
\begin{note}
There are almost no Unicode-compatible fonts suitable for
Persian available to the public.   The best option is the set of
Microsoft core fonts which ship with Windows.  But they were
designed for Arabic and do not have a Persian look to them.
The best among them is Tahoma, which is widely used in Persian
web sites today, but it looks comical.

Even though these fonts are completely functional, many Persian users simply
refuse to use them, purely for esthetic reasons, and instead
prefer the incomplete and non-standard Persian hack
fonts widely available for download on the Internet.

These popular Persian fonts are not available in Unicode-compatible
form yet, but some beta releases are available for
download.  The whole package should be out in a couple of months.
The project is supported by the High Council of Informatics and the fonts
will be available for free.  Thus, the font problem should be
solved within the next few years, although it may take longer for
the non-compatible fonts to completely disappear.

One important note on Persian font design is that, unlike in
Arab countries, people in Iran usually do not like
ligatures.  The only ligature that is used, and that is mandated
by the Unicode standard, is the Lam-Alef ligature.  It would be
nice if Arabic fonts had the ability to turn of{}f other ligatures
for Persian text through OpenType's LanguageSystem tables.

Historically, Persian typography has used the Nastaliq style
since its invention in the 15th century CE.  Nastaliq is an artistic
style of writing Persian, with complicated joining and curves.
With lead typography, printing switched back to Naskh, which is what is
used today.  With the digital typography tools of the late 1990s, Nastaliq
became widely available again, but its popularity dropped because of
poor readability.  It is still used on rare occasions.  Persian
Nastaliq is completely dif{}ferent from Pakistani Nastaliq.
There are rumors that a Persian Nastaliq OpenType font is
possible, but support for the needed OpenType features is not
implemented in any system yet.
\end{note}


\slide
\slt{Date and Time}
\begin{itemize}
\item Three calendars in use!\\
\quad \textbf{Gregorian}, to synchronize with the rest of the world\\
\quad \textbf{Jalali}, the of{}f{}icial calendar\\
\quad \textbf{Islamic}, for some holidays and ceremonies
\item Islamic calendar depends on moon-sighting once a year
\item Week starts Saturday
\item Business weekdays from Saturday to Thursday
\item 24-hour preferred in media
\item No AM/PM equivalent
\end{itemize}
\endslide
\begin{note}
All three usually appear on a typical Iranian wall-mounted
or pocket calendar.  Jalali is the most commonly used system and
is the default for everyday use, but the other two should be
available too.  So it is
important that software systems support this hybrid combination.
A list of Iranian national holidays is available at
\url{http://www.farsiweb.info/table/iran-holidays.txt}.

While Jalali is a solar calendar, it is not synchronized with
the Gregorian system.  For example, my date of birth is 27 September
1982, but this does not mean that my birthday falls on 27 September
each year.  According to the Jalali system, my next birthday is
26 September 2004.  The reason is that 2004 is a
leap year, and Jalali has its own leap years.  This simply means that
periodic events in the Jalali system cannot be translated into the
Gregorian system for storage.  A sample two-way Gregorian to Jalali
converter is available at \url{bamdad.org/date}.  Sources
for the converter are also available for free.
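
To make the drift concrete, here is a minimal Python sketch of a
Gregorian to Jalali conversion.  It is not the converter mentioned
above: it uses a widely circulated arithmetic approximation built on
a 33-year leap cycle, which should only be trusted for dates around
the present era.  The name \texttt{gregorian\_to\_jalali} is chosen
here purely for illustration.

\begin{verbatim}
```python
def gregorian_to_jalali(gy, gm, gd):
    """Convert a Gregorian date to (year, month, day) in Jalali.

    Arithmetic approximation built on a 33-year leap cycle.
    """
    # Days elapsed before each Gregorian month in a common year.
    days_before_month = [0, 31, 59, 90, 120, 151,
                         181, 212, 243, 273, 304, 334]
    # The current year's leap day counts only after February.
    gy2 = gy + 1 if gm > 2 else gy
    days = (355666 + 365 * gy + (gy2 + 3) // 4 - (gy2 + 99) // 100
            + (gy2 + 399) // 400 + gd + days_before_month[gm - 1])
    # 12053 days make one 33-year cycle; 1461 days make four years.
    jy = -1595 + 33 * (days // 12053)
    days %= 12053
    jy += 4 * (days // 1461)
    days %= 1461
    if days > 365:
        jy += (days - 1) // 365
        days = (days - 1) % 365
    # The first six Jalali months have 31 days each.
    if days < 186:
        return jy, 1 + days // 31, 1 + days % 31
    return jy, 7 + (days - 186) // 30, 1 + (days - 186) % 30


# The same Jalali day on two different Gregorian dates:
print(gregorian_to_jalali(1982, 9, 27))
print(gregorian_to_jalali(2004, 9, 26))
```
\end{verbatim}

Both calls give month 7 (Mehr), day 5, in the Jalali years 1361 and
1383 respectively, which is why a Jalali anniversary cannot be stored
as a f{}ixed Gregorian month and day.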

With the Islamic calendar, the situation is even worse.  A
once-yearly moon-sighting is used to determine the length of
one month, but any changes are reversed in the following month,
such that the global of{}fset of the system can be pre-computed.

The working week starts on Saturday and ends on Thursday.
The \emph{weekend} consists only of Friday.
There is nothing simpler than supporting this in an
application written from scratch.  On the other hand, there is no design
def{}iciency bigger than supporting only Sunday and Monday as week
start days.
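
As a small illustration, the Python sketch below keeps the week start
and the weekend as parameters instead of hard-coding them.  The
function names are hypothetical; only the standard \texttt{datetime}
module is assumed.

\begin{verbatim}
```python
from datetime import date

FRIDAY = 4  # datetime numbers Monday as 0, so Friday is 4


def day_of_week_from_saturday(d):
    """Return 0 for Saturday through 6 for Friday."""
    return (d.weekday() + 2) % 7


def is_business_day(d, weekend=(FRIDAY,)):
    """True unless the date falls on one of the weekend days."""
    return d.weekday() not in weekend


print(day_of_week_from_saturday(date(2004, 3, 20)))  # a Saturday
print(is_business_day(date(2004, 3, 26)))            # a Friday
```
\end{verbatim}

Passing a dif{}ferent \texttt{weekend} tuple covers locales with
other conventions without touching the rest of the code.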

Finally, the 24-hour format is what is used of{}f{}icially.  Moreover,
there is no direct equivalent for AM/PM.  In Iran people use
dif{}ferent words after the hour to indicate whether they mean AM or
PM.  These words are equivalent to midnight, early morning,
morning, noon, afternoon, evening, and night.
\end{note}


\slide
\slt{Collation}
\begin{itemize}
\item Like Arabic basically
\item \hfill\fa{Y > v > h"=- > n}\quad$\rightarrow$\quad\fa{y >
h"=- > v > n}\hfill\hbox{}

\item Some L2 equal pairs:\\
\hfill
\quad\fa{k} $<_3$ \fa{K}
\quad\fa{y} $<_3$ \fa{Y}
\quad\fa{ة} $<_3$ \fa{t}
\hfill\hbox{}\\[1ex]\hfill
\quad\fa{4} $<_3$ \fa{٤}
\quad\fa{5} $<_3$ \fa{٥}
\quad\fa{6} $<_3$ \fa{٦}
\hfill\hbox{}
\item Traditional rules:  Hamza variants are L2 equal:
\\\hfill
\quad\fa{=a`}\ \ $<_3$\ \ \fa{w`}\ \ $<_3$\ \ \fa{y`}
\hfill\hbox{}
\item Modern rules:  L2 equal with their base letter:
\\\hfill
\quad\fa{a} $<_3$ \fa{=a`}
\quad\fa{w} $<_3$ \fa{w`}
\quad\fa{y} $<_3$ \fa{y`}
\hfill\hbox{}
\end{itemize}
\endslide
\begin{note}
Persian collation rules are basically the same as the Arabic ones.
There is a major dif{}ference in the order of two main letters, Heh
and Waw.  The modif{}ied letters are considered basically equal and
only make a dif{}ference at the third level of the Unicode Collation
Algorithm.  Naturally, Persian letters come f{}irst in
case of a tie.
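
The following Python sketch illustrates the idea with a two-part sort
key: base letters decide the order, and the Arabic or Persian form
only breaks ties.  It is not an implementation of the Unicode
Collation Algorithm; the table and function names are hypothetical,
only the Kaf and Yeh pairs are handled, and Hamza is left out.

\begin{verbatim}
```python
# Persian alphabet order; Waw (U+0648) precedes Heh (U+0647).
PERSIAN_ORDER = (
    "\u0627\u0628\u067e\u062a\u062b\u062c\u0686\u062d\u062e\u062f"
    "\u0630\u0631\u0632\u0698\u0633\u0634\u0635\u0636\u0637\u0638"
    "\u0639\u063a\u0641\u0642\u06a9\u06af\u0644\u0645\u0646\u0648"
    "\u0647\u06cc"
)
PRIMARY = {ch: i for i, ch in enumerate(PERSIAN_ORDER)}

# Arabic variants mapped to the Persian letter they tie with.
VARIANT_OF = {
    "\u0643": "\u06a9",  # Arabic Kaf -> Persian Kaf
    "\u064a": "\u06cc",  # Arabic Yeh -> Persian Yeh
}


def persian_sort_key(word):
    """Base letters first; the variant form is only a tie-break."""
    primary, tertiary = [], []
    for ch in word:
        base = VARIANT_OF.get(ch, ch)
        # Unknown characters sort after the alphabet, by code point.
        primary.append(PRIMARY.get(base, len(PRIMARY) + ord(base)))
        tertiary.append(0 if ch == base else 1)
    return primary, tertiary


words = ["\u0647", "\u0648", "\u064a\u06a9", "\u06cc\u06a9"]
print(sorted(words, key=persian_sort_key))
```
\end{verbatim}

With this key, Waw sorts before Heh, and a word spelled with Persian
Yeh sorts just before the same word spelled with Arabic Yeh.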

The confusing part is Hamza.  Traditionally, Hamza has been
considered a single letter, so its three dif{}ferent variants would
be sorted as one letter, before Alef.  But recently this habit
has changed in favor of the base characters that Hamza sits on.
There are examples both for and against each approach.
We will soon see examples in which dif{}ferent variants of
Hamza are used for the same word, and examples in which the
Hamza can be replaced by the base character.
Generally speaking, the modern rules make more sense, but
each approach has its own users, so providing both is desirable.

There is a national project under way
to identify Persian collation rules completely and precisely.
\end{note}


\slide
\slt{Loose Searching}
\begin{center}
\renewcommand{\extrarowheight}{1.5ex}
\begin{tabular}{ccc}
\uniname{ZWNJ} $\simeq$ \uniname{Space} & &
\uniname{ZWNJ} $\simeq$ \emph{empty string}
\\
\fa{حج"=\textcolor{red}{"=ت}\thinspace‌الاسلام} $\simeq$
\fa{حج"=\textcolor{red}{"=ة}\thinspace‌الاسلام} &
\quad &
\fa{دایر\textcolor{red}{ة}\thinspace‌المعارف} $\simeq$
\fa{دایر\textcolor{red}{ه}\thinspace‌المعارف}
\\
\fa{ketaab} $\simeq$ \fa{ke"=\textcolor{red}{"|}"=taab} & &
\fa{khaane"=\textcolor{red}{"=h\thinspace=y}} $\simeq$
\fa{khaane"=\textcolor{red}{"=h-ye}}
\\
{\fa{y}} $\simeq$ {\fa{Y}}\quad & &
{\fa{k}} $\simeq$ {\fa{K}}\quad
\\
\end{tabular}
\begin{tabular}{ccc}
\fa{t"=\textcolor{red}{"=ا}خیر} $\simeq$
\fa{t"=\textcolor{red}{"=أ}خیر} &
\fa{paa"=\textcolor{red}{y"=}="=iz} $\simeq$
\fa{paa"=\textcolor{red}{y`"=}="=iz} &
\fa{s"=\textcolor{red}{"=و}ال} $\simeq$
\fa{s"=\textcolor{red}{"=v`}ال}
\\
\fa{mas"=\textcolor{red}{"=y`"=}"=له} $\simeq$
\fa{mas"=\textcolor{red}{"=a`}له} &
\fa{m"=\textcolor{red}{"=a`}mn} $\not\simeq$
\fa{m"=\textcolor{red}{"=v`}mn} &
\fa{mas"=\textcolor{red}{"=v`}vl} $\simeq$
\fa{mas"=\textcolor{red}{"=y`"=}"=vl}
\\
\fa{\textcolor{red}{m"=}"=lk} $\simeq$
\fa{\textcolor{red}{mO"=}"=lk} &
\fa{\textcolor{red}{mO"=}"=lk} $\not\simeq$
\fa{\textcolor{red}{mE"=}"=lk} &
\fa{\textcolor{red}{mE"=}"=lk} $\simeq$
\fa{\textcolor{red}{m"=}"=lk}
\\
\fa{0} $\simeq$ \fa{0}
\dots\hspace{-1em}
&
\fa{4} $\simeq$ \fa{٤}
\quad\fa{5} $\simeq$ \fa{٥}
\quad\fa{6} $\simeq$ \fa{٦}
&
\hspace{-1em}\dots
\fa{9} $\simeq$ \fa{9}
\end{tabular}
\end{center}
\fa{}
\endslide
\begin{note}
Most of the world's ``Find'' dialogs have an option to turn case
sensitivity on or of{}f.  Also, almost all search engines have their
own rules to f{}ind matches for queries with accents removed.  The
equivalent of these for Persian is a set of simple rules that,
when applied, make a huge dif{}ference in the quality of the
matched results.

Many of the rules shown above are due to dif{}ferent
orthographical practices.  Unfortunately, as far as I know, no
software implements a general framework for loose searching.
But many applications already implement some of these
rules when searching in Arabic script in general.  It is a real
problem for Iranian computer users today that when searching
for a word in Google they have to try dif{}ferent spellings: with
Arabic Yeh or Persian Yeh, with Waw with Hamza Above or
with Waw only, etc.

When applying these rules in a computer system, especially a word
processor, it is important to let the user turn each kind of loose
matching on or of{}f.  For example, while users generally like the
Arabic Yeh to be matched to Persian Yeh and vice versa, there are
rare but important situations in which the user wants to f{}ind just
the Arabic Yehs in a document in order to change them to Persian Yeh!
Then the user should be able to turn loose matching of Arabic Yeh to
Persian Yeh of{}f.  The same goes for the other cases.
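
One possible shape for such switchable rules is sketched below in
Python.  The rule names, \texttt{fold}, and \texttt{loose\_find} are
hypothetical, and only a handful of the equivalences from the slide
are covered; in particular ZWNJ is simply dropped rather than also
being matched against a space.

\begin{verbatim}
```python
# Each rule is a translation table; None deletes the character.
RULES = {
    "yeh": {0x064A: 0x06CC},     # Arabic Yeh -> Persian Yeh
    "kaf": {0x0643: 0x06A9},     # Arabic Kaf -> Persian Kaf
    "tatweel": {0x0640: None},   # drop Tatweel
    "zwnj": {0x200C: None},      # drop ZWNJ
    # Arabic-Indic digits -> Persian digits
    "digits": {0x0660 + i: 0x06F0 + i for i in range(10)},
}


def fold(text, enabled):
    """Apply only the enabled rules to the text."""
    for name in enabled:
        text = text.translate(RULES[name])
    return text


def loose_find(needle, haystack, enabled=tuple(RULES)):
    """Index of needle in the folded haystack, or -1 if absent."""
    return fold(haystack, enabled).find(fold(needle, enabled))


with_arabic_kaf = "\u0643\u062a\u0627\u0628"
with_persian_kaf = "\u06a9\u062a\u0627\u0628"
print(loose_find(with_persian_kaf, with_arabic_kaf))
print(loose_find(with_persian_kaf, with_arabic_kaf, enabled=("yeh",)))
```
\end{verbatim}

With all rules enabled the two spellings match at index 0; with only
the Yeh rule enabled they do not, and the result is $-1$.  Note that
the returned index refers to the folded text, so a real word
processor would also need to map it back to the original document
when a deleting rule is active.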

Much like the case for collation, this is not a complete and exact
list of all the cases that should be considered, only the most
important ones.  The same national project is working to identify
and document the loose searching requirements of Persian
computing precisely.
\end{note}


\slide
\slt{Last Notes to Application Developers}
\begin{itemize}
\item Typesetting Persian paragraphs:
\\\quad Justif{}ied lines
\\\quad No inter-letter spacing
\\\quad No word hyphenation
\\\quad Almost no inter-word spacing
\\\quad Use Tatweel instead: \fa{ketaab-e man} $\rightarrow$
\fa{ke"|"|"|t"|"|aab-e m"|"|"|n}
\item All text f{}ields Right-to-Left
\item Persian numbers
\item Right-to-Left layout
\item Beware: Right and Left are swapped!
\end{itemize}
\endslide
\begin{note}
Persian paragraphs usually have justif{}ied lines, but
right-aligned paragraphs are allowed too.
Justif{}ied lines can be achieved in CSS by \texttt{p \{text-align: justify;\}}.
Moreover, renderers should avoid inter-letter spacing and word
hyphenation with Persian text.  Inter-word spacing should be
avoided as much as possible too.  Instead, the joining stem can
be stretched to adjust the text to the line width.  This is like
inserting Tatweel characters at the places where characters
join together.

Another thing to note is that all text f{}ields should be
right-to-left.  In CSS this means \texttt{direction: rtl} for
almost all elements.  This is especially needed for f{}ields like
dates that have no Persian letters in them.

And do not forget that numbers should be presented in Persian.
This usually means that you should have special number output
handling functions that, when working in Persian mode, generate
Persian digits.  In general, no programming environment
generates Persian digits automatically.
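
Such an output function can be very small.  The Python sketch below
maps ASCII digits to the Persian digits U+06F0 to U+06F9; the name
\texttt{format\_number} and its \texttt{persian} switch are
hypothetical, and separators and signs are left untouched.

\begin{verbatim}
```python
# Map ASCII digits 0-9 to Persian digits U+06F0..U+06F9.
TO_PERSIAN_DIGITS = {ord("0") + i: 0x06F0 + i for i in range(10)}


def format_number(value, persian=True):
    """Format an integer, using Persian digits in Persian mode."""
    text = str(value)
    return text.translate(TO_PERSIAN_DIGITS) if persian else text


print(format_number(2004))
print(format_number(2004, persian=False))
```
\end{verbatim}

Keeping the conversion at the output boundary means that stored and
computed values stay in ASCII and only the displayed text changes.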

Last but not least, remember to mirror the general layout of
the user interface.  In CSS this is done almost automatically
when you set the direction for HTML tags.  But this is not the whole
story.  In all the places where you specify \texttt{left} or
\texttt{right} in your code, the value most probably should be
switched.  It would be easy if that were all, but there
are places where the switch should not happen; for example, a
text entry f{}ield for \emph{English Name} should always be aligned
to the left, not the right.

It is very important to remember, when writing code, that the left
and right directions are sometimes swapped.  For example, it is quite
common that in a Persian application with automatic right-to-left
layout, you press the left arrow key in a menu bar, but it takes you
to the right neighbor!
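
One way to avoid this class of bug is to translate the physical key
into a logical step before acting on it.  The Python sketch below is
only an illustration; \texttt{menu\_step} and its arguments are
hypothetical names, not part of any toolkit.

\begin{verbatim}
```python
def menu_step(key, rtl):
    """Return +1 for the next menu item, -1 for the previous one."""
    step = {"left": -1, "right": +1}[key]
    # In a mirrored layout the next item sits to the left.
    return -step if rtl else step


print(menu_step("left", rtl=True))
print(menu_step("left", rtl=False))
```
\end{verbatim}

In a right-to-left menu bar the left arrow then moves to the next
item, which is the one drawn to its left.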
\end{note}


\slide
\slt{Current Status -- Microsoft Windows System}
\begin{itemize}
\item Renders correctly
\item Shipped fonts work
\item Keyboard layout is terrible
\item No Persian digits support
\item No Iranian calendar
\item Locale data is wrong in places
\item No interface translation
\item Not trivial to enable Persian support
\end{itemize}
\endslide
\begin{note}
As mentioned before, Microsoft Windows systems are the most widely
used systems in Iran.  The current status, as of Microsoft Windows
XP, is that the shipped fonts f{}inally work as expected with
Persian data, but the keyboard layout is wrong and next to unusable.
A standard keyboard driver and Persian fonts are available
to download for free.  There is a chance that they will f{}ind their
way into the next major version of the system, but there may be a
problem with font licenses.

There is not much more Persian support in the system yet.  Persian
digits do not work, and so they do not appear on the keyboard
layout.  There is no Iranian calendar support, and the other
localization work is still weak.  As an example, the AM/PM scheme,
which Iranians never use, has been translated into Persian.

Microsoft has never had a Persian translation of the system
interface, so people usually use Windows in English.  This, plus
the complexity of enabling Persian support in Microsoft Windows,
has proved to be a barrier for end-users.  There are a handful of
dif{}ferent software vendors in Iran selling software to
\emph{enable} or \emph{add} Persian support to Microsoft Windows
XP!

Fortunately, Microsoft has started work on a Persian interface
translation.  So we can expect that, as of the next major release,
Persian will be in the initial list of supported languages, and
selecting Persian will automatically enable all the other needed
options.  This could put an end to an entire line of business in Iran!

Needless to say, Microsoft Of{}f{}ice inherits the same level of
support and so the same shortcomings.
There is an excellent tutorial by Connie Bobrof{}f on setting up
Microsoft Word to produce Persian documents.  It covers in detail
Windows versions from 98 to XP.  The tutorial is available online
at:\\
\hbox{\url{http://students.washington.edu/irina/persianword/persianwp.htm}}
\end{note}


\slide
\slt{Current Status -- Linux}
\begin{itemize}
\item Important systems support rendering
\item No good fonts yet
\item Standard keyboard layout
\item No Persian digits support yet
\item KDE claims Iranian calendar support
\item Some interface translation done
\item Not trivial to enable Persian support
\end{itemize}
\endslide
\begin{note}
It is harder to talk about Persian support under Linux as there
is no reference Linux distribution.  Another reason is that not
all applications use the same library and toolbox.  But the good
news is that almost all important platforms have basic Persian
support already.  This includes GNOME, KDE, Mozilla, OpenOf{}f{}ice,
and a few others.  Needless to say, GNOME uses Pango to display
internationalized text, including Persian.

There is no good Persian font available under Linux yet, but the
fonts mentioned earlier, which are rapidly passing their beta stage,
will become available and can be included in any Linux
distribution.

Standard keyboard layout is available, both under the X Window
system and the Linux console.  Interestingly, KDE claims to have
Iranian calendar support, but we have never tested it.  The
locale data seems to be more accurate under Linux.  Moreover,
there is work that would enable Persian digits in the GNOME
platform in the near future.

There is some Persian interface translation work available, but
it is still far from being of good, usable quality.  The good point is
that the mechanisms are open and mature, so anyone is welcome to
translate any application they like.

Much like the case with Microsoft Windows, Linux users suf{}fer
from the nontriviality of enabling Persian support.  In the case
of Linux it mostly means just setting up the keyboard and fonts,
but since Linux is much less popular in Iran, not everyone knows
how to do this.
\end{note}


\slide
\slt{Current Status -- MacOS}
\begin{itemize}
\item Supports rendering
\item No good fonts
\item Legacy and standard keyboard layouts
\end{itemize}
\endslide
\begin{note}
Apple hardware and MacOS systems are rare in Iran.  The latest
system, MacOS 10.3, is known to render Persian pretty well.
But again the shipped fonts are not Persian.  MacOS provides
both the standard keyboard layout and its own legacy
layout.  Apple used to ship a Persian-translated interface in the
old pre-Unicode days, but it seems they stopped long
ago.  The good news is that the translation infrastructure is in
place, so again anyone can do the translation.

The author does not know about the level of Persian support under
MacOS systems in more detail.
\end{note}


\slide
\slt{References and Resources}
\begin{itemize}
\item The Unicode Standard at \url{http://www.unicode.org/}
\item Institute of Standards and Industrial Research of Iran at
\url{http://www.isiri.com} (documents in Persian)
\item The FarsiWeb Project at \url{http://www.farsiweb.info/}
\item PersianComputing list at
    \url{http://lists.sharif.edu/mailman/listinfo/persiancomputing}
\item Typing Persian Word Documents with Windows Tutorial at \\
    \url{http://students.washington.edu/irina/persianword/persianwp.htm}
\end{itemize}
\endslide
\begin{note}
The Unicode standard is considered a great resource for Persian
computing.  After that you may like to look at the FarsiWeb
Project web pages, which contain documents and products of the
project, available for free.

Unfortunately all national standards are in Persian.  The FarsiWeb
project is translating the important ones into English, but this is
not done yet.

None of the issues discussed in this paper has been covered
in full detail here, as that would be well beyond the
bounds of this general overview.  For more information you are
invited to subscribe to the PersianComputing public mailing list,
where the FarsiWeb Project Group members as well as many other
volunteers will help you f{}ind the answers to your questions and
provide you with the latest information.  The list archives are also
considered one of the best resources in this area.

The Typing Persian Word Documents with Windows Tutorial is
an excellent website on setting up Microsoft Word to produce Persian
documents.  It introduces most of the problems on the
platform and provides solutions under dif{}ferent versions of the
system.
\end{note}


\begin{note}
\begin{center}
\Large Acknowledgement
\end{center}
The author wishes to thank C.~Bobrof{}f for taking on the hard task of
editing the f{}inal version of this paper multiple times.

\begin{center}
\Large About the FarsiWeb Project
\end{center}
The FarsiWeb project started as a research project in the
Computing Center, Sharif University of Technology in early 1999,
which later moved to a startup company called Sharif FarsiWeb,
incorporated in late 2003, that still has its research lab at
SUT.

FarsiWeb has close relations to all of the language and computing
authorities of Iran and the Persian language, including the
Persian Academy of Language and Literature, the High Council of
Informatics, and ISIRI (the Iranian national standardization
body). In the last f{}ive years, FarsiWeb has been representing
those organizations in various standard bodies and international
organizations, including the Unicode Consortium, ISO JTC1/SC2,
World Wide Web Consortium, and IETF. It has helped ref{}ine the
specif{}ications of those bodies to incorporate the requirements of
Persian and other languages written in Arabic script.

FarsiWeb is considered one of the main authorities on standard
Persian computing.  It has published many recommendations on
matters related to the implementation of the Persian language, as
well as a national Iranian standard (on Persian information
interchange using Unicode) which has won the approval of all
the authorities.  A national standard for a national keyboard
layout, a reference set of Persian fonts for web and printing
usage, and a few specif{}ications on the requirements of standard
Persian support on GNU/Linux platforms are under preparation.

In early 2003, FarsiWeb also co-developed a report on ``Computer
Locale Requirements of Afghanistan'' with Everson Typography of
Ireland, which won the approval of the transitional government of
Afghanistan.

\begin{center}
\Large About the Author
\end{center}

Behdad Esfahbod is the maintainer and main developer of FriBidi,
a Free Software implementation of the Unicode Bidirectional
Algorithm. FriBidi is used in many Open Source projects
including the GNOME desktop and AbiWord word processor, where
it is used as a requirement for rendering scripts like
Arabic, Hebrew, and Syriac. Behdad is also a key member of
the FarsiWeb Project Group, working on tasks ranging from
adding Persian support to Free Software applications around
the world, to writing national standards on Persian computing
issues.

Behdad is a member of the Unicode Consortium's Bidi Committee, and a
member of the FarsiTeX Project Team.  He is currently pursuing
graduate studies at the University of Toronto, Department of
Computer Science.  It is a pity that he enjoys mountain climbing
so much, while there are only lakes around Toronto.
\end{note}
\end{document}


