The following two tables use a webfont to test your browser's Syriac shaping of Alaph.

With one U+0710 ALAPH character(s) in between:
L / R
U+0020 SPACE U+0718 WAW U+072A RISH U+0716 DOTLESS DALATH RISH U+0715 DALATH U+072F PERSIAN DHALATH U+0712 BETH
U+0020 SPACE ܐ ܐܘ ܐܪ ܐܖ ܐܕ ܐܯ ܐܒ
U+0718 WAW ܘܐ ܘܐܘ ܘܐܪ ܘܐܖ ܘܐܕ ܘܐܯ ܘܐܒ
U+072A RISH ܪܐ ܪܐܘ ܪܐܪ ܪܐܖ ܪܐܕ ܪܐܯ ܪܐܒ
U+0716 DOTLESS DALATH RISH ܖܐ ܖܐܘ ܖܐܪ ܖܐܖ ܖܐܕ ܖܐܯ ܖܐܒ
U+0715 DALATH ܕܐ ܕܐܘ ܕܐܪ ܕܐܖ ܕܐܕ ܕܐܯ ܕܐܒ
U+072F PERSIAN DHALATH ܯܐ ܯܐܘ ܯܐܪ ܯܐܖ ܯܐܕ ܯܐܯ ܯܐܒ
U+0712 BETH ܒܐ ܒܐܘ ܒܐܪ ܒܐܖ ܒܐܕ ܒܐܯ ܒܐܒ

With two U+0710 ALAPH character(s) in between:
L / R
U+0020 SPACE U+0718 WAW U+072A RISH U+0716 DOTLESS DALATH RISH U+0715 DALATH U+072F PERSIAN DHALATH U+0712 BETH
U+0020 SPACE ܐܐ ܐܐܘ ܐܐܪ ܐܐܖ ܐܐܕ ܐܐܯ ܐܐܒ
U+0718 WAW ܘܐܐ ܘܐܐܘ ܘܐܐܪ ܘܐܐܖ ܘܐܐܕ ܘܐܐܯ ܘܐܐܒ
U+072A RISH ܪܐܐ ܪܐܐܘ ܪܐܐܪ ܪܐܐܖ ܪܐܐܕ ܪܐܐܯ ܪܐܐܒ
U+0716 DOTLESS DALATH RISH ܖܐܐ ܖܐܐܘ ܖܐܐܪ ܖܐܐܖ ܖܐܐܕ ܖܐܐܯ ܖܐܐܒ
U+0715 DALATH ܕܐܐ ܕܐܐܘ ܕܐܐܪ ܕܐܐܖ ܕܐܐܕ ܕܐܐܯ ܕܐܐܒ
U+072F PERSIAN DHALATH ܯܐܐ ܯܐܐܘ ܯܐܐܪ ܯܐܐܖ ܯܐܐܕ ܯܐܐܯ ܯܐܐܒ
U+0712 BETH ܒܐܐ ܒܐܐܘ ܒܐܐܪ ܒܐܐܖ ܒܐܐܕ ܒܐܐܯ ܒܐܐܒ
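
The cell strings in the two tables above follow a simple pattern: the row character, then one or two ALAPHs, then the column character, in logical order. A minimal Python sketch that regenerates them is shown here; the names `CONTEXTS` and `alaph_table` are made up for this illustration and are not part of the original page.

```python
# Sketch: rebuild the cell strings of the two test tables.
ALAPH = "\u0710"

# Context characters, in the order used by the table headers.
CONTEXTS = [
    ("U+0020 SPACE", "\u0020"),
    ("U+0718 WAW", "\u0718"),
    ("U+072A RISH", "\u072A"),
    ("U+0716 DOTLESS DALATH RISH", "\u0716"),
    ("U+0715 DALATH", "\u0715"),
    ("U+072F PERSIAN DHALATH", "\u072F"),
    ("U+0712 BETH", "\u0712"),
]


def alaph_table(alaph_count):
    """Return rows of cells: row character + ALAPHs + column character."""
    return [
        [before + ALAPH * alaph_count + after for _, after in CONTEXTS]
        for _, before in CONTEXTS
    ]


if __name__ == "__main__":
    for count in (1, 2):
        for (name, _), row in zip(CONTEXTS, alaph_table(count)):
            # Strip the SPACE context so the output resembles the tables.
            print(name, *(cell.strip() for cell in row))
```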

Report to OpenType and Unicode Technical Committee mailing lists:

Hello again everyone,

Two days of digging into Syriac and I think I figured it all out.  Thanks
everyone who responded.

I downloaded the Meltho Syriac fonts, chose one, added a dummy glyph for
U+072F SYRIAC LETTER PERSIAN DHALATH, modified all the ALAPH glyphs to look
distinct, and tried it in IE9 to see what it does.  That confirmed my
suspicion that the OpenType spec has a bug in the rule for when 'med2' is
applied.  IE9 fully matches my expectations, so I believe its behavior is the
correct one.  To summarize:

 * The Syriac OpenType spec [1] has the following rule:

      g. Apply feature 'med2' to replace the 'Alaph' glyph in the middle of
	 Syriac words when the preceding base character cannot be joined to.

I believe that the correct rule must be:

      g. Apply feature 'med2' to replace the 'Alaph' glyph in the middle of
	 Syriac words when the preceding base character can be joined to.

I.e., 'med2' replaces the 'fina' form of Alaph when not at the end of the word,
not the 'isol' form.

I find it extremely confusing that the OpenType features for Alaph shaping are
named 'fin2', 'fin3', and 'med2', which suggests that they are alternative
versions of 'fina' and 'medi', whereas in reality:

 * 'fin2' and 'fin3' are alternate versions of 'isol' when appearing at the
end of the word.

 * 'med2' is an alternate version of 'fina' when not appearing at the end of
the word.

How confusing...
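
To make the relationship between the features concrete, here is a minimal Python sketch of how an Alaph's feature could be chosen under the corrected rule.  It is only an illustration, not code from HarfBuzz or any spec: the function name and the `prev` categories are made up, the split between 'fin2' and 'fin3' (the latter after a dalath or rish) follows my reading of the rules discussed in this thread, and keeping 'isol' for an Alaph with no preceding letter is an assumption.

```python
def alaph_feature(prev, at_word_end):
    """Pick the positional feature for an Alaph (illustrative sketch).

    prev: "joining"     - preceding base character can be joined to
          "dalath_rish" - preceding base character is a dalath or rish
          "other"       - any other preceding letter that cannot be joined to
          None          - no preceding letter (assumed to leave 'isol' alone)
    at_word_end: True if no letter follows the Alaph.
    """
    if prev == "joining":
        # Joined Alaph: 'fina' at the end of a word, 'med2' elsewhere.
        return "fina" if at_word_end else "med2"
    if not at_word_end or prev is None:
        # Unjoined Alaph that is not word-final keeps the plain 'isol' form.
        return "isol"
    # Unjoined, word-final Alaph: 'fin2' and 'fin3' stand in for 'isol'.
    return "fin3" if prev == "dalath_rish" else "fin2"
```

So `alaph_feature("joining", False)` yields 'med2', while the spec as written would have applied 'med2' to the unjoined case instead.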


Anyway.  So the OpenType spec is wrong and needs to be fixed.  Other than that,

 * Both Unicode and OpenType need to be updated to include all four of Dalath,
Rish, Dotless Dalath Rish, and Persian Dhalath.  In my HarfBuzz implementation
I'm using Joining_Group=Alaph and Joining_Group=Dalath_Rish for the
Syriac-specific shaping rules.  I suggest Unicode specifies that.

  - Unicode should also make it clear that in the Syriac shaping rules (R1, R2,
R3), Joining_Type=Transparent characters are skipped.  That is, it should state
that the R1 rule from Arabic Shaping takes precedence over the Syriac shaping rules.
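
As a rough illustration of the two suggestions above, the following Python sketch groups the four characters under one joining group and skips transparent characters when looking for the preceding base.  The table is hand-written for this example rather than read from the Unicode Character Database, and `is_transparent` only approximates Joining_Type=Transparent by general category (Mn, Me, Cf, minus ZWJ and ZWNJ), which is an assumption; the function names are hypothetical.

```python
import unicodedata

# Hand-written for illustration; mirrors the grouping suggested above.
JOINING_GROUP = {
    "\u0710": "Alaph",
    "\u0715": "Dalath_Rish",  # DALATH
    "\u0716": "Dalath_Rish",  # DOTLESS DALATH RISH
    "\u072A": "Dalath_Rish",  # RISH
    "\u072F": "Dalath_Rish",  # PERSIAN DHALATH
}


def is_transparent(ch):
    """Approximate Joining_Type=Transparent by general category (assumption)."""
    if ch in "\u200C\u200D":  # ZWNJ and ZWJ are not transparent
        return False
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")


def previous_base(text, index):
    """Return the nearest non-transparent character before index, or None."""
    for ch in reversed(text[:index]):
        if not is_transparent(ch):
            return ch
    return None
```

For DALATH followed by a combining vowel mark and then ALAPH, `previous_base` returns the DALATH, so the Alaph rule sees Joining_Group=Dalath_Rish rather than the mark.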


The Unicode Syriac Shaping has rules that depend on "word breaking character"s:

      An alaph that has a non-left-joining character to its right, except
      for a dalath or rish, and a word breaking character to its left will
      take the form of A_fn.

That can be hard to implement because 1) there is no such thing as a "word
breaking character" in Unicode, only the Unicode Text Segmentation Algorithm,
and 2) shaping engines typically don't have access to the word boundaries
determined by the Text Segmentation Algorithm.  I suggest replacing that with
something like this:

      An alaph that has a non-left-joining character to its right, except
      for a dalath or rish, and a non-joining character or the end of the
      text to its left will take the form of A_fn.

I.e., rely on the fact that in this context, roughly any "word breaking
character" is non-joining and vice versa.
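
A minimal Python sketch of that replacement test, working only from per-character Joining_Type values and not from word boundaries, might look like this.  The function name and the list-of-letters input are made up for the example, and skipping Joining_Type=Transparent ('T') characters reflects the suggestion earlier in this message rather than anything the current text says.

```python
def is_word_final(joining_types, index):
    """Return True if the character at index ends a word for shaping purposes.

    joining_types holds one Joining_Type letter per character in logical
    order, e.g. 'U' (non-joining), 'R', 'D', 'L', 'T' (transparent).
    """
    for jt in joining_types[index + 1:]:
        if jt == "T":
            continue  # transparent characters do not affect the decision
        return jt == "U"  # a non-joining character acts like a word break
    return True  # end of the text
```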


I put together an HTML page containing an annotated table of the Syriac Alaph
shaping rules from the Unicode standard, as well as two tables of Alaph in
various contexts rendered using a modified webfont that has a different glyph
for each shape of Alaph.  One can compare a browser's rendering of this page
with that of IE9 to check the correctness of its Syriac shaping:

  http://behdad.org/syriac/

Feel free to copy / circulate.

behdad

[1] http://www.microsoft.com/typography/otfntdev/syriacot/shaping.htm


On 10/04/10 23:04, Behdad Esfahbod wrote:
> > [At the risk of making a mess I'm posting this to multiple lists.]
> > 
> > Hi,
> > 
> > I'm implementing Syriac in HarfBuzz and am using the Syriac shaper from Pango
> > and the Arabic/Syriac shaper prototype from Jonathan Kew as references, as
> > well as the Unicode 5.0 section on Syriac Shaping and the Syriac OpenType
> > specification [1].
> > 
> > The Unicode spec specifies three additional shaping rules for Syriac Alaph:
> > Alaph is shaped differently at the end of a word (two additional shapes), and
> > takes a special shape when it is at the end of a word and preceded by "a dalath
> > or rish".
> > 
> > The OT spec (from 2002) has similar rules (the 'fin2' and 'fin3' features) but
> > also includes "dotless 'Dalath-Rish'" in addition to dalath and rish.  This
> > sounds correct.
> > 
> > The code from Jonathan Kew also includes the U+072F SYRIAC LETTER PERSIAN
> > DHALATH as well.  Given that this was added in Unicode 4.0, there is no
> > surprise that it's not mentioned in the OpenType spec.  But I'm not sure that
> > it really should affect the alaph like regular dalath does.  Can someone who
> > knows that character comment please?
> > 
> > One way or the other, the Unicode text needs to be updated to clarify what "a
> > dalath or rish" means in the text.
> > 
> > 
> > Thanks,
> > behdad
> > 
> > [1] http://www.microsoft.com/typography/otfntdev/syriacot/shaping.htm
(C) 2010 Behdad Esfahbod