By Behdad Esfahbod
<behdad behdad org>
Last major update: January 18, 2010
Last minor update: December 18, 2012
At the time of writing the initial version of this paper, the author was working for Red Hat's Desktop team and has been involved with GNOME and Fedora for a long time. He has been a developer and/or maintainer of many modules discussed in this paper at various times, including fribidi, fontconfig, harfbuzz, pango, cairo, and gnome-terminal.
Text is the primary means of communication in computers, and is bound to
be so for the decades to come. With the widespread adoption of
Unicode as the canonical character
set for representing text a whole new domain has been opened up in a
desktop system software design. Gone are the days that one would need to
input, render, print, search, spell-check, ... one language at a time.
The whole concept of internationalization (i18n) on which Unicode is based
all languages, all the time.
The Free Software desktop has been rather late to the Unicode bandwagon, but in the past ten years all the major pieces have gathered together and nowadays, on a modern GNU/Linux distribution like Fedora, one cannot easily get anything other than Unicode working.
Internationalization and Unicode text processing are about more than just rendering text on the screen. However, in this paper we focus on the specific problem of text rendering, ie. from input Unicode text to pixels lit on the screen. We will discuss the current architecture, identify problems that have limited progress in recent years, and propose actions to be taken to remedy them.
While there are multiple text rendering stacks available in the Free Software world and even on a single GNU/Linux desktop, in this document we focus on the GNOME text rendering stack and the Fedora Project where it comes to distro-specific issues. Fedora and Red Hat have been showing leadership in advancing the text stack for years, and other distributions have been fast adopting these new technologies. We expect that to remain the case for the years to come, although it would be nice to see other distributions / communities start contributing more closely to the parts of the stack we all share.
This document is a draft working-copy paper. It is a roadmap of where we are now and where we want to be, and will be updated as we get there.
Traditionally fonts were a collection of glyphs and a simple one-to-one mapping between characters and glyphs. Rudimentary support for simple ligatures was available in some font formats. With Unicode however there was a need for formats allowing complex transformation of glyphs (substitution and positioning). Two technologies were developed to achieve that, one is OpenType Layout from Microsoft and Adobe, the other is AAT from Apple. These two technologies, plus TrueType and Type1 font formats, all were combined in what is called OpenType.
There are fundamental differences in how AAT and OpenType Layout work. In AAT the font contains all the logic required to perform complex text shaping (the process of converting Unicode text to glyph indices and positions). Whereas in OpenType, the script-specific logic (say, Arabic cursive joining, etc) is part of the standard and implemented by the layout engine, with fonts providing only the font-specific data that the layout engine can use to perform complex shaping.
The Free Software text stack is based on the OpenType Layout technology. HarfBuzz is an implementation of the OpenType Layout engine (aka layout engine) and the script-specific logic (aka shaping engine).
Originally the FreeType project implemented the OpenType Layout engine as part of the FreeType 2 project, however it was dropped from FreeType at the last moment when it was decided that OpenType shaping is not involved in rasterizing glyphs and hence is out of the scope of FreeType. The FreeType Layout (FTL) code was salvaged by Pango and Qt developers and kept in house for quite a few years. Owen Taylor developed an abstract buffer on top of the layout engine making it much easier to use.
Around 2006 Pango and Qt developers cooperated to reunify the layout engine again, and HarfBuzz was born as a freedesktop.org project. Initially it was just merging back the existing code and renaming it, but after various meetings, the plan to make HarfBuzz be a unified shaping engine was born and have been the goal since. HarfBuzz was relicensed (thanks to FreeType developers) to the old MIT license to rid it of the FTL advertisement clause.
In 2007 (?) TrollTech donated the Qt shapers to HarfBuzz under the same license as the layout engine code. This is the current state of HarfBuzz. At this time Qt ships with its own copy of HarfBuzz which is identical to the upstream HarfBuzz. Pango ships with its own copy also, but only uses the layout engine, and not the HarfBuzz shapers.
Since 2008 the author has been working on rewriting the layout engine to be more robust and use mmap()ed fonts efficiently, and that work is mostly done now. Next step is to design a user-friendly high-level API for the shaping engine and merge the Pango and Qt shapers and put them under the new API. This is a work in progress by Red Hat and Mozilla.
HarfBuzz is currently being used by Pango, Qt, the Linux port of Google's Chromium browser, as well as some smaller project. The grand plan is for it to be used directly by any code needing direct access to a portable and robust complex shaping engine. That would include toolkits, browsers, word processors, and design applications. We will expand on that in a later section.
One can loosely divide the consumers of the text rendering stack based on their varying demands and requirements:
GUI Toolkits like GTK+ and Qt need the least flexibility when it comes to text rendering. Indeed, all the user cares about is that the text is rendered to the screen and is legible. Pango has been specifically designed with this use case in mind. Qt is even worse in that it pretty much does not support any other use case. Ultimately most (all?) other use cases should be made to use HarfBuzz directly, freeing Pango to do what it's really good at: Providing an easy-to-use API for GUI toolkits.
Web browsers have two unique requirements that make it hard to use the native text stack in full:
It is worthwhile to review what the various web browser engines currently use for their complex text support:
Firefox uses Pango. Firefox 2 was hacked to use PangoLayout API. That was very abusive and inherently inefficient. Firefox 3 has got a new layout engine that is completely based on cairo. The Linux port subclasses PangoFcFontMap to be able to support both CSS text selection as well as web fonts. By doing that it is essentially reimplementing most of Pango and only using the shaping logic. It makes much more sense to use HarfBuzz directly, and Mozilla is now working on getting HarfBuzz ready for that.
Webkit-GTK uses PangoCairo. They use Pango the same way that Firefox 2 used to do. At the end of the day, it's at best a hack. Moving to HarfBuzz when the time is right should fix that.
Webkit/Android is the webkit engine as used by Google in Android and Chromium. It uses a system called Skia for 2D graphics rendering. The team at Google has released an alpha Linux port of Chromium that is using HarfBuzz directly.
Word processors' biggest unique demand from a shaping engine is good support for and lots of control on line hyphenation and justification. Also important to them is choosing fonts as closely and robustly as possible to the font requested by the document. Device-independent metrics as well as metric-compatibility with other word processors is another requirement (required by all kinds of office suite applications really.) OpenOffice.org currently uses ICU and AbiWord uses Pango. Both will have a better time using HarfBuzz shaping directly.
Designer tools demand much of what word processors do, but also access to advanced font features (alternate glyphs, etc), being able to correctly handle fonts with many various (and non-standard) styles, things like manual kerning, as well using SVG fonts and embedded subsetted fonts. Inkscape and The GIMP use Pango currently, and Scribus is in transition to / has been ported to using HarfBuzz directly.
Font designer tools can use HarfBuzz directly to generate previews. Other than that there is not much else to share really, even though they both deal with the very same objects (fonts): the font tools needs to be able to generate font tables, which is out of the scope of a shaping engine. Fontforge has the option to use Pango currently.
Terminal emulators with support for complex text are very weird hybrids. On the one hand terminal emulators have to lay text out in a predefined grid in a predefined way, which is in conflict with many aspects and requirements of complex text, on the other hand users demand support for complex text in their terminals. It gets uglier when you think about bidirectional text, say, inside a console text editor. Nonetheless, it is fair to say that such hybrids do not put any new demands on the shaping engine. gnome-terminal currently has no support for complex text other than combining marks. Konsole has bidirectional text support. Apple's Terminal App has at least bidi support as well as Arabic shaping support, not sure about other complex text. Update (Jan 18, 2010): The terminal mode (term and ansi-term) in recent versions of Emacs can render complex text, including Indic.
Batch document processors have no requirements other than what's required by, say, browsers or word processors. However, so far a decent internationalized batch document processor has pretty much been nonexistent. The reason historically has been that shaping engines were always developed in the context of GUI frameworks, and batch processors typically do not rely on one, and hence are designed without the i18n models present in all moderns GUIs in mind. XMLFO / Docbook processors, etc fit in this category and should use HarfBuzz and the rest of the stack for full complex text support.
TeX engines are batch document processors but worth looking into separately. Historically TeX had no shaping engines and basic shaping was done using macro packages and a variety of hacks. More recently though, XeTeX was invented. XeTeX simply outsources the shaping to an external library, ICU or Apple's ATSUI currently. XeTeX is a separate branch of the TeX evolutionary hierarchy than the mainstream pdfTeX though. The XeTeX creator is working on HarfBuzz on behalf of Mozilla now, and plans to port XeTeX to HarfBuzz eventually. In the long term though, pdfTeX's successor luaTeX should be made to do the same thing. There is more to Unicode support than just shaping, and in those areas the TeX engines can gain a lot by building on top of existing libraries.
FreeType has the widest range of supported font formats in the world.
Fontconfig has the most expressive font configuration language. In fact, other text rendering stacks simply don't have much of a configuration mechanism.
Fontconfig-based text stacks have been the first to support font fallback based on glyph coverage transparently. This feature appeared in Mozilla around the same time (early 2000s) and only recently in Windows Vista as far as I know, though I've been pointed to this which suggests that per-script font fallback was supported since Windows 2000, but certainly not per-codepoint.
FriBidi has been the most standard-complying implementation of the Unicode Bidirectional Algorithm for years. All algorithm bugs found in FriBidi over the years have revealed bugs in the spec itself as well as its reference implementations.
Pango has been the first text stack to support some of the minority scripts encoded in Unicode, even before that version of Unicode was officially released.
However, when one stands back and looks at the stack as a whole, it is not something to envy. As a whole, we have not been making ground-breaking progress for quite a while. The last major progress was the move to client-side fonts itself which fueled a renaissance. Since then, it has mostly been bug fixing, cleanup, polish, small features here and there. Pretty similar to the GNOME2 status one would say. Indeed, the client-side fonts were first introduced in early GNOME2. What we need is the GNOME3 of text rendering, in time for GNOME3.
To those familiar with the text stack, it is hard to not see what is wrong. I believe there are two problems: 1) the current stack is good enough, so improving it stays low-priority for parties involved, and 2) what I like to call segregated efforts. By that I mean, for example:
Fontconfig exposes a dangerously expressive configuration language in XML, and from there any misconfiguration is font packager's fault and not fontconfig's. Even if it is painfully hard to express sane preferences using fontconfig rules.
Fontconfig rightly chose XML as its configuration language, the idea being that font configuration GUIs can be developed to output configuration XML based on user preferences. This has yet to be realized though. Update (Jan 18, 2010): There is a small GUI project called Fontik that allows per-font configuration. This is a step in the right direction IMO.
Pango maintainer adds new features, making it possible to implement advanced features in more demanding applications like designer tools, but leaves it to others to actually implement such features in the tools higher in the stack. All the new features get is a mention in the release notes and, if lucky, a blog post.
Text Layout Summit has been a fairly successful meeting bringing all interested parties together since 2006 (it has been held at GNOME Boston Summit 2006, Akademy 2007, and Libre Graphics Meeting 2008). However, those same parties do not work as closely out of the meeting. For example, while Scribus and HarfBuzz developers have been discussing various issues at length during these meetings, there has not been a much email exchange between the projects over the years.
unifont.org is a website that has many of the right aspirations for how an ideal font and text ecosystem should look like, but the project does not work with the HarfBuzz and other library designers as closely as I like to see.
SIL Graphite and m17n both are reinventing the wheel in one sense. They each definitely have their own justifications why a new wheel may be needed, but the fact remains, that their efforts does not advance the mainstream GNU/Linux text rendering pipeline.
One may even argue that the extremely modular design of GNU/Linux systems makes it painfully hard to expose a truly integrated solution, in many areas including text rendering. For example, the X architecture combined with client-side font rendering makes it close to impossible to optimize the pipeline to take advantage of all the possibilities exposed by modern GPUs, like Microsoft does for example. However, that excuse is irrelevant as it may be part of the problem statement, but it hardly is the answer.
Only recently have the Desktop Team at Red Hat and the Fedora Font SIG started working on features that extend across the stack (vertically or horizontally):
Online font addition/removal: The user-visible feature here is that when a new font is installed or an existing one removed, running applications pick that up automatically and in a matter of seconds, without needing a restart. Implementing this feature involved changes way higher in desktop stack, and surprisingly, no changes in the text stack itself: the text stack already provided the API needed to build this feature on top of it for years, but it was never used. The changes required where in fact in GTK+ and gnome-settings-daemon. The way this works is:
Automatic font installation: Building on top of the previous feature, this feature involves automatic detection of missing fonts, and installing them. Minimal changes in the text stack were required. The main changes were in RPM and PackageKit. The way this works is:
Streamlining font packaging: The Fedora Font SIG rightfully recognized that it is actually painfully hard for font packagers to write the correct fontconfig configuration for their font packages. Indeed, no one really knows what the correct configuration really is. It also does not help that many font packagers want their font to be used as the default font for their language. So, during the past year the SIG has been busy drafting the Fedora font packaging guidelines and getting them passed officially. A major font package cleanup followed. This is still a work in progress, but a step in the right direction on the distro level.
CJK problem is an artifact of the Unicode Han Unification. That is, the fact that the same Unicode character is used for all three of Traditional Chinese (used Hong Kong and Taiwan), Simplified Chinese (used in mainland China), and Japanese (a variation of Simplified Chinese originally). The three languages, while sharing the same ideographs, require different visual rendering of the shared characters, making correct font selection critical for legible rendering of text in this family of languages.
Moreover, users of these languages typically have different requirements for rendering Latin than the rest of the world. For example, while Indic or Arabic users prefer their Latin text to be rendered using the default Latin font on the system, CJK users want the Latin to be rendered using the same font used for CJK. This is because CJK characters are very complex drawings and must be rendered using handcrafted bitmaps to be legible at small sizes. Such bitmap glyphs simply look ugly adjacent to antialiased Latin glyphs.
Inherent to the CJK problem is also communication failure. CJK is a huge and still emerging market, affecting over one billion of the world's population. Yet it is hard to find two native field experts that can agree on the very basics of how the fonts should look on screen. So far the burden has been falling on fontconfig and Fedora Font SIG maintainers to explore possible solutions and implement them. But we are not there yet. To fix this problem, we need to go back to the design stage and re-design how fontconfig configuration is supposed to work. Fontconfig configuration idioms need to be extended and the new idioms documented and implemented across all font packages.
ACTION: Understand and document the roots of the problem,
extend fontconfig and Pango as necessary to be able to address the problem,
document idioms for font configuration in Fedora, and update all font packages
to use the new guidelines.
STATUS: Behdad to read CJKV Information Processing, 2nd edition.
Indic problem is rooted in the fact that over a dozen of scripts used in India are all implemented using a single shaper driven by different data-tables. This makes a lot of sense from a design point of view since the scripts are very similar in the way they are encoded in Unicode. However, each of them does have delicate differences in how certain common characters interact with the others and that has made it hard to fix bugs in one script without breaking others. The Indic shapers in both Pango and Qt were ported from the one in ICU, so this problem is common to all available free-software Indic shapers. It is practically impossible to fix the tens of outstanding Indic bugs without first merging all the available implementation and also developing an exhaustive test suite.
Moreover, the Open Type Indic standard was also so complex and hard to implement correctly that Microsoft moved to a new Indic standard in Vista. There is currently no free implementation available for the new standard.
ACTION: Merge the three Indic shapers into one as part of the HarfBuzz shaper merger with Pango. STATUS: Jonathan Kew of Mozilla will do this as soon as the new shaping infrastructure in HarfBuzz is in place.
ACTION: Develop an extensive Indic shaping test suite, as part of a larger, HarfBuzz-wide, shaping test suite. STATUS: A high priority item after the basic new shaping infrastructure in HarfBuzz is in place.
Develop an Indic shaper for the new OpenType Indic standard.
STATUS: Not planned currently.
Understand the scope of the problem and design a solution in Pango.
STATUS: Behdad to work on the understanding part this week.
Experience shows that if module X needs to use library Y, it would make for much better code if Y developers implement that in X and submit the patch to X maintainers for review, than the common practice of X developers implementing Y support in X based on available documentation (which is always incomplete anyway). With that in mind, we as the text stack maintainers need to reach out upward to applications across the desktop whenever we add new features. For example, if a new font selector and dialog are designed for GTK+, we need to cooperate with OpenOffice.org, The GIMP, etc to make them provide users with the same enhanced experience.
In this section we will identify areas that can benefit from immediate technical attention to advance the user experience with text rendering on the free desktop. We also need to start thinking about more integration issue and seek longer term vision for improving the text rendering experience.
There are also issues that do not directly affect text rendering in the context discussed so far, but are closely relevant and require some of the same expertise to address:
2012-12-18: Add note re Windows font fallback, brought up by Pekka Pihlajasaari.
2010-10-18: Add note about Emacs to Terminal Emulators section. Mention fontik.
2009-07-05: First public version as presented at the Gran Canaria Desktop Summit.