State of Text Rendering

By Behdad Esfahbod <behdad behdad org>
Last major update: January 18, 2010
Last minor update: December 18, 2012

Disclaimer

At the time of writing the initial version of this paper, the author was working for Red Hat's Desktop team and has been involved with GNOME and Fedora for a long time. He has been a developer and/or maintainer of many modules discussed in this paper at various times, including fribidi, fontconfig, harfbuzz, pango, cairo, and gnome-terminal.

Introduction

Text is the primary means of communication in computers, and is bound to be so for the decades to come. With the widespread adoption of Unicode as the canonical character set for representing text, a whole new domain has opened up in desktop system software design. Gone are the days when one would need to input, render, print, search, spell-check, ... one language at a time. The whole concept of internationalization (i18n) on which Unicode is based is all languages, all the time.

The Free Software desktop has been rather late to the Unicode bandwagon, but in the past ten years all the major pieces have gathered together and nowadays, on a modern GNU/Linux distribution like Fedora, one cannot easily get anything other than Unicode working.

Internationalization and Unicode text processing are about more than just rendering text on the screen. However, in this paper we focus on the specific problem of text rendering, ie. from input Unicode text to pixels lit on the screen. We will discuss the current architecture, identify problems that have limited progress in recent years, and propose actions to be taken to remedy them.

While there are multiple text rendering stacks available in the Free Software world and even on a single GNU/Linux desktop, in this document we focus on the GNOME text rendering stack, and on the Fedora Project when it comes to distro-specific issues. Fedora and Red Hat have been showing leadership in advancing the text stack for years, and other distributions have been quick to adopt these new technologies. We expect that to remain the case for the years to come, although it would be nice to see other distributions and communities start contributing more closely to the parts of the stack we all share.

This document is a draft working-copy paper. It is a roadmap of where we are now and where we want to be, and will be updated as we get there.

Status Quo

When we talk about the text rendering stack, we really mean a collection of separate modules sitting on top of each other:
FreeType
Performs font rasterization. Given font data (a file or data in memory), it performs simple (non-complex) mapping of Unicode characters to glyph indices and renders glyphs to images.
Fontconfig
Performs font selection based on a pattern of desired font characteristics. These characteristics typically include a family name, style, weight, slant, size, as well as language. Font configuration happens by way of a set of very expressive XML rules. Fontconfig uses FreeType to inspect fonts and caches the results in an mmap()able architecture-specific binary cache.
FriBidi
GNU FriBidi is an implementation of the Unicode Bidirectional Algorithm. Pango uses FriBidi and has an internal copy of it. AbiWord is the other major user of FriBidi. Many other projects use FriBidi as the simplest route to add support for Hebrew and Arabic scripts without adding support for a full complex text rendering engine.
HarfBuzz
HarfBuzz is the meat of the modern GNU/Linux text rendering stack. With OpenType emerging as the universal font format supporting complex text rendering, HarfBuzz, as an OpenType Layout engine, is where all the magic happens. In fact it is of such importance to the stack that it deserves an entire section of its own in this document.
Pango
Pango is, for the most part, the roof of the text rendering stack. Components sitting on top of Pango (eg. GTK+) need not know about the complexities of i18n text and are expected to simply use opaque objects called PangoLayouts. Pango has been designed to satisfy GTK+'s needs for i18n text. However, Pango still provides a low-level API on which one can build their own layout engine. This is what Firefox, Webkit-GTK, etc do, but it has proved to be a cumbersome practice. We will expand on that later.
There are other modules that are not immediately relevant to text rendering but facilitate getting the text on the screen: The X render extension provides the basic support for caching client-side rendered glyph shapes in the X server and showing them on the screen. Glyphs are rendered by the client (ie. application) and uploaded to the X server which will then hash and only keep one copy of each image, but each client has to go through the render+upload phase regardless. There are various higher-level wrappers around the text-rendering functionality of X render: the old and semi-obsolete one being Xft. These days however, cairo does that job for the GNOME stack and Qt does it for KDE.
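
The "hash and keep one copy" behaviour described above can be illustrated with a small content-addressed store. This is only a sketch of the idea, not the X server's actual data structure: the class name, the client names, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib


class GlyphStore:
    """Toy server-side store that keeps one copy of each distinct glyph image."""

    def __init__(self):
        self._images = {}   # content hash -> image bytes (stored once)
        self._handles = {}  # (client, glyph_id) -> content hash
        self.uploads = 0    # every upload is counted, shared or not

    def upload(self, client, glyph_id, image):
        """Record a glyph that a client has rendered and uploaded."""
        self.uploads += 1
        digest = hashlib.sha256(image).hexdigest()  # assumed hash function
        # setdefault keeps the first copy and ignores identical later ones.
        self._images.setdefault(digest, image)
        self._handles[(client, glyph_id)] = digest

    def lookup(self, client, glyph_id):
        """Return the stored image for a glyph previously uploaded by client."""
        return self._images[self._handles[(client, glyph_id)]]

    @property
    def stored_images(self):
        return len(self._images)


store = GlyphStore()
bitmap_a = bytes([0, 255, 255, 0])       # stand-in for a rendered glyph
store.upload("terminal", 1, bitmap_a)
store.upload("editor", 7, bitmap_a)      # same pixels, different client
store.upload("editor", 8, bytes([255, 0, 0, 255]))
print(store.uploads, store.stored_images)  # prints "3 2"
```

Three uploads leave only two images stored, which mirrors the point made above: the server saves memory, but each client still pays for rendering and uploading its own glyphs.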

HarfBuzz

Traditionally fonts were a collection of glyphs and a simple one-to-one mapping between characters and glyphs. Rudimentary support for simple ligatures was available in some font formats. With Unicode, however, there was a need for formats allowing complex transformation of glyphs (substitution and positioning). Two technologies were developed to achieve that: one is OpenType Layout from Microsoft and Adobe, the other is AAT from Apple. These two technologies, plus the TrueType and Type1 font formats, were all combined in what is called OpenType.

There are fundamental differences in how AAT and OpenType Layout work. In AAT the font contains all the logic required to perform complex text shaping (the process of converting Unicode text to glyph indices and positions). In OpenType, by contrast, the script-specific logic (say, Arabic cursive joining, etc) is part of the standard and implemented by the layout engine, with fonts providing only the font-specific data that the layout engine can use to perform complex shaping.

The Free Software text stack is based on the OpenType Layout technology. HarfBuzz is an implementation of the OpenType Layout engine (aka layout engine) and the script-specific logic (aka shaping engine).
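
The division of labour in the OpenType model can be sketched with a toy Arabic shaper. The sketch is not HarfBuzz code and does not use its API. It assumes a drastically simplified joining model (only four letters, no transparent marks, no ligatures), and the glyph names in the toy font table are hypothetical. The joining logic lives in the engine, while the font contributes nothing but a lookup table.

```python
# Engine side: script-specific logic for Arabic cursive joining.
# "D" letters join on both sides, "R" letters join only to the preceding
# letter, and anything not listed is treated as non-joining ("U").
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062F": "R",  # ARABIC LETTER DAL
}


def joining_forms(text):
    """Return isol/init/medi/fina for each character, in logical order."""
    types = [JOINING_TYPE.get(ch, "U") for ch in text]
    forms = []
    for i, t in enumerate(types):
        prev_t = types[i - 1] if i > 0 else "U"
        next_t = types[i + 1] if i + 1 < len(types) else "U"
        joins_prev = t in ("D", "R") and prev_t == "D"
        joins_next = t == "D" and next_t in ("D", "R")
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms


# Font side: font-specific data only. Glyph names are hypothetical.
TOY_FONT = {
    ("\u0628", "isol"): "beh.isol",
    ("\u0628", "init"): "beh.init",
    ("\u0628", "medi"): "beh.medi",
    ("\u0628", "fina"): "beh.fina",
    ("\u0644", "medi"): "lam.medi",
    ("\u0627", "isol"): "alef.isol",
    ("\u0627", "fina"): "alef.fina",
}


def shape(text, font):
    """Map each character to a glyph name using the engine's joining forms."""
    forms = joining_forms(text)
    return [font.get((ch, form), ".notdef") for ch, form in zip(text, forms)]


print(shape("\u0628\u0627\u0628", TOY_FONT))
# prints ['beh.init', 'alef.fina', 'beh.isol']
```

Swapping in a different font table changes the glyphs but not the joining decisions, which is the property that lets one engine serve every OpenType font for a given script.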

History

Originally the FreeType project implemented the OpenType Layout engine as part of the FreeType 2 project; however, it was dropped from FreeType at the last moment when it was decided that OpenType shaping is not involved in rasterizing glyphs and hence is out of the scope of FreeType. The FreeType Layout (FTL) code was salvaged by Pango and Qt developers and kept in house for quite a few years. Owen Taylor developed an abstract buffer on top of the layout engine, making it much easier to use.

Around 2006 Pango and Qt developers cooperated to reunify the layout engine, and HarfBuzz was born as a freedesktop.org project. Initially it was just a matter of merging back the existing code and renaming it, but after various meetings the plan to make HarfBuzz a unified shaping engine was born, and that has been the goal ever since. HarfBuzz was relicensed (thanks to FreeType developers) to the old MIT license to rid it of the FTL advertisement clause.

In 2007 (?) TrollTech donated the Qt shapers to HarfBuzz under the same license as the layout engine code. This is the current state of HarfBuzz. At this time Qt ships with its own copy of HarfBuzz which is identical to the upstream HarfBuzz. Pango ships with its own copy also, but only uses the layout engine, and not the HarfBuzz shapers.

Since 2008 the author has been working on rewriting the layout engine to be more robust and to use mmap()ed fonts efficiently, and that work is mostly done now. The next step is to design a user-friendly high-level API for the shaping engine, merge the Pango and Qt shapers, and put them under the new API. This is a work in progress by Red Hat and Mozilla.

HarfBuzz is currently being used by Pango, Qt, the Linux port of Google's Chromium browser, as well as some smaller projects. The grand plan is for it to be used directly by any code needing direct access to a portable and robust complex shaping engine. That would include toolkits, browsers, word processors, and design applications. We will expand on that in a later section.

Other Free Software Shaping Engines

ICU
ICU is the International Components for Unicode, a library developed by IBM with existing ports in C, C++, and Java. It does a lot more than shaping, and is a huge library. That's perhaps the main reason why it is not used widely for shaping. The most notable users of ICU are the OpenOffice.org suite and Sun's Java implementation. It is highly probable that ICU will be ported to use HarfBuzz when HarfBuzz gets to production stage.
m17n
Mostly of academic importance, m17n is an internationalization framework that includes a shaping engine. Its most notable characteristic is that it is based on language- and script-specific shaping rules expressed as Lisp code. Latest versions of Emacs use m17n for complex text rendering.
SIL Graphite
SIL Graphite is a complex/smart-font technology parallel to OpenType Layout. In this framework, the font itself contains all the shaping logic and the engine has no language- or script-specific knowledge. This allows for developing fonts for minority scripts and languages without having to update the engine first. For established scripts though, there is not much reason to prefer Graphite over OpenType.

Consumers

One can loosely divide the consumers of the text rendering stack based on their varying demands and requirements:

The Problem

Over the past few years the Free Software text stack has made a lot of progress. When one looks at each piece, technical excellence is evident. For example:

However, when one stands back and looks at the stack as a whole, it is not something to envy. As a whole, we have not been making ground-breaking progress for quite a while. The last major advance was the move to client-side fonts, which itself fueled a renaissance. Since then, it has mostly been bug fixing, cleanup, polish, and small features here and there, pretty similar to the status of GNOME2, one would say. Indeed, client-side fonts were first introduced in early GNOME2. What we need is the GNOME3 of text rendering, in time for GNOME3.

To those familiar with the text stack, it is hard not to see what is wrong. I believe there are two problems: 1) the current stack is good enough, so improving it stays low-priority for the parties involved, and 2) what I like to call segregated efforts. By that I mean, for example:

One may even argue that the extremely modular design of GNU/Linux systems makes it painfully hard to expose a truly integrated solution in many areas, including text rendering. For example, the X architecture combined with client-side font rendering makes it close to impossible to optimize the pipeline to take advantage of all the possibilities exposed by modern GPUs, as Microsoft does. However, that excuse is beside the point: it may be part of the problem statement, but it is hardly the answer.

Recent Advances

Only recently have the Desktop Team at Red Hat and the Fedora Font SIG started working on features that extend across the stack (vertically or horizontally):

Modern GNU/Linux desktops have become very complex systems. With technologies like D-BUS, PolicyKit, PackageKit and others spanning the entire desktop, integration becomes a much harder problem, and the secret ingredient of a polished user experience. It remains our challenge to provide that experience when it comes to i18n and text rendering.

User/Customer-facing Issues

While it is easy to understand the problems identified, it may be hard to justify working on fixing them from a business point of view. However, the following user- and customer-facing issues can all be traced back to the problems mentioned:

CJK Problem

The CJK problem is an artifact of the Unicode Han Unification. That is, the fact that the same Unicode character is used for all three of Traditional Chinese (used in Hong Kong and Taiwan), Simplified Chinese (used in mainland China), and Japanese (whose kanji derive from Chinese characters). The three languages, while sharing the same ideographs, require different visual rendering of the shared characters, making correct font selection critical for legible rendering of text in this family of languages.
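
As a rough illustration of why language must drive font selection here, the sketch below picks a font for a unified ideograph from a language tag. It is not fontconfig's algorithm or configuration syntax. The font names are hypothetical, only the basic CJK Unified Ideographs block is recognized, and characters outside that block are deliberately left to whatever selection logic the system already has.

```python
# Hypothetical font names; a real system would take these from its font
# configuration rather than from a hard-coded table.
HAN_FONT_BY_LANGUAGE = {
    "zh-tw": "ExampleSans TC",  # Traditional Chinese
    "zh-hk": "ExampleSans TC",
    "zh-cn": "ExampleSans SC",  # Simplified Chinese
    "ja": "ExampleSans JP",     # Japanese
}


def is_unified_ideograph(ch):
    """True for the basic CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def han_font_for(ch, language):
    """Pick a font for a unified ideograph based on the text's language.

    Returns None when the character is not a unified ideograph or the
    language is unknown; those cases are outside the scope of this sketch.
    """
    if not is_unified_ideograph(ch):
        return None
    return HAN_FONT_BY_LANGUAGE.get(language.lower())


# U+76F4 is one code point whose preferred shape differs between regions.
for lang in ("zh-TW", "zh-CN", "ja"):
    print(lang, han_font_for("\u76f4", lang))
```

The character alone carries no hint about which font is right; without a reliable language tag on the text, any choice the stack makes is a guess.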

Moreover, users of these languages typically have different requirements for rendering Latin than the rest of the world. For example, while Indic or Arabic users prefer their Latin text to be rendered using the default Latin font on the system, CJK users want the Latin to be rendered using the same font used for CJK. This is because CJK characters are very complex drawings and must be rendered using handcrafted bitmaps to be legible at small sizes. Such bitmap glyphs simply look ugly adjacent to antialiased Latin glyphs.

Inherent to the CJK problem is also a communication failure. CJK is a huge and still emerging market, affecting over one billion people. Yet it is hard to find two native field experts who can agree on the very basics of how the fonts should look on screen. So far the burden has been falling on fontconfig and Fedora Font SIG maintainers to explore possible solutions and implement them. But we are not there yet. To fix this problem, we need to go back to the design stage and re-design how fontconfig configuration is supposed to work. Fontconfig configuration idioms need to be extended and the new idioms documented and implemented across all font packages.

ACTION: Understand and document the roots of the problem, extend fontconfig and Pango as necessary to be able to address the problem, document idioms for font configuration in Fedora, and update all font packages to use the new guidelines.
STATUS: Behdad to read CJKV Information Processing, 2nd edition.

Indic Problem

The Indic problem is rooted in the fact that over a dozen scripts used in India are all implemented using a single shaper driven by different data tables. This makes a lot of sense from a design point of view, since the scripts are very similar in the way they are encoded in Unicode. However, each of them does have delicate differences in how certain common characters interact with the others, and that has made it hard to fix bugs in one script without breaking others. The Indic shapers in both Pango and Qt were ported from the one in ICU, so this problem is common to all available free-software Indic shapers. It is practically impossible to fix the tens of outstanding Indic bugs without first merging all the available implementations and also developing an exhaustive test suite.
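
A minimal sketch of the "one shaper, many data tables" design, together with the kind of cross-script regression suite argued for above, might look like the following. It is not taken from any of the shapers mentioned. The tables are deliberately incomplete, and the reordering rule is a simplification: it moves a pre-base vowel sign in front of the single preceding character, whereas a real shaper has to move it to the start of the whole syllable.

```python
# Per-script data: vowel signs that follow the consonant in Unicode order
# but are displayed before it. Deliberately incomplete.
PRE_BASE_MATRAS = {
    "devanagari": {"\u093F"},                   # VOWEL SIGN I
    "bengali": {"\u09BF", "\u09C7", "\u09C8"},  # VOWEL SIGN I, E, AI
    "tamil": {"\u0BC6", "\u0BC7", "\u0BC8"},    # VOWEL SIGN E, EE, AI
}


def reorder_pre_base(text, script):
    """Shared logic: move each pre-base vowel sign before the previous char."""
    matras = PRE_BASE_MATRAS[script]
    out = []
    for ch in text:
        if ch in matras and out:
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)


# One regression table covering every script that shares the logic, so a
# fix made for one script is checked against all the others.
CASES = [
    ("devanagari", "\u0915\u093F", "\u093F\u0915"),  # KA + I: reordered
    ("bengali", "\u0995\u09BF", "\u09BF\u0995"),     # KA + I: reordered
    ("tamil", "\u0B95\u0BC6", "\u0BC6\u0B95"),       # KA + E: reordered
    ("devanagari", "\u0915\u093E", "\u0915\u093E"),  # KA + AA: unchanged
]


def run_suite():
    """Return the (script, input) pairs whose output differs from expected."""
    return [
        (script, text)
        for script, text, expected in CASES
        if reorder_pre_base(text, script) != expected
    ]


print(run_suite())  # prints [] when every case passes
```

Because all scripts go through `reorder_pre_base`, a change to that function is exercised against every row of `CASES` at once, which is the safety net the current shapers lack.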

Moreover, the OpenType Indic standard was also so complex and hard to implement correctly that Microsoft moved to a new Indic standard in Vista. There is currently no free implementation available for the new standard.

ACTION: Merge the three Indic shapers into one as part of the HarfBuzz shaper merger with Pango.
STATUS: Jonathan Kew of Mozilla will do this as soon as the new shaping infrastructure in HarfBuzz is in place.

ACTION: Develop an extensive Indic shaping test suite, as part of a larger, HarfBuzz-wide, shaping test suite.
STATUS: A high priority item after the basic new shaping infrastructure in HarfBuzz is in place.

ACTION: Develop an Indic shaper for the new OpenType Indic standard.
STATUS: Not planned currently.

Latin Problem

The Latin problem is not really inherent to Latin; rather, it refers to the fact that it is currently hard or impossible to use high-quality, usually very expensive, fonts that have many different styles. This is not a high priority issue for day-to-day use of the desktop, but it is a real showstopper for graphic design applications as well as word processors. This is mostly a Pango limitation.

ACTION: Understand the scope of the problem and design a solution in Pango.
STATUS: Behdad to work on the understanding part this week.

Road Ahead

Experience shows that if module X needs to use library Y, the code turns out much better when Y's developers implement that support in X and submit the patch to X's maintainers for review than when, as is common practice, X's developers implement Y support themselves based on available documentation (which is always incomplete anyway). With that in mind, we as the text stack maintainers need to reach out upward to applications across the desktop whenever we add new features. For example, if a new font selector and dialog are designed for GTK+, we need to cooperate with OpenOffice.org, The GIMP, etc to make them provide users with the same enhanced experience.

In this section we will identify areas that can benefit from immediate technical attention to advance the user experience with text rendering on the free desktop. We also need to start thinking about more integration issues and seek a longer-term vision for improving the text rendering experience.

There are also issues that do not directly affect text rendering in the context discussed so far, but are closely relevant and require some of the same expertise to address:

Revision History

2012-12-18: Add note re Windows font fallback, brought up by Pekka Pihlajasaari.

2010-10-18: Add note about Emacs to Terminal Emulators section. Mention fontik.

2009-07-05: First public version as presented at the Gran Canaria Desktop Summit.