The keyboard underpins digital text production, so any effort to address the under-representation of Urdu and other regional languages on the internet, and in software as a whole, starts with the keyboard.
It is imperative that we do this now, as millions more in the region come online through the availability of smartphones.
The متنساز (matnsāz) keyboard is a rethink of how Urdu should be represented digitally. It takes ideas old and new and situates them in the historical development of Urdu and of the technological landscape, reimagining what a digital representation of the language could be without compromises.
In an effort to test a variety of solutions, we are building a number of designs that will be made available to beta-testers.
The keyboard is currently being built for iOS.
Legible key layout
Any conversation with Urdu speakers – native or foreign, experienced or inexperienced, old or young – mentions how hard it is to find keys on an Urdu keyboard.
Sure, QWERTY is illegible too, but it has had a century to entrench itself, to the point where even more efficient layouts cannot unsettle its position.
Most Urdu speakers have never typed very much at all, especially not in Urdu. Making the keyboard layout understandable is therefore critical to effective text production.
The keys are laid out alphabetically, and the letters on them are drawn in the nastaliq script.
We are in the process of determining how an alphabetical layout may affect typing speeds in comparison with a layout optimized for letter occurrence in the language.
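As a rough illustration of what "optimized for letter occurrence" would start from, the sketch below counts letters in a sample text and ranks them by frequency. The function names and the sample string are hypothetical, and a real study would need a large, representative Urdu corpus rather than three words.

```python
from collections import Counter


def letter_frequencies(text):
    """Count how often each letter occurs in `text`."""
    # str.isalpha() keeps Arabic-script letters and skips spaces,
    # digits, punctuation and combining diacritics.
    return Counter(ch for ch in text if ch.isalpha())


def rank_letters(text):
    """Return letters from most to least frequent (ties by code point)."""
    counts = letter_frequencies(text)
    return sorted(counts, key=lambda ch: (-counts[ch], ch))


# Hypothetical sample; substitute a real corpus here.
sample = "متن ساز متن"
print(rank_letters(sample))
```

A frequency-based layout would place the highest-ranked letters on the easiest-to-reach keys, which is the alternative the alphabetical layout is being compared against.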
Goodbye to the Shift key
The two-layer keyboard that is now synonymous with computing was originally built into typewriters to make the number of keys manageable for Latin script – the shift key allowed the typing of uppercase letters.
Even as smartphone keyboards become the primary text input interface for the world, moving Urdu keys out from behind the Shift key onto one plane is still an idea used by only one mainstream operating system.
Keys that represent the shape the characters will take
Arabic script's cursive nature means that each letter changes its shape depending on where it is positioned in the word.
Children are taught to think on two layers of abstraction: learning to identify the letters in their isolated form and then also in the forms they take in words. The letters م + ت + ن are, for example, put together as متن. This is the elementary توڑ جوڑ (disassembly-assembly) exercise.
Once these connections are built, the isolated forms are no longer the shapes a reader relies on to recognize words. Labeling the keys with isolated shapes therefore demands extra mental effort to find letters on the keyboard, and also makes it hard to debug erroneous text input.
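The توڑ جوڑ exercise can be sketched in a few lines of Python using only the standard `unicodedata` module. This is an illustration of how Unicode treats positional shapes, not a description of how the keyboard itself is implemented. In normal text the word is stored as plain letters and the font chooses the joined shapes; Unicode also carries legacy "presentation form" code points for each positional shape, which are used here only to make the shapes visible.

```python
import unicodedata

# Isolated letters as first learned: meem, teh, noon.
letters = ["م", "ت", "ن"]

# Stored text is just the letters in sequence; the shaping engine
# of the font picks the connected shapes at render time.
word = "".join(letters)

# Legacy presentation forms encode the positional shapes explicitly.
positional = [
    unicodedata.lookup("ARABIC LETTER MEEM INITIAL FORM"),
    unicodedata.lookup("ARABIC LETTER TEH MEDIAL FORM"),
    unicodedata.lookup("ARABIC LETTER NOON FINAL FORM"),
]

# Compatibility normalization (NFKC) maps each positional shape
# back to its base letter: the "disassembly" step.
recovered = [unicodedata.normalize("NFKC", ch) for ch in positional]

print(word)
print(" ".join(positional))
print(recovered == letters)
```

Keys labeled with the initial or medial shapes would correspond to the `positional` entries, while conventional keyboards show the isolated `letters`.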
A true multilingual keyboard for the Arabic script
Millions of Urdu speakers do not count Urdu as their mother tongue. Regional languages in South Asia that use the Arabic script, such as Pashto, Sindhi, Balochi and Panjabi, are even more under-represented than Urdu in the digital sphere.
Most multilingual people end up having to find workarounds – such as using an Urdu keyboard with other letters tacked on in shift or hold interactions.
This design boils down the entire Arabic script to the essential letter forms (the رسم, rasm): the dotless skeletons that several letters share. The auto-completion software then chooses the right letter from the parent shape.
Language models can be plugged in for whatever languages the keyboard is being used for.
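A minimal sketch of this idea, under stated assumptions: the `RASM` table below is a small, hypothetical subset of the letter-to-skeleton mapping, the word counts in `URDU_TOY_LEXICON` are invented for illustration, and a plain frequency lookup stands in for a real language model. None of these names come from the متنساز codebase.

```python
# Hypothetical, partial mapping from dotted letters to a shared
# dotless skeleton. A full table would cover the whole script and
# account for letters whose skeleton depends on position.
RASM = {
    "ب": "ٮ", "پ": "ٮ", "ت": "ٮ", "ٹ": "ٮ", "ث": "ٮ",
    "ج": "ح", "چ": "ح", "خ": "ح",
}

# Invented counts standing in for a pluggable language model.
URDU_TOY_LEXICON = {"بات": 50, "باپ": 20, "تاب": 5, "کتاب": 30}


def skeleton(word):
    """Reduce a word to its skeleton; unmapped letters pass through."""
    return "".join(RASM.get(ch, ch) for ch in word)


def candidates(typed_skeleton, lexicon):
    """Words in `lexicon` matching the skeleton, most frequent first."""
    matches = [w for w in lexicon if skeleton(w) == typed_skeleton]
    return sorted(matches, key=lambda w: -lexicon[w])


# Typing the skeleton keys for "tooth, alef, tooth" is ambiguous;
# the lexicon ranks the possible readings.
typed = skeleton("تاب")
print(candidates(typed, URDU_TOY_LEXICON))
```

Because `candidates` takes the lexicon as an argument, swapping in a Pashto or Sindhi word list changes the suggestions without changing the keys, which is the sense in which the language model is pluggable.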
Individual letters can be added by holding the button.
Diacritics do not have to be typed manually: the software can be set to add them automatically at all times, or to leave them to manual entry.
A collectively owned language model
While large technology companies often do a valiant job of serving the world's languages, they are not best positioned to preserve the cultures built into those languages. Deferring to companies in the West only reinforces a dependency that causes the world's languages to lose more of themselves to technology.
متنساز is built from the ground up with the goal of allowing the community to contribute to and engage with the language models that underpin the keyboard.
A growing body of research shows that biases encoded in artificial intelligence can manifest in ugly ways. It has also been shown that suggestions from artificial intelligence can influence human behavior. The preservation of a language requires careful curation of the materials used to power its autocorrect and autocomplete technologies – so as to preserve the essential features of the language without encoding unwanted bias.
Support the language models
If you own a collection of digitized text that can be used to power our language models, please do get in touch.
Get Early Access
Sign up to become a beta-tester and help influence the future of Urdu in the digital world: