Issues Regarding the Use of Unicode
Brief History of Personal Computers and the Internet
· The personal computer revolution started around 1980, and computers have become far more powerful since then.
· Use of the Internet by the general public started about 15 years ago.
· These two developments have changed the world dramatically and forever.
· Early personal computers mostly used English/Latin fonts with 256 character slots each (codes 0 to 255), of which the first 32 were reserved for control characters.
· Macintosh computers, introduced in 1984, brought a wide variety of English/Latin fonts.
· Soon it became possible to use other languages on Macintosh computers, but their characters had to occupy the same 256 slots as English/Latin.
· With the introduction of Windows 3.1, it also became possible to use different languages on the MS Windows platform.
· For many languages the limit of 256 characters was a hindrance, so other encoding standards arose. However, many different and incompatible standards came into use.
· It soon became obvious that a single, unified standard for all the languages of the world was needed. This led to the formation of the Unicode Consortium.
What is Unicode?
· The Unicode Consortium is a non-profit organization founded to develop, extend and promote the use of the Unicode Standard, which specifies the representation of text in modern software products and standards.
· It is the accepted international standard; it supports all the major scripts of the world and has been adopted by all current major computer operating systems.
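· Unicode was originally designed as a 16-bit standard with room for 65,536 characters; it has since been extended to 1,114,112 code points (U+0000 to U+10FFFF). The characters of the major modern scripts, including all the Indic scripts, lie within the first 65,536 code points (the Basic Multilingual Plane). Unicode supports the major Indic (Indian) scripts, including Devanagari (Hindi, Marathi, Sanskrit), Bengali (Bengali, Assamese), Gurmukhi (Punjabi), Gujarati, Oriya, Tamil, Telugu, Kannada and Malayalam.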
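To make the code-point arithmetic concrete, here is a small Python sketch using only the standard `sys` and `unicodedata` modules (the helper name `describe` is illustrative, not from any library). It prints each character's code point, its Unicode name, and how many bytes it takes in UTF-8 and UTF-16. Gurmukhi and Devanagari letters sit well above 255, so they cannot fit into an 8-bit font layout, yet each fits in one 16-bit unit; a character beyond the first 65,536 code points needs two.

```python
import sys
import unicodedata

def describe(ch: str) -> str:
    """Return code point, Unicode name and UTF-8/UTF-16 byte counts for one character."""
    cp = ord(ch)
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-le"))  # "-le" avoids counting a byte-order mark
    return f"U+{cp:04X} {unicodedata.name(ch)} utf8={utf8_len}B utf16={utf16_len}B"

# Latin 'a', Gurmukhi KA, Devanagari KA, and one character beyond the first 65,536
for ch in "a\u0A15\u0915\U0001F600":
    print(describe(ch))
# U+0061 LATIN SMALL LETTER A utf8=1B utf16=2B
# U+0A15 GURMUKHI LETTER KA utf8=3B utf16=2B
# U+0915 DEVANAGARI LETTER KA utf8=3B utf16=2B
# U+1F600 GRINNING FACE utf8=4B utf16=4B

# Highest code point a Python str can hold: the Unicode maximum
print(f"U+{sys.maxunicode:04X}")  # U+10FFFF
```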
Unicode Implementation by MS Windows XP & Other OS
· Microsoft Windows XP has full support for Indic scripts, including Gurmukhi, and other current operating systems provide similar support.
· All future development regarding scripts will be based on Unicode.
· The Indic scripts were encoded in consultation with the Indian government, and in such a way that phonetic transliteration between them is easy: corresponding characters have corresponding code points in each script's block, and half characters, subjoined forms and conjoined characters are not encoded separately but are produced as substitutions (this functionality relies on the Halant).
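A minimal Python sketch of what "produced as substitutions" means for the stored text: a paireen (subjoined) consonant is stored as the base consonant, then the Halant (named GURMUKHI SIGN VIRAMA in the Unicode character database), then the ordinary full consonant. Drawing the subjoined shape is left to the font and the font engine. The helper `with_paireen` is a hypothetical name used only for illustration.

```python
import unicodedata

PA = "\u0A2A"      # GURMUKHI LETTER PA
RA = "\u0A30"      # GURMUKHI LETTER RA
HALANT = "\u0A4D"  # GURMUKHI SIGN VIRAMA, i.e. the Halant

def with_paireen(base: str, lower: str) -> str:
    """Hypothetical helper: store base consonant + Halant + full consonant.
    The font engine, not the encoding, renders `lower` in its paireen form."""
    return base + HALANT + lower

pra = with_paireen(PA, RA)  # shown as PA with a paireen RARA in a Unicode Gurmukhi font
print([unicodedata.name(c) for c in pra])
# ['GURMUKHI LETTER PA', 'GURMUKHI SIGN VIRAMA', 'GURMUKHI LETTER RA']

# No precomposed code point exists for the combination, so normalization leaves it alone
print(unicodedata.normalize("NFC", pra) == pra)  # True
```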
Unicode Scheme for Indic Scripts
· Each script has been allocated its own block for characters.
· Characters of each Indic script are placed at locations corresponding to their Devanagari counterparts, but within the script's own block.
· No code points are allocated for half characters, subjoined forms (paireen characters) or conjoined forms; these are produced functionally, by use of the Halant. The Devanagari reph is produced the same way (see below on how to type paireen characters and half forms).
· The Unicode Consortium also documents the special requirements of each script, so that font makers can build the necessary substitutions into their Unicode fonts and font engines can provide the required functionality. Examples: half/paireen characters; placement of the Sihaaree, which is stored after its consonant but displayed before it.
· Many of these font functions are actually carried out by the font engines.
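Because each Indic block mirrors the Devanagari layout, a rough Devanagari-to-Gurmukhi transliteration can be sketched by shifting code points by a fixed offset of 0x100 (the distance from the Devanagari block at U+0900 to the Gurmukhi block at U+0A00). This Python sketch is a naive illustration, not a complete transliterator: characters with no Gurmukhi counterpart (such as the danda, which Gurmukhi shares with Devanagari) are left unchanged, and a few positions hold non-equivalent characters (U+0970 DEVANAGARI ABBREVIATION SIGN would map to U+0A70 GURMUKHI TIPPI), so a real tool would need exception tables. `devanagari_to_gurmukhi` is a hypothetical name.

```python
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
GURMUKHI_OFFSET = 0x0A00 - 0x0900  # 0x100: the Gurmukhi block mirrors Devanagari's layout

def devanagari_to_gurmukhi(text: str) -> str:
    """Naively shift Devanagari code points into the Gurmukhi block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            target = chr(cp + GURMUKHI_OFFSET)
            # Use the shifted code point only if Unicode assigns a character there
            if unicodedata.name(target, None) is not None:
                out.append(target)
                continue
        out.append(ch)  # unmapped or non-Devanagari characters pass through unchanged
    return "".join(out)

print(devanagari_to_gurmukhi("\u0915\u092E\u0932"))  # Devanagari KA MA LA -> Gurmukhi KA MA LA
```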
Advantages of Unicode for the Gurmukhi Script
· Viewing Documents: Documents or web pages made with Unicode text, when viewed with a suitable program or web browser on a computer that supports Unicode, will always appear in the right script, even if the font in which they were made is not installed on the system (as long as some font covering that script is available), just as English text always stays English even if the font in which it was made is missing.
· Permanence: This is the international standard and will remain so for the future, perhaps as long as humanity lives. The standard may gain some additions and adjustments, but there will be no drastic changes that break what has already been developed; Unicode's stability policy guarantees, for example, that a character, once encoded, is never removed. Thus, any work done in Unicode fonts today will not require any significant modification in the future.
· On the Internet, search engines can properly analyze only information that is stored as Unicode text. Thus it will be possible to search the Internet in Punjabi if the information uses the Unicode standard. This is no small achievement.
· No major computer company is expected to support any other font standard in the future, so all other standards will become obsolete.
· The use of Unicode makes the following possible:
File Naming: One can name files and folders in Gurmukhi and other scripts at the same time.
Searching: Search documents and web pages in Gurmukhi.
Sorting: Sort (alphabetize) Gurmukhi text with ease.
Data Exchange: Exchange data without having to worry about fonts.
Case and Spacing: Avoid the upper-case/lower-case and spacing problems that occur with many available non-Unicode Gurmukhi fonts.
· Unicode fonts are designed so that many typing tasks become easy:
You need to type only one form of Addak or Tippi, and the proper form will automatically be substituted.
Paireen-Bindi will also appear in the right place, even under characters that are not normally used with a Bindi, and Laga-Matras will likewise be placed correctly under paireen characters.
A doubled Laga-Matra typed by mistake will be made visible, so it can be spotted and corrected.
Typing paireen/half characters is very easy: type the first consonant, then the halant, then the full character that is to take the paireen/half form. This is especially convenient when typing Hindi.
· The keyboard input method can be easily changed, so each user can type according to his/her choice.
· Once all non-Unicode fonts have been converted to Unicode, there will be a huge variety of fonts that are easy to use, without any worry about loss of information.
· When using Unicode fonts one can easily mix languages without any worry of losing information when a different font is applied later on.
· Text typed into find/search and replace boxes appears in the proper script when the Unicode standard is used. The same is true of many other word-processing functions, where applicable.
· PDF (Acrobat Reader) files made with Unicode fonts can be searched in the script in which they were made, although Adobe still needs to do more work on this.
Logic dictates that we should follow and promote the Unicode standard only.
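Searching Unicode Gurmukhi text works on code points, but the same visible letter can sometimes be stored in more than one way. For example, Shasha with pair bindi can be stored as the single code point U+0A36 or as Sassa (U+0A38) followed by the pair bindi (nukta, U+0A3C), and Unicode defines these as canonically equivalent. A minimal Python sketch, assuming both strings are normalized to NFC before comparison (the helper names `nfc` and `contains` are illustrative), makes both spellings match:

```python
import unicodedata

def nfc(s: str) -> str:
    """Normalize to NFC so canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", s)

def contains(haystack: str, needle: str) -> bool:
    """Substring search after normalizing both sides."""
    return nfc(needle) in nfc(haystack)

shabad = "\u0A36\u0A2C\u0A26"  # "shabad", with SHA stored as one code point
sha = "\u0A38\u0A3C"           # SA followed by the pair bindi (nukta)

print(sha in shabad)           # False: the raw code points differ
print(contains(shabad, sha))   # True: both normalize to SA + nukta
```

The same idea applies before sorting. Note, however, that plain code-point order follows the layout of the Unicode block rather than the traditional Gurmukhi alphabetical order, so correct alphabetizing needs a collation that encodes the desired order.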
Issues Relating to the Use of the Unicode Gurmukhi Script
· Migration to Unicode may not be painless, as one has to adapt to new ways; however, it is not a big deal.
· For editing, one needs software that supports Unicode. For example, MS Word 2003 can edit Gurmukhi Unicode text on Windows XP, but cheaper and free alternatives are available as well.
· Most users will need expert help to convert documents already made with non-Unicode fonts, but this is a one-time effort.
· Variety of fonts: Although a number of Gurmukhi Unicode fonts are currently available, more are desirable.
· The Gurmukhi Unicode standard has some minor deficiencies, but only in the representation of Gurbani and old Gurmukhi text.
· Activation: Support for Indic scripts is not enabled by default in Windows XP; the user has to turn it on (in Regional and Language Options, by selecting "Install files for complex script and right-to-left languages").
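Converting text typed in a non-Unicode Gurmukhi font usually means mapping each legacy character to its Unicode equivalent and fixing the order of some vowel signs. The most common reordering is the Sihaaree: in many legacy fonts it is typed before its consonant, because that is where it is drawn, while Unicode stores it after the consonant. The Python sketch below is hypothetical throughout: the ASCII keys in `LEGACY_TO_UNICODE` stand in for an imaginary legacy font, since every real font has its own layout and needs its own table, and it assumes the character after a Sihaaree is always its consonant.

```python
# HYPOTHETICAL legacy layout: these ASCII keys stand in for glyphs of an
# imaginary non-Unicode Gurmukhi font. Each real font needs its own table.
LEGACY_TO_UNICODE = {
    "k": "\u0A15",  # GURMUKHI LETTER KA
    "m": "\u0A2E",  # GURMUKHI LETTER MA
    "l": "\u0A32",  # GURMUKHI LETTER LA
    "w": "\u0A3E",  # GURMUKHI VOWEL SIGN AA (Kanna)
}
LEGACY_SIHARI = "i"        # hypothetical key; typed BEFORE its consonant
UNICODE_SIHARI = "\u0A3F"  # GURMUKHI VOWEL SIGN I; stored AFTER its consonant

def legacy_to_unicode(text: str) -> str:
    out = []
    pending_sihari = False
    for ch in text:
        if ch == LEGACY_SIHARI:
            pending_sihari = True  # hold it until the next character is emitted
            continue
        out.append(LEGACY_TO_UNICODE.get(ch, ch))  # unmapped characters pass through
        if pending_sihari:
            # Simplification: assume the character just emitted is its consonant
            out.append(UNICODE_SIHARI)
            pending_sihari = False
    if pending_sihari:
        out.append(UNICODE_SIHARI)  # keep a trailing Sihaaree rather than drop it
    return "".join(out)

print(legacy_to_unicode("ikml"))  # KA, Sihaaree, MA, LA in Unicode order
```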
The Issue of Improving Unicode Gurmukhi Standard
· There are some issues with displaying Gurbani text in Unicode, but reasonable solutions already exist; we have implemented them, and they work.
· The Udaat character (a kind of quarter Hahaa used in Gurbani) and the Yakash (equivalent to a half Yayaa, but written as a paireen character) have been accepted by Unicode for incorporation.
· Half characters are not part of the character list but are implemented as a function. This is not really a problem, although it is often perceived as one through lack of understanding. Any half/paireen character can be implemented in any Unicode Indic font; this is how Hindi handles its many half characters and related forms.
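The Udaat and Yakash have since been assigned code points in the Gurmukhi block (U+0A51 GURMUKHI SIGN UDAAT and U+0A75 GURMUKHI SIGN YAKASH). One quick way to check which characters a given system knows about is to query the Unicode character database that Python ships with, as in this sketch; the result depends on the Unicode version of the Python build.

```python
import unicodedata

# Look each character up by its official Unicode name; lookup() raises KeyError
# if the Python build's Unicode database predates the character.
for name in ("GURMUKHI SIGN UDAAT", "GURMUKHI SIGN YAKASH", "GURMUKHI SIGN VIRAMA"):
    ch = unicodedata.lookup(name)
    print(f"U+{ord(ch):04X} {name}")
# U+0A51 GURMUKHI SIGN UDAAT
# U+0A75 GURMUKHI SIGN YAKASH
# U+0A4D GURMUKHI SIGN VIRAMA
```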
Available Unicode Gurmukhi Fonts
· The Raavi font is part of Windows XP.
· The Arial Unicode MS font is usually bundled with some software and covers most of the world's scripts, including Gurmukhi and a very good set of Hindi (Devanagari) characters.
· Tahoma is another font from Microsoft that also has Unicode Indic scripts including Gurmukhi and Devanagari.
· Saab font can be downloaded from the internet.
· Dr. Thind's fonts are available at http://www.gurbanifiles.org and on the Gurbani-CD-Uni.
· In the future, you can expect more Unicode fonts from Dr. Thind. Hopefully, many others will make Unicode Gurmukhi fonts as well.