Eternal Lands Official Forums
Lachesis

Font definition file


Hi everyone,

 

As I mentioned previously, I'd like to move some font handling out of the code and add some convenience and flexibility along the way. The new mechanism will only be used for the new books at first, but I hope we can extend it later. The idea is to define the location and properties of each character in an XML file and use that information in the code, instead of modifying the code each time fonts are added or changed. The intended format of that file looks like this:

 

<?xml version="1.0" encoding="UTF-8"?>
<fontdef>
 <font name="default" id="0" linesize="15" baseline="10">
   <texture src="font.bmp">
&lt;     1  48  9  9  8
Æ       70 179 12 13  3
&#197; 108 168 13 17 15
&#xF8;  36 174 11 10  8
ç       18 154 10 13  7
   </texture>
   <texture src="mathchars.bmp">
&#x221E;  0   0 15  9  10
   </texture>
 </font>
 <font name="dense" id="3" baseline="10">
   <texture src="font3.bmp">
&lt;     1  48  9  9  8
   </texture>
 </font>
</fontdef>

 

Note: The UTF-8 encoding is mandatory.

 

The columns are from left to right: character, offset in pixels from the left and the top border of the texture, width and height in pixels, and the offset of the character's baseline to the top of the character in pixels.
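To illustrate the column layout, here is a minimal Python sketch of how one texture block could be read. This is not the client code; the names Glyph, parse_glyph_line and parse_texture_block are made up for this example, and it assumes the character itself is never whitespace (a space glyph would need special handling).

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    char: str      # the character this entry describes
    x: int         # offset from the left border of the texture, in pixels
    y: int         # offset from the top border of the texture, in pixels
    width: int     # glyph width in pixels
    height: int    # glyph height in pixels
    baseline: int  # offset of the glyph's baseline from the top of the glyph


def parse_glyph_line(line: str) -> Glyph:
    """Parse one 'character x y width height baseline' line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    char, *numbers = fields
    if len(char) != 1:
        raise ValueError(f"first column must be a single character: {char!r}")
    return Glyph(char, *(int(n) for n in numbers))


def parse_texture_block(text: str) -> dict[str, Glyph]:
    """Parse the text content of one <texture> element, skipping blank lines."""
    glyphs = {}
    for line in text.splitlines():
        if line.strip():
            glyph = parse_glyph_line(line)
            glyphs[glyph.char] = glyph
    return glyphs


# Text as it looks after the XML parser has already resolved &lt; to "<".
block = """
<     1  48  9  9  8
Æ       70 179 12 13  3
ç       18 154 10 13  7
"""
glyphs = parse_texture_block(block)
print(glyphs["Æ"])
```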

 

The font's name will be used in the visual data format to specify the document font. Currently, "default" (font.bmp) and "ancient" (font2.bmp) are defined. The font's ID will be used for internal identification; currently the IDs 0 through 3 (font.bmp, reserved [fontv.bmp], font2.bmp, font3.bmp) are in use. The font baseline specifies the offset of the text baseline from the upper border of the line (the starting border in block progression direction). It will only be used in horizontally progressing text lines.
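The two baselines (the font's and the character's) combine into a vertical position roughly as sketched below. This is only my reading of the description, assuming that y grows downwards and that line_top is the upper border of the line; the function name glyph_top is hypothetical.

```python
def glyph_top(line_top: int, font_baseline: int, glyph_baseline: int) -> int:
    """Y coordinate of a glyph's top edge, chosen so that the glyph's own
    baseline lands on the text baseline of the line (y grows downwards)."""
    text_baseline = line_top + font_baseline
    return text_baseline - glyph_baseline


# Font "default" has baseline="10". The "<" entry has a baseline offset of 8,
# the Å entry one of 15, so Å starts above the upper border of the line.
print(glyph_top(100, 10, 8))
print(glyph_top(100, 10, 15))
```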

 

These predefined entities can be used:

  • &lt; (&#60;) less than (<)
  • &gt; (&#62;) greater than (>)
  • &amp; (&#38;) ampersand (&)
  • &apos; (&#39;) apostrophe (')
  • &quot; (&#34;) quotation mark (")

For other characters that you don't want to or cannot include literally in the file (because your editor doesn't support UTF-8), you can use their decimal Unicode equivalent, like &#197; for Å, or their hexadecimal Unicode equivalent, like &#xF8; for ø.

 

Note: The less than (<) and ampersand (&) symbols must not be included literally; they have to be masked using the above entities, otherwise the XML parser will fail to read the document. The greater than (>) symbol is tolerated literally in element content, but masking it as well is the safer habit.
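A quick way to see how the entities and character references behave is to run a small file through any XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as a stand-in (the client uses its own parser); the helper is_well_formed is a name invented for this example.

```python
import xml.etree.ElementTree as ET

FONTDEF = """<?xml version="1.0" encoding="UTF-8"?>
<fontdef>
 <font name="default" id="0" linesize="15" baseline="10">
   <texture src="font.bmp">
&lt;     1  48  9  9  8
&#197; 108 168 13 17 15
&#xF8;  36 174 11 10  8
   </texture>
 </font>
</fontdef>
"""

# Parse from bytes, as if the document had been read from a UTF-8 file.
root = ET.fromstring(FONTDEF.encode("utf-8"))
texture = root.find("./font/texture")

# The parser hands back the real characters, not the escape sequences.
chars = [line.split()[0] for line in texture.text.splitlines() if line.strip()]
print(chars)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A literal "<" in the character column breaks the document; "&lt;" does not.
print(is_well_formed("<texture>< 1 48 9 9 8</texture>"))
print(is_well_formed("<texture>&lt; 1 48 9 9 8</texture>"))
```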

 

Benefits of this method:

  • easier addition of characters
  • full unicode range available for display (in books)
  • full freedom how to distribute characters in textures
  • full freedom what characters to support in what font
  • using the same bitmap in multiple fonts possible

With Regards

Lachesis

Edited by Lachesis


Good plan, Lachesis. This is a good way to build up 'metafonts' from separate files (cf. Java's logical vs. physical fonts).

 

One thing though: if the full UNICODE range is (potentially) available, then the codes representing font colour changes need to be moved to a set of reserved undefined values (e.g. CFF80-CFFFD) to avoid clashes.

 

Now if the missing letters bug can get fixed in the next release as well, I may be able to read what is going on (without needing 'tail -f' on the chat log). Somebody previously mentioned that compiling without optimisation fixed it, so it's definitely due to client code, probably related to sequence points and undefined behaviour... :(

Edited by trollson


Color codes will not be supported in book display. You will have to use the facilities of the visual data format to achieve colored display. For chat, if UTF-8 is ever supposed to come there (I've been told UTF-8 support is not possible on the server, which I have to believe as I can't verify it), I planned to extend UTF-8 a little to support four trailing bytes and use the range 0x03000000-0x03FFFFFF for full 24-bit RGB colors. The nice effect of this choice is that the leading byte will always have the value 0xFB, making color codes very easy to detect. Additionally, such a high range is very unlikely to ever intersect with official Unicode.
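To make the byte layout concrete, here is a rough Python sketch of that five-byte scheme. Nothing like this exists in the client; encode_color, decode_color and COLOR_BASE are names invented for the example, and the layout simply follows the usual UTF-8 pattern (a 111110xx lead byte followed by four 10xxxxxx bytes).

```python
COLOR_BASE = 0x03000000  # start of the proposed colour range


def encode_color(r: int, g: int, b: int) -> bytes:
    """Encode an RGB colour as a five-byte, UTF-8 style sequence."""
    for channel in (r, g, b):
        if not 0 <= channel <= 0xFF:
            raise ValueError("colour channels must be in 0..255")
    value = COLOR_BASE | (r << 16) | (g << 8) | b
    # Lead byte 111110xx carries the top 2 of the 26 bits; for this range
    # those bits are always 11, so the lead byte is always 0xFB.
    lead = 0xF8 | (value >> 24)
    # Four trailing bytes 10xxxxxx carry the 24 colour bits, 6 bits each.
    trail = [0x80 | ((value >> shift) & 0x3F) for shift in (18, 12, 6, 0)]
    return bytes([lead, *trail])


def decode_color(seq: bytes) -> tuple[int, int, int]:
    """Decode a sequence produced by encode_color back into (r, g, b)."""
    if len(seq) != 5 or seq[0] != 0xFB:
        raise ValueError("not a colour sequence")
    value = 0
    for byte in seq[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


orange = encode_color(255, 128, 0)
print(orange.hex(" "))
print(decode_color(orange))
```

Since 0xFB never occurs in standard UTF-8 text, a scanner only has to look for that one byte value to find a colour code.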


Not sure why there would be a tech problem in supporting UTF-8. Data storage isn't an issue, since a UTF-8 'char*' is guaranteed not to contain nulls (except a terminating null). The only difference is in counting characters (as opposed to bytes) in the string, and in rendering. The basic Latin ASCII set (0x00-0x7F) is preserved in UTF-8, and those byte values do not occur in multibyte characters.
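The character-versus-byte counting is simple in practice, because every continuation byte has the form 10xxxxxx. A small Python sketch of the idea (the function name utf8_char_count is just for illustration):

```python
def utf8_char_count(data: bytes) -> int:
    """Count characters by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for byte in data if byte & 0xC0 != 0x80)


text = "Ærøskøbing"
encoded = text.encode("utf-8")

print(len(encoded))              # number of bytes
print(utf8_char_count(encoded))  # number of characters

# Every byte of a multibyte character has its high bit set, so neither a
# null byte nor any ASCII value can appear inside one.
print(all(byte >= 0x80 for byte in "ø".encode("utf-8")))
```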

 

(I'll be doing this conversion on some very legacy code in the next few months!)

 

UNICODE currently uses 32-bit character codes, and has high planes reserved for vendor use which are capable of storing 24-bit characters (which can therefore be used for colour codes). Hopefully your scheme will map the colours to one of these?

 

UTF-8 should be able to encode any UNICODE character, so it should already be capable of encoding high-value 32-bit characters, although older implementations may be limited to the previous 16-bit UNICODE set.


I'll make my own implementation :P

 

P.S. Theoretically the UTF-8 algorithm is capable of encoding characters of up to 36 bits length, however the official RFC only covers the range that can be reached by UTF-16, which is 0x000000-0x10FFFF. That's why using it to encode 26-bit numbers is actually an extension.
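For reference, the bit counts follow directly from the byte pattern. A sketch in Python, assuming the usual scheme is simply continued up to a 0xFE lead byte (payload_bits is a made-up helper name):

```python
def payload_bits(length: int) -> int:
    """Bits available in a UTF-8 style sequence of `length` bytes."""
    if length == 1:
        return 7  # 0xxxxxxx
    if not 2 <= length <= 7:
        raise ValueError("length must be in 1..7")
    # Lead byte: `length` one-bits, a zero bit, then the remaining payload bits.
    lead_bits = 7 - length
    # Each continuation byte (10xxxxxx) carries 6 bits.
    return lead_bits + 6 * (length - 1)


for n in range(1, 8):
    print(n, "bytes:", payload_bits(n), "bits")

# The official range ends at 0x10FFFF, which needs 21 bits (four bytes).
print((0x10FFFF).bit_length())
# The proposed colour range 0x03000000-0x03FFFFFF needs 26 bits (five bytes).
print((0x03FFFFFF).bit_length())
```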

 

P.P.S. Using five bytes for a colour code is actually quite a waste. Is there any available range of 28 consecutive characters in the two-byte encodable range 0x080-0x7FF?

Edited by Lachesis


Addition: Font tags should carry a baseline (for horizontal inline progression) attribute. It will carry the offset of the baseline from the starting border of the line box in block progression direction, which is the upper border in any written language I know of that uses baselines. The baseline will only be used when drawing in horizontal inline progression, that is e.g. in Polish and Hebrew, but not in Chinese languages.

Edited by Lachesis


Here goes the XML Schema:

 

<?xml version="1.0" encoding="ISO-8859-1"?>

<!-- Font definitions for the encyclopedia -->

<schema
   xmlns="http://www.w3.org/2001/XMLSchema"
   xmlns:t="http://www.eternal-lands.com/xmlns/fontdef"
   targetNamespace="http://www.eternal-lands.com/xmlns/fontdef"
   elementFormDefault="qualified"
   attributeFormDefault="unqualified"
 >

<element name="fontdef">
 <complexType>
  <sequence>
   <!-- any number of fonts per file -->
   <element name="font" minOccurs="0" maxOccurs="unbounded">
    <complexType>
     <sequence>
      <!-- one or more textures per font; the text content holds the character table -->
      <element name="texture" maxOccurs="unbounded">
       <complexType>
        <simpleContent>
         <extension base="string">
          <attribute name="src" type="string" use="required"/>
         </extension>
        </simpleContent>
       </complexType>
      </element>
     </sequence>
     <attribute name="name" type="string" use="required"/>
     <attribute name="id" type="nonNegativeInteger" use="required"/>
     <attribute name="linesize" type="positiveInteger" use="required"/>
     <attribute name="baseline" type="nonNegativeInteger" use="optional" default="0"/>
    </complexType>
   </element>
  </sequence>
 </complexType>
</element>

</schema>


Fonts should be able to optionally specify the width for each character. There is already support in EL for variable-width fonts if a method to load the data is supplied.


I know that. But it's very restricted and not very nicely done. I'm revising it in order to untie fonts from textures and characters from positions, so that the number of characters per font is less limited, textures can be reused in or shared by multiple fonts, and characters can be placed more freely within textures, altogether saving space and giving more flexibility. Finally, I'll make the code more transparent. I refuse to use the current font handling for something as complex as the new encyclopedia typesetting :whistle:

 

P.S. Actually I'm not revising anything; rather, I'm implementing my own font handling and using it only in books. For easier translation it would certainly be wise to change it everywhere in the client, but I don't think I will have the time for it, and since there is some resistance against UTF-8 on the server side, that would not be as useful as it could be anyway.

Edited by Lachesis


I was referring to the need to be able to specify font widths while redoing this. Variable widths are already partially supported, and we should get full support if we are going to be redoing this.

Look at the first post, the width is only one of five attributes that are given for each character.


Sorry, I had missed that detail ... I do suggest then that you use a more proper XML format instead of that cryptic block of data. That is what XML is for: being able to define data in detail. I'd look at doing one XML tag per char with defined attributes:

<char name="A" Width="24" ... />

 

Then you use the XML tools for loading all the data and there is no ambiguity. You could even specify default values for width, etc. in the tag above it.


I used the text block because I didn't want too much bloat. For parsing it doesn't make any difference, and for typing the textual format is easier. Should it ever need to be converted, the plain-text format is simple enough that conversion won't be difficult.
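To give an idea of how small such a conversion would be, here is a Python sketch that turns one texture block into one tag per character. The attribute names in COLUMNS and the function block_to_char_elements are assumptions made for this example only; no such per-character format has been agreed on.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute names for the five numeric columns.
COLUMNS = ("x", "y", "width", "height", "baseline")


def block_to_char_elements(texture: ET.Element) -> ET.Element:
    """Return a new <texture> element with one <char> child per text line."""
    result = ET.Element("texture", dict(texture.attrib))
    for line in (texture.text or "").splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        char, *numbers = fields
        attrs = {"name": char, **dict(zip(COLUMNS, numbers))}
        ET.SubElement(result, "char", attrs)
    return result


source = ET.fromstring(
    '<texture src="font.bmp">\n'
    "&lt; 1 48 9 9 8\n"
    "Æ 70 179 12 13 3\n"
    "</texture>"
)
converted = block_to_char_elements(source)
print(ET.tostring(converted, encoding="unicode"))
```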

 

P.S. Of course the format I chose is unambiguous too.

P.P.S. I find it interesting that you start arguing about this now. This has been around as an RFC for quite a while; I have now implemented it, so I don't really want to change anything substantial, as you will probably understand.

Edited by Lachesis

