TKSESH
A hieroglyphic database system
Équipe EC.art, laboratoire LRIA, Université
Paris 8
Abstract
We introduce tksesh, a multiplatform hieroglyphic editor, text database
and dictionary. Tksesh is intended both as a toolkit for building
applications for the philologist and as an example of such an
application. The core of
tksesh is a hieroglyphic editor which understands "Manuel de codage"
encodings. The edited texts can be saved in a database, and referenced
by the dictionary system, via hyperlinks. The dictionary can handle
complex definitions by multiple authors. Most of the text in the dictionary
has precise meaning, not only for the reader, but also for the computer.
This allows automated processing and possibly complex searches. An
important feature of the dictionary is that it can contain readable
references to the text database, and that clicking on one of them
pops up the referenced text. Of course, exhaustive searches in the
text database are also an option. The Tksesh system is written in the
Tcl/Tk language, which is freely available for Windows, Mac, and
Unix systems, which makes it very portable.
Introduction
The system we introduce here, called Tksesh, is a multi-platform system
(it works under Windows 95 and Unix, and should work on Macintoshes as
well) built around a hieroglyphic editor (compatible with the Manuel
de Codage) and a database engine.
We started working on it quite a long time ago, but at that time it
was a rather secondary project. Yet it turned out that the system was
potentially useful, so we decided to develop it further. What is
presented here is a preliminary version of the software. We look
forward to opinions and criticism that will help improve it.
Context and goals of the system
While we were working on our computer-science thesis, whose subject was
automatic syntax analysis applied to Middle Egyptian, the question of
what was to be done with the texts we worked on was left in the
background. However, we ended up with the idea of an integrated
environment which would make it possible to store most of the
information one reads and produces while working on a text, to share
it, and, most importantly, to retrieve it. The system could also be a
testbed for the Natural Language Processing systems we worked on.
The goal of Tksesh is both to help build a complete database of Ancient
Egyptian texts, and to be a working tool for its user, where one can
keep notes, lexical files, and so on.
The components of the system
Tksesh is currently an integrated system: a number of quite different
elements bound together in one program. We will now describe these
elements, starting with the editor.
The editor
The hieroglyphic editor is a central element of tksesh; its use will
be described here, and information on its code will be given in the
system's documentation. The editor's primary function is typing texts
for databases, not for printing. So our primary goal was ease and speed
of text entry rather than accuracy of the printed representation.
Typing text
Like any hieroglyphic editor, tksesh allows signs to be entered by menu
or by code. Simple sign grouping can be done with the Manuel de Codage
symbols ":" and "*", which allows fast typing; complex grouping is done
by menu, so it is impossible to enter incorrect codes in the system.
The strong point of Tksesh as far as typing goes is that it is tolerant
about codes. When a transliteration is typed and the sign proposed is
not the expected one, pressing the spacebar proposes another sign. For
example, if I type "mr", I'll get a first candidate sign. If I want the
pyramid-sign (O24), I'll press "space" a few times until I get it. The
next time I enter "mr", the system will remember this and propose O24
first. Even better, when the list of possible signs is exhausted, the
system looks in the dictionary for words having that transliteration.
Thus, in the present state of the dictionary, typing "iw" and pressing
the spacebar repeatedly will propose in turn the signs, and then the
dictionary spellings, that can be read "iw".
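The completion behaviour described above can be sketched as follows. This is only an illustration in Python, not the Tcl code of tksesh: the class name SignCompleter, its methods, the wrap-around after the last candidate, and the small sign table are all assumptions made for the example.

```python
class SignCompleter:
    """Propose signs for a transliteration, then fall back on dictionary words."""

    def __init__(self, sign_table, dictionary=None):
        # sign_table: transliteration -> list of sign codes (illustrative data)
        self.signs = {key: list(codes) for key, codes in sign_table.items()}
        # dictionary: transliteration -> list of full spellings (illustrative data)
        self.words = dict(dictionary or {})

    def candidates(self, translit):
        # Single signs come first; dictionary spellings are only reached
        # once the list of signs is exhausted.
        return self.signs.get(translit, []) + self.words.get(translit, [])

    def propose(self, translit, presses=0):
        """Return the candidate shown after `presses` presses of the spacebar."""
        options = self.candidates(translit)
        if not options:
            return None
        # Assumption: after the last candidate we wrap around to the first.
        return options[presses % len(options)]

    def accept(self, translit, choice):
        """Remember a chosen sign so that it is proposed first next time."""
        signs = self.signs.get(translit, [])
        if choice in signs:  # dictionary spellings are not reordered here
            signs.remove(choice)
            signs.insert(0, choice)


completer = SignCompleter({"mr": ["U6", "N36", "O24"]}, {"mr": ["mr-r-O24"]})
first = completer.propose("mr")        # first candidate sign
third = completer.propose("mr", 2)     # after two presses of the spacebar
completer.accept("mr", "O24")          # the user keeps the pyramid sign
```

After the call to accept, a fresh propose("mr") returns O24 directly, which is the "remembering" behaviour described in the text.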
Grammatical information
Supporting the grammatical codes of the Manuel de Codage was
essential for a system whose primary goal was text databases. However,
manipulating these codes raises a problem. The display, and in general
the way a user manipulates the text, is cadrat-oriented, whereas word
separations and grammatical separations are sign-oriented. Hence
we have two problems: the first is to design how the user will operate
the system to add this information, and the second is how the system
will extract words and the like. At present, grammatical markers
are indicated by sign colors. Blue signs indicate word endings, yellow
signs grammatical markers, and green signs word endings that are also
grammatical markers (see Figure 1).
FIGURE 1 : WORD ENDINGS IN LOUVRE C14 STELA
Currently, the user can only change the status of the last sign of a
cadrat, typing "/" to make it a word ending, and "=" to make it a
grammatical ending. If the marking is done at the time the text is
entered, this is not a problem. However, it becomes one when the
marking is done a posteriori: one then has to break the cadrats and
rebuild them after marking, which is not convenient. The next version
will include a sign-by-sign navigation mode, which will make it
possible to move one sign at a time and thus to change the status of
the current sign.
A related problem is the difficulty of cutting and pasting a word.
This possibility is highly desirable, since it would allow us, for
example, to add commands like "find the current word in the dictionary",
"enter the current word in the dictionary", and so on. To do this
properly, we have to
- be able to designate the current word (and for this, the right
unit is the sign, not the cadrat);
- be able to build a cadrat from parts of a cadrat.
This is not currently possible, but should be soon.
References
As our goal was to build an "intelligent" text database system, with
hypertext links all over the place, we needed a way to refer to particular
points in a text in an efficient way.
We thought that readable references, very much akin to those used when
referring to paper editions, would be fine. This has many advantages.
First, it allows the references to be used outside the base: we support
things like "O. DM 1567 verso, 2". A second point was that long texts
are not always entered from the first line of the first page onward.
In fact, you can start by typing an interesting part of a text, enter
information in the base about its contents, and some time later decide,
for completeness's sake, to type the rest. Our current system lets the
user give the position of the current part of the text explicitly. For
example, in Figure 2, the whole content of P. L2 wasn't entered. But as
the first line is explicitly "page 2, ligne 6", references to parts of
the text will still be exact, even if we type the first pages
afterwards.
FIGURE 2 : EXPLICIT REFERENCES IN A TEXT
In order to create this system while keeping our files "Manuel de Codage"-compliant,
we used the comment system. The first lines of L2 look like this once
saved :
++TKSESH DATABASE FILE+s
++NAME Ptahhotep,L2+s
++COORDS=page 2, ligne 6+s
-i-r-wn:n-n:k\-m-s-.-sSm
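As an illustration of how such a file can be read back, here is a minimal sketch in Python (not the Tcl code of tksesh). It assumes, from the sample above only, that the metadata sits in comment lines opened by "++" and closed by "+s", with the keywords NAME and COORDS; the function name parse_database_file and the shape of the returned value are hypothetical.

```python
import re

# A comment line, as seen in the sample: "++" ... "+s"
COMMENT = re.compile(r"^\+\+(.*)\+s$")


def parse_database_file(lines):
    """Split a saved text into metadata (from comments) and encoded lines."""
    meta = {"name": None, "coords": []}
    body = []
    for line in lines:
        match = COMMENT.match(line.strip())
        if match is None:
            body.append(line)  # ordinary Manuel de Codage text
            continue
        text = match.group(1)
        if text.startswith("NAME "):
            meta["name"] = text[len("NAME "):]
        elif text.startswith("COORDS="):
            # Remember at which body line this explicit position applies.
            meta["coords"].append((len(body), text[len("COORDS="):]))
        # Any other comment (such as the file banner) is ignored here.
    return meta, body


sample = [
    "++TKSESH DATABASE FILE+s",
    "++NAME Ptahhotep,L2+s",
    "++COORDS=page 2, ligne 6+s",
    r"-i-r-wn:n-n:k\-m-s-.-sSm",
]
meta, body = parse_database_file(sample)
```

Because the positions live in comments, a program that only knows the Manuel de Codage would simply skip them and still read the hieroglyphic text.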
Our main problem now with this reference system is to make
it really usable by the end-user. The interface to the reference-setting
system is probably not very user-friendly. As an example, Figure 3
shows the system for the stela Berlin 1157, from an example file of
Winglyph. The stela has four zones: three called A, B and C, and the
main text below, which is not designated by a letter. So the text is
divided into "lines" (which could be columns), grouped into zones. The
lines are simply numbered (the value NUM for coord 1), and the zones
(usually used for pages) are labelled A, B, etc. Hence the numbering
system is A1, A2, B1, C1, 1, 2, etc.
FIGURE 3 : REFERENCE CREATING SYSTEM
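The numbering scheme of this example can be sketched in a few lines of Python. The function name line_labels and the representation of a zone as a (name, number of lines) pair are assumptions made for the illustration; the line counts per zone are simply those implied by the sequence A1, A2, B1, C1, 1, 2.

```python
def line_labels(zones):
    """Build readable line references from (zone name, line count) pairs.

    An empty zone name stands for a zone that has no letter, such as
    the main text of the stela.
    """
    labels = []
    for zone, count in zones:
        # Lines are simply numbered from 1 inside each zone.
        for number in range(1, count + 1):
            labels.append(f"{zone}{number}")
    return labels


# Zones A, B, C, then the unnamed main text (two lines shown).
labels = line_labels([("A", 2), ("B", 1), ("C", 1), ("", 2)])
```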
Further needs
The main shortcomings of tksesh's editor lie in the domain of
presentation. Improved cadrat rendering and column handling would be
nice. More seriously, a font editor is absolutely necessary. We have
already written one, but it runs only under UNIX, and thus can't be
integrated into the whole system.
Another interesting addition would be support for multiple reference
systems. It might be useful, for example, to be able to choose
between a reference to the original source and a reference to an
edition (that is, for example, between P. Leyde I 350 verso 13 and
KRI II,813,3).
The dictionary
Introduction
When we decided to transform tksesh into a work environment for studying
texts, the need for a linked dictionary arose naturally. In the first
version of the dictionary, entries were quite simple: three fields,
"transliteration", "spelling" and "translation", the latter being free
text. We also included the possibility of adding hypertext references
to the text database.
However, a real dictionary entry is something both complex and very
structured. Entering it as free text is not a very good option, because
the structure is lost to the computer. The human reader might be able
to reconstruct it; the system won't. Many kinds of automated processing
that would be possible with a well-structured lexicon are then impossible.
On the other hand, giving a very precise and rigid form to the dictionary
would also be a problem, because it would force an artificial structure
on all definitions.
Last, but not least, the structure proposed should be extensible, but
no extension should break existing data.
Hence the structure we are going to describe now. This structure is
supported by the editor built into the dictionary, which prevents
unstructured entries from being made. It allows many different styles
of dictionary entry to be entered, while keeping the maximum amount of
structural information. We tested it by entering definitions from
GARDINER's lexicon, FAULKNER, HANNIG, and P. WILSON's
A Ptolemaic Lexicon.
Structure of the dictionary
The fields in the dictionary are of roughly three types : base fields,
which contain one type and only one type of information (for example,
a transliteration), complex fields, that can contain mixed information
(for instance text in transliteration and hieroglyphs), and the group
and comment fields.
The group field
The group field is the main organizational device of the dictionary.
Groups can be nested, to represent sub-meanings of a word, derived
words, and so on. The basic point is that if a group contains multiple
fields of the same kind, let's say multiple transliterations, they
are supposed to be variants. In the case of translations, this would
mean near-synonymous meanings. In Figure 4,
all spellings for ip.t are supposed to be equivalent. In such a
case, the completion system described above
should be able to propose all these writings.
FIGURE 4 : REPRESENTATION OF VARIANT SPELLINGS
When some important information (transliteration, spelling, for example)
is not available in a group, it is supposed to be inherited from its
parent group.
Let's take, for example, the entry for Awi (Figure 5).
The groups (indicated by French quotes << >>)
delimit a number of new entries. The first one is for the adjective-verb
meaning be long, which has the same transliteration as the
head word, but different determinatives. It inherits its transliteration,
but changes the translation and the spelling.
Then comes the expression "ib=f Aw". Expressions and composite words
are a tricky problem, whose representation might need some improvement.
At present, we have a number of tags that can be used in expressions
to represent the currently defined word, an animate, or an inanimate.
The other words may be free text, or explicit transliterations of
words, in which case they are indexed. For instance, in the current
case, a search for the word "ib" will retrieve this definition.
FIGURE 5 : COMPLEX DEFINITIONS
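The inheritance rule for groups can be illustrated with a small Python sketch. It is not the data structure actually used in tksesh: the class Group, its method names and the placeholder spellings are hypothetical, and only the rule itself (a field missing from a group is looked up in its parent) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """A dictionary group: fields of the same kind are variants."""
    fields: dict = field(default_factory=dict)   # kind -> list of variants
    parent: Optional["Group"] = None
    children: list = field(default_factory=list)

    def add_child(self, **fields):
        """Create a nested group (sub-meaning, derived word, expression)."""
        child = Group(fields={k: list(v) for k, v in fields.items()}, parent=self)
        self.children.append(child)
        return child

    def get(self, kind):
        """Return the variants of `kind`, inherited from a parent if absent."""
        node = self
        while node is not None:
            if node.fields.get(kind):
                return node.fields[kind]
            node = node.parent
        return []


# Head word Awi; the spellings are placeholders, not real encodings.
head = Group(fields={"transliteration": ["Awi"], "spelling": ["<head spelling>"]})
be_long = head.add_child(translation=["be long"], spelling=["<other determinatives>"])
inherited = be_long.get("transliteration")   # taken from the head word
```

Here the sub-entry defines its own translation and spelling, and only its transliteration is looked up in the parent group.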
The comment field
In some cases, a dictionary definition can be a true little monograph
on a word. In this case, the dictionary entry structure is not very
efficient. This is the reason for the comment field's existence. It
is there for anything that can't fit in a definition. Many fields can
appear in it, as in a true little text editor.
References
References are hypertext links to the text database. They are readable
by the human reader, like in the example on the right, taken from Amenemope.
These links are made quite easily,
by using the "copy reference" menu option in the editor, and pasting
the reference in the dictionary. Afterwards, a click on the reference
will load the text at the proper place.
Signature
An important feature for further use of the system will be the possibility
to share texts and dictionary entries, and conversely to identify the
author of these entries. The Signature field is supposed to be used
for this. A further development would be to fill it automatically --
at least, at data exchange time.
Indexing and search
The system builds an index for each dictionary entry, into which it
writes references for each transliteration, spelling, and translation.
Any field of these types in dictionary entries is indexed, so even
sub-definitions are entered in the database.
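One possible reading of this indexing scheme is an inverted index built by walking each entry and its nested groups. The sketch below is in Python and is only an assumption about the organisation; the nested-dictionary representation of an entry and the function name build_index are hypothetical.

```python
from collections import defaultdict

INDEXED_KINDS = ("transliteration", "spelling", "translation")


def build_index(entries):
    """Map (kind, value) to the identifiers of the entries containing it.

    `entries` maps an entry identifier to a group written as
    {"fields": {kind: [values]}, "groups": [nested groups]}.
    """
    index = defaultdict(set)

    def walk(entry_id, group):
        for kind in INDEXED_KINDS:
            for value in group.get("fields", {}).get(kind, []):
                index[(kind, value)].add(entry_id)
        # Sub-definitions are indexed under the same entry.
        for sub in group.get("groups", []):
            walk(entry_id, sub)

    for entry_id, group in entries.items():
        walk(entry_id, group)
    return index


entries = {
    "Awi": {
        "fields": {"transliteration": ["Awi"]},
        "groups": [
            {"fields": {"translation": ["be long"]}},
            {"fields": {"transliteration": ["ib"]}},  # word of an expression
        ],
    }
}
index = build_index(entries)
hits = index[("transliteration", "ib")]   # entries mentioning the word "ib"
```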
An important practical point is that text is normalized to ease
searching. For example, hieroglyphs are saved as Gardiner codes,
whatever their original form, and only the list of signs is saved. So
someone looking for "p*t:pt" and typing "p:t-pt" will find the word.
The transformation could even be improved by suppressing redundant
phonetic complements and the like, but this is future work.
Transliterations are also simplified for searches: every 'j' is turned
into 'i', all dots are suppressed, etc. Note that this is only done at
search time. Any dot entered in a dictionary entry is retained.
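These two normalizations can be sketched as follows. The Python functions sign_key and translit_key are hypothetical names, the conversion table is a tiny excerpt given for the example only, and the sketch handles just the separators "-", ":" and "*", not the full Manuel de Codage syntax.

```python
import re

# Tiny illustrative excerpt of a phonetic-code to Gardiner-code table.
PHONETIC_TO_GARDINER = {"p": "Q3", "t": "X1", "pt": "N1"}


def sign_key(encoding):
    """Reduce an encoding to its bare list of signs, as Gardiner codes."""
    # Grouping operators are dropped: only the sequence of signs is kept.
    codes = [code for code in re.split(r"[-:*]", encoding) if code]
    return tuple(PHONETIC_TO_GARDINER.get(code, code) for code in codes)


def translit_key(transliteration):
    """Simplify a transliteration for searching: j becomes i, dots go."""
    return transliteration.replace("j", "i").replace(".", "")


same_word = sign_key("p*t:pt") == sign_key("p:t-pt")        # same signs
same_translit = translit_key("jp.t") == translit_key("ip.t")
```

Since the keys are computed only for comparison, the original encodings and transliterations can be stored untouched.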
Extensions
Looking at the current state of the dictionary, we see a possible
generalization: the same mechanism can be used for freer text, for
example for notes and the like. So we intend to reuse the dictionary
code to allow the editing of general notes, which will benefit from the
indexing mechanism of the dictionary.
Another interesting extension would be to add a reference system to
parts of the dictionary. This would allow one definition to be
referenced from another. It would be very useful in expression
definitions, as it would make it possible to reference in a precise and
explicit way the words which appear in the definition.
The transliteration and translation editor
This facility will be a central working point of the system once finished.
What we present now is just a prototype. It allows parallel editing of
the translation and transliteration of a text, which can be saved
separately. The advantage is that this allows multiple translations to
be edited for one text.
FIGURE 6 : TRANSLATION EDITOR
In the editor, all texts are synchronized: the edited line is always
displayed in translation, transliteration, and hieroglyphs. We have
worked a little with the prototype, and it seems quite suitable. We
have linked it with the dictionary, and it is now possible to look up
the word selected in the hieroglyphic window.
Search facilities
One of the most important facilities a database can provide is the
ability to retrieve the information it contains. So our base makes it
possible to find a word (given in transliteration) in the texts. For
texts which have been manually transliterated, it should look in the
man-made transliteration, because it's supposed to be accurate. But
(as we'll see later) we have an automatic transliteration program that
produces a rough transliteration. It's quite fast on a basic Pentium
computer, and can be used if no time is available to write a
transliteration. The search made in this case is neither complete nor
reliable: some occurrences might be missed, and some of the words found
can be errors. Yet it can give a fast initial working base, and it
should improve along with our transliteration system. Figure 7 shows
the result of a search for the word Axt, and an example of a solution.
FIGURE 7 : SEARCH RESULT
Natural Language processing
The system includes (currently only in its UNIX version) a Prolog
interpreter, which allows us to use natural language processing
techniques more easily. We had sketched a transliteration system in
[ROS94]. After we had worked more on syntactic analysis problems, a
student, L. KERBOUL, worked on the subject [KER97]. As the result was
interesting, we decided to work on transliteration again and, if
possible, to end up with a usable system. A detailed technical
description does not fit here; let's only say that the basic principle
is still the one described previously, but that performance has much
improved. The main problem now is word segmentation, which is
difficult. For this, the best solution would be an interactive one, the
system proposing word boundaries and the user changing them. It is,
interestingly, the solution used by some for editing Asian languages
(an example for Thai is given in [MCB97]).
Further developments
The main lines of development of the system will be:
- improvement of the editing system;
- improvement of the natural language processing system, ultimately
with the incorporation of grammatical information;
- creation of an exchange module, to allow easy information sharing
and exchange, using disks or even the net.
Conclusions
The system I've just described is still in its infancy. Its foundations
are however quite sound, and it works well. What it needs now is users,
to make it live and to propose friendlier interfaces and useful additions.
Appendix
TCL and TK
TCL/TK is a programming language developed by John OUSTERHOUT
at the University of California, Berkeley (USA) and then in the SUN
research department. It is a simple, powerful, cross-platform language,
specially designed to be embeddable in programs written in compiled
languages like C or Pascal. Tksesh is based on a number of extensions
to TCL/TK, written in C. It runs under UNIX and Windows 95. Due to the
flexible nature of TCL, it is possible to use tksesh to write small
applications that need to display hieroglyphs.
Availability
The system will be available free of charge, as "textware":
if the system is of some use to you, please contribute some texts.
It would definitely be better to ask which texts are needed before
sending them. At present, the software is quite young and has had very
few users. For this reason, I am not releasing it by simply putting it
on an FTP server: you will have to register first. Details about
obtaining the system will be available by next autumn on http://www.iut.univ-paris8.fr/~rosmord/EgyptienE.html.
References
- BIL95 S. BILLET, 1995,
Apports à l'acquisition interactive de connaissances contextuelles
Thèse de doctorat de l'Université Montpellier II
(another approach to automated transliteration).
- KER97 F. KERBOUL, 1997,
Translittération automatique des hiéroglyphes
Rapport de stage de l'ENSTA
- MCB97 S. MEKNAVIN, P.
CHAREONPORNSAWAT and B. KIJSIRIKUL, 1997,
Feature-based Thai Word Segmentation In Natural
Language Processing Pacific Rim Symposium 1997, Phuket, Thailand
- ROS94 S. ROSMORDUC, 1996,
Traitement
automatique du langage naturel en moyen égyptien In
Robert VERGNIEUX, editor, Xieme
conférence Informatique et Égyptologie
- ROS96 S. ROSMORDUC, 1996,
Analyse
morpho-syntaxique de textes non ponctués Thèse
de doctorat, École normale supérieure de Cachan,
Serge Rosmorduc