From Egyptian rubbish heaps to your computer screen (or whatever)

Bringing papyri to a display device near you!

By Tim Finney, Religion and Technology Center, 2003-03-10

Introduction: The three questions

§1. People of old were wont to ask three questions: What do you think? Why do you think so? What difference does it make? Today I will tell you what I have discovered about publishing ancient documents on the World Wide Web. At the end, I will venture to speculate about the difference it makes. As to why? Who knows?

Discovery: Recovering what was lost

§2. We are indeed fortunate to have recovered so many ancient documents from the sands of Egypt over the last century. It would be great to know what proportion of the ancient world's literature we now have. A small fraction? A significant proportion? Who knows?

§3. I am glad that our predecessors expended such efforts to bring these treasures to light. It is now up to us to ensure that they remain in the light.

Protection: Securing for posterity

§4. I have heard of papyrus being burnt as incense and of a ship full of recovered papyri catching fire and then sinking. Happily, incidents like this are less common today. Nevertheless, we must not be content to preserve a papyrus, mount it between glass plates, edit its contents and then place it in a drawer. The aging process continues and will eventually turn any papyrus to dust and ashes just as surely as if it had been consumed by fire.

§5. We now have another means of preserving our inheritance: a system conceived to withstand an atomic holocaust might be able to save the literary contents of our papyri from any further combustion. As the story goes, the Internet began as a Defence Department effort to find a communication system that could survive a conflagration by virtue of its great redundancy. As to whether the electronic medium will survive as long as the papyri we seek to preserve, who knows? Even so, translation into the virtual world of bits and big-endian bytes has merit besides the promise of preservation.

Metamorphosis: Transforming for utility

§6. I will now demonstrate what is involved in preparing a papyrus for life on the Internet. Before beginning, I would like to make some general recommendations.

USE STANDARDS!

§7. I apologise for shouting, but this needs to be emphasised. Some of the appropriate standards are:

XML and related standards such as XSL
The Text Encoding Initiative (TEI), which provides markup guidelines written by academics for academics
Betacode, which allows you to enter Greek using only the characters on an ordinary keyboard
Unicode, which allows you to display Greek and nearly every other written language on earth.

§8. There are probably other standards that I should be aware of. Is there a standard on archival imaging that is appropriate for papyri? As a general principle, if there is an appropriate standard, use it. You will be glad you did.

Use open source software

§9. There are various approaches to obtaining the necessary software:

Buy off-the-shelf XML-aware digital library software. (Got a spare $50,000?)
Hire an SQL programmer to create a system that will almost certainly not be as functional. (Got a spare $50,000?)
Use open source software and command one of your graduate students to make it work. (You might have trouble finding someone who can read Greek and program at the same time.) Open source software is often as good as expensive commercial software, and it's free.

Demonstration of the digitisation phase

§10. I am presented with a papyrus fragment that seems to be from a codex. The first step is to produce digital images of the two sides. The images should be of archival quality, which means scanning at a resolution somewhere between 300 and 600 dpi. You may even produce multispectral images (i.e. a set of narrow band images taken at a series of wavelengths in the infrared to ultraviolet range). Multispectral images can be used to recover faded text and to differentiate between scribes who used different ink compositions. A team from Brigham Young University specialises in this kind of imaging, and might even do it for free.
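To see what the 300 to 600 dpi range means in practice, the arithmetic is simple: along each side, pixels = (size in cm / 2.54) x dpi, and uncompressed storage = width x height x bytes per pixel. The short Python sketch below does that sum. The 10 x 15 cm fragment size and the 24-bit colour depth are assumptions chosen for illustration, not measurements of the fragment discussed here.

```python
# Sketch: pixel dimensions and raw storage for an archival scan.
# The 10 x 15 cm fragment size used below is hypothetical.

CM_PER_INCH = 2.54


def scan_dimensions(width_cm: float, height_cm: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for scanning a width_cm x height_cm object at dpi."""
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    return width_px, height_px


def raw_size_bytes(width_px: int, height_px: int,
                   bytes_per_pixel: int = 3, bands: int = 1) -> int:
    """Uncompressed size. 3 bytes per pixel is 24-bit colour; bands > 1 models a
    multispectral set stored as that many separate images."""
    return width_px * height_px * bytes_per_pixel * bands


if __name__ == "__main__":
    for dpi in (300, 600):
        w, h = scan_dimensions(10, 15, dpi)
        mb = raw_size_bytes(w, h) / 1_000_000
        print(f"{dpi} dpi: {w} x {h} px, about {mb:.1f} MB per side (uncompressed)")
```

For that hypothetical fragment this gives about 6.3 MB per side at 300 dpi and about 25.1 MB at 600 dpi: doubling the resolution quadruples the storage, and a multispectral set multiplies it again by the number of bands.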

§11. My first impression is that this is a Christian text because it contains what appears to be a nomen sacrum. I also notice a few words that indicate that the text is about chains and Italy. With some good fortune, I manage to identify this as part of a New Testament manuscript containing the Acts of the Apostles. Armed with this information, I am ready to make a Betacode transcription:

[anaxwr]hsa[ntes elaloun]
[pros al]lhlous oti [ouden]
[qanato]u h desmwn [acion]
[prassei] o anos outo?[s]
[ei mh ep]ekeklht[o *kaisa]
[ra kai ou]tws ekri[nen o]
[hgemwn] auton an[apem]
[! ! ! ! ! ! ]! [! ! ]! [

[! ! *ale]c?and[rinon pleon]
[eis th]n? *italian e[nebibasen]
[hmas] eis auto: bra[duplo]
[ounte]s? en de i+kan[ais hme]
[rais kai] molis gen[omenoi]
[kata t]hn *knidon [mh pros]
[ewnto]s? hmas? t[ou

§12. There are two ways to display this as Greek. The first requires a font that displays Betacode as Greek characters. You will need to use a program to convert the Betacode into whatever character codes the font expects. In addition, every user will have to obtain and install the font on his or her computer. This is not a trivial exercise, so this approach will immediately reduce the potential size of your viewing audience.

§13. The second approach uses a program to convert the Betacode to Unicode, then relies on the user having a web browser that is modern enough to display Unicode correctly. The latest browsers (e.g. Internet Explorer, Netscape, Mozilla and, for Mac OS X, Safari) do quite well with Unicode. Some will complain that not everyone has the latest generation of browsers. Even so, the Unicode approach has the best long-term prospects because it is standards-based.
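§13 leaves the conversion program unspecified. The following is a minimal Python sketch of one way to write it, not a complete Beta Code implementation. It handles the letters, `*` for capitals, breathings, accents, iota subscript, diaeresis, final sigma and the raised dot, and it passes the editorial marks `[`, `]` and `?` through unchanged. It ignores the numbered variants (such as `s1` and `s2`) and most punctuation codes. The name `betacode_to_unicode` is hypothetical.

```python
import unicodedata

# Beta Code letters in Greek alphabetical order, paired with U+03B1..U+03C9
# minus U+03C2 (final sigma), which is chosen separately by position.
BETA_LETTERS = "abgdezhqiklmncoprstufxyw"
GREEK_LETTERS = [chr(cp) for cp in range(0x03B1, 0x03CA) if cp != 0x03C2]
LETTERS = dict(zip(BETA_LETTERS, GREEK_LETTERS))
FINAL_SIGMA = "\u03c2"

# Beta Code diacritics as Unicode combining marks; NFC later composes them.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript (ypogegrammeni)
    "+": "\u0308",   # diaeresis
}
PUNCTUATION = {":": "\u00b7"}  # raised dot (ano teleia)
EDITORIAL = set("[]?")          # skipped when deciding whether a word has ended


def _word_ends_after(text: str, pos: int) -> bool:
    """True if no Beta Code letter follows pos, ignoring editorial marks."""
    for ch in text[pos + 1:]:
        if ch not in EDITORIAL:
            return ch.lower() not in LETTERS
    return True


def betacode_to_unicode(text: str) -> str:
    """Convert lower- or upper-case Beta Code to NFC-normalised Unicode Greek."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            # Capital: any diacritics come between '*' and the letter.
            j, marks = i + 1, ""
            while j < len(text) and text[j] in DIACRITICS:
                marks += DIACRITICS[text[j]]
                j += 1
            if j < len(text) and text[j].lower() in LETTERS:
                out.append(LETTERS[text[j].lower()].upper() + marks)
                i = j + 1
                continue
            out.append(ch)  # stray '*': keep it rather than lose it silently
        elif ch.lower() == "s":
            out.append(FINAL_SIGMA if _word_ends_after(text, i) else LETTERS["s"])
        elif ch.lower() in LETTERS:
            out.append(LETTERS[ch.lower()])
        elif ch in DIACRITICS:
            out.append(DIACRITICS[ch])
        else:
            out.append(PUNCTUATION.get(ch, ch))  # spaces, brackets, '?' pass through
        i += 1
    return unicodedata.normalize("NFC", "".join(out))
```

Applied to the transcription line `[hmas] eis auto: bra[duplo]`, it keeps the brackets in place, turns the `s` of `hmas` and `eis` into final sigma and the colon into a raised dot. Composing from combining marks and letting NFC choose the precomposed characters avoids maintaining a large table of accented Greek letters.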

§14. The last stage is to wrap the transcription in TEI XML. I will not bore you with the details. The TEI has a way to tag nearly everything a scholar needs to tag, perhaps not exactly as the scholar would prefer it, but nevertheless sufficient in almost every case. Using TEI is tricky. Firstly, you need to learn enough to do things like tag scribal corrections, apparatus entries for orthographic variants, and so on. Hardly anyone knows the full TEI. (Would anyone here venture to guess the number?) Fortunately, there are others who have already walked this path. If you manage to obtain an XML transcription from Perseus, you will see how they use TEI to mark up Greek manuscripts.
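§14 spares you the details, so what follows is only one plausible encoding, sketched in Python with the standard library's `xml.etree.ElementTree`. It takes the bracketed Betacode lines and produces TEI-style markup: `<lb n="1"/>`, `<lb n="2"/>` and so on at the start of each line, `<supplied reason="lost">` around letters restored inside square brackets, and `<unclear>` around a letter followed by `?`. Reading `?` as marking a doubtful letter, wrapping the result in `<ab>`, and the name `encode_lines` are assumptions; a real project would follow the TEI conventions it has adopted (Perseus's files are a good model) and would also need a `teiHeader`.

```python
import xml.etree.ElementTree as ET


def _append_text(parent: ET.Element, text: str) -> None:
    """Append character data after parent's last child (or as its text if none)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _flush(parent: ET.Element, text: str, restored: bool) -> None:
    """Emit a run of text, wrapping editorial restorations in <supplied>."""
    if not text:
        return
    if restored:
        ET.SubElement(parent, "supplied", reason="lost").text = text
    else:
        _append_text(parent, text)


def encode_lines(lines: list[str]) -> ET.Element:
    """Hypothetical encoder: bracketed Betacode lines -> TEI-style <ab> element."""
    ab = ET.Element("ab")
    inside = False  # restorations may run on from one line to the next
    for n, line in enumerate(lines, start=1):
        ET.SubElement(ab, "lb", n=str(n))  # milestone marking the start of line n
        buf = ""
        for ch in line:
            if ch in "[]":
                _flush(ab, buf, inside)
                buf, inside = "", ch == "["
            elif ch == "?" and buf and not inside:
                # '?' follows a doubtfully read letter: wrap that letter in <unclear>.
                _flush(ab, buf[:-1], inside)
                ET.SubElement(ab, "unclear").text = buf[-1]
                buf = ""
            else:
                buf += ch
        _flush(ab, buf, inside)
    return ab
```

Because a restoration may open on one line and close on the next, the `inside` flag is carried across lines. Printing `ET.tostring(encode_lines(lines), encoding="unicode")` shows the resulting XML.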

§15. You will also need software to edit and validate XML files and XSL stylesheets (e.g. Emacs), along with an XSLT processor (e.g. Apache's Xalan), a set of XSLT stylesheets and a good operating system to run it all. (I prefer Linux.) Stylesheets are used to convert XML into a form suitable for display on whatever display device you happen to be using, whether a web browser, a cell phone, a Personal Digital Assistant or something that is yet to be invented. Sebastian Rahtz of the Oxford University Computing Service has written a suite of stylesheets for TEI XML. These provide a good starting point for your own stylesheets.
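A real stylesheet is beyond the scope of a short example, but the idea behind one (an output rule per element name) can be sketched in Python with the standard library. The function below is a stand-in for an XSLT stylesheet; it is not XSLT and is not derived from Rahtz's stylesheets. It renders a small TEI-style fragment (the element names `ab`, `lb`, `supplied` and `unclear` are assumptions about how the transcription was encoded) either as HTML for a browser or as plain text for a small screen, using square brackets for restorations and an underdot for doubtful letters, as in the usual Leiden conventions. The name `render` and its `target` values are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

UNDERDOT = "\u0323"  # combining dot below: Leiden sign for a doubtfully read letter


def render(ab: ET.Element, target: str = "html") -> str:
    """Render a TEI-style <ab> for a display target, one rule per element name
    (the role an xsl:template plays in a real stylesheet)."""
    esc = html.escape if target == "html" else (lambda s: s)
    parts = [esc(ab.text or "")]
    seen_line = False
    for el in ab:
        if el.tag == "lb":
            if seen_line:  # break between lines, none before the first
                parts.append("<br/>\n" if target == "html" else "\n")
            seen_line = True
        elif el.tag == "supplied":
            parts.append("[" + esc(el.text or "") + "]")
        elif el.tag == "unclear":
            parts.append(esc("".join(ch + UNDERDOT for ch in (el.text or ""))))
        else:
            parts.append(esc(el.text or ""))  # unknown element: keep its text only
        parts.append(esc(el.tail or ""))
    body = "".join(parts)
    return f"<p>{body}</p>" if target == "html" else body
```

Adding a display device means adding output rules while the encoded transcription stays untouched. With Xalan, the same job would be done by a set of `xsl:template` rules instead.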

§16. The combination of XML and XSLT is the best strategy currently known for future-proofing your transcriptions. It requires whoever is doing the work to learn enough TEI, XML and XSLT to be dangerous. That takes a lot of reading and experimentation. The latest edition of the TEI Guidelines is a must. Ask around for good books on XML and XSLT. Also, join discussion lists such as TEI-L and the UNICODE mailing list at the University of Kentucky. Do not expect your graduate student to learn what is required overnight or even within three months. You may have to hire a consultant to get the basic infrastructure in place. On the other hand, you could stick to HTML, which only requires your worker to learn HTML and not the TEI, XML and XSLT trilogy. You can produce an acceptable digital library in this manner. If you take this approach, however, there is a fair chance that your transcriptions will eventually have to be reworked into XML in order to remain useful for future generations.

Transcendence: Reaching the ethereal plane

§17. The final stage is to set up a web server running a digital library system so that your transcribed and imaged papyri can be released into the ether. I have experience with three such systems. The first was a commercial system that is superior to the other two with respect to its handling of XML and queries. A licence for that one cost $50,000 and you need to be a programmer to use it.

§18. The second one was a public distribution (i.e. free) of the Perseus system created at Tufts University. It took me a while to work out how to install this system. Unfortunately, the search mechanism doesn't work in the public version, so it is only good for browsing. Nevertheless, it does the browsing part well and allows you to choose between your own fonts and Unicode. Here is what the interface looks like for an experimental version of the Duke Databank, which I have been developing for John Oates.

§19. More recently, I have been using the Greenstone digital library system produced by the University of Waikato in New Zealand. (Hooray for the Kiwis!) This is easy to install and reasonably straightforward to use. It is open source (i.e. free) software that runs on Windoze, Linux and Mac OS X. Here is an example from the Corpus of Literary Papyri, which I have been developing for Dirk Obbink of Oxford University.

§20. The search mechanism works, although it won't accept Unicode queries unless you extend the source code. An alternative is to include three versions of each transcription:

Accented Betacode
Orthographically normalised Betacode with diacritics, line breaks and everything else that is not text stripped out
Unicode

With this arrangement, a person can type a query in Betacode and see matching documents displayed in Unicode. However, the two Betacode versions will be displayed as well.
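The point of the normalised version is that a query can ignore everything a user is unlikely to type exactly. A rough Python sketch of the idea follows; it is not Greenstone's indexing code. It assumes, for illustration, that normalising means lower-casing and then deleting every character except the Betacode letters a to z, including spaces, because word divisions at line ends are not marked in the transcription and the papyri themselves were generally written without word spacing. It does only this mechanical stripping: genuine orthographic normalisation (regularising variant spellings) would need a separate, scholar-curated step. The names `squeeze`, `build_index` and `search`, and the document id used below, are hypothetical.

```python
import re

NOT_A_LETTER = re.compile(r"[^a-z]")


def squeeze(betacode: str) -> str:
    """Hypothetical normaliser: lowercase, then drop everything except the
    Betacode letters a-z (diacritics, '*', brackets, '?', spaces, line breaks)."""
    return NOT_A_LETTER.sub("", betacode.lower())


def build_index(docs: dict[str, str]) -> dict[str, str]:
    """Map each document id to the squeezed form of its accented Betacode."""
    return {doc_id: squeeze(text) for doc_id, text in docs.items()}


def search(index: dict[str, str], query: str) -> list[str]:
    """Return ids of documents whose squeezed text contains the squeezed query."""
    needle = squeeze(query)
    if not needle:
        return []
    return sorted(doc_id for doc_id, text in index.items() if needle in text)
```

With the transcription lines from `[! ! *ale]c?and[rinon pleon]` to `[hmas] eis auto: bra[duplo]` stored under a hypothetical id, the queries `*)itali/an` and `pleon eis` (which spans a line break) both find the document, while `*knidon` does not. The cost of deleting spaces is that a query can also match across word boundaries, so short queries may produce a few false hits. A matching document would then be displayed in its Unicode version.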

Conclusion: What difference does it make?

§21. We have skimmed across the top of the deep and churning sea of imaging, transcription, Unicode, HTML, XML, XSLT and digital library systems. But what difference does it make? What will be the outcome of being able to read Menander on your wireless device? For a start, universities had better ban these devices from exam rooms. As far as more erudite outcomes are concerned, who knows? Perhaps ancient literature in the ether will serve as a cosmic countermeasure to the likes of WWF, Fox News, stupid politicians and Amazon Survivor? We can only hope.