XML and the Bible
The Blog of Nathan D. Smith
While working on an importer to bring the SBL Greek New Testament into Open Scriptures, I noticed some interesting features of the SBLGNT XML file. (I promised that I would try to exclude posts of a technical nature from this blog, but I am breaking that promise, because I think this technical discussion is interesting and applicable to Biblical studies.)
The SBLGNT's XML representation of the Biblical text makes an interesting distinction between tags that wrap other elements and standalone marker tags. That is, wrapping tags encompass the actual Greek text and its structures (such as paragraphs and books), while marker tags such as "verse-number" represent insertions which are not original to the text. Here is a truncated Matthew 1:1 in the SBLGNT XML as an example:
<book id="Mt">
  <title>ΚΑΤΑ ΜΑΘΘΑΙΟΝ</title>
  <p>
    <verse-number id="Matthew 1:1">1:1</verse-number>
    <w>Βίβλος</w> <suffix> </suffix>
    ...
    <w>Ἀβραάμ</w> <suffix>. </suffix>
  </p>
Notice how there is no "verse" tag which encompasses all of the included text. Instead, "verse-number" is a tag which is inserted wherever the verse breaks are located. This contrasts with the "p" (paragraph) tag, which encompasses all of the child "w" (word) and "suffix" (spaces and punctuation) tags. Paragraphs are of course present in the original biblical text.
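Since there is no enclosing "verse" tag, a program that wants the text of a verse has to collect everything that follows a "verse-number" marker until the next one. Here is a minimal sketch of that idea in Python, using the standard library's ElementTree. The sample string is modelled on the excerpt above, with the elided words left out and a closing "book" tag added so that it parses; the function name verses_from_book is my own invention for illustration, not part of any SBLGNT or Open Scriptures tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened, well-formed stand-in for the real SBLGNT file.
SAMPLE = """<book id="Mt">
<title>ΚΑΤΑ ΜΑΘΘΑΙΟΝ</title>
<p>
<verse-number id="Matthew 1:1">1:1</verse-number>
<w>Βίβλος</w><suffix> </suffix>
<w>Ἀβραάμ</w><suffix>. </suffix>
</p>
</book>"""


def verses_from_book(book):
    """Group word and suffix text under the most recent verse-number marker."""
    verses = {}
    current = None  # id of the verse we are currently inside, if any
    for paragraph in book.iter("p"):
        for child in paragraph:
            if child.tag == "verse-number":
                # A marker starts a new verse; it wraps none of the text itself.
                current = child.get("id")
                verses.setdefault(current, "")
            elif child.tag in ("w", "suffix") and current is not None:
                verses[current] += child.text or ""
    return verses


book = ET.fromstring(SAMPLE)
verses = verses_from_book(book)
print(verses)
```

Note that the current verse is deliberately not reset at the start of each paragraph. Because the verse markers are independent of the paragraph structure, a verse can run across a paragraph break without any special handling.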
One thing I might have done to take this principle even further would be to put the book titles where they appear in the Greek manuscripts. In SBLGNT XML, the title is always the first child element of the "book" tag. However, that is not always where the title was in the manuscripts. Sometimes it was written at the end of the book.
I like the distinction between textual forms and externally imposed structures as reflected in this XML document. I can't be sure what Logos' exact thinking was behind these design choices, but I think this distinction is the principle at work.