Unicode conversion of the LXX Morph text
The Blog of Nathan D. Smith

I had been looking for a morphologically-tagged LXX for research and came across the CATSS LXXM text. The one thing lacking for my use of this text was that it was in betacode and not in unicode.

By searching I have found that many people have taken this text and converted it to unicode for embedding in web sites, but to my knowledge nobody is publishing the equivalent plain text files. The Unbound Bible comes closest, but it publishes the text and the morphological analysis in two separate files, which is suboptimal. So I decided to embark on converting the LXXM to unicode.

Luckily James Tauber has shared a Greek betacode to unicode conversion script which took care of most of the hard work for me. Using this, I was able to convert all of the texts from betacode to unicode. I am sharing the result as a git archive: lxxmorph-unicode.
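To give a feel for what the conversion involves, here is a minimal sketch in Python. This is not James Tauber's script, and the function name beta_to_unicode is my own invention for illustration. It assumes uppercase betacode input, handles only the basic letters, the common diacritics, and the `*` capital marker, and relies on NFC normalization to combine each letter with its accents.

```python
import re
import unicodedata

# Betacode letters mapped to lowercase Greek (illustrative, not exhaustive).
LETTERS = dict(zip("ABGDEZHQIKLMNCOPRSTUFXYW",
                   "αβγδεζηθικλμνξοπρστυφχψω"))

# Betacode diacritics mapped to Unicode combining characters.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(text):
    """Hypothetical helper: convert a betacode string to precomposed Greek."""
    out = []
    capital = False  # set when '*' announces a capital letter
    pending = []     # marks that precede a capital letter in betacode
    for ch in text.upper():
        if ch == "*":
            capital = True
        elif ch in MARKS:
            # After '*', marks come before the letter, so hold them back.
            (pending if capital else out).append(MARKS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            out.append(letter.upper() if capital else letter)
            out.extend(pending)
            capital, pending = False, []
        else:
            out.append(ch)  # spaces, punctuation, anything unrecognized
    result = "".join(out)
    # A sigma that is not followed by another letter becomes final sigma.
    result = re.sub(r"σ(?!\w)", "ς", result)
    # Compose base letters and combining marks into single code points.
    return unicodedata.normalize("NFC", result)


print(beta_to_unicode("E)N A)RXH=| LO/GOS"))
```

A real converter has to deal with much more than this (punctuation that collides with the diacritic characters, critical signs, and so on), which is why I was glad to reuse an existing script.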

The texts differ from the originals in the following ways:

  1. Several corrections have been applied.
  2. The betacode text has been converted to unicode.
  3. The files are now whitespace-separated rather than fixed-width.
  4. The second column, containing the POS and parsing information, has had its internal whitespace replaced with hyphens so that it remains a single field in the whitespace-separated format.
  5. The split files of Genesis, Psalms, Isaiah, Jeremiah, and Ezekiel have been combined, and all the files have been renumbered.
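Items 3 and 4 can be sketched roughly as follows. The column widths here (25 and 11 characters) are assumptions made for the sake of the example, not values taken from the CATSS documentation, and the function name convert_line is hypothetical. The example uses a betacode line so that the fixed-width slicing stays simple.

```python
def convert_line(line, word_width=25, parse_width=11):
    """Turn one fixed-width line into whitespace-separated fields.

    word_width and parse_width are assumed column widths; adjust them
    to match the actual layout of the source files.
    """
    word = line[:word_width].strip()
    parse = line[word_width:word_width + parse_width].strip()
    rest = line[word_width + parse_width:].strip()
    # Join the pieces of the parsing column with hyphens so that it
    # survives as a single whitespace-delimited field.
    parse = "-".join(parse.split())
    # Collapse any runs of whitespace in the remaining column(s).
    rest = " ".join(rest.split())
    return " ".join(field for field in (word, parse, rest) if field)


sample = "E)POI/HSEN".ljust(25) + "VAI AAI3S".ljust(11) + "POIE/W"
print(convert_line(sample))
```

With the hyphens in place, each line can be split on whitespace and the parsing information is always the second field.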

Please note that this resource has a rather novel license which requires users to fill out a user declaration and send it in to the CCAT program at the University of Pennsylvania (see 0-user-declaration.txt in the repo). As far as I can tell, my redistribution of the unicode version complies with the license. I have contacted Robert Kraft (the former steward) and Bernard Taylor (the current steward) with the corrections I've found.

(link to the original announcement on the Open Scriptures mailing list)

Date: 2013-06-01 12:10
