Fun with LXXM-Corpus
The Blog of Nathan D. Smith

Once I have a text available for natural language processing, there are a few basic tasks I like to perform to kick the tires. First, I like to run the collocations method of NLTK, which gives common word pairs from the text. For the LXXM, here are the results:

ἐν τῇ
ἐν τῷ
ὁ θεὸς
τῆς γῆς
καὶ εἶπεν
λέγει κύριος
ἀνὰ μέσον
τὴν γῆν
τοῦ θεοῦ
ὁ θεός
τάδε λέγει
πρός με
πάντα τὰ
ὁ βασιλεὺς
οὐ μὴ
οὐκ ἔστιν
τῇ ἡμέρᾳ
οἱ υἱοὶ
τῷ κυρίῳ
τοῦ βασιλέως

If you disregard the pairs made up only of stop words (ἐν τῇ, ἐν τῷ, οὐ μὴ), the remaining pairs (λέγει κύριος, ὁ βασιλεὺς, οἱ υἱοὶ) give a decent idea of the fundamental thematic content of the text.
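To make the stop-word point concrete, here is a minimal sketch of filtering word pairs against a stop list. It is not NLTK's collocations method: it ranks pairs by raw frequency only, and the names `GREEK_STOP_WORDS` and `top_bigrams`, the tiny stop list, and the sample tokens are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
GREEK_STOP_WORDS = {"ἐν", "τῇ", "τῷ", "ὁ", "οἱ", "καὶ"}


def top_bigrams(tokens, stop_words=frozenset(), n=20):
    """Return the n most frequent adjacent word pairs containing no stop word."""
    pairs = zip(tokens, tokens[1:])
    counts = Counter(
        pair for pair in pairs
        if not any(word in stop_words for word in pair)
    )
    return [pair for pair, _ in counts.most_common(n)]


# Toy token list standing in for the tokenized corpus.
sample = "τάδε λέγει κύριος ὁ θεὸς τάδε λέγει κύριος ἐν τῇ ἡμέρᾳ".split()
for first, second in top_bigrams(sample, GREEK_STOP_WORDS, n=3):
    print(first, second)
```

Calling `top_bigrams(sample)` without a stop list brings pairs like ἐν τῇ back into the results, which is roughly what the unfiltered list above shows.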

Now for the silliness, using the n-gram random text generator:

ἐν ἀρχῇ ὁδοῦ πόλεως ἐπ' ὀνόμασιν φυλῶν τοῦ Ισραηλ παρώξυναν οὐκ ἐμνήσθησαν διαθήκης ἀδελφῶν καὶ ἐξαποστελῶ πῦρ ἐπὶ Μωαβ ἐν τῷ ἐξαγαγεῖν σε τὸν ἱματισμόν
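For anyone curious how this kind of output comes about, here is a rough sketch of an n-gram generator with n = 2. It is not the generator used above, and the post does not say which n-gram order that generator uses; the function names `build_bigram_model` and `generate` and the sample text are assumptions for illustration. The idea is to record which words follow each word in the corpus, then walk that table at random.

```python
import random
from collections import defaultdict


def build_bigram_model(tokens):
    """Map each word to the list of words that follow it in the text."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length=20, seed=None):
    """Walk the model from `start`, picking each next word at random."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < length:
        candidates = followers.get(words[-1])
        if not candidates:  # dead end: this word never had a successor
            break
        words.append(rng.choice(candidates))
    return " ".join(words)


# Toy token list standing in for the tokenized corpus.
sample = "καὶ εἶπεν ὁ θεὸς καὶ εἶπεν κύριος ὁ θεὸς καὶ ἐγένετο".split()
model = build_bigram_model(sample)
print(generate(model, "καὶ", length=8, seed=1))
```

Because followers are stored with repetition, frequent continuations are picked more often. Each adjacent pair in the output occurs somewhere in the source, which is why the result sounds locally plausible and globally nonsensical.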
