Hebrew Text Database ETCBC4

Peursen, Prof. Dr. W.T. van (ETCBC, VU Amsterdam): Hebrew Text Database ETCBC4. DANS. https://doi.org/10.17026/dans-2z3-arxf

2014-07-13

The ETCBC database of the Hebrew Bible (formerly known as the WIVU database) contains the scholarly text of the Hebrew Bible with linguistic markup.
A previous version can be found in EASY (see the link below).
The present dataset is an improvement in many ways:

(A) it contains a new version of the data, called ETCBC4.
The content has been heavily updated: it has new linguistic annotations, which are better organised, and many additions and corrections.

(B) the data format is now the Linguistic Annotation Framework (LAF, see below). This contrasts with the previous version, which was archived as a database dump in a specialised format: Emdros (see the link below).

(C) a new tool, LAF-Fabric, is added to process the ETCBC4 version directly from its LAF representation. The picture on this page shows a few samples of what can be done with it.

(D) extensive documentation is provided, including a description of all the computing steps involved in getting the data in LAF format.

Since 2012 there has been an ISO standard for the stand-off markup of language resources: the Linguistic Annotation Framework (LAF).
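
To illustrate what stand-off markup means, here is a minimal sketch in Python. It is not the LAF XML serialisation and not the LAF-Fabric API; the class names, the feature name part_of_speech and the English sample text are hypothetical and chosen only for illustration. The idea it shows is that the primary text is stored once and left untouched, while annotations live apart from it and point into it by character offsets.

```python
from dataclasses import dataclass, field

# Primary data: the raw text is stored once and never modified.
# (English sample text, purely illustrative.)
PRIMARY_TEXT = "in the beginning"


@dataclass(frozen=True)
class Region:
    """A span of the primary text, given as character offsets [start, end)."""
    start: int
    end: int


@dataclass
class Annotation:
    """Features attached to a region; kept apart from the text itself."""
    region: Region
    features: dict = field(default_factory=dict)


def text_of(region, text=PRIMARY_TEXT):
    """Resolve a region back to the piece of primary text it points at."""
    return text[region.start:region.end]


# Stand-off annotations: they refer to the text only through offsets.
# The feature name "part_of_speech" is a hypothetical example.
ANNOTATIONS = [
    Annotation(Region(0, 2), {"part_of_speech": "preposition"}),
    Annotation(Region(3, 6), {"part_of_speech": "article"}),
    Annotation(Region(7, 16), {"part_of_speech": "noun"}),
]


def find(feature, value, annotations=ANNOTATIONS):
    """Return the text of every region whose annotation has feature == value."""
    return [
        text_of(a.region)
        for a in annotations
        if a.features.get(feature) == value
    ]


if __name__ == "__main__":
    print(find("part_of_speech", "noun"))
```

Because the annotations only refer to the text, new layers of annotation can be added or corrected without touching the primary data.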

As a result of the SHEBANQ project (see link below), funded by CLARIN-NL and carried out by the ETCBC and DANS, we have created a tool, LAF-Fabric, with which we can convert Emdros databases of the ETCBC into LAF and then do data-analytic work by means of, for example, IPython notebooks.
This has been used for the Hebrew Bible, but it can also be applied to the Syriac text in the CALAP (see link below).

This dataset contains a folder laf with the LAF files; the necessary declarations are in the folder decl.
Among these declarations are feature declaration documents, in TEI format (see link below), with hyperlinks to concept definitions in ISOcat (see link below).
For completeness, the ISOcat definitions are repeated in the feature declaration documents.
These definitions are terse, and they are more fully documented in the folder documentation.