Funded by the European Union, NextGenerationEU
Ministero dell'Università e della Ricerca
Italiadomani, National Recovery and Resilience Plan (Piano nazionale di ripresa e resilienza)
Università di Catania

Production workflow for the ontology of Nouns and Verbs of the Gallo-Italic variety spoken in Nicosia and Sperlinga

This page describes all the production steps for the ontology of Nouns and Verbs of the Gallo-Italic variety spoken in Nicosia and Sperlinga, along with the software tools used and the intermediate products.

Download in Turtle Format

nicosiaesperlinga-base.ttl is the base OWL ontology providing the Lexicon metadata. It contains only the ontology individual, the lexicon individual and a few entries; it has to be populated with all the remaining lexical entries of the Gallo-Italic variety spoken in Nicosia and Sperlinga.

pdfimporter is a tool that extracts lemmas from the Vocabolario del dialetto galloitalico di Nicosia e Sperlinga and places the corresponding entries into nicosiaesperlinga-base.ttl.

Thus, running the following command in the same directory as nicosiaesperlinga-base.ttl produces nicosiaesperlinga-lemmas.ttl, i.e., an ontology with all the nouns and verbs of the Gallo-Italic variety spoken in Nicosia and Sperlinga.

 java -jar pdfimporter.jar nicosiaesperlinga.pdf

sicilian-derivationbuilder uses a brute-force approach to find all the possible derivations that, through Gallo-Sicilian Features, transform Sicilian etymons into the lemmas in nicosiaesperlinga-lemmas.ttl.

 java -jar sicilian-derivationbuilder.jar nicosiaesperlinga-lemmas.ttl

It produces the file derivations-bf.csv, which lists one derivation per row. These derivations have the form lemma <--feature_label_1--intermediate_form_1<-- ... intermediate_form_n<--feature_label_n<--sicilian_etymon

This file is then revised and reworked by lexicographers, who eliminate multiple derivations for the same lemma, remove implausible ones, and add further derivations produced with the online derivation tool.

derivations-revised.csv is the file resulting from the lexicographers' manual revision.

These derivations are then imported into the final ontology of nouns and verbs of the Gallo-Italic variety spoken in Nicosia and Sperlinga using gs-derivationsimporter. This tool takes three arguments:

  1. the derivation file,
  2. the OWL TTL file where the derivations will be placed and
  3. the language tag for the etymons.

When run without arguments, it uses derivations-revised.csv as the derivation file, copies nicosiaesperlinga-lemmas.ttl into nicosiaesperlinga.ttl, and places the derivations in the latter, using sic as the language tag for the etymons.

 java -jar gs-derivationsimporter.jar
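The same import can presumably be requested with explicit arguments. The following is a minimal Python sketch, not part of the toolchain: the function import_derivations and its parameters are hypothetical, the three arguments are assumed to be positional in the order listed above, and the tool is assumed to write into the given TTL file in place, which is why the lemma ontology is copied first.

```python
import shutil
import subprocess


def import_derivations(
    derivations="derivations-revised.csv",
    lemmas="nicosiaesperlinga-lemmas.ttl",
    target="nicosiaesperlinga.ttl",
    lang="sic",
    runner=subprocess.run,
    copy=shutil.copyfile,
):
    """Copy the lemma ontology, then import the derivations into the copy."""
    # Assumption: with explicit arguments the tool writes into `target`
    # in place, so the copy is made here before the tool runs.
    copy(lemmas, target)
    # Assumed positional order: derivation file, OWL TTL file, language tag.
    argv = ["java", "-jar", "gs-derivationsimporter.jar", derivations, target, lang]
    runner(argv, check=True)  # raises CalledProcessError on a non-zero exit code
    return argv
```

Calling import_derivations() with no arguments mirrors the defaults described above; a different derivation file or language tag can be passed by keyword.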

The imported derivations can be verified by means of liph-validator, a tool that checks that all the derivation steps occurring in an ontology comply with the definitions of the linguistic phenomena they refer to. In more detail, liph-validator takes as arguments:

  1. an ontology containing linguistic phenomena occurrences and
  2. another ontology containing finite state definitions of the linguistic phenomena.

In our context, the command is:

 java -jar liph-validator.jar nicosiaesperlinga.ttl https://gallosiciliani.unict.it/ns/gs-features?ttl

gs-derivationsextractor produces a CSV file, useful for statistical purposes, that lists all the derivations in an OWL ontology in Turtle format, provided that the linguistic phenomena in the derivations belong to the GalloSicilian Features Ontology.

The first argument of gs-derivationsextractor is the ontology file in Turtle format, whereas the second is the name of the CSV file to be produced.

 java -jar gs-derivationsextractor.jar nicosiaesperlinga.ttl derivations-ext.csv
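The commands on this page fall into two automated stages separated by the manual revision of derivations-bf.csv. As a rough sketch, they could be chained from Python as follows. The stage names "draft" and "final", the STAGES table and the run_stage function are hypothetical; the command lines are copied from this page, and the tools are assumed to exit with a non-zero code on failure.

```python
import subprocess

# Command lines copied from this page; the grouping into stages is an assumption.
STAGES = {
    # Before the lexicographers' revision.
    "draft": [
        ["java", "-jar", "pdfimporter.jar", "nicosiaesperlinga.pdf"],
        ["java", "-jar", "sicilian-derivationbuilder.jar", "nicosiaesperlinga-lemmas.ttl"],
    ],
    # After derivations-revised.csv has been prepared by hand.
    "final": [
        ["java", "-jar", "gs-derivationsimporter.jar"],
        ["java", "-jar", "liph-validator.jar", "nicosiaesperlinga.ttl",
         "https://gallosiciliani.unict.it/ns/gs-features?ttl"],
        ["java", "-jar", "gs-derivationsextractor.jar", "nicosiaesperlinga.ttl",
         "derivations-ext.csv"],
    ],
}


def run_stage(name, workdir=".", runner=subprocess.run):
    """Run the commands of one stage in order, stopping at the first failure."""
    for argv in STAGES[name]:
        # check=True makes subprocess.run raise CalledProcessError
        # as soon as a command returns a non-zero exit code.
        runner(argv, cwd=workdir, check=True)
```

A typical session would call run_stage("draft"), wait for the lexicographers to produce derivations-revised.csv, and then call run_stage("final") in the same directory.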

The generated file derivations-ext.csv has the following columns:

id: a unique identifier for the row
lemma vnisp: the Gallo-Sicilian lemma ending the derivation
derivazione: the derivation
tratti disattesi: all the Gallo-Sicilian features which could have affected the etymon, but are not in the derivation
nuovo indice di galloitalicità: the ratio between the total number of Gallo-Sicilian features which could have affected the etymon and the number of those that occurred in the derivation
Afer: `sì` if there is some feature belonging to the category Apheresis, `no` otherwise
Assib: `sì` if there is some feature belonging to the category Assibilation, `no` otherwise
Degem: `sì` if there is some feature belonging to the category Degemination, `no` otherwise
Deretr: `sì` if there is some feature belonging to the category Deretroflexion, `no` otherwise
Dissim: `sì` if there is some feature belonging to the category Dissimilation, `no` otherwise
Ditt: `sì` if there is some feature belonging to the category Diphthongization, `no` otherwise
Leniz: `sì` if there is some feature belonging to the category Lenition, `no` otherwise
Palat: `sì` if there is some feature belonging to the category Palatalization, `no` otherwise
Vocal: `sì` if there is some feature belonging to the category Vocalization, `no` otherwise
microtratto 1: the first feature in the derivation, if any
microtratto 2: the second feature in the derivation, if any
microtratto 3: the third feature in the derivation, if any
microtratto 4: the fourth feature in the derivation, if any
microtratto 5: the fifth feature in the derivation, if any
microtratto 6: the sixth feature in the derivation, if any
microtratto 7: the seventh feature in the derivation, if any
microtratto 8: the eighth feature in the derivation, if any
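
As an illustration of the statistical use of derivations-ext.csv, the following Python sketch counts how many derivations involve each feature category and collects the features of a row in order. It assumes a comma-separated, UTF-8 file whose header row uses exactly the column names listed above; the helper names are hypothetical, and the sample rows are invented placeholders, not data from the ontology.

```python
import csv
import io
import unicodedata
from collections import Counter

CATEGORY_COLUMNS = ["Afer", "Assib", "Degem", "Deretr", "Dissim",
                    "Ditt", "Leniz", "Palat", "Vocal"]
FEATURE_COLUMNS = [f"microtratto {i}" for i in range(1, 9)]


def read_rows(stream, delimiter=","):
    """Read the CSV into a list of dicts keyed by the header names."""
    return list(csv.DictReader(stream, delimiter=delimiter))


def is_yes(value):
    """True for `sì`, whether the accent is stored composed or decomposed."""
    return unicodedata.normalize("NFC", (value or "").strip().lower()) == "sì"


def count_categories(rows):
    """Number of rows flagged `sì` for each category column."""
    counts = Counter()
    for row in rows:
        for column in CATEGORY_COLUMNS:
            if is_yes(row.get(column)):
                counts[column] += 1
    return counts


def feature_sequence(row):
    """Non-empty microtratto values of a row, in column order."""
    values = ((row.get(column) or "").strip() for column in FEATURE_COLUMNS)
    return [value for value in values if value]


# Invented placeholder rows, with only a few of the columns.
SAMPLE = (
    "id,lemma vnisp,Assib,Leniz,microtratto 1,microtratto 2\n"
    "1,lemma_a,sì,sì,feature_x,feature_y\n"
    "2,lemma_b,no,sì,feature_y,\n"
)
rows = read_rows(io.StringIO(SAMPLE))
counts = count_categories(rows)
```

For the real file, the stream would be opened with open("derivations-ext.csv", newline="", encoding="utf-8"); the delimiter parameter can be changed if the file turns out to use a different separator.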