
aama

Afro-Asiatic Morphology Archive

Welcome to AAMA - the Afro-Asiatic Morphology Archive.

Getting Started

Overview

Appendix 1: The Data Schema

Appendix 2: The Data Files

Details
  • 2. How to Install and configure required software

    Although it is anticipated that the AAMA digital application will eventually have a home where its data can be consulted, and to a certain extent manipulated, online, we anticipate that most users will want to download the application and a selection (or all!) of the data and work with them on their own machine. They may also wish to add data of their own, which they might propose uploading to the home site, along with proposals for modifications and additions to the data-manipulation software.

    At the moment, pending the creation of an executable of the `.exe` or `.jar` type, what we can propose, in addition to downloading a choice of the data files of interest, is downloading and running the set of scripts which constitute the application.

    • Note on Git client

      The aama project uses GitHub to store data and tools; you will need a git client in order to download the tools repository and the data repositories you are interested in. Follow the instructions at Set Up Git.

      Note that you do not need to create a GitHub account unless you want to edit the data or code. Instructions for how to do that are below.

    • 2.1 Set up aama directory

      We will assume that the data is placed in a directory called `aama-data` and that the application software is placed in a directory called `webappy`. So create and switch to an aama directory structure on your local drive, e.g.

      ~/ $ mkdir aama-data
      ~/ $ mkdir webappy
      ~/ $ cd webappy
      ~/webappy $ mkdir bin
      ~/webappy $ cd ~/aama-data
      ~/aama-data $ mkdir data
                          

    • 2.2 Install Apache Jena Fuseki

      Fuseki is the SPARQL server we are using to query the dataset. Download the current apache-jena-fuseki-N.N.N distribution (either the zip file or the tar file; NB, make sure your Java JDK is up to date for the version you download) and store it in a convenient location; ~/jena is a good place. The following steps will install the aama dataset and verify that it runs. Further information about Fuseki, as well as information and links about RDF linked data and the SPARQL query language, can be found at the Apache Jena site.

  • 2.3. Download data

    • Take a look at the Aama repositories and decide which languages interest you. In general we use one repository per language or, in some cases, per language variety; e.g., beja-huda is the variety of Beja described by Richard Hudson in . . ., beja-van the variety of Beja described by Vanhove in . . ., etc.

      Now you need to download the data to your local hard drive. Clone each language repository into the ~/aama-data/data directory created above:

      ~/ $ cd aama-data/data
      ~/aama-data/data $ git clone https://github.com/aama/afar.git
      ~/aama-data/data $ git clone https://github.com/aama/geez.git
      ~/aama-data/data $ git clone https://github.com/aama/yemsa.git

      Alternatively, you can create a personal GitHub account, fork the aama repositories (copy them to your account), and then clone your forks to your local drive. See Fork a Repo for details.

      For the normative/persistent data format we are using JSON. This notation has the advantage of being a rigorously defined system of terms (`:term`), strings (`"string"`), vectors (`[a b c d]`), maps (`{a b, c d}`), and sets (`#{a b c d}`), and is thus reliably transformable into a consistent RDF notation, while at the same time providing a human-readable format that is natural for data entry and inspection.

      Our current JSON structure (cf. below), while open to extension and revision, seems to provide a natural notation for the verbal and pronominal inflectional paradigms encountered in Afroasiatic, and perhaps for inflectional paradigms generally.
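As a sketch of how these files can be consumed programmatically, the following Python snippet loads a language file and lists its term-cluster labels. It assumes the Appendix 1 layout with plain-string key names ("lang", "termclusters", "label"); adjust the keys if your files use prefixed forms.

```python
import json

def list_paradigms(path):
    """Return (language name, list of term-cluster labels) for a
    LANG-pdgms.json file.  Key names follow the Appendix 1 outline
    and are assumed to appear as plain strings in the JSON."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    labels = [tc.get("label", "?") for tc in data.get("termclusters", [])]
    return data.get("lang", "?"), labels
```

Run against any of the cloned language files, this returns the language name followed by one label per paradigm.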

    • 2.4. Download Application Code

      From your home directory, clone the aama application repository into ~/webappy:

      ~/ $ git clone https://github.com/aama/webappy.git
      			

      The Python scripts (`.py`) will remain in this directory, while the shell and query scripts (`.sh`, `.q`) should be moved to the `~/webappy/bin` subdirectory.

    When you have finished, your directory structure should look like this (assuming you have cloned afar, geez, and yemsa):

       ~/
       |-aama-data
       |--data
       |---afar: `afar.json`, `afar.ttl`
       |---geez: `geez.json`, `geez.ttl`
       |---yemsa: `yemsa.json`, `yemsa.ttl`
       |-jena
       |--apache-jena-fuseki-N.N.N: `aamaconfig.ttl`, . . .
       |-webappy: `pdgmDict-...py`, `pdgmDisp-...py`
       |--bin: `... .sh`, `... .q`
  • 3. Configuring the Application

  • 3.1 Generate RDF data from morphological data files

    In order to convert the JSON-format data files to TTL ("Turtle" -- a more easily human-readable RDF format), you will use the pdgmDict-json2ttl.py file in the webappy directory. The aama-datastore-update.sh shell script will call aama-ttl2fuseki.sh, which in turn will convert the .ttl file to the RDF/XML needed for uploading to the Fuseki SPARQL service.

    For convenience, an already-generated TTL version is included with each language's JSON file. Since the JSON file is the normative/persistent data format, any corrections or additions you want to make should be made in that file; and if any changes are made to the JSON file, you must then generate new TTL/RDF files to be uploaded to the SPARQL server. In fact, as long as you observe the above structure for JSON files, you can create any number of new language files of your own, transform them to RDF format, and upload them to the SPARQL server for querying.
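To illustrate the kind of transformation involved (the real work is done by pdgmDict-json2ttl.py; the `aama:` prefix and property names below are invented for the example), a minimal JSON-to-Turtle conversion might look like:

```python
def toy_json2ttl(lang, props):
    """Emit Turtle triples for a few language-level properties.
    Illustrative only: the aama: prefix and property names are
    assumptions, not the output of pdgmDict-json2ttl.py."""
    base = "http://oi.uchicago.edu/aama/2013/"  # base URI as seen in the graph listing
    lines = [f"@prefix aama: <{base}> .", ""]
    for prop, value in props.items():
        lines.append(f'aama:{lang} aama:{prop} "{value}" .')
    return "\n".join(lines)

print(toy_json2ttl("afar", {"lang": "Afar"}))
```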

  • 3.2 Upload RDF data to SPARQL service

    In order to upload the RDF files to Fuseki, you must first start the server by running:

     ~/ $ webappy/bin/fuseki.sh
                    
    This script, like those that follow, assumes that the current version of Fuseki (at the moment apache-jena-fuseki-3.16.0) has been placed in the jena directory, and that the file aamaconfig.ttl has been copied to the Fuseki version directory; the scripts should be edited for the correct locations if this is not the case. When the script is run for the first time, you will notice that, following the configuration file aamaconfig.ttl, it creates a (for the moment empty) data sub-directory aama in the jena/apache-jena-fuseki-3.16.0/ directory.

    The following script:

     ~/ $ webappy/bin/aama-datastore-update.sh "aama-data/data/[LANG]"
                    
    will load the relevant LANG-pdgms.ttl file in aama-data/data/[LANG] into the Fuseki server.

    It also automatically runs the queries count-triples.rq ("How many triples are there in the datastore?") and list-graphs.rq ("What are the URIs of the language subgraphs?") from the webappy/bin directory. If the upload has been successful, you will see output such as the following (assuming again that afar, geez, and yemsa are the languages which have been cloned into aama-data/data/).

    Query: bin/fuquery-gen.sh bin/count-triples.rq
    ?sTotal
    33871
    Query: bin/fuquery-gen.sh bin/list-graphs.rq
    ?g
    <http://oi.uchicago.edu/aama/2013/graph/afar>
    <http://oi.uchicago.edu/aama/2013/graph/geez>
    <http://oi.uchicago.edu/aama/2013/graph/yemsa>
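The same queries can be submitted over HTTP from Python. The sketch below assumes a running Fuseki server at localhost:3030 serving the /aama dataset (as configured by aamaconfig.ttl); the query text is a stand-in for bin/count-triples.rq, whose exact wording may differ.

```python
import urllib.parse
import urllib.request

# Stand-in for the count-triples.rq query shipped in webappy/bin.
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?total) WHERE { GRAPH ?g { ?s ?p ?o } }"

def run_query(query, endpoint="http://localhost:3030/aama/query"):
    """POST a SPARQL query to a Fuseki endpoint and return the raw
    JSON-results text (requires the server to be running)."""
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode()
```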
    	  

  • 4. Running the Application

    The SPARQL service can be accessed to explore the morphological data via two interfaces:
    • 4.1 The Apache Jena Fuseki interface

      You can see this in your browser at localhost:3030 after you launch Fuseki. SPARQL queries, for example . . ., can be run directly against the datastore in the Fuseki Control Panel on the localhost:3030/dataset.html page (select the /aama dataset when prompted). The pdgmDisp-... scripts also write to the terminal all SPARQL queries generated in the course of the computation; these queries can be copied and pasted into the Fuseki panel for inspection and debugging.

    • 4.2 An application specifically oriented to AAMA data


      A preliminary menu-driven GUI application will have already been downloaded by following the instructions outlined above in Download data, tools, and application code. This application demonstrates the use of SPARQL query templates for the display and comparison of paradigms and morphosyntactic properties and categories. It is written in Python, which has a very engaged community of users who have created a formidable, and constantly growing, set of libraries. However, essentially the same functionality could be achieved by any software framework which can provide a web interface for handling SPARQL queries submitted to an RDF datastore.

      The application presupposes that the Fuseki AAMA data server has been launched through the invocation of the shell script bin/fuseki.sh.
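A quick way to verify that precondition from Python is to probe Fuseki's /$/ping admin endpoint (the host and port below assume the default localhost:3030):

```python
import urllib.error
import urllib.request

def fuseki_is_up(base="http://localhost:3030", timeout=2.0):
    """Return True if a Fuseki server answers at `base`.
    Any HTTP response counts as 'up'; a connection failure as 'down'."""
    try:
        with urllib.request.urlopen(base + "/$/ping", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False
```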

    • Note: Remote Data and Webapp Update

      AAMA is an ongoing project. Its data is constantly being updated, corrected, and added to, and the accompanying web application is in a process of constant revision. To ensure that your data and web app are up to date, you should periodically run the following shell scripts, which assume that git has been installed and that the data and webapp have been cloned from the master version in the manner outlined above.

      The following script:

       ~/ $ webappy/bin/aama-pulldata.sh aama-data/data/[LANG]
                      
      will update the JSON language data file in the aama-data/data/[LANG] directory.

      While:

       ~/ $ webappy/bin/aama-pulldata.sh "aama-data/data/*"
                      
      will update the JSON language data files in all the aama-data/data/[LANG] directories.

      Once revised (or new) JSON files have been installed, remember to run the appropriate scripts to transform them to ttl format and to load them into the SPARQL server, as outlined above.

      Finally, the script:

       ~/ $ webappy/bin/aama-pullwebappy.sh
                      
      will update the files of the web application.

    • Appendix 1: The Data Schema

      Basic structure:

      In outline, each language JSON file has the following structure (see any of the LANGUAGE-pdgms.json files for a concrete example, and below for an explanation of terms):

      
      {
      |-:lang         "language name"
      |-:sgpref       "string representing 3-character ns prefix used for the
      |                  URI of language-specific morphosyntactic properties
      |                  and values"
      |-:datasource   "bibliographic source(s) for the data in the file"
      |-:geodemoURL   "on-line geo-/demo-graphical information about the language"
      |-:geodemoTXT   "short textual summary of geo-/demo-graphical information"
      |-:schemata  {  "associative map of each morphosyntactic property used
      |                  in the inflectional paradigms with a list of its values"
      |            }
      |-:lexemes   {  "associative map of paradigmatic 'lexemes' with summary map
      |                  of properties -- a rudimentary lexicon of paradigm lexemes"
      |            }
      |-:termclusters [ "label-ordered list of term-clusters/paradigms,
      |                  each of which has the structure:"
      |   {
      |----:label     "descriptive label assigned to the term-cluster at data-entry"
      |----:note
      |----:common    "map of property-value pairs which all members of the
      |                 termcluster have in common"
      |----:terms     "list of lists, the first of which enumerates the
      |                  properties which differentiate individual terms, while
      |                  the others list, in order, the value of the i-th
      |                  property -- in fact, a paradigm for the distinct property-
      |                  value pairs of the lexeme in question"
      |   }
      |
      | . . .
      |
      | ]
      }
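A small Python sanity check against this outline can catch common data-entry slips before conversion to RDF. The key names below follow the outline and are assumed to appear as plain strings in the JSON files; this is a sketch, not a full validator.

```python
REQUIRED_TOP = ["lang", "sgpref", "datasource", "schemata", "lexemes", "termclusters"]

def check_pdgm_data(data):
    """Return a list of problems found in a parsed language file.
    Per the outline, each term-cluster needs a :label, and every row
    of :terms should be as wide as the property header row."""
    problems = [f"missing key: {k}" for k in REQUIRED_TOP if k not in data]
    for i, tc in enumerate(data.get("termclusters", [])):
        if "label" not in tc:
            problems.append(f"termcluster {i} has no label")
        terms = tc.get("terms", [])
        if terms:
            width = len(terms[0])  # header row names the differentiating properties
            for j, row in enumerate(terms[1:], 1):
                if len(row) != width:
                    problems.append(
                        f"termcluster {i}, row {j}: expected {width} values, got {len(row)}"
                    )
    return problems
```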
      
      

      Appendix 2: The Data Files

      At present the following data files are available: