International Summer School on
Language Documentation and Description

School of Oriental and African Studies, London

22 June - 3 July 2009

Data and archiving

This course addresses methodological and practical issues in managing data in documentary linguistics and preparing it for digital archiving. These issues arise even before data is collected. They include deciding what counts as data, how to encode it, how to design and manage it, and how to make it as useful as possible for linguistic and other purposes. They also include the role of metadata and how to ensure that data conforms to good practices for long-term preservation. The course emphasizes the concepts, processes and technologies involved in language archiving.

The course begins with an introduction to data design and management, leading to a comparison of the properties and uses of data-management strategies such as tables, layout, and mark-up systems. Mark-up is then examined in greater depth, particularly XML and the related transformation language XSLT, with hands-on training in the use of XML tools. The course also covers principles of character and document encoding, with special emphasis on the Unicode Standard.

The course concludes with topics specific to archiving. These include the relationship between the strategies and technologies used for language documentation on the one hand and for archiving on the other; how to decide what to include or exclude when selecting materials for archiving; how to structure and annotate a collection; how to ensure that data can be preserved; and how to support the organic growth and community-driven evolution of an archived collection.