The Hans Rausing Endangered Languages Project

Deposit formats

We recommend that all data is as 'portable' as possible:

  • Objects (disks etc.) should be labelled
  • File names should be platform-independent and have correct extensions
  • Files should be in a platform-independent format
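
As an illustration of the file-name recommendation, the following is a minimal sketch of a check that could be run before deposit. The rules it applies (ASCII letters, digits, underscore and hyphen only, an extension present, no Windows reserved device names) are assumptions chosen for this example and are not an official ELAR rule set; the function name `portability_problems` is likewise hypothetical.

```python
import os
import re

# Assumed rule (not ELAR policy): the name before the extension may contain
# only ASCII letters, digits, underscores and hyphens.
SAFE_STEM = re.compile(r"^[A-Za-z0-9_-]+$")

# Device names that Windows reserves regardless of extension.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def portability_problems(filename: str) -> list[str]:
    """Return a list of portability problems found in a bare file name."""
    problems = []
    stem, extension = os.path.splitext(filename)
    if not extension:
        problems.append("missing extension")
    if not SAFE_STEM.match(stem):
        problems.append("characters outside A-Z, a-z, 0-9, '_' and '-'")
    if stem.upper() in WINDOWS_RESERVED:
        problems.append("reserved name on Windows")
    return problems


if __name__ == "__main__":
    for name in ["interview_01.wav", "interview 01.wav", "notes", "con.txt"]:
        print(name, portability_problems(name))
```

The check only looks at the name itself. Whether the extension matches the actual file content still has to be verified separately.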

For more information about data portability, see 'Seven Dimensions of Portability for Language Documentation and Description' (Bird and Simons) or 'Language documentation and archiving, or how to build a better corpus' (Heidi Johnson 2004, in P. Austin (ed.), Language Documentation and Description 2).

If some of your data is not portable (or not yet digital), you will usually be able to convert it to a portable digital format. For case studies in converting materials to portable formats, see:

Preferred formats

We can accept a range of formats, with a preference for the following:

  • sound - WAV
  • image - BMP, TIFF, JPEG. See full advice about images
  • video - MPEG-2
  • text - plain text, with or without markup
  • documents - plain text, PDF or PostScript
  • structured text - XML, other markup (with description of markup system)
  • structured data in commonly available Office formats - ELAR will convert them to archive-suitable formats
  • character encoding:
    • preferred encoding is ASCII or Unicode
    • clearly document any other encodings used, e.g. ISO 8859-5
    • discuss with us if you use font substitution to handle non-Roman characters
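
The encoding advice above can be checked mechanically. The sketch below classifies a file's bytes as ASCII, UTF-8 or something else, and converts text from a documented legacy encoding such as ISO 8859-5. It assumes that 'Unicode' means UTF-8, which this page does not state; a UTF-16 file would be reported as "other". The function names are hypothetical.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'other' for the given bytes.

    Assumption: 'Unicode' is taken to mean UTF-8 for this check.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known, documented encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    legacy = "Привет".encode("iso-8859-5")  # sample Cyrillic text
    print(classify_encoding(b"plain text"))
    print(classify_encoding(legacy))
    print(classify_encoding(to_utf8(legacy, "iso-8859-5")))
```

Conversion is only reliable when the source encoding is known, which is why any encoding other than ASCII or Unicode needs to be documented. Text that relies on font substitution cannot be converted this way, because its bytes do not correspond to the characters displayed.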

Please contact us at archive@hrelp.org if you are unsure whether a particular format is suitable for submission to the archive, or if you are having problems converting your data to a portable format.

This page is not yet complete.