Visuelle Linguistik

Theory and Application of Visualizations in Linguistics

19 to 21 November 2014, Schloss Herrenhausen, Hannover, Germany

Velislava Todorova and Maria Chinkina: "Significance Filter for N-gram Viewer"

MA students in the ISCL program at the University of Tübingen

Slash/A (http://linguistics.chrisculy.net/vistola/tools/slasha.html) is a visualization tool for analyzing trends in language use over time. Given a dated and tokenized corpus, it calculates and plots the frequencies of selected n-grams over the period covered by the corpus, and it can optionally smooth the graph to make the general trend easier to see. The user can specify an n-gram using any of the token-level annotations provided in the corpus. The tool also reads the corpus metadata, so the use of n-grams can be compared across groups of texts (grouped by author, genre, or any other text characteristic).

To support complex research questions that involve statistical analysis, we are currently adding a filtering mechanism to the tool. It marks the periods of time in which the observed values are interesting in one of the following senses:

1. For a single n-gram, one might want to know whether the observed peaks and troughs are significant.
2. For n-grams with n ≥ 2, one might want to know whether the elements of the n-gram co-occur more often than expected by chance, for example to monitor the strength of a collocation or construction and how it changes over time.
3. When comparing the use of the same n-grams by different authors (or genres etc.), the significance of the differences is important.

To present the test results, we decided to visually filter out the periods in which the observed differences in n-gram frequencies are most likely due to chance.
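The third sense (comparing two groups of texts period by period) can be illustrated with a minimal sketch. The abstract does not say which test Slash/A applies; Fisher's exact test on a 2x2 table is assumed here only because the reference list points to Fisher. The function names, the 0.05 threshold, and the counts are all hypothetical and chosen for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def significant_periods(counts_a, counts_b, alpha=0.05):
    """Return the periods in which two groups differ significantly.

    counts_a, counts_b: dict mapping period -> (ngram_count, total_tokens).
    Periods whose difference is plausibly due to chance are filtered out.
    """
    kept = []
    for period in sorted(counts_a.keys() & counts_b.keys()):
        hits_a, total_a = counts_a[period]
        hits_b, total_b = counts_b[period]
        p = fisher_exact_two_sided(
            hits_a, total_a - hits_a, hits_b, total_b - hits_b
        )
        if p < alpha:
            kept.append(period)
    return kept


# Made-up counts: occurrences of one n-gram per decade for two authors.
author_a = {1900: (30, 1000), 1910: (12, 1000)}
author_b = {1900: (5, 1000), 1910: (10, 1000)}
print(significant_periods(author_a, author_b))
```

With these invented counts, only the decade in which the two authors differ strongly survives the filter; the other decade would be hidden in the graph. A real implementation would also need to consider that one test is run per period, which raises the question of correcting for multiple comparisons.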

References

Fisher, R. A.: Statistical Methods for Research Workers. Oliver and Boyd, London (1950)