User guide
After selecting a corpus on the main page, use the search tool in the top right corner of the page (1) to find a word and display its collocations. As you type, matching lemmata are suggested; clicking a lemma opens a view of its collocations.
You can switch between the Polish and English versions of the application at any time (2).
The name of the corpus (in the figure below: korba, the Electronic Corpus of 17th- and 18th-century Polish Texts), the lemma of the currently viewed word (here: król ‘king’), its part of speech (in square brackets, here: [n] for noun) and its raw frequency in the corpus (in parentheses) are displayed in the top left corner of the page.
The word’s collocations are grouped into lists by the type of dependency relation they form. The dependency type and its direction are stated in the header of each list (3; here: an incoming subj dependency edge from the collocate to the main word, i.e. the word is the subject of the collocate). Hover the mouse over the relation name to display a short explanation. Below each collocate, a colored bar (4) shows the logDice value of the collocation, the raw corpus frequency of its occurrences and its relative frequency per 1 million tokens. The tint of the bar’s background corresponds to the collocation strength measured by logDice: the more saturated the color, the stronger the collocation. By default, the top 4 collocations are shown; click the button (7) to expand the list or collapse it again.
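The guide does not spell out how logDice or the per-million figure are computed. The sketch below assumes the standard definition of logDice, 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the frequency of the word and collocate occurring together and f_x, f_y are the frequencies of the main word and the collocate. Which exact counts the application plugs in for f_x and f_y (overall corpus frequencies or frequencies within the given relation) is an assumption here, and the function names `log_dice` and `per_million` are hypothetical.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice under its standard definition: 14 + log2(2*f_xy / (f_x + f_y)).

    f_xy -- frequency of the word and collocate occurring together
    f_x  -- frequency of the main word (which count is used is an assumption)
    f_y  -- frequency of the collocate (which count is used is an assumption)
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def per_million(raw_freq: int, corpus_size: int) -> float:
    """Relative frequency per 1 million tokens."""
    # Multiply first so that exact quotients stay exact in floating point.
    return raw_freq * 1_000_000 / corpus_size


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(log_dice(10, 10, 10))                # 14.0: the two words always co-occur
    print(round(log_dice(1, 1000, 1000), 2))   # 4.03
    print(per_million(50, 2_000_000))          # 25.0
```

Under this definition logDice cannot exceed 14, and it does not depend on the total corpus size, which makes scores comparable across corpora of different sizes.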
Symbols displayed to the right of the collocate (5) indicate subcorpora in which the collocation is more prominent (the absence of symbols means a fairly uniform distribution across all text types in the corpus). Hover the cursor over a symbol to see a tooltip with the subcorpus name together with basic statistics for the collocation calculated within that subcorpus (logDice, raw and per-million frequency).
Clicking the symbol below the collocate (6) opens a panel with example concordances. Hovering over a row shows a tooltip with basic metadata of the corpus text from which the sentence was drawn. To hide the panel, click the hide button (8).
The displayed collocations can be narrowed down by logDice and/or raw frequency using the filter submenu in the top bar. Either or both ends of each range can be specified. For example, setting the minimum logDice (9) to 10 and leaving the maximum unspecified shows only collocations with a strength greater than or equal to 10.
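A minimal sketch of the range filter described above. It assumes both bounds are inclusive (the guide confirms this only for the minimum) and that an unspecified bound is represented as `None`; the `Collocation` record, its fields and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Collocation:
    collocate: str     # hypothetical record layout
    log_dice: float
    raw_freq: int


def in_range(value: float, lo: Optional[float] = None, hi: Optional[float] = None) -> bool:
    """True if value lies within [lo, hi]; a bound of None is left unspecified."""
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True


def filter_collocations(
    colls: List[Collocation],
    min_ld: Optional[float] = None,
    max_ld: Optional[float] = None,
    min_freq: Optional[int] = None,
    max_freq: Optional[int] = None,
) -> List[Collocation]:
    """Keep collocations whose logDice AND raw frequency both fall in their ranges."""
    return [
        c for c in colls
        if in_range(c.log_dice, min_ld, max_ld) and in_range(c.raw_freq, min_freq, max_freq)
    ]


if __name__ == "__main__":
    sample = [  # hypothetical values
        Collocation("collocate_a", 11.2, 40),
        Collocation("collocate_b", 9.7, 120),
        Collocation("collocate_c", 10.0, 5),
    ]
    # Minimum logDice 10, maximum unspecified:
    print([c.collocate for c in filter_collocations(sample, min_ld=10)])
    # ['collocate_a', 'collocate_c']
```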
Whenever collocations are hidden by filters, a blue status bar is displayed above the collocation lists. Click the reset filters button (10) to clear the filters and display all collocations again.
Words can be compared in terms of their collocations via the compare submenu in the top bar. Two comparison modes are possible: between corpora and within a corpus. If the word’s collocations are indexed for at least one other corpus, the list of corpora available for comparison will be displayed (11). Click on one of them to display a comparison between collocations of the same word as extracted from the two corpora (current and selected). Comparison within the same corpus, in turn, is performed between two different words: start typing in the search area (12) to find the word you want to compare against. Click the reset comparison button (13) to leave comparison mode and go back to collocations of the current word.
In comparison mode, a blue status bar is displayed above the collocation lists. For comparison between corpora, it shows the name of the second corpus together with the current word’s raw frequency in that corpus.
For comparison within a corpus, it shows the lemma, part of speech and raw frequency of the second word (here: królowa ‘queen’).
In comparison mode, the collocations are still displayed in lists grouped by dependency relation, but there are several key differences from the basic single-word view. Consider the list of adjunct collocations (where the collocates modify the main word) in the within-corpus comparison between król ‘king’ and królowa ‘queen’. The numeric values under each collocate (14) are no longer plain logDice scores but differences between them. The larger a positive value, the more typical the collocation is of the current word; the more negative the value, the more typical it is of the word being compared against. Values near 0 correspond to collocations equally strong (or weak) for both words. For example, the modifiers pruski ‘Prussian’ and duński ‘Danish’ are the most strongly tied to król rather than królowa and refer to actual or fictitious male monarchs. At the bottom of the list, with the opposite tendency, are anielski ‘angelic/of angels’ and niebo ‘[of] heaven’, used to describe the Virgin Mary in the many religious texts typical of the period.
Instead of subcorpus icons, two symbols are displayed (15), one for each side of the comparison, signaling the presence or absence of the collocation in the corpus (if it is absent, logDice = 0 is assumed for the purpose of calculating the difference). Hover over a symbol to display basic statistics for the corresponding collocation.
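The difference shown in comparison mode can be sketched as follows, assuming it is computed as the current word’s logDice minus the other word’s, with a collocation missing on one side counted as logDice = 0 as described above. Each dictionary maps a collocate lemma to its logDice score; the function names and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def logdice_differences(current: Dict[str, float], other: Dict[str, float]) -> Dict[str, float]:
    """logDice(current word) - logDice(other word) for every collocate seen on either side.

    A collocation absent on one side counts as logDice = 0 there.
    """
    collocates = set(current) | set(other)
    return {c: current.get(c, 0.0) - other.get(c, 0.0) for c in collocates}


def rank(diffs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort from most typical of the current word (top) to most typical of the other (bottom)."""
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    current = {"collocate_x": 9.0, "collocate_y": 5.0}   # hypothetical scores
    other = {"collocate_y": 6.5, "collocate_z": 8.0}
    for collocate, diff in rank(logdice_differences(current, other)):
        print(collocate, diff)
    # collocate_x 9.0
    # collocate_y -1.5
    # collocate_z -8.0
```

Here collocate_x occurs only with the current word and collocate_z only with the other, so each lands at one end of the ranking with its full logDice as the difference.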
By default, the top 3 and bottom 3 collocations of each list are displayed; the list can still be expanded and collapsed with a button.
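A sketch of which rows are visible before a list is expanded: the top 4 in the single-word view and the top 3 plus bottom 3 in comparison mode. It assumes the input is already sorted by score, highest first. How the application handles comparison lists with 6 or fewer entries is not stated; here the whole list is shown. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def collapsed_view(ranked: Sequence[T], comparison: bool) -> List[T]:
    """Rows visible before expanding a list.

    `ranked` must already be sorted by score, highest first.
    Single-word view: top 4. Comparison view: top 3 and bottom 3.
    """
    if not comparison:
        return list(ranked[:4])
    if len(ranked) <= 6:
        # Assumption: a short list is shown whole rather than with overlapping ends.
        return list(ranked)
    return list(ranked[:3]) + list(ranked[-3:])


if __name__ == "__main__":
    scores = [9.0, 6.2, 4.1, 2.0, 0.3, -0.5, -3.3, -7.8]  # hypothetical, sorted
    print(collapsed_view(scores, comparison=False))  # [9.0, 6.2, 4.1, 2.0]
    print(collapsed_view(scores, comparison=True))   # [9.0, 6.2, 4.1, -0.5, -3.3, -7.8]
```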
For collocations attested with both words, the concordance panel shows examples for both. Concordances for the collocation with the second word are always highlighted in blue.
When filtering by logDice in comparison mode, the filter range is applied to the absolute value of the logDice difference. For example, setting the minimum to 8 restricts the lists to collocations with a difference greater than or equal to 8 or less than or equal to -8.
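The absolute-value filter can be sketched as follows. Inclusive bounds and `None` for an unspecified bound are assumptions, and the function name and sample values are hypothetical.

```python
from typing import Dict, Optional


def filter_by_abs_difference(
    diffs: Dict[str, float],
    min_abs: Optional[float] = None,
    max_abs: Optional[float] = None,
) -> Dict[str, float]:
    """Apply a logDice range to |difference| rather than to the signed value."""
    return {
        c: d
        for c, d in diffs.items()
        if (min_abs is None or abs(d) >= min_abs) and (max_abs is None or abs(d) <= max_abs)
    }


if __name__ == "__main__":
    diffs = {"collocate_a": 9.0, "collocate_b": -1.5, "collocate_c": -8.0, "collocate_d": 8.0}
    # Minimum 8 keeps differences >= 8 and <= -8:
    print(sorted(filter_by_abs_difference(diffs, min_abs=8)))
    # ['collocate_a', 'collocate_c', 'collocate_d']
```

Filtering on the absolute value keeps both ends of each comparison list, so the collocations most typical of either word survive together.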
Comparison between corpora works in an analogous way.