Folklore Lab
Data-driven monitoring of corpus annotation coverage, structure, and classification quality.
This page is an analytics lab modeled on workflows used by established international folklore corpora.
| Metric | Value |
|---|---|
| Total texts | 10 |
| KZ+EN coverage | 10 |
| ATU-linked texts | 10 |
| Metadata-linked texts | 10 |
| Citation-ready texts | 10 |
| Documents | 10 |
| Tokens | 120 |
| Unique terms | 115 |
| Hapax legomena (terms used once) | 110 |
| Type-token ratio (TTR) | 95.83% |
| Average document length (tokens) | 12 |
Voyant-style text analytics
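The summary figures follow from simple counts over the tokenized corpus. With the reported values, TTR = 115 / 120 ≈ 0.9583 (95.83%) and average length = 120 / 10 = 12 tokens per document. The sketch below shows one way to compute them; the tokenizer (lowercased Unicode `\w+` words) and the stopword list are assumptions, since the dashboard's actual preprocessing is not documented.

```python
import re
from collections import Counter

# Hypothetical stopword list: the dashboard's real list is not documented.
STOPWORDS = {"a", "an", "and", "as", "at", "her", "of", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase Unicode word tokens with stopwords dropped (assumed scheme)."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def corpus_metrics(docs: list[str]) -> dict[str, float]:
    token_lists = [tokenize(d) for d in docs]
    counts = Counter(tok for toks in token_lists for tok in toks)
    n_tokens = sum(counts.values())
    return {
        "documents": len(docs),
        "tokens": n_tokens,
        "unique_terms": len(counts),                              # types
        "hapax": sum(1 for c in counts.values() if c == 1),      # types used exactly once
        "ttr": len(counts) / n_tokens if n_tokens else 0.0,      # types / tokens
        "avg_doc_length": n_tokens / len(docs) if docs else 0.0, # tokens per document
    }
```

TTR falls as a corpus grows, so a value near 96% on 120 tokens mainly reflects how small the corpus still is; it is not directly comparable with TTRs from larger collections.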
Most frequent terms
- night
- old
- saves
- silver
- storm
- across
- ancestral
- anger
- around
- beard
- becomes
- bell
- body
- border
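The reported counts constrain this list: 120 tokens minus 115 unique terms leaves 5 repeat occurrences, and 115 unique terms minus 110 hapaxes leaves 5 repeated terms, so each repeated term occurs exactly twice. The displayed order (presumably the five twice-used terms, then single-use terms from "across" onward, each group alphabetical) is consistent with sorting by descending count and breaking ties alphabetically. A minimal sketch of that ranking, assuming the tie-break rule inferred above:

```python
from collections import Counter


def top_terms(tokens: list[str], n: int) -> list[tuple[str, int]]:
    """Return the n most frequent terms, ties broken alphabetically."""
    counts = Counter(tokens)
    # Negating the count sorts high counts first; the term itself breaks ties A-Z.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

An explicit sort is used because `Counter.most_common` orders equal counts by first insertion, not alphabetically, which would make the list depend on document order.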
Context snippets (KWIC)
- At night, the kobyz trembles across the lake. Villagers treat the sound as an ancestral message.
- An orphan girl loses her way at night. Moonlight draws a silver trail and leads her out of danger.
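The snippets above show whole sentences. A classic keyword-in-context (KWIC) concordance instead aligns every hit of a keyword between fixed-width left and right contexts, which makes repeated usage patterns easier to scan. A minimal sketch, assuming word-level windows and case-insensitive matching (the `window` parameter is illustrative):

```python
import re


def kwic(docs: list[str], keyword: str, window: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, hit, right context) for each occurrence of keyword."""
    rows = []
    target = keyword.lower()
    for doc in docs:
        words = re.findall(r"\w+", doc)
        for i, word in enumerate(words):
            if word.lower() == target:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                rows.append((left, word, right))
    return rows
```

Run on the two snippets with `keyword="night"` (which appears in both) and `window=3`, this yields `("At", "night", "the kobyz trembles")` and `("her way at", "night", "Moonlight draws a")`. The window crosses sentence boundaries; splitting on sentences first is a reasonable alternative.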
Frequent phrases (bigrams)
Not enough data to render.
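Bigram views usually list only pairs that recur. Since no term in this corpus occurs more than twice, very few adjacent pairs can repeat, which is one plausible reason for the empty view. A minimal sketch, assuming a minimum count of 2 (the dashboard's actual threshold is not documented) and counting pairs within each document only:

```python
from collections import Counter


def frequent_bigrams(
    token_lists: list[list[str]], min_count: int = 2
) -> list[tuple[tuple[str, str], int]]:
    counts: Counter = Counter()
    for toks in token_lists:
        # zip pairs each token with its successor inside one document,
        # so no bigram spans two documents.
        counts.update(zip(toks, toks[1:]))
    return [(bigram, c) for bigram, c in counts.most_common() if c >= min_count]
```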
Analytical views
Term frequency chart
Not enough data to render.
Term trends (per 1000 tokens)
Not enough data to render.
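Normalizing to occurrences per 1,000 tokens makes periods with different amounts of text comparable: rate = 1000 × count / tokens. For example, a term used twice in this 120-token corpus has a rate of 2 / 120 × 1000 ≈ 16.67. The sketch below groups documents by collection decade, mirroring the decade buckets of the timeline view; pairing the two views, and the `(year, tokens)` input shape, are assumptions for illustration.

```python
from collections import defaultdict


def term_trend(docs: list[tuple[int, list[str]]], term: str) -> dict[int, float]:
    """Occurrences of term per 1,000 tokens, keyed by decade (e.g. 1950)."""
    hits: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for year, tokens in docs:
        decade = year // 10 * 10
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(term)
    # Skip decades with no tokens to avoid dividing by zero.
    return {d: 1000 * hits[d] / totals[d] for d in sorted(totals) if totals[d]}
```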
Document length distribution
Not enough data to render.
Collection timeline (decades)
Not enough data to render.
ATU distribution (Top 12)
Not enough data to render.
Genre profile
Not enough data to render.
Regional map
Not enough data to render.
Collector activity
| Collector | Texts | First year | Last year |
|---|---|---|---|
| Айбек Н. | 1 | 1975 | 1975 |
| Әлихан Қ. | 1 | 1958 | 1958 |
| Гүлнар Е. | 1 | 1951 | 1951 |
| Данияр Т. | 1 | 1938 | 1938 |
| Ермек Р. | 1 | 1980 | 1980 |
| Марат С. | 1 | 1962 | 1962 |
| Нұрбек И. | 1 | 1968 | 1968 |
| Рауан Б. | 1 | 1949 | 1949 |
| Сабина Ө. | 1 | 1943 | 1943 |
| Салтанат Ж. | 1 | 1971 | 1971 |
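Each row aggregates a collector's texts with the earliest and latest collection year. The table's order (Айбек, Әлихан, Гүлнар, ...) places Ә directly after А, as in the Kazakh alphabet, whereas Python's default string sort compares code points and would put Ә (U+04D8) after every basic Cyrillic letter. The sketch below uses a hand-written alphabet key to reproduce that order; the record keys `collector` and `year` are hypothetical, and an ICU collator would be the more robust choice in production.

```python
from collections import defaultdict

# Kazakh Cyrillic alphabet in dictionary order.
KAZAKH_ALPHABET = "аәбвгғдеёжзийкқлмнңоөпрстуұүфхһцчшщъыіьэюя"
_RANK = {ch: i for i, ch in enumerate(KAZAKH_ALPHABET)}


def kazakh_key(text: str) -> list[int]:
    # Characters outside the alphabet (spaces, dots, Latin) get negative ranks,
    # so they sort before all Kazakh letters, in code-point order.
    return [_RANK.get(ch, ord(ch) - 0x110000) for ch in text.lower()]


def collector_activity(records: list[dict]) -> list[tuple[str, int, int, int]]:
    """Return (collector, text count, first year, last year), sorted by name."""
    years_by_collector: dict[str, list[int]] = defaultdict(list)
    for rec in records:
        years_by_collector[rec["collector"]].append(rec["year"])
    ordered = sorted(years_by_collector.items(), key=lambda kv: kazakh_key(kv[0]))
    return [(name, len(ys), min(ys), max(ys)) for name, ys in ordered]
```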
Metadata field coverage
| Field | Linked texts |
|---|---|
| Topic (Тақырып) | 10 |
| Performance context (Орындау контексі) | 10 |
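Field coverage counts how many texts carry a usable value for each metadata field. A minimal sketch, assuming that missing, `None`, and whitespace-only values all count as empty; the stored keys for Тақырып and Орындау контексі are not documented, so `topic` and `context` below are hypothetical.

```python
def field_coverage(records: list[dict], fields: list[str]) -> dict[str, int]:
    """Count records whose value for each field is present and non-blank."""

    def filled(value) -> bool:
        # Treat absent keys, None and whitespace-only strings as empty.
        return value is not None and str(value).strip() != ""

    return {f: sum(1 for r in records if filled(r.get(f))) for f in fields}
```

Dividing each count by the total number of texts turns it into a coverage rate; here both fields are filled for all 10 texts.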
Comparative benchmark with external corpora
| Corpus | Reference feature | Our adoption | Status | Source |
|---|---|---|---|---|
| AFT Corpus | Structured tale typing with ATU classes. | ATU distribution and linkage metrics are active. | Implemented | Open Humanities Data |
| SKVR (Finnish Literature Society) | Faceted filtering with export options (XML/CSV). | Faceted corpus exploration is implemented through filters and analytics tables. | Implemented | skvr.fi |
| Kivike (Estonian Literary Museum) | Rich metadata discovery by archive, geography, and person. | Coverage monitoring across passport and metadata layers is implemented. | Implemented | kivike.kirmus.ee |
| Pangloss Collection (CNRS) | Open linguistic audio archives with linked transcriptions. | Next step: integrate audio/ELAN timeline layers. | Next phase | CNRS |
| Meertens FACT | Automatic metadata enrichment and folktale classification. | Next step: automatic ATU/motif suggestion tooling. | Next phase | Meertens Institute |