A corpus of 42 books from European languages embracing four families.
Version 2 2017-09-22, 02:12
Version 1 2017-09-22, 01:56
dataset
posted on 2017-09-22, 02:12 authored by Candelario Hernández Gómez, Rogelio Basurto-Flores, Lev Guzmanv
A corpus of 42 books, three for each of 14 different European languages, taken from www.gutenberg.org. The titles of the works and the names of the authors are written in the romanized form given on that page. The texts were chosen for no reason other than to be representative of each language, avoiding, as much as possible, repetitive texts such as poetry.
Categories
- Knowledge and information management
- Central and Eastern European languages (incl. Russian)
- Comparative language studies
- Comparative and transnational literature
- Computational linguistics
- French language
- German language
- English language
- Italian language
- Latin and classical Greek literature
- Linguistic structures (incl. phonology, morphology and syntax)
- Literature in French
- Literature in German
- Literature in Italian
- Literature in Spanish and Portuguese
- Other European languages
- Other European literature
Keywords
- Books
- European languages
- Information Engineering and Theory
- Central and Eastern European Languages (incl. Russian)
- Comparative Language Studies
- Comparative Literature Studies
- Computational Linguistics
- French Language
- German Language
- English Language
- Italian Language
- Latin and Classical Greek Literature
- Linguistic Structures (incl. Grammar, Phonology, Lexicon, Semantics)
- Literature in French
- Literature in German
- Literature in Italian
- Literature in Spanish and Portuguese
- Other European Languages
- Other European Literature