An extended version of the VANiLLa dataset that adds negative pairs, i.e. questions paired with wrong answers. Dataset1 is created by mixing questions and answers from records with different question relation/entity label combinations. Dataset2 is created from records with the same question relation but different entity labels, which makes Dataset2 "harder" to predict. Both datasets are split into train and test parts.
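The two construction rules described above can be illustrated with a minimal sketch. This is not the authors' actual generation script: the field names (`question`, `answer_sentence`, `question_relation`, `question_entity_label`), the `label` value of 0 for negatives, the random donor selection, and the helper `make_negative_pairs` are all assumptions made for illustration.

```python
import random


def make_negative_pairs(records, same_relation, seed=0):
    """Pair each question with an answer sentence taken from a different record.

    same_relation=False: the donor record has a different
        (relation, entity label) combination (Dataset1-like rule).
    same_relation=True: the donor record has the same relation but a
        different entity label (Dataset2-like, "harder" rule).
    Field names are hypothetical, not confirmed by the collection.
    """
    rng = random.Random(seed)
    pairs = []
    for rec in records:
        key = (rec["question_relation"], rec["question_entity_label"])
        if same_relation:
            donors = [
                d for d in records
                if d["question_relation"] == rec["question_relation"]
                and d["question_entity_label"] != rec["question_entity_label"]
            ]
        else:
            donors = [
                d for d in records
                if (d["question_relation"], d["question_entity_label"]) != key
            ]
        if not donors:
            continue  # no suitable wrong answer exists for this question
        donor = rng.choice(donors)
        pairs.append({
            "question": rec["question"],
            "answer": donor["answer_sentence"],
            "label": 0,  # assumed convention: 0 marks a negative pair
        })
    return pairs


# Toy records, invented purely for this example.
records = [
    {"question": "Who wrote Hamlet?", "answer_sentence": "Shakespeare wrote Hamlet.",
     "question_relation": "author", "question_entity_label": "Hamlet"},
    {"question": "Who wrote Faust?", "answer_sentence": "Goethe wrote Faust.",
     "question_relation": "author", "question_entity_label": "Faust"},
    {"question": "Where was Goethe born?", "answer_sentence": "Goethe was born in Frankfurt.",
     "question_relation": "place of birth", "question_entity_label": "Goethe"},
]

easy_negatives = make_negative_pairs(records, same_relation=False)
hard_negatives = make_negative_pairs(records, same_relation=True)
```

In this toy data the third question has no other record with the same relation, so it yields no "hard" negative. The wrong answers produced under the same-relation rule look superficially plausible (an author sentence for an author question), which is one way to read why Dataset2 is described as harder.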
Article: A Both, A Gashkov, M Eltsova -- Similarity Detection of Natural-Language Questions and Answers using the VANiLLa dataset.
CITE THIS COLLECTION
Gashkov, Alexander (2021). neVANiLLa dataset. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.5263142.v1