EASY

DANS - Data Archiving and Networked Services


SpokenSTS

Cite as:

Merkx, MSc D.G.M. (Radboud University); Frank, dr. S.L. (Radboud University); Ernestus, prof. dr. M.T.C. (Radboud University) (2021): SpokenSTS. DANS. https://doi.org/10.17026/dans-z48-3ev6

Spoken versions of the Semantic Textual Similarity (STS) dataset, for testing semantic sentence-level embeddings.
The dataset contains thousands of sentence pairs whose semantic similarity was annotated by humans. The spoken sentences can be used to test whether a spoken sentence embedding model learns to capture sentence semantics.
All sentences are available in six synthetic WaveNet voices, and a subset (5%) is also available in four real voices recorded in a sound-attenuated booth. Code to train a visually grounded spoken sentence embedding model, together with evaluation code, is available at https://github.com/DannyMerkx/speech2image/tree/Interspeech21
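STS-style benchmarks are commonly scored by computing the cosine similarity between the two sentence embeddings of each pair and correlating those scores with the human similarity ratings, often using Spearman's rank correlation. The following is a minimal standard-library Python sketch of that generic protocol. It assumes you already have one embedding vector per spoken sentence, and the function names (`cosine`, `ranks`, `spearman`, `evaluate_sts`) are hypothetical. It is not the evaluation code from the linked repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate_sts(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    gold_scores: Sequence[float],
) -> float:
    """Correlate model cosine similarities with human similarity ratings.

    pairs: one (embedding_a, embedding_b) tuple per sentence pair (hypothetical format).
    gold_scores: the human similarity rating for each pair, in the same order.
    """
    predicted = [cosine(a, b) for a, b in pairs]
    return spearman(predicted, gold_scores)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real spoken-sentence embeddings.
    toy_pairs = [
        ([1.0, 0.0], [1.0, 0.0]),
        ([1.0, 0.0], [1.0, 1.0]),
        ([1.0, 0.0], [0.0, 1.0]),
    ]
    toy_gold = [5.0, 2.5, 0.0]
    print(evaluate_sts(toy_pairs, toy_gold))
```

For example, three toy pairs whose cosine similarities are ordered the same way as their gold ratings give a Spearman correlation of (approximately) 1.0. Rank correlation is a common choice here because it only asks whether the model orders pairs the way humans do, not whether cosine values match the rating scale.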