Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task

Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee


Although the data-driven approaches of some recent bot-building platforms make it possible for a wide range of users to easily create dialogue systems, those platforms do not offer tools for quickly identifying which log dialogues contain problems. Thus, in this paper, we (1) introduce a new task, log dialogue ranking, where the ranker places problematic dialogues higher; (2) provide a collection of human–bot conversations in the restaurant inquiry task, labelled with dialogue quality, for ranker training and evaluation; (3) present a detailed description of the data collection pipeline, which is entirely based on crowd-sourcing; and (4) report a benchmark result for dialogue ranking, which demonstrates the usability of the data and sets a baseline for future studies.


DOI: 10.21437/Interspeech.2020-1341

Cite as: Xu, X., Zhang, Y., Liden, L., Lee, S. (2020) Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task. Proc. Interspeech 2020, 3920-3924, DOI: 10.21437/Interspeech.2020-1341.


@inproceedings{Xu2020,
  author={Xinnuo Xu and Yizhe Zhang and Lars Liden and Sungjin Lee},
  title={{Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task}},
  year={2020},
  booktitle={Proc. Interspeech 2020},
  pages={3920--3924},
  doi={10.21437/Interspeech.2020-1341},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1341}
}