Training a dialogue generation model requires large-scale, high-quality dialogue data, and data augmentation is a common way to expand a dataset. Common augmentation methods such as back-translation and synonym replacement have been applied to dialogue datasets, but they operate only at the word or sentence level. They do not use the relationships between dialogues to generate new dialogue data, and the meaning of the generated sentences may conflict with the dialogue context. To address this problem, a new data augmentation method is proposed for expanding task-oriented dialogue data. The method alleviates the shortage of training data by searching the original dataset for a candidate dialogue, comparing belief states and dialogue acts, and then replacing the dialogue accordingly. Experimental results on the MultiWOZ dataset show that this method improves the Combined Score by 0.99, 0.24, and 1.5 over character-level, word-level, and sentence-level data augmentation methods, respectively.
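
The replacement idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the experiments: the `Turn` structure, the `turn_key` and `augment` helpers, and the exact-match criterion on belief state and dialogue acts are all assumptions made for illustration, since the abstract does not specify how the comparison is carried out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Turn:
    """Hypothetical turn record; field names are assumptions."""
    user: str
    system: str
    belief_state: Tuple[Tuple[str, str], ...]  # sorted (slot, value) pairs
    dialogue_acts: FrozenSet[str]


Key = Tuple[Tuple[Tuple[str, str], ...], FrozenSet[str]]


def turn_key(turn: Turn) -> Key:
    # Assumption: two turns are interchangeable only when both the
    # belief state and the dialogue acts are identical.
    return (turn.belief_state, turn.dialogue_acts)


def augment(dialogues: List[List[Turn]]) -> List[List[Turn]]:
    """Create new dialogues by swapping in matching turns from other dialogues."""
    # Index every turn in the original dataset by its comparison key.
    index: Dict[Key, List[Tuple[int, Turn]]] = {}
    for d_id, dialogue in enumerate(dialogues):
        for turn in dialogue:
            index.setdefault(turn_key(turn), []).append((d_id, turn))

    new_dialogues: List[List[Turn]] = []
    for d_id, dialogue in enumerate(dialogues):
        for t_id, turn in enumerate(dialogue):
            for other_id, other in index[turn_key(turn)]:
                # Skip turns from the same dialogue and exact text duplicates.
                if other_id == d_id:
                    continue
                if other.user == turn.user and other.system == turn.system:
                    continue
                new_dialogue = list(dialogue)
                new_dialogue[t_id] = other
                new_dialogues.append(new_dialogue)
    return new_dialogues
```

Because the swapped turn shares the belief state and dialogue acts of the turn it replaces, the surrounding context is intended to stay consistent, which is the property that word-level and sentence-level augmentation does not guarantee.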