Title Page
Abstract
Contents
Ⅰ. Introduction 10
Ⅱ. Related Work 12
2.1. Neural Conversation Modeling 12
2.2. Bandit 12
Ⅲ. Preliminaries 14
3.1. Notations 14
3.2. Problem Setting 14
Ⅳ. Stage 1: Subspace Estimation for Generalized Bilinear Bandit 17
Ⅴ. Stage 2: Generalized Low-dimensional Linear Bandit 19
5.1. GBLB to GLB 19
5.2. GLOC 20
5.3. Low-rank GLOC 21
5.4. LowONS-GLM 22
5.5. Overall regret of ESTR-GLM 23
Ⅵ. Experiments 24
6.1. Simulation 24
6.2. Ubuntu Dialogue Corpus 25
Ⅶ. Conclusion 27
References 28
Appendix 33
Figure 1. Example of online model for response selection in chatbot 11
Figure 2. Simulation results for d₁=d₂=16 and r=2 24
Figure 3. Simulation results for d₁=d₂=16 and r=16 25
Figure 4. Experimental results using UDC. We set d₁=d₂=128 and assume rank=96. 26