Research Article
An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention
Table 5
Comprehensive evaluation scores of the four attention models on the WMT17 EN-CS development and test sets.
Column headings give the number of heads/head size.

| Panel | Model | WMT17 subset | 2/256 | 4/128 | 8/64 | 16/32 | 32/16 | 64/8 |
|---|---|---|---|---|---|---|---|---|
| (a) | Multihead attention | dev | 11.69 | 13.96 | 13.76 | 14.14 | 12.10 | 11.98 |
| | | newstest2014 | 12.90 | 15.65 | 14.82 | 15.52 | 13.26 | 12.71 |
| | | newstest2015 | 11.03 | 12.48 | 11.66 | 12.62 | 10.04 | 10.08 |
| | | newstest2016 | 11.98 | 13.85 | 13.09 | 14.32 | 11.16 | 10.85 |
| | | newstest2017 | 10.26 | 12.45 | 11.71 | 12.36 | 9.96 | 10.14 |
| | Multihead attention (head size = n) | dev | 10.14 | 11.87 | 12.99 | 13.35 | 14.36 | 14.46 |
| | | newstest2014 | 11.32 | 13.34 | 14.04 | 14.49 | 15.79 | 15.28 |
| | | newstest2015 | 9.11 | 10.18 | 11.55 | 11.67 | 12.69 | 12.05 |
| | | newstest2016 | 10.23 | 11.86 | 12.51 | 12.62 | 13.73 | 13.83 |
| | | newstest2017 | 8.99 | 10.43 | 11.30 | 11.23 | 12.54 | 12.47 |
| | Talking-head attention | dev | 9.80 | 10.06 | 11.77 | 12.06 | 12.75 | 12.68 |
| | | newstest2014 | 10.90 | 10.65 | 11.65 | 12.33 | 12.94 | 12.34 |
| | | newstest2015 | 8.17 | 8.54 | 10.06 | 11.70 | 11.51 | 11.57 |
| | | newstest2016 | 9.11 | 9.38 | 10.50 | 11.25 | 11.78 | 11.73 |
| | | newstest2017 | 8.50 | 8.63 | 9.92 | 11.05 | 11.52 | 11.83 |
| | Interacting-head attention (ours) | dev | 17.01 | 17.76 | 17.93 | 18.01 (+3.87) | - | - |
| | | newstest2014 | 18.71 | 19.48 | 20.01 | 20.14 (+4.62) | - | - |
| | | newstest2015 | 15.52 | 16.49 | 16.38 | 16.40 (+3.78) | - | - |
| | | newstest2016 | 17.20 | 18.37 | 18.29 | 18.74 (+4.42) | - | - |
| | | newstest2017 | 14.66 | 15.66 | 15.79 | 15.78 (+3.42) | - | - |
| (b) | Multihead attention | dev | 15.13 | 12.33 | 15.27 | 10.30 | 29.60 | 16.80 |
| | | newstest2014 | 13.95 | 11.66 | 14.15 | 8.62 | 25.57 | 13.85 |
| | | newstest2015 | 23.68 | 17.96 | 23.34 | 17.17 | 39.53 | 24.06 |
| | | newstest2016 | 18.44 | 14.90 | 19.31 | 12.70 | 35.28 | 21.31 |
| | | newstest2017 | 20.70 | 18.20 | 21.06 | 16.21 | 36.44 | 21.63 |
| | Multihead attention (head size = n) | dev | 17.30 | 15.53 | 13.43 | 12.43 | 10.83 | 14.37 |
| | | newstest2014 | 15.02 | 14.35 | 12.12 | 11.12 | 9.19 | 8.56 |
| | | newstest2015 | 27.60 | 23.87 | 20.22 | 18.71 | 16.79 | 16.91 |
| | | newstest2016 | 21.34 | 20.34 | 18.24 | 16.24 | 13.80 | 14.37 |
| | | newstest2017 | 24.23 | 21.56 | 20.07 | 19.27 | 15.91 | 18.66 |
| | Talking-head attention | dev | 28.37 | 43.43 | 27.00 | 34.33 | 27.57 | 27.51 |
| | | newstest2014 | 21.25 | 33.70 | 28.94 | 34.97 | 25.86 | 26.22 |
| | | newstest2015 | 38.03 | 55.35 | 37.95 | 47.21 | 42.48 | 46.87 |
| | | newstest2016 | 30.31 | 45.82 | 34.44 | 44.11 | 36.35 | 36.83 |
| | | newstest2017 | 31.01 | 47.75 | 35.77 | 41.63 | 35.86 | 35.76 |
| | Interacting-head attention (ours) | dev | 7.63 | 7.27 | 8.03 | 7.25 (3.05) | - | - |
| | | newstest2014 | 6.99 | 6.43 | 7.16 | 6.21 (2.41) | - | - |
| | | newstest2015 | 12.99 | 12.09 | 11.94 | 12.08 (5.09) | - | - |
| | | newstest2016 | 10.14 | 9.80 | 9.70 | 9.83 (2.87) | - | - |
| | | newstest2017 | 12.48 | 12.95 | 11.71 | 12.24 (3.97) | - | - |
| (c) | Multihead attention | dev | 17.37 | 19.52 | 19.16 | 19.55 | 18.85 | 17.92 |
| | | newstest2014 | 18.41 | 20.82 | 20.42 | 20.80 | 19.92 | 18.93 |
| | | newstest2015 | 16.69 | 18.90 | 18.51 | 18.77 | 18.17 | 17.32 |
| | | newstest2016 | 17.21 | 19.59 | 19.05 | 19.36 | 18.44 | 17.45 |
| | | newstest2017 | 15.65 | 17.96 | 17.51 | 17.85 | 17.13 | 16.26 |
| | Multihead attention (head size = n) | dev | 16.09 | 17.62 | 18.65 | 18.91 | 19.68 | 18.10 |
| | | newstest2014 | 17.04 | 18.82 | 19.69 | 20.09 | 20.88 | 20.06 |
| | | newstest2015 | 15.30 | 16.68 | 18.08 | 18.16 | 19.05 | 18.37 |
| | | newstest2016 | 15.85 | 17.42 | 18.41 | 18.62 | 19.39 | 18.92 |
| | | newstest2017 | 14.70 | 16.00 | 16.99 | 17.10 | 19.99 | 17.42 |
| | Talking-head attention | dev | 15.93 | 15.82 | 16.05 | 16.85 | 16.48 | 16.72 |
| | | newstest2014 | 16.86 | 16.58 | 16.33 | 17.04 | 16.76 | 16.76 |
| | | newstest2015 | 15.00 | 14.74 | 14.97 | 15.75 | 15.68 | 16.12 |
| | | newstest2016 | 15.59 | 15.08 | 15.07 | 15.87 | 15.50 | 15.85 |
| | | newstest2017 | 14.33 | 14.01 | 14.11 | 15.15 | 14.87 | 15.62 |
| | Interacting-head attention (ours) | dev | 21.76 | 22.43 | 25.90 | 28.77 (+9.22) | - | - |
| | | newstest2014 | 23.30 | 24.28 | 28.07 | 30.62 (+9.82) | - | - |
| | | newstest2015 | 21.08 | 21.98 | 25.28 | 27.86 (+9.09) | - | - |
| | | newstest2016 | 21.99 | 22.95 | 21.98 | 22.57 (+3.21) | - | - |
| | | newstest2017 | 19.81 | 20.55 | 19.72 | 20.64 (+2.79) | - | - |
| (d) | Multihead attention | dev | 39.80 | 43.15 | 42.51 | 43.27 | 41.93 | 40.36 |
| | | newstest2014 | 41.25 | 44.98 | 44.22 | 45.14 | 43.31 | 42.08 |
| | | newstest2015 | 39.04 | 42.13 | 41.42 | 42.20 | 40.74 | 39.46 |
| | | newstest2016 | 39.00 | 42.82 | 41.81 | 42.82 | 40.41 | 39.08 |
| | | newstest2017 | 37.29 | 40.34 | 39.80 | 40.46 | 38.83 | 37.75 |
| | Multihead attention (head size = n) | dev | 37.50 | 40.03 | 41.71 | 42.17 | 43.36 | 43.13 |
| | | newstest2014 | 38.90 | 41.91 | 43.11 | 43.66 | 44.95 | 44.47 |
| | | newstest2015 | 36.23 | 38.51 | 40.99 | 40.83 | 42.40 | 41.67 |
| | | newstest2016 | 36.87 | 39.16 | 40.81 | 40.99 | 42.41 | 42.19 |
| | | newstest2017 | 35.11 | 37.52 | 38.95 | 38.94 | 40.59 | 40.38 |
| | Talking-head attention | dev | 36.01 | 37.56 | 38.58 | 40.74 | 41.37 | 40.95 |
| | | newstest2014 | 37.76 | 39.21 | 38.36 | 40.42 | 41.46 | 41.27 |
| | | newstest2015 | 34.23 | 35.22 | 36.80 | 39.02 | 40.06 | 40.81 |
| | | newstest2016 | 35.10 | 36.06 | 36.42 | 38.53 | 39.47 | 39.95 |
| | | newstest2017 | 33.50 | 34.71 | 35.46 | 37.76 | 38.62 | 38.41 |
| | Interacting-head attention (ours) | dev | 46.07 | 45.90 | 46.87 | 47.08 (+3.81) | - | - |
| | | newstest2014 | 48.17 | 48.07 | 49.13 | 49.96 (+4.82) | - | - |
| | | newstest2015 | 45.22 | 45.28 | 46.01 | 46.44 (+4.24) | - | - |
| | | newstest2016 | 45.87 | 45.94 | 46.87 | 47.24 (+4.42) | - | - |
| | | newstest2017 | 42.83 | 42.93 | 43.65 | 44.12 (+3.66) | - | - |
| (e) | Multihead attention | dev | 1.42 | 1.38 | 1.37 | 1.40 | 1.32 | 1.20 |
| | | newstest2014 | 1.24 | 1.53 | 1.47 | 1.56 | 1.44 | 1.30 |
| | | newstest2015 | 1.09 | 1.30 | 1.27 | 1.32 | 1.24 | 1.11 |
| | | newstest2016 | 1.10 | 1.37 | 1.30 | 1.37 | 1.23 | 1.11 |
| | | newstest2017 | 1.02 | 1.22 | 1.20 | 1.25 | 1.44 | 1.06 |
| | Multihead attention (head size = n) | dev | 1.01 | 1.18 | 1.29 | 1.32 | 1.42 | 1.44 |
| | | newstest2014 | 1.07 | 1.29 | 1.39 | 1.44 | 1.53 | 1.53 |
| | | newstest2015 | 0.91 | 1.06 | 1.22 | 1.22 | 1.32 | 1.31 |
| | | newstest2016 | 0.96 | 1.12 | 1.23 | 1.27 | 1.35 | 1.37 |
| | | newstest2017 | 0.88 | 1.03 | 1.14 | 1.35 | 1.25 | 1.28 |
| | Talking-head attention | dev | 0.90 | 0.74 | 0.92 | 0.80 | 0.79 | |
| | | newstest2014 | 0.99 | 0.81 | 0.92 | 0.79 | 0.78 | 0.34 |
| | | newstest2015 | 0.79 | 0.63 | 0.90 | 0.70 | 0.70 | 0.32 |
| | | newstest2016 | 0.83 | 0.67 | 0.78 | 0.69 | 0.69 | 0.30 |
| | | newstest2017 | 0.78 | 0.61 | 0.77 | 0.68 | 0.67 | 0.33 |
| | Interacting-head attention (ours) | dev | 1.63 | 1.74 (+0.36) | 1.64 | - | - | - |
| | | newstest2014 | 1.79 | 1.97 (+0.44) | 1.80 | - | - | - |
| | | newstest2015 | 1.55 | 1.65 (+0.35) | 1.56 | - | - | - |
| | | newstest2016 | 1.62 | 1.75 (+0.38) | 1.64 | - | - | - |
| | | newstest2017 | 1.43 | 1.55 (+0.33) | 1.47 | - | - | - |
| (f) | Multihead attention | dev | 28.64 | 32.37 | 31.74 | 32.12 | 31.41 | 29.07 |
| | | newstest2014 | 30.37 | 34.58 | 33.72 | 34.47 | 33.02 | 31.14 |
| | | newstest2015 | 27.67 | 31.15 | 30.47 | 30.93 | 30.13 | 28.24 |
| | | newstest2016 | 28.10 | 32.11 | 31.22 | 31.89 | 30.07 | 28.22 |
| | | newstest2017 | 26.20 | 29.89 | 29.24 | 29.54 | 28.53 | 26.80 |
| | Multihead attention (head size = n) | dev | 26.65 | 28.94 | 30.62 | 31.06 | 32.26 | 32.57 |
| | | newstest2014 | 28.00 | 31.21 | 32.42 | 32.99 | 34.23 | 34.23 |
| | | newstest2015 | 25.02 | 27.25 | 29.69 | 29.68 | 31.07 | 31.05 |
| | | newstest2016 | 25.80 | 28.29 | 30.02 | 30.27 | 31.50 | 31.75 |
| | | newstest2017 | 24.27 | 26.59 | 28.11 | 28.11 | 29.76 | 30.05 |
| | Talking-head attention | dev | 25.19 | 27.48 | 29.82 | 30.23 | 31.50 | 31.47 |
| | | newstest2014 | 27.12 | 29.28 | 29.94 | 30.20 | 31.75 | 31.82 |
| | | newstest2015 | 23.60 | 25.37 | 27.98 | 28.57 | 30.17 | 30.55 |
| | | newstest2016 | 24.51 | 26.37 | 28.09 | 28.39 | 29.68 | 29.64 |
| | | newstest2017 | 22.79 | 24.89 | 26.90 | 27.37 | 28.81 | 28.88 |
| | Interacting-head attention (ours) | dev | 35.34 | 35.09 | 36.50 | 36.51 (+4.14) | - | - |
| | | newstest2014 | 37.77 | 37.59 | 38.75 | 39.83 (+5.25) | - | - |
| | | newstest2015 | 34.32 | 34.25 | 35.17 | 35.48 (+3.97) | - | - |
| | | newstest2016 | 35.28 | 35.39 | 35.68 | 36.94 (+4.83) | - | - |
| | | newstest2017 | 32.40 | 32.58 | 34.81 | 33.89 (+4.00) | - | - |
Note: The metrics reported in each panel are (a) BLEU, (b) WER, (c) METEOR, (d) ROUGE_L, (e) CIDEr, and (f) YiSi.
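The parenthesized figures in the interacting-head rows appear to be absolute differences from multihead attention in the same heads/head-size column; in panel (b), where lower scores are better, the difference is taken the other way (baseline minus ours). The Python sketch below recomputes the gains for panels (a) and (b) under that assumption, which is our reading of the table rather than a procedure stated by the authors. The dictionary and function names (`PANEL_A_16_32`, `PANEL_B_16_32`, `gain`, `check`) are hypothetical.

```python
"""Recompute the parenthesized gains for interacting-head attention in Table 5.

Assumption (inferred from the table, not stated by the authors): each gain is the
absolute difference from multihead attention in the same heads/head-size column.
For a lower-is-better metric such as panel (b), the gain is baseline minus ours.
"""

# Panel (a), column 16/32: subset -> (multihead, interacting-head, reported gain)
PANEL_A_16_32 = {
    "dev": (14.14, 18.01, 3.87),
    "newstest2014": (15.52, 20.14, 4.62),
    "newstest2015": (12.62, 16.40, 3.78),
    "newstest2016": (14.32, 18.74, 4.42),
    "newstest2017": (12.36, 15.78, 3.42),
}

# Panel (b), column 16/32: same layout, but lower scores are better
PANEL_B_16_32 = {
    "dev": (10.30, 7.25, 3.05),
    "newstest2014": (8.62, 6.21, 2.41),
    "newstest2015": (17.17, 12.08, 5.09),
    "newstest2016": (12.70, 9.83, 2.87),
    "newstest2017": (16.21, 12.24, 3.97),
}


def gain(baseline: float, ours: float, higher_is_better: bool = True) -> float:
    """Absolute improvement of `ours` over `baseline`, rounded to two decimals."""
    diff = ours - baseline if higher_is_better else baseline - ours
    return round(diff, 2)


def check(panel: dict, higher_is_better: bool) -> dict:
    """Map each subset to (recomputed gain, reported gain)."""
    return {
        subset: (gain(base, ours, higher_is_better), reported)
        for subset, (base, ours, reported) in panel.items()
    }


if __name__ == "__main__":
    for name, panel, hib in (("(a)", PANEL_A_16_32, True), ("(b)", PANEL_B_16_32, False)):
        for subset, (computed, reported) in check(panel, hib).items():
            print(f"{name} {subset}: computed {computed:+.2f}, reported {reported:.2f}")
```

The same check can be applied to the other panels by swapping in their rows; in panel (e), the interacting-head gains are reported in the 4/128 column rather than 16/32.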