I think you are overfitting your training data; you need to regularize the network, for example by adding some noise to it.
1) you can add dataset augmentation before processing (skew, rotation, scale, color change, etc.)
2) you can add regularization, such as L1/L2 weight regularization or dropout.
3) you can use early stopping: stop training once the loss on a held-out validation set starts to degrade (ideally you keep the test set for the final evaluation only).
4) Batch Normalization also helps, because normalizing with per-mini-batch statistics adds a small amount of noise to the network.
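
To make points 2 and 3 a bit more concrete, here is a minimal sketch in plain Python, not tied to any framework. The function names `dropout` and `early_stopping_epoch`, the `patience` parameter, and the loss values are all made up for illustration. In practice you would most likely use your framework's built-in dropout layer and early-stopping callback instead.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` and
    rescale the survivors so the expected value stays the same."""
    if not training or rate == 0.0:
        # At evaluation time dropout is switched off.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch with the best validation loss,
    stopping the scan once the loss has failed to improve for
    `patience` epochs in a row."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical validation losses: improving at first, then degrading
# as the network starts to overfit.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.61, 0.65, 0.70, 0.80]
best = early_stopping_epoch(val_losses, patience=3)

# Dropout applied to ten activations of 1.0 with a 50% drop rate.
rng = random.Random(0)
dropped = dropout([1.0] * 10, rate=0.5, rng=rng)
```

With these made-up losses the scan stops after three epochs without improvement, and you would restore the weights saved at epoch `best`. The dropout output contains only zeros and rescaled values, which is the kind of noise that makes it harder for the network to memorize the training set.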