GEC-t2t

Grammar Error Correction Based on Tensor2Tensor
A temporary project built at DeeCamp.

Train

The overall training procedure consists of two stages: pretraining and finetuning.

  1. Subword-nmt
    The model expects input in BPE format, so segment all corpora with subword-nmt first (see the first sketch after this list).

  2. Pretrain
    To improve performance on this seq2seq task, the model is first pretrained on a large corpus of native (well-formed) text. Source sentences are generated by adding noise to the native corpus; the denoising method follows https://github.com/zhawe01/fairseq-gec. The number of pretraining steps depends on the size of the native corpus and the batch-size parameter, and should cover at least one epoch of the corpus (see the second sketch after this list).
    Tip: in Tensor2Tensor, batch size is measured in tokens, not sentences.

  3. Finetune
    After pretraining, the model should be fine-tuned on a GEC corpus such as CoNLL-2014 (see the third sketch after this list).
    The number of finetuning steps depends on the loss and on performance on your task.
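A minimal sketch of step 1, using the standard subword-nmt CLI; the file names and merge count are placeholders:

```bash
# Learn a BPE model on the training text, then segment every corpus
# (native text for pretraining, GEC source/target for finetuning).
subword-nmt learn-bpe -s 32000 < native.txt > bpe.codes
subword-nmt apply-bpe -c bpe.codes < native.txt > native.bpe
subword-nmt apply-bpe -c bpe.codes < gec.src > gec.src.bpe
subword-nmt apply-bpe -c bpe.codes < gec.trg > gec.trg.bpe
```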
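A minimal sketch of step 2 with the standard Tensor2Tensor CLIs, assuming a hypothetical problem registration named `grammar_error_correction`; directories, the step count, and hyperparameters are placeholders:

```bash
# Generate TFRecords from the denoised native corpus.
t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR \
  --problem=grammar_error_correction   # hypothetical problem name

# Pretrain. batch_size is counted in tokens; choose train_steps so that
# train_steps * batch_size covers at least one epoch of the native corpus.
t2t-trainer --data_dir=$DATA_DIR --problem=grammar_error_correction \
  --model=transformer --hparams_set=transformer_base \
  --hparams='batch_size=4096' \
  --output_dir=$PRETRAIN_DIR --train_steps=500000
```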
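A minimal sketch of step 3; one common way to warm-start is to reuse the pretrained checkpoint directory, since t2t-trainer resumes from the latest checkpoint found in --output_dir. The extra step count is a placeholder:

```bash
# Copy the pretrained checkpoints, then continue training on the GEC data.
cp -r $PRETRAIN_DIR $FINETUNE_DIR
t2t-trainer --data_dir=$GEC_DATA_DIR --problem=grammar_error_correction \
  --model=transformer --hparams_set=transformer_base \
  --hparams='batch_size=4096' \
  --output_dir=$FINETUNE_DIR --train_steps=550000   # 50k steps beyond pretraining
```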

Test

For inference, we serve the trained model with TensorFlow Serving running in Docker.
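A minimal sketch of exporting and serving, assuming the same hypothetical problem name, the standard t2t-exporter and t2t-query-server CLIs, and the stock tensorflow/serving Docker image; the model name and ports are placeholders:

```bash
# Export the finetuned model as a SavedModel for TensorFlow Serving.
t2t-exporter --data_dir=$GEC_DATA_DIR --problem=grammar_error_correction \
  --model=transformer --hparams_set=transformer_base \
  --output_dir=$FINETUNE_DIR

# Serve the exported model with TensorFlow Serving in Docker.
docker run -p 8500:8500 -p 8501:8501 \
  --mount type=bind,source=$FINETUNE_DIR/export,target=/models/gec \
  -e MODEL_NAME=gec -t tensorflow/serving

# Query the server with Tensor2Tensor's simple gRPC client.
t2t-query-server --server=localhost:8500 --servable_name=gec \
  --problem=grammar_error_correction --data_dir=$GEC_DATA_DIR
```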

Reference

  * Subword-nmt: https://github.com/rsennrich/subword-nmt
  * Tensor2Tensor: https://github.com/tensorflow/tensor2tensor
