python - Converting untagged corpora to tagged (NLTK)
I have plaintext corpora that I want to tag and save, so that I can use them further. What's the best way to do this?
I have a tagger made, but I can't figure out a way to change the corpora that isn't messy.
Take a look at other tagged corpora, such as brown, and output some examples. That will give you an idea of what a tagged corpus should look like. Next, load your corpus (with PlaintextCorpusReader) and iterate over the sentences, tagging each sentence. Write each tagged sentence to a file by making a string out of the tagged sentence, as in ' '.join([tuple2str(t) for t in tagged_sent]) (after from nltk.tag.util import tuple2str).
And it's OK if your code is "messy", as long as it does the job correctly. You're not looking for an elegant algorithm here; you're running a specific script to create a custom corpus.
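The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names write_tagged_corpus, format_tagged_sentence, and toy_tagger are hypothetical, and toy_tagger is only a stand-in for your own tagger. The sketch formats each pair by hand so that it runs without NLTK installed; for string tags this should give the same word/tag layout as tuple2str with its default separator.

```python
from pathlib import Path


def format_tagged_sentence(tagged_sent, sep="/"):
    """Join (word, tag) pairs into one 'word/tag word/tag ...' string."""
    # Assumption: every tag is a plain string that does not contain `sep`.
    return " ".join(f"{word}{sep}{tag}" for word, tag in tagged_sent)


def write_tagged_corpus(sentences, tag_sentence, out_path):
    """Tag every sentence and write one tagged sentence per line.

    sentences    -- iterable of token lists, e.g. the result of
                    PlaintextCorpusReader(root, fileids).sents()
    tag_sentence -- callable mapping a token list to (word, tag) pairs,
                    e.g. the .tag method of your own tagger
    Returns the number of sentences written.
    """
    count = 0
    with Path(out_path).open("w", encoding="utf-8") as out:
        for tokens in sentences:
            out.write(format_tagged_sentence(tag_sentence(tokens)) + "\n")
            count += 1
    return count


def toy_tagger(tokens):
    """Hypothetical stand-in tagger: capitalized tokens get NNP, others NN."""
    return [(tok, "NNP" if tok[:1].isupper() else "NN") for tok in tokens]


if __name__ == "__main__":
    demo_sentences = [["Alice", "sleeps"], ["dogs", "bark"]]
    written = write_tagged_corpus(demo_sentences, toy_tagger, "tagged_demo.txt")
    print(f"wrote {written} tagged sentences")
```

With NLTK, you would pass the sentences from your PlaintextCorpusReader and your tagger's tag method instead of the demo values. A file in this word/tag layout should then be loadable with NLTK's TaggedCorpusReader, which uses "/" as its default separator.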