Quotefix mac
7/29/2023

You can modify Prannoy's script parser.py to work with UDPipe's output instead of SyntaxNet's output (the two are very similar). The old version also takes in output from the separate lemmatizer, but this is no longer necessary because lemma information is already present in UDPipe's output. You can find a Russian file attached to this website.

Ideally, we keep all columns found in UDPipe's output. Even if some are not used for a specific language, they may be used in another language. In this step, language-specific adjustments may be necessary for languages that feature contractions, cliticization, or similar phenomena. For instance, when parsing German text, UDPipe outputs the word "am" but also the separate forms "an dem". We still have to decide how to cope with such cases.

As you see, most of this is relatively straightforward. The only aspect that may appear strange is that we create two versions, one with and one without timestamps, and then merge them at the end. The reason for this is that UDPipe (and possibly also the pragmatic segmenter) cannot deal with XML-style annotations.
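To make the column-keeping and the German "am" / "an dem" case concrete, here is a minimal Python sketch. It is not Prannoy's parser.py. It assumes only that UDPipe's output is in the standard ten-column CoNLL-U format, where a contraction appears as a range line (for example `3-4 am`) followed by its parts. The function names `parse_conllu` and `attach_surface_forms`, the extra `surface` key, and the sample sentence are all hypothetical and written by hand for illustration. Copying the surface form onto each part is just one possible policy, since how to handle these cases is still undecided.

```python
# The ten standard CoNLL-U columns; all of them are kept for every row.
CONLLU_COLUMNS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text):
    """Split CoNLL-U text into sentences; each sentence is a list of row dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (e.g. "# text = ...") are skipped in this sketch.
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLU_COLUMNS):
            raise ValueError("expected 10 tab-separated columns: %r" % line)
        current.append(dict(zip(CONLLU_COLUMNS, fields)))
    if current:
        sentences.append(current)
    return sentences


def attach_surface_forms(sentence):
    """Drop range lines such as "3-4 am" and record their form on each part.

    Hypothetical policy: every syntactic word gets a "surface" key holding
    the token as it appeared in the text ("am" for both "an" and "dem").
    """
    words = []
    span_form, span_end = None, 0
    for row in sentence:
        if "-" in row["id"]:
            # Multiword token line: remember its form and where the range ends.
            span_form = row["form"]
            span_end = int(row["id"].split("-")[1])
            continue
        if "." in row["id"]:
            # Empty nodes (IDs like "5.1") are ignored in this sketch.
            continue
        word = dict(row)
        in_span = span_form is not None and int(row["id"]) <= span_end
        word["surface"] = span_form if in_span else row["form"]
        words.append(word)
    return words


# Hand-written sample in CoNLL-U layout (not actual UDPipe output).
_ROWS = [
    "1 Er er PRON _ _ 2 nsubj _ _",
    "2 wartet warten VERB _ _ 0 root _ _",
    "3-4 am _ _ _ _ _ _ _ _",
    "3 an an ADP _ _ 5 case _ _",
    "4 dem der DET _ _ 5 det _ _",
    "5 Bahnhof Bahnhof NOUN _ _ 2 obl _ SpaceAfter=No",
    "6 . . PUNCT _ _ 2 punct _ _",
]
SAMPLE = "# text = Er wartet am Bahnhof.\n" + "\n".join(
    "\t".join(row.split()) for row in _ROWS
) + "\n"

for word in attach_surface_forms(parse_conllu(SAMPLE)[0]):
    print(word["id"], word["surface"], word["form"], word["lemma"], word["upos"])
```

Because each row stays a dictionary with all ten columns, fields that one language does not need remain available for another, and a different contraction policy only requires changing `attach_surface_forms`.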