– How do we design a new language resource?
– Converting data between formats
– How should the data be documented?
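– Content coverage: dialects, speakers, materials
– File structure
– Includes phonetic and orthographic annotation layers
– Balanced across multiple dimensions of variation, for coverage of dialect regions and diphones
– A sharp division between the original speech event, captured as an audio recording, and the annotations of that event
– Hierarchical structure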
– Lexicon
– Text
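– The design unfolds gradually in the course of exploration
– Experimental research
– A reference corpus for a particular language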
– Kappa coefficient
– windowdiff scorer
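– Word tokenization
– Sentence segmentation
– Paragraph segmentation
– Part of speech
– Syntactic structure
– Shallow semantics
– Dialogue and discourse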
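The two quality-control measures listed above (the kappa coefficient and the windowdiff scorer) can be sketched in plain Python. This is a from-scratch illustration, not NLTK's own code: the function names `cohen_kappa` and `windowdiff` and the sample label lists are assumptions made for the example. Kappa here is Cohen's two-annotator version, and windowdiff follows the usual definition: slide a window of width `k` over two segmentations and count the windows where the number of boundaries differs.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def windowdiff(seg1, seg2, k, boundary="1"):
    """Fraction of width-k windows whose boundary counts differ."""
    if len(seg1) != len(seg2):
        raise ValueError("segmentations must have equal length")
    if k > len(seg1):
        raise ValueError("window width k must not exceed the segmentation length")
    windows = len(seg1) - k + 1
    mismatches = sum(
        seg1[i:i + k].count(boundary) != seg2[i:i + k].count(boundary)
        for i in range(windows)
    )
    return mismatches / windows


# Two annotators tagging four tokens (hypothetical data).
annotator_a = ["N", "N", "V", "V"]
annotator_b = ["N", "V", "V", "V"]
print(cohen_kappa(annotator_a, annotator_b))

# Two segmentations of the same 23-token text; "1" marks a boundary.
s1 = "00000010000000001000000"
s2 = "00000001000000010000000"
print(windowdiff(s1, s2, 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; a windowdiff of 0 means the two segmentations are identical, and larger values mean they diverge more.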
<entry> <headword>whale</headword> <pos>noun</pos> <gloss>any of the larger cetacean mammals having a streamlined body and breathing through a blowhole on the head</gloss> </entry>
<entry> <headword>whale</headword> <pos>noun</pos> <sense> <gloss>any of the larger cetacean mammals having a streamlined body and breathing through a blowhole on the head</gloss> <synset>whale.n.02</synset> </sense> <sense> <gloss>a very large person; impressive in size or qualities</gloss> <synset>giant.n.04</synset> </sense> </entry>
<entry> <headword>whale</headword> <pos>noun</pos> <gloss synset="whale.n.02">any of the larger cetacean mammals having a streamlined body and breathing through a blowhole on the head</gloss> <gloss synset="giant.n.04">a very large person; impressive in size or qualities</gloss> </entry>
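The entries above encode the same lexical information in different shapes: one with a bare gloss, one with nested `<sense>` elements, and one with a `synset` attribute on each `<gloss>`. A minimal sketch of reading all of them uniformly with the standard library's `xml.etree.ElementTree` follows. The helper name `read_senses` is an assumption for illustration, and the entries are retyped without whitespace between tags.

```python
import xml.etree.ElementTree as ET

GLOSS_1 = ("any of the larger cetacean mammals having a streamlined body "
           "and breathing through a blowhole on the head")
GLOSS_2 = "a very large person; impressive in size or qualities"

# Layout with one <sense> element per meaning.
NESTED = (
    "<entry><headword>whale</headword><pos>noun</pos>"
    f"<sense><gloss>{GLOSS_1}</gloss><synset>whale.n.02</synset></sense>"
    f"<sense><gloss>{GLOSS_2}</gloss><synset>giant.n.04</synset></sense>"
    "</entry>"
)

# Layout with the synset stored as an attribute of <gloss>.
FLAT = (
    "<entry><headword>whale</headword><pos>noun</pos>"
    f'<gloss synset="whale.n.02">{GLOSS_1}</gloss>'
    f'<gloss synset="giant.n.04">{GLOSS_2}</gloss>'
    "</entry>"
)


def read_senses(entry):
    """Return (synset, gloss) pairs from either entry layout."""
    pairs = []
    # Nested layout: <sense><gloss/><synset/></sense>
    for sense in entry.findall("sense"):
        pairs.append((sense.findtext("synset"), sense.findtext("gloss")))
    # Flat layout: <gloss> is a direct child; the synset attribute is optional,
    # so a bare gloss yields (None, gloss text).
    for gloss in entry.findall("gloss"):
        pairs.append((gloss.get("synset"), gloss.text))
    return pairs


for layout in (NESTED, FLAT):
    entry = ET.fromstring(layout)
    print(entry.findtext("headword"), read_senses(entry))
```

Because `findall("gloss")` only matches direct children of `<entry>`, the glosses nested inside `<sense>` are not counted twice.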
– Adding a field
– Validating a lexicon
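Both operations above can be sketched on entries held as ElementTree elements, like the XML examples earlier. Everything specific here is an assumption for illustration: the new field is called `cv` and holds a rough consonant/vowel pattern of the headword, and the validation rule (every entry needs a headword, a part of speech, and at least one gloss) is a made-up minimal check rather than a real lexicon schema.

```python
import re
import xml.etree.ElementTree as ET


def cv(word):
    """Map each letter to C or V (a crude, spelling-based pattern)."""
    word = re.sub(r"[^a-z]", "", word.lower())
    word = re.sub(r"[aeiou]", "V", word)
    return re.sub(r"[a-z]", "C", word)  # remaining lowercase letters are consonants


def add_cv_field(entry):
    """Append a <cv> field derived from the entry's headword."""
    field = ET.SubElement(entry, "cv")
    field.text = cv(entry.findtext("headword", default=""))
    return entry


REQUIRED = ("headword", "pos")  # hypothetical list of mandatory fields


def validate(entry):
    """Return a list of problems; an empty list means the entry passed."""
    problems = [
        f"missing <{tag}>"
        for tag in REQUIRED
        if not (entry.findtext(tag) or "").strip()
    ]
    if entry.find("gloss") is None and entry.find("sense/gloss") is None:
        problems.append("no gloss")
    return problems


good = ET.fromstring(
    "<entry><headword>whale</headword><pos>noun</pos>"
    "<gloss>a very large person</gloss></entry>"
)
bad = ET.fromstring("<entry><headword>whale</headword></entry>")

print(ET.tostring(add_cv_field(good), encoding="unicode"))
print(validate(good))
print(validate(bad))
```

Returning a list of problems rather than raising on the first one makes it easier to report every defect in a large lexicon in a single pass.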