Forward Maximum Matching Word Segmentation and KNN Text Classification in Python

Size: 15KB

File type: .py

Published: 2021-01-09
Language: Python
Tags: KNN NLP text-classification Python

Resource Description

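This code comes from a lab assignment in our degree program and covers two tasks: text segmentation, done with the forward maximum matching algorithm, and text classification, done with the KNN algorithm. Segmentation runs at an average of 153,295 words per second (189,100 characters per second). For classification, tf-idf is used to weight words for feature selection; in my tests I kept the top 100 feature words, and across the different values of k the classification accuracy averaged 75%.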
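
The classification pipeline described above (tf-idf weighting, keeping the top feature words, then a KNN vote) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the file: the function names, the ranking criterion (a term's highest tf-idf weight in any training document), the cosine similarity measure, and the default `k` are all choices made here for the example.

```python
import math
from collections import Counter


def compute_idf(train_docs):
    """Inverse document frequency for every term seen in the training documents."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for tokens in train_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """tf-idf weights for one tokenized document; terms unknown to idf are skipped."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def select_features(weighted_docs, top_n=100):
    """Keep the top_n terms, ranked by their highest weight in any document.

    Assumption: the ranking criterion is illustrative; the source only says
    that tf-idf weights are used for feature selection.
    """
    best = {}
    for weights in weighted_docs:
        for term, w in weights.items():
            best[term] = max(best.get(term, 0.0), w)
    return sorted(best, key=lambda t: (-best[t], t))[:top_n]


def to_vector(weights, features):
    """Project a {term: weight} dict onto the selected feature words."""
    return [weights.get(term, 0.0) for term in features]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(pair[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

To use it, compute `idf` once from the tokenized training documents, weight each document with `tf_idf`, pick the feature words with `select_features`, and pass the resulting vectors to `knn_predict`. Since the accuracy reported above depends on k, it is worth trying several values.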

Code Snippet and File Information

'''
2019/5/12
by zhyjc
## encoding = 'gb18030', errors = 'ignore'
'''
import os
import time
import math


class Trie_tree(object):
    # A trie (prefix tree) used to segment text with forward maximum matching
    def __init__(self):
        self.root = {}
        self.word_end = -1  # key that marks the end of a complete word

    def tree_build(self, dict_path):
        # Load the dictionary file, one word per line
        with open(dict_path, 'r', encoding='utf-8') as f_dic:
            strs = f_dic.readlines()
        for word in strs:
            word = word.strip(' \n')
            if word:  # skip blank lines so the root is never marked as a word
                self.insert(word)
        print('Trie built successfully!\n')
        return self

    def insert(self, word):
        # Walk down the trie, creating child nodes as needed
        cur_node = self.root
        for char in word:
            if char not in cur_node:
                cur_node[char] = {}
            cur_node = cur_node[char]
        cur_node[self.word_end] = True
    
    d
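
The snippet above is cut off after `insert`, so the matching routine itself is not shown. As a minimal sketch of how forward maximum matching could run over a trie of this shape (nested dicts with `-1` as the end-of-word marker), here is one possible version. The helper names `build_trie` and `forward_max_match` are hypothetical and not taken from the original file, and falling back to a single character when no dictionary word matches is an assumption.

```python
WORD_END = -1  # same end-of-word marker as self.word_end in the snippet above


def build_trie(words):
    """Build a nested-dict trie from an iterable of words (hypothetical helper)."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[WORD_END] = True
    return root


def forward_max_match(text, root):
    """Segment text left to right, always taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        node = root
        longest_end = i + 1  # assumption: emit one character if nothing matches
        j = i
        # Follow the trie as far as the text allows, remembering the last
        # position at which a complete word ended.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if WORD_END in node:
                longest_end = j
        tokens.append(text[i:longest_end])
        i = longest_end
    return tokens
```

Walking the trie once per start position avoids trying every candidate length separately, which is one reason a trie is a common fit for this algorithm.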
