How can Python word-frequency counting identify and count long-tail words?

2026-04-03 01:35

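Preface: This article was compiled by the editors of 编程笔记#自由互联. It covers word-frequency counting in Python, and we hope it serves as a useful reference.

Word-frequency counting with Python

GitHub: FightingBob

Word-frequency counting

Count how many times each English word occurs in a plain English text file, and record the results.

Code implementation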
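    import string
    from os import path

    # Open the source text (Walden, English edition) in binary mode;
    # the bytes are decoded below, and UTF-8 is an assumption about the file.
    with open('瓦尔登湖(英文版).txt', 'rb') as text1:
        # Split on whitespace, strip surrounding punctuation, lowercase each word.
        words = [word.strip(string.punctuation).lower() for word in text1.read().decode('utf-8').split()]
    # Distinct words, then a {word: count} dictionary.
    words_index = set(words)
    count_dict = {index: words.count(index) for index in words_index}
    # Append the results to file1.txt in the script's own directory.
    with open(path.join(path.dirname(__file__), 'file1.txt'), 'a+') as text2:
        text2.write('Word-frequency results:' + '\n')
        # Most frequent words first.
        for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
            text2.write('{}--{} times'.format(word, count_dict[word]) + '\n')
    # Both with blocks have already closed their files; these calls are redundant but harmless.
    text1.close()
    text2.close()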
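The dictionary comprehension above calls `words.count` once per distinct word, and each call rescans the whole word list. A minimal sketch of the same count using `collections.Counter` from the standard library, which makes a single pass over the words. The function name `count_words` and the sample sentence are illustrative assumptions, not part of the original script:

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its occurrence count."""
    # Same normalization as the script: split on whitespace,
    # strip surrounding punctuation, lowercase.
    words = (token.strip(string.punctuation).lower() for token in text.split())
    # Drop tokens that were pure punctuation and are now empty strings.
    return Counter(word for word in words if word)


# Hypothetical sample text, used only to exercise the function.
sample = "The pond was calm. The woods, too, were calm -- and the morning was clear."
counts = count_words(sample)

# most_common() yields (word, count) pairs from most to least frequent.
for word, n in counts.most_common(3):
    print('{}--{} times'.format(word, n))
```

For a book-length text this avoids the quadratic cost of counting each distinct word separately, while producing the same word-to-count mapping for non-empty words.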

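Code walkthrough
- Open the source file in binary mode for reading
  - with open('瓦尔登湖(英文版).txt', 'rb') as text1:
- Build the word list
  - Read the content first and decode the bytes into text (UTF-8 is assumed here; calling str() on the bytes would give the b'...' representation instead of the text itself)
    - content = text1.read().decode('utf-8')
  - Then split it into words (split() with no argument splits on any run of whitespace)
    - words = content.split()
  - Lowercase each word and strip the punctuation around it
    - word.strip(string.punctuation).lower()
  - Remove duplicate words
    - words_index = set(words)
- Build a dictionary that maps each word to its count
  - count_dict = {index: words.count(index) for index in words_index}
- Write out the word frequencies
  - Create the output file in the script's own directory and open it in append mode
    - with open(path.join(path.dirname(__file__), 'file1.txt'), 'a+') as text2:
  - Write the header line, ending with a newline
    - text2.write('Word-frequency results:' + '\n')
  - Sort the words by count, from highest to lowest
    - sorted(count_dict, key=lambda x: count_dict[x], reverse=True)
  - Write each word and its count on its own line
    - text2.write('{}--{} times'.format(word, count_dict[word]) + '\n')
  - Close the resources (each with block already closes its file on exit, so these explicit calls are redundant but harmless)
    - text1.close()
    - text2.close()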
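Once the counts are sorted from most to least frequent, the long tail is the low-frequency end of that ordering. A minimal sketch, assuming "long-tail" means words that occur at most `max_count` times. That threshold, the helper name `long_tail_words`, and the sample sentence are hypothetical choices for illustration; the original script does not define them:

```python
import string
from collections import Counter


def long_tail_words(text, max_count=1):
    """Return (word, count) pairs for words occurring at most max_count times."""
    # Normalize as in the walkthrough: split, strip punctuation, lowercase.
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    counts = Counter(t for t in tokens if t)
    # Keep only the low-frequency words (the assumed definition of the tail).
    tail = [(word, n) for word, n in counts.items() if n <= max_count]
    # Sort by count, then alphabetically, so the result is deterministic.
    return sorted(tail, key=lambda item: (item[1], item[0]))


# Hypothetical sample text, used only to exercise the function.
sample = "I went to the woods because I wished to live deliberately"
tail = long_tail_words(sample)

for word, n in tail:
    print('{}--{} times'.format(word, n))
```

Raising `max_count` widens the tail; with the default of 1 the function returns only the words that appear exactly once in the text.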


Tags: development notes, Python, word-frequency counting

