What Are Some Efficient Techniques for Sorting Data with Python pandas?

2026-04-30 15:13


Preface:

This article walks through the common ways to sort data in pandas, mainly sort_index and sort_values.

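Base data:

import pandas as pd
import numpy as np

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
index = [9, 3, 4, 5, 2]  # deliberately unsorted row labels
df = pd.DataFrame(data=data, index=index)
print("df data:\n", df, '\n')

out:

df data:
    A   B   C   D   brand  years
9  10   4   8   6  Python      4
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30

Note: the column order shown here (A, B, C, D, brand, years) appears to come from an older pandas version, which sorted dict keys alphabetically. On Python 3.6+ with pandas 0.23 or later, the columns keep the dict's insertion order (brand, B, A, D, years, C). The row values are the same either way.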

Sorting by row index:

print("Sorted by row index:\n", df.sort_index(), '\n')

out:

Sorted by row index:
    A   B   C   D   brand  years
2  16  10   2  12    Java     30
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
5  20  12   8   6      C#     30
9  10   4   8   6  Python      4

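The ascending parameter selects ascending or descending order. The default is ascending=True, which sorts in ascending order.

Set ascending=False to sort in descending order.

print("Sorted by row index (descending):\n", df.sort_index(ascending=False), '\n')

out:

Sorted by row index (descending):
    A   B   C   D   brand  years
9  10   4   8   6  Python      4
5  20  12   8   6      C#     30
4   5   8  18  14     C++      1
3   2   6  12  18       C      1
2  16  10   2  12    Java     30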
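As a minimal sketch of the two calls above, the snippet below builds a small frame (the name `small` and its values are made up for illustration) and checks the resulting index order. It also shows that sort_index returns a new object and leaves the original alone.

```python
import pandas as pd

# Hypothetical one-column frame with an unsorted integer index.
small = pd.DataFrame({"A": [10, 2, 5]}, index=[9, 3, 4])

asc = small.sort_index()                  # ascending=True is the default
desc = small.sort_index(ascending=False)  # descending order

print(list(asc.index))    # [3, 4, 9]
print(list(desc.index))   # [9, 4, 3]
print(list(small.index))  # [9, 3, 4]  (the original is unchanged)
```

The rows move together with their labels, so `asc["A"]` comes back as 2, 5, 10.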

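Sorting by column name:

Set axis=1 to sort the columns by their names:

print("Sorted by column name:\n", df.sort_index(axis=1), '\n')

out:

Sorted by column name:
    A   B   C   D   brand  years
9  10   4   8   6  Python      4
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30

In the output shown for the original df, the columns were already in alphabetical order, so nothing appears to move. On newer pandas versions, where df keeps the dict's insertion order (brand, B, A, D, years, C), this call visibly reorders the columns to A, B, C, D, brand, years. Uppercase names sort before lowercase ones.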

The ascending parameter works here as well:

print("Sorted by column name (descending):\n", df.sort_index(axis=1, ascending=False), '\n')

out:

Sorted by column name (descending):
   years   brand   D   C   B   A
9      4  Python   6   8   4  10
3      1       C  18  12   6   2
4      1     C++  14  18   8   5
5     30      C#   6   8  12  20
2     30    Java  12   2  10  16
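To make the effect of axis=1 easier to see, here is a small sketch with columns that start out of alphabetical order. The frame `frame` and its values are illustrative only, and the sketch assumes a pandas version that preserves dict insertion order for columns.

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately not alphabetical.
frame = pd.DataFrame({"B": [6, 10], "brand": ["C", "Java"], "A": [2, 16]})

by_name = frame.sort_index(axis=1)                        # columns A..Z, then a..z
by_name_desc = frame.sort_index(axis=1, ascending=False)  # reverse of that

print(list(frame.columns))         # ['B', 'brand', 'A']
print(list(by_name.columns))       # ['A', 'B', 'brand']
print(list(by_name_desc.columns))  # ['brand', 'B', 'A']
```

Only the column order changes; the row order and the values in each column stay as they were.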

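Sorting by values:

sort_values() is the pandas method for sorting by values:

1. Sorting by the values of a single column

Pass a single column name to sort_values() to sort the rows by that column. The ascending parameter selects ascending or descending order.

print("Sorted by column A:\n", df.sort_values('A'), '\n')

out:

Sorted by column A:
    A   B   C   D   brand  years
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
9  10   4   8   6  Python      4
2  16  10   2  12    Java     30
5  20  12   8   6      C#     30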

Set ascending=False to sort in descending order:

print("Sorted by column A (descending):\n", df.sort_values('A', ascending=False), '\n')

out:

Sorted by column A (descending):
    A   B   C   D   brand  years
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30
9  10   4   8   6  Python      4
4   5   8  18  14     C++      1
3   2   6  12  18       C      1
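In the outputs above, each row keeps its original index label after sorting. If a fresh 0..n-1 index is preferred, one option (not covered in the original article, and assuming pandas 1.0 or later) is the ignore_index parameter of sort_values. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical frame; the labels 9, 3, 4 mirror the style of the article's index.
scores = pd.DataFrame({"A": [10, 2, 5]}, index=[9, 3, 4])

by_a = scores.sort_values("A")
print(list(by_a.index))        # [3, 4, 9]  (labels travel with their rows)

# ignore_index=True discards the old labels and renumbers the result.
renumbered = scores.sort_values("A", ascending=False, ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2]
print(list(renumbered["A"]))   # [10, 5, 2]
```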

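2. Sorting by the values of multiple columns

Sort by the years column in ascending order first; rows with the same years value are then ordered by column B, also ascending.

print("Sorted by multiple columns:\n", df.sort_values(['years', 'B']), '\n')

out:

Sorted by multiple columns:
    A   B   C   D   brand  years
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
9  10   4   8   6  Python      4
2  16  10   2  12    Java     30
5  20  12   8   6      C#     30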

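The sort direction can also be set per column by passing a list to ascending, one entry per sort column:

Here years is ascending and B is descending.

print("Sorted by multiple columns:\n", df.sort_values(['years', 'B'], ascending=[True, False]), '\n')

out:

Sorted by multiple columns:
    A   B   C   D   brand  years
4   5   8  18  14     C++      1
3   2   6  12  18       C      1
9  10   4   8   6  Python      4
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30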
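The two multi-column sorts can be checked side by side. This sketch reuses only the years and B columns and the index from the article's data; the variable names `langs`, `both_asc`, and `mixed` are illustrative.

```python
import pandas as pd

# The years and B columns from the article's data, with the same row labels.
langs = pd.DataFrame(
    {"years": [4, 1, 1, 30, 30], "B": [4, 6, 8, 12, 10]},
    index=[9, 3, 4, 5, 2],
)

# years ascending, ties broken by B ascending.
both_asc = langs.sort_values(["years", "B"])

# years ascending, ties broken by B descending.
mixed = langs.sort_values(["years", "B"], ascending=[True, False])

print(list(both_asc.index))  # [3, 4, 9, 2, 5]
print(list(mixed.index))     # [4, 3, 9, 5, 2]
```

Only the rows that tie on years (labels 3 and 4, and labels 5 and 2) swap places between the two results.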

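Using inplace:

inplace=True: no new object is created; the original object is sorted directly and the call returns None. The default is inplace=False, which returns a new sorted object and leaves the original unchanged.

df.sort_values('A', inplace=True)
print("Sorted by column A:\n", df, '\n')

out:

Sorted by column A:
    A   B   C   D   brand  years
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
9  10   4   8   6  Python      4
2  16  10   2  12    Java     30
5  20  12   8   6      C#     30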
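The difference between the two modes is easiest to see by looking at what each call returns. A minimal sketch with a made-up frame `df_demo`:

```python
import pandas as pd

# Hypothetical frame used only to contrast the two modes.
df_demo = pd.DataFrame({"A": [10, 2, 5]}, index=[9, 3, 4])

# Default (inplace=False): a new sorted frame is returned.
sorted_copy = df_demo.sort_values("A")
index_before = list(df_demo.index)
print(index_before)            # [9, 3, 4]  (original still unsorted)

# inplace=True: df_demo itself is reordered and the call returns None.
result = df_demo.sort_values("A", inplace=True)
print(result)                  # None
print(list(df_demo.index))     # [3, 4, 9]
```

A common slip is writing `df = df.sort_values('A', inplace=True)`, which rebinds `df` to None.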

Missing values:

Sorting data that contains NaN values:

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, np.nan, 10],  # one missing value
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
index = [9, 3, 4, 5, 2]
df = pd.DataFrame(data=data, index=index)
print("df data:\n", df, '\n')

out:

df data:
    A     B   C   D   brand  years
9  10   4.0   8   6  Python      4
3   2   6.0  12  18       C      1
4   5   8.0  18  14     C++      1
5  20   NaN   8   6      C#     30
2  16  10.0   2  12    Java     30

Column B contains a NaN value. Sort by column B with the missing value placed first:

print("Sorted by column B:\n", df.sort_values('B', na_position='first'), '\n')

out:

Sorted by column B:
    A     B   C   D   brand  years
5  20   NaN   8   6      C#     30
9  10   4.0   8   6  Python      4
3   2   6.0  12  18       C      1
4   5   8.0  18  14     C++      1
2  16  10.0   2  12    Java     30

With the missing value placed last (na_position='last' is the default):

print("Sorted by column B:\n", df.sort_values('B', na_position='last'), '\n')

out:

Sorted by column B:
    A     B   C   D   brand  years
9  10   4.0   8   6  Python      4
3   2   6.0  12  18       C      1
4   5   8.0  18  14     C++      1
2  16  10.0   2  12    Java     30
5  20   NaN   8   6      C#     30
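One detail worth checking is how na_position interacts with ascending. The sketch below uses only column B and the index from the article's data (the names `with_nan`, `default_order`, `desc_order`, and `nan_first` are illustrative) and shows that NaN stays at the end even for a descending sort unless na_position='first' is given.

```python
import numpy as np
import pandas as pd

# Column B from the article's data, with the same row labels.
with_nan = pd.DataFrame({"B": [4, 6, 8, np.nan, 10]}, index=[9, 3, 4, 5, 2])

default_order = list(with_nan.sort_values("B").index)
desc_order = list(with_nan.sort_values("B", ascending=False).index)
nan_first = list(with_nan.sort_values("B", na_position="first").index)

print(default_order)  # [9, 3, 4, 2, 5]  (NaN row last by default)
print(desc_order)     # [2, 4, 3, 9, 5]  (still last when descending)
print(nan_first)      # [5, 9, 3, 4, 2]
```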
