How to join and concatenate two datasets with Python
Use the `merge` function from `pandas` to join two datasets, performing a left join on the `user_id` and `coupon_id` fields. A simplified version of the code:
```python
import pandas as pd

t3 = pd.merge(t3, t2, on=['user_id', 'coupon_id'], how='left')
```
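The left join described above can be checked on toy data. This is a minimal sketch, assuming small hypothetical `t3` and `t2` frames and a hypothetical `receive_count` column. `indicator=True` is a standard `pd.merge` option that adds a `_merge` column recording whether each row found a match in the right-hand frame.

```python
import pandas as pd

# Hypothetical toy data: t3 holds coupon receipts, t2 holds per-(user, coupon) stats
t3 = pd.DataFrame({
    "user_id":   [1, 1, 2],
    "coupon_id": [10, 11, 10],
})
t2 = pd.DataFrame({
    "user_id":       [1, 2],
    "coupon_id":     [10, 10],
    "receive_count": [3, 1],  # hypothetical feature column
})

# Left join: every row of t3 is kept; rows with no (user_id, coupon_id) match get NaN
result = pd.merge(t3, t2, on=["user_id", "coupon_id"], how="left", indicator=True)
print(result)
```

Here the pair (1, 11) has no counterpart in `t2`, so its `receive_count` is NaN and its `_merge` value is `left_only`; the other two rows are marked `both`.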
Python: concatenating and joining two datasets
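```python
import pandas as pd

t3 = dataset3[['user_id', 'coupon_id', 'date_received']]
# Left join on (user_id, coupon_id): every row of t3 is kept
t3 = pd.merge(t3, t2, on=['user_id', 'coupon_id'], how='left')
# Gap between this receipt and the user's latest receipt of the same coupon
t3['this_month_user_receive_same_coupon_lastone'] = t3.max_date_received - t3.date_received
# Gap between the user's earliest receipt of the same coupon and this receipt
t3['this_month_user_receive_same_coupon_firstone'] = t3.date_received - t3.min_date_received
```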
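The snippet does not show how `t2` or the date columns are built. The sketch below is one possible setup, not the original author's code: it assumes `date_received` arrives as a `yyyymmdd` string, builds a hypothetical `t2` with a `groupby` (latest and earliest receipt per user and coupon), and converts dates with `pd.to_datetime` so that subtraction yields real day counts. If the dates were instead stored as `yyyymmdd` integers, direct subtraction only gives correct day counts when both dates fall in the same month.

```python
import pandas as pd

# Hypothetical receipts; date_received assumed to be a yyyymmdd string
dataset3 = pd.DataFrame({
    "user_id":       [1, 1, 1, 2],
    "coupon_id":     [10, 10, 10, 20],
    "date_received": ["20160601", "20160605", "20160620", "20160610"],
})
dataset3["date_received"] = pd.to_datetime(dataset3["date_received"], format="%Y%m%d")

# Assumed construction of t2: latest and earliest receipt per (user_id, coupon_id)
t2 = (dataset3.groupby(["user_id", "coupon_id"])["date_received"]
      .agg(max_date_received="max", min_date_received="min")
      .reset_index())

t3 = dataset3[["user_id", "coupon_id", "date_received"]]
t3 = pd.merge(t3, t2, on=["user_id", "coupon_id"], how="left")

# Subtracting datetime columns gives Timedelta values; .dt.days turns them into whole days
t3["this_month_user_receive_same_coupon_lastone"] = (
    t3["max_date_received"] - t3["date_received"]).dt.days
t3["this_month_user_receive_same_coupon_firstone"] = (
    t3["date_received"] - t3["min_date_received"]).dt.days
print(t3)
```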
```python
# Merge on multiple key columns (pd.merge defaults to an inner join)
other_feature3 = pd.merge(t1, t, on='user_id')
other_feature3 = pd.merge(other_feature3, t3, on=['user_id', 'coupon_id'])
other_feature3 = pd.merge(other_feature3, t4, on=['user_id', 'date_received'])
other_feature3 = pd.merge(other_feature3, t5, on=['user_id', 'coupon_id', 'date_received'])
other_feature3 = pd.merge(other_feature3, t7, on=['user_id', 'coupon_id', 'date_received'])
# Write the result without the DataFrame index
other_feature3.to_csv('data/other_feature3.csv', index=False)
```
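Because none of these merges passes `how`, each one is an inner join, so a row survives only if its keys exist in every table. A minimal sketch of the same chain written as a loop over (table, keys) pairs, assuming small hypothetical tables `t1`, `t`, `t3` and `t4` with made-up feature columns `f1` to `f4`, makes it easy to print how many rows each step keeps:

```python
import pandas as pd

# Hypothetical feature tables keyed on different column subsets
t1 = pd.DataFrame({"user_id": [1, 2], "coupon_id": [10, 20],
                   "date_received": ["20160601", "20160602"], "f1": [0.5, 0.7]})
t = pd.DataFrame({"user_id": [1, 2], "f2": [3, 4]})
t3 = pd.DataFrame({"user_id": [1, 2], "coupon_id": [10, 20], "f3": [1, 0]})
# t4 only has a row for user 1, so user 2 is dropped by the inner join
t4 = pd.DataFrame({"user_id": [1], "date_received": ["20160601"], "f4": [5]})

# (table, join keys) pairs, merged in order onto the running result
merge_plan = [
    (t3, ["user_id", "coupon_id"]),
    (t4, ["user_id", "date_received"]),
]

other_feature = pd.merge(t1, t, on="user_id")
for table, keys in merge_plan:
    before = len(other_feature)
    other_feature = pd.merge(other_feature, table, on=keys)  # default how="inner"
    print(f"merge on {keys}: {before} -> {len(other_feature)} rows")
```

If rows must not be lost when a feature table lacks a key, passing `how="left"` to each merge keeps them and fills the missing features with NaN.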
```python
# Concatenate datasets: stack two DataFrames into one along the rows
df_train_stmt = pd.concat([df_train_stmt, df_train_stmt_test], axis=0)
```
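With `axis=0`, `pd.concat` stacks rows but keeps each frame's original index, so the result can contain repeated index labels. A minimal sketch with hypothetical `id` and `amount` columns shows the difference that `ignore_index=True` makes:

```python
import pandas as pd

# Hypothetical train and test statement tables with the same columns
df_train_stmt = pd.DataFrame({"id": [1, 2], "amount": [100, 200]})
df_train_stmt_test = pd.DataFrame({"id": [3], "amount": [300]})

# axis=0 stacks rows; the original index labels (0, 1 and 0) are kept as-is
stacked = pd.concat([df_train_stmt, df_train_stmt_test], axis=0)

# ignore_index=True renumbers the rows 0..n-1 so index labels are unique
stacked_reset = pd.concat([df_train_stmt, df_train_stmt_test], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(stacked_reset.index.tolist())
```

Duplicate labels matter later if rows are selected with `.loc`, which would return every row sharing a label.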

