Why is process_item in my Scrapy pipeline never called in my Python project?

2026-06-10 23:29

I have a very simple piece of code, shown below. Scraping works fine: I can see all the print statements producing the correct data. In the pipeline, initialization works correctly. However, process_item is never called; the print statement at the start of the function never executes.

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy. [...] www.comoshambhala.com/singapore/classes/schedules",
        "www.comoshambhala.com/singapore/about/location-contact",
        "www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'


    def parse_location(self, response):
        print 'in parse_location'
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item

Items file:

[...]

Pipeline file:

import csv

class ComoShamPipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv','wb'))
        self.locationdump.writerow(['Address','Pin','Phone','Fax','Email'])


    def process_item(self,item,spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'],item['pin'],item['phone'],item['fax'],item['email']
            self.locationdump.writerow([item['address'],item['pin'],item['phone'],item['fax'],item['email']])
        else:
            pass
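For readers on Python 3, the same CSV logic could be written roughly as in the sketch below. This is only an illustration under assumptions: the class name LocationCsvPipeline, the default output path, and the use of the open_spider/close_spider hooks to manage the file are choices made here, not part of the original post. It also returns the item from process_item, which Scrapy expects so that any later pipeline stage receives it.

```python
import csv


class LocationCsvPipeline:
    """Hypothetical Python 3 variant of ComoShamPipeline (name is an assumption)."""

    fields = ["address", "pin", "phone", "fax", "email"]

    def __init__(self, path="ComoshamLocation.csv"):
        # Assumed output path; the original post writes to a different location.
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Scrapy calls this hook once when the spider starts.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["Address", "Pin", "Phone", "Fax", "Email"])

    def close_spider(self, spider):
        # Scrapy calls this hook once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # Only location items are written; everything else passes through.
        if item.get("category") == "location":
            self.writer.writerow([item[name] for name in self.fields])
        # Return the item so later pipeline stages can still see it.
        return item
```

Opening the file in open_spider rather than in __init__ means it is created only when a crawl actually starts, and close_spider makes sure buffered rows are flushed to disk.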

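In the end, there turned out to be two main causes:
1. The pipeline is not enabled. Add the following line to settings.py: ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}
2. The spider does not hand the item over when it runs. Yield it from the callback: yield my_item
Mine was the second cause: in the spider above, parse calls self.parse_location(response) but discards the item it returns. Once I fixed that, it worked immediately.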
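Both causes can be illustrated without running a crawl. The sketch below is a simplified model, not Scrapy's real engine: collect_items is a hypothetical stand-in for the way Scrapy iterates over whatever a callback returns, the item is a plain dict, and the package name "scrapers" in the settings line is an assumption (use your own project name).

```python
# settings.py: "scrapers" is an assumed project package name.
ITEM_PIPELINES = {"scrapers.pipelines.ComoShamPipeline": 300}


def parse_location(response):
    # Stand-in for the post's parse_location: builds and returns one item.
    return {"category": "location", "url": response}


def parse_broken(response):
    # Mirrors the original parse(): the returned item is thrown away,
    # so the callback itself returns None and nothing reaches a pipeline.
    parse_location(response)


def parse_fixed(response):
    # Yielding hands the item back to the caller (Scrapy, in a real project).
    yield parse_location(response)


def collect_items(callback, response):
    # Hypothetical, simplified stand-in for the engine:
    # iterate over whatever the callback returned, if anything.
    result = callback(response)
    return list(result) if result is not None else []


url = "www.comoshambhala.com/singapore/about/location-contact"
print(len(collect_items(parse_broken, url)))  # 0
print(len(collect_items(parse_fixed, url)))   # 1
```

In the real spider, the equivalent change is to write yield self.parse_location(response) (or return it) inside parse, so that the item built by parse_location is passed on to the enabled pipeline.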
