2017-02-20 31 views

Python: comparing a get_text element with a list element

I'm moving along with my Python project, but I've run into yet another annoying phase.

I have a snippet of code that finds the last post date on the forum and stores it both in a temporary variable (needed to check against each date) and in a public/global one for later use throughout the whole scope.

However, the method I'm trying to use is to fetch all the last post dates from the forum and compare them with the dates already stored in a CSV file, to see whether any new posts have been made, and to scrape/mine the data if the dates do not match.

But this is exactly the part I'm struggling with: I can't compare my mined element (get_text) with an element from the .csv list.

Any ideas would be appreciated. I've tried several methods and left the last one below, which still doesn't work.

Code:

#Preparing csv file to be read through to check if dates match 
storedDates = open(os.path.expanduser("PostDates.csv")) 
csv_storedDates = csv.reader(storedDates) 
dateRow = list(csv_storedDates) #Storing all the dates as a "List" object 
listLength = len(dateRow) #Grabbing the csv List length 
startingDate = 0 #Variable for looping through each date for each post. 

lPostDate = lPostDate2 = "" 

#Looping through 6 times (As that's how many pages each forum has, and collecting Next Page Link,Each Thread Title, It's Link 
#.. last post date (To know how recent it is) and assigning next page link to current url, and continuing loop. 
while number < 6:
    for postDate in soup.find_all(title=re.compile("^Replies:")):
        tempData = ""
        tempData += (postDate.get_text("\n", strip=True)[0:10] + "\n")
        lPostDate += (postDate.get_text("\n", strip=True)[0:10] + "\n")
        if any(tempData in s for s in dateRow[startingDate]):
            print("Matched a date" + tempData + "to one from database" + dateRow[startingDate])
            startingDate += 1
        else:
            startingDate += 1
            print("Date " + tempData + "was not matched to anything" + str(dateRow[startingDate]))

This is only part of the code, but it is the only bit I'm trying to get working at the moment. Assume that PostDates.csv already contains data. Also, this is what the output looks like:

Date 02-11-2017 
was not matched to anything['02-11-2017'] 
Date 01-10-2017 
was not matched to anything['01-10-2017'] 
Date 02-12-2017 
was not matched to anything['02-12-2017'] 
Date 10-01-2016 
was not matched to anything['10-01-2016'] 
Date 09-30-2016 
was not matched to anything['09-30-2016'] 
Date 08-10-2016 
was not matched to anything['08-10-2016'] 
Date 10-01-2015 
was not matched to anything['10-01-2015'] 
Date 10-01-2015 
was not matched to anything['10-01-2015'] 
Date 08-29-2015 
was not matched to anything['08-29-2015'] 
Date 03-16-2015 
was not matched to anything['03-16-2015'] 
Date 07-16-2014 
was not matched to anything['07-16-2014'] 
Date 07-13-2014 
was not matched to anything['07-13-2014'] 
Date 02-11-2014 
was not matched to anything['02-11-2014'] 
Date 07-02-2013 
was not matched to anything['07-02-2013'] 
Date 06-28-2013 
was not matched to anything['06-28-2013'] 
Date 04-22-2013 
was not matched to anything['04-22-2013'] 
Date 05-28-2012 
was not matched to anything['05-28-2012'] 
Date 05-25-2012 
was not matched to anything['05-25-2012'] 
Date 05-09-2012 
was not matched to anything['05-09-2012'] 
Date 06-10-2010 
was not matched to anything['06-10-2010'] 
Date 01-18-2010 
was not matched to anything['01-18-2010'] 
Date 01-18-2010 
was not matched to anything['01-18-2010'] 
Date 12-29-2009 
was not matched to anything['12-29-2009'] 
Date 06-08-2009 
was not matched to anything['06-08-2009'] 
Date 02-02-2009 
was not matched to anything['02-02-2009'] 
Date 11-24-2008 
was not matched to anything['11-24-2008'] 
Date 09-02-2008 
was not matched to anything['09-02-2008'] 
Date 08-07-2008 
was not matched to anything['08-07-2008'] 
Date 06-05-2008 
was not matched to anything['06-05-2008'] 
Date 05-22-2008 
was not matched to anything['05-22-2008'] 
Date 04-21-2008 
was not matched to anything['04-21-2008'] 
Date 03-29-2008 
was not matched to anything['03-29-2008'] 
1 
Date 02-11-2017 
was not matched to anything['02-11-2017'] 
Date 01-10-2017 
was not matched to anything['01-10-2017'] 
Date 11-07-2007 
was not matched to anything['11-07-2007'] 
Date 11-07-2007 
was not matched to anything['11-07-2007'] 
Date 09-19-2007 
was not matched to anything['09-19-2007'] 
Date 09-01-2007 
was not matched to anything['09-01-2007'] 
Date 08-31-2007 
was not matched to anything['08-31-2007'] 
Date 08-31-2007 
was not matched to anything['08-31-2007'] 
Date 08-30-2007 
was not matched to anything['08-30-2007'] 
Date 08-24-2007 
was not matched to anything['08-24-2007'] 
Date 08-19-2007 
was not matched to anything['08-19-2007'] 
Date 08-08-2007 
was not matched to anything['08-08-2007'] 
Date 08-03-2007 
was not matched to anything['08-03-2007'] 
Date 07-29-2007 
was not matched to anything['07-29-2007'] 
Date 07-18-2007 
was not matched to anything['07-18-2007'] 
Date 06-26-2007 
was not matched to anything['06-26-2007'] 
Date 06-26-2007 
was not matched to anything['06-26-2007'] 
Date 01-12-2007 
was not matched to anything['01-12-2007'] 
Date 12-05-2006 
was not matched to anything['12-05-2006'] 
Date 11-16-2006 
was not matched to anything['11-16-2006'] 
Date 11-05-2006 
was not matched to anything['11-05-2006'] 
Date 11-05-2006 
was not matched to anything['11-05-2006'] 
Date 11-03-2006 
was not matched to anything['11-03-2006'] 
Date 09-19-2006 
was not matched to anything['09-19-2006'] 
Date 09-19-2006 
was not matched to anything['09-19-2006'] 
Date 09-19-2006 
was not matched to anything['09-19-2006'] 
Date 09-12-2006 
was not matched to anything['09-12-2006'] 
Date 08-17-2006 
was not matched to anything['08-17-2006'] 
Date 08-07-2006 
was not matched to anything['08-07-2006'] 
Date 08-02-2006 
was not matched to anything['08-02-2006'] 
Date 07-16-2006 
was not matched to anything['07-16-2006'] 
Date 07-07-2006 
was not matched to anything['07-07-2006'] 

I didn't paste any more of the results past page 2, as there are 6 pages, so it's quite a lot of data.

And this is how the data looks after it was scraped earlier and stored in the CSV file (the dateRow variable):

Date, 
02-11-2017 
01-10-2017 
02-12-2017 
10-01-2016 
09-30-2016 
08-10-2016 
10-01-2015 
10-01-2015 
08-29-2015 
03-16-2015 
07-16-2014 
07-13-2014 
02-11-2014 
07-02-2013 
06-28-2013 
04-22-2013 
05-28-2012 
05-25-2012 
05-09-2012 
06-10-2010 
01-18-2010 
01-18-2010 
12-29-2009 
06-08-2009 
02-02-2009 
11-24-2008 
09-02-2008 
08-07-2008 
06-05-2008 
05-22-2008 
04-21-2008 
03-29-2008 
02-11-2017 
01-10-2017 
11-07-2007 
11-07-2007 
09-19-2007 
09-01-2007 
08-31-2007 
08-31-2007 

Any tips on how to process it so that it finds the matching dates would be much appreciated, thanks!

try printing 'type(dateRow[startingDate])'

@Piotr Tried it, this is the result I get: – Norbis

and 'type(dateRow[startingDate][0])'?

Answer

Just to sum up our conversation in the comments: you wrote any(tempData in s for s in dateRow[startingDate]), and I thought it must be a type mismatch. Well, it turned out to be a mismatch. This has to do with how any() is defined:

any(iterable): Return True if any element of the iterable is true. If the iterable is empty, return False. Equivalent to:

def any(iterable):
    for element in iterable:
        if element:
            return True
    return False
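
To see that definition in action, here is a small sketch with made-up values (the names row and needle are hypothetical, not taken from the question). It suggests that any() treats a generator expression and a list comprehension the same way:

```python
# Hypothetical sample data standing in for one CSV row and one scraped date.
row = ["02-11-2017"]
needle = "02-11-2017"

# any() consumes any iterable, so both spellings give the same answer.
from_generator = any(needle in field for field in row)
from_list = any([needle in field for field in row])

print(from_generator, from_list)
```
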

And your code, when taken apart, gives something like this:

>>> # Parentheses make it syntactically correct
>>> iterable = (tempData in s for s in dateRow[startingDate]) 
>>> any(iterable) 
False 

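But is the generator really an iterable? See:

>>> type(iterable)
<class 'generator'>

It is. A generator is an iterable, and so is the equivalent list:

>>> type([tempData in s for s in dateRow[startingDate]])
<class 'list'>

Iterable, just like the generator!

>>> hasattr([tempData in s for s in dateRow[startingDate]], '__iter__')
True

So adding brackets around the generator does not change what any() returns. The mismatch is in the strings themselves: tempData ends with "\n" (it is appended right after get_text), while the fields returned by csv.reader carry no newline, so tempData in s is always False. Compare tempData.strip() instead, and remember that the first row of the file is the Date header, which shifts every comparison by one row.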
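
A minimal sketch of one way to make the dates comparable, assuming the scraped text still ends with the "\n" appended in the question's code and that PostDates.csv starts with the Date, header. The helper names load_stored_dates and is_known_date are hypothetical, and the set lookup here replaces the row-by-row index comparison from the question:

```python
import csv
import io


def load_stored_dates(csv_file):
    """Return the set of dates in the first column, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # assumption: the first row is the "Date," header
    return {row[0].strip() for row in reader if row}


def is_known_date(scraped_text, stored_dates):
    """Strip whitespace, keep the first 10 characters, and look them up."""
    return scraped_text.strip()[:10] in stored_dates


# In-memory stand-in for PostDates.csv (sample rows copied from the question).
sample_csv = io.StringIO("Date,\n02-11-2017\n01-10-2017\n02-12-2017\n")
stored = load_stored_dates(sample_csv)

print(is_known_date("02-11-2017\n", stored))  # date present in the sample
print(is_known_date("03-01-2017\n", stored))  # date absent from the sample
```

With a real file, open(os.path.expanduser("PostDates.csv"), newline="") could be passed in place of the StringIO object. A set lookup does not depend on the scraped dates arriving in the same order as the rows in the file, which the startingDate counter does.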