Python: comparing a get_text element with a list element
I'm moving along with my Python project, but I've hit another annoying snag.
I have a snippet of code that finds the last post date on the forum and stores it both in a temporary variable (needed to check it against each stored date) and in a public/global one for further use throughout the rest of the scope.
The approach I'm trying is to get all the last post dates from the forum and compare them with the dates already stored in a CSV file, to find out whether any new posts have been made and, if a date is not already stored, scrape/mine the data.
This is exactly the part I'm struggling with: I can't compare my mined element (get_text) with an element from the .csv list.
Any ideas would be appreciated. I've tried several methods and left the latest one below, which still doesn't work.
Code:
    #Preparing csv file to be read through to check if dates match
    storedDates = open(os.path.expanduser("PostDates.csv"))
    csv_storedDates = csv.reader(storedDates)
    dateRow = list(csv_storedDates) #Storing all the dates as a "List" object
    listLength = len(dateRow) #Grabbing the csv List length
    startingDate = 0 #Variable for looping through each date for each post.
    lPostDate = lPostDate2 = ""
    #Looping through 6 times (As that's how many pages each forum has, and collecting Next Page Link,Each Thread Title, It's Link
    #.. last post date (To know how recent it is) and assigning next page link to current url, and continuing loop.
    while number < 6:
        for postDate in soup.find_all(title=re.compile("^Replies:")):
            tempData = ""
            tempData += (postDate.get_text("\n", strip=True)[0:10] + "\n")
            lPostDate += (postDate.get_text("\n", strip=True)[0:10] + "\n")
            if any(tempData in s for s in dateRow[startingDate]):
                print("Matched a date" + tempData + "to one from database" + dateRow[startingDate])
                startingDate +=1
            else :
                startingDate += 1
                print("Date " + tempData + "was not matched to anything" + str(dateRow[startingDate]))
This is only part of the code, but it's the only bit I'm trying to get working at the moment. Assume that PostDates.csv already contains data. This is what the output looks like:
Date 02-11-2017
was not matched to anything['02-11-2017']
Date 01-10-2017
was not matched to anything['01-10-2017']
Date 02-12-2017
was not matched to anything['02-12-2017']
Date 10-01-2016
was not matched to anything['10-01-2016']
Date 09-30-2016
was not matched to anything['09-30-2016']
Date 08-10-2016
was not matched to anything['08-10-2016']
Date 10-01-2015
was not matched to anything['10-01-2015']
Date 10-01-2015
was not matched to anything['10-01-2015']
Date 08-29-2015
was not matched to anything['08-29-2015']
Date 03-16-2015
was not matched to anything['03-16-2015']
Date 07-16-2014
was not matched to anything['07-16-2014']
Date 07-13-2014
was not matched to anything['07-13-2014']
Date 02-11-2014
was not matched to anything['02-11-2014']
Date 07-02-2013
was not matched to anything['07-02-2013']
Date 06-28-2013
was not matched to anything['06-28-2013']
Date 04-22-2013
was not matched to anything['04-22-2013']
Date 05-28-2012
was not matched to anything['05-28-2012']
Date 05-25-2012
was not matched to anything['05-25-2012']
Date 05-09-2012
was not matched to anything['05-09-2012']
Date 06-10-2010
was not matched to anything['06-10-2010']
Date 01-18-2010
was not matched to anything['01-18-2010']
Date 01-18-2010
was not matched to anything['01-18-2010']
Date 12-29-2009
was not matched to anything['12-29-2009']
Date 06-08-2009
was not matched to anything['06-08-2009']
Date 02-02-2009
was not matched to anything['02-02-2009']
Date 11-24-2008
was not matched to anything['11-24-2008']
Date 09-02-2008
was not matched to anything['09-02-2008']
Date 08-07-2008
was not matched to anything['08-07-2008']
Date 06-05-2008
was not matched to anything['06-05-2008']
Date 05-22-2008
was not matched to anything['05-22-2008']
Date 04-21-2008
was not matched to anything['04-21-2008']
Date 03-29-2008
was not matched to anything['03-29-2008']
1
Date 02-11-2017
was not matched to anything['02-11-2017']
Date 01-10-2017
was not matched to anything['01-10-2017']
Date 11-07-2007
was not matched to anything['11-07-2007']
Date 11-07-2007
was not matched to anything['11-07-2007']
Date 09-19-2007
was not matched to anything['09-19-2007']
Date 09-01-2007
was not matched to anything['09-01-2007']
Date 08-31-2007
was not matched to anything['08-31-2007']
Date 08-31-2007
was not matched to anything['08-31-2007']
Date 08-30-2007
was not matched to anything['08-30-2007']
Date 08-24-2007
was not matched to anything['08-24-2007']
Date 08-19-2007
was not matched to anything['08-19-2007']
Date 08-08-2007
was not matched to anything['08-08-2007']
Date 08-03-2007
was not matched to anything['08-03-2007']
Date 07-29-2007
was not matched to anything['07-29-2007']
Date 07-18-2007
was not matched to anything['07-18-2007']
Date 06-26-2007
was not matched to anything['06-26-2007']
Date 06-26-2007
was not matched to anything['06-26-2007']
Date 01-12-2007
was not matched to anything['01-12-2007']
Date 12-05-2006
was not matched to anything['12-05-2006']
Date 11-16-2006
was not matched to anything['11-16-2006']
Date 11-05-2006
was not matched to anything['11-05-2006']
Date 11-05-2006
was not matched to anything['11-05-2006']
Date 11-03-2006
was not matched to anything['11-03-2006']
Date 09-19-2006
was not matched to anything['09-19-2006']
Date 09-19-2006
was not matched to anything['09-19-2006']
Date 09-19-2006
was not matched to anything['09-19-2006']
Date 09-12-2006
was not matched to anything['09-12-2006']
Date 08-17-2006
was not matched to anything['08-17-2006']
Date 08-07-2006
was not matched to anything['08-07-2006']
Date 08-02-2006
was not matched to anything['08-02-2006']
Date 07-16-2006
was not matched to anything['07-16-2006']
Date 07-07-2006
was not matched to anything['07-07-2006']
I didn't paste any more of the output after page 2, since there are 6 pages and that's quite a lot of data.
And this is how the data looks once it has been scraped and stored in the CSV file (the dateRow variable):
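One thing the output above suggests, offered as a guess rather than a confirmed diagnosis: every date is printed followed by a line break, which is consistent with tempData ending in the "\n" that the loop appends. A string with a trailing newline is not a substring of the same text without it, so the `in` test would fail even when the two dates look identical. The sketch below uses hard-coded sample values instead of the real page and file; `scraped` and `row` are hypothetical names.

```python
# Hypothetical stand-ins for one scraped value and one csv row.
scraped = "02-11-2017" + "\n"   # built the same way as tempData in the loop
row = ["02-11-2017"]            # what csv.reader yields for a one-column line

# Same test as in the question: fails because of the trailing newline.
match_raw = any(scraped in s for s in row)

# Stripping whitespace on both sides makes the values comparable.
match_clean = any(scraped.strip() == s.strip() for s in row)

print(match_raw, match_clean)   # False True
```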
Date,
02-11-2017
01-10-2017
02-12-2017
10-01-2016
09-30-2016
08-10-2016
10-01-2015
10-01-2015
08-29-2015
03-16-2015
07-16-2014
07-13-2014
02-11-2014
07-02-2013
06-28-2013
04-22-2013
05-28-2012
05-25-2012
05-09-2012
06-10-2010
01-18-2010
01-18-2010
12-29-2009
06-08-2009
02-02-2009
11-24-2008
09-02-2008
08-07-2008
06-05-2008
05-22-2008
04-21-2008
03-29-2008
02-11-2017
01-10-2017
11-07-2007
11-07-2007
09-19-2007
09-01-2007
08-31-2007
08-31-2007
Any advice on how to handle this so that it finds the matching dates would be much appreciated, thanks!
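A possible direction, sketched under assumptions rather than tested against the real forum: as far as the snippet shows, dates are compared by position (dateRow[startingDate]), row 0 of the file is the `Date,` header, and the else branch increments startingDate before printing, so the row shown in the message is not the row that was actually tested. Loading the stored dates into a set and testing membership avoids depending on row order. The helper `load_stored_dates`, the sample file contents, and the list `scraped_dates` are hypothetical and only stand in for PostDates.csv and the scraped values.

```python
import csv
import io

def load_stored_dates(csv_file):
    """Return the set of non-empty date strings found in the first column."""
    dates = set()
    for row in csv.reader(csv_file):
        if not row:
            continue                      # skip blank lines
        value = row[0].strip()
        if value and value != "Date":     # skip the header line
            dates.add(value)
    return dates

# Sample data standing in for PostDates.csv (assumed layout: header, one date per line).
sample = io.StringIO("Date,\n02-11-2017\n01-10-2017\n02-12-2017\n")
stored = load_stored_dates(sample)

# Hypothetical scraped values, already cut to 10 characters as in the question.
scraped_dates = ["02-11-2017\n", "03-01-2017\n"]

# A date counts as new when its stripped text is not in the stored set.
new_dates = [d.strip() for d in scraped_dates if d.strip() not in stored]
print(new_dates)   # ['03-01-2017']
```

With the real file, the `io.StringIO` object would be replaced by the opened PostDates.csv, ideally inside a `with open(...)` block so the file gets closed.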
Try printing 'type(dateRow[startingDate])' –
@Piotr Tried it, this is the result I get: –
Norbis
And 'type(dateRow[startingDate][0])'? –
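For context on the two type checks suggested in the comments, here is a small sketch using sample text in place of PostDates.csv (the variable `rows` is a hypothetical stand-in for dateRow). csv.reader yields each line as a list of string fields, so an indexed row is a list and its first element is a str.

```python
import csv
import io

# Sample text standing in for PostDates.csv.
rows = list(csv.reader(io.StringIO("Date,\n02-11-2017\n")))

print(type(rows[1]))      # <class 'list'>: each row is a list of fields
print(type(rows[1][0]))   # <class 'str'>: each field is a string
print(rows[0], rows[1])   # ['Date', ''] ['02-11-2017']
```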