Я пытаюсь очистить table, но я столкнулся с некоторой проблемой. Я хочу убедиться, что данные под каждым заголовком (например, Cert Issued (30)
) сгруппированы с соответствующим заголовком.Сброс строки информации под заголовком таблицы, когда теги html для строки не вложены под тегом заголовка
Проблема возникает, когда я пытаюсь работать с html ниже.
<tr>
<td>
<table class="EPSBResultGrid" cellspacing="0" rules="cols" border="1" style="border-color:DarkGray;border-collapse:collapse;">
<tbody><tr class="EPSBResultGridHeader">
<th scope="col">Cred</th><th scope="col">Description</th><th scope="col">Effective</th><th scope="col">Expiration</th><th scope="col">Restricted To</th>
</tr><tr class="EPSBResultGridHeader">
<td colspan="9" style="border-width:1px;border-style:solid;font-weight:bold;">Do Not Print (00)</td>
</tr><tr class="EPSBResultGridItem">
<td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_CRED_CODE">RANK1</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_CRED_DESC">Rank I</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl03$EFF_DATE_txtDateMM">07<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl03_EFF_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl03_EFF_DATE_txtDateMM" value="07"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl03$EFF_DATE_txtDateDD">01<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl03_EFF_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl03_EFF_DATE_txtDateDD" value="01"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl03$EFF_DATE_txtDateYYYY">2000<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl03_EFF_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl03_EFF_DATE_txtDateYYYY" value="2000"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl01" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl02" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl03" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl04" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl04" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl03$EXP_DATE_txtDateMM">06<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl03_EXP_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl03_EXP_DATE_txtDateMM" value="06"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl03$EXP_DATE_txtDateDD">30<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl03_EXP_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl03_EXP_DATE_txtDateDD" value="30"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl03$EXP_DATE_txtDateYYYY">2020<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl03_EXP_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl03_EXP_DATE_txtDateYYYY" value="2020"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl06" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl07" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl08" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl09" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl03_ctl09" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl03_ORG_NAME"></span></td>
</tr><tr class="EPSBResultGridHeader">
<td colspan="9" style="border-width:1px;border-style:solid;font-weight:bold;">Cert Issued (30)</td>
</tr><tr class="EPSBResultGridAlternatingItem">
<td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_CRED_CODE">G20</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_CRED_DESC">Middle School Teaching Field: Social Studies</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl05$EFF_DATE_txtDateMM">07<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl05_EFF_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl05_EFF_DATE_txtDateMM" value="07"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl05$EFF_DATE_txtDateDD">01<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl05_EFF_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl05_EFF_DATE_txtDateDD" value="01"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl05$EFF_DATE_txtDateYYYY">1995<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl05_EFF_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl05_EFF_DATE_txtDateYYYY" value="1995"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl01" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl02" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl03" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl04" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl04" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl05$EXP_DATE_txtDateMM">06<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl05_EXP_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl05_EXP_DATE_txtDateMM" value="06"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl05$EXP_DATE_txtDateDD">30<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl05_EXP_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl05_EXP_DATE_txtDateDD" value="30"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl05$EXP_DATE_txtDateYYYY">2020<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl05_EXP_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl05_EXP_DATE_txtDateYYYY" value="2020"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl06" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl07" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl08" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl09" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl05_ctl09" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl05_ORG_NAME"></span></td>
</tr><tr class="EPSBResultGridItem">
<td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_CRED_CODE">G71</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_CRED_DESC">Middle School Teaching Field: Mathematics</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl06$EFF_DATE_txtDateMM">07<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl06_EFF_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl06_EFF_DATE_txtDateMM" value="07"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl06$EFF_DATE_txtDateDD">01<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl06_EFF_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl06_EFF_DATE_txtDateDD" value="01"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl06$EFF_DATE_txtDateYYYY">1995<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl06_EFF_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl06_EFF_DATE_txtDateYYYY" value="1995"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl01" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl02" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl03" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl04" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl04" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl06$EXP_DATE_txtDateMM">06<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl06_EXP_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl06_EXP_DATE_txtDateMM" value="06"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl06$EXP_DATE_txtDateDD">30<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl06_EXP_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl06_EXP_DATE_txtDateDD" value="30"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl06$EXP_DATE_txtDateYYYY">2020<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl06_EXP_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl06_EXP_DATE_txtDateYYYY" value="2020"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl06" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl07" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl08" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl09" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl06_ctl09" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl06_ORG_NAME"></span></td>
</tr><tr class="EPSBResultGridAlternatingItem">
<td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_CRED_CODE">PCS</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_CRED_DESC">Provisional Certificate For Guidance Counselor, Secondary Grades 5-12</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl07$EFF_DATE_txtDateMM">01<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl07_EFF_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl07_EFF_DATE_txtDateMM" value="01"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl07$EFF_DATE_txtDateDD">01<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl07_EFF_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl07_EFF_DATE_txtDateDD" value="01"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl07$EFF_DATE_txtDateYYYY">2016<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl07_EFF_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl07_EFF_DATE_txtDateYYYY" value="2016"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl01" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl02" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl03" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl04" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl04" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl07$EXP_DATE_txtDateMM">06<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl07_EXP_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl07_EXP_DATE_txtDateMM" value="06"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl07$EXP_DATE_txtDateDD">30<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl07_EXP_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl07_EXP_DATE_txtDateDD" value="30"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl07$EXP_DATE_txtDateYYYY">2020<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl07_EXP_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl07_EXP_DATE_txtDateYYYY" value="2020"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl06" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl07" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl08" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl09" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl07_ctl09" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl07_ORG_NAME"></span></td>
</tr><tr class="EPSBResultGridItem">
<td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_CRED_CODE">PMBF</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_CRED_DESC">Provisional Certificate For Teaching In The Middle Grades 5-8 (And For Other Assignments As Identified By Kentucky Program Of Studies)</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl08$EFF_DATE_txtDateMM">07<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl08_EFF_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl08_EFF_DATE_txtDateMM" value="07"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl08$EFF_DATE_txtDateDD">01<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl08_EFF_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl08_EFF_DATE_txtDateDD" value="01"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl08$EFF_DATE_txtDateYYYY">2015<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl08_EFF_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl08_EFF_DATE_txtDateYYYY" value="2015"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl01" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl02" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl03" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl04" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl04" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><nobr><span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl08$EXP_DATE_txtDateMM">06<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl08_EXP_DATE_txtDateMM" name="ctl00_ContentPlaceHolder1_ctl00_ctl08_EXP_DATE_txtDateMM" value="06"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl08$EXP_DATE_txtDateDD">30<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl08_EXP_DATE_txtDateDD" name="ctl00_ContentPlaceHolder1_ctl00_ctl08_EXP_DATE_txtDateDD" value="30"></span>-<span class="" id="ctl00$ContentPlaceHolder1$ctl00$ctl08$EXP_DATE_txtDateYYYY">2020<input type="hidden" id="ctl00_ContentPlaceHolder1_ctl00_ctl08_EXP_DATE_txtDateYYYY" name="ctl00_ContentPlaceHolder1_ctl00_ctl08_EXP_DATE_txtDateYYYY" value="2020"></span></nobr><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl06" style="color:Red;display:none;">You must enter a day.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl07" style="color:Red;display:none;">You must enter a month.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl08" style="color:Red;display:none;">You must enter a year.</span><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl09" evaluationfunction="cb_verifydate_ctl00_ContentPlaceHolder1_ctl00_ctl08_ctl09" style="color:Red;visibility:hidden;">Invalid Date.</span></td><td><span id="ctl00_ContentPlaceHolder1_ctl00_ctl08_ORG_NAME"></span></td>
</tr>
</tbody></table><br><span><span>Note: Suspended and revoked credentials are shown with red text with a strike through line.</span></span>
</td>
</tr>
инфо (class="EPSBResultGridItem"
& class="EPSBResultGridAlternatingItem"
) под заголовками (class="EPSBResultHeader"
) не являются вложенными под ними, и в результате у меня были проблемы с поиском пути, чтобы убедиться, что информация по каждому заголовку группируется с правильным заголовком.
Это мой код:
count = 0
header = tree.xpath(
'.//table/tr[@class="EPSBResultGridHeader"]')
difference = 10 - len(header)
for i in range(0, difference):
header.append('')
for license_row in header:
count = count + 1
try:
header_data = license_row.xpath(".//text()")
header_data = clean(header_data)
nested_data = license_row.xpath(".//following-sibling::tr//text()")
nested_data = clean(nested_data)
print count, header_data
print count, nested_data
except AttributeError:
header_data = ''
# Append licensure data
if count == 1:
lheader1.append(header_data)
lheader_info1(nested_data)
if count == 2:
lheader2.append(header_data)
lheader_info2(nested_data)
if count == 3:
lheader3.append(header_data)
lheader_info3(nested_data)
if count == 4:
lheader4.append(header_data)
lheader_info4(nested_data)
if count == 5:
lheader5.append(header_data)
lheader_info5(nested_data)
Моя конечная цель иметь такой вывод:
>>>print lheader_info2
['RANK1', 'Rank I', '07-01-2018', '06-30-2021']
>>>print lheader_info3
['G20', 'Middle School Teaching Field: Social Studies----', 'G30', 'Middle School Teaching Field: English And Communications----', 'ILE2', 'Professional Certificate For Instructional Leadership -- Early Elementary School Principal, Grades K-4; Level II', '07-01-2017', '06-30-2021', 'ILM2', 'Professional Certificate For Instructional Leadership--Middle Grade School Principal, Grades 5-8; Level II', '07-01-2017', '06-30-2021', 'ILV2', 'Professional Certificate For Instructional Leadership--Supervisor Of Instruction, Grades K-12; Level II', '07-01-2018', '06-30-2021', 'PMBF', 'Provisional Certificate For Teaching In The Middle Grades 5-8 (And For Other Assignments As Identified By Kentucky Program Of Studies)', '07-01-2016', '06-30-2021']
Я использую lxml
, но я также использовал BeautifulSoup
если это кажется лучший способ сделать это.
Я думаю, что вы хотите 'данные [header_name] .extend ([td.get_text (strip = True) для td в строке.find_all ("td")]) ' –
@PadraicCunningham для всех ячеек, да, просто хотел дать образец того, как «группировать» строки в основном. Благодаря! – alecxe
@PadraicCunningham Я не решался назвать ни один из ваших ответов в качестве принятого ответа, так как оба они превосходны, но я закончил использование alecxe's. Я узнал больше, пытаясь понять оба ваших ответа, чем в течение недели. Еще раз спасибо! – otteheng