Parsing a text file with custom tags

I am trying to parse txt files from EDGAR, but different filing types come in different report formats, even though they are all txt files. I have no trouble using BeautifulSoup to parse the XML reports, but I ran into this type of report:
<SEC-DOCUMENT>0001047469-13-001017.txt : 20130214
<SEC-HEADER>0001047469-13-001017.hdr.sgml : 20130214
<ACCEPTANCE-DATETIME>20130214060031
ACCESSION NUMBER: 0001047469-13-001017
CONFORMED SUBMISSION TYPE: 13F-HR
PUBLIC DOCUMENT COUNT: 1
CONFORMED PERIOD OF REPORT: 20121231
FILED AS OF DATE: 20130214
DATE AS OF CHANGE: 20130214
EFFECTIVENESS DATE: 20130214
FILER:
COMPANY DATA:
COMPANY CONFORMED NAME: BILL & MELINDA GATES FOUNDATION TRUST
CENTRAL INDEX KEY: 0001166559
IRS NUMBER: 911663695
STATE OF INCORPORATION: WA
FISCAL YEAR END: 1231
FILING VALUES:
FORM TYPE: 13F-HR
SEC ACT: 1934 Act
SEC FILE NUMBER: 028-10098
FILM NUMBER: 13605999
BUSINESS ADDRESS:
STREET 1: 2365 CARILLON POINT
CITY: KIRKLAND
STATE: WA
ZIP: 98033
BUSINESS PHONE: 4258897900
MAIL ADDRESS:
STREET 1: 2365 CARILLON POINT
CITY: KIRKLAND
STATE: WA
ZIP: 98033
FORMER COMPANY:
FORMER CONFORMED NAME: GATES BILL & MELINDA FOUNDATION
DATE OF NAME CHANGE: 20020205
</SEC-HEADER>
<DOCUMENT>
<TYPE>13F-HR
<SEQUENCE>1
<FILENAME>a2212666z13f-hr.txt
<DESCRIPTION>13F-HR
<TEXT>
<Page>
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
WASHINGTON, D.C. 20549
FORM 13F
FORM 13F COVER PAGE
Report for the Calendar Year or Quarter Ended: December 31, 2012
-----------------------
Check Here if Amendment//; Amendment Number:
---------
This Amendment (Check only one.): // is a restatement.
// adds new holdings entries.
Institutional Investment Manager Filing this Report:
Name: Bill & Melinda Gates Foundation Trust
-------------------------------------
Address: 2365 Carillon Point
-------------------------------------
Kirkland, WA 98033
-------------------------------------
Form 13F File Number: 28-10098
---------------------
The institutional investment manager filing this report and the person by whom
it is signed hereby represent that the person signing the report is authorized
to submit it, that all information contained herein is true, correct and
complete, and that it is understood that all required items, statements,
schedules, lists, and tables, are considered integral parts of this form.
Person Signing this Report on Behalf of Reporting Manager:
Name: Michael Larson
-------------------------------
Title: Authorized Agent
-------------------------------
Phone: (425) 889-7900
-------------------------------
Signature, Place, and Date of Signing:
/s/ Michael Larson Kirkland, Washington February 14, 2013
------------------------------- -------------------- -----------------
[Signature] [City, State] [Date]
Report Type (Check only one.):
/X/ 13F HOLDINGS REPORT. (Check here if all holdings of this reporting
manager are reported in this report.)
// 13F NOTICE. (Check here if no holdings reported are in this report,
and all holdings are reported by other reporting manager(s).)
// 13F COMBINATION REPORT. (Check here if a portion of the holdings for this
reporting manager are reported in this report and a portion are reported by
other reporting manager(s).)
<Page>
FORM 13F SUMMARY PAGE
Report Summary:
Number of Other Included Managers: 0
--------------------
Form 13F Information Table Entry Total: 26
--------------------
Form 13F Information Table Value Total: $ 16,788,719
--------------------
(thousands)
List of Other Included Managers:
Provide a numbered list of the name(s) and Form 13F file number(s) of all
institutional investment managers with respect to which this report is filed,
other than the manager filing this report.
NONE
2
<Page>
FORM 13 INFORMATION TABLE
As of December 31, 2012
<Table>
<Caption>
VOTING AUTHORITY
VALUE SHRS OR SH/ PUT/ INVESTMENT OTHER ----------------------
NAME OF ISSUER TITLE OF CLASS CUSIP (x$1000) PRN AMOUNT PRN CALL DISCRETION MANAGERS SOLE SHARED NONE
---------------------------- ---------------- --------- ---------- ------------ --- ---- ---------- -------- ---------- ------ ----
<S> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C>
AUTOLIV INC COM 052800109 8,329 123,600 SH SOLE 123,600
AUTONATION INC COM 05329W102 75,379 1,898,716 SH SOLE 1,898,716
BERKSHIRE HATHAWAY INC DEL CL B NEW 084670702 7,811,199 87,081,373 SH SOLE 87,081,373
BP PLC SPONSORED ADR 055622104 297,018 7,133,000 SH SOLE 7,133,000
CANADIAN NATL RY CO COM 136375102 779,358 8,563,437 SH SOLE 8,563,437
CATERPILLAR INC DEL COM 149123101 919,168 10,260,857 SH SOLE 10,260,857
COCA COLA CO COM 191216100 1,232,573 34,002,000 SH SOLE 34,002,000
COCA COLA FEMSA SAB DE CV SPON ADR REP L 191241108 926,242 6,214,719 SH SOLE 6,214,719
CROWN CASTLE INTL CORP COM 228227104 384,822 5,332,900 SH SOLE 5,332,900
DIAMOND FOODS INC COM 252603105 6,031 441,163 SH SOLE 441,163
ECOLAB INC COM 278865100 313,946 4,366,425 SH SOLE 4,366,425
EXXON MOBIL CORP COM 30231G102 661,576 7,643,858 SH SOLE 7,643,858
FEDEX CORP COM 31428X106 277,453 3,024,999 SH SOLE 3,024,999
FOMENTO ECONOMICO MEXICANO SPON ADR UNITS 344419106 21,953 218,000 SH SOLE 218,000
GRUPO TELEVISA SA SPON ADR REP ORD 40049J206 448,647 16,879,103 SH SOLE 16,879,103
LIBERTY GLOBAL INC COM SER A 530555101 133,508 2,119,515 SH SOLE 2,119,515
LIBERTY GLOBAL INC COM SER C 530555309 41,507 706,507 SH SOLE 706,507
MCDONALDS CORP COM 580135101 870,853 9,872,500 SH SOLE 9,872,500
ORBOTECH LTD ORD M75253100 6,973 823,300 SH SOLE 823,300
PROCTER & GAMBLE CO COM 742718109 101,835 1,500,000 SH SOLE 1,500,000
REPUBLIC SVCS INC COM 760759100 39,596 1,350,000 SH SOLE 1,350,000
SIGNET JEWELERS LIMITED SHS G81276100 9,993 187,130 SH SOLE 187,130
TOYOTA MOTOR CORP SP ADR REP2COM 892331307 14,295 153,300 SH SOLE 153,300
WAL-MART STORES INC COM 931142103 757,558 11,103,000 SH SOLE 11,103,000
WASTE MGMT INC COM 94106L109 628,700 18,633,672 SH SOLE 18,633,672
WILLIS GROUP HOLDINGS PUBLIC SHS G96666105 20,209 602,700 SH SOLE 602,700
---------- ------------
16,788,719 240,235,774
</Table>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
As you can see, this file is just a txt file with custom tags.
My question: how do I target the text within a specific tag? For example, I only need the text inside the TEXT tag of the txt file above.
"I only need the text inside the tag of the txt file above." I don't understand. Which part of the input are you interested in?
My bad, I had put "<>" around TEXT. I have fixed that. So, say I need the data inside the TEXT tag. How do I do that? Thanks.
Are you sure this is the complete input? It is not valid XML.
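Since the file is not valid XML (tags such as <TYPE> and <SEQUENCE> are never closed), one possible approach is to skip the XML parser and pull out the <TEXT> ... </TEXT> span with a regular expression. The following is only a minimal sketch under that assumption: the function name `extract_text_blocks` and the shortened `sample` string are made up for illustration, and it assumes every <TEXT> tag has a matching </TEXT>, as in the file above.

```python
import re

# Non-greedy match so that each <TEXT>...</TEXT> pair is captured separately;
# DOTALL lets "." span line breaks, IGNORECASE tolerates <Text> or <text>.
TEXT_BLOCK = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def extract_text_blocks(raw: str) -> list[str]:
    """Return the contents of every <TEXT>...</TEXT> block, whitespace-trimmed."""
    return [block.strip() for block in TEXT_BLOCK.findall(raw)]


# Shortened stand-in for the submission shown in the question.
sample = """<DOCUMENT>
<TYPE>13F-HR
<SEQUENCE>1
<TEXT>
<Page>
FORM 13F COVER PAGE
</TEXT>
</DOCUMENT>"""

for block in extract_text_blocks(sample):
    print(block)
```

For a real filing, read the whole file into a string first (for example with `open(path).read()`, where `path` is wherever you saved the submission) and pass that string to the function. A submission can contain several <DOCUMENT> sections, which is why the sketch returns a list rather than a single string.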