Huge problem parsing this "simple" HTML page

I am trying to parse http://www.google.com/finance?q=INDEXDJX:.DJI, but I cannot get it to work and I cannot figure out why:

symbol_list: ["GOOG" "AAPL" "MSFT" "INDEXDJX:.DJI"]
foreach symbol symbol_list [
    url0: rejoin [http://www.google.com/finance/historical?q= symbol]
    ;stock-data: read/lines url
    dir: make-dir/deep to-rebol-file "askpoweruser/stock-download/google/"
    either none? filename: find symbol ":" [
        ; symbol without a ":" prefix: request the CSV export
        filename: symbol
        url: rejoin [url0 "&output=csv"]
        content: read url
        out-string: copy rejoin ["Time;Open;High;Low;Close;Volume" newline]
        reversed-quotes: reverse parse/all content ",^/"

        foreach [v c l h o d] reversed-quotes [
            ; rows whose date field does not convert are skipped
            if not error? try [d: to-date d] [
                d: rejoin [d/year "-" d/month "-" d/day]
                append out-string rejoin [d ";" o ";" h ";" l ";" c ";" v newline]
            ]
        ]

        write to-rebol-file rejoin [dir symbol ".csv"] out-string
    ][
        ; symbol with a ":" prefix: scrape the HTML table instead
        filename: next next filename    ; ":.DJI" -> "DJI"
        out: copy []
        for i 0 1 1 [
            p: i
            url: rejoin [url0 "&start=" (p * 200) "&num=" ((p + 1) * 200)]
            content: read url
            ; copy the contents of the third <table> on the page into quotes
            rule: [
                to "<table" thru "<table" to ">" thru ">"
                to "<table" thru "<table" to ">" thru ">"
                to "<table" thru "<table" to ">" thru ">"
                copy quotes to </table> to end
            ]
            parse content rule

            parse load/markup quotes [
                some [
                    set tag tag! (probe tag)
                    | set x string! (
                        if (not none? tag) [
                            ; left-range does not appear to be a REBOL built-in;
                            ; it is presumably a helper defined elsewhere
                            if ((left-range tag 3) = "<td") [
                                replace/all (replace/all x "^/" "") "," ""
                                append out x
                            ]
                        ]
                    )
                ]
            ]
            ;write/lines to-rebol-file rejoin [dir filename "_" p ".html"] quotes
        ]

        write to-rebol-file rejoin [dir filename "_temp" ".txt"] mold out
        remove/part out 2
        out-string: copy rejoin ["Time;Open;High;Low;Close;Volume" newline]
        out: reverse out
        insert/only out "" 1

        foreach [x v c l h o d] out [
            if not error? try [d: to-date d] [
                d: rejoin [d/year "-" d/month "-" d/day]
                append out-string rejoin [d ";" o ";" h ";" l ";" c ";" v newline]
                probe d
            ]
        ]
        write/lines to-rebol-file rejoin [dir filename ".csv"] out-string
    ]
]


In the end I did it differently (see my own answer below), using parse directly instead of load/markup. load/markup looked like the easier route at first, but Google's HTML does not seem very friendly to it, so I changed my mind:

parse quotes [
    some [to "<td" thru "<td" to ">" thru ">" [copy x to "<" | copy x to end] (append out replace/all x "^/" "")]
    to end
]

Sample output:


Why use [to "<table" thru "<table"]? It is the same as [thru "<table"].
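To illustrate the point of this comment, here is a minimal sketch for the REBOL 2 console. The `html` string and the marker words `a` and `b` are made up for the example and do not come from the question. It records the input position reached by each rule so the two can be compared.

```rebol
REBOL [Title: "to/thru versus thru (illustrative sketch)"]

; Hypothetical input, standing in for the downloaded page
html: {<p>intro</p><table class="x"><tr><td>1</td></tr></table>}

; Rule as written in the question: seek to the tag, then step past it
parse/all html [to "<table" thru "<table" a: to end]

; Shorter rule: thru alone seeks the tag and steps past it
parse/all html [thru "<table" b: to end]

; Compare the positions that the two rules reached
print (index? a) = (index? b)
```

If both rules stop at the same index, the `to "<table"` part adds nothing, and the same reasoning applies to `to ">" thru ">"`.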



In the end I gave up on load/markup and used parse directly; now it works:

symbol_list: ["GOOG" "AAPL" "MSFT" "INDEXDJX:.DJI"]
foreach symbol symbol_list [
    url0: rejoin [http://www.google.com/finance/historical?q= symbol]
    ;stock-data: read/lines url
    dir: make-dir/deep to-rebol-file "askpoweruser/stock-download/google/"
    either none? filename: find symbol ":" [
        ; symbol without a ":" prefix: request the CSV export
        filename: symbol
        url: rejoin [url0 "&output=csv"]
        content: read url
        out-string: copy rejoin ["Time;Open;High;Low;Close;Volume" newline]
        reversed-quotes: reverse parse/all content ",^/"

        foreach [v c l h o d] reversed-quotes [
            ; rows whose date field does not convert are skipped
            if not error? try [d: to-date d] [
                d: rejoin [d/year "-" d/month "-" d/day]
                append out-string rejoin [d ";" o ";" h ";" l ";" c ";" v newline]
            ]
        ]

        write to-rebol-file rejoin [dir symbol ".csv"] out-string
    ][
        ; symbol with a ":" prefix: scrape the HTML table instead
        filename: next next filename    ; ":.DJI" -> "DJI"
        out: copy []
        for i 0 1 1 [
            p: i
            url: rejoin [url0 "&start=" (p * 200) "&num=" ((p + 1) * 200)]
            content: read url
            ; copy the contents of the third <table> on the page into quotes
            rule: [
                to "<table" thru "<table" to ">" thru ">"
                to "<table" thru "<table" to ">" thru ">"
                to "<table" thru "<table" to ">" thru ">"
                copy quotes to </table> to end
            ]
            parse content rule

            ; collect the text of every <td> cell, with newlines removed
            parse quotes [
                some [to "<td" thru "<td" to ">" thru ">" [copy x to "<" | copy x to end] (append out replace/all x "^/" "")]
                to end
            ]
            ;write/lines to-rebol-file rejoin [dir filename "_" p ".html"] quotes
        ]

        write to-rebol-file rejoin [dir filename "_temp" ".txt"] mold out
        ;remove/part out 2
        out-string: copy rejoin ["Time;Open;High;Low;Close;Volume" newline]
        out: reverse out

        foreach [v c l h o d] out [
            ; split the date text on spaces and commas, then rebuild it as year-month-day
            d: parse/all d " ,"
            d: to-date rejoin [d/4 "-" d/1 "-" d/2]
            d: rejoin [d/year "-" d/month "-" d/day]
            append out-string rejoin [d ";" o ";" h ";" l ";" c ";" v newline]
        ]
        write to-rebol-file rejoin [dir filename ".csv"] out-string
    ]
]
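
To try the cell extraction and the date handling from the code above without hitting the network, something like the following sketch can be pasted into a REBOL 2 console. The HTML fragment, its class names and its values are invented for illustration; the markup Google actually serves may differ. The sketch uses parse/all and a plain `thru`, and reads the cells in page order rather than reversing them, so it is a variation on the code above rather than a copy of it.

```rebol
REBOL [Title: "Cell extraction sketch (illustrative only)"]

; Hypothetical table row; the values are made up
quotes: {<tr><td class="lm">Feb 3, 2012
<td class="rgt">596.00
<td class="rgt">597.50
<td class="rgt">590.20
<td class="rgt">596.33
<td class="rgt rm">3,168,543
</tr>}

; Collect the text that follows each <td ...> tag, dropping newlines
out: copy []
parse/all quotes [
    some [
        thru "<td" thru ">"
        [copy x to "<" | copy x to end]
        (append out replace/all x "^/" "")
    ]
    to end
]

; Cells come out in page order: date, open, high, low, close, volume
foreach [d o h l c v] out [
    parts: parse/all d " ,"    ; split the date text on spaces and commas
    d: to-date rejoin [last parts "-" first parts "-" second parts]
    print rejoin [d/year "-" d/month "-" d/day ";" o ";" h ";" l ";" c ";" v]
]
```

Taking the year with `last parts` instead of `d/4` avoids depending on the empty string that the split produces between the comma and the following space. Note that `copy x to "<"` yields none for an empty cell, so real data may need a guard before `replace/all`.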
