2016-05-30 3 views
0

PHP preg_split or preg_match sentences, but keep punctuation in an array

I am looking to break a paragraph into sentences and then into "exploded" strings, but I need to keep the punctuation marks as array elements.

Sample text:

$meta = 'I am looking to break this paragraph into chunks.
     I have researched, tried and tested various combinations; however, I cannot
     seem to make it work. Would anyone help me figure this out?
     I thank you in advance...';

The desired result would be:

Array ([0] => 
      Array ([0] => I [1] => am [2] => looking [3] => to [4] => break [5] => this [6] => paragraph [7] => into [8] => chunks [9] => .) 
     [1] =>  
      Array ([0] => I [1] => have [2] => researched [3] => , [4] => tried [...... 
      ......] [5] => figure [6] => this [7] => out [8] => ?) 
     [3] => 
      Array ([0] => I [1] => thank [2] => you [3] => in [4] => advance [5] => ...) 
    ) 

I tried using:

$s = preg_split('/\s*[!?.]\s*/u', $meta, -1, PREG_SPLIT_NO_EMPTY); 

to separate the sentences. It does split them, but the punctuation disappears.

I would really appreciate help with creating this two-level array with the punctuation kept.

Answer

1

You could do what you want using preg_match_all:

$meta = 'I am looking to break this paragraph into chunks. 
     I have researched, tried and tested various combinations; however, I cannot 
     seem to make it work. Would anyone help me figure this out? 
     I thank you in advance...'; 

preg_match_all('/(\w+|[.;?,]+)/', $meta, $m); 
print_r($m); 

Explanation:

/   : regex delimiter 
    (  : begin group 1 
    \w+  : 1 or more word characters <=> [a-zA-Z0-9_] 
    |  : OR 
    [.;?,]+ : 1 or more punctuation characters 
)   : end of group 1 
/   : regex delimiter 

This will match, and store in group 1, every word and every run of punctuation characters.
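Because group 1 spans the whole pattern, the full-match list (index 0) and the group 1 list (index 1) always hold the same tokens. A minimal sketch on a made-up string (`$text`, `$withGroup` and `$withoutGroup` are illustrative names, not from the question) suggests the parentheses can be dropped if only one list is wanted:

```php
<?php
$text = 'Hello, world...';

// With the capturing group: index 0 (full match) and index 1 (group 1) are identical.
preg_match_all('/(\w+|[.;?,]+)/', $text, $withGroup);

// Without the group: only the full-match list at index 0 is returned.
preg_match_all('/\w+|[.;?,]+/', $text, $withoutGroup);

print_r($withoutGroup[0]);
```

Either form yields the same tokens; the version without the group simply avoids the duplicated list.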

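If you want to be Unicode compatible, you could use \p{L} for any letter and \p{P} for any punctuation character, together with the u modifier so that the pattern and the subject are treated as UTF-8. Note that, unlike \w, \p{L} does not match digits or the underscore:

/(\p{L}+|\p{P}+)/u 

Output: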
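As a rough illustration of the Unicode variant, the sketch below runs it on a made-up non-ASCII string (`$text` is an assumption, not from the question). It assumes the source file and the input are UTF-8 and uses the `u` modifier, without which PCRE would process the subject byte by byte:

```php
<?php
// Assumed sample input; any UTF-8 text would do.
$text = 'Привет, мир! Ça va?';

// \p{L}+ matches runs of letters in any script, \p{P}+ matches runs of punctuation.
// The u modifier makes PCRE treat pattern and subject as UTF-8.
preg_match_all('/\p{L}+|\p{P}+/u', $text, $m);

print_r($m[0]);
```

If numbers should be kept as tokens too, adding `\p{N}` to the letter alternative (for example `[\p{L}\p{N}]+`) would be one option.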

Array 
(
    [0] => Array 
     (
      [0] => I 
      [1] => am 
      [2] => looking 
      [3] => to 
      [4] => break 
      [5] => this 
      [6] => paragraph 
      [7] => into 
      [8] => chunks 
      [9] => . 
      [10] => I 
      [11] => have 
      [12] => researched 
      [13] => , 
      [14] => tried 
      [15] => and 
      [16] => tested 
      [17] => various 
      [18] => combinations 
      [19] => ; 
      [20] => however 
      [21] => , 
      [22] => I 
      [23] => cannot 
      [24] => seem 
      [25] => to 
      [26] => make 
      [27] => it 
      [28] => work 
      [29] => . 
      [30] => Would 
      [31] => anyone 
      [32] => help 
      [33] => me 
      [34] => figure 
      [35] => this 
      [36] => out 
      [37] => ? 
      [38] => I 
      [39] => thank 
      [40] => you 
      [41] => in 
      [42] => advance 
      [43] => ... 
     ) 

    [1] => Array 
     (
      [0] => I 
      [1] => am 
      [2] => looking 
      [3] => to 
      [4] => break 
      [5] => this 
      [6] => paragraph 
      [7] => into 
      [8] => chunks 
      [9] => . 
      [10] => I 
      [11] => have 
      [12] => researched 
      [13] => , 
      [14] => tried 
      [15] => and 
      [16] => tested 
      [17] => various 
      [18] => combinations 
      [19] => ; 
      [20] => however 
      [21] => , 
      [22] => I 
      [23] => cannot 
      [24] => seem 
      [25] => to 
      [26] => make 
      [27] => it 
      [28] => work 
      [29] => . 
      [30] => Would 
      [31] => anyone 
      [32] => help 
      [33] => me 
      [34] => figure 
      [35] => this 
      [36] => out 
      [37] => ? 
      [38] => I 
      [39] => thank 
      [40] => you 
      [41] => in 
      [42] => advance 
      [43] => ... 
     ) 

) 
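The output above is a flat list, whereas the question asks for one sub-array per sentence. One possible way to get there, sketched here under the assumption that a sentence ends at a run of `.`, `!` or `?` followed by whitespace, is to split into sentences first and then tokenize each one. The variable names `$sentences` and `$result` are illustrative:

```php
<?php
$meta = 'I am looking to break this paragraph into chunks.
     I have researched, tried and tested various combinations; however, I cannot
     seem to make it work. Would anyone help me figure this out?
     I thank you in advance...';

// Step 1: split on whitespace that follows sentence-ending punctuation.
// The lookbehind (?<=[.!?]) is zero-width, so the punctuation stays with its sentence.
$sentences = preg_split('/(?<=[.!?])\s+/', $meta, -1, PREG_SPLIT_NO_EMPTY);

// Step 2: tokenize each sentence into words and runs of punctuation.
$result = [];
foreach ($sentences as $sentence) {
    preg_match_all('/\w+|[.;?,!]+/', $sentence, $m);
    $result[] = $m[0];
}

print_r($result);
```

This simple rule will also split after abbreviations such as "e.g." when they are followed by a space, so real-world text may need a more careful sentence boundary check.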
+0

Thanks @Toto, that worked. If you get a chance to explain what you wrote, to help me understand and learn, I would be grateful. – Jacob

+1

@Jacob: see my edit. – Toto

+0

Thanks for taking the time to explain this! – Jacob
