2016-06-21 2 views
-1

Process time series data for every second in R

I have data that look like the sample below:

# Data Sample 
Time    Price V1   Time2 V2 
2016-06-20 05:09:44 2086.50 1 05:09:44.284670 -1 
2016-06-20 05:09:45 2086.75 5 05:09:45.212413 1 
2016-06-20 05:09:45 2086.75 10 05:09:45.212413 1 
2016-06-20 05:09:45 2086.75 1 05:09:45.212413 1 
2016-06-20 05:09:46 2086.75 1 05:09:46.745124 1 
2016-06-20 05:09:46 2086.75 1 05:09:46.745124 1 
2016-06-20 05:09:46 2086.75 1 05:09:46.819954 1 
2016-06-20 05:09:49 2086.75 1 05:09:49.279392 1 
2016-06-20 05:09:49 2086.75 1 05:09:49.279392 1 
2016-06-20 05:09:49 2086.75 1 05:09:49.352346 1 
2016-06-20 05:09:49 2086.50 2 05:09:49.964023 -1 
2016-06-20 05:09:49 2086.50 1 05:09:49.964023 -1 
2016-06-20 05:09:55 2086.50 1 05:09:55.343324 -1 
2016-06-20 05:09:57 2086.75 1 05:09:57.551886 1 
2016-06-20 05:09:57 2086.75 1 05:09:57.650549 1 
2016-06-20 05:09:57 2086.75 1 05:09:57.654352 1 
2016-06-20 05:09:57 2086.75 1 05:09:57.654352 1 
2016-06-20 05:09:57 2086.75 1 05:09:57.726578 1 

I want to clean the data by summing all V1 values that fall within the same second. Seconds with no observations should appear with a value of 0, so my desired output would look like this:

# Desired Example 
Time    V1  
2016-06-20 05:09:44 1 
2016-06-20 05:09:45 16 
2016-06-20 05:09:46 3 
2016-06-20 05:09:47 0 
2016-06-20 05:09:48 0 
2016-06-20 05:09:49 6 
2016-06-20 05:09:50 0 
2016-06-20 05:09:51 0 
2016-06-20 05:09:52 0 
2016-06-20 05:09:53 0 
2016-06-20 05:09:54 0 
2016-06-20 05:09:55 1 
2016-06-20 05:09:56 0 
2016-06-20 05:09:57 5 

I converted the Time column to character, split it, and processed the pieces in a list. However, the data set is very large and this takes far too long to compute. Is there a way to do this, perhaps with some function in zoo?

Below is a similar data set, produced with dput:

structure(list(V3 = c(2086.5, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.5, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75, 2086.75), V4 = c(1L, 5L, 10L, 1L, 6L, 8L, 1L, 4 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 8L, 1L, 1L, 1L, 4L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), V6 = c("05:09:44.284670", "05:09:45.212413", "05:09:45.212413", "0: 0: ", "05:09:45.301513", "05:09:45.301513", "05:09:45.389110", "05:09:45.392840", "05:09:45.475688", "05:09:45.543980", "05:09:46.745124", "05:09:46.745124", "05:09:46.819954", "05:09:49.279392", "05:09:49.279392", "05:09:49.352346", "05:09:49.964023", "05:09:49.964023", "05:09:49.964023", "05:09:49.964023", "05:09:55.343324", "05:09:57.551886", "05:09:57.650549", "05:09:57.654352", "05:09:57.654352", "05:09:57.726578", "05:09:57.728848", "05:09:58.286788", "05:10:00.390708", "05:10:00.473617", "05:10:00.494903", "05:10:00.564042", "05:10:08.24907", "05:10:09.633247", "05:10:09.633247", "05:10:09.633247", "05:10:09.633247", "05:10:09.633247", "05:10:09.830544", "05:10:09.924001", "05:10:09.924001"), V7 = c(-1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, -1L, -1L, -1L, -1L, -1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("V3", "V4", "V6", "V7"), row.names = c(NA, 50L), class = "data.frame")

+0

Isn't 'df %>% group_by(Time) %>% summarise(V1_new = sum(V1))' fast enough? – Sumedh

+0

I am using R for this. Is that R? – jay2020

+0

Yes, I should have mentioned that. Use the 'dplyr' package. – Sumedh

Answer

0

data.table is very fast. Try:

library(data.table)
library(lubridate)

# Convert to a data.table and parse the timestamps
mydata <- as.data.table(mydata)
mydata[, Time := ymd_hms(Time)]

# Sum V1 within each second
mydata.summed <- mydata[, .(V1 = sum(V1)), by = Time]

# Build one row per second over the observed range,
# including the seconds you do not have values for
mydata2 <- data.table(Time = seq(min(mydata$Time), max(mydata$Time), by = 1))

# Join the sums onto the full grid of seconds; see ?data.table for details
mydata <- mydata.summed[mydata2, on = "Time"]

# Change the NAs that were created by the join to 0
mydata[is.na(V1), V1 := 0]

head(mydata, 10)

        Time V1 
1: 2016-06-20 05:09:44 1 
2: 2016-06-20 05:09:45 16 
3: 2016-06-20 05:09:46 3 
4: 2016-06-20 05:09:47 0 
5: 2016-06-20 05:09:48 0 
6: 2016-06-20 05:09:49 6 
7: 2016-06-20 05:09:50 0 
8: 2016-06-20 05:09:51 0 
9: 2016-06-20 05:09:52 0 
10: 2016-06-20 05:09:53 0
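
If installing packages is not an option, the same three steps (sum per second, build a complete grid of seconds, join and fill with 0) can be sketched in base R. This is only an illustrative alternative, not part of the answer above. The data frame below is typed in by hand from the rows shown in the question, and the names `summed`, `grid` and `result`, as well as the choice of `tz = "UTC"`, are assumptions made for the example.

```r
# Rows from the question's sample: one timestamp (whole seconds) and V1 per row
mydata <- data.frame(
  Time = c("2016-06-20 05:09:44",
           rep("2016-06-20 05:09:45", 3),
           rep("2016-06-20 05:09:46", 3),
           rep("2016-06-20 05:09:49", 5),
           "2016-06-20 05:09:55",
           rep("2016-06-20 05:09:57", 5)),
  V1 = c(1, 5, 10, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Parse the timestamps (the time zone is an assumption for this sketch)
mydata$Time <- as.POSIXct(mydata$Time, tz = "UTC")

# Step 1: sum V1 within each second
summed <- aggregate(V1 ~ Time, data = mydata, FUN = sum)

# Step 2: one row per second between the first and last observation
grid <- data.frame(Time = seq(min(mydata$Time), max(mydata$Time), by = "1 sec"))

# Step 3: keep every second of the grid, then turn the unmatched NAs into 0
result <- merge(grid, summed, by = "Time", all.x = TRUE)
result$V1[is.na(result$V1)] <- 0

print(result)
```

On large inputs the data.table version is likely to be considerably faster; the base R sketch is mainly useful for checking the logic on a small sample.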