2016-01-07 6 views
3

Aerospike sudden crash

I am running a 5-node cluster on version 3.7.0.2, and after a few hours of use all 5 instances crashed. I have seen other reports of crashes in this version. Should I upgrade to version 3.7.1? Will that fix the crash?

Linux aerospike2 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (Ubuntu 15.10)

Configuration:

# Aerospike database configuration file. 

service { 
    user root 
    group root 
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1. 
    pidfile /var/run/aerospike/asd.pid 
    service-threads 32 
    transaction-queues 32 
    transaction-threads-per-queue 32 
     batch-index-threads 32 
    proto-fd-max 15000 
     batch-max-requests 200000 
} 

logging { 
    # Log file must be an absolute path. 
    file /var/log/aerospike/aerospike.log { 
     context any info 
    } 
} 

network { 
    service { 
     address 10.240.0.6 
     port 3000 
    } 

    heartbeat { 
       mode mesh 
       address 10.240.0.6 # IP of the NIC on which this node is listening 
       mesh-seed-address-port 10.240.0.6 3002 
       mesh-seed-address-port 10.240.0.5 3002 

       port 3002 

     interval 150 
     timeout 10 
    } 

    fabric { 
     port 3001 
    } 

    info { 
     port 3003 
    } 
} 

namespace test { 
    replication-factor 10 
    memory-size 3500M 
    default-ttl 0 # 30 days, use 0 to never expire/evict. 
     ldt-enabled true 

    storage-engine device { 
      file /data/aerospike.dat 
      write-block-size 1M 
      filesize 300G 
      # data-in-memory true 
     } 
} 

Logs:

Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::3202) device /data/aerospike.dat: read complete: UNIQUE 13593274 (REPLACED 0) (GEN 63) (EXPIRED 0) (MAX-TTL 0) records 
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues 
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /data/aerospike.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /data/aerospike.dat init wblock free-q 220796, defrag-q 2 
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads 
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads 
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads 
Jan 07 2016 11:28:34 GMT: INFO (as): (as.c::457) initializing services... 
Jan 07 2016 11:28:34 GMT: INFO (tsvc): (thr_tsvc.c::819) shared queues: 32 queues with 32 threads each 
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2649) Sending 10.240.0.14 as the IP address for receiving heartbeats 
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2661) heartbeat socket initialization 
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2675) initializing mesh heartbeat socket : 10.240.0.14:3002 
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3454) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0 
Jan 07 2016 11:28:34 GMT: INFO (partition): (partition.c::3432) {test} 4096 partitions: found 0 absent, 4096 stored 
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3458) Paxos service ignited: bb90e00f00a0142 
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::609) Initialize batch-index-threads to 32 
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::635) Created JEMalloc arena #151 for batch normal buffers 
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::636) Created JEMalloc arena #152 for batch huge buffers 
Jan 07 2016 11:28:34 GMT: INFO (batch): (thr_batch.c::347) Initialize batch-threads to 4 
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::4147) {test} floor set at 1049 wblocks per device 
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3539) listening for other nodes (max 3000 milliseconds) ... 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.6:3002 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.5:3002 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.6:3002 (10.240.0.6:3002) via socket 60 from 10.240.0.14:55702 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.5:3002 (10.240.0.5:3002) via socket 61 from 10.240.0.14:40626 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.23:3002 (10.240.0.23:3002) via socket 62 from 10.240.0.14:42802 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.13:3002 (10.240.0.13:3002) via socket 63 from 10.240.0.14:35384 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90500f00a0142 principal node is bb91700f00a0142 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90600f00a0142 principal node is bb91700f00a0142 
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90500f00a0142 arrived 
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90600f00a0142 arrived 
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3547) ... other node(s) detected - node will operate in a multi-node cluster 
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90500f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142 
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90600f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142 
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #8 for thr_demarshal() 
Jan 07 2016 11:28:37 GMT: INFO (ldt): (thr_nsup.c::1139) LDT supervisor started 
Jan 07 2016 11:28:37 GMT: INFO (nsup): (thr_nsup.c::1176) namespace supervisor started 
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3516) paxos supervisor thread started 
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::308) Service started: socket 0.0.0.0:3000 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90d00f00a0142 principal node is bb91700f00a0142 
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb91700f00a0142 principal node is bb91700f00a0142 
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90d00f00a0142 arrived 
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb91700f00a0142 arrived 
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90d00f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142 
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb91700f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]@bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is now principal pro tempore 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]@bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is still principal pro tempore 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2490) CLUSTER SIZE = 5 
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2533) Global state is well formed 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2262) setting replication factors: cluster size 5, paxos single replica limit 1 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2278) {test} replication factor is 5 
Jan 07 2016 11:28:38 GMT: INFO (config): (cluster_config.c::421) rack aware is disabled 
Jan 07 2016 11:28:38 GMT: INFO (partition): (cluster_config.c::380) rack aware is disabled 
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::3337) {test} re-balanced, expected migrations - (5789 tx, 6010 rx) 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3355) global partition state: total 4096 lost 0 unique 0 duplicate 4096 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3356) partition state after fixing lost partitions (master): total 4096 lost 0 unique 0 duplicate 4096 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3357) 0 new partition version tree paths generated 
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::375) ALLOW MIGRATIONS 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::803) Node allows migrations. Ignoring duplicate partition sync message. 
Jan 07 2016 11:28:38 GMT: WARNING (paxos): (paxos.c::3301) unable to apply partition sync message state 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #18 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #19 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #20 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #21 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #22 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #23 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #24 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #25 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #26 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #27 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #28 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #30 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #29 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #31 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #32 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #33 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #34 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #35 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #36 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #37 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #38 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #39 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #40 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #41 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #42 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #43 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #44 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #45 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #46 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #47 for thr_demarshal() 
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #48 for thr_demarshal() 
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::860) Waiting to spawn demarshal threads ... 
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::863) Started 32 Demarshal Threads 
Jan 07 2016 11:28:39 GMT: INFO (as): (as.c::494) service ready: soon there will be cake! 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5084) system memory: free 6590544kb (86 percent free) 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5090) ClusterSize 5 ::: objects 13593274 ::: sub_objects 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5099) rec refs 13596175 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 2633 ::: mig rx 30 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5104) replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5114) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (22, 35, 13) : hb (4, 4, 0) : fab (72, 72, 0) 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5116) heartbeat_received: self 0 : foreign 322 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5117) heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5129) tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 30 migtx 2633 ssdr 0 ssdw 0 rw 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5158) {test} disk bytes used 89561376640 : avail pct 71 : cache-read pct 0.00 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5160) {test} memory bytes used 869969536 (index 869969536 : sindex 0) : used pct 23.70 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5171) {test} ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0) 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5194) {test} migrations - remaining (5777 tx, 5982 rx), active (1 tx, 2 rx), 0.34% complete 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5203) partitions: actual 792 sync 3304 desync 0 zombie 0 absent 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec 
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec 
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec 
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec 
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec 
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5385) node id bb90e00f00a0142 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5389) reads 0,0 : writes 0,0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5393) udf reads 0,0 : udf writes 0,0 : udf deletes 0,0 : lua errors 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5396) basic scans 0,0 : aggregation scans 0,0 : udf background scans 0,0 :: active scans 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5400) index (new) batches 0,0 : direct (old) batches 0,0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5404) aggregation queries 0,0 : lookup queries 0,0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5406) proxies 0,0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5415) {test} objects 13593274 : sub-objects 0 : master objects 2625756 : master sub-objects 0 : prole objects 3126 : prole sub-objects 0 
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c05441008 with fne: 0x7f7c03c0e108 and fd: 68 (Failed) 
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e1b008 with fne: 0x7f7c03c0e108 and fd: 78 (Failed) 
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e9d008 with fne: 0x7f7c03c0e108 and fd: 80 (Failed) 
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07dda008 with fne: 0x7f7c03c0e108 and fd: 76 (Failed) 
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07d99008 with fne: 0x7f7c03c0e108 and fd: 75 (Failed) 
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07ede008 with fne: 0x7f7c03c0e108 and fd: 81 (Failed) 
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e5c008 with fne: 0x7f7c03c0e108 and fd: 79 (Failed) 
Jan 07 2016 11:28:54 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /data/aerospike.dat: used 89561376640, contig-free 220797M (220797 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 2 (0.1/s) defrag-w-tot 0 (0.0/s) 
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH. 
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH. 
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH. 
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH. 
Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor) 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: found 13 frames 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48a07a] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f7c3c97e2f0] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7c3c97e267] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f7c3c97feca] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a3) [0x516b1a] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction_ok+0x9) [0x4d58f4] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd(write_request_destructor+0x132) [0x4c1c8e] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(cf_rchash_free+0x26) [0x541028] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /usr/bin/asd(cf_rchash_reduce+0xb5) [0x541fe9] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /usr/bin/asd(rw_retransmit_fn+0x44) [0x4c0eca] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f7c3dbe16aa] 
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7c3ca4feed] 
Jan 07 2016 12:13:37 GMT: INFO (as): (as.c::410) <><><><><><><><><><> Aerospike Community Edition build 3.7.1 <><><><><><><><><><> 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Aerospike database configuration file. 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) user root 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) group root 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1. 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) pidfile /var/run/aerospike/asd.pid 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service-threads 32 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-queues 32 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-threads-per-queue 32 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)   batch-index-threads 32 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) proto-fd-max 15000 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)   batch-max-requests 200000 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) logging { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Log file must be an absolute path. 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /var/log/aerospike/aerospike.log { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  context any info 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) network { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  #address any 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  port 3000 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) heartbeat { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)     mode mesh 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)     mesh-seed-address-port 10.240.0.6 3002 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)     mesh-seed-address-port 10.240.0.5 3002 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)     port 3002 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  interval 150 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  timeout 10 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) fabric { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  port 3001 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) info { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)  port 3003 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) namespace test { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) replication-factor 10 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) memory-size 3500M 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) default-ttl 0 # 30 days, use 0 to never expire/evict. 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)   ldt-enabled true 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) storage-engine device { 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)   file /data/aerospike.dat 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)   write-block-size 1M 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)   filesize 300G 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)   } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) } 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3265) system file descriptor limit: 100000, proto-fd-max: 15000 
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::119) Node ip: 10.240.0.14 
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::327) Heartbeat address for mesh: 10.240.0.14 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3309) Rack Aware mode not enabled 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3312) Node id bb90e00f00a0142 
Jan 07 2016 12:13:37 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start 
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3797) opened file /data/aerospike.dat: usable size 322122547200 
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /data/aerospike.dat has 307200 wblocks of size 1048576 
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3176) device /data/aerospike.dat: reading device to load index 
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3181) In TID 13102: Using arena #150 for loading data for namespace "test" 
Jan 07 2016 12:13:39 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 134133 records, 0 subrecords, /data/aerospike.dat 0% 
Jan 07 2016 12:13:41 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 258771 records, 0 subrecords, /data/aerospike.dat 0% 
Jan 07 2016 12:13:43 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 388121 records, 0 subrecords, /data/aerospike.dat 0% 
Jan 07 2016 12:13:45 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 512116 records, 0 subrecords, /data/aerospike.dat 1% 
Jan 07 2016 12:13:47 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 641566 records, 0 subrecords, /data/aerospike.dat 1% 

Well, the only CRITICAL entry in this log excerpt is this one: Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor). The source for the failing code is available at https://github.com/aerospike/aerospike-server/blob/master/as/src/base/thr_demarshal.c (for anyone more experienced with debugging epoll).
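For anyone who wants to pull out entries of a given severity from a longer log, here is a minimal sketch. It assumes the line layout seen in the excerpt above (timestamp, `GMT:`, severity, context, source location, message); the function name `find_by_severity` and the regular expression are illustrative and not part of any Aerospike tooling.

```python
import re

# Assumed layout, based only on the log excerpt in the question:
# "<timestamp> GMT: <SEVERITY> (<context>): (<source location>) <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}) GMT: "
    r"(?P<severity>[A-Z]+) \((?P<context>[^)]+)\): "
    r"\((?P<location>[^)]*)\) (?P<message>.*)$"
)


def find_by_severity(lines, severity="CRITICAL"):
    """Return the parsed fields of every line whose severity matches."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("severity") == severity:
            hits.append(match.groupdict())
    return hits


# Two lines copied from the log in the question.
sample = [
    "Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) "
    "write_request_destructor(): Close fd FOR BATCH.",
    "Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): "
    "(thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket "
    "FD -1 on epoll instance FD 115: 9 (Bad file descriptor)",
]

critical = find_by_severity(sample)
for entry in critical:
    print(entry["ts"], entry["context"], entry["location"], entry["message"])
```

To run it against a real file, pass an open file object instead of `sample`, for example `find_by_severity(open("/var/log/aerospike/aerospike.log"))`.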


It looks like they fixed at least something in 3.7.1. I am not sure it is your bug, though, since this code has been there since the very first commit. https://github.com/aerospike/aerospike-server/commit/40a555821d09a629a4c7c579bc98215e8408c13a#diff-1ece2e0a32a7d80dcdfba2b8020a4b5c

Answer

0

This was fixed in Aerospike version 3.7.1 and above.

More details from the release notes and Jira:

[AER-4487], [AER-4690] - (Clustering/Migrations) Race condition causing an incorrect heartbeat fd to be saved and later not removable.

Also see:

https://discuss.aerospike.com/t/aerospike-crash/2327
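To check which nodes are already on a build that includes the fix, the build number can be read from the startup banner or the abort message in the server log and compared with 3.7.1. The following is a minimal sketch: the helper names `parse_build` and `has_fix` are hypothetical, the `build X.Y.Z` text format is assumed from the log lines in the question, and the example strings are illustrative.

```python
import re

# Version named in the answer as the first one containing the fix.
FIXED_IN = (3, 7, 1)


def parse_build(text):
    """Extract a version tuple from text such as '... build 3.7.0.2 ...'."""
    match = re.search(r"build (\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(text, fixed_in=FIXED_IN):
    """Return True if the build found in text is at or above fixed_in."""
    build = parse_build(text)
    if build is None:
        raise ValueError("no build number found in: %r" % text)
    # Pad with zeros so that 3.7.1 and 3.7.1.0 compare as equal.
    width = max(len(build), len(fixed_in))
    padded_build = build + (0,) * (width - len(build))
    padded_fix = fixed_in + (0,) * (width - len(fixed_in))
    return padded_build >= padded_fix


# Illustrative strings in the assumed banner format.
for banner in (
    "Aerospike Community Edition build 3.7.0.2",
    "Aerospike Community Edition build 3.7.1",
):
    print(banner, "->", "has fix" if has_fix(banner) else "needs upgrade")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating 3.10 as older than 3.7.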
