Apache kylin: создание куба выходит из строя на шаге 5 - слишком большой размер KeyValue

Я начал использовать Apache kylin (версия 1.5.3). При создании куба я получаю сообщение об ошибке на шаге 5 «Сохраняем статистику Cuboid». В журнале указано:Apache kylin: создание куба выходит из строя на шаге 5 - слишком большой размер KeyValue

java.lang.IllegalArgumentException: KeyValue size too large 
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1521) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98) 
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1038) 
at org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:242) 
at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:208) 
at org.apache.kylin.engine.mr.steps.SaveStatisticsStep.doWork(SaveStatisticsStep.java:113) 
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112) 
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57) 
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112) 
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745)

Сначала я попытался создать тот же куб с меньшим размером, и он работает. Также работает создание куба antoher с параметрами left out. Но когда я пытаюсь создать один куб со всеми этими (13) измерениями, он терпит неудачу. Я также устал устанавливать hbase.client.keyvalue.maxsize в 0, чтобы отключить проверку. По-прежнему такая же ошибка.

Кто-нибудь знает, в чем проблема, и как я могу ее решить?

Я использую kylin поверх Sandbox HDP 2.4.

Спасибо за помощь заранее

Сорен

источник

2016-08-09 Søren

Что такое "hbase.client.keyvalue.maxsize" в вашей конфигурации hbase? –

"hbase.client.keyvalue.maxsize" установлен в 0 атм. Поэтому обычно проверка должна быть отключена. –

Попробуйте kylin.hbase.client.keyvalue.maxsize = 1048576 –

Убедитесь, что значение kylin.hbase.client.keyvalue.maxsize (который находится в конфигурационном файле Kylin - конф/kylin.properteis) и hbase.client .keyvalue.maxsize (который находится в файле конфигурации hbase) одинаковы. Обычно мы получаем размер ключа значения слишком большую ошибку, когда значение kylin.hbase.client.keyvalue.maxsize больше hbase.client.keyvalue.maxsize

Пожалуйста, найдите ниже свойств образца Kylin

# kylin server's mode 
kylin.server.mode=all 

# optional information for the owner of kylin platform, it can be your team's email 
# currently it will be attached to each kylin's htable attribute 
[email protected] 

# List of web servers in use, this enables one web server instance to sync up with other servers. 
kylin.rest.servers=localhost:7070 

# The metadata store in hbase 
kylin.metadata.url=kylin_[email protected] 

# The storage for final cube file in hbase 
kylin.storage.url=hbase 

# Temp folder in hdfs, make sure user has the right access to the hdfs directory 
kylin.hdfs.working.dir=/kylin 

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020 
# leave empty if hbase running on same cluster with hive and mapreduce 
kylin.hbase.cluster.fs= 

kylin.job.mapreduce.default.reduce.input.mb=500 

# max job retry on error, default 0: no retry 
kylin.job.retry=0 

# If true, job engine will not assume that hadoop CLI reside on the same server as it self 
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password 
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine 
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands) 
kylin.job.run.as.remote.cmd=false 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.hostname= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.username= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.password= 

# Used by test cases to prepare synthetic data for sample cube 
kylin.job.remote.cli.working.dir=/tmp/kylin 

# Max count of concurrent jobs running 
kylin.job.concurrent.max.limit=10 

# Time interval to check hadoop job status 
kylin.job.yarn.app.rest.check.interval.seconds=10 

# Hive database name for putting the intermediate flat tables 
kylin.job.hive.database.for.intermediatetable=default 

#default compression codec for htable,snappy,lzo,gzip,lz4 
kylin.hbase.default.compression.codec=snappy 

#the percentage of the sampling, default 100% 
kylin.job.cubing.inmem.sampling.percent=100 

# The cut size for hbase region, in GB. 
kylin.hbase.region.cut=5 

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster 
# set 0 to disable this optimization 
kylin.hbase.hfile.size.gb=2 

# Enable/disable ACL check for cube query 
kylin.query.security.enabled=true 

# whether get job status from resource manager with kerberos authentication 
kylin.job.status.with.kerberos=false 


## kylin security configurations 

# spring security profile, options: testing, ldap, saml 
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login 
kylin.security.profile=testing 

# default roles and admin roles in LDAP, for ldap and saml 
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER 
acl.adminRole=ROLE_ADMIN 

#LDAP authentication configuration 
ldap.server=ldap://ldap_server:389 
ldap.username= 
ldap.password= 

#LDAP user account directory; 
ldap.user.searchBase= 
ldap.user.searchPattern= 
ldap.user.groupSearchBase= 

#LDAP service account directory 
ldap.service.searchBase= 
ldap.service.searchPattern= 
ldap.service.groupSearchBase= 

#SAML configurations for SSO 
# SAML IDP metadata file location 
saml.metadata.file=classpath:sso_metadata.xml 
saml.metadata.entityBaseURL=https://hostname/kylin 
saml.context.scheme=https 
saml.context.serverName=hostname 
saml.context.serverPort=443 
saml.context.contextPath=/kylin 


ganglia.group= 
ganglia.port=8664 

## Config for mail service 

# If true, will send email notification; 
mail.enabled=false 
mail.host= 
mail.username= 
mail.password= 
mail.sender= 

###########################config info for web####################### 

#help info ,format{name|displayName|link} ,optional 
kylin.web.help.length=4 
kylin.web.help.0=start|Getting Started| 
kylin.web.help.1=odbc|ODBC Driver| 
kylin.web.help.2=tableau|Tableau Guide| 
kylin.web.help.3=onboard|Cube Design Tutorial| 

#guide user how to build streaming cube 
kylin.web.streaming.guide=http://kylin.apache.org/ 

#hadoop url link ,optional 
kylin.web.hadoop= 
#job diagnostic url link ,optional 
kylin.web.diagnostic= 
#contact mail on web page ,optional 
kylin.web.contact_mail= 

###########################config info for front####################### 

#env DEV|QA|PROD 
deploy.env=QA 

###########################deprecated configs####################### 
kylin.sandbox=true 
kylin.web.hive.limit=20 
# The cut size for hbase region, 
#in GB. 
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default 
kylin.hbase.region.cut.small=5 
kylin.hbase.region.cut.medium=10 
kylin.hbase.region.cut.large=50 
kylin.hbase.client.keyvalue.maxsize=1048576

Внутренних свойства набор kylin.hbase.client.keyvalue.maxsize = 1048576

источник

2016-08-10 21:09:09

@ Nithin K Анил

не можете найти kylin.hbase.client.keyvalue.maxsize в kylin.properties. Kylin.properties выглядит следующим образом:

> [[email protected] conf]# cat kylin.properties 
# 
# Licensed to the Apache Software Foundation (ASF) under one or more 
# contributor license agreements. See the NOTICE file distributed with 
# this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 
# (the "License"); you may not use this file except in compliance with 
# the License. You may obtain a copy of the License at 
# 
# http://www.apache.org/licenses/LICENSE-2.0 
# 
# Unless required by applicable law or agreed to in writing, software 
# distributed under the License is distributed on an "AS IS" BASIS, 
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and 
# limitations under the License. 
# 

# kylin server's mode 
kylin.server.mode=all 

# optional information for the owner of kylin platform, it can be your team's email 
# currently it will be attached to each kylin's htable attribute 
[email protected] 

# List of web servers in use, this enables one web server instance to sync up with other servers. 
kylin.rest.servers=localhost:7070 

# The metadata store in hbase 
[email protected] 

# The storage for final cube file in hbase 
kylin.storage.url=hbase 

# Temp folder in hdfs, make sure user has the right access to the hdfs directory 
kylin.hdfs.working.dir=/kylin 

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020 
# leave empty if hbase running on same cluster with hive and mapreduce 
kylin.hbase.cluster.fs= 

kylin.job.mapreduce.default.reduce.input.mb=500 

# max job retry on error, default 0: no retry 
kylin.job.retry=0 

# If true, job engine will not assume that hadoop CLI reside on the same server as it self 
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password 
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine 
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands) 
kylin.job.run.as.remote.cmd=false 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.hostname= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.username= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.password= 

# Used by test cases to prepare synthetic data for sample cube 
kylin.job.remote.cli.working.dir=/tmp/kylin 

# Max count of concurrent jobs running 
kylin.job.concurrent.max.limit=10 

# Time interval to check hadoop job status 
kylin.job.yarn.app.rest.check.interval.seconds=10 

# Hive database name for putting the intermediate flat tables 
kylin.job.hive.database.for.intermediatetable=default 

#default compression codec for htable,snappy,lzo,gzip,lz4 
kylin.hbase.default.compression.codec=snappy 

#the percentage of the sampling, default 100% 
kylin.job.cubing.inmem.sampling.percent=100 

# The cut size for hbase region, in GB. 
kylin.hbase.region.cut=5 

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster 
# set 0 to disable this optimization 
kylin.hbase.hfile.size.gb=2 

# Enable/disable ACL check for cube query 
kylin.query.security.enabled=true 

# whether get job status from resource manager with kerberos authentication 
kylin.job.status.with.kerberos=false 


## kylin security configurations 

# spring security profile, options: testing, ldap, saml 
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login 
kylin.security.profile=testing 

# default roles and admin roles in LDAP, for ldap and saml 
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER 
acl.adminRole=ROLE_ADMIN 

#LDAP authentication configuration 
ldap.server=ldap://ldap_server:389 
ldap.username= 
ldap.password= 

#LDAP user account directory; 
ldap.user.searchBase= 
ldap.user.searchPattern= 
ldap.user.groupSearchBase= 

#LDAP service account directory 
ldap.service.searchBase= 
ldap.service.searchPattern= 
ldap.service.groupSearchBase= 

#SAML configurations for SSO 
# SAML IDP metadata file location 
saml.metadata.file=classpath:sso_metadata.xml 
saml.metadata.entityBaseURL=https://hostname/kylin 
saml.context.scheme=https 
saml.context.serverName=hostname 
saml.context.serverPort=443 
saml.context.contextPath=/kylin 


ganglia.group= 
ganglia.port=8664 

## Config for mail service 

# If true, will send email notification; 
mail.enabled=false 
mail.host= 
mail.username= 
mail.password= 
mail.sender= 

###########################config info for web####################### 

#help info ,format{name|displayName|link} ,optional 
kylin.web.help.length=4 
kylin.web.help.0=start|Getting Started| 
kylin.web.help.1=odbc|ODBC Driver| 
kylin.web.help.2=tableau|Tableau Guide| 
kylin.web.help.3=onboard|Cube Design Tutorial| 

#guide user how to build streaming cube 
kylin.web.streaming.guide=http://kylin.apache.org/ 

#hadoop url link ,optional 
kylin.web.hadoop= 
#job diagnostic url link ,optional 
kylin.web.diagnostic= 
#contact mail on web page ,optional 
kylin.web.contact_mail= 

###########################config info for front####################### 

#env DEV|QA|PROD 
deploy.env=QA 

###########################deprecated configs####################### 
kylin.sandbox=true 
kylin.web.hive.limit=20 
# The cut size for hbase region, 
#in GB. 
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default 
kylin.hbase.region.cut.small=5 
kylin.hbase.region.cut.medium=10 
kylin.hbase.region.cut.large=50

источник

2016-08-11 09:56:49

Установите kylin.hbase.client.keyvalue.maxsize = 1048576 в файле свойств –

Мы попали ключевые ограничения на сращивания машины, а раньше ...

Также помните из KeyValue спецификации ключ требуется, чтобы вписаться в короткий. KeyValue # getRowOffset()

источник

2016-08-11 19:43:59

Apache kylin: создание куба выходит из строя на шаге 5 - слишком большой размер KeyValue

ответ

Смежные вопросы