2016-12-21

Spark Hive reports ClassNotFoundException: com.ibm.biginsights.bigsql.sync.BIEventListener

I am trying to run a PySpark script that accesses a Hive table on BigInsights on Cloud 4.2 Enterprise.

First, I create a Hive table:

$ hive
hive> CREATE TABLE pokes (foo INT, bar STRING); 
OK 
Time taken: 2.147 seconds 
hive> LOAD DATA LOCAL INPATH '/usr/iop/4.2.0.0/hive/doc/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes; 
Loading data to table default.pokes 
Table default.pokes stats: [numFiles=1, numRows=0, totalSize=5812, rawDataSize=0] 
OK 
Time taken: 0.49 seconds 
hive> 

Next, I create a simple PySpark script:

$ cat test_pokes.py
from pyspark import SparkContext 

sc = SparkContext() 

from pyspark.sql import HiveContext 
hc = HiveContext(sc) 

pokesRdd = hc.sql('select * from pokes') 
print(pokesRdd.collect()) 

I try to run it with:

$ spark-submit \
    --master yarn-cluster \
    --deploy-mode cluster \
    --jars /usr/iop/4.2.0.0/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-core-3.2.10.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-rdbms-3.2.9.jar \
    --files /usr/iop/4.2.0.0/hive/conf/hive-site.xml \
    test_pokes.py
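Note that spark-submit expects `--jars` (and `--files`) as a single comma-separated argument with no spaces; a space after a comma splits the list into separate shell arguments. A minimal Python sketch of building the argument vector from the paths above; the helper name `build_submit_args` is hypothetical, and the result can be passed to `subprocess.run` if you want to launch the job from Python:

```python
import shlex

# DataNucleus JARs from the Hive lib directory, as listed in the command above
HIVE_LIB = "/usr/iop/4.2.0.0/hive/lib"
JARS = [
    f"{HIVE_LIB}/datanucleus-api-jdo-3.2.6.jar",
    f"{HIVE_LIB}/datanucleus-core-3.2.10.jar",
    f"{HIVE_LIB}/datanucleus-rdbms-3.2.9.jar",
]


def build_submit_args(script, jars, files, master="yarn-cluster", deploy_mode="cluster"):
    """Hypothetical helper: return the spark-submit argv as a list.

    --jars and --files each take ONE argument: a comma-separated list
    with no spaces, so the path lists are joined with "," here.
    """
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--jars", ",".join(jars),
        "--files", ",".join(files),
        script,
    ]


if __name__ == "__main__":
    args = build_submit_args(
        "test_pokes.py",
        JARS,
        ["/usr/iop/4.2.0.0/hive/conf/hive-site.xml"],
    )
    # Print a shell-quoted command line for inspection
    print(" ".join(shlex.quote(a) for a in args))
```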

However, I get the following error:

You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly 
Traceback (most recent call last): 
    File "test_pokes.py", line 8, in <module> 
    pokesRdd = hc.sql('select * from pokes') 
    File "/disk2/local/usercache/biadmin/appcache/application_1477084339086_0485/container_e09_1477084339086_0485_02_000001/pyspark.zip/pyspark/sql/context.py", line 580, in sql 
    ... 
    File "/container_e09_1477084339086_0485_02_000001/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext. 
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
    at 
    ... 
    ... 
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
    at 
    ... 
    ... 27 more 
Caused by: MetaException(message:Failed to instantiate listener named: com.ibm.biginsights.bigsql.sync.BIEventListener, reason: java.lang.ClassNotFoundException: com.ibm.biginsights.bigsql.sync.BIEventListener) 
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreListeners(MetaStoreUtils.java:1478) 
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:481) 
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) 
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) 
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) 
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) 
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) 
    ... 32 more 
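The innermost cause says the metastore could not load a listener class named in the Hive configuration, `com.ibm.biginsights.bigsql.sync.BIEventListener` (by its package name, a Big SQL component), because that class is not on the Spark driver's classpath. A small Python sketch to find which property in a `hive-site.xml` references that class; it assumes only the standard Hadoop `<configuration>`/`<property>`/`<name>`/`<value>` layout, and the function name `properties_mentioning` is hypothetical. In stock Hive, listeners are typically registered under properties such as `hive.metastore.event.listeners`, but the script does not rely on any particular property name:

```python
import sys
import xml.etree.ElementTree as ET


def properties_mentioning(xml_text, needle):
    """Return {name: value} for every <property> whose <value> contains `needle`."""
    root = ET.fromstring(xml_text)  # accepts str or bytes
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if needle in value:
            found[name] = value
    return found


if __name__ == "__main__":
    # Usage: python find_listener.py /usr/iop/4.2.0.0/hive/conf/hive-site.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            hits = properties_mentioning(
                fh.read(), "com.ibm.biginsights.bigsql.sync.BIEventListener"
            )
        for name, value in hits.items():
            print(f"{path}: {name} = {value}")
```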

See also the earlier errors related to this issue:

Answer


The solution was to use the hive-site.xml from the Spark client directory instead:

$ spark-submit \
    --master yarn-cluster \
    --deploy-mode cluster \
    --jars /usr/iop/4.2.0.0/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-core-3.2.10.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-rdbms-3.2.9.jar \
    --files /usr/iop/current/spark-client/conf/hive-site.xml \
    test_pokes.py

This is covered in the documentation: http://www.ibm.com/support/knowledgecenter/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.product.doc/doc/bi_spark.html
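The working command differs from the failing one only in which `hive-site.xml` is shipped with `--files`, so comparing the two files shows what Spark picks up differently. A minimal sketch using only the Python standard library that lists properties present in one file but not the other, or set to different values; it does not assume which properties actually differ on a given cluster, and the function names `load_properties` and `diff_properties` are hypothetical:

```python
import sys
import xml.etree.ElementTree as ET


def load_properties(xml_bytes):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_bytes)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(left, right):
    """Return sorted (name, left_value, right_value) rows for keys that differ.

    None marks a property that is absent from that side.
    """
    rows = []
    for name in sorted(set(left) | set(right)):
        lv, rv = left.get(name), right.get(name)
        if lv != rv:
            rows.append((name, lv, rv))
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_hive_site.py /usr/iop/4.2.0.0/hive/conf/hive-site.xml \
    #            /usr/iop/current/spark-client/conf/hive-site.xml
    with open(sys.argv[1], "rb") as a, open(sys.argv[2], "rb") as b:
        rows = diff_properties(load_properties(a.read()), load_properties(b.read()))
    for name, lv, rv in rows:
        print(f"{name}\n  first:  {lv}\n  second: {rv}")
```

Comparing parsed name/value pairs rather than running a textual diff avoids noise from comments, ordering, and whitespace differences between the two files.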