2016-01-27

We are running into problems when we use the "mahout spark-rowsimilarity" operation. We have an input matrix with 100k rows and 100 items, and the process throws the exception "Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space". We have tried increasing JAVA HEAP MEMORY, MAHOUT HEAP MEMORY and spark.driver.memory.

Environment versions: Mahout 0.11.1, Spark 1.6.0

Mahout command line:

/opt/mahout/bin/mahout spark-rowsimilarity -i 50k_rows__50items.dat -o test_output.tmp --maxObservations 500 --maxSimilaritiesPerRow 100 --omitStrength --master local --sparkExecutorMem 8g 

The process runs on a machine with the following specs:

Mem RAM: 8gb 
CPU with 8 cores 

.profile file:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 
export HADOOP_HOME=/opt/hadoop-2.6.0 
export SPARK_HOME=/opt/spark 
export MAHOUT_HOME=/opt/mahout 
export MAHOUT_HEAPSIZE=8192 
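The .profile above sets MAHOUT_HEAPSIZE=8192 (MB) on a machine with 8 GB of RAM, so the requested heap equals all of the physical memory. With --master local, Spark runs the executor inside the driver process, so the heap of that single launching JVM is what bounds the job. A minimal sketch, assuming Linux (MemTotal read from /proc/meminfo), that checks whether a requested heap leaves headroom for the OS and the JVM's non-heap memory. The function names and the 1024 MB reserve are hypothetical choices for illustration, not Mahout or Spark settings.

```shell
#!/usr/bin/env bash
# Sketch: compare a requested JVM heap (in MB, like MAHOUT_HEAPSIZE) with physical RAM.
# Assumption: Linux, where /proc/meminfo reports MemTotal in kB.

# Print total physical memory in whole MB.
total_ram_mb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Succeed (exit 0) only if heap + reserve fits in RAM.
# The 1024 MB default reserve for OS and non-heap JVM memory is an arbitrary assumption.
heap_fits() {
  local heap_mb="$1" ram_mb="$2" reserve_mb="${3:-1024}"
  [ $((heap_mb + reserve_mb)) -le "$ram_mb" ]
}

if [ -r /proc/meminfo ]; then
  ram=$(total_ram_mb)
  heap="${MAHOUT_HEAPSIZE:-8192}"
  if heap_fits "$heap" "$ram"; then
    echo "heap ${heap} MB fits in ${ram} MB RAM"
  else
    echo "heap ${heap} MB leaves too little of ${ram} MB RAM"
  fi
fi
```

On the machine described here, heap_fits 8192 8192 fails, because the heap alone already claims all of the RAM.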

The exception thrown:

16/01/22 11:45:06 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13) 
java.lang.OutOfMemoryError: Java heap space 
     at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:66) 
     at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:70) 
     at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:59) 
     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) 
     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) 
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
     at org.apache.spark.scheduler.Task.run(Task.scala:89) 
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:745) 
16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost, 42107))] in 1 attempts 
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout 
     at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) 
     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448) 
     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468) 
     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468) 
     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468) 
     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741) 
     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:745) 
16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost, 42107))] in 1 attempts 
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout 
     at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) 
     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448) 
     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468) 
     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468) 
     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468) 
     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741) 
     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] 
     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
     at scala.concurrent.Await$.result(package.scala:107) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
     ... 
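The top frame of the OutOfMemoryError is DenseMatrix.<init>, called from blockify in Mahout's Spark bindings (package.scala:70). The failure therefore happens while allocating a dense in-memory matrix for a block of rows. A back-of-the-envelope sketch of how large such a matrix gets: rows × columns × 8 bytes, since every cell is a double, ignoring JVM array overhead. The trace does not say which matrix is being blockified, so the two shapes below are hypothetical. They show that a dense allocation can outgrow any heap an 8 GB machine can offer, which would be consistent with larger heap settings not helping.

```shell
#!/usr/bin/env bash
# Sketch: rough size of a dense rows x cols matrix of doubles (8 bytes per cell).
# Ignores JVM object and array-header overhead, so real usage is somewhat higher.

dense_bytes() {
  local rows="$1" cols="$2"
  echo $((rows * cols * 8))
}

# Convert bytes to whole MiB (truncating) for readability.
to_mib() {
  echo $(($1 / 1024 / 1024))
}

# Hypothetical shapes, for illustration only.
echo "100000 x 100:    $(to_mib "$(dense_bytes 100000 100)") MiB"
echo "100000 x 100000: $(to_mib "$(dense_bytes 100000 100000)") MiB"
```

The first shape comes to about 76 MiB, while the second comes to about 76293 MiB (roughly 75 GiB).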

Can you advise?

Thanks in advance. Cheers.

Answer


I had a similar problem and solved it by reverting this commit:

https://github.com/apache/mahout/pull/10/commits/162c5ca36e00af91a9599075332c577d9b1a13c4
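The answer does not say how the revert was done. One possible way is sketched below, assuming a local git checkout of the Mahout sources in which the linked commit is reachable from the current branch. Because it is linked from a pull request, it may need to be fetched first. The revert_commit helper is a hypothetical name. git revert records a new commit, so git user.name and user.email must be configured.

```shell
#!/usr/bin/env bash
# Sketch: revert one commit in a local git checkout, keeping git's default message.
# Assumes the commit is reachable from the checked-out branch and that git
# identity (user.name / user.email) is configured, since revert creates a commit.

revert_commit() {
  local repo="$1" sha="$2"
  git -C "$repo" revert --no-edit "$sha"
}

# Usage against a Mahout source checkout (not run here; needs network, Maven, a JDK):
#   git clone https://github.com/apache/mahout.git
#   revert_commit mahout 162c5ca36e00af91a9599075332c577d9b1a13c4
#   (cd mahout && mvn -DskipTests clean install)
```

After rebuilding, the bin/mahout launcher from the rebuilt tree would be the one to run. How it is wired into MAHOUT_HOME depends on the local setup.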


Thank you very much for your answer, it worked!! I don't understand why this fix isn't included in apache-mahout. Do you have any information about that? Thanks again – galix85