У меня возникли проблемы с пользовательскими классами в giraph. Я сделал формат VertexInput и выходной, но я всегда получаю следующее сообщение об ошибке:java.io.IOException: confirmRemaining: осталось только 0 байт, пытаясь прочитать 1
java.io.IOException: ensureRemaining: Only * bytes remaining, trying to read *
с различными значениями, где «*» помещаются.
Это было протестировано на кластере с одним узлом.
Эта проблема возникает, когда vertexIterator делает next(), и больше нет вершин слева. Этот итератор он вызывается из метода flush, но я не понимаю, в основном, почему метод «next()» терпит неудачу. Вот некоторые журналы и классы ...
Мой журнал следующее:
15/09/08 00:52:21 INFO bsp.BspService: BspService: Connecting to ZooKeeper with job giraph_yarn_application_1441683854213_0001, 1 on localhost:22181
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:host.name=localhost
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_79
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.class.path=.:${CLASSPATH}:./**/
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/l$
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:os.version=3.13.0-62-generic
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:user.name=hduser
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hduser
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:user.dir=/app/hadoop/tmp/nm-local-dir/usercache/hduser/appcache/application_1441683854213_0001/container_1441683854213_0001_01_000003
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:22181 sessionTimeout=60000 [email protected]
15/09/08 00:52:21 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:22181. Will not attempt to authenticate using SASL (unknown error)
15/09/08 00:52:21 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:22181, initiating session
15/09/08 00:52:21 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:22181, sessionid = 0x14fab0de0bb0002, negotiated timeout = 40000
15/09/08 00:52:21 INFO bsp.BspService: process: Asynchronous connection complete.
15/09/08 00:52:21 INFO netty.NettyServer: NettyServer: Using execution group with 8 threads for requestFrameDecoder.
15/09/08 00:52:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/09/08 00:52:21 INFO netty.NettyServer: start: Started server communication server: localhost/127.0.0.1:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
15/09/08 00:52:21 INFO netty.NettyClient: NettyClient: Using execution handler with 8 threads after request-encoder.
15/09/08 00:52:21 INFO graph.GraphTaskManager: setup: Registering health of this worker...
15/09/08 00:52:21 INFO yarn.GiraphYarnTask: [STATUS: task-1] WORKER_ONLY starting...
15/09/08 00:52:22 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1441683854213_0001/_masterJobState)
15/09/08 00:52:22 INFO bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir already exists!
15/09/08 00:52:22 INFO bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir already exists!
15/09/08 00:52:22 INFO worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir/0/_superstepD$
15/09/08 00:52:22 INFO netty.NettyServer: start: Using Netty without authentication.
15/09/08 00:52:22 INFO bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
15/09/08 00:52:22 INFO worker.BspServiceWorker: startSuperstep: Master(hostname=localhost, MRtaskID=0, port=30000)
15/09/08 00:52:22 INFO worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/giraph_yarn_application_1441683854$
15/09/08 00:52:22 INFO yarn.GiraphYarnTask: [STATUS: task-1] startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
15/09/08 00:52:22 INFO netty.NettyClient: Using Netty without authentication.
15/09/08 00:52:22 INFO netty.NettyClient: Using Netty without authentication.
15/09/08 00:52:22 INFO netty.NettyClient: connectAllAddresses: Successfully added 2 connections, (2 total connected) 0 failed, 0 failures total.
15/09/08 00:52:22 INFO netty.NettyServer: start: Using Netty without authentication.
15/09/08 00:52:22 INFO handler.RequestDecoder: decode: Server window metrics MBytes/sec received = 0, MBytesReceived = 0.0001, ave received req MBytes = 0.0001, secs waited = 1.44168435E9
15/09/08 00:52:22 INFO worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 1 total splits.
15/09/08 00:52:22 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_vertexInputSplitDir/0, overall roughly 0.0% input splits rese$
15/09/08 00:52:22 INFO worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://hdnode01:54310/u$
15/09/08 00:52:22 INFO worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_vertexInputSplitDir/0 (v=6, e=10)
15/09/08 00:52:22 INFO worker.InputSplitsCallable: call: Loaded 1 input splits in 0.16241108 secs, (v=6, e=10) 36.94329 vertices/sec, 61.572155 edges/sec
15/09/08 00:52:22 ERROR utils.LogStacktraceCallable: Execution of callable failed
java.lang.IllegalStateException: next: IOException
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:101)
at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: ensureRemaining: Only 0 bytes remaining, trying to read 1
at org.apache.giraph.utils.UnsafeReads.ensureRemaining(UnsafeReads.java:77)
at org.apache.giraph.utils.UnsafeArrayReads.readByte(UnsafeArrayReads.java:123)
at org.apache.giraph.utils.UnsafeReads.readLine(UnsafeReads.java:100)
at pruebas.TextAndDoubleComplexWritable.readFields(TextAndDoubleComplexWritable.java:37)
at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:540)
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
... 11 more
15/09/08 00:52:22 ERROR worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHea$
15/09/08 00:52:22 ERROR yarn.GiraphYarnTask: GiraphYarnTask threw a top-level exception, failing task
java.lang.RuntimeException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for [email protected]0
at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:104)
at org.apache.giraph.yarn.GiraphYarnTask.main(GiraphYarnTask.java:183)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for [email protected]0
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:92)
... 1 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: next: IOException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:202)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
... 10 more
Caused by: java.lang.IllegalStateException: next: IOException
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:101)
at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: ensureRemaining: Only 0 bytes remaining, trying to read 1
at org.apache.giraph.utils.UnsafeReads.ensureRemaining(UnsafeReads.java:77)
at org.apache.giraph.utils.UnsafeArrayReads.readByte(UnsafeArrayReads.java:123)
at org.apache.giraph.utils.UnsafeReads.readLine(UnsafeReads.java:100)
at pruebas.TextAndDoubleComplexWritable.readFields(TextAndDoubleComplexWritable.java:37)
at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:540)
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
... 11 more
Мой формат ввода:
package pruebas;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.edge.EdgeFactory;
import org.apache.giraph.io.formats.AdjacencyListTextVertexInputFormat;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
/**
* @author hduser
*
*/
public class IdTextWithComplexValueInputFormat
extends
AdjacencyListTextVertexInputFormat<Text, TextAndDoubleComplexWritable, DoubleWritable> {
@Override
public AdjacencyListTextVertexReader createVertexReader(InputSplit split,
TaskAttemptContext context) {
return new TextComplexValueDoubleAdjacencyListVertexReader();
}
protected class TextComplexValueDoubleAdjacencyListVertexReader extends
AdjacencyListTextVertexReader {
/**
* Constructor with
* {@link AdjacencyListTextVertexInputFormat.LineSanitizer}.
*
* @param lineSanitizer
* the sanitizer to use for reading
*/
public TextComplexValueDoubleAdjacencyListVertexReader() {
super();
}
@Override
public Text decodeId(String s) {
return new Text(s);
}
@Override
public TextAndDoubleComplexWritable decodeValue(String s) {
TextAndDoubleComplexWritable valorComplejo = new TextAndDoubleComplexWritable();
valorComplejo.setVertexData(Double.valueOf(s));
valorComplejo.setIds_vertices_anteriores("");
return valorComplejo;
}
@Override
public Edge<Text, DoubleWritable> decodeEdge(String s1, String s2) {
return EdgeFactory.create(new Text(s1),
new DoubleWritable(Double.valueOf(s2)));
}
}
}
TextAndDoubleComplexWritable:
package pruebas;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
public class TextAndDoubleComplexWritable implements Writable {
private String idsVerticesAnteriores;
private double vertexData;
public TextAndDoubleComplexWritable() {
super();
this.idsVerticesAnteriores = "";
}
public TextAndDoubleComplexWritable(double vertexData) {
super();
this.vertexData = vertexData;
}
public TextAndDoubleComplexWritable(String ids_vertices_anteriores,
double vertexData) {
super();
this.idsVerticesAnteriores = ids_vertices_anteriores;
this.vertexData = vertexData;
}
public void write(DataOutput out) throws IOException {
out.writeUTF(idsVerticesAnteriores);
}
public void readFields(DataInput in) throws IOException {
idsVerticesAnteriores = in.readLine();
}
public String getIds_vertices_anteriores() {
return idsVerticesAnteriores;
}
public void setIds_vertices_anteriores(String ids_vertices_anteriores) {
this.idsVerticesAnteriores = ids_vertices_anteriores;
}
public double getVertexData() {
return vertexData;
}
public void setVertexData(double vertexData) {
this.vertexData = vertexData;
}
}
Мой входной файл:
Portada 0.0 Sugerencias 1.0
Sugerencias 3.0 Portada 1.0
и я выполнить его с помощью этой команды:
$HADOOP_HOME/bin/yarn jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.4.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner lectura_de_grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif pruebas.IdTextWithComplexValueInputFormat -vip /user/hduser/input/wiki-graph-chiquito.txt -op /user/hduser/output/caminosNavegacionales -w 2 -yh 250
Любая помощь будет оценен по достоинству!
UPDATE: Мой входной файл был неправ. Giraph (или мой пример) не очень хорошо обрабатывает исходящий в неперечисленную вершину.
Но проблема все еще происходит. Я обновил данные файла по моему первоначальному вопросу.
UPDATE 2: OutputFormat он не используется, и алгоритм вычисления также не выполняется. Я удаляю оба, чтобы помочь прояснить вопрос.
Обновление 3, 19/11/2015: Проблема не во входном формате, формат ввода работал хорошо и полностью считывал данные. Проблема была в классе TextAndDoubleComplexWritable
, я добавлю его в свой первоначальный вопрос, для лучшего объяснения окончательного решения для этого (я тоже добавил ответ).
Спасибо за ваш ответ, но если вы прочитали весь мой вопрос, проблема возникает, когда метод next(), который он вызывается, и этот метод должен остановить итерацию, когда читатель попадает в EOF, правильно? но это не так. И я не знаю почему, и именно поэтому я спрашиваю здесь! ;) – chomp
Пожалуйста, @ash или сторонники (upvoters) вашего ответа, будет здорово, если вы сможете дать мне дополнительную информацию для решения моей проблемы, я обновил свой вопрос с помощью немного дополнительной информации, пытаясь помочь как можно больше для этого .. – chomp
Вы решили проблему? Без понимания формата ввода, ожидаемого с помощью giraph, мне будет сложно помочь больше, чем это. – ash