After loading data: all segments of the datasource are _not_ available

I’m new to Apache Druid and trying to create my first datasource. For this I’m using the example ‘wikipedia’ data set.

The ‘load data’ task succeeds, but afterwards the 22 segments of the ‘wikipedia’ datasource are published but not available. Hence there is no way to use it.

What am I missing here? Is there an additional log with more information about the problem?

You can check the Coordinator and Historical logs for more details.
If you are running in quickstart mode, the log paths should be:

apache-druid-0.16.0-incubating/var/sv/historical.log

apache-druid-0.16.0-incubating/var/sv/coordinator-overlord.log
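If it helps, you can grep those logs for segment-load activity. A rough sketch (paths assume the default quickstart layout; the patterns are just suggestions for the kinds of lines to look for):

```bash
# Look for segment load activity in the Historical log
grep -iE "Loading segment|Announcing segment|Completed request" \
  apache-druid-0.16.0-incubating/var/sv/historical.log | tail -n 20

# Look for assignment/rule activity (or the lack of servers) in the Coordinator log
grep -iE "LoadRule|Assigning|no servers" \
  apache-druid-0.16.0-incubating/var/sv/coordinator-overlord.log | tail -n 20
```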

Thanks,

Vaibhav

You should see something like the below in the Historical log if the segments are loaded by the Historical successfully:

2019-10-25T15:31:59,182 INFO [ZKCoordinator-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment Test_seg_2017-06-27T00:00:00.000Z_2017-06-28T00:00:00.000Z_2019-10-25T15:31:19.753Z

2019-10-25T15:31:59,251 INFO [ZKCoordinator-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[Test_seg_2017-06-27T00:00:00.000Z_2017-06-28T00:00:00.000Z_2019-10-25T15:31:19.753Z] at new path[/druid/segments/localhost:8083/localhost:8083_historical__default_tier_2019-10-25T15:31:59.250Z_19c2d4e930e744aa979f63c75a662f6a1]

2019-10-25T15:31:59,256 INFO [ZKCoordinator-0] org.apache.druid.server.coordination.ZkCoordinator - Completed request [LOAD: Test_seg_2017-06-27T00:00:00.000Z_2017-06-28T00:00:00.000Z_2019-10-25T15:31:19.753Z]

Thanks,
Vaibhav

Hmm, I’ve tried that many times now, but without success. See the attached logs for my last try. The Overlord says that the (import) task succeeds, but I can’t see anything of what Vaibhav expects in historical.log.

My system details:

  • Fedora 30 (amd64)
  • Intel® Core™ i7-8700 CPU @ 3.20GHz, 32GB RAM
  • OpenJDK Runtime Environment 18.9 (build 11.0.5+10) (from Fedora RPMs)
  • Druid 0.16.0
  • I’m starting Druid with ./bin/start-micro-quickstart

historical.log (97.9 KB)

coordinator-overlord.log (156 KB)

From the attached log, it seems you are running Java 11:

2019-10-27T16:35:53,060 INFO [main] org.apache.druid.cli.CliHistorical - * java.home: /usr/lib/jvm/java-11-openjdk-11.0.5.10-0.fc30.x86_64

Could you please use **Java 8 (8u92+)**? To my knowledge, Druid is not yet certified and tested on Java 11.


Please see the Prerequisites:
[http://druid.incubator.apache.org/docs/latest/tutorials/index.html](http://druid.incubator.apache.org/docs/latest/tutorials/index.html)
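On Fedora, something like this should get you onto Java 8 (the package and JAVA_HOME path are illustrative; adjust them to your system):

```bash
# Confirm which Java the shell currently picks up
java -version

# Install a Java 8 JDK and switch the system default (Fedora)
sudo dnf install -y java-1.8.0-openjdk
sudo alternatives --config java

# Alternatively, point just this shell at Java 8 before starting Druid
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk   # path may vary by distro/version
export PATH="$JAVA_HOME/bin:$PATH"
./bin/start-micro-quickstart
```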


Thanks,
Vaibhav

I have the same issue. Tried the nano and micro settings. Switched from Java 11 back to Java 8. CSV loading tasks show success, but none of the segments ever become “available”. The status is orange and shows 0.0%. I’m running Ubuntu 18 server in a VirtualBox VM.

This stands out as a warning in the logs:
2019-10-28T19:56:04,796 WARN [Coordinator-Exec-0] org.apache.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Uh… I have no servers. Not assigning anything…

The console shows 6 servers running… will keep reading…

Hi MB,

Please run a sample ingestion and attach the below details so we can analyze what is happening in your case:

[1] Indexing log

[2] Coordinator log

[3] Historical log

[4] Output of:

ps -eaf | grep -i java

from the VM host where you are running Druid in quickstart mode.

[5] As you said the ingestion job runs successfully, could you also verify that the segments were created in deep storage? If you are running the micro-quickstart, you need to check the directories below (see the sketch after this list):

apache-druid-0.16.0-incubating/var/druid/segments

apache-druid-0.16.0-incubating/var/druid/segment-cache
[6] Zookeeper log.
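For [4] and [5], something like this should work from your install directory (paths assume the default micro-quickstart layout):

```bash
# [4] List the running Druid JVMs and their command lines
ps -eaf | grep -i java

# [5] Check that segment files landed in local deep storage, and whether
# the Historical has pulled anything into its segment cache yet
ls -lR apache-druid-0.16.0-incubating/var/druid/segments
ls -lR apache-druid-0.16.0-incubating/var/druid/segment-cache
```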

Thanks,

Vaibhav

Will do Vaibhav, thanks.

FYI: I did the same install on an Ubuntu Desktop VM, and it works. The NIC setup is different and I’m wondering if that’s the problem, since one of the messages is “Uh… I have no servers”, although all services appear to be up, and the overlord/coordinator process listens on two ports including 127.0.0.1:1527…

Hi MRB,

Could you verify that your Druid services (especially Historical and Coordinator) are listening on their respective ports? I have posted the list of default ports below; please refer to the plain ports (assuming you are running the micro-quickstart with default settings).

| Plain Port | TLS Port | Service Name   |
|------------|----------|----------------|
| 8081       | 8281     | Coordinator    |
| 8090       | 8290     | Overlord       |
| 8083       | 8283     | Historical     |
| 8082       | 8282     | Broker         |
| 8888       | 9088     | Router         |
| 8091       | 8291     | Middle Manager |
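One quick way to check from the shell (assuming netstat or ss is installed on the VM):

```bash
# Show which process is listening on each Druid plain port
sudo netstat -tlnp | grep -E ':(8081|8082|8083|8090|8091|8888)\b'

# or, with ss:
sudo ss -tlnp | grep -E ':(8081|8082|8083|8090|8091|8888)\b'
```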

If the processes are alive and listening on the ports, it could be some connection-related issue while talking to ZooKeeper. Maybe the logs will have more details on the issue.

Thanks,
Vaibhav

Hi Vaibhav,

Here are my processes:
tcp6 0 0 :::8091 :::* LISTEN 1464/java

tcp6 0 0 :::2181 :::* LISTEN 1444/java

tcp6 0 0 :::8081 :::* LISTEN 1445/java

tcp6 0 0 :::8082 :::* LISTEN 1448/java

tcp6 0 0 :::8083 :::* LISTEN 1460/java

tcp6 0 0 127.0.0.1:1527 :::* LISTEN 1445/java

tcp6 0 0 :::8888 :::* LISTEN 1454/java

logs are attached …

I changed the networking setup on the Ubuntu desktop to match the failing Ubuntu server to test, and the indexing was again successful on the desktop.
I will send the ZooKeeper log but I can’t find it. Looked under zk?!

Thanks again,

Mark

mrblogs.tar.gz (72.9 KB)

Hi MRB,

Thank you for the logs. I have been through them and observe the below.
As the indexing task log is not attached, I have picked up index_parallel_wikiticker-2015-09-12-sampled-test_2019-10-29T20:30:40.053Z from the MM log to see what happened.
[1] MM log:
2019-10-29T20:32:22,897 INFO [WorkerTaskManager-CompletedTasksCleaner] org.apache.druid.indexing.worker.WorkerTaskManager - Deleting completed task[index_parallel_wikiticker-2015-09-12-sampled-test_2019-10-29T20:30:40.053Z] information, overlord task status[SUCCESS].
[2] Coordinator-Overlord log:
2019-10-29T20:31:21,271 INFO [qtp1677115089-82] org.apache.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_parallel_wikiticker-2015-09-12-sampled-test_2019-10-29T20:30:40.053Z]: SegmentTransactionalInsertAction{segmentsToBeOverwritten=null, segments=[DataSegment{binaryVersion=9, id=wikiticker-2015-09-12-sampled-test_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-10-29T20:31:04.031Z, loadSpec={type=>local, path=>/home/mrb/apache-druid-0.16.0-incubating/var/druid/segments/wikiticker-2015-09-12-sampled-test/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-10-29T20:31:04.031Z/0/index.zip}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, namespace, page, regionIsoCode, regionName, user], metrics=[count, sum_added, sum_deleted, sum_delta, sum_metroCode], shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, size=4322529}], startMetadata=null, endMetadata=null}
2019-10-29T20:31:21,385 INFO [qtp1677115089-82] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Published segment [wikiticker-2015-09-12-sampled-test_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-10-29T20:31:04.031Z] to DB with used flag [true], json[{"dataSource":"wikiticker-2015-09-12-sampled-test","interval":"2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z","version":"2019-10-29T20:31:04.031Z","loadSpec":{"type":"local","path":"/home/mrb/apache-druid-0.16.0-incubating/var/druid/segments/wikiticker-2015-09-12-sampled-test/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-10-29T20:31:04.031Z/0/index.zip"},"dimensions":"channel,cityName,comment,countryIsoCode,countryName,isAnonymous,isMinor,isNew,isRobot,isUnpatrolled,namespace,page,regionIsoCode,regionName,user","metrics":"count,sum_added,sum_deleted,sum_delta,sum_metroCode","shardSpec":{"type":"numbered","partitionNum":0,"partitions":0},"binaryVersion":9,"size":4322529,"identifier":"wikiticker-2015-09-12-sampled-test_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-10-29T20:31:04.031Z"}]
2019-10-29T20:31:22,390 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[localhost:8091] wrote SUCCESS status for task [index_parallel_wikiticker-2015-09-12-sampled-test_2019-10-29T20:30:40.053Z] on [TaskLocation{host='localhost', port=8100, tlsPort=-1}]
2019-10-29T20:31:22,390 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[localhost:8091] completed task[index_parallel_wikiticker-2015-09-12-sampled-test_2019-10-29T20:30:40.053Z] with status[SUCCESS]
2019-10-29T20:31:22,391 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Received SUCCESS status for task: index_parallel_wikiticker-2015-09-12-sampled-test_2019-10-29T20:30:40.053Z
2019-10-29T20:31:24,814 INFO [Coordinator-Exec-0] org.apache.druid.server.coordinator.helper.DruidCoordinatorSegmentInfoLoader - Found [187] used segments.
2019-10-29T20:37:54,188 INFO [org.apache.druid.metadata.SQLMetadataSegmentManager-Exec-0] org.apache.druid.metadata.SQLMetadataSegmentManager - Polled and found 188 segments in the database
[3] Historical log:
2019-10-29T20:11:48,561 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]

The highlighted parts are self-explanatory: the indexing job ran successfully, segments were created in deep storage, the metadata DB was updated, and the Coordinator is able to see 188 segments from the DB. Up to here everything seems OK!

Now, the Coordinator does not directly communicate with a Historical process when assigning it a new segment; instead, it creates some temporary information about the new segment under the load queue path of that Historical process in ZooKeeper. Once the Historical process sees this request, it loads the segment and begins serving it.

It seems the issue is somewhere in the Coordinator/ZooKeeper/Historical path while communicating with the Historical to load the segments. I also don’t see any further log lines after 20:11:48 in the Historical log.

[a] Could you connect to your ZooKeeper server and see if these paths exist:
ls /druid
[servedSegments, coordinator, listeners, loadQueue, discovery, internal-discovery, indexer, announcements, overlord, segments]
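For example, with the ZooKeeper CLI (this assumes the quickstart’s ZooKeeper is symlinked under zk/ inside the Druid directory, as the tutorial sets it up):

```bash
# Connect to the quickstart's ZooKeeper with its bundled CLI
zk/bin/zkCli.sh -server localhost:2181

# Then, inside the CLI, inspect the Druid paths:
#   ls /druid
#   ls /druid/loadQueue
#   ls /druid/segments
```

The quickstart supervisor should also write the ZooKeeper output to var/sv/zk.log, in case you are looking for that log.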

Hi Vaibhav,

I connected to zk and confirmed the existence of the folders : [servedSegments, coordinator, listeners, loadQueue, discovery, internal-discovery, indexer, announcements, overlord, segments]
[b] Connect to your metadb, run the below, and see what it returns:
select count(*) from DRUID_SEGMENTS where used=true;

I don’t know how to connect to the metadb; is that a Derby database? I will look into it…

Thanks again,

Mark


Hi Mark,

To reproduce this issue, I just set up ubuntu-18.04.3-live-server in VirtualBox with default settings and ran the ingestion for wikiticker-2015-09-12-sampled.json.gz, but I was unable to reproduce it.

Am I missing something while reproducing this issue?

What I see further in your Coordinator log is that it has never performed the LoadRule call in your case. I suspect something has messed up the load rule.

In a normal scenario, the below message can be seen in the Coordinator log:

2019-11-01T23:04:21,970 INFO [Coordinator-Exec-0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'primary' for segment [wikiticker-2015-09-12-sampled_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-11-01T23:03:17.051Z] to server [localhost:8083] in tier [_default_tier]

>> Could you please connect to your metastore DB (Derby) and query the below tables:
DRUID_RULES

DRUID_SEGMENTS
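A sketch of how this could look with Derby’s ij client, assuming the quickstart’s default Derby instance on port 1527 and its default metadata DB path (the derbytools/derbyclient jar paths below are illustrative; they come from a Derby distribution, not from Druid itself):

```bash
# Launch Derby's interactive ij client (classpath paths are illustrative)
java -cp /path/to/derby/lib/derbytools.jar:/path/to/derby/lib/derbyclient.jar \
  org.apache.derby.tools.ij

# Then, inside ij (the quickstart's Derby listens on 127.0.0.1:1527):
#   connect 'jdbc:derby://localhost:1527/var/druid/metadata.db';
#   select * from DRUID_RULES;
#   select count(*) from DRUID_SEGMENTS where used = true;
```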

OK, I attached the results of both selects against the metadata DB. I may have accidentally screwed up a rule, I don’t know… I’m brand new to Druid… Thank you. -Mark

Thanks,

Mark

MRBout.txt (431 KB)

I figured it out: I screwed up the cluster defaults and set replication to 0… so once I changed it to 1, everything became available.

Thank you so much for your help - I’m sorry I wasted your time. I will now read through the docs.

Thanks again,

Mark

Fantastic!! :+1:

Thanks,

Vaibhav

Hey MRB,

I’m currently facing the same issue, but I’m unable to find how to set the replication. Can you point me in the right direction?

Hi Shubham,

You can set the replication for the datasource in the metadata DB; the table name is DRUID_RULES.

Or this can be achieved from the Druid unified console (which runs on port 8888) under the datasource -> Retention rules. Screenshot attached.
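For reference, the same thing can also be done through the Coordinator’s rules API. A rough sketch (the payload below is only an example; set the tier and replicant count for your own setup):

```bash
# Inspect the current cluster default rules
curl http://localhost:8081/druid/coordinator/v1/rules/_default

# Set a loadForever rule with one replicant in the default tier
curl -X POST -H 'Content-Type: application/json' \
  http://localhost:8081/druid/coordinator/v1/rules/_default \
  -d '[{"type":"loadForever","tieredReplicants":{"_default_tier":1}}]'
```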

Thanks,

Hemanth

Shubham,

I accidentally screwed up my cluster default settings. I changed it back to 1; I believe it defaults to 2.

In the console, click Datasources; next to a datasource, click the wrench and select “Edit retention rules”. Click the link to edit the cluster defaults and make sure the number of replicants is greater than 0. That’s what worked for me.

-mark