High query segment time on historicals and many pending segment scans

Historical config (1 node):
510 GB disk, 122880 MB RAM (120 GB), 16 vCPU

jvm.config:

-server
-Xms10g
-Xmx10g
-XX:NewSize=5g
-XX:MaxNewSize=5g
-XX:MaxDirectMemorySize=15g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/opt/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

runtime.properties:

druid.server.http.numThreads=25

Processing threads and buffers

druid.processing.numThreads=13

druid.cache.type=local
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.server.tier=hot
druid.cache.sizeInBytes=5000000000
druid.historical.cache.unCacheable=
druid.query.groupBy.maxIntermediateRows=60000000
druid.query.groupBy.maxResults=24000000

Segment storage

druid.segmentCache.locations=[{"path":"/x/path","maxSize":120000000000}]
druid.server.maxSize=120000000000

druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","io.druid.server.metrics.HistoricalMetricsMonitor","com.metamx.metrics.JvmMonitor"]
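As a back-of-envelope check of the direct memory setting (my assumptions, not stated in the config above: the 0.9.x-era sizing rule MaxDirectMemorySize >= (druid.processing.numThreads + 1) * druid.processing.buffer.sizeBytes, and a 1 GiB default for druid.processing.buffer.sizeBytes since it is not set explicitly; newer Druid versions also add druid.processing.numMergeBuffers to the thread count):

```python
# Sketch: verify -XX:MaxDirectMemorySize covers the processing buffers.
# Assumed rule: direct memory >= (numThreads + 1) * buffer.sizeBytes.
GIB = 1024 ** 3

num_threads = 13       # druid.processing.numThreads
buffer_size = 1 * GIB  # assumed default druid.processing.buffer.sizeBytes
max_direct = 15 * GIB  # -XX:MaxDirectMemorySize=15g

required = (num_threads + 1) * buffer_size
print(f"required ~{required // GIB} GiB, configured {max_direct // GIB} GiB, "
      f"ok: {max_direct >= required}")
```

Under those assumptions the configured 15 GiB leaves about 1 GiB of headroom, so direct memory alone does not explain the latency.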

Data size is about 80 GB in roughly 200 segments of ~400 MB each, indexed with a target partition size of 25,000,000 rows and monthly segment granularity, giving 200 shards across 14 intervals. The datasource has 30 dimensions and 20 metrics.
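A quick sanity check on those sizing numbers (the "free RAM minus heap minus direct memory" estimate of page-cache headroom is my own rough model, not something from the config):

```python
# Sketch: does the memory-mapped data fit in the RAM left over for the
# OS page cache after the JVM heap and direct memory are carved out?
segments = 200
seg_mb = 400
data_gb = segments * seg_mb / 1024   # ~78 GiB, matching "about 80 GB"

ram_gb = 120     # 122880 MB
heap_gb = 10     # -Xmx10g
direct_gb = 15   # -XX:MaxDirectMemorySize=15g
page_cache_gb = ram_gb - heap_gb - direct_gb  # rough page-cache headroom

print(f"data ~{data_gb:.0f} GiB, page-cache headroom ~{page_cache_gb} GiB, "
      f"fits: {page_cache_gb >= data_gb}")
```

So, consistent with the observation below that no swapping occurs, all segments can indeed sit in the page cache; the bottleneck must be elsewhere.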

For groupBy queries with 2 dimensions and 5 metrics I am seeing the following metrics (slightly better for timeseries):

query/segment/time ~10s

jvm/gc/time ~200

jvm/gc/count ~1500

No swapping is occurring and all segments are memory-mapped, yet segment/scan/pending climbs to ~200 for each new query. Why is there so much latency?
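For what it's worth, my understanding is that a historical scans segments on its processing-thread pool, so a query touching all 200 segments with only 13 processing threads queues the remainder, and segment/scan/pending reports that queue. A rough model of the resulting serialization (the "waves" framing is my own simplification):

```python
import math

segments_per_query = 200
processing_threads = 13  # druid.processing.numThreads

# Segments beyond the thread count wait in the queue; roughly what
# segment/scan/pending shows right after a query arrives.
initial_pending = segments_per_query - processing_threads

# Each thread scans segments back to back, so query time is roughly
# (number of waves) * (per-segment scan time).
waves = math.ceil(segments_per_query / processing_threads)

print(f"initial pending ~{initial_pending}, scan waves: {waves}")
```

Under this model a 10 s query/segment/time gets multiplied by ~16 serial waves per query, which would point at per-segment scan cost (or segment count per query) rather than memory pressure.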

Does this help?
http://druid.io/docs/0.9.2/operations/performance-faq.html