Large Dimension value data not working

Hi guys,

I am using Druid 0.9.1.

I ingested the data with the local indexer.

Queries return no results when the dimension values are large.

timeseries, topN, and groupBy queries all come back empty.

Here is my input JSON:

{"timestamp":"2016-07-23T00:00:00.000Z","spt":"all sports,baseball,basketball,college sports,cycling,football,golf,hockey,motor sports,skiing,snowboarding,soccer,track and field,water sports,weightlifting","nplt":"democratic party,eligible voters,environmentalism,government and politics,independent party,republican party","abr":"acura,audi,bmw,honda,infiniti,jaguar,land rover,lexus,mercedes benz,mini,porsche,saab,saturn,scion,subaru,toyota,volvo","tvl":"all travel,cruise travel,domestic travel,frequent travel,international travel,summer travel,thanksgiving travel,vacation travel","artc":"art,books & magazines,crafts and hobbies,photography,reading","tvp":"all tv programming,animation & cartoon tv,drama tv,reality tv","hinc":"hhi $100k +,hhi $60k - $74k,hhi $75k - $99k","htp":"single-family house","st":"CauC-Snew south walesS","otp":"business,c-level executives,career & employment,finance professionals,it professionals,small business professionals","db":"apple","hmfm":"affluent households,animal lovers,cat lovers,dog lovers,green living,home & family,home decorating,home improvement,new years resolution makers,online dating,outdoor enthusiasts,parenting,school & education","oind":"business services,construction,finance,government,healthcare,manufacturing,real estate,retail,software,wholesalers","devt":"tablet","sfcl":"clothes shopping,cosmetics,jewelry and fashion accessories,online shoppers,online shoppers luxury,shoppers childrens clothing,shoppers cosmetics luxury,shoppers mens clothing luxury,shoppers mens clothing,shoppers womens clothing,shoppers womens clothing luxury,style & fashion,trendsetters","ct":"CauC-Snew south walesS-TashfieldT","mart":"married,single","entms":"all music,country music,hip hop & rap music,pop cultre & celebrity gossip,rock music","pf":"auto insurance,homeowners insurance,life insurance,online banking,personal finance,personal or health insurance,real estate,retirement planning,stocks","vgm":"all video games,console games,xbox 360 users","entmv":"action & adventure movies,comedy movies,drama movies,movies all,romance movies,sci-fi movies,summer blockbusters","ck":"alcoholic beverages,cooking & recipes,food & beverages,holiday bakers,restaurants & dining,thanksgiving food","csz":"medium (50-249),medium-large (250-999),small (1-49)","auto":"all automobiles,automobile domestic,automobile foreign,luxury cars,mid-sized cars,sport utility vehicles (suvs),trucks,vans & minivans","ln":"en","dmk":"finance decision makers,it decision makers,sales and marketing decision makers,small business decision makers","prnt":"declared parents,parents of pre-teens,parents of teenagers","fbvg":"food & beverages","snrt":"board members,executives,mid-management,non-management,small businesses","lang":"english","pgrp":"business professionals,finance professionals,high income professionals,it professionals,sales and marketing professionals,small business professionals","sevt":"academy awards (oscars),back to school,billboard latin music awards,black friday and cyber monday,boston marathon,fifa world cup,golden globe awards,gospel music association dove awards,grammy awards,holiday shopping,march college basketball,masters tournament,mlb all-star game,nba finals,nfl draft,olympic sports,peoples choice awards,pga championship,stanley cup finals,summer olympic sports,super bowl,the open championship (golf),tour de france,winter olympic ice hockey,winter olympic skiing,winter olympic sports","dm":"ipad","edu":"high school degree,some college","chrt":"charitable donors","hval":"home value - 200k-400k,home value - 400k-750k","zip":"Z2131Z","hld":"1","cnt":"CauC","cown":"private","didtp":"idfa","pid":"12968","styp":"2","cred":"excellent,good","ethn":"white","tech":"computers & software,computers & technology,electronics & gadgets,home audio & video,mobile phones","farea":"c-suite,education,finance,hr,information technology,marketing,medical/health,sales","hlds":"cinco de mayo,earth day,easter,fathers day,fourth of july,halloween,memorial day,mothers day,st. patricks day,valentines day","hlv":"dieting and weight loss,health & fitness","pet":"pet owners","hten":"owns primary residence","stud":"college students","smd":"influencers,social media users"}

I have processed 500 segments, and each segment has 50k rows like this.

Can Druid handle dimension values this large?

If I remove the large dimension values, queries work fine.

Thanks,

Jitesh

Hey Jitesh,

What do you mean by “not working”? Do you get an error? No results? Something else? Are any of your Druid nodes logging exceptions?

Thanks for the quick reply.

There are no exceptions in the logs, and the queries return no results. Sometimes the query comes back quickly with no data.

Hi

Can we use multi-value dimensions to store this data in a segment?

http://druid.io/docs/latest/querying/multi-value-dimensions.html

If so, could you please tell me how to store multi-value data in a segment?
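
For what it's worth, here is a minimal sketch of one way to prepare the rows, assuming the JSON input format: as far as I understand the page linked above, a JSON array of strings in the input is ingested as a multi-value dimension. The helper name `to_multi_value` and its `keep_as_string` parameter are made up for illustration, and splitting on commas assumes no individual value contains a comma.

```python
import json


def to_multi_value(row, keep_as_string=("timestamp",), delimiter=","):
    """Return a copy of row with delimited string values split into lists.

    Hypothetical helper: keys listed in keep_as_string are never split.
    """
    converted = {}
    for key, value in row.items():
        if key not in keep_as_string and isinstance(value, str) and delimiter in value:
            # "married,single" becomes ["married", "single"]
            converted[key] = [part.strip() for part in value.split(delimiter)]
        else:
            converted[key] = value
    return converted


# A shortened version of the row posted in this thread.
row = {
    "timestamp": "2016-07-23T00:00:00.000Z",
    "mart": "married,single",
    "edu": "high school degree,some college",
    "dm": "ipad",
}
converted_row = to_multi_value(row)
print(json.dumps(converted_row))
```

Each converted row would then be written out as one JSON object per line and indexed as before. Single-valued fields such as "dm" are left untouched.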

The historical node does not return any data, even for a simple query.

Query:

{
  "queryType": "timeseries",
  "dataSource": "test_cluster",
  "granularity": "all",
  "aggregations": [
    {
      "type": "count",
      "name": "numz_count",
      "fieldName": "numz"
    }
  ],
  "intervals": [
    "2016-07-28T00:00/2016-07-28T00:01"
  ]
}
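
One thing that may be worth ruling out: the sample row posted in this thread is stamped 2016-07-23, while this query covers only the first minute of 2016-07-28. The sample may not be representative of everything that was indexed, so treat this as a quick sanity check rather than a diagnosis. The sketch below assumes timestamps are UTC and that the interval start is inclusive and the end exclusive. The helper names `parse_iso` and `in_interval` are made up for illustration.

```python
from datetime import datetime, timezone


def parse_iso(ts):
    """Parse the ISO-8601 forms used in this thread into a UTC datetime."""
    ts = ts.rstrip("Z")
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M"):
        try:
            return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unsupported timestamp format: " + ts)


def in_interval(timestamp, interval):
    """Check whether timestamp falls in a 'start/end' interval (end exclusive)."""
    start, end = (parse_iso(part) for part in interval.split("/"))
    return start <= parse_iso(timestamp) < end


# Timestamp from the sample row and interval from the query in this thread.
row_in_range = in_interval(
    "2016-07-23T00:00:00.000Z",
    "2016-07-28T00:00/2016-07-28T00:01",
)
print(row_in_range)  # False
```

If the indexed rows do fall outside the queried interval, widening "intervals" to cover the data would be a cheap first test.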

I am attaching the logs produced when we query the historical node.

Please help me figure this out.

Thanks,

Jitesh

historical-logs.txt (26.1 KB)

Jitesh, can you include the entire historical log and also the input and output of your query?
That includes the command-line arguments you used.
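
A minimal sketch of one way to capture the input and output together, assuming the query is POSTed as JSON to the node's `/druid/v2/` endpoint. The host and port below (`localhost:8083`) are assumptions and should be replaced with your historical node's address. The names `build_request`, `run_and_record`, and `query_report.json` are made up for illustration.

```python
import json
import urllib.request

# Assumed address of the historical node; adjust for your cluster.
HISTORICAL_URL = "http://localhost:8083/druid/v2/?pretty"


def build_request(query, url=HISTORICAL_URL):
    """Build the HTTP POST request that carries the JSON query."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_record(query, out_path="query_report.json", url=HISTORICAL_URL):
    """Send the query and save the URL, input, and output in one file."""
    request = build_request(query, url)
    with urllib.request.urlopen(request, timeout=60) as response:
        output = json.loads(response.read().decode("utf-8"))
    with open(out_path, "w") as handle:
        json.dump({"url": url, "input": query, "output": output}, handle, indent=2)
    return output


# The query from this thread, copied as posted.
query = {
    "queryType": "timeseries",
    "dataSource": "test_cluster",
    "granularity": "all",
    "aggregations": [
        {"type": "count", "name": "numz_count", "fieldName": "numz"}
    ],
    "intervals": ["2016-07-28T00:00/2016-07-28T00:01"],
}
request = build_request(query)
print(request.get_method(), request.full_url)
```

Calling `run_and_record(query)` against a running node writes a single file that can be attached alongside the full log.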