Optimizing groupBy rows configuration

Hi all,

I have a question about groupBy.maxResults and groupBy.maxIntermediateRows, if I understand them correctly. The historical nodes, the broker, and the realtime tasks all have these properties:

  1. druid.query.groupBy.maxIntermediateRows
  2. druid.query.groupBy.maxResults
    The first is used while computing groupBy queries; it does not limit the size of the result, only the number of intermediate rows a node may use while computing the result.

The second is the maximum number of rows a node may produce as a result and return to the broker.

When the historical nodes and realtime tasks return their results to the broker, the broker merges them and applies a similar check. If the row count grows beyond the second property, the node throws an exception like "Limit number of rows" or something similar, doesn't it?
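
To check my understanding, here is a toy sketch of how I picture that merge-side cap (my own illustration in Python, not Druid's actual code; the default values are the ones I believe the docs list):

```python
# Toy illustration of the two caps; NOT Druid's actual code.
MAX_INTERMEDIATE_ROWS = 50_000  # druid.query.groupBy.maxIntermediateRows default
MAX_RESULTS = 500_000           # druid.query.groupBy.maxResults default

def merge(partial_results):
    """Merge per-node partial results, enforcing the result-row cap."""
    merged = {}
    for key, value in partial_results:
        merged[key] = merged.get(key, 0) + value
        if len(merged) > MAX_RESULTS:
            # Druid throws a similar "too many rows" error at this point.
            raise RuntimeError("Maximum number of rows reached")
    return merged

# MAX_INTERMEDIATE_ROWS would cap the per-node intermediate rows in the same
# way; this toy merge does not model that part.
print(merge([("a", 1), ("b", 2), ("a", 3)]))  # -> {'a': 4, 'b': 2}
```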

I also know that groupBy queries are processed in heap memory, so:

  1. I think I should configure these properties based on my heap size. Is this correct? How can I calculate the right values for these properties? (See the sketch after this list.)

  2. I think the broker needs higher values than the historicals or realtime tasks, because the broker has to merge all of their results into one larger result set. Is that right?
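
For question 1, I imagine a back-of-the-envelope calculation along these lines (every number here is an assumption about my own cluster):

```python
# Back-of-the-envelope sizing; all values are assumptions to be tuned.
heap_bytes = 8 * 1024**3    # assume an 8 GiB JVM heap on this node
groupby_share = 0.3         # assume ~30% of the heap reserved for groupBy rows
avg_row_bytes = 1024        # assumed average in-memory size of one row

max_results = int(heap_bytes * groupby_share / avg_row_bytes)
print(f"druid.query.groupBy.maxResults ~= {max_results:,}")

# If the intermediate cap only bounds rows per node while computing, it can
# be smaller, e.g. a tenth of maxResults:
print(f"druid.query.groupBy.maxIntermediateRows ~= {max_results // 10:,}")
```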

Regards and thanks,

Andres

Hi, see inline.

Hi Fangjin,

I’m going to continue my configuration testing, and when things are clearer I would love to write some documentation and share it with you all. :slight_smile:

I have a one more question:

Do you have any idea how I can calculate the size (in bytes) of a single row?

Regards,

Andrés Gómez

Developer

redborder.net / agomez@redborder.net

Phone: +34 955 60 11 60



On October 3, 2015 at 3:37:02, Fangjin (fangjinyang@gmail.com) wrote:

Do you mean a row in Druid or a row in your raw data?

Also, what is the use case?

I mean a row in Druid.

And I would like to know the row size so I can estimate how many rows we can keep in RAM when I run groupBy queries.

For example:

1 row ~ 1024 bytes

I have 30 GB of RAM on my broker and I choose to use 10 GB for groupBy queries.

I can configure the number of rows using:

  1. druid.query.groupBy.maxIntermediateRows
  2. druid.query.groupBy.maxResults
    so:

10 GB RAM / 1024 bytes per row ≈ 10,000,000 rows
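
Or the same arithmetic as a quick Python check (the 1024 bytes per row is just my guess):

```python
# How many ~1 KiB rows fit in a 10 GiB budget (row size is an assumption).
budget_bytes = 10 * 1024**3       # 10 GiB of the broker's RAM for groupBy
assumed_row_bytes = 1024          # assumed size of one Druid row in memory
print(budget_bytes // assumed_row_bytes)  # -> 10485760, roughly 10 million
```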

Is that clearer?

Regards,

Andrés Gómez

Developer

redborder.net / agomez@redborder.net

Phone: +34 955 60 11 60



On October 6, 2015 at 17:00:26, Fangjin Yang (fangjinyang@gmail.com) wrote:

No, this is difficult to measure because your results will contain strings of variable size.
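
Any estimate has to add up the variable-length dimension strings plus the fixed-size aggregators in each row, something like this rough, hypothetical sketch, where every overhead number is a guess:

```python
# Hypothetical illustration of why row size varies: string dimensions dominate
# and their lengths differ from row to row. The overhead constant is a guess.
def estimate_row_bytes(dim_values, num_long_metrics):
    per_string_overhead = 40  # assumed JVM object overhead per string
    string_bytes = sum(len(v.encode("utf-8")) + per_string_overhead
                       for v in dim_values)
    metric_bytes = 8 * num_long_metrics  # one 8-byte long per aggregator
    return string_bytes + metric_bytes

print(estimate_row_bytes(["US", "android"], 3))                  # small row
print(estimate_row_bytes(["US", "a much longer user agent"], 3)) # larger row
```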