Timestamp column in data and general data ingestion

Hi Druid Developers,

I am a newbie to Druid, and as a first step I am trying to load data into Druid. Pardon me if my question is too naive.

I am looking at the doc http://druid.io/docs/latest/ingestion/ for data ingestion.

From what I understand, having a timestamp column in your data file is mandatory. Is this true? All I have is a date column at the beginning of the file. If that is not acceptable, what should I put in the spec snippet below?

"timestampSpec" : {
        "column" : "timestamp",
        "format" : "auto"

Below is the first line of data. Can you please let me know if this format is acceptable? The values in red are dimension values and the rest are metric values. For now I want to load the metric values as-is into Druid without any aggregation. Should I load them as dimension values as well, or should I still list them in the metricsSpec? Since I am loading all dimensions into Druid, it won't aggregate them anyway.

3/7/2015^20^1366318122374311707 ^2216341272195416704 ^0^N^1514^Y^USD^Unbranded^87.9900^1^1^1.8300^0.0000^0.0000^0.0000^0.0000^0.0000^1.8300^0^0.483^0^0^0^87.99

Regards

Karteek

Hey,

I am not sure if Druid's "auto" covers your specific timestamp format.

I assume you have day and month without leading zeros, so the format is probably "d/M/yyyy".

// Quick Joda-Time check of the pattern (imports added so the snippet compiles)
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

DateTimeFormatter fmt = DateTimeFormat.forPattern("d/M/yyyy");
System.out.println(fmt.print(System.currentTimeMillis())); // prints the current date in d/M/yyyy form
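If "auto" does not handle it, you can give the format explicitly in the spec. A minimal sketch of such a timestampSpec, assuming your date column is named "date" (a placeholder; use whatever header you assign to that first column):

"timestampSpec" : {
  "column" : "date",
  "format" : "d/M/yyyy"
}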

See also the Timestamp Spec section of the ingestion docs.

Cheers.

Ah, and your other questions. Sure, that looks acceptable to me. The delimiter is "^", which you have to specify for your CSV format. Then you list all the columns in the order they appear in your line, and specify which of those you want to include as dimensions or exclude.
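Putting those pieces together, a parseSpec along these lines might work — just a sketch using the tsv parse spec (which allows a custom delimiter) and placeholder column names for illustration:

"parseSpec" : {
  "format" : "tsv",
  "delimiter" : "^",
  "timestampSpec" : { "column" : "date", "format" : "d/M/yyyy" },
  "columns" : [ "date", "time_of_day", "currency", "amount" ],
  "dimensionsSpec" : { "dimensions" : [ "time_of_day", "currency" ] }
}

Here "amount" stands for a metric column that would be referenced from the metricsSpec rather than listed as a dimension.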

Thanks a lot, Olaf. Will try and keep you posted.

Karteek

OK, I tried loading the data (batch ingestion). As a first step I just tried to load one line of data, and I get the following error. Can anyone please help? I am also attaching the complete log we captured after running the indexing command.

SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container

org.skife.jdbi.v2.exceptions.CallbackFailedException: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'druid.druid_segments' doesn't exist [statement:"SELECT id FROM druid_segments WHERE id = :identifier", located:"SELECT id FROM druid_segments WHERE id = :identifier", rewritten:"SELECT id FROM druid_segments WHERE id = ?", arguments:{ positional:{}, named:{identifier:'QBRData_2015-07-08T00:00:00.000Z_2015-07-09T00:00:00.000Z_2015-07-08T23:32:08.503Z'}, finder:}]

Data:

3/7/2015^20^1366318122374311707 ^2216341272195416704 ^0^N^1514^Y^USD^Unbranded^87.9900^1^1^1.8300^0.0000^0.0000^0.0000^0.0000^0.0000^1.8300^0^0.483^0^0^0^87.99

Indexing spec:

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "QBRData",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "tsv",
          "delimiter": "^",
          "timestampSpec": {
            "column": "transaction_reference_date",
            "format": "M/d/yyyy"
          },
          "dimensionsSpec": {
            "dimensions": [
              "time_of_day",
              "rcvr_id",
              "sndr_id",
              "pkey",
              "is_pmt_xbdr_y_n",
              "pmt_flow_key",
              "is_pmt_official_y_n",
              "transaction_currency",
              "Transaction_Type"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          },
          "columns": [
            "transaction_reference_date",
            "time_of_day",
            "rcvr_id",
            "sndr_id",
            "pkey",
            "is_pmt_xbdr_y_n",
            "pmt_flow_key",
            "is_pmt_official_y_n",
            "transaction_currency",
            "Transaction_Type"
          ]
        }
      },
      "metricsSpec": [
        { "type": "doubleSum", "name": "NTPV_USD_AMT", "fieldName": "NTPV_USD_AMT" },
        { "type": "doubleSum", "name": "NET_CNT", "fieldName": "NET_CNT" },
        { "type": "doubleSum", "name": "SUCCESS_CNT", "fieldName": "SUCCESS_CNT" },
        { "type": "doubleSum", "name": "BA_Revenue", "fieldName": "BA_Revenue" },
        { "type": "doubleSum", "name": "Seller_fx_Revenue", "fieldName": "Seller_fx_Revenue" },
        { "type": "doubleSum", "name": "Buyer_fx_Revenue", "fieldName": "Buyer_fx_Revenue" },
        { "type": "doubleSum", "name": "XBDR_Revenue", "fieldName": "XBDR_Revenue" },
        { "type": "doubleSum", "name": "Revenue_share", "fieldName": "Revenue_share" },
        { "type": "doubleSum", "name": "Pass_Through_Revenue", "fieldName": "Pass_Through_Revenue" },
        { "type": "doubleSum", "name": "Total_Sales_Revenue", "fieldName": "Total_Sales_Revenue" },
        { "type": "doubleSum", "name": "ACH_Cost", "fieldName": "ACH_Cost" },
        { "type": "doubleSum", "name": "CC_Cost", "fieldName": "CC_Cost" },
        { "type": "doubleSum", "name": "PP_Loss", "fieldName": "PP_Loss" },
        { "type": "doubleSum", "name": "Seller_Loss", "fieldName": "Seller_Loss" },
        { "type": "doubleSum", "name": "Buyer_Loss", "fieldName": "Buyer_Loss" },
        { "type": "doubleSum", "name": "LOC_CURR_NTPV", "fieldName": "LOC_CURR_NTPV" }
      ],
      "granularitySpec": {
        "intervals": [
          "2015-01-01/2015-12-31"
        ]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/x/home/tsoliman/druid/data/batch/",
        "filter": "SQLAExport.txt"
      }
    }
  }
}

druid_segments_not_found.txt (55.3 KB)

Hm, so Druid can connect to MySQL, but the table "druid_segments" has not been created. Can you check whether the user you connect to the database with has permission to create tables? Did you check the log files?

Also, why do you use tsv and not csv? The metricsSpec also looks a bit odd: you reference columns that you did not define in "columns".
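For what it's worth, one way to line that up would be to extend "columns" so the metric source fields are listed too — a sketch, assuming the 16 metric values follow the 10 dimension columns in the file in the same order as the metricsSpec (your sample line does have 26 values):

"columns" : [
  "transaction_reference_date", "time_of_day", "rcvr_id", "sndr_id", "pkey",
  "is_pmt_xbdr_y_n", "pmt_flow_key", "is_pmt_official_y_n", "transaction_currency", "Transaction_Type",
  "NTPV_USD_AMT", "NET_CNT", "SUCCESS_CNT", "BA_Revenue", "Seller_fx_Revenue",
  "Buyer_fx_Revenue", "XBDR_Revenue", "Revenue_share", "Pass_Through_Revenue", "Total_Sales_Revenue",
  "ACH_Cost", "CC_Cost", "PP_Loss", "Seller_Loss", "Buyer_Loss", "LOC_CURR_NTPV"
]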

Olaf,

I will check on the permission for creating tables in MySQL.

Regarding the data file itself:

  1. I used TSV because for CSV there is no option in the parseSpec to set a delimiter; all I can use is listDelimiter, which is for multi-value dimensions. I got a bit confused there and just went ahead with TSV.
  2. Regarding your concern with the metricsSpec, should I list all of those columns in "columns"? Please let me know.

Karteek

Olaf,

Thanks a lot for all the help. We were able to load and index the data, and we retrieved it through a query today.

Karteek

Hi Karteek,
Sorry to bump this very old message.

What was the resolution for your issue?

What "format" did you use in your timestampSpec?

Thanks,

Anoosha