Cannot Parse Data - Druid - rabbitmq-postgres

Hey Guys,

So I was trying druid-rabbitmq-postgres.

I was able to start Druid up, but now I've hit the next problem.

Druid is not ingesting the data. I think it's an issue with the timestamp.

The data to be ingested looks like this:

{"created_at": "2015-08-12 18:28:00", "updated_at": "2015-08-12 00:00:00", "name": "demo_adsets",
"ad_set_id": "112504826", "ad_account_id": "29", "fb_ad_campaign_id": "28", "campaign_status": "ACTIVE",
"daily_budget": "2500", "lifetime_budget": "300", "budget_remaining": "0",
"start_time": "2015-08-10", "end_time": "2015-08-10", "updated_time": "0", "created_time": "0",
"bid_type": "0", "bid_info": "0", "targeting": "0", "is_autobid": "false", "user_status": "0", "campaign_schedule": "0", },

And my realtime.spec file is configured like this:

[
  {
    "dataSchema" : {
      "dataSource" : "fb_ad_sets",
      "parser" : {
        "type" : "base64",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "created_at",
            "format" : "iso"
          },
          "dimensionsSpec" : {
            "dimensions": ["updated_at", "name", "ad_set_id", "ad_account_id", "fb_ad_campaign_id", "campaign_status", "daily_budget",
            "lifetime_budget", "budget_remaining", "start_time", "end_time", "updated_time", "created_time", "bid_type", "bid_info", "targeting",
            "is_autobid", "user_status", "campaign_schedule"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        },
        {
          "type" : "doubleSum",
          "name" : "added",
          "fieldName" : "ad_set_id"
        },
        {
          "type" : "doubleSum",
          "name" : "deleted",
          "fieldName" : "campaign_status"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE"
      }
    },
    "ioConfig" : {
      "type" : "realtime",
      "firehose" : {
        "type" : "rabbitmq",
        "connection" : {
          "host": "localhost",
          "port": "5672",
          "username": "guest",
          "password": "guest",
          "virtualHost": "/",
          "uri": "amqp://localhost/"
        },
        "config" : {
          "exchange": "postgres",
          "queue" : "transactions",
          "routingKey": "publish_transactions",
          "durable": "true",
          "exclusive": "false",
          "autoDelete": "false",
          "maxRetries": "10",
          "retryIntervalSeconds": "1",
          "maxDurationSeconds": "300"
        }
      },
      "plumber": {
        "type": "realtime"
      }
    },
    "tuningConfig": {
      "type" : "realtime",
      "maxRowsInMemory": 500000,
      "intermediatePersistPeriod": "PT10m",
      "windowPeriod": "PT10m",
      "basePersistDirectory": "/tmp/realtime/basePersist",
      "rejectionPolicy": {
        "type": "serverTime"
      }
    }
  }
]

How do I enable debug logging so I can see the exception stack trace?

I tried these:

druid.request.logging.type=file

druid.request.logging.dir=log

com.metamx.emitter.logging.level=error

com.metamx.emitter.logging.type=file

com.metamx.emitter.logging.dir=log

I also tried running the realtime node with the flag -Ddruid.emitter.logging.logLevel=debug.

The realtime node gives the following error:

2015-08-12T06:37:31,220 ERROR [MonitorScheduler-0] io.druid.segment.realtime.RealtimeMetricsMonitor - [1] Unparseable events! Turn on debug logging to see exception stack trace.

2015-08-12T06:37:31,220 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2015-08-12T06:37:31.220Z","service":"realtime","host":"127.0.0.1:8083","metric":"events/unparseable","value":1,"user2":"fb_ad_sets"}]

2015-08-12T06:37:31,220 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2015-08-12T06:37:31.220Z","service":"realtime","host":"127.0.0.1:8083","metric":"events/processed","value":0,"user2":"fb_ad_sets"}]

Thanks,

Shantanu

"2015-08-12 18:28:00" is not in ISO 8601 format (https://en.wikipedia.org/wiki/ISO_8601).

You can specify a custom format in the timestampSpec so that Druid can read your timestamp.
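An alternative to a custom format is to normalize the timestamp to ISO 8601 before the event is published to the queue. The following is only a minimal sketch of that idea: the helper name `to_iso8601` is hypothetical, and it assumes the Postgres value is a naive timestamp that should be treated as UTC, which may not hold for your data.

```python
from datetime import datetime, timezone


def to_iso8601(pg_timestamp: str) -> str:
    """Turn 'YYYY-MM-DD HH:MM:SS' into an ISO 8601 string.

    Assumption: the input has no timezone and is meant to be UTC.
    """
    naive = datetime.strptime(pg_timestamp, "%Y-%m-%d %H:%M:%S")
    # Attach UTC explicitly so the output carries an offset.
    return naive.replace(tzinfo=timezone.utc).isoformat()


print(to_iso8601("2015-08-12 18:28:00"))  # 2015-08-12T18:28:00+00:00
```

With the value rewritten this way, the "iso" setting in the timestampSpec could stay as it is.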

Hey Fangjin,

Yeah, I figured that out. When converting to JSON, Postgres changes the ISO datetime format to "2015-08-12 18:28:00", dropping the T and the timezone.

I tried setting a custom datetime format like this:

"timestampSpec" : {
  "column" : "created_at",
  "format" : "YYYY-MM-DD hh:mm:ss"
},

But it didn't work.

I tried to enable logging by adding the line "druid.emitter.logging.logLevel=error" to realtime/runtime.properties and common.runtime.properties. I also tried passing it in the command that starts the realtime node, as follows:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=config/realtime/realtime.spec -Ddruid.emitter.logging.logLevel=error -classpath config/_common:lib/*:config/realtime io.druid.cli.Main server realtime

What am I doing wrong?

Thanks,

Shantanu

Shantanu, your time format is slightly off. Druid uses Joda-Time format specifiers, as documented here: http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

In your case, you want "yyyy-MM-dd HH:mm:ss".
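The case of the pattern letters matters: in Joda-Time patterns, "HH" is the hour of day (0 to 23) while "hh" is the 12-hour clock hour, and "dd" is the day of month while "DD" is the day of year. The sketch below illustrates the hour part of that difference using Python's `strptime` as a rough stand-in (`%H` for "HH", `%I` for "hh"). It is not Joda-Time itself, and the helper `parses` is made up for the illustration.

```python
from datetime import datetime

SAMPLE = "2015-08-12 18:28:00"


def parses(value: str, fmt: str) -> bool:
    """Return True if value can be parsed with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# 24-hour clock, roughly analogous to Joda "yyyy-MM-dd HH:mm:ss".
ok_24h = parses(SAMPLE, "%Y-%m-%d %H:%M:%S")

# 12-hour clock, roughly analogous to the "hh" in the failed attempt:
# an hour value of 18 is out of range, so parsing fails.
ok_12h = parses(SAMPLE, "%Y-%m-%d %I:%M:%S")

print(ok_24h, ok_12h)  # True False
```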

Hey Fangjin,

Here's my sample event:

{"created_at": "2015-08-16T22:43:46.019137+05:30", "updated_at": "2015-08-10 00:00:00", "name": "demo_adsets", "ad_set_id": "112504826", "ad_account_id": "29", "fb_ad_campaign_id": "28", "campaign_status": "ACTIVE", "daily_budget": "2500", "lifetime_budget": "300", "budget_remaining": "0", "start_time": "2015-08-10", "end_time": "2015-08-10", "updated_time": "0", "created_time": "0", "bid_type": "0", "bid_info": "0", "targeting": "0", "is_autobid": "false", "user_status": "0", "campaign_schedule": "0", }

And my realtime spec file is as follows:

[

{

"dataSchema" : {

  "dataSource" : "fb_ad_sets",

  "parser" : {

    "type" : "base64",

    "parseSpec" : {

      "format" : "json",

      "timestampSpec" : {

        "column" : "created_at",

        "format" : "iso"

      },

      "dimensionsSpec" : {

        "dimensions": ["updated_at","name", "ad_set_id","ad_account_id","fb_ad_campaign_id","campaign_status", "daily_budget", 

        "lifetime_budget", "budget_remaining", "start_time", "end_time", "updated_time", "created_time", "bid_type", "bid_info", "targeting", 

        "is_autobid", "user_status", "campaign_schedule"],

        "dimensionExclusions" : [],

        "spatialDimensions" : []

      }

    }

  },

  "metricsSpec" : [

    {

      "type" : "count",

      "name" : "count"

    },

    {

      "type" : "doubleSum",

      "name" : "added",

      "fieldName" : "ad_set_id"

    },

    {

      "type" : "doubleSum",

      "name" : "deleted",

      "fieldName" : "campaign_status"

    }

  ],

  "granularitySpec" : {

    "type" : "uniform",

    "segmentGranularity" : "DAY",

    "queryGranularity" : "NONE"

  }

},

"ioConfig" : {

  "type" : "realtime",

  "firehose" : {

    "type" : "rabbitmq",

 "connection" : {

   "host": "localhost",

   "port": "5672",

   "username": "guest",

   "password": "guest",

   "virtualHost": "/",

   "uri": "amqp://localhost/"

 },

 "config" : {

   "exchange": "postgres",

   "queue" : "transactions",

   "routingKey": "publish_transactions",

   "durable": "true",

   "exclusive": "false",

   "autoDelete": "false",

   "maxRetries": "10",

   "retryIntervalSeconds": "1",

   "maxDurationSeconds": "300" 

 }

   }, 

   "plumber": {

    "type": "realtime"

  }

}, 

"tuningConfig": {

  "type" : "realtime",

  "maxRowsInMemory": 500000,

  "intermediatePersistPeriod": "PT10m",

  "windowPeriod": "PT10m",

  "basePersistDirectory": "\/tmp\/realtime\/basePersist",

  "rejectionPolicy": {

    "type": "serverTime"

  }

}

}

]

Thanks,

Shantanu

I was able to ingest that event with a few small changes. Comments inline.

Hey Fangjin,

Here's my sample event:

{"created_at": "2015-08-16T22:43:46.019137+05:30", "updated_at": "2015-08-10 00:00:00", "name": "demo_adsets", "ad_set_id": "112504826", "ad_account_id": "29", "fb_ad_campaign_id": "28", "campaign_status": "ACTIVE", "daily_budget": "2500", "lifetime_budget": "300", "budget_remaining": "0", "start_time": "2015-08-10", "end_time": "2015-08-10", "updated_time": "0", "created_time": "0", "bid_type": "0", "bid_info": "0", "targeting": "0", "is_autobid": "false", "user_status": "0", "campaign_schedule": "0", }

This is not well-formed JSON: there is a trailing comma before the closing brace.
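A quick way to catch this kind of problem before publishing is to run each event through a strict JSON parser. The snippet below is only a sketch using Python's standard `json` module. The helper `is_valid_json` is hypothetical, and the two event strings are shortened versions of the sample above.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Shortened sample event, with and without the trailing comma.
with_trailing_comma = (
    '{"created_at": "2015-08-16T22:43:46.019137+05:30", '
    '"campaign_schedule": "0", }'
)
without_trailing_comma = (
    '{"created_at": "2015-08-16T22:43:46.019137+05:30", '
    '"campaign_schedule": "0"}'
)

print(is_valid_json(with_trailing_comma))     # False
print(is_valid_json(without_trailing_comma))  # True
```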

And my realtime spec file is as follows:

[

{

"dataSchema" : {

  "dataSource" : "fb_ad_sets",
  "parser" : {
    "type" : "base64",
    "parseSpec" : {
      "format" : "json",
      "timestampSpec" : {
        "column" : "created_at",
        "format" : "iso"
      },
      "dimensionsSpec" : {
        "dimensions": ["updated_at","name", "ad_set_id","ad_account_id","fb_ad_campaign_id","campaign_status", "daily_budget", 
        "lifetime_budget", "budget_remaining", "start_time", "end_time", "updated_time", "created_time", "bid_type", "bid_info", "targeting", 
        "is_autobid", "user_status", "campaign_schedule"],
        "dimensionExclusions" : [],
        "spatialDimensions" : []
      }
    }
  },
  "metricsSpec" : [
    {
      "type" : "count",
      "name" : "count"
    }, 
    {
      "type" : "doubleSum",
      "name" : "added",
      "fieldName" : "ad_set_id"
    },

<<< This is not a metric

    {
      "type" : "doubleSum",
      "name" : "deleted",
      "fieldName" : "campaign_status"
    }

<<< This is not a metric