spatialDimensions specification for TSV file using lat/lon columns

Assuming an example TSV file format as follows:

columnA|columnB|latitude|longitude

And the columns and dimensions are specified as:

    "columns" : [
      "columnA",
      "columnB",
      "latitude",
      "longitude"
    ],
    "delimiter" : "|",
    "dimensionsSpec" : {
      "dimensions" : [
        "columnA",
        "columnB",
        "latitude",
        "longitude"
      ],


The documentation gives the following example, which assumes a JSON label with an array of two values:

    "spatialDimensions" : [
      {
        "dimName": "coorindates",
        "dims": ["latitude", "longitude"]
      }
    ]


The documentation says dimName is required and is “The name of the spatial dimension. A spatial dimension may be constructed from multiple other dimensions or it may already exist as part of an event. If a spatial dimension already exists, it must be an array of coordinate values.”

So the logical JSON structure for multiple other dimensions seemed to be the following, which does parse the data and start the MapReduce tasks:

    "spatialDimensions" : [
      {
        "dimName" : "latitude",
        "dims" : []
      },
      {
        "dimName" : "longitude",
        "dims" : []
      }
    ]


But when processing the data, this throws a java.lang.IllegalArgumentException when inserting into the RTree, as shown below. The insert checks that the incoming array of floats (coords) contains the proper number of dimensions, which is specified when the RTree is created.

    /**
     * @param coords - the coordinates of the entry
     * @param entry  - the integer to insert
     */
    public void insert(float[] coords, int entry) {
        Preconditions.checkArgument(coords.length == numDims);
        insertInner(new Point(coords, entry, bitmapFactory));
    }


I’m guessing at the proper JSON structure. Am I specifying this incorrectly? It looks like the number of dimensions the RTree was created with does not match the number of values in the incoming float array.

2016-08-18T21:54:57,091 INFO [main] org.apache.hadoop.mapred.JobClient - Task Id : attempt_201608160124_0063_r_000007_0, Status : FAILED on node b141-18
java.lang.IllegalArgumentException
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
at com.metamx.collections.spatial.RTree.insert(RTree.java:89)
at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:974)
at io.druid.segment.IndexMerger.merge(IndexMerger.java:423)
at io.druid.segment.IndexMerger.persist(IndexMerger.java:195)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.persist(IndexGeneratorJob.java:501)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:672)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:620)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:458)
at org.apache.hadoop.mapred.Child$4.run(Child.java:278)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.Child.main(Child.java:267)
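
To make the suspected mismatch concrete, here is a minimal sketch that reproduces the same kind of failure in isolation. It is a stand-in, not the real RTree: `RTreeStub`, `parseCoords`, and the comma-separated encoding of a spatial value are assumptions made for illustration. It also assumes the index is created for two dimensions, so a lone latitude value yields a one-element array and fails the length check quoted earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real RTree; only the dimension check is modeled.
class RTreeStub {
    private final int numDims;
    private final List<float[]> points = new ArrayList<>();

    RTreeStub(int numDims) {
        this.numDims = numDims;
    }

    // Same idea as the quoted precondition: the array length must equal numDims.
    // The entry id is accepted for signature parity but not used by this stub.
    void insert(float[] coords, int entry) {
        if (coords.length != numDims) {
            throw new IllegalArgumentException(
                "expected " + numDims + " coords, got " + coords.length);
        }
        points.add(coords);
    }

    int size() {
        return points.size();
    }

    // Assumption: a spatial value arrives as a comma-separated string of floats.
    static float[] parseCoords(String value) {
        String[] parts = value.split(",");
        float[] coords = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            coords[i] = Float.parseFloat(parts[i].trim());
        }
        return coords;
    }

    public static void main(String[] args) {
        RTreeStub tree = new RTreeStub(2);
        tree.insert(parseCoords("37.77,-122.42"), 0); // two values: accepted
        try {
            tree.insert(parseCoords("37.77"), 1);     // one value: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If this model is right, declaring "latitude" and "longitude" as two separate spatial dimensions with empty "dims" would hand the index one value per entry instead of two.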

Hello,

If you’re composing the spatial dimension from two other dimensions, you’ll want to use this syntax:

"spatialDimensions" : [ { "dimName": "coorindates", "dims": ["latitude", "longitude"] } ]

“dims” there indicates the individual component dimensions that are used to construct the new spatial “coordinates” dimension.
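
As a rough sketch of what that composition amounts to: the values of the component dimensions are read from each row and combined into a single value stored under the new dimension name. The class and method names below are hypothetical, and joining the values with a comma is an assumption about the encoding, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Druid's API.
class SpatialComposeSketch {
    // Builds the spatial value for one row from its component dimensions.
    static String compose(Map<String, String> row, List<String> dims) {
        List<String> parts = new ArrayList<>();
        for (String dim : dims) {
            String value = row.get(dim);
            if (value == null) {
                throw new IllegalArgumentException(
                    "row is missing component dimension " + dim);
            }
            parts.add(value);
        }
        // Assumption: values are joined with commas, in the order listed in "dims".
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("columnA", "a");
        row.put("columnB", "b");
        row.put("latitude", "37.77");
        row.put("longitude", "-122.42");

        // The composed dimension is stored alongside the original columns.
        row.put("coordinates", compose(row, Arrays.asList("latitude", "longitude")));
        System.out.println(row.get("coordinates")); // prints 37.77,-122.42
    }
}
```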

Hi Jonathan -

That was the first thing I tried. But when I include “latitude” and “longitude” in the columns and dimensions lists (as shown in the original post) and use your recommended snippet, the Hadoop indexing job fails during parsing with an error indicating that it is looking for a column named “coordinates”.

2016-08-22T17:00:26,794 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:115) [druid-services-0.9.1.1.jar:0.9.1.1]
at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.1.1.jar:0.9.1.1]
Caused by: java.lang.IllegalArgumentException: Instantiation of [simple type, class io.druid.data.input.impl.DelimitedParseSpec] value failed: column[coordinates] not in columns.
at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2774) ~[jackson-databind-2.4.6.jar:2.4.6]

Adding the column name “coordinates” to the end of the column list fixes that. This isn’t very well documented, but it seems to work!
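
The error message suggests a check along these lines: every dimension name the spec uses, including the `dimName` of a spatial dimension, has to appear in `columns`. The sketch below is a hypothetical reconstruction written for this thread (`ColumnsCheckSketch` and `verify` are illustrative names), not the parser's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of the validation implied by the error message.
class ColumnsCheckSketch {
    // Fails if any dimension used by the spec is not listed in "columns".
    static void verify(List<String> columns, List<String> usedDimensions) {
        for (String name : usedDimensions) {
            if (!columns.contains(name)) {
                throw new IllegalArgumentException(
                    "column[" + name + "] not in columns.");
            }
        }
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>(
            Arrays.asList("columnA", "columnB", "latitude", "longitude"));
        // Regular dimensions plus the dimName of the spatial dimension.
        List<String> used = Arrays.asList(
            "columnA", "columnB", "latitude", "longitude", "coordinates");

        try {
            verify(columns, used); // spatial dimName is not yet a column
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        columns.add("coordinates"); // the workaround described above
        verify(columns, used);      // passes now
        System.out.println("spec accepted");
    }
}
```

A check like this compares names exactly, so the entry in `columns` has to match `dimName` character for character.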

You spelt “coordinates” incorrectly in the spec.