Possible IndexOutOfBoundsException in indexing-hadoop

The class IndexGeneratorJob calls SortableBytes.useSortableBytesAsMapOutputKey(job); to set the comparator classes used for sorting. My question is about the SortableBytesGroupingComparator and SortableBytesSortingComparator classes in SortableBytes. Both have a similar code fragment in their compare methods:

int b1Length = ByteBuffer.wrap(b1, s1 + 4, l1 - 4).getInt();
int b2Length = ByteBuffer.wrap(b2, s2 + 4, l2 - 4).getInt();

final int retVal = compareBytes(
    b1, s1 + 8, b1Length,
    b2, s2 + 8, b2Length
);

In the SortableBytes case, b1Length and b2Length appear to hold the shard number from the group key, so compareBytes could throw an IndexOutOfBoundsException.
Am I right?

Can anyone explain how sorting, grouping, and partitioning work in indexing-hadoop's IndexGeneratorJob? I am curious about how the compare methods for SortableBytes are implemented. The SortableBytes byte format seems to be as follows:

groupkeylength (4 bytes) + groupKey [ shardnum (4 bytes) + timestamp (8 bytes) + partitionNum (4 bytes) + options (xxx) ] + sortKey [ timestamp (8 bytes) + hashval (128 bytes) ]

However, SortableBytesSortingComparator and SortableBytesGroupingComparator do not seem to use the group key or sort key parts of the SortableBytes directly when comparing. I also have doubts about the implementation: as my last post showed, the shard number may cause an IndexOutOfBoundsException if it is larger than the number of bytes remaining in the SortableBytes.
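One way to check the offsets is to rebuild a serialized key by hand. The sketch below assumes the map output key is written the way Hadoop's BytesWritable serializes itself, that is, with its own 4-byte length in front of the payload. Under that assumption, offset s + 4 holds groupkeylength rather than the shard number. The class and method names are hypothetical, and the group key and sort key contents are placeholders.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch for checking offsets; not the Druid source.
class KeyLayoutSketch {

    // Payload as described in the post: groupkeylength (4 bytes) + groupKey + sortKey.
    static byte[] payload(byte[] groupKey, byte[] sortKey) {
        ByteBuffer buf = ByteBuffer.allocate(4 + groupKey.length + sortKey.length);
        buf.putInt(groupKey.length);
        buf.put(groupKey);
        buf.put(sortKey);
        return buf.array();
    }

    // ASSUMPTION: the key is serialized like a BytesWritable,
    // i.e. a 4-byte payload length followed by the payload itself.
    static byte[] serialized(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    // The same read as in the quoted compare method.
    static int lengthAtOffset4(byte[] b, int s, int l) {
        return ByteBuffer.wrap(b, s + 4, l - 4).getInt();
    }

    // True when the range [s + 8, s + 8 + len) stays inside the record [s, s + l).
    static boolean inBounds(int s, int l, int len) {
        return len >= 0 && 8 + (long) len <= l;
    }

    public static void main(String[] args) {
        // groupKey: shardnum (4) + timestamp (8) + partitionNum (4); sortKey: 24 placeholder bytes.
        byte[] groupKey = ByteBuffer.allocate(16).putInt(1_000_000).putLong(0L).putInt(3).array();
        byte[] sortKey = new byte[24];
        byte[] record = serialized(payload(groupKey, sortKey));

        int len = lengthAtOffset4(record, 0, record.length);
        System.out.println("value at s + 4: " + len);
        System.out.println("in bounds: " + inBounds(0, record.length, len));
    }
}
```

With a shard number of 1,000,000 in the group key, the read at s + 4 still returns 16, the group key length, so the compared range stays inside the record. If the outer length prefix were absent, the same read would return the shard number and the bounds check would fail.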

On Tuesday, November 8, 2016 at 4:25:51 PM UTC+8, weijie tong wrote:

I worked through the logic again and found that using the shard number will not go outside the byte array's bounds. Still, I recommend rewriting the compare logic to be clearer: for example, use the group key bytes for grouping, and the group key plus the sort key for sorting. Alternatively, the committers could give some explanation of the current logic.
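A minimal sketch of what such a rewrite could look like, with the offsets named explicitly. It assumes each serialized record is a 4-byte record length, then the 4-byte group key length, then the group key, then the sort key. All names here are hypothetical, and compareRange is a plain stand-in for an unsigned byte-wise comparison, not the helper used in the actual code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the proposed rewrite; not the Druid source.
class ExplicitKeyComparators {

    // ASSUMPTION: record = [4-byte record length][4-byte group key length][groupKey][sortKey].
    static final int RECORD_LENGTH_BYTES = 4;
    static final int GROUP_LENGTH_BYTES = 4;
    static final int HEADER_BYTES = RECORD_LENGTH_BYTES + GROUP_LENGTH_BYTES;

    static int groupKeyLength(byte[] b, int s) {
        return ByteBuffer.wrap(b, s + RECORD_LENGTH_BYTES, GROUP_LENGTH_BYTES).getInt();
    }

    // Unsigned lexicographic comparison of two byte ranges.
    static int compareRange(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    // Grouping: compare only the group key bytes.
    static int compareGroup(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return compareRange(
            b1, s1 + HEADER_BYTES, groupKeyLength(b1, s1),
            b2, s2 + HEADER_BYTES, groupKeyLength(b2, s2));
    }

    // Sorting: group key first, then the sort key bytes that follow it.
    static int compareSort(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int g1 = groupKeyLength(b1, s1);
        int g2 = groupKeyLength(b2, s2);
        int byGroup = compareRange(b1, s1 + HEADER_BYTES, g1, b2, s2 + HEADER_BYTES, g2);
        if (byGroup != 0) {
            return byGroup;
        }
        return compareRange(
            b1, s1 + HEADER_BYTES + g1, l1 - HEADER_BYTES - g1,
            b2, s2 + HEADER_BYTES + g2, l2 - HEADER_BYTES - g2);
    }

    // Builds a record in the assumed layout, for trying the comparators out.
    static byte[] record(byte[] groupKey, byte[] sortKey) {
        int payload = GROUP_LENGTH_BYTES + groupKey.length + sortKey.length;
        return ByteBuffer.allocate(RECORD_LENGTH_BYTES + payload)
            .putInt(payload)
            .putInt(groupKey.length)
            .put(groupKey)
            .put(sortKey)
            .array();
    }
}
```

Two records with the same group key but different sort keys compare as equal under compareGroup and as ordered under compareSort, which is the split between grouping and sorting described above.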

On Thursday, November 10, 2016 at 2:03:31 PM UTC+8, weijie tong wrote:

Correction to the layout above: groupkeylength (4 bytes) + groupKey [ shardnum (4 bytes) + timestamp (8 bytes) + partitionNum (4 bytes) + options (xxx) ] + sortKey [ timestamp (8 bytes) + hashval (16 bytes) ]