12 Mar 2017

Redshift Limitations

Recently have been toying around with Redshift (a popular Data-Warehouse tool by AWS) a little bit and see some minor but obvious limitations in SQL limitations:

FULL OUTER JOIN + ON Func(x) = Func(y) : Not Supported

Although Redshift supports most variants of FULL OUTER JOIN, however, when the JOIN condition contains a function on both sides, it gives up complaining "ERROR:  XX000: could not devise a query plan for the given query".

CREATE TABLE x (a integer);
INSERT INTO x VALUES (1);  

SELECT x1.a, x2.a
FROM x x1
FULL OUTER JOIN x x2
     ON LOWER(x1.a) = LOWER(x2.a); -- Fails
  -- ON LOWER(x1.a) = x2.a; -- Works
  -- ON x1.a = LOWER(x2.a); -- Works
  -- ON x1.a = x2.a; -- Obviously Works

Workaround: The obvious way around it is to move one of the functions to a separate SQL (for e.g create a Temp Table and use that in the final computation), something like this:

CREATE TEMP TABLE x1(a) AS SELECT LOWER(a) FROM x;

SELECT x1.a, x2.a
FROM x1
FULL OUTER JOIN x x2
ON x1.a = LOWER(x2.a); -- Works

1 comment:

marcusAaptiv said...

Thanks! I was having this exact issue (the full outer join problem) and didn't find any help until I found this. Would be great to get this onto Stack Overflow? If I ask the question, will you answer it?

Thanks!