12 Mar 2017

Redshift Limitations

Recently have been toying around with Redshift (a popular Data-Warehouse tool by AWS) a little bit and see some minor but obvious limitations in SQL limitations:

FULL OUTER JOIN + ON Func(x) = Func(y) : Not Supported

Although Redshift supports most variants of FULL OUTER JOIN, however, when the JOIN condition contains a function on both sides, it gives up complaining "ERROR:  XX000: could not devise a query plan for the given query".

CREATE TABLE x (a integer);
INSERT INTO x VALUES (1);  

SELECT x1.a, x2.a
FROM x x1
FULL OUTER JOIN x x2
     ON LOWER(x1.a) = LOWER(x2.a); -- Fails
  -- ON LOWER(x1.a) = x2.a; -- Works
  -- ON x1.a = LOWER(x2.a); -- Works
  -- ON x1.a = x2.a; -- Obviously Works

Workaround: The obvious way around it is to move one of the functions to a separate SQL (for e.g create a Temp Table and use that in the final computation), something like this:

CREATE TEMP TABLE x1(a) AS SELECT LOWER(a) FROM x;

SELECT x1.a, x2.a
FROM x1
FULL OUTER JOIN x x2
ON x1.a = LOWER(x2.a); -- Works

2 comments:

marcusAaptiv said...

Thanks! I was having this exact issue (the full outer join problem) and didn't find any help until I found this. Would be great to get this onto Stack Overflow? If I ask the question, will you answer it?

Thanks!

Unknown said...

I really like and appreciate your work which is about redshift. The article you have shared here is great. I read your post with carefully, the points you mentioned can be very helpful. It is nice seeing your wonderful post. redshift data warehouse