- Row Numbering
- Support: Yes.
- Use: Row_Number() function numbers rows generated in a result-set.
- Example:
SELECT
row_number() OVER (ORDER BY marks DESC) AS rn,
name
FROM x; - Review: Some databases have different variants that accomplish this (for e.g. Oracle has a pseudo column called ROWNUM), but PostgreSQL fully supports the SQL Compliant syntax.
- Rank()
- Support: Yes.
- Use: Rank() and Dense_Rank() functions number the rank of the compared item.
- Example:
SELECT
rank() OVER (ORDER BY marks DESC) AS rn,
dense_rank() OVER (ORDER BY marks DESC) AS drn,
name
FROM x;
- Review: Its useful and fully supported.
- Window Clause
- Support
- OVER (PARTITION BY): Yes
- OVER (ORDER BY): Yes
- OVER (RANGE): Yes
- Use: Read more here.
- Example:
- Review: These are extremely helpful for people serious about data-extraction / reporting and fully supported.
- NTile
- Support: . Yes
- Use: Ntile().
- Example:
SELECT
ntile(4) OVER (ORDER BY marks DESC) AS quartile,
ntile(10) OVER (ORDER BY marks DESC) AS decile,
ntile(100) OVER (ORDER BY marks DESC) AS percentile,
name
FROM x;
- Review: Versatile and fully supported.
- Nested OLAP Aggregations
- Support: No
- But doable with alternative SQL? : Yes
- Is that as Performant? : Mostly No
- Description: Allow something like
SELECT
subject,
AVG(SUM(marks) GROUP BY class)
FROM marks
GROUP BY subject; - Alternative: This could be done with Sub-Selects like this:
SELECT
subject,
AVG(sum_marks) AS avg
FROM (
SELECT
subject
class,
SUM(marks) AS sum_marks
FROM marks
GROUP BY subject, class
) mrk
GROUP BY subject; - Review: In the two examples we are trying to calculate the Per-Subject-Average of (Total marks obtained in different classes). Although PostgreSQL doesn't support this form of nested-aggregates, it clearly is a neat form of doing things. The alternative, acceptably looks like a kludge, and it would be a nice to have feature.
- GROUPING SETS
- Support: Yes (in 9.5)
- Alternative: This could be alternatively done with UNION ALL like this:
SELECT SubjectID,
NULL AS StudentID,
AVG(marks)FROM marksGROUP BY SubjectID
UNION ALL
SELECT NULL AS SubjectID,
StudentID,
AVG(marks)FROM marksGROUP BY StudentID; - Review: Popular databases (Oracle / MSSQL) support this well. PostgreSQL has had this has on the ToDo list from at least a decade ! Looking at the alternative, one can see that this is not just lengthy (and repetitive .. thus error-prone), but also non-performant (simply because it requires multiple-runs of the same data-set).
- History:
- ROLLUP
- Description: An obvious extension to GROUPING BY (explained above), ROLLUP could be explained with a simple example:
GROUP BY ROLLUP (Year, SubjectID, StudentID)
is equivalent to
GROUP BY GROUPING SETS
(Year, SubjectID, StudentID)
(Year, SubjectID)(Year)
(); - Support: Yes (in 9.5)
- Alternative: This could be alternatively done with CTEs.
WITH x AS (
SELECT Year, SubjectID, StudentID, marks
FROM marks
WHERE passed
AND NOT inactive
)
SELECT *
FROM x
UNION ALL
SELECT
Year, SubjectID, StudentID, AVG(marks)
FROM x
GROUP BY Year, SubjectID, StudentID
UNION ALL
SELECT
Year, SubjectID, NULL AS StudentID, AVG(marks)
FROM x
GROUP BY Year, SubjectID
UNION ALL
SELECT
Year, NULL AS SubjectID, NULL AS StudentID,
AVG(marks)
FROM marks
GROUP BY Year; - Review: ROLLUPs are ideal to generate things like Sub-totals, which at times form key performance factors when generating large Reports. The alternative essentially uses a CTE, which is subsequently used to calculate subtotals and totals. For multiple-reasons, this is sub-optimal and can be sped up, if only for in-built support. Besides, the alternative is lengthy & repetitive (thus error-prone).
- History:
- CUBE
- Support: Yes (in 9.5)
- Description: Just like ROLLUP (was an extension of GROUPING SETS), CUBEs are an extension of ROLLUP (and thereby GROUPING SETS) and could be explained with the following example:
GROUP BY CUBE (Year, SubjectID, StudentID)
is equivalent to
GROUP BY GROUPING SETS
(Year, SubjectID, StudentID)
(Year, SubjectID)
(Year, StudentID)
(Year)
(SubjectID, StudentID)
(SubjectID)
(StudentID)
(); - Review: The alternative (not provided for obvious reasons) is not just lengthy & repetitive (thus error-prone) but primarily not as performant as is otherwise possible.
- MERGE INTO / UPSERT
- Support: Yes (in 9.5)
- Doable with alternative SQL? : Yes (for 9.4 and below)
- Is the alternative as Performant?
- No: This is because the alternative (URL given below) is a BEGIN/EXCEPTION based solution which is (time-wise) costly and an in-built support would certainly be faster.
- Description: For those new to the complexity of MERGE (or UPSERT) please read this first.
TLDR: In the face of Concurrent Use, MERGE is difficult when a trying to balance Performance vs Integrity.
Unlike some other Database engines (that are sometimes okay with trading-off Integrity when it conflicts with Performance), PostgreSQL consistently prioritizes Data Integrity. The 'best' solution seems to have taken longer than expected, but considering that when a complicated open-source development model needs to coherently agree upon core feature additions, it really takes a few falling stars to get this piece-of-code in, with most people in support of it. - Example (SQL to create scenario + below SQL taken from here)MERGE INTO bonuses BUSING (SELECT employee_id, salaryFROM employeeWHERE dept_no =20) EON (B.employee_id = E.employee_id)WHEN MATCHED THENUPDATE SET B.bonus = E.salary * 0.1WHEN NOT MATCHED THENINSERT (B.employee_id, B.bonus)VALUES (E.employee_id, E.salary * 0.05);
- Alternative: The PostgreSQL documentation mentions one recommended way of doing UPSERT / MERGE here. But again, this is non-performant and 9.5 based support for INSERT .. ON CONFLICT (a.k.a. UPSERT).
- History:
17 May 2015
Basic OLAP Support in PostgreSQL
While
reviewing an existing application, I thought it'd be worthwhile to
review how good / bad PostgreSQL is in terms of OLAP. This (growing) post is going to be my (un)learning of how ready is PostgreSQL.
Subscribe to:
Post Comments (Atom)
What's in an empty table?
How much storage does an empty table in Postgres take? This is a post about Postgres tables that store ... well basically ... Nothing . The...
-
(Please scroll down to the read the entire-post to know more) Documents required for Christian Marriage Registration (a.k.a. Documents...
-
My patch (allowing a non-super-user to dump Globals) just got committed to Postgres 10.0. Besides the use mentioned above, this patch al...
-
pg_tle - A Must-Know for Developers PostgreSQL is a fantastic database, packed with features. But sometimes, you need to add a little somet...
1 comment:
Some features are not supported, for example LEAD, LAG and NTH Value can't ignore/respect NULL, so it will be interesting if exists workaround for it.
Also, in windows partition, right now there is no support for ROWS BETWEEN and CURRENT ROW (but there is patch for PostgreSQL 11).
Post a Comment