Source: sql-snippets/Cumulative distribution functions.md at main · count/sql-snippets (github.com)
Explore this snippet with some demo data here.
Cumulative Distribution functions (CDFs) are a method for analyzing the statistical distribution of a quantity, similar to histograms.
They show, for each value of a quantity, what fraction of rows are smaller or greater.
One method for calculating a CDF is as follows:
/*
Use a row_number window function to get the position of this row
and CAST to convert row_number and count from integers to decimal
*/
SELECT CAST(row_number() OVER (ORDER BY <quantity> ASC) AS DECIMAL) / CAST((SELECT COUNT(*) FROM <table> ) AS DECIMAL ) AS cdf
FROM <table>
where:
quantity
- the column containing the metric of interesttable
- the table name
Note: CAST
is used to convert the numerator and denominator of the fraction into decimals before they are divided.
Using some student test scores as an example data source, let’s identify:
table
- this is calledExample_data
quantity
- this is thecolumn score
then the query becomes:
SELECT student_id,
score,
CAST(ROW_NUMBER() OVER (ORDER BY score ASC) AS DECIMAL) / CAST(( SELECT COUNT(*) FROM Example_data) AS DECIMAL) AS frac
FROM Example_data
Output:
student_id | score | frac |
---|---|---|
3 | 76 | 0.2 |
5 | 77 | 0.2 |
4 | 82 | 0.6 |
... | ... | ... |
1 | 97 | 1 |
Backlinks:
list from [[SQL - SQL Server - Cumulative Distribution Functions]] AND -"Changelog"