Background
I have a 4-column SQL table:
    id       - varchar(50)
    g1       - varchar(50)
    g2       - varchar(50)
    datetime - timestamp
I have this query:
    SELECT g1,
           COUNT(DISTINCT id),
           SUM(COUNT(DISTINCT id)) OVER () AS total,
           (CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
    FROM my_table
    WHERE g2 = 'start'
    GROUP BY 1
    ORDER BY share DESC
This query was built to answer: what is the distribution of g1 values among users?
Problem
Each id can have several records in the table. I want to consider only the earliest of them, where "earliest" means the minimum datetime value.
Example
Table
    id  g1  g2     datetime
    x1  a   start  2016-01-19 21:01:22
    x1  c   start  2016-01-19 21:01:21
    x2  b   start  2016-01-19 09:03:42
    x1  a   start  2016-01-18 13:56:45
Actual query results
    g1  count  total  share
    a   2      4      0.5
    b   1      4      0.25
    c   1      4      0.25
We have 4 entries, but I only want to consider two of them:
    x2  b  start  2016-01-19 09:03:42
    x1  a  start  2016-01-18 13:56:45
which are the earliest entries for each id.
Expected Query Results
    g1  count  total  share
    a   1      2      0.5
    b   1      2      0.5
Question
How can I consider only the earliest entry for each id in the GROUP BY?
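For illustration, here is a minimal sketch of one way the requirement might be expressed. It assumes a database with window-function support (the `ROW_NUMBER()` approach is a swapped-in technique, not something the original query used), and it assumes that "earliest" is meant among the rows with g2 = 'start'. The alias `earliest` and the column `rn` are hypothetical names introduced only for this example.

```sql
SELECT g1,
       COUNT(DISTINCT id),
       SUM(COUNT(DISTINCT id)) OVER () AS total,
       (CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM (
    -- Number each id's rows from earliest to latest datetime.
    -- Assumption: only rows with g2 = 'start' compete for "earliest".
    SELECT id,
           g1,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime) AS rn
    FROM my_table
    WHERE g2 = 'start'
) AS earliest
WHERE rn = 1          -- keep only the earliest row per id
GROUP BY 1
ORDER BY share DESC
```

On the example table above, the inner query keeps the 2016-01-18 13:56:45 row for x1 and the single row for x2, so the outer aggregation sees one id under a and one under b. If two rows of the same id share the minimum datetime, `ROW_NUMBER()` picks one of them arbitrarily; adding a tie-breaker to the `ORDER BY` would make that choice deterministic.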