Distinct count ...or Distinct values (1st occurence)
This article explains how to use the distinct count and gives some examples to better understand this feature.
The distinct count feature is labeled as "Distinct value (1st occurence)" in the configuration panel of reports produced with the Analytics module.
Explanation of the distinct count
The distinct count allows you to filter the data and only consider 1 item for each value of the selected columns.
When applying the Distinct values filter, you need to:
- Select columns (fields) that cannot appear twice in the resulting dataset.
- Specify a sort (select the field on which data will be sorted and also select either ascending or descending). When there are two records or rows with the same value for the selected column, the sort will determine which row (record) is kept, since only the 1st row encountered will be kept.
Example 1: distinct count on a single column (field)
As part of a project, training and workshops are taking place and the organizers would like to know which activity each participant attended last.
The unfiltered datasets looks like this (3 participants that have participated in 2 activities each = 6 records)
By applying a distinct filter on the name of the participant, we will get only 1 row by participant.
Now we want the last activity they participated in, so we will sort by date (descending) to ensure the highest (most recent) date is the one we keep.
The resulting filtered dataset has now only 1 row by participant = 3 records:
Example 2: distinct count on 2 columns (fields)
A project tracks the price of various fruits through time.
We need to know what is the last price recorded for each fruit, by year.
As you can notice in the table below, the price of apples and bananas was recorded twice during the year 2020.
So for 2020, we will need to filter out the records in red and keep the ones in blue (most recent date).
If we filter out only by fruit, then we will get a dataset with 3 records (1 for each fruit).
But we don't only want the last price recorded by fruit... we want the last price recorded by fruit and by year.
So in order to achieve that, we need to select 2 columns in our filter: Fruit and Year.
And to ensure that we keep the last price recorded, we will set a sort by date
As you can see in the table below, only the last price for each fruit+year combination is kept in the resulting dataset.
Example 3: how to use the distinct filter on indicator values
We want to produce a line chart that shows the progress of an indicator through time.
This is the indicator's values:
In the Analytics module, we produce a bar chart of this indicator that calculates the average of the indicator's actual values, with quarters as X axis.
So far, we did not have to use the distinct count because the indicator was entered quarterly. However, if we produce it with years on the X axis, we have to decide what we want to see. Depending on the nature of our indicators, sometimes it can make more sense to see the average value in a certain period, and sometimes it is preferable to see the last value in that period.
If we don't use the distinct filter, and simply put years in the X axis, we will have a 2020 value of 30. (average of 20, 25 and 45)
If we use the distinct filter, when we put years in the X axis, we will have a 2020 value of 45. (last value entered = 45)