Distinct count... or Distinct values (1st occurence)
This article explains how to use the distinct count and gives a few examples to better understand this feature.
The distinct count feature is labeled "Distinct value (1st occurence)" in the configuration panel for reports generated from the Analytics module.
Explanation of the distinct count
The distinct count allows data to be filtered so that only one record is considered for each value of the selected columns.
When the Distinct values filter is applied, you must:
- Select columns (fields) that cannot appear more than once in the generated dataset.
- Specify a sort (select the field from which data will be sorted and select a direction, ascending or descending). When there are two records or rows with the same value in the same column, the sort will determine which row (record) will be kept, since only the first record encountered will be kept.
Example 1: distinct count on a single column (field)
Training courses and workshops are organized as part of a project. The organizers would like to know the last activity in which each participant took part.
The unfiltered datasets looks like this (3 participants took part in 2 activities each = 6 records)
By applying the distinct values filter on the participant's name, it is possible to obtain a single row (record) per participant.
To find the most recent activity, you also need to sort by date (descending) to make sure you have the highest values (most recent date) first.
Once sorted, the data set now has only one row per participant = 3 records:
Example 2: distinct count on 2 columns (fields)
An existing project tracks the prices of different fruits over a period of time.
We would like to know the last price recorded for each fruit, by year.
As can be seen in the table below, the price of apples and bananas was recorded twice in 2020.
Thus, for the year 2020, it's necessary to filter the records in red and keep the most recent date, which appears here in blue.
If we filter only by fruit, we'll get a dataset with three records (1 for each fruit).
But this only shows the most recent price recorded, whereas we want to find the most recent price recorded by fruit and by year.
To achieve this, we need to select 2 columns in the filter: Fruit and Year.
And to make sure that we keep the last price recorded, we also need to sort by date.
As shown in the table below, only the last price for each Fruit + Year combination is retained in the new dataset.
Example 3: how to use the distinct values filter on indicator values
We want to produce a line chart showing the progression of an indicator over time.
Here are the indicator values:
In the Analytics module, we produce a bar chart that averages the actual values of the indicator, including the quarters on the X axis.
So far, it has not been necessary to use the discinct count since the indicator data were entered quarterly. However, if we wish to present them in years on the X axis, it is necessary to specify what we wish to display. Depending on the nature of the indicators, it is often more logical to present the average value of a certain period, while in other circumstances it is preferable to use the last value for the period.
If we don't use the distinct values filter and simply display the years on the X axis, we'll get a value of 30 for the year 2020 (average of 20, 25 and 45).
If we use the distinct values filter, putting the years on the X axis, we'll get a value of 45 for the year 2020 (last value recorded = 45).