Full Version: Random Top N Records Per Group
cat2phat
I am attempting to retrieve a random sample of records per user.

For this example we are only concerned with the User and the record ID.

CODE
EXAMPLE TABLE

User     | ID | +more
Ashley  |   1 |
Ashley  |   2 |
Ashley  |   3 |
Brian    |   5 |
Brian    |   6 |
Ashley  |   7 |
Ashley  |   8 |
Brian    |   9 |
Brian    | 10 |
Brian    | 11 |
Brian    | 12 |
Ashley  | 13 |
Brian    | 14 |

First I established the TOP 3 records per user using subqueries. This is based off of Allen Browns example from http://allenbrowne.com/subquery-01.html

CODE
EXAMPLE QUERY - SELECT TOP 3 per USER

SELECT a.ID, a.User
FROM tbl AS a
WHERE a.ID In

(SELECT Top 3 b.ID
FROM tbl AS b
WHERE b.User=a.User
ORDER BY b.User DESC, b.ID DESC
)
ORDER BY a.ID DESC, a.User DESC;

CODE
EXAMPLE QUERY RESULTS - SELECT TOP 3 per USER

User     | ID | +more
Ashley  |  1  |
Ashley  |  2  |
Ashley  |  3  |
Brian    |  5  |
Brian    |  6  |
Brian    |  9  |

Moving onto select a random sample. There are lots of ways to get a random number. I have choosen to use RND(isnull(ID) * 0 + 1 + now()). Sorting the subquery by User then Random number will in therory select the first (or last) 3 records for that user.

CODE
EXAMPLE QUERY - SELECT RANDOM TOP 3 per USER

SELECT a.ID, a.User
FROM tbl AS a
WHERE a.ID In

(SELECT Top 3 b.ID
FROM tbl AS b
WHERE b.User=a.User
ORDER BY b.User DESC, rnd(isnull(b.ID) * 0 + 1 + now()) DESC, b.ID DESC
)
ORDER BY a.ID DESC, a.User DESC;

CODE
EXAMPLE QUERY RESULTS - SELECT RANDOM TOP 3 per USER

Um the results are random, HOWEVER, the query selects a random number of records per user instead of the TOP 3.  An example would be...

User     | ID | +more
Ashley  |  3  |
Ashley  |  7  |
Ashley  |  8  |
Ashley  | 13 |
Brian    | 12 |

There was too many records for Ashley (4, where there should be 3), and too few records for Brian (1, where there should be 3)

I did search UtterAccess and found the following link.

http://www.UtterAccess.com/forum/lofiversi...p/t1653519.html

Inside it states that the only way possible is to create a temp table with the random numbers, then run the query TOP N per Group.

So the first question would be, am I barking up the wrong tree (is what I am trying to do possible)?

If possible, what am I doing wrong?

Thank you.
accesshawaii
Do you just need a random sampling of records? Not necessarily the top 3 or whatever, just 3 random sample records?
accesshawaii
If it is a matter of that. You can do something like the below. You would just need to modify it a tad to make sure that you're not using the same # more than once.

CODE
Dim lngMax As Long
Dim lngMin As Long
Dim varRnd As Variant
Dim strTop3 As String
Dim x As Integer
Dim qdf As QueryDef
Dim dbs As Database
Dim strSQL As String
Set dbs = CurrentDb
Set qdf = dbs.QueryDefs("MyQuery")
x = 1
lngMax = DMax("EmployeeID", "Employees")
lngMin = DMin("EmployeeID", "Employees")
Do While x <= 3
varRnd = Int((lngMax - lngMin + 1) * Rnd + lngMin) & ", "
strTop3 = strTop3 & varRnd
x = x + 1
Loop
If Len(strTop3) > 0 Then
strTop3 = Left(strTop3, Len(strTop3) - 2)
End If
strSQL = "SELECT EmployeeID, LastName " & _
"FROM Employees " & _
"WHERE EmployeeID In (" & strTop3 & ")"
qdf.SQL = strSQL
cat2phat
Your method only produces a random sample of 3 on a table, where I need a random sample of 3 per user. I am also trying to avoid VBA where possible.

CODE
SELECT Top 3 b.ID
FROM tbl AS b
ORDER BY b.User DESC, rnd(isnull(b.ID) * 0 + 1 + now()) DESC, b.ID DESC

Your method is close to what my subquery does in my example. The difference is that my query limits the subquery's results to each user through the where clause of b.User=a.User in the subquery.
accesshawaii
All you would have to do is modify what I posted a bit. Do you have a table that you're referencing for all the users? I'm not sure exactly what you're trying to do. If you have a table, we'll call it tbl_Users and you have another record source, which we'll simply call "Employees" like I used in the sample and you want to get 3 samples for each user from the Employees record source, so for example with the below.

CODE
'Users                  'Employees
'Ben                    '1  Alan
'Bill                   '2  Aaron
'Christine              '3  Jim
'4  Kevin
'5  Mandy
'6  Olivia
'7  Oscar
'8  Paul
'9  Patricia

'The results might be something like below
'Ben - 1 Alan, 5 Mandy, 8 Paul
'Bill - 2 Aaron, 3 Jim  9 Patricia
'Christine - 4 Kevin, 6 Olivia, 7 Oscar

If that's what you're looking to do, you can do something like this.

CODE
Dim lngMax As Long
Dim lngMin As Long
Dim intCountRnd As Integer
Dim varRnd As Variant
Dim lngRnd As Long
Dim strTop3 As String
Dim x As Integer
Dim strUserSQL As String
Dim lngUserID As Long

x = 1

'Clear existing data
CurrentDb.Execute "DELETE tmp_Random_Number.* " & _
"FROM tmp_Random_Number; "

strUserSQL = "tblUsers"
rst.Open strUserSQL, CurrentProject.Connection, adOpenForwardOnly 'Open the users table for iteration

Do While Not rst.EOF 'Iterate through each employee and assign random #s.
lngUserID = rst.Fields("UserID").Value 'Get the user ID

Do While x <= 3
lngMax = DMax("EmployeeID", "qryEmployees") 'Unmatched query
lngMin = DMin("EmployeeID", "qryEmployees")
lngRnd = Int((lngMax - lngMin + 1) * Rnd + lngMin) 'Get the random #
'Check if it's already been assigned
intCountRnd = DCount("*", "tmp_Random_Number", "rnd_Num=" & lngRnd)
If intCountRnd < 1 Then 'Assign #
CurrentDb.Execute "INSERT INTO tmp_Random_Number ( rnd_Num, rndEmployeeID ) " & _
"VALUES (" & lngRnd & ", " & lngUserID & ")"

x = x + 1
End If
Loop
rst.MoveNext
'Reset the count
x = 1
Loop
rst.Close
Set rst = Nothing

Below is a brief explanation on this.

1. tmp_Random_Number is a temporary table, which has a field for the random # value and for the UserID value.
2. qryEmployees is an unmatched query, which shows records that are in the main table "Employees" in this example that are not in "tmp_Random_Number". This is how the code knows which #s were used and which were not used.

Hope all that makes sense. Good luck with your project.

cat2phat
accesshawaii -

Hmmm, I think there may be some miscommunication. Let me start over.

I am trying to avoid VBA if possible, which is why this post is in "Access Queries". I am attempting to perform a Query that does what can be done in VBA. The query needs to grab a Random N sample of records per Group.

For example I have a table called tbl_UserRecords with multiple fields, where ID is the autonumbered unique record ID, User is the field we will want to use for grouping, and other fields that aren't evaluated. We want to grab 3 random records per User.

So if Ashley has 6 records, Brian has 7, and Moe has 1, the resulting query should have 3 records for Ashely, 3 records for Brian, and 1 for Moe.

CODE
EXAMPLE - tbl_UserRecords
User     | ID | +more
Ashley  |   1 |
Ashley  |   2 |
Ashley  |   3 |
Brian    |   5 |
Brian    |   6 |
Ashley  |   7 |
Ashley  |   8 |
Brian    |   9 |
Brian    | 10 |
Brian    | 11 |
Brian    | 12 |
Ashley  | 13 |
Brian    | 14 |
Moe      | 15 |

CODE
EXPECTED Results
User     | ID | +more
Ashley  |  3  |
Ashley  |  7  |
Ashley  |  8  |
Brian    |   9 |
Brian    | 12 |
Brian    | 14 |
Moe      | 15 |

Using an example from Allen Brown, I am able to group the users together and get the 3 results per user, but they will always be the same records.

CODE
EXAMPLE QUERY - SELECT TOP 3 per USER
SELECT a.ID, a.User
FROM tbl AS a
WHERE a.ID In

(SELECT Top 3 b.ID
FROM tbl AS b
WHERE b.User=a.User
ORDER BY b.User DESC, b.ID DESC
)
ORDER BY a.ID DESC, a.User DESC;

CODE
RESULTS
User     | ID | +more
Ashley  |  1  |
Ashley  |  2  |
Ashley  |  3  |
Brian    |  5  |
Brian    |  6  |
Brian    |  9  |
Moe     | 15 |

To grab 3 random records per group, we sort by User then by a random number, and grab the Top 3 of that sort. This should return the Random Top N Records Per Group.

CODE
EXAMPLE QUERY - SELECT RANDOM TOP 3 per USER
SELECT a.ID, a.User
FROM tbl AS a
WHERE a.ID In

(SELECT Top 3 b.ID
FROM tbl AS b
WHERE b.User=a.User
ORDER BY b.User DESC, rnd(isnull(b.ID) * 0 + 1 + now()) DESC, b.ID DESC
)
ORDER BY a.ID DESC, a.User DESC;

The query does return random records, HOWEVER, it does not return the correct N records. For example, Ashely should have 3 random records, but 4 would return, Brian should have 3 records, but only 1 would return, and Moe should have 1 record, but none return.

CODE
EXAMPLE Results from RANDOM TOP N per group.
User     | ID | +more
Ashley  |  3  |
Ashley  |  7  |
Ashley  |  8  |
Ashley  | 13 |
Brian    | 12 |

My current VBA solution is as follows, but as I said I would prefer to use a query where possible.

CODE
'VBA CODE to generate record table with random numbers

Function BuildUserRecordsWRandomNumbers()

dim strSQL as string

If SQL_TableExists("tbl_UserRecords_W_RandomNumbers ") Then
strSQL = ""
strSQL = strSQL & "DROP TABLE tbl_UserRecords_W_RandomNumbers"
DoCmd.RunSQL "DROP TABLE tbl_UserRecords_W_RandomNumbers "
End If

strSQL = ""
strSQL = strSQL & "SELECT tbl_UserRecords.*, Rnd(IsNull([ID])*0+1+Getmillisecond()) AS RandomNumber "
strSQL = strSQL & "INTO tbl_UserRecords_W_RandomNumbers "
strSQL = strSQL & "FROM tbl_UserRecords"

Docmd.runSQL strSQL

End Function

Query to run after calling BuildUserRecordsWRandomNumbers

CODE
SELECT a.ID, a.User
FROM tbl_UserRecords_W_RandomNumbers AS a
WHERE a.ID In

(SELECT Top 3 b.ID
FROM tbl_UserRecords_W_RandomNumbers AS b
WHERE b.User=a.User
ORDER BY b.User DESC, b.RandomNumber DESC, b.ID DESC
)
ORDER BY a.ID DESC, a.User DESC;
accesshawaii
Gotcha. So, you want the random record to only come from records that are assigned to each user. Only a maximum of 3 random records will be returned. In the event that a user has less than 3 records than the random returns will be non-applicable since they will always be getting the same records.

That wouldn't be that hard to do through VBA, in fact it would be just minor tweaks to what I posted.

If you're set on doing it in a query with no VBA then I'm sorry, I don't have an answer for you on how to do it.

Anyway, sorry I couldn't help. Good luck with your project.
philben
Hi,

CODE
SELECT T1.User,
T1.id
FROM   tUsers AS T1,
(SELECT ( MAX(id) - MIN(id) + 1 ) * Rnd(Timer() + 1) + MIN(id) AS rd
FROM   tUsers) AS T2
WHERE  T1.id IN (SELECT TOP 3 Id
FROM   tUsers T
WHERE  T.user = T1.User
ORDER  BY Abs(T.Id - T2.rd),
T.id)
ORDER  BY T1.User,
T1.id

Philippe
philben
Two another propositions (better ?) regarding this first idea :
CODE
SELECT T1.User,
T1.id
FROM   tUsers AS T1,
(SELECT ( MAX(id) - MIN(id) + 1 ) * Rnd(Timer() + 1) + MIN(id) AS rd
FROM   tUsers) AS T2
WHERE  T1.id IN (SELECT TOP 3 Id
FROM   tUsers T
WHERE  T.user = t1.User
ORDER  BY Rnd(-Abs(T.Id - T2.rd)) DESC,
T.id)
ORDER  BY T1.User,
T1.id;

CODE
SELECT T1.User,
T1.id
FROM   tUsers AS T1
WHERE  T1.id IN (SELECT TOP 3 Id
FROM   tUsers T
WHERE  T.user = t1.User
ORDER  BY Rnd(-Abs(t.id - Timer())),
T.id)
ORDER  BY T1.User,
T1.id;

Philippe