Selecting a random row from a table in mysql

database

I have come across more than one instance when I had to select a random record from a table in a MySQL database. Here is how to do it.

The simple but slow method


SELECT * FROM mytable ORDER BY RAND() LIMIT 1;


Although simple, the above query can be very slow on tables which have a large number of records. This happens because MySQL makes a temporary table with a random number assigned to each row. It then sorts the table and returns the one record from the sorted list.

Random record selection by id generation : A better and faster method

The table will need to have an ‘id’ field which is an auto-incrementing integer. The approach is to generate a random number which falls in the range of the values of the ‘id’ field. MySQL MAX() and MIN() grouping functions allow you to choose the maximum and minimum values of a field.

The python approach

Sample python code to do the random record selection

import random
...
cursor.execute('SELECT MIN(id) FROM mytable')
min_value = cursor.fetchone()
cursor.execute('SELECT MAX(id) FROM mytable')
max_value = cursor.fetchone()

random_id = random.randint(min_value, max_value)

cursor.execute('SELECT * FROM mytable WHERE id = %s', (random_id,))
random_record = cursor.fetchone()


Pure SQL approach

MySQL offers a RAND() function which generates a random floating point number between 0.0 and 1.0. We will use this value and scale it to fit into the max – min value range of the id field.

SELECT * FROM mytable WHERE id = (SELECT MIN(id) + FLOOR((MAX(id)+1) * RAND()) FROM mytable ) LIMIT 1;


So next time you have to select a random record, you know what to do :)

Additional information

No Trackbacks

9 Comments

  1. One more method is there which is far simpler :)
    x = get_random_number(ROWS)
    select * from table limit x, 1

    Posted April 8, 2008 at 11:41 pm | Permalink
  2. Good one! Didn’t think about this!

    Posted April 9, 2008 at 7:33 pm | Permalink
  3. Hey, first off, thanks for linking to my post :)

    I have a comment regarding your “simple but slow” vs. “Pure SQL” comparison.

    After reading this post, I decided to try out your approach to see what kind of difference there is, so I loaded up a table in MySQL with 5000 records and ran both the “simple but slow” method and the “Pure SQL” method. You might be surprised at the results I got.

    I used two simple approaches to get my results. First, I used PHPMyAdmin to select a single record, with both SQL methods. Then I used the ASP.Net code that was used in my post and ran both, selecting a single record from the list of 5000, 100 times.

    Simple but Slow in PHPMyAdmin
    - 0.0073 sec
    - 0.0084 sec
    - 0.0069 sec
    - 0.0091 sec
    - 0.0073 sec

    Pure SQL in PHPMyAdmin
    - 4.3905 sec
    - 9.5469 sec
    - 9.9261 sec
    - 9.6034 sec
    - 9.5922 sec

    Simple but Slow in ASP.Net
    - 00:00:01.0014400
    - 00:00:00.9313392
    - 00:00:01.1816992
    - 00:00:01.3619584
    - 00:00:01.0414976

    Pure SQL in ASP.Net
    - 00:09:35.5175536

    As you can see in both sets of tests, especially with the second set, the “simple but slow” method out performs the slower Pure SQL method.

    Now I have to go write a Blog post about it :)

    Carl J

    Posted April 28, 2008 at 5:48 pm | Permalink
  4. As Carl pointed out,

    SELECT * FROM mytable ORDER BY RAND() LIMIT 1;

    is not slow because. (http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html).

    If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause must be selected, and most or all of them must be sorted, before it can be ascertained that the first row_count rows have been found. In either case, after the initial rows have been found, there is no need to sort any remainder of the result set, and MySQL does not do so.

    Posted April 28, 2008 at 6:12 pm | Permalink
  5. Carl J, Very interesting! I had written this blog post based on my experience in randomly selecting a record from a table with around 5 million records. I don’t have the timing stats for that one though. The table you tried the test on is innoDB or MyISAM? I tried mine on InnoDB. Besides I would think a larger test size (atleast a million records) will be better for this one.

    Sai,
    I don’t think the optimization applies for dynamically generated values like RAND() (as opposed to column data). As I said above, I *have* seen that “order by rand()” is slower. I will try to post stats when possible.

    Posted May 4, 2008 at 9:55 am | Permalink
  6. deepak

    Hi People!

    I need to fetch the last but one row from the table.
    I used below queries it is not giving me the desired result. I think a mysql function is needed to fetch.

    select * from cust order by cnum desc limit 1;

    Posted September 18, 2008 at 11:12 am | Permalink
  7. That should return the last row in the db, assuming that cnum is your primary key field. What is it returning to you?

    Carl Js last blog post..In Case You’re Wondering, I’m Doing Alright – Part Duh!

    Posted September 19, 2008 at 4:54 am | Permalink
  8. deepak

    Hi,

    I’m getting the last result row in the table which is ordered by that row.
    What I want is specific row selection from the table.

    Thanks,
    Deepak E.

    Posted September 19, 2008 at 11:08 am | Permalink
  9. rajesh

    very easy and very good method to follow ;
    i have learned a lot from this website
    thanks for help

    Posted April 6, 2009 at 12:54 pm | Permalink