Wed 21 Sep 2005
“Do not use ORDER BY RAND()” or “How to get random rows from table?”
Posted by Anton Titov under SQLQuite often people use queries like:
SELECT quote FROM quotes ORDER BY RAND() LIMIT 1
to get a random row (rows) from a table. That’s quite bad idea. For big tables, if your table have just
50-100 rows, use whatever you want.
What happens when you run such a query? Let’s say you run this query on a table with 10000 rows, than the SQL server generates 10000 random numbers, scans this numbers for the smallest one and gives you this row. Generating random numbers is relatively expensive operation, scaning them for the lowest one (if you have LIMIT 10, it will need to find 10 smallest numbers) is also not so fast (if quote is text it’s slower, if it’s something with fixed size it is faster, maybe because of need to create temporary table).
So. How can you select random row (rows) without this overhead? There is no easy drop in replacement. You can use something like:
SELECT COUNT(*) AS cnt FROM quotes
generate random number between 0 and cnt-1 in your programming language and run the query:
SELECT quote FROM quotes LIMIT $generated_number, 1
Yes. This are two queries, but they are MUCH faster than the first one. This option is good if you need just one random row. If you need more rows, you can still use this trick, just substract X (X is number of rows you need) from cnt when generating random number and modify query to:
SELECT quote FROM quotes LIMIT $generated_number, X
But this will give you X subsequent rows, starting from a random position. If that’s not an option, than you can use another approach: most of the time you have unique numeric field in tables, that start from 1 and continue to grow, so you can do something like:
SELECT MAX(id) AS maxid FROM quotes
generate X random numbers between 1 and X, join this numbers in string with ‘,’ for separator and run this query:
SELECT quote FROM quotes WHERE id IN ($idlist)
Yes. I know, you may have some deleted id’s and the query than may return less rows than you need. But you may generate 10 times more random ids and modify the query to look like:
SELECT quote FROM quotes WHERE id IN ($list_with_10_times_more_ids_than_x) LIMIT $x
now if you do not have too many rows deleted from the table the chances that you will not find let’s say 10 existing ids from list with 100 random numbers are near zero and you can include a code in your program, that will check if you have $x rows in the result and if not (let’s say once in 10000 times) it will run the equivalent ORDER BY RAND() code. You can also generate 100 times more ids than you need - the only overhead you’re introducing is generating 100 times more random numbers than $x (if you need 5 random rows from table with 10000 rows it’s better to generate 500 random numbers than 10000) and parsing of the query will be a bit slower, but the table scan will not be slower - the SQL server will stop searching for rows with corresponding ids from the list as soon
as it finds $x existing rows.
Leave a Reply
You must be logged in to post a comment.