Statistics on getting a random row from a table

Por Saiyine Enviar correo el 2010-01-24 15:14:31 - Secciones: PRINCIPAL ENGLISH MYSQL SQL - Enlace permanente: 901

So yesterday night I found this interesting post talking about of the ages known alternative for getting a random row from a table of using two queries, the first for learning the number of rows, and the second to get just one row using the reserved SQL word offset and passing the number of rows as the top for a random funciont. In pseudo-SQL, something like this...

SELECT COUNT(1) AS total FROM bench_myisam; $rand = rand($total); SELECT * FROM bench_myisam LIMIT 1 OFFSET $rand;

... instead of using our much hated but extremely common order by rand:

SELECT * FROM bench_myisam ORDER BY RAND() LIMIT 1;

I've always thought the two querys alternative had to be slower even if it was just for the overhead of the two network accesses, but this guy proved me wrong by going all the hassle of testing it for real, and, really, the two querys approach resulted being an order faster in is tests than our "beloved" random ordering.

It was precisely these tests that sparkled my curiosity, because I don't feel like they where of my taste. First, he uses MySQL and Sqlite, what an strange combination. Sqlite is like the MsDOS of the databases, there are lots of better embedded ones if it was the reason. Second, what's the use of using MySQL and not telling what storage we are using for the tests? (Althought he hints in the commentaries it could be MyISAM, and asserts correctly that the row number is stored in the metadata of the table). And finally, the consists of just one query. Whoa, what an statistic.

So I did my own tests with my server, a MySQL 5.1.37-1ubuntu5 running on an Ubuntu 9.10, and they really show he's absolutely right:

In a table with a million rows, for getting a random row, using the two reads approach is an order of magnitude better than the naive random order way! MyISAM performing better than InnoDB was expected, but, sincerely, I believed the difference would be greater.

The data of the table is measured in milliseconds per query (Where do I write to get that measure unit named after me????) so obviously the lower the better, as is the time on average a query took to complete, calculated from the thousands of petitions the perl based clients did.

UPDATE: The querys, explained:

The naive approach, ordering by rand() and then getting the first row:

EXPLAIN SELECT * FROM bench_myisam ORDER BY RAND() LIMIT 1;

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1 SIMPLE bench_myisam index (NULL) PRIMARY 8 (NULL) 1000000 Using index; Using temporary; Using filesort

And the output for the two querys approach:

EXPLAIN SELECT COUNT(1) AS total FROM bench_myisam;

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1 SIMPLE (NULL) (NULL) (NULL) (NULL) (NULL) (NULL) (NULL) Select tables optimized away

EXPLAIN SELECT * FROM bench_myisam LIMIT 1 OFFSET $rand;

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1 SIMPLE bench_myisam index (NULL) PRIMARY 8 (NULL) 1000000 Using index

The tables are incredibly boring, just a million rows of an integer primary key and a varchar(200) of sample text. And here is the code for the Perl client I wrote to test the two querys approach. The other client is almost identical, but with just one query.

Otras entradas de la web que podrian interesarte:

  • (902)  Jan 25  PERL DBI PROGRAMACION ENGLISH MYSQL  Reading a random row from a MySQL table with Perl DBI
  • (828)  May 28  MYSQL ENGLISH  Get the last AUTO_INCREMENT ID in MySQL
  • (850)  Nov 15  MYSQL  Optimizar querys en MySQL con Explain
  • (869)  Apr 06  SQL POSTGRESQL  Autoincrementos en PostgreSQL usando secuencias
  • (827)  May 28  MYSQL  Obtener el ultimo AUTO_INCREMENT
  • Peter Rabbitson (2010-01-25 08:12:55)

    Can you please provide some actual data? Like the amount of rows in the table, and the table CREATE statements. Also if possible the query plans (EXPLAIN)

    Saiyine (2010-01-25 05:15:41)

    The tables are a million rows big, and as simple as they could be, with an integer key called id and a varchar for the data.

    Anyway, I will update the post with as much data as possible as soon as possible.

    Peter Rabbitson (2010-01-25 05:54:41)

    Right, I think you'll get a slightly different result with InnoDb, as the count() there sucks balls.

    Saiyine (2010-01-25 07:05:51)

    Yep, that is what the graphic says, InnoDB reads using the count() use twice the time the MyISAM.

    I have not put the explain for the InnoDB count() because is a PITA for me to add formated code to the blog, but it says the InnoDB count() uses the index of the table to calculate the row count, while the MyISAM just gets it somewhere from the data engine.


    Tu nombre (Nick):

    Tu correo (Email):
      Necesario para tu gravatar!

    Tu página (URL):

    Escribe aqui tu comentario:

           :noworry: :roll: :huh: :push: :OO 8) 8O :( :) :? :D :P :o :x :| ;) ^^ xD



    Vista previa activada.

    Vista previa (6/6/6, 6:66)


    Fondos de pantalla

    . . .

    Descargas

  • ApagaPC
    apagapc241.exe  (3263)
  • LimpiaDocus
    LimpiaDocus001.exe  (2534)
  • RCM
    rcm001.zip  (2174)
  • Popmail
    popmail-0.4-psmn.tar.gz  (2247)
  • Manual de Delphi en PDF
    delphi_pdf.zip  (2837)
  • Evangelio del Perl
    Evangelio_del_perl.pdf  (2544)
  • Excel Simpsons
    Excel Simpsons  (2788)
  • . . .

    ltimos comentarios

  • 608 - obi ivan kenobi: ;)porque no hicieron que obi wan...
  • 630 - Anonimo: re feo el programa
  • 660 - Graciela: Por favor INFORMEN BIEN de cómo se monta una foto...
  • 660 - Anonimo: No entendi nada.No sé cómo se hace para corregir una...
  • 608 - Anonimo: mapas para single-player ???
  • 727 - no: borralo hombre
  • . . .

    Proyectos Online

  • Saiyine Store
  • Kunowalls!!!
  • Fondos de pantalla
  • Picaday: imagenes sexys o chocantes.
  • Scarlett: fotos de Scarlett Johansson
  • WhatsmyIP: obtener tu IP pública
  • Uma Thurman: galeria de Uma Thurman
  • FunPics: imagenes graciosas
  • . . .

    Blogs

  • Por lo que más querais, no entreis a estos: Tapanez, Yhandros, Onez.

  • Otros en español: El Mundo Today, La libreta de Van Gaal, Chavalina, Kirai, Mundo Geek, Microsiervos.

  • Mis lecturas en guiri: Michael Yon, Coding Horror, YCombinator news, MySQL Performance, Slashdot.

    . . .
  • Utilidades

  •  Coral  (1060), cacheando webs
  •  Spam.la  (1245), correo de usar y tirar
  •  Mailinator  (1213), correo de usar y tirar
  •  Bug me not  (1094), absurdos registros di NO
  •  Trashmail  (1121), correo de usar y tirar

    . . .
  • Busquedas


    Varios ejemplos de lo que buscaban visitantes recientes:

    . . .

    Página web ©2001-2010 Saiyine generada en s, con 4084566 visitas en total, hoy (de un total de 0 previstas).

    . . .