elfs: (Default)
[personal profile] elfs
Oh, mightly LJ brain, riddle me this:

I have a table in a MySQL (well, okay, it's Sqlite3 right now, but it'll my MySQL in production) database, and the primary keys are known to not be contiguous. That's important.

It's possible to construct a WHERE clause giving you a single item, if that items qualties are unique enough. Is it possible, using some magic combination of WHERE and ORDER BY and something I don't know, to say "give me unique item N, and using the rule of the ORDER BY clause, also give me the next and previous objects?"

Date: 2009-09-30 07:53 pm (UTC)
From: (Anonymous)
I don't think you can do this in a single select (that is, without using subselects) .

If you are willing to load the next and previous objects in dedicated queries, you can use something like:

select *
from table
where id > ?
order by id
limit 1

Date: 2009-09-30 07:54 pm (UTC)
From: [identity profile] tamino.livejournal.com
I would do that as two separate queries:

SELECT * from foo WHERE myfield >= N ORDER BY myfield LIMIT 2;
SELECT * from foo WHERE myfield < N ORDER BY myfield DESC LIMIT 1;

Date: 2009-09-30 07:55 pm (UTC)
From: [identity profile] shockwave77598.livejournal.com
Why not have additional fields that hold the keys for the previous and next items? A database implementation of a linked list.

Date: 2009-09-30 08:16 pm (UTC)
From: [identity profile] elfs.livejournal.com
Insertions may be random. It might be worth contemplating, but it makes insertions expensive.

Here it is.

Date: 2009-09-30 08:25 pm (UTC)
From: (Anonymous)
Hi Elf!

I took a minute to demonstrate this in sqlite, because I have sqlite on this machine I'm posting from. Similar principle probably works in mysql.

Example:

sqlite> select key from pageviews where key in (select key from pageviews where
key > 103 ORDER BY KEY LIMIT 1) or key in (select key from pageviews where key <
= 103 ORDER BY key DESC LIMIT 2);
104
102
103


I hope you don't mind the two subqueries inside the WHERE clause, but it does work.

-- Cat Typist

Oops, I only did half the requested query.

Date: 2009-10-01 10:28 pm (UTC)
From: (Anonymous)
I simply assumed we know the desired row's key before we make the query,
whereas you want the desired row's key to be discovered within the same query.

Now ALL IN ONE QUERY! Also, pretty comical IMHO.

Date: 2009-10-01 10:49 pm (UTC)
From: (Anonymous)
This should work in one query of pure SQL (no stored procedures)
but the efficiency is somewhat questionable. Also, it's like
a code comedy.

=========
sqlite> -- The key *directly* sought (without the next or prev rows)
sqlite> select key from pageviews where url like '%elf%' and key
...> > 10000
...> order by key asc limit 1;
10198
sqlite> -- Finds the key *directly* sought, also prev and next existing keys.
sqlite> select key from pageviews
...> where key in
...> (select key from pageviews where url like '%elf%' and key >10000

...> ORDER BY key ASC LIMIT 1)
...> -- find prev key
...> or key =
...> (select MAX(key) from pageviews where
...> key <
...> (select key from pageviews where url like '%elf%' and key >10000

...> ORDER BY KEY ASC LIMIT 1)
...> ORDER BY key DESC LIMIT 1)
...> -- find next key
...> or key =
...> (select MIN(key) from pageviews where
...> key >
...> (select key from pageviews where url like '%elf%' and key >10000

...> ORDER BY KEY ASC LIMIT 1)
...> ORDER BY key DESC LIMIT 1)
...> ;
10198
10197
10199
sqlite>
=========

--Cat Typist
From: (Anonymous)

Here it is in a more copy-and-pasteable form

-- The key *directly* sought (without the next or prev rows)
select key from pageviews where url like '%elf%' and key > 10000
  order by key asc limit 1;


-- Finds the key *directly* sought, also prev and next existing keys.
select key from pageviews 
  where key in
        (select key from pageviews where url like '%elf%' and key >10000
          ORDER BY key ASC LIMIT 1)
    -- find prev key
    or key =
      (select MAX(key) from pageviews where 
        key < 
        (select key from pageviews where url like '%elf%' and key >10000
          ORDER BY KEY ASC LIMIT 1)
      ORDER BY key DESC LIMIT 1)
    -- find next key
    or key =
      (select MIN(key) from pageviews where    
        key > 
        (select key from pageviews where url like '%elf%' and key >10000
          ORDER BY KEY ASC LIMIT 1)
      ORDER BY key DESC LIMIT 1)
;
From: (Anonymous)

This works too, but it's a little less funny, IMHO.

-- Finds the key *directly* sought, also prev and next existing keys.
select key from pageviews 
  where key in
        (select key from pageviews where url like '%elf%' and key >10000
          ORDER BY key ASC LIMIT 1)
    -- find prev key
    or key in
      (select key from pageviews where 
        key < 
        (select key from pageviews where url like '%elf%' and key >10000
          ORDER BY KEY ASC LIMIT 1)
      ORDER BY key DESC LIMIT 1)
    -- find next key
    or key in
      (select key from pageviews where    
        key > 
        (select key from pageviews where url like '%elf%' and key >10000
          ORDER BY KEY ASC LIMIT 1)
      ORDER BY key ASC LIMIT 1)
;

Date: 2009-09-30 11:37 pm (UTC)
From: [identity profile] zanfur.livejournal.com
I've had to deal with this exact problem. Assuming that the "next" and "previous" semantics refer to "next larger" and "next smaller" primary key values, it's done by using multiple selects. This is a limitation of the relational algebra; I don't think there's a way around it. You first need the id of unique item X, then you need the max(id)<X and the min(id)>X, which is a select for the original id (however you get it), a select for the aggregate max(), and a select for the aggregate min().

Basically:

select max(id) from items where id < X;
select min(id) from items where id > X;

It's constant time if you have btree indexes for the primary key. Don't use a hash index, even though it might make otherwise sense for non-contiguous data. The MySQL compiler should realize that this is a simple btree lookup, and not even bother scanning. You *can* combine the two aggregate functions into a single query using a GROUP BY clause on a calculated field...but don't. It's horribly inefficient for large tables. Just use the separate queries above, which are optimized into simple index lookups:

mysql> explain select max(id) previd from items where id<11;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away | 
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)

If you want it all in a single DB call, use a stored procedure -- then you can just call the stored procedure with the itemid in question (this is what I recommend):

DELIMITER //
CREATE PROCEDURE prevcurnext_items(
  IN itemid INT
)
BEGIN
  DECLARE previd INT;
  DECLARE nextid INT;
  SELECT max(id) INTO previd FROM items WHERE id < itemid;
  SELECT min(id) INTO nextid FROM items WHERE id > itemid;
  SELECT * FROM items WHERE id in (previd,itemid,nextid);
END//
DELIMITER ;

Hope this helps somewhat.

(I actually tested all of the above code, using a test items table that had an INT "id" field and a "content" field, with the first 10 prime numbers as the id's and the first ten letters as the content, on MySQL 5.0.)

Date: 2009-09-30 11:53 pm (UTC)
From: [identity profile] zanfur.livejournal.com
If you're fine using subselects, this works like a champ, with N being the id of the unique item in question:

select * from items where id >= (select max(id) from items where id < N) limit 3

The subselect is (as above) is optimized away to a constant-time index lookup, and the item retrieval is done by walking along the index (signified by the type of "range"), making it also constant-time:

mysql> explain select * from items where id >= (select max(id) from items where id < 11) limit 3;
+----+-------------+-------+-------+---------------+---------+---------+------+------+------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                        |
+----+-------------+-------+-------+---------------+---------+---------+------+------+------------------------------+
|  1 | PRIMARY     | items | range | PRIMARY       | PRIMARY | 4       | NULL |    7 | Using where                  | 
|  2 | SUBQUERY    | NULL  | NULL  | NULL          | NULL    | NULL    | NULL | NULL | Select tables optimized away | 
+----+-------------+-------+-------+---------------+---------+---------+------+------+------------------------------+
2 rows in set (0.00 sec)

(I say "constant time"...btree lookups are actually O(logN), but with a *very* high logarithmic base. Also, the entire index is usually in memory, so it's blazingly fast even for millions of rows in the table.)

+1 upvote :-)

Date: 2009-10-01 11:15 pm (UTC)
From: (Anonymous)
--Cat Typist

Item N, N+-1

Date: 2009-10-01 03:44 pm (UTC)
From: (Anonymous)
sqlite> create table t(i int);
sqlite> insert into t values(1);
sqlite> insert into t values(2);
sqlite> insert into t values(3);
sqlite> insert into t values(4);
sqlite> insert into t values(6);
sqlite> insert into t values(7);
sqlite> insert into t values(10);
sqlite> insert into t values(11);
sqlite> select b.i,count(*) from t a join t b on a.i<b.i group by b.i;
i|count(*)
2|1
3|2
4|3
6|4
7|5
10|6
11|7
sqlite> select * from (select b.i,count(*) as N from t a join t b on a.i<b.i group by b.i) x where x.N=3;
i|N
4|3

Now ask for N+1 and N-1.

Re: Item N, N+-1

Date: 2009-10-01 03:46 pm (UTC)
From: (Anonymous)
Oh, the derived table is necessary in sqlite, but IIRC not in ANSI-SQL compatible databases.

Re: Item N, N+-1

Date: 2009-10-01 11:20 pm (UTC)
From: [identity profile] zanfur.livejournal.com
This is insanely inefficient, and simply unworkable for large datasets.

Re: Item N, N+-1

Date: 2009-10-02 11:39 am (UTC)
From: [identity profile] zanfur.livejournal.com
This is also incorrect. It chops off the lowest item (id "1" is missing from the output above, even). You can make it correct by changing the "<" to "<=". Observe:
mysql> create table t(i int primary key);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t values (1),(2),(3),(4),(6),(7),(10),(11);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> select b.i,count(*) from t a join t b on a.i<=b.i group by b.i;
+----+----------+
| i  | count(*) |
+----+----------+
|  1 |        1 | 
|  2 |        2 | 
|  3 |        3 | 
|  4 |        4 | 
|  6 |        5 | 
|  7 |        6 | 
| 10 |        7 | 
| 11 |        8 | 
+----+----------+
8 rows in set (0.01 sec)

mysql> explain select b.i,count(*) from t a join t b on a.i<=b.i group by b.i;
+----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                        |
+----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
|  1 | SIMPLE      | a     | index | PRIMARY       | PRIMARY | 4       | NULL |    8 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | b     | index | PRIMARY       | PRIMARY | 4       | NULL |    8 | Using where; Using index                     | 
+----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
2 rows in set (0.00 sec)

mysql> select * from (select b.i,count(*) as N from t a join t b on a.i<=b.i group by b.i) x where x.N=3;
+---+---+
| i | N |
+---+---+
| 3 | 3 | 
+---+---+
1 row in set (0.00 sec)

mysql> explain select * from (select b.i,count(*) as N from t a join t b on a.i<=b.i group by b.i) x where x.N=3;
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows | Extra                                        |
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL |    8 | Using where                                  | 
|  2 | DERIVED     | a          | index | PRIMARY       | PRIMARY | 4       | NULL |    8 | Using index; Using temporary; Using filesort | 
|  2 | DERIVED     | b          | index | PRIMARY       | PRIMARY | 4       | NULL |    8 | Using where; Using index                     | 
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
3 rows in set (0.00 sec)

Note also that it's horribly inefficient, requiring an O(N2) calculation on the entire index to get the position field, and then -- here's the really horrible part -- scanning the entire derived table for specific values, which is unindexed. I strongly recommend against this approach. Especially because you typically want to find "next" and "prev" id's for forward and back links, which means on *every* page view of a potentially large corpus.

Re: Item N, N+-1

Date: 2009-10-04 09:26 am (UTC)
From: (Anonymous)
select b.i,count(*) as N from t a join t b on a.i<=b.i group by b.i having N=3;

No derived table necessary, as stated above.

Date: 2009-10-01 05:29 pm (UTC)
From: [identity profile] urox.livejournal.com
Does MySQL have temporary tables? Or scrollable cursors? (I know that DB2 does, but that's probably more power than you're needing at this time. ;) )

Profile

elfs: (Default)
Elf Sternberg

June 2025

S M T W T F S
1234567
891011121314
15161718192021
22232425262728
2930     

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jun. 30th, 2025 07:40 pm
Powered by Dreamwidth Studios