elfs: (Default)
[personal profile] elfs

That horrible SQL statement I posted a few days ago? Ignore it completely. A little learning is a dangerous thing, as evidenced here by the realization that I didn’t need to mix in the narrator_children query except to exclude those things that had parents, giving me a perfect query to find parentless (root) objects. My entire query boils down to this:


WITH RECURSIVE get_all (series_id, title, slug, path, url) AS (
	SELECT narrator_series.id, title, slug, 
	    TO_CHAR(narrator_series.id * 32, '0000000') AS path, 
	    CONCAT('', slug) AS url
		FROM narrator_series
		JOIN narrator_slugs ON narrator_series.slug_id = narrator_slugs.id
		WHERE narrator_series.user_id = 1 AND narrator_slugs.slug = 'aimee' 
		AND narrator_series.id NOT IN (
			SELECT child_id FROM narrator_children 
			WHERE user_id = 1 and type = 'series')
	UNION
	SELECT narrator_series.id, narrator_series.title, narrator_slugs.slug, 
	            CONCAT(get_all.path, '/', TO_CHAR(narrator_children.position, '0000000')) AS path, 
	            CONCAT(get_all.slug, '/', narrator_slugs.slug) AS url
		FROM narrator_series
		JOIN narrator_slugs ON narrator_series.slug_id = narrator_slugs.id
		JOIN narrator_children ON narrator_series.id = narrator_children.child_id
		JOIN get_all ON narrator_children.series_id = get_all.series_id
		WHERE narrator_children.type = 'series' 
	)
(SELECT series_id, title, slug, path, url, 'series' AS type from get_all
UNION
SELECT narrator_stories.id, narrator_stories.title, narrator_slugs.slug, 
        CONCAT(get_all.path, '/', TO_CHAR(narrator_children.position, '0000000')) AS path,
	CONCAT(get_all.url, '/', narrator_slugs.slug) AS url,
	'story' AS type
	FROM narrator_stories
	JOIN narrator_slugs ON narrator_stories.slug_id = narrator_slugs.id
	JOIN narrator_children ON narrator_children.child_id = narrator_stories.id
	JOIN get_all on get_all.series_id = narrator_children.series_id
	WHERE narrator_children.type = 'story') ORDER BY path;

The only problem with this query is that it assumes that the series being asked for is a root series. I want to be able to start from an arbitrary point in the tree, with ascent and decent as needed, but that should come eventually. But grief, that is so much shorter and more readable (and it supplies the URLS!)!

Date: 2013-06-15 05:45 am (UTC)
From: [identity profile] zanfur.livejournal.com
Is there a reason you're using a subselect in get_all for the narrator_children? It would be vastly more performant if you just used an outer join of the stories table with another copy of itself using a null check. It's a standard sql trick for looking for parent-less objects when you have a parent/child relation table. Effectively, if "nodes" is a table with interesting data and an "id" field and "children" is a parent/child relationship mapping table for the nodes table with "parent_id" and "child_id" fields, you do this (note the left join and the IS NOT NULL condition):

SELECT node.interesting_attrs FROM children JOIN nodes AS node ON node.id = children.child_id LEFT JOIN nodes AS parent ON parent.id = children.parent_id WHERE parent.id IS NULL

This effectively scans the nodes table for those entries where there are no parent mappings in the mapping table, meaning as soon as it finds *any* child it goes to the next one, instead of comparing against each row of the unindexed intermediate result given by the subselect. Also, if you have reasonable indices (like a "parent_id" index on the children table and an "id" index on the nodes table), the optimizer will typically just do a walk of the indexes, only actually looking at rows that have an entry in nodes but no entry in the parent_id index of the children table -- you can't get any faster than that, as it's just looking at two sorted lists to find ones missing in the smaller list. (Okay, you can get faster if you know that each node will have exactly zero or one parent and put that information directly in the nodes table -- an index could just spit out all the nodes with null parent_id without touching any others. But, if you have a mapping table, a walk of the two sorted lists of id's is literally as fast as is possible.)

It looks like this would apply to your narrator_stories and narrator_children tables.

/me optimizes sql queries for large large datasets at his day job...

--
As for handling recursion of unknown maximum depth, SQL basically just can't do it. It's just not a thing that relational algebra can handle. If you have an upper bound, you can use a coalesce chain to achieve it. Some engines hack in a way to use true recursion, but it's not portable. For instance, MySQL's variables allow it, Postgres has that "with recursion" thing. MSSQL has that the xml datatype you can go between with using xpath. I'd be interested in seeing how you manage it, if you do. :-)

Profile

elfs: (Default)
Elf Sternberg

December 2025

S M T W T F S
 12345 6
78910111213
14151617181920
21222324252627
28293031   

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jan. 27th, 2026 06:07 pm
Powered by Dreamwidth Studios