SQL from Hell, my left...
Jun. 15th, 2013 01:56 amThat horrible SQL statement I posted a few days ago? Ignore it completely. A little learning is a dangerous thing, as evidenced here by the realization that I didn’t need to mix in the narrator_children query except to exclude those things that had parents, giving me a perfect query to find parentless (root) objects. My entire query boils down to this:
WITH RECURSIVE get_all (series_id, title, slug, path, url) AS (
SELECT narrator_series.id, title, slug,
TO_CHAR(narrator_series.id * 32, '0000000') AS path,
CONCAT('', slug) AS url
FROM narrator_series
JOIN narrator_slugs ON narrator_series.slug_id = narrator_slugs.id
WHERE narrator_series.user_id = 1 AND narrator_slugs.slug = 'aimee'
AND narrator_series.id NOT IN (
SELECT child_id FROM narrator_children
WHERE user_id = 1 and type = 'series')
UNION
SELECT narrator_series.id, narrator_series.title, narrator_slugs.slug,
CONCAT(get_all.path, '/', TO_CHAR(narrator_children.position, '0000000')) AS path,
CONCAT(get_all.slug, '/', narrator_slugs.slug) AS url
FROM narrator_series
JOIN narrator_slugs ON narrator_series.slug_id = narrator_slugs.id
JOIN narrator_children ON narrator_series.id = narrator_children.child_id
JOIN get_all ON narrator_children.series_id = get_all.series_id
WHERE narrator_children.type = 'series'
)
(SELECT series_id, title, slug, path, url, 'series' AS type from get_all
UNION
SELECT narrator_stories.id, narrator_stories.title, narrator_slugs.slug,
CONCAT(get_all.path, '/', TO_CHAR(narrator_children.position, '0000000')) AS path,
CONCAT(get_all.url, '/', narrator_slugs.slug) AS url,
'story' AS type
FROM narrator_stories
JOIN narrator_slugs ON narrator_stories.slug_id = narrator_slugs.id
JOIN narrator_children ON narrator_children.child_id = narrator_stories.id
JOIN get_all on get_all.series_id = narrator_children.series_id
WHERE narrator_children.type = 'story') ORDER BY path;
The only problem with this query is that it assumes that the series being asked for is a root series. I want to be able to start from an arbitrary point in the tree, with ascent and decent as needed, but that should come eventually. But grief, that is so much shorter and more readable (and it supplies the URLS!)!
no subject
Date: 2013-06-15 05:45 am (UTC)SELECT node.interesting_attrs FROM children JOIN nodes AS node ON node.id = children.child_id LEFT JOIN nodes AS parent ON parent.id = children.parent_id WHERE parent.id IS NULL
This effectively scans the nodes table for those entries where there are no parent mappings in the mapping table, meaning as soon as it finds *any* child it goes to the next one, instead of comparing against each row of the unindexed intermediate result given by the subselect. Also, if you have reasonable indices (like a "parent_id" index on the children table and an "id" index on the nodes table), the optimizer will typically just do a walk of the indexes, only actually looking at rows that have an entry in nodes but no entry in the parent_id index of the children table -- you can't get any faster than that, as it's just looking at two sorted lists to find ones missing in the smaller list. (Okay, you can get faster if you know that each node will have exactly zero or one parent and put that information directly in the nodes table -- an index could just spit out all the nodes with null parent_id without touching any others. But, if you have a mapping table, a walk of the two sorted lists of id's is literally as fast as is possible.)
It looks like this would apply to your narrator_stories and narrator_children tables.
/me optimizes sql queries for large large datasets at his day job...
--
As for handling recursion of unknown maximum depth, SQL basically just can't do it. It's just not a thing that relational algebra can handle. If you have an upper bound, you can use a coalesce chain to achieve it. Some engines hack in a way to use true recursion, but it's not portable. For instance, MySQL's variables allow it, Postgres has that "with recursion" thing. MSSQL has that the xml datatype you can go between with using xpath. I'd be interested in seeing how you manage it, if you do. :-)
no subject
Date: 2013-06-15 06:02 am (UTC)EDIT: It occurs to me that it's reasonable to assume that SQL in a comment would be something malicious...let's see if an edit of a non-spam-marked post gets through.
... nope. Short version: I recommend replacing the children table subselect with a left join of the stories table with a copy of itself where the first matches a parent_id row and the second matches a child_id row, and using a null check for the parent_id field. It's scads faster than checking against each child to ensure you're not in the list of children, basically just short-circuiting whenever it finds that there's a child at all and skipping to the next one. In general, avoid "not in" subqueries, because they're generally the slowest way to go about things.
no subject
Date: 2013-06-24 03:14 am (UTC)Okay, that's an approach. I'll make a note of it. This is, however, working for the current implementation, and I need to move on to the templating engine next. Both have different intake filters (whitelists, using HTML5Lib) for their own purposes, and integrating them has proven to be a pain in the neck.
Thanks, though. I'm keeping this, and plugging it into pgadmin3 when I get the chance.
Sadly, this may actually never see the light of day. I may have to port this to *gack* MySQL, since the server I'm running may not offer Postgres.