Jul. 17th, 2016

elfs: (Default)

It’s time to come around to a point that’s been bugging me for a long time: why is the Python import routine so, well, so darned convoluted? The answer is “history,” basically the history of Python and the attempt to turn import foo.bar.baz into a tool that’s incredibly easy to use and understand for the common programmer, yet flexible enough to give the advanced programmer the power to redefine it into whatever else it has to mean.


We’ve talked about how the system has two different loading systems, sys.meta_path and sys.path_hooks, and how the latter is just as arbitrary as the former: the last path_hook is for the filesystem, so it runs os.isdir() on every item in sys.path and only offers to handle the ones that return true, and it only runs after everything else has been run, so:



  • If a meta_path finder interpreted an import fullname with respect to a path that’s a directory, the default won’t get it,

  • If a path_hook said it could handle it, the default won’t get it,


… and so on. The whole point of using first-one-wins priority pathing is to leave the responsibility for not failing up to the developer. The default really is the fallback position, and it uses only a subset of sys.path. The formal type of a sys.path entry is… no type at all. It could be a string, a filesystem directory iterator, an object that interacts with a path_hook. It could be anything at all. The only consideration is that, if it can’t be coerced into a string that os.isdir() can reject, you had better handle it before it falls through to the default.
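To make that fallback behaviour concrete, here is a minimal sketch that builds a hook in the style of the default filesystem one and asks it about three kinds of entry. The helper name accepts is made up for this example, and the hook here is configured with only the source loader, whereas the real default also knows about extension modules and bytecode.

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader, SOURCE_SUFFIXES

# A hook in the style of the default filesystem hook (source files only).
filesystem_hook = FileFinder.path_hook((SourceFileLoader, SOURCE_SUFFIXES))


def accepts(entry):
    """Hypothetical helper: True if the hook offers to handle this entry."""
    try:
        filesystem_hook(entry)
    except ImportError:
        # The hook declines anything that is not an existing directory.
        return False
    return True


with tempfile.TemporaryDirectory() as directory:
    with tempfile.NamedTemporaryFile(dir=directory, suffix=".zip") as plain_file:
        results = {
            "directory": accepts(directory),
            "file": accepts(plain_file.name),
            "missing": accepts(os.path.join(directory, "no-such-subdirectory")),
        }

print(results)
```

Only the directory is accepted; anything else has to be claimed by an earlier hook or it is silently skipped.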


It’s really time to call it like it is: sys.path and sys.path_hooks are a special case for loading. They’re the original special case, but that’s what they are. They lead to weird results like one finder saying it can handle foo.bar.baz and another foo.bar.quux, turning the leading elements of the fullname into arbitrary and meaningless tokens.
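A small sketch of that weirdness, using two meta_path finders that each claim different names under the same dotted prefix. Everything here (ClaimingFinder, SourceStringLoader, the demo_foo names) is hypothetical and exists only for the illustration; the first finder also has to claim the parent packages, because the import system imports parents before children.

```python
import importlib
import sys
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec


class SourceStringLoader(Loader):
    """Hypothetical loader that executes a source string as the module body."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        exec(self.source, module.__dict__)


class ClaimingFinder(MetaPathFinder):
    """Hypothetical finder that claims only the full names it was given."""

    def __init__(self, sources, packages=()):
        self.sources = sources
        self.packages = set(packages)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # not ours; let the next finder try
        return ModuleSpec(
            fullname,
            SourceStringLoader(self.sources[fullname]),
            is_package=fullname in self.packages,
        )


first = ClaimingFinder(
    {"demo_foo": "", "demo_foo.bar": "", "demo_foo.bar.baz": "OWNER = 'first'"},
    packages={"demo_foo", "demo_foo.bar"},
)
second = ClaimingFinder({"demo_foo.bar.quux": "OWNER = 'second'"})

sys.meta_path[:0] = [first, second]  # first one wins, so go to the front
try:
    baz = importlib.import_module("demo_foo.bar.baz")
    quux = importlib.import_module("demo_foo.bar.quux")
finally:
    sys.meta_path.remove(first)
    sys.meta_path.remove(second)

print(baz.OWNER, quux.OWNER)
```

Both modules live under demo_foo.bar, yet they come from unrelated finders; the prefix tells you nothing about where a module came from.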


I wish I could call for a more rational import system, one in which we talked only about resource managers which had the ability to access resource archives, iterate through the contents, identify corresponding resources, load the contents of that resource, and compilers that could identify the text that had just been accessed (via whatever metadata was available) and turn it into a Python module.
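For what it’s worth, the shape I have in mind might look something like the sketch below. None of these names exist in Python; ResourceManager, DictArchive and compile_python are invented here, and the name matching is deliberately naive (top-level names only).

```python
import types
from abc import ABC, abstractmethod


class ResourceManager(ABC):
    """Hypothetical interface: knows how to reach into one kind of archive."""

    @abstractmethod
    def iter_resources(self):
        """Yield the identifier of every resource in the archive."""

    @abstractmethod
    def load(self, resource_id):
        """Return the raw text of one resource."""

    def identify(self, fullname):
        """Return the resource identifier matching a module name, or None."""
        for resource_id in self.iter_resources():
            if resource_id.rsplit(".", 1)[0] == fullname:
                return resource_id
        return None


class DictArchive(ResourceManager):
    """Hypothetical in-memory archive standing in for a zipfile, database, etc."""

    def __init__(self, files):
        self.files = files

    def iter_resources(self):
        return iter(self.files)

    def load(self, resource_id):
        return self.files[resource_id]


def compile_python(fullname, text):
    """Hypothetical compiler: turn accessed source text into a module object."""
    module = types.ModuleType(fullname)
    exec(compile(text, fullname, "exec"), module.__dict__)
    return module


archive = DictArchive({"greeting.py": "MESSAGE = 'hello'"})
resource_id = archive.identify("greeting")
module = compile_python("greeting", archive.load(resource_id))
print(resource_id, module.MESSAGE)
```

The point of the split is that the archive knows nothing about Python and the compiler knows nothing about where the text came from.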


But we can’t. Python is too well-established to put up with such rationalizing shenanigans, and too many people are dependent upon the existing behavior to help make it so. Python was born when NFS was the thing, when there were no real open-source databases, no object stores. Python was released two years before the Mosaic web browser! It would be far too disruptive. So we’ll keep getting PEPs forever trying to rationalize the irrational.


That’s okay. It gives me something to get paid for.


But, it does point out one major flaw: because Finders and Loaders are so intimately linked, even if we manage to rationalize FileFinder and SourceFileLoader, that’s only with respect to the Filesystem. We’ll have to make equivalent loader/finders for any other sort of accessor, be it Zipfiles or any of the other wacky resource pools that people have come up with.


Unfortunately, I don’t have a good plan for those. Fortunately, filesystems are still the most common way of storing and loading libraries, so concentrating on those gets us 99% of the way there.

elfs: (Default)

Module Iterators, as defined in pkgutil.py, aren’t really part of the mess that has been imposed on us by PEP-302 and its follow-on attempts to rationalize the loading process, but they’re used by so many different libraries that when we talk about creating a new general class of importers, we have to talk about iterators.


Iterators, after all, are why I started down this project in the first place. It was Django’s inability to find heterogeneously defined modules that I set out to fix.


Iterators are defined in the pkgutil module; their entire purpose is, given some kind of reference to an archive, to be able to list the contents of that archive, and to recursively descend into that archive if it happens to be a tree-like structure.


When you call pkgutil.iter_modules(path, prefix), you get back an iterator over all the modules within that path or, if no path is supplied, within all the paths in sys.path. As I pointed out in my last post, the paths in sys.path aren’t necessarily paths on the filesystem or, if they are, they’re not necessarily directory paths. All that matters is that for each path, a path_hook exists that can return a Finder, and that Finder has a method for listing the contents of the path found.
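A quick sketch of that call against a throwaway directory. The file names and the "demo." prefix are made up for the example; note that path has to be a list of entries rather than a single string.

```python
import os
import pkgutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # One plain module and one package (a directory with an __init__.py).
    open(os.path.join(root, "alpha.py"), "w").close()
    os.mkdir(os.path.join(root, "beta"))
    open(os.path.join(root, "beta", "__init__.py"), "w").close()

    # Each result unpacks as (finder, name, ispkg).
    found = sorted(
        (name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules([root], prefix="demo.")
    )

print(found)
```

The prefix is simply glued onto each name; nothing is imported along the way.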


In Python 2, pkgutil depends upon Finders (those things we said were attached to meta_path and path_hooks) having a special method called iter_modules; if a Finder has one, that method is used to list the contents of the “path”.


In Python 3, the functools.singledispatch tool is used to differentiate between different Finders; once a Finder has been identified by path_hooks, singledispatch is used to find a corresponding resource iterator for that Finder. It doesn’t necessarily have to be a method on the Finder, although the default has a classmethod that is its finder.
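Here is a minimal sketch of hooking into that dispatch. It assumes pkgutil.iter_importer_modules is the single-dispatch generic function, as it appears to be in CPython 3’s pkgutil.py; it is not a documented API, so treat this as illustration. DictFinder and its iterator are hypothetical.

```python
import pkgutil


class DictFinder:
    """Hypothetical finder for an in-memory 'archive' of module names."""

    def __init__(self, entries):
        self.entries = entries  # maps module name -> whether it is a package


def iter_dict_finder_modules(finder, prefix=""):
    """Hypothetical resource iterator for DictFinder instances."""
    for name, ispkg in sorted(finder.entries.items()):
        yield prefix + name, ispkg


# Assumption: iter_importer_modules is a functools.singledispatch function,
# so .register() attaches our iterator to our Finder type.
pkgutil.iter_importer_modules.register(DictFinder, iter_dict_finder_modules)

finder = DictFinder({"alpha": False, "beta": True})
listing = list(pkgutil.iter_importer_modules(finder, "pkg."))
print(listing)
```

The iterator lives outside the Finder class entirely; the dispatch table is what ties the two together.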


An iterator is pretty straightforward; once you know the “path” (resource identifier) and the Finder for that path, you can call a function that checks for the presence of modules. In the case of FileFinder, that function is a combination of listdir, isfile, and isdir/isfile to check for dir/__init__ pairs indicating a submodule.
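A simplified sketch of that kind of scan follows. This is not pkgutil’s actual code; iter_directory_modules is a made-up name, and it only recognizes packages that have an __init__.py.

```python
import inspect
import os
import tempfile


def iter_directory_modules(path, prefix=""):
    """Simplified sketch of a filesystem module iterator (not pkgutil's code)."""
    seen = set()
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            # A directory counts only if it holds an __init__ source file.
            if not os.path.isfile(os.path.join(full, "__init__.py")):
                continue
            name, ispkg = entry, True
        elif os.path.isfile(full):
            # getmodulename returns None for files that are not modules.
            name, ispkg = inspect.getmodulename(entry), False
        else:
            continue
        if not name or name == "__init__" or "." in name or name in seen:
            continue
        seen.add(name)
        yield prefix + name, ispkg


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "alpha.py"), "w").close()
    open(os.path.join(root, "notes.txt"), "w").close()
    os.mkdir(os.path.join(root, "beta"))
    open(os.path.join(root, "beta", "__init__.py"), "w").close()
    os.mkdir(os.path.join(root, "gamma"))  # no __init__.py, so not listed

    scanned = list(iter_directory_modules(root))

print(scanned)
```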


For our purposes, of course, we had to provide a path_hook that eclipses the existing path_hook, and we had to provide a Finder that was more precisely ours than the inherited base FileFinder, so that single dispatch would find ours before it found FileFinder’s and still work correctly.
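Roughly, the arrangement looks like the sketch below. DemoFinder, the ".src" suffix and the iterator are hypothetical stand-ins rather than the real implementation, and the sketch again assumes pkgutil.iter_importer_modules is a single-dispatch function. To affect real imports the hook would be inserted ahead of the default in sys.path_hooks; here it is called directly so the example leaves the interpreter alone.

```python
import importlib.util
import os
import pkgutil
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader


class DemoFinder(FileFinder):
    """Hypothetical FileFinder subclass, so dispatch can tell ours apart."""


def iter_demo_finder_modules(finder, prefix=""):
    """Hypothetical iterator: lists files with the made-up '.src' suffix."""
    for entry in sorted(os.listdir(finder.path)):
        stem, suffix = os.path.splitext(entry)
        if suffix == ".src":
            yield prefix + stem, False


# Single dispatch prefers the most specific registered class, so this
# registration is found before the one pkgutil makes for FileFinder.
pkgutil.iter_importer_modules.register(DemoFinder, iter_demo_finder_modules)

# path_hook is a classmethod, so the hook it returns builds DemoFinder objects.
demo_hook = DemoFinder.path_hook((SourceFileLoader, [".src"]))

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "hello.src"), "w") as handle:
        handle.write("VALUE = 42\n")

    finder = demo_hook(root)
    listing = list(pkgutil.iter_importer_modules(finder))

    # Load the module through the finder, the way the import system would.
    spec = finder.find_spec("hello")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(type(finder).__name__, listing, module.VALUE)
```

Because DemoFinder is still a FileFinder, everything else that expects the default Finder keeps working.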




There is one other module I have to worry about: modulefinder. It’s not used often, it’s not used by Django or any of the other major tools that I usually use, and it’s never been covered by Python Module of the Week. That doesn’t mean that its hard-coding of the ‘.py’ suffix isn’t problematic. I’m just not sure what to do about it at this point.

Elf Sternberg
