
I wish I’d known this a long time ago.  Django’s request object includes a dictionary of key/value pairs passed into the request via POST or GET methods.  That dictionary, however, works in a counter-intuitive fashion.  If a URL reads http://foo.com?a=boo, then the expected content of request.GET['a'] would be 'boo', right?  And most of us who’ve used other URL parsers in the past would expect that for http://foo.com?a=boo&a=hoo, the content of request.GET['a'] would be ['boo', 'hoo'].

Except it isn’t.  It’s just 'hoo'.  Digging into the source code, I learn that in Django’s MultiValueDict, __getitem__(self, key) has been redefined to return the last item of the list.  I have no idea why.  Maybe they wanted to ensure that a scalar was always returned.  The way to get the whole list (necessary when doing an ‘in’ request) is to call request.GET.getlist('a').
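You can see the behavior in a Django shell; a quick sketch (QueryDict is the class behind request.GET):

from django.http import QueryDict

q = QueryDict('a=boo&a=hoo')
print q['a']          # 'hoo' -- __getitem__ returns the last value
print q.getlist('a')  # ['boo', 'hoo'] -- the whole list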

Lesson learned, an hour wasted.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

It drives me nuts that we in the Django community rely on Solr or Haystack to provide us with full-text search when MySQL provides a perfectly functional full-text search feature, at least at the table level and for modest projects. I understand that not every app runs on MySQL, but mine do, and I’m sure many of you are running exactly that, and could use this technique without modification.

Well, after much digging, I found an article on MercuryTide’s website covering custom QuerySets with FULLTEXT and relevance, and built this library around it.

I used this rather than Django’s internal filter keyword search because this technique adds an additional aggregated value: the relevance of the search terms to each row. This is useful for sorting the results, something not automatically provided by the QuerySet.filter() mechanism.
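Under the hood, the technique amounts to injecting a MATCH ... AGAINST expression as both a selected column and a WHERE condition. A minimal sketch of the idea (not the library’s actual code; the field list is an example):

def fulltext_search(queryset, query, fields=('title', 'summary')):
    # Builds 'MATCH(title, summary) AGAINST (%s)' for the given fields.
    match = 'MATCH(%s) AGAINST (%%s)' % ', '.join(fields)
    return queryset.extra(
        select={'relevance': match},  # exposed for ordering by relevance
        select_params=[query],
        where=[match],
        params=[query])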

You must create the indexes against which the search will be conducted. For performance reasons, if you’re importing a massive collection of data, it’s better to import all of the data and then create the index. More importantly, when you declare that a SearchManager is to be used by a Model, you declare it thusly:

class Book(models.Model):
    ...
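    # The manager is tied to the set of fields the FULLTEXT index will
    # cover -- here, title and summary.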
    objects = SearchManager()

When you do, you must add an index that corresponds to that list of fields:

CREATE FULLTEXT INDEX book_text_index ON books_book (title, summary)

Notice how the contents of the index correspond with the contents of the Search Manager.  Or you can automate the process with South:

    def forwards(self, orm):
        db.execute('CREATE FULLTEXT INDEX book_text_index ON books_book (title, summary)')

    def backwards(self, orm):
        db.execute('DROP INDEX book_text_index on books_book')

Using the library is fairly trivial. If there is only one index (which can encompass several columns) for the table, you call

books = Book.objects.search('The Metamorphosis').order_by('-relevance')

If there’s more than one index, you specify the index by the list of fields:

books = Book.objects.search('The Metamorphosis', ('title', 'summary')).order_by('-relevance')

Note that the field list is a tuple, and must be.

If you specify fields that are not part of a FULLTEXT index, the error message will include lists of viable indices.   It will also tell you if there are no indices.  (Getting that to work was tricky, as it involved database introspection and the decoration of methods, so I’m especially proud of it.)

The library is fully available on my github account: django_mysqlfulltextsearch

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Recently, I had the pleasure of attending another of those Seattle Django meet-ups.  This one was a potpourri event, just people talking about what they knew and how they knew it.  I revealed that I’d written my first aggregator, and that seemed to be an impressive statement.  Apparently Django Aggregators (database expressions that perform summarizing or filtering sub-selects) are something of a black art, much like Wordpress Treewalkers were a black art I figured out in just a few hours.

Aggregators consist of two parts: the definition and the implementation.  Unfortunately, Django’s idea is that these are two different objects, bound together not by inheritance but by aggregation (both the definition and the implementation are assembled in a generic context, one providing access to the ORM and the other to the SQL).  The definition is used by the ORM to track the existence and name of the aggregate, and is then used to invoke the implementation, which in turn creates the raw SQL that will be added to the SQL string ultimately sent to the server, whose results are ultimately parsed back out by the ORM.

I needed to use aggregation because I wanted to say, “For any two points’ latitude and longitude, give me the great circle distance between them,” and then say, “For a point (X, Y) on a map, give me every other place in the database within n miles great circle distance.”
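The first half of that is just the haversine formula. A quick sketch of the math in plain Python (not the aggregate itself; 3959 is the Earth’s mean radius in miles):

import math

def great_circle_miles(lat1, lon1, lat2, lon2):
    # Haversine formula for the great circle distance between two points.
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959 * 2 * math.asin(math.sqrt(a))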

The latter was not possible with Django’s QuerySet.extra() feature.  You can add a WHERE clause, but not a HAVING clause, and this definitely requires a HAVING clause when running on MySQL, since the filter is on a computed column.  Using an Aggregator with a limit forces the ORM to realize it needs a HAVING clause.  Besides, it was a good excuse to learn the basics of Aggregation.  Ultimately, I was able to do what the task required: find the distance between any two US Zip Code regions without making third-party requests.

I make absolutely no promises that this code is useful to anyone else.  The Aggregator is definitely not pretty: it’s virtually a raw SQL injector.  But it was fun.  Enjoy: Django-ZipDistance.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

As I’ve been working on a project at Indieflix, I’ve been evaluating other people’s code, including drop-ins, and for the past couple of days a pattern has emerged that, at first, bugged the hell out of me.  Django has these lovely things called context processors: they allow you to attach data to the request context whenever a RequestContext is constructed for a response; the idea is that there are global needs you can attach to the context in which your request will ultimately be processed, and this context can be grown, organically, from prior contexts.

Nifty.  I kept noticing, however, an odd trend: programmers attaching potentially large querysets to the context.  Take, for example, the bookmarks app: it has a context processor that connects a user’s bookmarks to the context, so when the time comes to process the request, if the programmer wants the list of the user’s bookmarks, there it is.

It took me three days to realize that this is not wasteful.  It says so right there in the goddamn manual: QuerySets are lazy — the act of creating a QuerySet doesn’t involve any database activity. So building up the queryset doesn’t trigger a query until you actually need the first item of the queryset.  It just sits there, a placeholder for the request, until you invoke it in, say, a template.  Which means that you can attach all manner of readers to your contexts, and they won’t do a database hit or other I/O until you drop a for loop into a template somewhere. I’ve been avoiding this technique for no good reason.
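A sketch of the pattern (the bookmark_set related name is my assumption, in the style of the bookmarks app):

def bookmarks(request):
    # Building the QuerySet performs no query; the database is hit only
    # if a template actually iterates over it.
    if request.user.is_authenticated():
        return {'bookmarks': request.user.bookmark_set.all()}
    return {}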

This also means that pure “display this list and provide links to controlling pages, but provide no controls of your own” pages are pure templates that can be wrapped in generic views.

Ah!  And this is why I didn’t get why generic views were such a big deal.  Information is often so context-sensitive that I couldn’t see how a generic view could possibly be useful.  The examples in the book are all tied to modifying the local context in urls.py, but that’s not really where the action is.  The action is in the context processors, which are building that context.

I feel like Yoda’s about to walk in and say, “Good, but much to learn you still have, padawan.”

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Following up on The Onion’s announcement that they’re using Django comes this priceless discussion of the technical challenges of doing so with several members of The Onion’s technical team. They were using Drupal before.

Among the things I discovered:

  • Grappelli, a customizable theme for the Django admin
  • uWSGI, a high-performance WSGI container that runs separately from Apache
  • HAProxy, a viable open-source TCP/IP load balancer

I’m constantly reminded, when I work on IndieFlix, “You’re not YouTube. Don’t code like you are ever going to be YouTube.” And they’re right. If I ever reach that level of technical difficulty, I’ll deal with it then. But we very well could have traffic issues similar to The Onion’s, and that’s not a bad target to aim for.

Also not to be missed in this conversation: The Onion cut 66% of their bandwidth by caching 404s upstream.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

From the announcements for Rails 3:

The upcoming version 3 of Ruby on Rails will feature a sexy new querying API from ActiveRecord. Here is an example:

User.order('users.id DESC').limit(20).includes(:items)

In other words, Rails is now Django.

Also:

  • Each application now has its own namespace; an application is started with YourAppName.boot, for example, which makes interacting with other applications a lot easier.
  • Rails 3.0 now provides a Rails.config object, which provides a central repository of all sorts of Rails-wide configuration options.

In other words, Rails is now Django.

To be fair, these are huge improvements to Rails. They’ve needed to do these things for a long, long time. The separation of application namespaces is especially powerful: it’s what gives Django a massive chunk of its dynamism. It’s good to see that these great ideas, which Django has had since its early versions, have now made it into Rails, just as the Django people start grappling with their own version of Capistrano (Fabric) and their own deployment issues. Rails has always had one obvious migration path, a pythonic value, while Django has two migration tools (South and Evolution), which is more a rubyish value; the Django team has decided to leave migration tracking up to outside development teams, may-the-best-solution-win.

So, we’ll see. I’m installing Rails 3 this morning, and who knows?  Maybe it’ll seduce me back to working with Rails again.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Java is Pass-By-Value, Dammit!

Quite possibly the most important article I’ve ever read, because it finally, finally explains to me what Java’s object-passing model is really all about. I’ve never understood it, and now I do: it’s exactly backwards from pass-by-reference, so it’s exactly backwards from the languages with which I grew up. The Object (and derivative) variables are pointers, not references, and calling them references only confuses people who grew up writing C (as I did).

Even more oddly, it explains to me what I’ve never quite understood about Python’s object model, because Python’s object model is exactly the same.   Reproducing the code in the article above in Python creates the same result:

class Foo(object):
    def __init__(self, x): self._result = x
    def _get_foo(self): return self._result
    def _set_foo(self, x): self._result = x
    result = property(_get_foo, _set_foo)

def twid(y):
    y.result = 7

def twid2(y):
    y = Foo(7)

r = Foo(4)
print r.result  # Should be 4
twid(r)
print r.result  # Should be 7

r = Foo(5)
print r.result  # Should be 5
twid2(r)
print r.result  # Still 5

This demonstrates that Python, too, is pass-by-value, with pythonic “references” in fact being pointers-to-objects. In the case of twid2, we change what the pointer y, which exists only in the frame of the call to twid2, points to, creating a new object that is thrown away at the end of the call.  The object to which y pointed when the function was called is left unmolested.

This is important because it changes (it might even disrupt) the way I think about Python. For a long time, I’ve been using python calls out of habit, just knowing that sometimes objects are changed and sometimes they aren’t.  Now that the difference has been made clear to me, in language that I’ve understood since university, either I’m going to be struggling for a while incorporating this new understanding, or I’m going to be much more productive.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Google last week released Google Go, a new programming language designed to be “more C-like” (a kind of Python/C#/Java mash-up of a language) with support for Erlang’s excellent concurrency management, which allows you to write programs that do lots of different little things at the same time on machines that have multiple CPUs. It’s odd that Google should come out with this at about the same time that I started looking into Python’s high-concurrency variant, Stackless, because both Stackless Python and Go purport to do the same thing: bring high-quality concurrency management to the average developer, in a syntax more familiar than Erlang’s, which is somewhat baroque.

While looking through both, I came across Dalke’s comparison of Stackless Python versus Go. Not even taking into account the compile time (which is one of Go’s big features, according to Google), Dalke compared the two and found that Stackless Python was faster. He cautioned that his comparison might not be valid, because the machines on which he ran the two programs were similar, but not identical.

So, here’s the deal: on a Thinkpad T60, running on the console with no notable load whatsoever, I ran Dalke’s two programs, both of which spin off 100,000 threads of execution and then sum them all up at the end.
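For reference, the benchmark is the goroutine-chain demo from Go’s launch: a chain of 100,000 lightweight threads, each receiving a number from its neighbor, adding one, and passing it along. A sketch of the Stackless version, after Dalke (close to, but not necessarily exactly, his code):

import stackless

def f(left, right):
    # Receive from the right-hand neighbor, add one, pass it leftward.
    left.send(right.receive() + 1)

leftmost = stackless.channel()
right = leftmost
for i in xrange(100000):
    left = right
    right = stackless.channel()
    stackless.tasklet(f)(left, right)

right.send(0)             # inject a zero at the far end of the chain
print leftmost.receive()  # 100000 comes out the near end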

The “real” time to run the Go program was 0.976 seconds (average after ten runs). The “real” time to run the Python program was 0.562 seconds (average after ten runs). The Python program, which must make an interpretive pass before executing, was almost twice as fast as the Go program. (It was 1.73 times as fast, which matches up approximately with Dalke’s 1.8).

In fact, you can see the consequences of this in the way the other figures come out: the amount of CPU time for each is wildly different. Both spent approximately the same amount of time executing the user’s code (Python: 0.496 seconds, Go: 0.514 seconds), but the time spent by the kernel managing the code’s requests for resources is way off (Python: 0.053 seconds, Go: 0.446 seconds).

It may be that Go is simply doing something wrong and this can be fixed, making Go a competitive language. An alternative explanation is that Stackless Python has an erroneous implementation of lightweight concurrency and someday it’s going to be found and high-performance computing pythonistas are going to be sadly embarrassed until they fix the bug and come into compliance with Go’s slower but wiser approach.

But I wouldn’t bet on that.

Kent Beck recently said of Google Wave, “it’s a bad sign for wave that no one can yet say what it is indispensable for. not just cool (it’s definitely that), but indispensable.”  The same can be said of Google Go: there’s nothing about it I can’t live without, and other implementations of the same ideas seem to be ahead of the game.  Stackless Python not only has Erlang-esque concurrency and reasonable build and execution times, but it has a familiar syntax (which can be easily picked up by Perl, Ruby, and PHP developers), a comprehensive and well-respected standard library, and notably successful deployed applications.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Do you have that one thing that you have to constantly look up?

In Python, there are two ways to replace elements of a string.  One does a strictly literal (linear) search; the other uses regular expressions.  The regexp call to replace part of a string with another string is sub, and the string method to replace part of a string with another string is replace.
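For the record, then, a trivial sketch:

import re

s = 'spam and eggs'
print s.replace('spam', 'ham')        # string method: literal replacement
print re.sub(r'spam|eggs', 'ham', s)  # regexp call: pattern replacement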

I get these backward all the flamin’ time, and it drives me insane.  I use both of these every day, why the Hell can’t I remember which is which?

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

We frequently write little functions that populate the Django context, and sometimes we want that context to be site-wide: we want every page and every Ajax handler, basically everything that takes a request and spews a response, in our application to have access to that information.  It might be the user’s authentication, or his authorization, or some profile information.  Or it might be environmental: a site might have figured out what time it is at the user’s location, and will adjust backgrounds and themes accordingly.

The context might be a simple variable.  I have an example right here: is the browser you’re using good enough?  (I know, browser sniffing is considered Bad Form, but it’s what I have to work with.)  The function has the simple name need_browser_warning.  The context key may as well have the same name.  Using a constant for the context key is the usual pattern; this ensures the Django programmer won’t get it wrong more than once, at least on the view side.  (The template is another issue entirely.  Set your TEMPLATE_STRING_IF_INVALID in settings.py!)

I wanted something more clever in my context processor.  Here’s sickly clever:

import inspect
def need_browser_warning(request):
    return { inspect.currentframe().f_code.co_name:
        not adequate_browser(request.META.get('HTTP_USER_AGENT')) }

Yeah, that’s a little twisted.  It guarantees that the name of the context key is “need_browser_warning”, and the value is True or False depending upon what the function “adequate_browser” returns, which is what we want, so it’s all good.
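The processor still has to be registered like any other; a sketch, assuming it lives in a hypothetical myapp.context_processors module:

# settings.py
TEMPLATE_CONTEXT_PROCESSORS = (
    'django.core.context_processors.auth',
    'django.core.context_processors.request',
    'myapp.context_processors.need_browser_warning',
)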

Obviously, this isn’t good for everything.  Some context processors handle many, many values.  But for a one-key processor, this is a nifty way of ensuring name consistency.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Today’s little snippet: Filtering a loosely coupled many-to-many relationship.  As revealed earlier,  I don’t really “get” the difficulty with many-to-many relationships.  I don’t even get the difficulty with signals; if you define the many-to-many manually, handling signals on it is trivial compared to trying to do it manually in one of the referring classes.

Today, I was working on an issues tracker.  There are two classes at work here, the Profile and the Issue.  One profile may be interested in many Issues, and obviously one Issue may be of interest to many Profiles.

This calls for a ProfileIssue table that stands independent (in my development paradigm) of both Profiles and Issues.   As I was working on a dashboard, I realized that one of the things I wanted was not just a list of the issues the profile was following, but also a list of the issues that the profile was responsible for creating.  As it turned out, adding that query to the ProfileIssueManager is trivial, but requires a little knowledge:

class ProfileIssueManager(models.Manager):
    def from_me(self, *args, **kwargs):
        return self.filter(issue__creator__id = self.core_filters['profile__id'])

The secret here is knowing about the core_filters attribute on the RelatedManager.  It contains the remote relationship key that you can use; calling from_me through a profile works, but calling it from anywhere else doesn’t.  The Issue side’s related manager won’t have a profile__id in its core_filters, and this will blow up.  That’s okay; using it that way is an error, and this is a strong example of Crash Early, Crash Often.
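For context, the join table this manager hangs off of looks roughly like this (the field names and related_name values are my assumptions, chosen to match the template example below):

class ProfileIssue(models.Model):
    profile = models.ForeignKey(Profile, related_name='issues')
    issue = models.ForeignKey(Issue, related_name='followers')

    objects = ProfileIssueManager()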

I can hear some of you cry, “Now why, why would you need such a thing?” Well, the answer is pretty simple: templates. Having one of these allows me to write:

<p>Issues tracked: {{ profile.issues.count }}</p>
<p>Issues created: {{ profile.issues.from_me.count }}</p>

And everything will work correctly.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Repeat after me:

  • Registration is not Authentication is not Authorization is not Utilization.
  • Registration is not Authentication is not Authorization is not Utilization.
  • Registration is not Authentication is not Authorization is not Utilization.

I’ll keep reminding myself of that until I figure out how to disentangle the four from this damned Facebook app.  Registering to use the app is not the same thing as authenticating to use the app, and it’s definitely not authorization, which determines your level of access.  Nor is any of this related to callbacks to the social application network to give you things like lists of friends and writing on your wall; that’s outside the responsibility of SocialAuth anyway.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

The correct call for posting to a user’s facebook wall with Python and pyfacebook, after you’ve established both user authentication via FacebookConnect and gotten stream_publish permission, is:

request.facebook.stream.publish(
    message = render_to_string(template_path, fb_context),
    action_links = simplejson.dumps(
        [{'text': "Check Us Out!", 'href': "http://someurl.com"}]),
    target_id = 'nf')

See that ‘nf’ down there in target_id?  It’s not on any of the Facebook documentation pages, but that is the correct string to post to your user’s Facebook Newsfeed. (For that matter, the fact that you have to run the action_links through simplejson, and that they have to match the PublishUserAction API action_links spec, is also not documented; the documentation says it just needs an array of arrays.)  I have no idea how to post to some other user’s newsfeed, but at least I’m one step closer.

Oh, another important tip: in order to make my news “stories” consistent, I’m using a template to post them to Facebook.  The template must not have newlines within, or they will show up on Facebook and it’ll look all ugly.  Every paragraph should be one long line of text without line breaks.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

I’m prepping for an interview this afternoon at what is primarily a Perl shop, and so I’m mainlining the O’Reilly Perl in a Nutshell book as a way of reminding myself about the little details of the language. I can write Perl just fine– I just made some major hacks to dailystrips for example, and I write various renaming and text processing scripts all the time in Perl, because it is the language for that kind of thing.

But it’s the little corners, like what exactly bless does, that I’m reminding myself of. I know it’s the OO instance creation operator, and I remember the instantiation procedure, but what exactly does it do?

So I go read the source code for Perl, in C, because that’s where the magic is kept, and I discover to my not-so-surprise that bless is a complete and utter hack. It sets a flag so that the dereferencer dispatches to a different function, one that seems added after-the-fact, which handles the call the same way it otherwise would, except with the recently dereferenced scalar passed as the first argument. That’s all it does. OO is so “bolted onto the side” of Perl that it’s amazing how important it seems to have become to the language.

But there’s so much missing from Perl; the whole metaprogramming capability of modern languages like Python, Ruby, and Javascript is just gone (done instead with code generation and eval, good grief), and yet the capacity for obfuscation is so terribly great. In many ways, Perl feels more like Bash scripting with a much bigger library of niceties and bolt-ons, which may explain why I use it that way.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

I was reading through the Wordpress source code, trying to figure out a problem for a contractor, when I saw the function compact().  When I saw it I boggled, read the description, and shook my head.

Compact() takes a list of variable names as strings, and returns a hash of those variable names and their values.   So if you have something like:

$title = "My blog";
$link = "foo";
$h = compact('title', 'link');

You get back a hash of array('title' => 'My blog', 'link' => 'foo').

That, to my thinking, is completely messed up.  You’re giving this function, which has its own scope, explicit permission to reach back and read variables in the calling scope.  It’s one of those things that convinces me that PHP is an unholy mess of silliness.

And then my brain reminded me that, hey, you can do the exact same thing in python.  So, I have:

import inspect
def compact(*args):
    return dict([(i, inspect.currentframe().f_back.f_locals.get(i, None))
                 for i in args])

def foo():
    a = "blargh"
    b = "bleah"
    c = ['1', '2', '3']
    return compact('a', 'b', 'c')

print foo()

And sure enough, this spits out: {'a': 'blargh', 'c': ['1', '2', '3'], 'b': 'bleah'}

I hang my head in shame for giving Django developers one more thing around which they can develop bad habits.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

As some of you may have heard, I recently lost my job at Isilon. In that great tradition, I have put up my resume. Have a look, and please comment on the content or presentation of either version:

Kenneth M. Sternberg, Senior Web and User Interface Developer and Designer.

There’s a copy for printing here.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Last year I briefly flirted with Ruby.  I once infamously described Python as “Perl for grown-ups,” and if that’s true, then Ruby is the hot 20-something firecracker chick of your programming mid-life crisis.  You’ve got nothing in common, but damn, the interfacing is great.  Maturity is getting over it and coming back to the stable reliability and maintainability of Python once more.

But if Python is the language of the mature developer, then Django is that stage of midlife where life becomes way too complicated.  (Studies indicate that 44 is “the worst year of life” for most people: kids, family, career, finances, and professional, social, and romantic obligations all pile on hard around this time, leaving most people too little time to think or plan for happiness.)

I’ve been spending the day trying to port narrator, the program suite that runs my story site, from Ruby to Python.  Ought to be easy, right?  I decided to make it harder by porting it to Django instead.  I have the data models, and they’re great for Python.  Easy to port.  But I had to make it harder on myself by trying to create containerized subsystems for my stuff: stories belong to authors, but authors might have series, series might have novels as well as stories, but there might be standalone novels, and you might have series so short and standalone that they go in your short story collection with an ident or something.  This immediately suggested to me a database trick I know called modified preorder tree traversal, a technique which allows you to store hierarchical information in a meaningful manner.  There’s even a pre-built MPTT script for Django, unmysteriously named django-mptt.

And that’s where I wandered off into the weeds.  I tried to understand how the mptt worked and how to incorporate it into the models I was building, and eventually my head started to explode.  It involved something called “generics,” a contributed library for Django’s ORM that uses metaprogramming to give one model class relationships to objects in many different models.  It’s very cool.  It’s very esoteric.  It’s very hard to understand.  The layers between implementation and concrete realization are many and intertwined.
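The core of “generics,” for the curious, looks something like this (Node and its fields are my own illustration, not django-mptt’s code):

from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Node(models.Model):
    # A generic foreign key can point at a story, a novel, or a series:
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')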

One of the differences between Rails and Django is that Django is “just” a bunch of Python libraries loosely assembled.  But Python encourages reading the source; that’s the idea, it’s supposed to be a readable, self-documenting language.  And sure enough, it is, if the problem domain is small enough.  It’s when you start to layer domain on top of domain, solution on top of solution, that the system becomes unwieldy.  The headaround for some Django applications is beyond the average programmer (and believe me, I am an average programmer).  Assaf Arkin encapsulated this idea perfectly in a recent post about object-oriented programming, quoting Travis Jensen:

“My point is that the architectural complexity of these applications inhibit a person’s understanding of the code. … Without actually doing anything, applications are becoming too complex to understand, build, and maintain.” Every layer makes perfect sense in isolation. The cracks start showing when you pile them up into a mega-architecture.

This seems to be the problem even with small Django applications. By presenting the four components of Django (the ORM, the templating language, the route/view library, and the administrative envelope) as four separate components, the Django Project has greatly increased the cognitive load on the programmer.  Django becomes harder to learn than Rails because of the extra mental effort needed to grasp all the intricacies.

There is a perfect irony here: Django is loosely coupled enough that you can do whatever you want with it and it will probably work.  But understanding what crosses that loose coupling is difficult, and when you’ve got all the layers, plus contribs and plugins going, you need  to pull out pencil and paper for even the smallest of efforts.  Rails, in contrast, is a hodge-podge of different technologies all thrown together into a pot, but because the average web developer is actively discouraged from being curious about more than what he needs to get the job done, Rails is easier to grasp from the very start.

It’s probably worth it.  The payoff is a Deep Understanding of the joys of Python metaprogramming.  Done right, I’ll probably have an even better grasp of the abstractive power of it all.  But it’s a long, slow slog.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

I’m very fond of gaffitter, a smart little console program that scans a list of files and/or directories and fumbles through them until it comes up with a subset of that list that will fit in a given space.  It’s perfect for taking large directories of stuff and segmenting them into archivable collections.

I recently ran out of disk space on my desktop and realized I had hundreds, nay thousands, of music directories that I needed to put somewhere else.  I suppose I could have bought more hard drive space, but more than that, I wanted a lot of it just to be put away.  I ran gaffitter on the collection with a limit of 4.2 GB, the reliable size of a data DVD, and it said I had about 35 collections’ worth.  Great, I thought, but how to organize the output of gaffitter into subdirectories that I could then burn onto DVDs?

For that, I wrote mass_gaffiter.py.  It’s a very simple little script that uses gaffitter’s regular output as its input, and then turns around and spits out another script (a shell script this time) that, for each collection gaffitter has identified, creates a subdirectory and moves everything in that collection into the subdirectory.  When it’s done, your cluttered directory is organized into a collection of subdirs named “gaf_disk_01”, “gaf_disk_02”, etc., all ready for growisofs or whatever other DVD burning software you like.

I’m trying to get into the habit of sharing the little utilities in life that I can’t work without.  I think of them as little throwaways, but some of them I’ve kept for years, so I figure someone else might have good use of them. Here’s mass_gaffiter.py:

#!/usr/bin/env python

import sys
import re

# Matches gaffitter's per-bin summary lines ("[N] Sum ..."), capturing
# the bin number N.
re_m = re.compile(r'^\[(\d+)\] Sum')

f = open(sys.argv[1], "r")
accum = []
for l in f:
    g = re_m.match(l)
    if not g:
        # An ordinary line: a file or directory belonging to the current
        # bin.  Strip the trailing newline and remember it.
        accum.append(l[:-1])
        continue

    # A summary line closes out a bin: emit shell commands to create the
    # bin's subdirectory and move its contents into it.
    print 'mkdir gaf_disk_%03d' % int(g.group(1))
    print 'mv %s gaf_disk_%03d' % (
        ' '.join(['"' + i + '"' for i in accum if i]),
        int(g.group(1)))
    accum = []

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

A couple of weeks ago I bought the book “The Definitive Guide to Django,” and I’ve come to realize, to my frustration, that the book is already outdated.  My big headache this week was dealing with the administration interface, which the Django people swear is one of the coolest features of the entire application server.  The problem is simple: between Django 0.96, which is when the book was written, and Django 1.0, which is what I’m running, the interface was completely changed.

In 0.96, the way you defined a database table as being “administratable” was to add a nested class, class Admin: pass, to the Python definition of the table.  The Administration app would automagically pick out those tables that were administratable, and they would appear in the admin interface.
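In other words, something like this (Book is my example, not the book’s; note the 0.96-era maxlength keyword):

class Book(models.Model):
    title = models.CharField(maxlength=100)

    class Admin:
        pass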

In 1.0, it’s completely different.  Instead, you must create in your application, next to your models, a file admin.py which contains registry lines for each model and, optionally, an administration interface class that describes how the model should be administered.  It’s all covered in the Django tutorial, which doesn’t really help you if all you have is the book.
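The 1.0 idiom looks like this (again, Book is my example):

# myapp/admin.py
from django.contrib import admin
from myapp.models import Book

class BookAdmin(admin.ModelAdmin):
    list_display = ('title',)

admin.site.register(Book, BookAdmin)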

This makes obvious sense. Administrative details are independent of model details, and although the argument could be made (and was made) that they’re implementation details of the model, making it a separate decorating class also makes just as much sense. Yes, it means that the details of a class (the administrative class of a model) are in two different places, but it also means that administrative features of your application can be restricted in deployment just by deleting the admin.py file.

You did keep a copy of admin.py in your source control, right?

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

I started using this recently.  If you do a lot of Python, you’ll sometimes find yourself desperate for breadcrumbs: little print statements scattered throughout your code as you try to figure out what you told it to do, since it’s obviously not doing what you want it to do.  I’d used inspect previously to unravel exception handlers (there’s a customized one inside the Isilon UI, so if the product fails in the field the exception will be logged to a file for later analysis), but this is a nice little routine.  It’s basically a pythonic version of the C __LINE__ macro.  Wherever you think you might need a debugging statement, put if DEBUG: print _line(), foo, bar, baz and you’ll not only get the fields you want to see, but also the name of the function/method and the line number of the print statement.

import inspect

def _line():
    # getframeinfo returns (filename, line number, function name, ...);
    # f_back is the caller's frame, i.e. wherever _line() was invoked.
    info = inspect.getframeinfo(inspect.currentframe().f_back)[0:3]
    return '[%s:%d]' % (info[2], info[1])
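A usage sketch (DEBUG, frobnicate, and its variables are placeholders):

DEBUG = True

def frobnicate(foo, bar):
    baz = foo + bar
    if DEBUG: print _line(), foo, bar, baz  # e.g. [frobnicate:6] 1 2 3
    return baz

frobnicate(1, 2)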

Use as needed.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com
