Geek tales

Dec. 4th, 2007 08:40 am
[personal profile] elfs
It was 1996 when CompuServe bought the company at which I was working, Spry, and turned us into The CompuServe Internet Division. The idea was that this Internet Thing was becoming important and might someday be as important as CompuServe's own network, and so CompuServe decided that they needed some expertise and that we had it. They wanted our advice.

CompuServe at the time had phone banks scattered in closets throughout the country that their customers used to access the central repository in Columbus, Ohio. While CompuServe did not like the Internet, they did like TCP/IP and so over that summer they moved their network to a hybrid system that would support their old protocols at the network level while also supporting IP. One of the services they used was something called RADIUS (Remote Authentication Dial In User Service), which allowed all their phone banks to do lightweight authentication of customer accounts when they dialed in.

CompuServe had reluctantly allowed Spry's 30,000 or so customers to keep their Spry accounts rather than force them to move over to the CompuServe Information Service. CompuServe at the time had ten times as many customers. Spry was ordered to use a specific RADIUS server by a specific vendor, connected to an Oracle database.

My development partner, Brad, and I worked hours trying to get this beast to work. First of all, it was an NT product that had been "ported" to Solaris (Spry's OS of choice). It leaked like a sieve, eventually crashing. It had to run as root so it would crash the whole server, not just its own process. Its communication with Oracle was flaky at best.

We had a conference with the vendor. He said, "Yeah, we know about the leak. We're not going to fix it, the Solaris product isn't a great seller. Look, just reboot the server every night. That's what the NT people do." Brad and I were aghast. "Oh, and the communications with Oracle isn't good either. Our people aren't as familiar with Solaris as they are with NT. Use the Solaris ODBC drivers from Oracle."

Under orders from bosses three layers up, we agreed to go with a restart regimen for the servers. Then we actually saw the post-evaluation bill for the ODBC drivers: $30,000 each. And we needed two: one for the primary, one for the failover.

After five weeks of this, Brad and I had had enough. We both went in one Saturday to fix the problem dead. We downloaded the reference implementation of RADIUS, a GPL'd solution. I wrote a bog-stupid forking server to watch a delivery queue and feed the requests to the Oracle database, and then take Oracle's responses and drop them into another delivery queue. Brad wrote a new back-end for the RADIUS reference server that would use my delivery queues, unpacking authentication requests from the outside world and packing up the responses. Back then, Brad was better at building robust servers, and I was a better Oracle guru.
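For anyone who wants the shape of that arrangement, here is a minimal sketch. It is not the original code: it is modern Python, it uses threads and in-process queues where the real thing was a forking server, and a plain dictionary stands in for the Oracle database. Every name in it (`ACCOUNTS`, `auth_worker`, `authenticate`, `run_demo`) is hypothetical. Only the verdict strings, `Access-Accept` and `Access-Reject`, are real RADIUS packet type names.

```python
import queue
import threading

# Hypothetical stand-in for the Oracle account table.
ACCOUNTS = {"alice": "s3cret", "bob": "hunter2"}


def auth_worker(requests, responses, accounts):
    """Back end: drain the request queue, check credentials, queue a verdict."""
    while True:
        item = requests.get()
        if item is None:  # sentinel value: shut the worker down
            break
        request_id, user, password = item
        ok = user in accounts and accounts[user] == password
        responses.put((request_id, "Access-Accept" if ok else "Access-Reject"))


def authenticate(requests, responses, request_id, user, password, timeout=2.0):
    """Front end: pack a request onto one queue, wait for the reply on the other."""
    requests.put((request_id, user, password))
    reply_id, verdict = responses.get(timeout=timeout)
    if reply_id != request_id:
        raise RuntimeError("reply does not match request")
    return verdict


def run_demo():
    """Wire the two halves together and push three requests through."""
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=auth_worker, args=(requests, responses, ACCOUNTS)
    )
    worker.start()
    results = [
        authenticate(requests, responses, 1, "alice", "s3cret"),
        authenticate(requests, responses, 2, "alice", "wrong"),
        authenticate(requests, responses, 3, "mallory", "anything"),
    ]
    requests.put(None)  # ask the worker to exit
    worker.join()
    return results
```

The point of the split is that the front end never touches the database and the back end never touches the network, so either half can be restarted or replaced without the other noticing.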

I got my boss, Tim, to agree to a switchover, and we ran it a week without a hitch. Then another, then another. Finally, we took it to his boss, Stuart, and told him what had been running the authentication server all this time. He said (in his fine Scottish accent), "That sounds just like something you two would come up with. We'll leave it. Don't tell the boys in Columbus."

It ran fine for six months without any downtime at all. And then we got a call from Columbus: their entire RADIUS "server wall" was down. For 48 hours, SpryNet (as our customer base was called) was the only part of CompuServe still running.

Brad and I were ordered to fly out to Columbus to give a speech on how we had achieved so much stability that our RADIUS server never even appeared in the bugbase. Stuart said, "You're going to have to tell them. You always say working code is better than theory." Brad and I went.

In the CompuServe buried bunker, with its halon systems and its fluorescent lighting, Brad and I showed our system: two Solaris boxes running the RADIUS servers, and two Solaris boxes running Oracle databases, with one cold spare. We explained to them the premise of the system, the implementation decisions we made, and the monitoring kit we'd put around it. We made a pitch for the open source nature of the server, explaining that we'd had to publish the queue manager because it was written directly into the RADIUS server, but showing that the tradeoff had been worth it: several contributors had sent back fixes for range checking and buffer overflows that we'd missed.

To say they were horrified is to downplay their reaction. They were an all-NT house. They were completely beholden to Microsoft for their software, and couldn't possibly consider installing Solaris. They had thousands of NT boxes; shifting to Microsoft products had been upper management's brilliant idea for modernizing CompuServe and bringing it out of its dark ages of six-bit CPUs and DEC PDP-11s. Going to Unix, an OS 20 years older than NT, would be "a step backwards."

Worse yet, we had given away CompuServe intellectual property. We had developed something in-house and just given it away without licensing agreements. I said there was a licensing agreement with the software: it was called the GPL, and agreeing to it had saved SpryNet $90,000 (the commercial RADIUS servers were $15,000, plus $30,000 for the Oracle ODBC shims, each) and gotten us some very valuable tech support besides. One of CompuServe's techs actually said, "The GPL's not a real license."

They were very proud of their "RADIUS Wall": 50 RADIUS servers fronting 50 Oracle servers, with a 10-and-10 collection of spares. They had had a catastrophic failure in one Oracle NT server and the resulting failure had corrupted the copies in all fifty; that had been their meltdown that had thrown them off the air until they could re-install enough NT instances and Oracle and restore from backup to ensure a decent quality of service. It had taken them two 24-hour days of nonstop manual labor to recover. One of the engineers said, "We have 10 times as many customers as you guys. We couldn't do it your way."

I did not point out they had 25 times as much hardware, and they still couldn't keep it going. I did not point out that they had marginally higher labor costs because of the manpower needed to maintain their server farm. I did not point out that they had spent $15,000 per server for that commercial junk RADIUS server, whereas we had gotten ours more or less for free. I did not point out that our customer base hadn't just spent 48 hours wondering when they'd get their next Internet fix.
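The figures quoted in this post can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and it assumes the "10-and-10" spares mean ten spare RADIUS boxes plus ten spare Oracle boxes, and that the cold spare counts toward the SpryNet total.

```python
# Prices quoted in the post, in US dollars.
RADIUS_LICENSE = 15_000  # commercial RADIUS server, per box
ODBC_DRIVER = 30_000     # Oracle ODBC driver, per box
SPRY_AUTH_BOXES = 2      # one primary, one failover


def commercial_cost(boxes: int = SPRY_AUTH_BOXES) -> int:
    """Cost of licensing the commercial stack on the given number of boxes."""
    return boxes * (RADIUS_LICENSE + ODBC_DRIVER)


# Box counts as described in the post (spare interpretation is an assumption).
SPRY_HARDWARE = 2 + 2 + 1               # RADIUS + Oracle + cold spare
COLUMBUS_HARDWARE = 50 + 50 + 10 + 10   # RADIUS + Oracle + spares of each


def hardware_ratio() -> float:
    """How many Columbus boxes there were for every SpryNet box."""
    return COLUMBUS_HARDWARE / SPRY_HARDWARE
```

Under those assumptions `commercial_cost()` gives 90000, matching the $90,000 savings, and `hardware_ratio()` gives 24.0, which is the "25 times as much hardware" in round numbers.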

Ultimately, CompuServe learned nothing from our experience. This became a pattern with them: they would ask our advice on something since we were their Internet experts, they would listen, and then do nothing with the advice we gave them. They had bought us for our internet cachet, not our expertise after all.

I learned a lot: about writing servers, about the undeniable value free software has to the infrastructure of the Internet, about how using the GPL can actually save a company money if used correctly and honestly, about dealing with databases, and about dealing with managers. I'm still not good at the last because I've come to understand that managers are irrational people caught between two crushing stones: the one from the top that controls the money, and the one from underneath that may bring creative destruction raining down.

Date: 2007-12-04 05:04 pm (UTC)
From: [identity profile] littleone66.livejournal.com
Oh I remember when ... Ok I don't remember specifically about the server issues, but OMIGOD do I remember that corporate would never listen to us and you noticed (after the fact) that they were quick to sell off their Internet division. I still rue the day CS bought Spry. I knew it would not be a good thing. Oh well, evolution.

Date: 2007-12-04 05:18 pm (UTC)
From: [identity profile] elfs.livejournal.com
What I still find interesting is the way they ignored us. It wasn't that they completely ignored us. It wasn't that they took our advice and did the opposite to spite us.

They sent their experts to Seattle, who would actually listen patiently to our explanations, would take down notes on our technical expertise... and then absolutely nothing would happen.

They didn't hear the cultural messages we explicitly gave them during those meetings and they thought they could "own" the Internet.

"You kids, with your autogyros and bi-focular spectacles and electrical difference engines! Confound it all!"

Date: 2007-12-04 08:15 pm (UTC)
From: [personal profile] lovingboth
Doesn't surprise me at all, but then I'm thinking of how the UK's Health Education Authority never listened to the people it was proud to wave around as being part of its advisory groups on various groups and HIV.

And what being a CI$ customer was like.

Date: 2007-12-04 05:40 pm (UTC)
From: [identity profile] mouser.livejournal.com
I only remember CIS as a customer. It was their pricing that got me.

One month; $300+ in overage charges. Next month; Netcom.

Date: 2007-12-04 06:25 pm (UTC)
From: [identity profile] antonia-tiger.livejournal.com
I remember, in the nineties, occasional references on Demon Internet, to their use of RADIUS servers.

It wouldn't surprise me if they were looking at your work, maybe even using it. A friend who worked there described their system as a work of insane genius. As customers we had fixed IP addresses with dial-up. The way the UK phone system works, they were able to set up a single phone number that connected from anywhere in the country to a single bank of modems. So the RADIUS system did authentication, and then a miracle occurred which set up IP routing to whichever modem you were connected to.

(I over-simplify: the physical modems were divided between at least two sites.)

I'd be astonished if your work didn't feed into that.

And you write such good fiction too. Is there no end to your talents?

Might have been less difficult than you think...

Date: 2007-12-04 06:51 pm (UTC)
From: [identity profile] danlyke.livejournal.com
I'm trying to remember how the Livingston Portmasters worked, but I think that if you had them set up to query a RADIUS server they could assign a PPP or SLIP address that was any address on the network the Portmaster was plugged into.

I know we used 'em that way when Chattanooga On-line was small, but that may have been when it was small enough that each phone bank talked to its own Portmaster. I left for the west coast before it had grown much bigger than that.

Not that it makes tying all of those technologies together any less impressive. These days we spend so much time working around limitations on technology that has been sold to the higher-ups that we don't get to do much really cool ad-hoc stuff.
From: [identity profile] zonereyrie.livejournal.com
Yes, RADIUS can assign an IP address and the PM will use it. But it does have to be within the routing domain for that PM or the traffic isn't going anywhere, of course. (I worked for Livingston 95-98.)

Date: 2007-12-04 07:07 pm (UTC)
From: [identity profile] elfs.livejournal.com
The only things we contributed to the RADIUS reference server were quite a few bugfixes and a handler that sent the request into a shared memory queue, and that then listened on the queue for a poke from some backend authentication server.

If your company used the reference server it's entirely possible you got our bugfixes, but the queues were pretty implementation (and Solaris) specific, and one thing we did not have to publish was our authentication server (the thing I wrote that dialogued with Oracle) since it wasn't a part of the RADIUS server (in the same way that a browser is not part of Apache).

Still, I wouldn't be surprised if someone figured out other ways of using them.

Date: 2007-12-04 09:42 pm (UTC)
From: [identity profile] doodlesthegreat.livejournal.com
The business lessons of Xerox/PARC continue to be ignored by the monkeysuits and always will.

Let InfoWorld buy you dinner

Date: 2007-12-04 10:21 pm (UTC)
From: [identity profile] ideaphile.livejournal.com
There's $25 waiting for this story here:

http://weblog.infoworld.com/offtherecord/

. png

Date: 2007-12-05 01:50 am (UTC)
From: [identity profile] gromm.livejournal.com
You wouldn't have anything to do with this section of the Wikipedia article on CIS, would you? :)

http://en.wikipedia.org/wiki/CompuServe#WOW.21_and_the_decline

Date: 2007-12-05 03:52 am (UTC)
From: [identity profile] elfs.livejournal.com
Nope. Gods save me, though, I still have a WOW! for Kids! mousepad.

Date: 2007-12-05 05:54 am (UTC)
From: [identity profile] zonereyrie.livejournal.com
That was probably the Livingston RADIUS server implementation, since Livingston created RADIUS and published our server code as the reference. :-)

And I think I know which NT RADIUS server you mean - the NT version was just 'RadiusNT' and the UNIX port was 'RadiusX'.

Date: 2007-12-05 03:48 pm (UTC)
From: [identity profile] elfs.livejournal.com
Yes, that was it, Livingston. That does sound familiar. I don't remember the NT Radius version.

Anyway, I don't see the shared memory queue anywhere. We put it on an FTP site like the law required; I guess nobody ever downloaded it.

Actually, that begs a question: if you make an in-house mod for a GPL'd product, and you put the mod on your FTP site as required by law, how long after you've stopped using the mod and have gone out of business are you obliged to keep the mod available to the public?

Date: 2007-12-05 09:56 pm (UTC)
From: [identity profile] zonereyrie.livejournal.com
That's a good question. Similarly, if you EOL a product - how long do you need to continue providing the mods? You no longer sell it or use the code, but the products are still out there.

not the point of the story, but. . .

Date: 2007-12-05 05:56 am (UTC)
From: [personal profile] walkitout
It's interesting after all these years to get some confirmation that Stuart was, at least in part, a reasonable guy. He seemed like he might be, but at the time, I wasn't interested in sticking around to find out.

Re: not the point of the story, but. . .

Date: 2007-12-05 06:26 am (UTC)
From: [identity profile] elfs.livejournal.com
The one thing I liked best about Stuart was that he knew how to get 110% out of his people. He absolutely knew, better than they did sometimes, what they were capable of achieving and when to get out of the way and let them do it.

He just never seemed to have the luck of landing a project worth actually doing. Baywatch, Cobalt, Moe's Cafe, some WebTV clone you've never heard of, some kiosk system for German rail that never went anywhere... the list of projects that were just silly and never launched was just endless at CIS. (Yes, I know, Baywatch was huge as a TV show, and the Cobalt Group is huge now, but their web properties seemed to do so much better after they got out from under CompuServe's umbrella.) Brad and I were the first people in the US to put Virtual IPs into commercial use. I wrote a moderated web-based chat room. We invented the process of exposing a database to the web in 1994, and did absolutely nothing with it. (It was a bizarrely hacked oraperl instance. I remember it had to be compiled with GCC, but linked with the SunOS linker.) If only we'd patented "A means of generating SQL queries and rendering meaningful SQL responses from a relational database via HTTP and HTML Forms." The number of things we created at CIS and failed to monetize properly seems endless.

He did give me MissingKids.org to build, which was definitely worthwhile. At the end of that year, what did I get recognition for? Baywatch. CIS had some very f*ck'd up priorities.

Now and then I meet someone working at Cobalt. I enjoy telling them that their entire business used to be a Sparc5 under my desk that I kicked once in a while.

Date: 2007-12-05 09:37 am (UTC)
From: [identity profile] ewhac.livejournal.com
So, what can you tell me about Kerberos?

I've accidentally bumped into the wonders of Kerberos at work, and have been mucking around trying to get it to work. NVIDIA is a PC-centric organization, which makes life for the Mac developers tougher. All the PCs have single sign-on via NTLM, but the Macs just flap in the breeze, with everyone typing in their credentials every five minutes.

I happened to discover that one of the LDAP tools that ships with Mac OS X will transparently authenticate via Kerberos. After futzing around for about an hour with 'kinit' and friends, I got an LDAP query through without having to enter a password.

Then I found out that Firefox can also authenticate via Kerberos, and excitedly plowed straight into a brick wall of failure. I couldn't get it to work. Oh well...

Date: 2007-12-21 08:04 pm (UTC)
From: [identity profile] elfs.livejournal.com
Not a whole heck of a lot, I'm afraid. The braincells allocated to that task have long since been repurposed to other things. (Sadly, I was in the gym the other day and Adam Ant's "Goody Two Shoes" came on the PA, and I found I could still sing along; it would be nice if those brain cells found something better to do, as well!)

WTF? Please??

Date: 2007-12-05 03:14 pm (UTC)
From: [identity profile] pendorbound.livejournal.com
You simply *must* post this to WTF (http://worsethanfailure.com/).

My co-workers should read this, but many of them are far square-er than I'd want to point at your blog. Not that there's anything wrong with your slightly off-square blog, of course. =)

Date: 2007-12-06 10:55 am (UTC)
From: [identity profile] halloranelder.livejournal.com
To me this looks like perfect fodder for the Worse Than Failure!
