Friday
30
Dec 2005

Ignore a file in Subversion (svn ignore)

(11:13 pm) Tags: [Software, How do I...]

svn propset svn:ignore "*.pyc" dirname

or

svn propedit svn:ignore dirname
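A minimal sketch of scripting the same thing from Python, assuming the svn command-line client is on the PATH and dirname is inside a working copy. The function names are made up for illustration. Passing the arguments as a list means no shell is involved, so *.pyc reaches svn as a literal pattern.

```python
import subprocess


def svn_ignore_command(patterns, dirname):
    """Build the argv for `svn propset svn:ignore` (hypothetical helper).

    svn:ignore holds one pattern per line, so several patterns are
    joined with newlines into a single property value.
    """
    value = "\n".join(patterns)
    return ["svn", "propset", "svn:ignore", value, dirname]


def set_svn_ignore(patterns, dirname):
    # Runs svn for real; needs the svn client and a working copy.
    subprocess.check_call(svn_ignore_command(patterns, dirname))


# Only builds the command here, so this runs without svn installed.
print(svn_ignore_command(["*.pyc", "*.pyo"], "dirname"))
```

Keep in mind that propset replaces whatever value the property already had, so list every pattern you want ignored.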

Popularity: 92%

Comments: (0)

Wiki with offline support?

(5:57 pm) Tags: [Software, Business Ideas]

Is there such a beast? Google seems to hear people talking about trying to find one, but I don’t see any to play with.

Perhaps something like subwiki could be made to just make local edits to checked out files, and commit occasionally? This would allow N instances of subwiki on the same content, but located on different machines (my laptop, for instance). You could utilize svn’s merge ability to make it all work together.

Just a random thought.

Maybe this is a project for someone…

Popularity: 44%

Comments: (3)
Wednesday
28
Dec 2005

Gettin’ Twisted

(9:54 am) Tags: [Software, Projects, FeedLounge]

Playing with Twisted for the past few days to see how it can help me with FeedLounge work.

Seems pretty straightforward to use, although you have to think a bit differently about the problems that you are trying to solve. FeedLounge was already written in such a way to do as much as possible in an asynchronous (background) fashion, so Twisted fits well with the backend design.

Side note: In playing with some of the bits of Twisted, the already excellent documentation still wasn’t enough. Googling for things would usually find the solution, but it would also find issues where the Twisted team was being “less than helpful”. When you get Ian Bicking frothed up enough to respond, you MUST be doing something wrong. Caveat coder. While this won’t stop me from using Twisted, it definitely doesn’t encourage me to participate in the community to any great extent.

The current code was written on the assumption that there would only be one backend worker (KISS, and work your way up the complexity ladder). I was able to extend the worker to use threads to grow with us as we scaled, but once we launch live, we will absolutely need many machines performing these backend tasks.

To be able to do this, I essentially need a task queue structure that is outside of any worker process. Coming from the Java world, I would use JMS with a durable Queue as the task dispatcher. What to use in the Python world, though? After searching for many solutions, it seems as though people end up building their own one-off message dispatchers for this type of task. I found quite a few options in the multicast arena, but none that deliver a single message to exactly one of N clients.

I have set up ActiveMQ, with a STOMP protocol adapter, and that is the task dispatcher for now. The problem with the STOMP protocol is that you subscribe, and then messages are delivered to you asynchronously, so you end up queuing the messages on each work client as well. Since different tasks take different amounts of time to complete, you have just failed because you are round-robining the message to all connected clients. So, I am using the STOMP API to send new tasks, and using the ActiveMQ servlet to take tasks from the queue, one at a time, synchronously. This way, load balancing will automatically happen, as the task workers only take tasks as they can work on them, and I can add more workers as the load increases.
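The pull model described above can be sketched in a single process with Python's standard library. This is only an illustration, not the ActiveMQ setup: queue.Queue stands in for the external broker, threads stand in for worker machines, and every name here is hypothetical.

```python
import queue
import threading
import time


def worker(name, tasks, results, results_lock):
    """Pull one task at a time, so a slow task only ties up this worker."""
    while True:
        task = tasks.get()            # blocks until a task is available
        if task is None:              # sentinel: no more work for this worker
            tasks.task_done()
            return
        task_id, seconds = task
        time.sleep(seconds)           # simulate work of varying length
        with results_lock:
            results.append((task_id, name))
        tasks.task_done()


def run(durations, n_workers=3):
    """Dispatch one task per duration and return (task_id, worker) pairs."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker,
                         args=("worker-%d" % i, tasks, results, results_lock))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for task_id, seconds in enumerate(durations):
        tasks.put((task_id, seconds))
    for _ in threads:
        tasks.put(None)               # one sentinel per worker
    tasks.join()                      # wait until every item is marked done
    for t in threads:
        t.join()
    return results


# One slow task and several quick ones: the quick ones are picked up by
# whichever workers are free instead of waiting behind the slow one.
print(sorted(run([0.05, 0.01, 0.01, 0.01, 0.01, 0.01])))
```

Because each worker asks for the next task only when it is idle, every task is handed to exactly one worker, and adding workers is just a matter of starting more of them against the same queue.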

In the future, it may be a custom Twisted server, with a Berkeley DB backend for some speed.

Does anyone have any ideas on sending many messages, making sure that each message gets delivered to one and only one client? In the C or Python worlds? I would have expected something like this built on top of Spread or something similar.

Popularity: 48%

Comments: (5)
Monday
26
Dec 2005

CentOS 4.2 steps after install

(9:38 pm) Tags: [Software, How do I..., Sysadmin]

Popularity: 45%

Comments: (2)
Saturday
24
Dec 2005

This post brought to you by the letter ‘T’ and the number ‘1’

(12:54 am) Tags: [Rants]

(Sat on this post for a while, but I still feel the same, even after sleeping on it)

Couldn’t stand my excellently non-responsive wireless ISP, GV.NET. They are so over-subscribed in my area that during the day, my throughput is about 20Kb/sec. I should just dial up at that point (at least my Bluetooth Treo 650 gives me 34Kb/sec)!

Since GV.NET never responds when I call them for outages, or any other reason, I decided to not take the abuse of a monopoly anymore (monopoly to me in the cheap broadband area). Since I cannot get either DSL or cable modem, and satellite is NOT a choice for ssh sessions (with its latency), I ordered a T-1 line, and it was installed last Friday. Although I only get 1.5Mb/s (symmetrical, of course), my ping times are 10 times better than the wireless.

Is it expensive? Yes.
What is it worth to me? Approaching a body part.
What would I change if I had it to do over again? Get it as soon as I moved.
Is it worth the cost to not live in traffic hell? More than 10 times!

And in case you missed it, if you are living in the Grass Valley area, and have the ability to purchase GV.NET wireless internet service, don’t do it! You have been warned. Until they can actually return phone calls, and tell you their policies up front (nothing like a $250 bill for downloading some ISOs one day), they are not a valid choice.

Popularity: 34%

Comments: (1)
Friday
23
Dec 2005

Upgrading a machine from CentOS 3.4 to CentOS 4.2?

(3:16 pm) Tags: [Software, Rants, Sysadmin]

Just a short piece to say: don’t do it!

I just re-imaged a machine after fighting for 3 days to do it. I know you can upgrade from 3.4 to 4.0 with a bit of a windy road, but it is impossible to go straight from 3.4 to 4.2.

I recommend (in valid order for my situation):

  1. Start over with a fresh 4.2 image
  2. Go from 3.4 to 4.0, then 4.0 to 4.2
  3. Stay in the 3 range, until you can do the first recommendation

Just my 2 cents.

Popularity: 40%

Comments: (3)

Updates to CentOS 4.2 for my environment

(8:50 am) Tags: [Software, Sysadmin]

Popularity: 35%

Comments: (0)
Thursday
22
Dec 2005

DIY and NIH

(12:11 am) Tags: [Software]

Found Eugene’s blog today, and have to completely agree with his post: DIY and NIH Syndromes.

Particularly:

In most cases DIY and NIH are not justified. If you see something like that you may assume that developers didn’t do their homework, which is not acceptable in the Internet era. In many cases this ignorance is combined with aggressive protection of irrational decisions.

and:

In my opinion the most productive contribution to software industry is to select one of high quality libraries and improve it. What about a talented lone coder? Even a genius will reach stars faster standing on shoulders of giants.

Right on, and oh yeah, subscribed.

Popularity: 27%

Comments: (0)

Upgrading Emacs to OS X 10.4

(12:11 am) Tags: [Software, How do I...]

I gave up on the code compiling ever again (tried for 3 months with latest from CVS), and just downloaded the version from the Apple website here.

Popularity: 34%

Comments: (0)
Wednesday
21
Dec 2005

Eclipse commands respect

(4:34 pm) Tags: [Quotes]

Talking in IM with someone who is now hacking on and in Eclipse full time:

me: what’s happenin’?
him: I’m neck-deep in Eclipse guts. You?
me: How much do you love Eclipse now? ;)
him: I wouldn’t say love, but I definitely respect it. Kind of like a bear or lion. :)

Popularity: 23%

Comments: (0)

L1/L2 Cache object for django

(11:50 am) Tags: [Software, Projects]

I started hacking on an L1/L2 cache implementation for django, and since I won’t be able to finish it anytime soon, I am posting it here for someone else to pick up and use.

If you are using a backing cache such as _FileCache or _DBCache, it can be a mite slow for frequently cached items. So sometimes a _LocMemCache is more your style. But you still want to share the cached data between multiple systems. Now you can have your cake and eat it too!

I tested with _LocMemCache as the L1, and both _MemCache and _FileCache as the backend. That way you could cache 500 items locally, and as many as you can handle remotely. I use it for frequently used configuration data right now, and it speeds up the logic quite well.

What is left to do? Well, if you wanted to use it in django, you would need to register it in the cache infrastructure, add a parser to create an L1 and L2 instance, etc. If anyone wants to put it into django, you are more than welcome.

# RWLock is assumed to be the reader/writer lock shipped with Django.
from django.utils.synch import RWLock

class L1L2Cache:
    "Thread-safe L1/L2 cache."
    def __init__(self, l1, l2):
        self._l1 = l1  # fast local cache, e.g. _LocMemCache
        self._l2 = l2  # shared backing cache, e.g. _MemCache or _FileCache
        self._lock = RWLock()

    def get(self, key, default=None):
        self._lock.reader_enters()
        try:
            # Look up without a default so that None reliably means "miss".
            result = self._l1.get(key)
            if result is None:
                result = self._l2.get(key)
                if result is None:
                    return default
                # Promote the L2 hit into L1 for the next lookup.
                self._l1.set(key, result, None)
            return result
        finally:
            self._lock.reader_leaves()

    def set(self, key, value, timeout=None):
        # Write through to both levels so they stay in step.
        self._lock.writer_enters()
        try:
            self._l1.set(key, value, timeout)
            self._l2.set(key, value, timeout)
        finally:
            self._lock.writer_leaves()

    def delete(self, key):
        self._lock.writer_enters()
        try:
            self._l1.delete(key)
            self._l2.delete(key)
        finally:
            self._lock.writer_leaves()

    def get_many(self, keys):
        # Only keys that are actually cached end up in the result.
        d = {}
        for k in keys:
            val = self.get(k)
            if val is not None:
                d[k] = val
        return d

    def has_key(self, key):
        return self.get(key) is not None

Update: You just have to love WordPress for escaping my quotes that don’t need it. Thanks WordPress!
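To see the L1/L2 behaviour without a Django install, here is a self-contained sketch under some loud assumptions: DictCache is a made-up dict-backed stand-in for the real cache backends, TwoLevelCache is a cut-down version of the class above, and a plain threading.Lock replaces the reader/writer lock.

```python
import threading


class DictCache:
    """Hypothetical stand-in for a cache backend with get/set/delete."""

    def __init__(self):
        self._data = {}
        self.reads = 0                  # counts lookups, for the demo only

    def get(self, key, default=None):
        self.reads += 1
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value         # timeout is ignored in this sketch

    def delete(self, key):
        self._data.pop(key, None)


class TwoLevelCache:
    """Read-through L1/L2 cache guarded by a plain mutex."""

    def __init__(self, l1, l2):
        self._l1, self._l2 = l1, l2
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            value = self._l1.get(key)
            if value is None:
                value = self._l2.get(key)
                if value is None:
                    return default
                self._l1.set(key, value)    # promote the L2 hit into L1
            return value

    def set(self, key, value, timeout=None):
        with self._lock:
            self._l1.set(key, value, timeout)
            self._l2.set(key, value, timeout)


l1, l2 = DictCache(), DictCache()
cache = TwoLevelCache(l1, l2)
l2.set("site_name", "FeedLounge")       # present in the shared cache only
print(cache.get("site_name"))           # L1 miss, L2 hit, value promoted
print(cache.get("site_name"))           # now answered by L1 alone
print("L2 lookups:", l2.reads)
```

The second lookup never touches the backing cache, which is the whole point for frequently read configuration data.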

Popularity: 29%

Comments: (0)
Tuesday
20
Dec 2005

And Another (MySQL to Postgres gotcha)

(10:01 pm) Tags: [Software, Rants, Projects]

Say you are writing some Python code, code that used to talk to MySQL as a backend. Say you are using autocommit, because you believe transactions are for academic weenies.

Now say you want to execute a set of updates to the database. And you don’t know which ones have been executed before. In Python/MySQL land, you just execute each one in turn with a simple try/except, so you can continue past failures such as duplicate keys, etc.

Now, suppose that you convert to Postgres as your backend. Now, you catch the exception, and continue on. The very next statement that you execute gives you this:

ProgrammingError: current transaction is aborted, commands ignored until end of transaction block

Now what do you do? Well, Google tells me the problem is not unique to Python, and sent some poor Ruby-on-Rails dev to the madhouse. He sent himself there because this error comes on the SQL statement AFTER the errored statement. I tried just close()ing the cursor, and creating a new one. No dice. Hmm….

Well, the solution for those dying to know is to rollback() at the connection level, and then get yourself a new cursor() to continue on. Apparently this is a Postgres thing, not a driver or a Python thing, and I must say that it is brain dead STUPID. There, I said it. Why in the name of all that is good and holy am I required to hand rollback() an autocommit transaction? That has to be one of the stupidest things I have ever heard of.

Do I need some sort of magic ‘autorollback’ transaction? Pixie dust? What? I always assumed (perhaps totally incorrectly, but almost every other database on the planet is on my side here) that autocommit meant each statement was totally separate, and NOT dependent upon the success/failure state of the previous? Am I wrong?

Anyhoo, if you are using Postgres and autocommit, remember to rollback on any exception. As silly as it sounds, it does indeed work. This concludes this evening’s rant.
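A minimal sketch of the rollback-on-exception pattern, assuming any DB-API 2.0 connection. The standard-library sqlite3 module stands in for a Postgres driver so the example runs anywhere; sqlite3 does not raise the "current transaction is aborted" error itself. The apply_updates function and the feeds table are hypothetical names.

```python
import sqlite3


def apply_updates(conn, statements, db_error):
    """Run each statement in turn, rolling back after any failure.

    db_error is the driver's base exception class (sqlite3.Error here).
    Returns two lists: statements that applied and statements that failed.
    """
    applied, skipped = [], []
    for sql in statements:
        cur = conn.cursor()             # fresh cursor for every statement
        try:
            cur.execute(sql)
            conn.commit()               # keep each success on its own
            applied.append(sql)
        except db_error:
            conn.rollback()             # clear the failed transaction
            skipped.append(sql)
        finally:
            cur.close()
    return applied, skipped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feeds (id INTEGER PRIMARY KEY, url TEXT)")
conn.commit()

statements = [
    "INSERT INTO feeds (id, url) VALUES (1, 'http://example.com/a')",
    "INSERT INTO feeds (id, url) VALUES (1, 'http://example.com/dup')",  # duplicate key
    "INSERT INTO feeds (id, url) VALUES (2, 'http://example.com/b')",
]
applied, skipped = apply_updates(conn, statements, sqlite3.Error)
print(len(applied), "applied,", len(skipped), "skipped")
```

Committing after each success matters here: otherwise the rollback for a later failure would also throw away the earlier statements that worked.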

Popularity: 30%

Comments: (0)
Monday
19
Dec 2005

Treo 650 contact search is nice

(10:48 am) Tags: [Why I like...]

In using my Treo 650, I always ended up typing in either the first or the last name, and inevitably scrolling through all the Sanders in my contact list to find one. I just found a shortcut that makes that easier. The Treo 650 can search by first initial, then last name. So, to search for Scott Sanders, one has only to type ssa on the phone screen to find me in the list quickly. Found this last week, and have been using it quite heavily.

Popularity: 19%

Comments: (4)
Sunday
18
Dec 2005

Another MySQL to Postgres gotcha

(6:58 pm) Tags: [Software, Projects]

Keep in mind that if you are switching from MySQL to Postgres, Postgres is much more strict about what a valid SQL statement is. You CANNOT have an ORDER BY without the field existing in the query, whereas MySQL will just ignore it and continue.

Popularity: 89%

Comments: (4)

Switched to CentOS 4.2 as primary development desktop

(1:04 pm) Tags: [Software, Sysadmin]

Tired of the constant nagging of the Anti-Virus, and MS Excel locking up on virus scan, as well as VMWare disabling sound entirely, and finally generally poor performance of the default Windows install on Dell boxes, I decided to switch to Linux as my primary environment for about the 8th time in my life.

I decided to go with CentOS 4.2, as CentOS is what I have installed on various servers that I admin. I backed up all the important data off the Windows box, and then started the adventure. First, download the DVD image from a torrent, which necessitates a BitTorrent client. Install the client, start the download, work on something else while it takes 6 hours to download. Burn the DVD, drop it into the PC, and reboot.

Try the graphical installer, but watch it fail from either the ATI X600 or the Dell 2405FPW. Try the text installer, installing everything, and reboot. No dice. Seems my RAID controller isn’t GRUB happy. So toss the RAID0 array, and re-install. Finally get it to boot, but into X, which is not working.

Reboot ‘linux single’ this time, editing /etc/inittab to start in runlevel 3. Then download the ATI drivers (never buy ATI if you want to use Linux out of the box), and try about 6 different ways to get it to work. Finally just hack the x config to tell it to use the native resolution, and finally everything seems to be working.

Now, just setting up my dev environment (svn, emacs, etc).

Linux is SO not ready for the desktop, at least as long as Dell is the number one vendor, and they ship mostly ATI, and ATI doesn’t open up their drivers. Holy Crap!

And yes, Steve, I did consider Ubuntu, but it is still downloading from the torrent, so CentOS won on sheer download speed. I am actually fairly impressed now that everything is working. The fonts don’t look bad, tabbed terminals, full firewall :)

More updates later, as I settle in.

Popularity: 25%

Comments: (11)
Monday
5
Dec 2005

Scalability is not the only concern

(8:13 pm) Tags: [Software, FeedLounge]

UPDATE: Jeremy responds:

I guess the low-down is that if you’re a company that provides a service, you either need to be ready to scale or you need to be ready to limit access to your service. Users shouldn’t suffer. But if they do, at least communicate. Thankfully, that’s something FeedLounge does really, really, really well.

Agreed, agreed, agreed, agreed, thanks.

I guess they’re made of different stuff than most of the companies I deal with. Best of luck to you guys, and sorry my rant put you in the crosshairs :)

Alex and I are very heavily customer focused. We want to keep it that way.

ORIGINAL POST:

In his post entitled Web 2.0 Companies NEED To Scale, Jeremy Wright makes a few good points, and a few bad ones. I am glad he chose FeedLounge as an example, as it gives me more than enough reason to respond. :) The points he attempts to make are mostly valid, but not always applicable to the small, bootstrapped player.

I’m not sure when building a scaleable web app became optional. But Feedster, Technorati, Delicious, Google Analytics (and numerous other Google apps of late), BlogPulse and many of the other “big apps” have “suddenly” been hit by scaleability issues.

First, building anything is optional. Building an app, building a web app, building a scalable web app. All optional. You don’t need to do any one of the list. Even when you choose to build a web app, you will pick a target to scale to. FeedLounge chose 2 users as the initial ‘scale to’ numbers, seeing if we could build enough functionality and a great user experience. We then released it to a few friends to see if they liked what we built. They did. When we were hit ‘suddenly’ by scalability issues, we knew it would happen sometime and dealt with it accordingly.

He says:

Yeah. Here’s their process:
1. Start with a handful of users. This is too much for ded box.
2. Move to dedicated server.
3. Add a few more users til they’re at 100. This is too much for one box.
4. Add more hardware. It’s obvious this isn’t enough.
5. Recode.

Erm… Hello? Should the recoding have happened after step 1? I mean, if you draw a graph of “okay if we use 10% of a CPU with 10 users, with 100,000 users we’ll need 10K CPU’s” … Something’s wrong.

The FeedLounge development process was more along the lines of:

  1. Build a webapp, see if the features are compelling to a set of users, keeping a design in mind that is capable of scaling
  2. Overrun the shared server that you are using, switch to dedicated server, so you can properly measure the effects of the application.
  3. Add more users, adding requested features from the users, measuring the load in a fixed, known environment, and start work on “Distributed” part of ladder. This is where the build portion of the scalability starts.
  4. Now that you believe you have something that has value, invest in the hardware and software development necessary to scale. Continue working on priority based tasks towards release of your product.

The design of the application allows scalability/availability to be added as time and money allow. The ‘recoding’ has happened every step along the way. The focus was not and has not been on scalability. It has been on whether we can provide value to our user base. If we were to focus on what you deem important in this article from day one, a lot of people would be able to look at a horrible application, and no one would use it for any significant amount of time. Perhaps you think that FeedLounge has infinite pockets to dip into for hardware infrastructure and development talent? Hint: We don’t.

We are on step 4, and it is going slowly since our team is so small. It was much more important to show what user interaction we could build, and then worry about total performance and scalability afterwards.

The business model also comes into play. If we were to choose to sell a software product instead of a service, the software we have will work fine for hundreds of users, no problems. Since we have finally chosen to go out as a web service, scalability to many thousands of users is a very important requirement, but only now.

Nothing is wrong. It is all choices that you make trying to start a company. Alex and I chose to focus on the user experience instead of scalability, and now we know that we have to scale. I knew going into this venture that it would be a huge amount of data to move. I do have a great number of years’ experience making fast things faster in the software world. You mention that it “astounds you” as to what people define as scalable and available. No one has ever used those two words to define FeedLounge. Nor will they, until we have proven that we can. Ask any of our alpha users. They don’t stick around for the availability, they stick around for the features that they have been given. And yes, scalability is a feature, but not one that should be the major focus in incubation.

Maybe I’m just spoiled, having worked in high performance, high availability apps before, but it constantly astounds me what some folk consider “scaleable” and “available” applications.

Scaling of resources and time is also important in the real world. Expecting Alex and Scott to scale to the level of Google and Yahoo! (or even VC funded companies like Feedster and Technorati) is just silly. Once you look at what we have done with the resources given, I think your tone will be quite different.

At FeedLounge, we are taking a realistic reactive approach to optimization, versus a predictive, all-encompassing approach. We designed a platform that we knew we could scale in a distributed environment, identified the areas that needed to be refactored to scale, validated those with measurements, and then wrote the code to make it a reality. Remember, premature optimization is the root of all evil.

Popularity: 26%

Comments: (7)

FeedLounge Beta Date Announced

(9:39 am) Tags: [FeedLounge]

Alex and I have officially announced the beta release date of FeedLounge. Read it here.

It will be a huge flurry of activity in a push to the release. More code, more infrastructure, less sleep :)

Popularity: 19%

Comments: (0)