Wednesday, 28 Dec 2005

Gettin’ Twisted

(9:54 am) Tags: [Software, Projects, FeedLounge]

I’ve been playing with Twisted for the past few days to see how it can help me with FeedLounge work.

It seems pretty straightforward to use, although you have to think a bit differently about the problems you are trying to solve. FeedLounge was already written to do as much as possible in an asynchronous (background) fashion, so Twisted fits well with the backend design.
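
To give a feel for the “think differently” part, here is a minimal sketch of Twisted’s Deferred style, in the Python of the era (the feed URL is made up; getPage is twisted.web.client’s one-shot HTTP helper):

    from twisted.internet import reactor
    from twisted.web.client import getPage  # returns a Deferred

    def feed_fetched(body):
        # Runs later, once the HTTP response has actually arrived.
        print "fetched %d bytes" % len(body)
        reactor.stop()

    def fetch_failed(failure):
        print "fetch failed:", failure.getErrorMessage()
        reactor.stop()

    # The call returns immediately; the reactor is free to service
    # other feeds while this request is in flight.
    getPage("http://example.com/feed.xml").addCallbacks(feed_fetched, fetch_failed)
    reactor.run()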

Side note: In playing with some of the bits of Twisted, the already excellent documentation still wasn’t enough. Googling for things would usually find the solution, but it would also find issues where the Twisted team was being “less than helpful”. When you get Ian Bicking frothed up enough to respond, you MUST be doing something wrong. Caveat coder. While this won’t stop me from using Twisted, it definitely doesn’t encourage me to participate in the community to any great extent.

The current code was written assuming there would be only one backend worker (KISS, and work your way up the complexity ladder). I was able to extend the worker to use threads so it can grow with us as we scale, but once we launch live, we will absolutely need many machines performing these backend tasks.

To be able to do this, I essentially need a task queue structure that lives outside of any worker process. Coming from the Java world, I would use JMS with a durable Queue as the task dispatcher. What to use in the Python world, though? After searching through many solutions, it seems as though people end up building their own one-off message dispatchers for this type of task. I found quite a few options in the multicast arena, but none for delivering a single message to exactly one of N clients.

I have set up ActiveMQ with a STOMP protocol adapter, and that is the task dispatcher for now. The problem with the STOMP protocol is that you subscribe, and then messages are delivered to you asynchronously, so you end up queuing messages on each worker client as well. Since different tasks take very different amounts of time to complete, round-robining messages across all connected clients fails: a worker stuck on a slow task sits on queued messages that an idle worker could be handling. So, I am using the STOMP API to send new tasks, and using the ActiveMQ servlet to take tasks from the queue one at a time, synchronously (sketched below). This way, load balancing happens automatically, as the task workers only take tasks when they are ready to work on them, and I can add more workers as the load increases.
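
A rough sketch of the two halves, again in Python of the era. The STOMP frames follow the published protocol; the host, port, queue name, and task payload are placeholders, and the servlet URL depends on how and where you deploy the ActiveMQ web app, so treat that part as illustrative:

    import socket, urllib

    NULL = "\x00"  # STOMP frames are null-terminated

    def stomp_send(host, port, queue, body):
        """Push one task onto the queue over a raw STOMP connection."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        s.sendall("CONNECT\n\n" + NULL)
        s.recv(4096)  # CONNECTED frame; contents ignored here
        s.sendall("SEND\ndestination:/queue/%s\n\n%s%s" % (queue, body, NULL))
        s.sendall("DISCONNECT\n\n" + NULL)
        s.close()

    def take_task(servlet_url, queue):
        """Synchronously take one task via the ActiveMQ REST servlet;
        returns None when the queue is empty.  The URL layout here is
        a guess -- check the docs for your ActiveMQ version."""
        body = urllib.urlopen("%s?destination=queue://%s" % (servlet_url, queue)).read()
        return body or None

    stomp_send("localhost", 61613, "feedlounge.tasks", "update-feed 42")
    print take_task("http://localhost:8080/activemq/message", "feedlounge.tasks")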

In the future, this may become a custom Twisted server with a Berkeley DB backend for some speed.

Does anyone have ideas on dispatching many messages while guaranteeing that each message gets delivered to one and only one client? In the C or Python worlds? I would have expected something like this built on top of Spread or something similar.


5 Responses to “Gettin’ Twisted”

  1. James Strachan Says:

    Note that if you are using a queue with STOMP, each message is only sent to a single STOMP client; it’s not sent to every client.

    ActiveMQ dispatches messages using some dispatch policy (such as round-robin), up to the prefetch value for each consumer. So if you want to ensure completely fair load balancing, you could configure each consumer with a prefetch value of 1, so that the broker will only send 1 message and wait for an acknowledgement before sending another to that client. This leads to very fair load balancing, though under heavy load it tends to result in increased latency (as consumers often have to wait for messages to arrive).

    e.g. http://activemq.org/I+do+not+receive+messages+in+my+second+consumer

    You can configure the prefetch size using the header “activemq.prefetchSize” (using SVN HEAD of ActiveMQ at Apache)

    More details here…

    http://activemq.org/Stomp?refresh=1
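
    For anyone following along, a SUBSCRIBE frame carrying that header would look roughly like this (the queue name is made up; client acknowledgement is what makes the broker hold back the next message until the ACK arrives):

        # sock: an already-CONNECTed STOMP socket
        sock.sendall("SUBSCRIBE\n"
                     "destination:/queue/feedlounge.tasks\n"
                     "activemq.prefetchSize:1\n"
                     "ack:client\n"
                     "\n\x00")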

  2. Scott Sanders Says:

    Thanks for the update James. So it looks as if I just need to extend the stomp python client to support sending headers, to get a pre-fetch of 1? Sounds easy enough (for me).

    Is there any way to use a prefetch of zero, and synchronously get() each item? My tasks are such that one may take 1 millisecond, and another may take 30 minutes. Having that pre-fetched item sit and wait for 30 minutes is not a great option.

    I looked at the stomp protocol doc, and it didn’t mention anything that I recognized.

  3. James Strachan Says:

    We could introduce a new STOMP verb to PULL messages on demand; when a client is ready to process a message, it PULLs it, processes it, and sends an ACK back when it’s done. With a prefetch of zero we need some STOMP verb the client can use, so that the broker knows when to send it a message (if one is available).
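
    Purely hypothetical, since no such verb exists in STOMP today, but on the wire the exchange might look something like:

        # Hypothetical -- PULL is NOT a real STOMP verb.  A worker
        # would ask for exactly one message only when it is ready:
        sock.sendall("PULL\n"
                     "destination:/queue/feedlounge.tasks\n"
                     "\n\x00")
        # ...the broker replies with a single MESSAGE frame (or nothing
        # until one is available); the worker processes it, then:
        sock.sendall("ACK\n"
                     "message-id:<id from the MESSAGE frame>\n"
                     "\n\x00")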

  4. James Strachan Says:

    BTW, unless you have clients which mysteriously block or do strange things, a prefetch of 1 should be fine.

  5. Scott Sanders Says:

    I would personally prefer the PULL verb, as it makes much more sense in my task engine. With that said, I now have to find the time to scratch the itch :)

    Thanks for the input James, and good luck with incubation.