<p>thefoundation, http://www.thefoundation.de. (c) 2010 Michael Kurze, Aachen, Germany.</p>
<h1>Sites for Mozilla Input</h1>
<p>Michael Kurze, 2010-08-10</p>
<p>As a side project during my internship at Mozilla, I <a href="http://aakash.doesthings.com/2010/08/10/firefox-input-1-6-2-is-released-more-malory/">worked with Aakash</a> from Mozilla QA to bring <a href="http://input.mozilla.com/sites" title="Input Dashboard: Sites">a new feature</a> to the Mozilla Input website.</p>
<p> Oftentimes when users have trouble with a Firefox beta, there is not actually a bug in the beta, but a problem with a specific website (such as broken <a href="http://www.anybrowser.org/campaign/" title="Good old anybrowser website, unfortunately still an issue">user agent detection</a>). Even when a problem is related to Firefox, it can be very helpful for QA to see which sites trouble our users the most, and what issues the users face there. </p>
<h3>Enter clustering…</h3>
<p> To group sentiment by topic, my fellow metrics intern Andres and I made use of Dave Dash’s <a href="http://github.com/davedash/textcluster" title="Textcluster on github">clustering algorithm</a>, which uses techniques from the search engine world to group related input. That helps to get a quick impression of what’s going on when a site is causing trouble for many users. We also get a lot of positive feedback on sites where the user experience has improved for beta users compared to the release version. </p>
<h3>…and Django of course!</h3>
<p> It was very cool to do something with Django again. The webdev team is very knowledgeable in this area, so I learned a lot working with <a href="http://fredericiana.com/" title="Fred Wenzel’s blog">Fred</a> and <a href="http://davedash.com/" title="Dave Dash’s site">Dave</a>. 
There are some limitations (you <em>still</em> <a href="http://blog.affien.com/archives/2009/05/30/django-annoyances-no-reverse-select_related/" title="Django annoyances — no reverse select related">cannot prefetch related objects</a> along the inverse edge of a one-to-many relationship, as any sensible ORM would let you), but other than that Django has become a pretty solid toolkit. Also, I finally got started with Git, which is as of now my version control system of choice. </p>
<p> Hopefully my main project will allow me time to improve Input and the dashboard in the future; there’s a lot of cool stuff planned for it. </p>
<h1>And Apos Semicolon: A Cathapostrophe</h1>
<p>Michael Kurze, 2010-03-25</p>
<p>This morning on Facebook syndication, I reviewed the <a href="http://www.thefoundation.de/michael/2010/mar/24/thoughts-on-android-platform/" title="Thoughts on the Android">article on Android</a> that I wrote yesterday. And one of the few HTML-incompatible XHTML properties assaulted my eyes, in the form of a bunch of entity references.</p>
<p>Specifically, I had escaped the <em>typewriter apostrophe (&#x0027;)</em> using named entity reference syntax (&amp;apos;). Unfortunately, I had forgotten that &#x2014; while this entity is defined by XHTML 1.0 &#x2014; it is actually illegal in plain ol&#x2019; HTML. This should not have been a problem, as these pages are served using the XHTML 1.0 doctype where &amp;apos; points to the Unicode code point 0x27, so that you can use single quotes to delimit attributes. </p>
<p>The Django RSS framework however would put a plain &quot;html&quot; content-type into the Atom feed, so the references to the apostrophe remained unresolved when the feed readers converted my contents for display. 
Instead, they correctly escaped the ampersand, which led to a lot of ugly entity references on my Facebook feed.</p>
<p>So for now I am going to reference the apostrophe using the numeric character reference &amp;#x2019; (<em>punctuation apostrophe: &#x2019;</em>) which is actually recommended over the ASCII-compatible &amp;#x0027; (<em>typewriter apostrophe: &#x0027;</em>). Strictly speaking, I would not even need to use any entityref here, as the character at 0x2019 has no special meaning in XML syntax. Next I need to figure out if there is a way to configure the <a href="http://docs.djangoproject.com/en/dev/ref/contrib/syndication/" title="The Django syndication feeds framework">Django feeds framework</a> to use XHTML as a content type for Atom feeds and to check if the results are real-world-compatible.</p>
<p>But really, this just shows once more that it is absolutely inhumane to edit XHTML by hand. So I&#x2019;ll be looking for a suitable <a href="http://en.wikipedia.org/wiki/WYSIWYM#In_web_environments" title="What you see is what you mean">WYSIWYM</a> editor to maybe handle this stuff.</p>
<h1>Nobody Expects The Production Problems</h1>
<p>Michael Kurze, 2008-09-21</p>
<p>Sometimes you need to track down problems in a production setup. This article describes how the combination of Django and flup makes this difficult, and why I think that both should provide more than <q>log by mail</q>.</p>
<h2>Our Setup</h2>
<p class="annotation right"><q>Our chief weapon is development!</q> <cite><a href="http://people.csail.mit.edu/paulfitz/spanish/script.html" title="The Spanish Inquisition">&mdash; Cardinal Ximinez of Spain</a></cite> </p>
<p> To the developer, the <a href="http://www.djangoproject.org" title="Django Project Site">Django</a> web development framework offers a set of debugging helpers such as the verbose error page with interactive traceback and the management commands. 
The automatic reloading of Python modules allows you to instantaneously see changes to your application, without even losing running sessions. If that is not enough for you, you might be interested in the powerful <a href="http://code.google.com/p/django-command-extensions/" title="Django Command Extensions, Project Site">Django Command Extensions</a>. The <a href="http://rob.cogit8.org/blog/2008/Sep/19/introducing-django-debug-toolbar/" title="Rob Hudson: Introducing the Django Debug Toolbar">Django Debug Toolbar</a> also looks promising, though still rather unfinished. I might come back to that another time. </p>
<h2>The Other Setup</h2>
<p class="annotation right"> <q>... and production! Our two weapons are development and production!</q> </p>
<p> Web applications written in Django are also highly portable: Development can take place on a different <abbr title="Operating System">OS</abbr> than the one production runs on, using a different version of Python and different database and HTTP systems. With respect to debugging, these differences are evened out fairly well by the abstraction layers of the Python and Django stack. Once you have sorted out the dos and don&#x2019;ts of your <abbr title="Relational Database Management System">RDBMS</abbr>, there is the decision on how to handle HTTP requests. Django <a title="Deploying Django | Django Documentation" href="http://docs.djangoproject.com/en/dev/howto/deployment/">recommends</a> using the Apache HTTP Server and <tt>mod_python</tt>. Another performant <a href="http://docs.djangoproject.com/en/dev/howto/deployment/fastcgi/" title="How to use Django with FastCGI, SCGI or AJP | Django Documentation">setup</a> that seems to be quite common is using a <a href="http://nginx.net/" title="nginx">modern</a> FastCGI-capable web server in combination with the <a href="http://trac.saddi.com/flup" title="flup Project">flup</a> FCGI/WSGI bridge. This is what I did. 
</p>
<h4>Production: Turn Off the Debug Switch!</h4>
<p> When preparing your production setup, you will want to look into your <code>settings.py</code>. Actually, you will probably have multiple settings files of different names in revision control. Then, you will create a symlink to the settings matching the current setup. In your production settings, you will turn off debugging. This replaces the verbose traceback pages with neat-looking <abbr title="Internal Server Error">500</abbr> views which will &mdash; hopefully &mdash; never see the light of day! And otherwise, you will receive a nice error report somehow, right? </p>
<p> Of course you will probably encounter a production error in your site, be it in a bunch of downloaded third-party applications or in your own magnificent creation. In my case, I soon had <a href="http://joseph.randomnetworks.com/archives/2005/08/05/postgresql-index-limitation-index-row-size-xxxxx-exceeds-btree-maximum-2713/" title="PostgreSQL Index Limitation">problems with an index</a> on a text column. </p>
<h2>The Third Setup</h2>
<p class="annotation right"> <q>... and ruthless stage testing! Our <em>three</em> weapons are development, production, and ruthless stage testing.</q> </p>
<p> To catch such problems before they occur in your production setup, it is highly desirable to use a <em>stage setup</em>. This setup has its own database, usually a copy of your production database, updated on demand. The other settings should mirror your production setup. Run your staging application on the production machine or &mdash; if you can afford it &mdash; on a separate identical system and make it accessible to selected testers only. For example, you might make it listen on the local IP only and ssh-tunnel there from your development machine. On this machine, you can turn on debugging whenever needed. </p>
<h3>Problem One: Logging by E-mail</h3>
<p> Of course, there might still be problems that are discovered in your production setup. 
Unfortunately, out of the box Django reports errors only <a href="http://docs.djangoproject.com/en/dev/howto/error-reporting/" title="Error reporting via e-mail | Django Documentation">via e-mail</a>. To me, this has some critical drawbacks: </p>
<ul class="block">
<li>You do not know if the error reporting works unless a problem occurs. In this case, you will only know that error reporting works <em>if it works</em>. Avoid this problem by enabling <abbr title="Resource not found">404</abbr> reporting and then testing an invalid URL!</li>
<li>It is <em>not secure</em>! To my knowledge there is no PGP support for the error mails sent by Django. The messages might contain sensitive data that should not be sent unencrypted.</li>
<li>If your production setup does not work right away, you will need debugging information in fast iterations. Obtaining such information by mail is inconvenient, as it is tiresome to match a browser click to an e-mail that you might receive minutes later.</li>
<li>Your machine might not be allowed to send mails! Many <abbr title="System Operators">sysops</abbr> put firewall rules in place to prevent machines in their domain from becoming spam robots. This is what I did on my production machine, so fortunately I knew in advance that debugging by mail would not work. I am not sure if every Django admin is aware of this potential problem.</li>
<li>If you host a high-traffic site and an error occurs (for example if your database does not respond due to slashdotting effects or denial of service attacks), the volume of e-mail it generates might be so high that the error reporting competes with your application for IP connections, worsening your problems.</li>
</ul>
<h4>Solution: Exception-Logging Middleware</h4>
<p> For my basic needs, I have written a simple <a href="/files/michael/python-modules/exception_handling.py" title="Exception Logging middleware">middleware</a> that should handle any exceptions raised by views in your production setup. 
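</p>
<p>As an illustration of the idea only (this is not the linked file), a minimal sketch of such a middleware might look like the following. It assumes the old-style Django middleware hook <code>process_exception(request, exception)</code>, formats the traceback with Python's standard <code>traceback</code> module, and takes the log file and level as constructor arguments where a real project would read the custom <tt>LOG_FILE</tt> and <tt>LOG_LEVEL</tt> settings. The class name and the logger name are hypothetical.</p>

```python
import logging
import traceback


class ExceptionLoggingMiddleware(object):
    """Hypothetical sketch: write unhandled view exceptions to a log file."""

    def __init__(self, log_file="exceptions.log", log_level=logging.ERROR):
        # Assumption: a real project would pass settings.LOG_FILE and
        # settings.LOG_LEVEL (both custom settings) instead of these defaults.
        self.logger = logging.getLogger("exception_middleware.%s" % log_file)
        self.logger.setLevel(log_level)
        self.logger.propagate = False
        if not self.logger.handlers:
            handler = logging.FileHandler(log_file)
            handler.setFormatter(
                logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
            self.logger.addHandler(handler)

    def process_exception(self, request, exception):
        # Format the traceback attached to the exception the view raised.
        formatted = "".join(traceback.format_exception(
            type(exception), exception, exception.__traceback__))
        self.logger.error("Unhandled exception in view:\n%s", formatted)
        # Dump the request only if the logger is at least as verbose as DEBUG.
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug("Request: %r", request)
        # Returning None leaves the default error handling untouched.
        return None
```

<p>In a Django project of that era, a class like this would be listed in <tt>MIDDLEWARE_CLASSES</tt>; the constructor defaults above merely stand in for the settings lookup.</p>
<p>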
Please note that my middleware relies on the <a href="http://code.activestate.com/recipes/444746/" title="ActiveState Code Recipe 444746">Exception Helpers</a> module written by <a href="http://www.targeted.org/" title="Dmitry Dvoinikov&#x2019;s Homepage">Dmitry Dvoinikov</a> to format tracebacks. The traceback information is written into the logfile specified by the (custom) <tt>LOG_FILE</tt> setting. If the <tt>LOG_LEVEL</tt> is at least as verbose as <tt>DEBUG</tt>, the request object is dumped as well. </p>
<h3>Problem Two: Exceptions Outside of Views</h3>
<p> We are now prepared for errors that might occur when running your views in production. But there is a different class of exceptions that cannot be handled by our middleware because of their nature: </p>
<p> On your site, you probably want caching, <a href="http://en.wikipedia.org/wiki/HTTP_ETag" title="HTTP ETags">ETags</a> and transfer compression. So you enable the corresponding middleware, hopefully in the right <a href="http://phaedo.cx/archives/2007/07/26/django-middleware-order/" title="Django middleware order">order</a>, as <a href="http://effbot.org/zone/zone-django-notes.htm#middleware-order" title="Django Performance Observations">explained</a> by Fredrik Lundh. And you might just add some middleware from <a href="http://www.djangosnippets.org/tags/middleware/" title="Django Snippets: Tag &quot;Middleware&quot;">djangosnippets</a> or some of your own. Perhaps you want to pretty-print or to simplify your output, or to yield custom error pages for Ajax requests. </p>
<h4>Solution: Enable Flup Traceback Pages</h4>
<p> Any exceptions raised during middleware processing are not caught and handled by Django, but instead propagated up to the flup FastCGI handler. By default, flup would present an error page like Django does in debugging mode, which is quite helpful (if not as beautiful). 
But Django <a href="http://code.djangoproject.com/changeset/4170" title="Django Trac &mdash; Changeset 4170">disables</a> this error page explicitly during startup. I filed a <a href="http://code.djangoproject.com/ticket/6610" title="Ticket #6610">patch</a> that lets you pass a debug flag to your FastCGI process to remedy this issue. </p>
<h3>Summary</h3>
<p class="annotation right"> <q><em>Amongst</em> our weaponry are such diverse elements as development, production, ruthless stage testing, almost fanatical verboseness to the log and nice red flup tracebacks, my oh my oh my.</q> </p>
<p> To anticipate and tackle production bugs, the solution consists of these parts: </p>
<ul>
<li>separate staging and production setups</li>
<li>use exception logging middleware</li>
<li>enable flup error pages in the stage setup</li>
</ul>
<p> This is still not perfect. Ideally, flup would also log errors in production mode. Weirdly, flup also supports <em>log by e-mail</em>, and in this case it is not even configurable from your Django app! I will probably update this article when I find an elegant way to fix this. </p>
<h1>Still Cracks in Our Foundation</h1>
<p>Michael Kurze, 2008-09-09</p>
<p>Welcome to thefoundation.de, <a title="about us" href="/about/">our</a> aggregate blog on media and internet technology. There are still many <q>cracks</q> to fill...</p>
<p> ...but to coincide with the recent <a href="http://www.djangoproject.com/weblog/2008/sep/03/1/" title="Django Weblog: Django 1.0 released!">release</a> of the Django web development framework, we decided to push our site out the door as well. This blog is very much a spare-time effort and all of us have enough other things on our hands, so that even Web 2.0 baseline commodities such as comments, trackback and pingback are still pending. 
</p>
<p> And although I generally agree with the <a href="http://seeknuance.com/2008/02/04/django-blogs-vs-wordpressorg-vs-wordpresscom/" title="Django blogs vs. WordPress.org. vs. WordPress.com">notion</a> that one should not always reinvent the wheel, we chose to develop this blogging application by ourselves. The main reason is that it is an easy way to get started with Django and thus with one of the more recent <abbr title="Model, View, Controller">MVC</abbr> frameworks. Another is that we wanted to be able to define relations and interactions among our four journals without limitations. And of course, there are some really nice Django applications that are more easily integrated with your own glue. There is also a more conceptual bonus which might be regarded as a disadvantage by some: <em>Every design decision that has to be made, we have to make.</em> There are lots of <em>defaults</em> in modern publishing applications, leading to a confluence in style and functionality where there could be diversity. </p>
<p> But for that to happen here, we still have some catching up to do. See you then! </p>