Things I Wish I Had Known About Django Development Before Starting My Company (medium.com/cs-math)
378 points by misiti3780 on April 17, 2013 | 154 comments


12. Use `pip freeze` to keep a list of your requirements, and keep that requirements list in your repo. Use virtualenv and virtualenvwrapper to keep your environment clean from other environments. Virtualenvburrito (https://github.com/brainsik/virtualenv-burrito) will help you set it all up.

13. You almost never need to write bash scripts. Use django management commands to write scripts that interact with your application. Use fabric and a fabfile to do deploys. Use something like chef or puppet to do machine configuration. Your bin/ folder will turn into an unmaintainable mess very quickly.

14. Use south. Yes, it's obtuse, but it gets the job done.

15. Use a pre_save hook on all of your models to do full_clean for validation before anything goes to the database. This will save you from cleaning up your data later.

    from django.db.models.signals import pre_save

    def validate_model(sender, **kwargs):
        # skip validation for raw saves (e.g. loading fixtures)
        if 'raw' in kwargs and not kwargs['raw']:
            kwargs['instance'].full_clean()

    pre_save.connect(validate_model, dispatch_uid='validate_models')

16. If you're writing a javascript-heavy application, consider all of your options for static asset management before you get too deep. I've used django-compressor, django-pipeline, webassets, and django-gears. None of them are perfect solutions (there still isn't a sprockets-esque one-catch-all solution for django like in the rails world), so consider the pros and cons before you make a choice.

edited for formatting


Regarding 15 - if you want that behavior, you'd be better off with your own model subclass:

  from django.db.models import Model

  class ValidatingModel(Model):
    class Meta:
      abstract = True  # base class only, no table of its own

    def save(self, *args, **kwargs):
      self.full_clean()
      super(ValidatingModel, self).save(*args, **kwargs)

Then inherit from that everywhere.

Signals have a not-insignificant overhead. They are most useful for tying together code that comes from different places. (If you want to do full validations on 3rd-party app models, then you might want this signal approach, but it might also be a bug in the app that the hook isn't just there for you.)


> (there still isn't a sprockets-esque one-catch-all solution for django like in the rails world)

This is definitely a little frustrating. I'd kill for a solution that handles minification, versioning & can upload to S3/etc. You can pass things through boto to get to s3, but the rest requires a bit of fiddling.

I've seen a few solutions that come close, but then they have extremely odd versioning systems. I should just fork something so I can add an ISO8601 timestamp to my CSS & JS, which I like because having granular dates helps when debugging front-end issues.


> I'd kill for a solution

Would you code for one? I, too, am sorry that we don't have a thing like asset pipeline. I'm pretty sure it would be welcome...


What have you tried? Because staticfiles + django-storages + django-compressor sounds like it can handle that use case.


I'm using exactly that setup and it does work, it just took me many hours and over 100 lines of custom storages + settings code to get it working.

I had to solve issues with custom domains, HTTPS, gzip, separate media/static files buckets, and storages/compressor not playing nicely together in general.


I have been developing with rails for a couple years, and though I am comfortable with python I've never touched django because I haven't had to. One thing I will say is that the rails asset pipeline documentation is something I constantly return to. The asset pipeline provides a lot of functionality but not a lot of simplicity or management of complexity. If what you wish you had from rails is the asset pipeline, you are either an expert with the rails api or you are seeing things greener on the other side of the fence.

I don't mean to insult the asset pipeline, it provides a lot. But it definitely doesn't save you hours. Out of the box the asset pipeline is great for all the things that come for free but if you are doing a lot of development in the framework you probably return to the asset pipeline documentation on a regular basis. And I consider time spent in documentation a negative compared to time finding your own solution if the API is not intuitive and you find yourself consistently returning to the docs about similar problems. And let me tell you, plenty of my fellow Rails devs have said to me they also regularly return to the asset pipeline docs.


#16 is one of the biggest pains we had to deal with, but I think we have a good solution for now.

We use brunch.io, a node.js build tool. It handles concatenation and minification of CoffeeScript, Stylus and Handlebars templates for us, but it can work with almost any front end technology out there. A cool thing is that we can use require within our CoffeeScript files, which allows us to better organize code into modules.

Brunch even provides file watching capability so when we save a stylus or CoffeeScript file it will update the browser without a reload.

In our setup we compile everything to our django static folder so when we collectstatic, files get uploaded to s3.

One thing that was a bit tricky, and something that we have to make better, is cache expiration of files. For now we append a few characters of the current commit hash to static file URLs (app.css?e2232, something like that); while this is pretty effective, we've found some cache systems that ignore it. The next step is to rename the files themselves before uploading to s3 (app-e2232.css).
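
A rough sketch of that rename step in Python (the directory layout and hashing choice are assumptions, not our actual build code):

    import os
    import subprocess

    # short hash of the current commit, e.g. 'e2232ab'
    rev = subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).strip()

    build_dir = 'static/build'  # wherever the build tool writes its output
    for name in os.listdir(build_dir):
        base, ext = os.path.splitext(name)
        os.rename(os.path.join(build_dir, name),
                  os.path.join(build_dir, '%s-%s%s' % (base, rev, ext)))

Templates then have to reference the fingerprinted names, which usually means the build tool writes out a manifest mapping old names to new ones.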

I agree that this might be a bit too complicated for most apps but in our case (getblimp.com) we have a pretty heavy JS app and we really need to take advantage of all the "better" front end tools available.

I would be willing to contribute to a FOSS project to solve these issues once and for all on the django/python side. BTW I think that file watching is better than compiling on every reload during development. In our experience (many js files), requests during development took seconds and became very frustrating.


> Virtualenvburrito will help you set it all up.

Is it just me, or is it a little absurd to have a wrapper for a wrapper to a Python module?


I think the author recognizes that as well, hence the name.


Just responding to #16: it may be best to manage your static/client assets using NodeJS with grunt... It's not in the django chain, but may be worth calling out to the external tool. Node seems to be at the leading edge of JS & CSS tooling.

It's worth noting that I'm a pretty big NodeJS fan and am even using grunt via node for building the client bits in my latest .Net project at work. It's pretty sweet, though some addons are broken since the new grunt version.


I've switched to this method as well and recently removed django-compressor. Better to have a tool that is used and improved by many, not just django people.

In the top level of my project I have a makefile, so `make assets` runs grunt and then collectstatic.

My grunt also runs jslint so it will halt if there are errors.
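
A minimal sketch of that makefile (target name and flags are assumptions; recipe lines must be tab-indented):

    assets:
    	grunt                                    # Gruntfile runs jslint first, halts on errors
    	python manage.py collectstatic --noinput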


Would you mind giving a little more of an overview of how you set this up? I think this would be incredibly useful.


I'm going to try to throw together a blog post.

basically though, I've used requirejs for the js both for single page apps and for general js like plugins and galleries.

compass/sass for the css which does minification and concatenation already, and also ensures there is no illegal css

js is processed by requirejs into STATIC_DIR/r/

css is compiled into myproject/static/myproject/css and then uses django's staticfiles system to collect and deploy

grunt is what calls requirejs. it could also concat and minimize css but compass has already done it. it also has watch and does live reload so that my browser will reload the css and even the js when either of them are edited.

on my pages I use a tag:

{% vcss "nestseekers/css/front.css" %}

which renders a <link> tag with ?v=HASH appended

and

{% r_url 'nestseekers/js/libs/dist/html5shiv.js' %}

for the js
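
For anyone curious, a vcss-style tag can be written as a simple_tag in a few lines. This is a minimal sketch, not the poster's actual code; the names and the md5-query-string scheme are assumptions, and in production you'd want to cache the digest rather than hash on every render:

    # myapp/templatetags/assets.py
    import hashlib

    from django import template
    from django.contrib.staticfiles.storage import staticfiles_storage
    from django.utils.safestring import mark_safe

    register = template.Library()

    @register.simple_tag
    def vcss(path):
        # hash the file contents so the URL changes whenever the file does
        with staticfiles_storage.open(path) as f:
            version = hashlib.md5(f.read()).hexdigest()[:8]
        url = staticfiles_storage.url(path)
        return mark_safe('<link rel="stylesheet" href="%s?v=%s">' % (url, version))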


Would you mind expanding on the issues you have with pipeline? I use pipeline at the moment and beyond a couple of small things (mainly working with vendor apps) I'm really happy with it, but I'm open to the idea that it's because of what I don't know than because it solves every problem.


Mostly with setup and configuration. Configuring it to work the way I expected required a lot of digging through documentation/code/blogs/fudging around. I don't like it when I have to do that, for a number of reasons -- a. I'm lazy and I don't like spending a lot of time doing upfront configuration, b. when something doesn't come with sensible defaults pre-configured, I assume I'll do something wrong and it'll have non-obvious but bad consequences.

I had this problem with django-compressor as well (but more with configuring 3rd party asset compilation). I usually recommend people use webassets via django-assets. It's easy to configure and very feature-ful.


I'd maybe also suggest pythonbrew since you win by getting the exact version of Python you want and virtualenv is packaged up nice with it.


pip freeze has always been horrible for me. I've been better off managing my requirements manually.


At least managing multiple files is a pain.

I've had to resort to

    $ vimdiff <(pip freeze) requirements/prod.txt
You can't just do "pip freeze > requirements.txt" like you see on every tutorial. I wish there was a better way to do this.

And I would also like a way to list only top-level packages, without their dependencies. Say I "pip install X", which depends on Y and Z as well. Then later I want to delete X, so I just "pip uninstall X", but I will still have Y and Z.


> Django does not have a built in JSON HTTP response, so you are going to have to either man up and roll your own (good luck)

Am I missing something? What's wrong with:

   return HttpResponse(json.dumps(data), mimetype='application/json')
Wrap it up in a convenience function and you're done.
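
Something like this minimal sketch (the function name is ours; note that `mimetype` became `content_type` in later Django versions):

    import json

    from django.http import HttpResponse

    def json_response(data, status=200):
        # 'data' should be a dict/list of plain types, not model instances
        return HttpResponse(json.dumps(data),
                            mimetype='application/json', status=status)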

The JSONResponse class suggested automatically implements JSONP, which is extremely dangerous. Consider a view on /accounts/info which returns some information about the currently logged in user. A malicious site could embed

  <script src="http://example.com/accounts/info?callback=someFunction">
and access the account information of any user logged into your site. JSONP is a technique to bypass the same-origin policy in appropriate cases; don't just blindly apply it everywhere or you're giving up the protection of the policy.


might also be worth noting that django-tastypie is the de facto standard for REST apis, and sends and returns json (among many other serialization formats) very easily. This obviously doesn't work for all ajax cases, but it's extremely useful nonetheless.


I would recommend Django Rest Framework. It gives you more fine grained control. We just use the serialisers for example.


json.dumps() can be dangerous if used on your raw domain data. You should specify the exact schema being sent down to the client so you don't accidentally leak something (this can happen very easily in Python)


Well, not dangerous so much as it will fail with a "Model instance is not JSON serializable" message. So of course you'll need to construct the list/dictionary representation of your data manually. A good framework can help with that, but this isn't something that's solvable in the general case with just a response subclass without risking data leaks as you stated. (The other option in the original post makes this mistake, making both suggested options insecure.)


yep, I build response objects (my own term, not great, but it describes what they are) that are basic subsets of the object that I want to serialize to json. That way I'm sure only the fields that I really want to send are making it out.


Cool, I think a good JSONResponse implementation would bake that into the framework such that it's difficult to make the mistake you didn't make :)


I would guess complex objects - containers, or strange databasey stuff.

The way to deal with it is a __complex__ method on the object, recursing through by asking the complex method to return nested, simpler python types.


__complex__ is for converting to a complex number.


yeah, danellis is right. don't use __complex__ for that. it's for complex numbers:

http://docs.python.org/2/library/functions.html#complex --and-- http://docs.python.org/2/library/cmath.html


> Use Gunicorn instead of Apache for your webserver

This is strange advice; while you can use gunicorn as a front-facing webserver, the gunicorn docs strongly recommend against doing so. In a typical deployment scenario, then, gunicorn and Apache would occupy different levels of your stack, with one running your WSGI app and the other exposing it to the world. The advice ought probably to be "Use gunicorn+nginx instead of Apache+mod_wsgi," and indeed, lots of people do make that recommendation about Django deployment.


sorry - i did forget to mention nginx should be in front of everything managing all requests + serving static content.


updated article


nginx+uWSGI emperor mode is such a nice setup.


Did you ever try using Passenger (nginx) to run your WSGI app instead of Gunicorn+nginx? Seems like it would be even less of a hassle to run.

(I work at Phusion, and am not a Django guy, just curious)


if this question was for the author, my answer is no.


1. package your python code as a sdist (except setuptools is too hairy so nevermind), keep your deployment scripts & configs in a separate repo or orphan branch

3. use nginx to reverse-proxy to Gunicorn over a unix socket (see the sketch after this list)

6. don't have per-environment config in your app, use the same config in all environments and configure each host with aliases/proxies to consume the appropriate resources (use local_settings.py for local development on your workstation & exclude it from your package)

7. use native operating system service management (systemd, upstart, launchd)

10. use munin if your app can withstand a CPU spike every 5 minutes, use ganglia/graphite/collectd otherwise
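
For item 3, a minimal nginx sketch (paths, upstream name, and the static location are assumptions):

    upstream app {
        server unix:/run/myproject/gunicorn.sock;
    }

    server {
        listen 80;
        location /static/ { alias /opt/myproject/static/; }
        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://app;
        }
    }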


I have to disagree strongly with your point 6.

I'm working on a large app at the moment where all hostnames are the same regardless of environment, and the only way to switch envs is via a proxy. It's a massive pain.

Keep environments mostly similar, but always make sure you can configure locations of the app itself and its various external resources separately on staging, test and development platforms.


> I'm working on a large app at the moment where all hostnames are the same regardless of environment, and the only way to switch envs is via a proxy. It's a massive pain.

One of our projects works like this, it sucks.

For per-host app location, we set X-HTTP-Script-Name in the nginx location block and extracted it in WSGI middleware like http://flask.pocoo.org/snippets/35/

Everything else is hardcoded to use http://localhost/service_name/foo and nginx proxies to the appropriate resource. Obviously, this doesn't work for non-HTTP services!


> 1. package your python code as a sdist (except setuptools is too hairy so nevermind), keep your deployment scripts & configs in a separate repo or orphan branch

This. I'm tired of Django projects not being packaged/released correctly, and deploy scripts that simply git clone in production (yikes).


I'd take it one step further. Why do they have to be packaged as sdists at all? Why not bdist_egg?


bdist works too, but it adds one more level of annoyance (you need the same environment as the target). sdist is a good balance because you get a pip-installable package, so your deploy can just consist of creating a virtualenv in /opt/<your project> and installing a determined version of your project there.


I was under the impression that Django projects couldn't be deployed when they were bdists (because Django doesn't use pkg_resources).

I agree about the annoyance wrt needing the same environment as the target. We tend to have two platforms that are supported: CentOS 5.x and MacOS. Keeping the build machine on the same platform as the deployment machines is simple. Creating the eggs for MacOS developers is more difficult, but still not too bad. That might seem odd, since we could just use pypi.python.org, but we have an internal PyPI server so that we can easily share internal libraries. Adding a line to a project's setup.cfg makes this trivial for the application developers.

There's another annoyance with sdists. I don't want to compile during deployments. So, I build everything that can possibly be built as an egg as one, and fall back to sdist for everything else. I push those to the internal PyPi server. At deploy time, I create a virtualenv and easy_install the appropriate artifacts. I know the correct artifacts because I `pip freeze` the requirements at build time.

We're also extra paranoid, so our stage and production VPCs are on different AWS accounts. We have one PyPi server per VPC and flow artifacts forward as needed.


Could you go into some more detail about what being 'packaged correctly' means? What's wrong with a git clone deployment, why is deploying as a package better, and how does one implement it correctly?


I love how I got downvoted for mentioning git clone deploys.

> What's wrong with a git clone deployment, why is deploying as a package better, (...)

The advantage is that you can have a release process, where you update the package's __version__, compile/minify the files you need, build documentation, and so on, until you have a deployable branch that you can tag and upload to the repository. This way you have a reproducible history of releases, it's easier to inspect which version is deployed, you have hooks for installation (for instance, you can abort installation if the tests fail on production), etc. Mainly this:

Crank out code -> Run a makefile/fabfile to update version/compile/minify/build whatever -> Export a tag -> Build an sdist/bdist -> Install on production

I believe too many things grew inside Django (e.g. collectstatic) that really shouldn't be part of the framework at all. Another thing that bothers me is South: you need to push a release to production, then run a migration, because the migration is part of the codebase. Well, the migration really should be part of the installation process. There are corner cases where this is an issue - for instance, worker processes reloading before your migration is complete would use the new, wrong model definitions, and suddenly you have a broken release on production.

For all intents and purposes, your Django project should just be a valid Python package that you can pip install in a virtualenv inside your server (let's say, /opt/<myproject>) and you're done with it. This way you can freeze the environment on production, pip can handle upgrade/downgrade, you don't have to care about *.pyc hell, etc.

> and how does one implement it correctly?

I should probably upload a project template with a workable setup.py to github.
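
In the meantime, a minimal sketch of such a setup.py (name, version pin, and packages are placeholders):

    from setuptools import setup, find_packages

    setup(
        name='myproject',
        version='1.0.0',
        packages=find_packages(),
        include_package_data=True,  # picks up templates/static via MANIFEST.in
        install_requires=['Django>=1.5,<1.6'],
    )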



Solid list! For most of those bullets I definitely had the 'Oh wow, should have done this sooner' moments. I'd add:

- use South (right away!)

- Class Based Views

- (not django specific) use virtualenvs!


Are class based views really any good for anything outside of CRUD? I've found the documentation (and, more importantly, rationale) lacking. Generally, when I try and use them, I spend more time figuring out how to customise than when I write "normal" views.


I should be specific. I never use the built in Django views. I'm sure they're great, but I don't like that much magic, and I moved to class based views late in a project... they seem easier in a clean-room build.

I do, however use class based views that I've built myself. We extend them and add mixins and I much prefer all of this to the decorator soup that is the alternative.


Ah, fair enough, that makes more sense. Last time I tried to use the built-in ones, I just scrapped the project after two days and redid it with functions. It's not magic if it doesn't work!


There's definitely a learning curve, but CBVs will give you much DRYer views. Being able to use mixins is a godsend.
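
For illustration, a minimal sketch of the mixin pattern (the mixin and model names are ours; Django didn't ship a LoginRequiredMixin at the time):

    from django.contrib.auth.decorators import login_required
    from django.utils.decorators import method_decorator
    from django.views.generic import ListView

    from myapp.models import Article  # hypothetical app and model

    class LoginRequiredMixin(object):
        # applies login_required to any class-based view
        @method_decorator(login_required)
        def dispatch(self, request, *args, **kwargs):
            return super(LoginRequiredMixin, self).dispatch(request, *args, **kwargs)

    class ArticleListView(LoginRequiredMixin, ListView):
        model = Article
        paginate_by = 20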


Am I the only one left who finds south more trouble than it is worth?


The only one? No, I'm sure not. In the vast minority? I'd suspect so :)

South is pretty obtuse to learn and definitely has its warts and issues, which Andrew seems well aware of and wants to fix, so check out the kickstarter link below!

That said, our project has over three hundred tables, a couple thousand migrations, and I've been very happy with it compared to the systems I've rolled myself in the past. I'm certainly unaware of an alternative that holds a candle to it, assuming Django's ORM.


South can be troublesome, but what's your alternative? never make changes to your schema? do it painstakingly by hand from the psql command line? just suck it up and use South, warts and all. it's still really good, even though it's not perfect.


...what's your alternative?

Write SQL and interact with your RDBMS directly?


You still need a system for tracking which changes have been applied to your database and for distributing the changes to the rest of your team.

Even if you don't use South's automatic migration generation, its database-independent schema modification api [1], dependency tracking, and tracking of which migrations have been run, are useful and necessary. In fact, the automatic generation of migrations is an optional feature of South that was not in it when first released.

When you need to write SQL and interact with your database directly, using `db.execute` inside a South migration is a nice way to do it.

[1]: http://south.readthedocs.org/en/latest/databaseapi.html
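
A minimal sketch of that pattern (the table and SQL are assumptions):

    from south.db import db
    from south.v2 import DataMigration

    class Migration(DataMigration):

        def forwards(self, orm):
            # raw SQL, but still tracked and ordered by South like any migration
            db.execute("UPDATE myapp_article SET status = 'draft' "
                       "WHERE status IS NULL")

        def backwards(self, orm):
            raise RuntimeError("Cannot reverse this migration.")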


that is not a viable practice over the long run. the entire point of using a migrations tool is so you can migrate the schema and the data forwards and backwards automatically and consistently every time. attempting to do that with raw SQL scripts is a disaster waiting to happen. it's so bad that the entire idea of a migration tool was invented to solve this problem.

so I ask again: what's your alternative (given that avoiding raw SQL scripts is the problem you're trying to solve)?


I'm getting sucked into an argument I don't really care about because I actually like South and use it in production for some things.

But I mean idempotent SQL takes care of the vast majority of all this stuff you mentioned. Write your SQL properly and it doesn't matter whether or not it's been applied before.


I don't know about postgres offhand, but there aren't idempotent solutions to some DDL operations in mysql. (alter table add column, off the top of my head)


Yeah in Postgres you can check for the existence of a column in a table. If it exists/doesn't exist you can drop/add the column. Obviously "ALTER TABLE x ADD COLUMN y" isn't idempotent, but the whole SQL statement around it can be idempotent.
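
Roughly, a sketch of that guard (table/column names are made up; Postgres of that era had no ADD COLUMN IF NOT EXISTS):

    DO $$
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM information_schema.columns
                       WHERE table_name = 'users'
                         AND column_name = 'email') THEN
            ALTER TABLE users ADD COLUMN email text;
        END IF;
    END
    $$;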


I've been on a project that was doing exactly this. It's a giant PITA and we found ourselves slowly implementing something that looks a lot like South in order to mitigate all the problems we encountered. So we dropped this approach entirely and just used South. With a half decent migration strategy it's so much more robust than messing around with SQL. Unless there are specific edge cases that a migration tool can't deal with, use one.


Maybe if you're working on your own. If you're in a team (where each member has her own local DB) and multiple environments for testing, staging, preproduction etc. I'd say you're going to want some sort of scripted migration.


I came to Django & South after using ActiveRecord and Rails migrations for years, and it drove me nuts. Not trying to cause a big argument, but why the big push in the Python community to model database tables via object properties/fields? I seem to recall SqlAlchemy does something similar.


From 'The Zen of Python' - Explicit is better than implicit.

Automatically generating properties after querying the database is seen as unpythonic. PyLint and some IDEs/editors will not be happy if you try.


DataMapper 1 does the same. It does make sense if you're worried about changes to your database schema spontaneously breaking previously working code.


No, I get that part. I'm talking more about building a database from models. It seems like these tools are designed to build the database schema from the object models, which to me is just as bad as building the objects from the schema.

Ideally, we should use the declarative style of Django models and SqlAlchemy on the object, along with the schema generation tools of Rails migrations.


My workflow is to only add south to the project when required. To begin with I use a bash script which trashes and rebuilds the DB each time I wish to. Data which needs to survive this gets dumped to fixtures (using another bash script).


I totally agree with you. I just export the SQL and see what tables need to change.


I'm working on a tool for better schema migrations as well. Currently it only supports MySQL but more RDBMS's coming soon: http://devjoist.com


Good luck, but you may want to reconsider. Andrew Godwin (author of South and Django contributor) is building a new version directly into Django core.

http://www.kickstarter.com/projects/andrewgodwin/schema-migr...


DevJoist is language agnostic so I think there will be a need for it regardless of what happens with Python. I'm actually a backer of Andrew's kickstarter :)


don't be discouraged. a solid, open-source solution that is language/platform agnostic is valuable.


When I started using South, I was much in awe and loved its ease. But as time passed, migrations became complicated and south was soon brought to its knees.


I find that generally, when this happens it's due to a bad / non-existent migration policy (it's just as important as your branching strategy to keep migrations solid). Migrations should be able to rebuild an entire database from scratch. If they can't, then the migrations have been screwed up.


i think it is the best solution out there right now ... someone had a kickstarter recently to replace it, but I can't seem to find that info via google right now ...


Andrew Godwin had that Kickstarter and he is the creator of South. Here is the link http://www.kickstarter.com/projects/andrewgodwin/schema-migr...


I believe you're thinking of Andrew Godwin's schema migration kickstarter: http://www.kickstarter.com/projects/andrewgodwin/schema-migr...


yep - that is what i was looking for - thanks


The article recommends using MongoDB as a primary data store, so South wouldn't be applicable (and that's specifically addressed).


Actually, I use South to manage data migrations (as opposed to schema migrations) in MongoDB. That way I can ensure that I've run the same data migrations on all of my dev and prod databases. Just a `./manage.py migrate` and every developer is caught up.


I don't know about class based views - each time I've tried to use them they've started out nice and convenient then slowly turned into an over-engineered mess as my requirements got more complicated. I think I prefer the control and simplicity of function-based views even if there's a bit more duplication.


I wish there was something like this for rails - well written, concise and keeps the newbies away from things that'll hurt them. In particular, some major pain points:

1. Use rbenv or rvm. Half the problems I notice with beginners stem from installing rails with apt-get or something.

2. Your controller is meant to be simple glue between model and view - if you're putting tons of code in there, you're doing it wrong.

3. Unit test. A lot. This is especially good for beginners because you often make little errors when you're starting out.

4. Unit testing isn't enough. Functional testing is important too.

5. Learn about your hosting provider. Specific to Heroku, if you're on the free plan, every dyno spin-up takes absolutely forever if you don't have a steady stream of visitors.


> 2. Your controller is meant to be simple glue between model and view - if you're putting tons of code in there, you're doing it wrong.

Yes, fat models are the way to go. This is just a good rule of thumb for any MVC framework. I feel like no framework docs actually explain this. Newbies end up shooting themselves in the foot because they don't understand the reason for separation or understand where logic should be implemented.


That's very debatable, and pretty rails-specific, IMHO. Conventional wisdom in other ecosystems is instead to have a business logic layer between your controller and your persistence layer. I've always found this to be good advice, as it makes the code more testable and makes reasoning about manipulating different models in the same operation easier.


The Rails models ARE the "business logic layer" between the controller and the persistence layer (the database itself).

I don't want to get into exactly what MVC should mean and whether any particular framework does it right though.


That's how they're traditionally used, but this does not mean doing something cleaner is impossible.


I started reading about Django recently so this is a noob question, but what's the controller in Django? All I've seen with a lot of code is models, views, and templates. It's kind of annoying how tutorials talk about MVC but it doesn't seem like an MVC framework.

Also, if I want a user to input numbers to do calculations, where does the computationally heavy code go?


For most intents and purposes, "controllers" from so called MVC frameworks are referred to as "views" in Django.

The article that edavis mentions does make one good point, which is that the term "view" was chosen to emphasize the notion that the python callback function (or object) sets up a view of the data represented by the models. I think there's something good in this choice of words. I find the terms "view" and "template" more intuitively meaningful than the corresponding "controller" and "view".

One important, substantive difference that I'm aware of between controllers in many MVC frameworks (probably not all) and views in Django is that a lot of MVC frameworks map requests to controller actions through URL traversal, whereas Django offers a more hands-on approach with URL confs.

Also, whereas controllers are often implemented through classes in MVC frameworks, views in Django may be implemented through any callable object which accepts a request and returns a response.

As far as where your code should go, it probably depends on what kind of data the user is entering. If the data must be persisted in the database somehow, such as with transactions in a bank account, then the code that crunches the numbers would probably be in a method on a "BankAccount" or "Transaction" model class. If it's data that only really needs to persist for the user's individual browsing session, then it's probably safe to put the code that works on it in a view. Django limits what kinds of things you can do in its templates (thankfully), so you probably won't have much luck putting your computation code in a template.


Django is close to an MVC framework but not exactly an MVC framework. This link (https://docs.djangoproject.com/en/1.5/faq/general/#django-ap...) describes the difference better than I ever could.


I'm somewhat new to Django, and I'm just curious if Heroku is the best solution out there for Django? I've heard some good things about Gondor, and apparently it's more tailored for Django apps, but I'd be curious to know what the HN consensus is on Django hosting that won't collapse on you if you get HNed/Reddited.


Watch out for these Celery gotchas:

1. Tracking tasks status in SQL will result in a lot of queries! (Even when using Redis/RabbitMQ as brokers.)

2. Crontab style tasks are great... until they take a long time to complete and Celery kills them. And then you're back to regular crons (and there is nothing wrong with that).

3. Use UTC everywhere! New projects get this by default; don't make the mistake of changing it.


My recommendation after a lot of heavy use: RabbitMQ for the queue, Redis for the result store.

RabbitMQ for the result store is madness.
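
In Django settings that split looks roughly like this (the URLs are placeholders):

    # Celery 3.x-era settings
    BROKER_URL = 'amqp://guest:guest@localhost:5672//'   # RabbitMQ as the queue
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'   # Redis as the result store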


why not just use redis for both - just curious?


Redis does OK as a queue, but not as well as rabbit.

For a start, rabbit handles OOM better. It has lower per-message overhead. It has a more flexible queueing model in general so that you can use it for both work queue-y stuff and other queuing needs. It's interesting how often queues seem handy once they are easy to use.

Then there's the fact that redis is good for lots of things other than queues, and you'll be tempted to use it for that, and that will crowd the queue use. If you hold a transaction on redis or do a large set intersection or run a lua script, you block everything else, including your fanned-out celery worker pool.


This is indeed a solid list.

The most comprehensive best-practices resource for Django I've come across by far is "Two Scoops of Django"[1] by "pydanny"[2]. It's absolutely worth the $17.

[1]: https://django.2scoops.org/ [2]: http://pydanny.com/


Correction: It's not by "pydanny", it's by "pydanny" and "audreyr". She just doesn't do the whole Twitter/HN/blog thing as much as me. Seriously, do you think I could write a section including code that discusses "the impossible condition of too much chocolate". ;-)

Seriously, between the two of us she's the better coder, taught me LaTeX, and knows Strunk and White. I just write what I think and let the grammar experts fix it. :P


Seconded. My investment in "Two Scoops of Django" has repaid itself many, many times over. Concrete "best practices" book.


An item I'd like to add to Python development in general is to use virtualenv. If you're doing multiple Django projects this is a must have.

One of those tools that you start off thinking isn't useful but quickly turns into a must have.


> Use Gunicorn instead of Apache for your webserver [...] This assumes NGINX is managing all incoming requests and serving static content.

Why not just go nginx + django the uwsgi way? Now you have friendly tutorials and docs for this, like https://uwsgi.readthedocs.org/en/latest/tutorials/Django_and... . Maybe I'm weird or I've never worked on large enough apps, but I fail to get what the shiny unicorn and gunicorn bring to the table...


> 10gen has added the aggregation framework, full-text search, collection-level locking, etc.

Mongo has collection-level locking? I thought there was some preliminary work to support it in 2.2 (It just does database-level locking for now), but it's still unsupported. Source: https://jira.mongodb.org/browse/SERVER-1240

Also, I wouldn't recommend using any of the aggregation stuff unless it's for infrequent ad-hoc stuff or scheduled tasks during slow periods.


I use aggregation queries regularly and it's amazing. My MongoDB database has 4k ops/sec and I run an aggregation query to count averages, sums, and splits on a name (to get averages and sums for each of two dozen groups) for about 10K rows every 5 minutes. Not a huge amount, but this is on a db that is performing a ton of work. Also, I should mention that you should probably run your aggregation queries on your secondaries.


i agree - the aggregation framework seems solid - much better performance than map-reduce


You're right, it doesn't yet support collection-level locking (https://jira.mongodb.org/browse/SERVER-1240).


I don't see a monitoring system there? Something like Supervisor is great, but who watches the watcher? I prefer Nagios, but that's just because I'm used to it. Even it is better than nothing.

If you want something a bit more distributed (read less prone to single points of failure) than Supervisor, check out pacemaker. Very powerful and useful for keeping resources alive in any sized cluster.



I've been expecting to use Celery in a project I'm working on, could you explain why?


Celery is brittle. If you stop a celery service, odds are, it will either not stop at all or leave stray child processes running.

It is not transactional, at least not with a MongoDB back-end. If you stop a celery service, any tasks that were in progress may or may not complete, but none of them will be picked up again when you resume.

No prioritization of queues or tasks. It is possible to set up separate services that handle different queues, but that's a PITA and there's still no guaranteed order of processing.


Celery is a PITA and over-complicated for the problem it solves.


Do you know why pyres relies on itty? Seems like kind of an odd dependency...


The pyres web interface (where you can view job status) is built using itty.

edit: looks like they're using flask now, I don't see itty being used anywhere. They also split the web interface (resweb) into a separate project, so pyres doesn't depend on a web framework now.


Ah ok, thanks. I guess the documentation is a little outdated then. Thanks for the suggestion, I'll check pyres out when I get to that point in my project.


considering the infrastructure that has built up around celery, and particularly its use with django, i don't know why you wouldn't - care to explain?


Pinterest replaced Celery and RabbitMQ [1] with their fork of pyres [2], presumably because it's 1/100th the codebase but does what they need to.

[1] http://highscalability.com/blog/2013/4/15/scaling-pinterest-...

[2] https://github.com/pinterest/pyres


This may sound strange, but when I first started doing python stuff with Django, I spent WAY too long trying to figure out the best project directory layout. To me, Django should generate a "pretty-good" directory layout, much like `rails new`.


My supreme thanks to the author for writing this, and to all those who have submitted additional items to the list in the comments here. I'm writing my first few Django apps in series and have found these bits of advice to be very valuable.


I haven't used Jammit before, but I highly recommend checking out django-pipeline for the same purpose: https://github.com/cyberdelia/django-pipeline


Specifically for me, pipeline fits into django's collectstatic way of doing things. And because of this it works fantastically with django-storages as well. My projects are set up so that `python manage.py collectstatic` trawls all my apps static folders, compiles everything if required (r.js, sass) and uploads any changes to an S3 bucket. And in development it compiles over the wire (except r.js, we let requirejs do the async loading thing in development).
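
For reference, wiring pipeline and django-storages together is roughly this (a sketch assuming the s3boto backend; the class and module names are ours):

    # myproject/storage.py
    from pipeline.storage import PipelineMixin
    from storages.backends.s3boto import S3BotoStorage

    class S3PipelineStorage(PipelineMixin, S3BotoStorage):
        pass

    # settings.py
    # STATICFILES_STORAGE = 'myproject.storage.S3PipelineStorage'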


Ditto regarding everything up until your last sentence - it is a pretty great setup.

What do you mean it "compiles over the wire" in development? In development, pipeline just renders individual js/css tags for each of your static files, un-compiled...


It means that when we're developing, you edit a sass file, and when you request the CSS file it compiles the SASS to CSS before serving it. It has been a little while since I set it all up, but for the SASS example I've made a gist[1] showing the settings you need to make it work. If I recall correctly, the key bit is using the .scss as the source file. Then, in production, make sure to set `PIPELINE = True` to stop compiling per request.

[1]: https://gist.github.com/Bockit/5408958


Gotcha


your django app doesn't look like a django app.

the basic premise of the django defaults is that app modules are pluggable and self-contained. all of the static assets, templates, model code, view code, migration files, management commands, etc. related to a particular app should go in that app's folder (which is treated as a Python module with its own __init__.py file). conventionally the app's folder is a first child descendant of the top level project directory (i.e. it's in the same directory as manage.py)


That's true about Django apps, but this post doesn't describe a Django app. It describes a Django project.


The main thing I wish I had known is that the ORM is rather limited (or, at any rate, it was too limited for my app), and while you can use raw SQL, it won't let you get model objects out of your own queries, which makes it impossible to integrate with the rest of your app that uses the ORM [1]. The first thing I'd do if I had to use Django again would be to eschew its ORM and use SQLAlchemy instead.

[1] This was a few versions of Django ago; if they fixed this since then, I welcome corrections.


Not sure when it was added, but is this what you were after? https://docs.djangoproject.com/en/1.5/topics/db/sql/#perform...


Yes, that looks like it should do the job. Thanks!


Thing #1 for me, by a long shot:

Deploy to Heroku (or similar). Saves a ton of headaches, saves boatloads of time/money, and for most startups in the early stages, is well worth the tradeoff.


I decided to dive into bootstrap.py/buildout for my current project. So far, it has done a reasonable job of keeping the directory structure clean as well as keeping dependencies under control. The nice thing is that there are recipes for most important things, and adding other new recipes (oh, I need to install and have access to coffeescript) are very straightforward.

I see it as a build step before fabric or puppet would take the output and deploy it.


I've used buildout for years but I'm now moving away from it. I've enjoyed it though, and I wouldn't recommend against it.

Several issues:

the buildout process is long and monolithic and not conducive to minor adjustments. I'm using ansible now and it's much better for just changing one setting on an nginx config file or settings file and reloading. I chatted with the author of buildout and he said he was building to an rpm and then mounting that as his means to do a live deploy.

I often wished I had virtualenv, so many things work well with it. for instance python-vim and sublime lint / rope like to have a virtualenv to get the python paths. also ctags is happier if I can just enter the virtualenv and run it.

there may be a recipe for that but I never found one that worked nicely.


This list is somewhat dependent on the size and complexity of the project. For a small or simple project, many of these points might not matter.

For example, if Apache can handle your traffic just fine, why spend time replacing it with gunicorn? Or if you never really migrate your database ever, why waste time fiddling with South?

Just a gentle reminder to take into consideration the present and future needs of your project to avoid needlessly adding complexity to it.


I think he's actually saying gunicorn was simpler than Apache. I've also found that to be true (anecdotally).


I like the ideas on directory structure. I've been working in Flask and I spent way too much time thinking about my directory structure.


You're also using Flask? I wrote a short piece about roughly the same subject as the author of the original article uses for Django. Running Flask behind gunicorn and nginx, monitored by supervisord:

http://www.michielovertoom.com/freebsd/flask-gunicorn-nginx-...


Yes, using Flask. I need to write up why we chose it over Django... but mostly because we're using Mongo, and once you're down the path of not using the Django ORM, what's the point? Also, it seems like there are dozens of articles on "how to structure" your large Django projects, so you don't even get good guidance about it out of the box.

I'm running mine under NGINX->uWSGI....but might be switching to Gunicorn. (I'm still researching our path to production).


I had to double check the title because I thought he said Django, yet all the advice seemed to match my Rails experience and not my Django one. Also, this may be a bit nitpicky, but Django isn't really an MVC framework, nor is it a CMS. You can only really call it a web framework; it has elements of these things but isn't the same:

1. The right directory structure: the default use for the media folder is supposed to be for stuff uploaded through your app, not the files you develop; your static files should go in static, and your compressed files should go in media (as they are dynamically generated) or in your global "static" folder that is pointed to from your web server. You should also have different settings.py files on a per-situation basis. Your development servers, staging, and production environments will probably all have different settings, because at the very least development will have `DEBUG = True` and production `DEBUG = False`.

2. This is fine even though it seems wonky to use it for cron jobs to me.

3. This is fine too but I personally use nginx+uwsgi w/ emperor mode.

4. Up to you on this one, there are some nice Django branches that support key-value store databases but don't use the standard Django with key-value databases because it is 100% built around RDBMS.

5. Only piece of advice I agree with completely.

6. As I partially discussed in #1. I don't like his override method. I think an import of a base settings file into production.py and development.py is cleaner (a sketch follows at the end of this comment).

7. Supervisor is good here but also if you use uwsgi in emperor mode instead of the suggested gunicorn it can handle the same task saving you an extra install and configuration.

8. Django very clearly has a nice Mixin for JSON responses in the docs or you can build a nice easy API using Tastypie: https://docs.djangoproject.com/en/1.5/topics/class-based-vie...

9. It's up to you if you want to use Redis, I don't personally need it for all the suggested things and I like how well memcached works with Django out of the box for caching.

10. Munin is great! I'm often too lazy to set it up and am fine reading log files.

11. This drove me to write this long comment... I actually get really annoyed when people try to mix in items from other stacks when there are many solid solutions already that don't force you to install another entire stack. Django has django_compressor which works great in this situation (https://github.com/jezdez/django_compressor) and a quick Google search will find many other similar solutions that won't require you to install Ruby to work with your Python web app.

Source: 6 years of developing Django apps and doing everything that was suggested here and more.
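
Re item 6, a minimal sketch of the base-import approach (module and host names are assumptions):

    # settings/production.py
    from settings.base import *  # noqa

    DEBUG = False
    TEMPLATE_DEBUG = DEBUG
    ALLOWED_HOSTS = ['example.com']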


The guy who wrote django-compressor fairly recently released a library to address the settings.py problem, django-configurations https://django-configurations.readthedocs.org/en/latest/

It uses classes to define your settings, and you can use inheritance to override and mixins to augment your bits of configuration. I usually have a "CommonSettings" class, with a few others extending it (Production, Development, etc.). At runtime the correct class is set based on an environment variable (you have to add a line or two to manage.py and wsgi.py to start the magic).

Anyway, it doesn't seem that popular but I've enjoyed using it (and done so without problem).


How is Django not an MVC framework? I'm not being snarky, just curious.


MVC means a ton of different things. Django itself does not use "MVC" to describe itself: https://docs.djangoproject.com/en/dev/faq/general/#django-ap...

> If you’re hungry for acronyms, you might say that Django is a “MTV” framework – that is, “model”, “template”, and “view.” That breakdown makes much more sense.


models -> models.py, controllers -> views.py, views -> templates,

right?


It's all semantics. As a long time django developer (4 years) I agree.


Just want to point out that MongoDB doesn't have full text search per se, but it can do nested array indexes... If you want stemming/phoneticization you need to do it as part of your input, and part of your search logic... if you front your queries with a service then it is easy enough to do.


Actually, as of 2.4 there is an experimental text search feature built-in: http://docs.mongodb.org/manual/core/text-search/. While it is no Lucene (and is still considered experimental) it does provide simple tokenizing, stop-words and stemming.


I tried various asset management tools but prefer having grunt manage all frontend related things. Node projects seem to be better tailored for this. Especially stuff like linting and watching for code changes, plus having a js based config file for people that don't speak python.


nginx with uwsgi is also an excellent alternative imho


using a settings module, requirements directory and environment variables makes a lot of sense (outlined in Two Scoops of Django at https://django.2scoops.org/, a good book)


"Use named URLs, reverse, and the url template tag": For Javascript URL handling use sth. like that https://github.com/version2/django-js-reverse


That is an impressively long list of tools to pick up and get running professionally, on top of being a web newbie. Hat tip to that man.


~ 2 years and a lot of pinot noir ....


solid list, I would also add that it is extremely useful (I would argue necessary) to use vagrant (http://www.vagrantup.com/) to manage your development environment.


this is actually on my todo list, i saw this video by zach holman a while back:

http://zachholman.com/screencast/vagranception/

and have been meaning to mess around with it ever since


Why does this page break "space" for paging down (in Firefox 20.0.1, Windows 7)?


Replace all comments with: Use web2py instead of django


Is mongodb a good choice for CRUD apps also?


depends on the actual app use. Simple in-and-out, low-traffic crud? It's probably not worth the extra effort. MySQL is still a great product that has and will continue to serve billions of web requests, and just because NoSQL is the new kid on the block doesn't mean that everything should use it.

That said, if you are storing data that is really just documents, Mongo or another document db is likely a good choice. For example, if you're storing interrelated performance measurements (like in a factory setting) Mongo would likely not be the best choice, but in a setting like an app for applying for a job or registering for events (natural document data segments) a document db would work well for modeling the domain, I'd think.


It really depends on the nature of your application, but roughly speaking... you'll probably regret using mongodb as your primary database at some point.


off-topic, this _Medium_ platform seems to mangle all outgoing urls…



