Whenever I read about the latest vulnerability in a popular WCMS, I wonder why static HTML export still doesn't seem to be a prioritized feature in popular systems.
After all, most sites out there probably don't need server-side dynamic preprocessing for every request. The CMS directory could be locked using HTTP auth (implemented by the HTTP server); this way, not every little CMS bug would allow the world to compromise the whole server.
Do we really expect every parish choir with a web site to hire a CMS specialist who installs updates within hours of the release and fixes all compatibility quirks that occur with new major releases? This is an unworldly approach that bestows thousands of zombie machines on us.
And what happens if the CMS for some old site stops being maintained? A responsible admin would shut the site down, resulting in a loss of potentially valuable information. This issue would be solved by using static HTML export, too.
Are there any well-maintained open-source CMS out there where static HTML export is an integral part of the architecture, ideally with good usability and written in PHP (not that I like the language, but that's what is available everywhere)? (I'm not talking about command line static site generators without a user-friendly backend - those are only an option for techies.)
I was just thinking the same thing. Elsewhere, someone asked how to handle "build it and forget it" jobs of the type handled by small web contractors all the time. Drupal is a really popular engine for those kinds of contract apps.
At this point, if I was advising someone building sites for small businesses and the like, my advice would be:
Can you use a hosted service? WordPress.com, Shopify, that kind of thing? If you can make that work, that should be your first and only choice. If not... are you really sure that you can't use a hosted service? Okay, then build a site that does static HTML for end users, with a carefully locked-down admin section. On a different domain, if you can.
If you need dynamic features for users (a shopping cart, etc.), again, see if you can integrate a hosted service with your static site. If the reason you can't use a hosted service is because your client is too cheap to pay for it, do not take that client.
Otherwise, if your client absolutely needs bespoke, dynamic features for their end users, and absolutely no hosted service will work for them, they need to invest in a support contract, and you need to tell them up front that they'll have to do that. There are actually contractors out there that do long-term support for other people's apps, if you don't want to be saddled with it yourself.
Actually, this is one of the core features of the Ruby CMS I've been working on for the last few years: http://spontaneous.io
Its template engine supports two tag types: one that gets run during the publish stage & one that gets run at the request stage.
It can then take an intelligent approach to publishing by having a publish step that renders each page to a static file. It's then trivial to test for the existence of 'run at request' tags and position the generated template accordingly: pure static files get put where a reverse proxy (nginx by preference) can see them, while anything with per-request tags gets put in a 'private' directory outside nginx's paths, for rendering & delivery by some front-end server.
This has many advantages, including:
- static pages can be served directly by Nginx
- the public facing site runs in a separate process that consumes the generated templates
- the public app (if needed at all) has no admin-centric code so has a smaller (& separate) attack surface
- the publish step can be seen as a form of code generation so you could, in theory, publish a dynamic PHP powered site from this Ruby CMS
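The publish-time routing described above can be sketched in a few lines. This is a minimal Python illustration, not Spontaneous's actual (Ruby) implementation: it assumes the publish-stage tags have already been expanded, and that any remaining request-stage tag looks like `{{ request:... }}`, which is a hypothetical syntax chosen only for this example.

```python
import re
from pathlib import Path

# Hypothetical marker for a "run at request" tag left over after the publish
# stage; the real Spontaneous template syntax is different.
REQUEST_TAG = re.compile(r"\{\{\s*request:")


def publish(pages: dict[str, str], public_dir: Path, private_dir: Path) -> dict[str, Path]:
    """Write each publish-rendered page and route it by remaining request tags.

    Pages with no request-stage tags go to public_dir (served directly by the
    reverse proxy); the rest go to private_dir for a front-end app to finish.
    """
    placed: dict[str, Path] = {}
    for url_path, rendered in pages.items():
        root = private_dir if REQUEST_TAG.search(rendered) else public_dir
        target = root / url_path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(rendered, encoding="utf-8")
        placed[url_path] = target
    return placed
```

With this split, only the pages in `private_dir` ever need the dynamic front-end process; everything else is plain files on disk.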
I'm also gradually working towards abstracting the template 'filesystem'; currently you can render to disk, Redis, memcache or any other key-value store supported by the Moneta gem[1].
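Abstracting the template "filesystem" essentially means the publish step writes to a key-value interface instead of directly to disk. A rough Python analogue of that idea (Moneta itself is a Ruby gem; `TemplateStore`, `DiskStore`, `MemoryStore` and `publish_to` are hypothetical names for illustration):

```python
from pathlib import Path
from typing import Protocol


class TemplateStore(Protocol):
    """Hypothetical key-value interface the publish step writes through."""

    def write(self, key: str, template: str) -> None: ...
    def read(self, key: str) -> str: ...


class DiskStore:
    """Stores each template as a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, template: str) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(template, encoding="utf-8")

    def read(self, key: str) -> str:
        return (self.root / key).read_text(encoding="utf-8")


class MemoryStore:
    """In-process dict; a Redis or memcache client could sit behind the same methods."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def write(self, key: str, template: str) -> None:
        self._data[key] = template

    def read(self, key: str) -> str:
        return self._data[key]


def publish_to(store: TemplateStore, rendered: dict[str, str]) -> None:
    """The publish step only knows about the interface, not the backend."""
    for key, template in rendered.items():
        store.write(key, template)
```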
For the developer it gives the power of static site generators but provides a simple & usable editing interface for content editors.
It's a little more complex when the site is not 100% static. Even a contact form requires a server.
But I do think there should be a good separation between html and admin backend. Security is only one reason. There are other very important reasons:
1. Waste of resources. The machine that builds the HTML from the CMS is completely idle 99% of the time.
2. Scalability. The HTML should be served from a storage server like S3 together with a CDN. Viewing the HTML should never go down as a result of overload.
The ideal system for small websites is a machine that is turned off by default and is turned on only when an admin needs to change something (even if that takes a whole minute). After the changes are committed, the system generates the HTML and sends it to S3. For forms, comments and other dynamic features, it's best to use third-party services (like Facebook comments and the billion form services out there), or a different small machine that captures user input (completely separate from the turned-off admin machine).
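The "generate HTML and send it to S3" step is cheaper if the admin machine only uploads files whose content actually changed since the last publish. A minimal sketch, assuming a content-hash manifest kept between runs; the uploader is passed in as a plain function, so this code makes no S3 calls itself:

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Callable

# (key, body, content_type) -> None; any storage backend can be plugged in.
Uploader = Callable[[str, bytes, str], None]


def sync_changed(site_dir: Path, previous: dict[str, str], upload: Uploader) -> dict[str, str]:
    """Upload files whose SHA-256 differs from the previous manifest.

    Returns the new manifest (relative key -> hex digest) to store for the
    next run. Deleted pages are not handled in this sketch.
    """
    current: dict[str, str] = {}
    for path in sorted(site_dir.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(site_dir).as_posix()
        body = path.read_bytes()
        digest = hashlib.sha256(body).hexdigest()
        current[key] = digest
        if previous.get(key) != digest:
            content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
            upload(key, body, content_type)
    return current
```

With boto3, for example, the uploader could wrap `s3.put_object(Bucket=..., Key=key, Body=body, ContentType=content_type)` on an S3 client; the bucket name and the way the manifest is persisted are left open here.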
> A separate machine/VM is better from an availability perspective.
How so? A separate machine is one more thing that can fail and since it isn't web facing it won't help with availability if the other one fails. And if it is a VM they will both go down if the underlying hardware fails.
It's true that a separate machine is one more thing that can fail, but if its purpose is as different as with "active CMS" vs. "static hosting service", then it becomes easier to create a replacement.
E.g. the frontend can be replicated (if needed), S3 can be used, while the backend CMS remains intact. Or the backend CMS can be implemented in a HA setup and the static hosting in the cloud.
Wagtail (our open-source Django CMS [1]) can generate a static HTML export of your site to the filesystem, Amazon S3 or Google App Engine, piggy-backing on the excellent django-medusa library [2]. This feature was commissioned by a security company who didn't want to risk an embarrassing exposure to new vulnerabilities in Drupal, Rails, Django etc.
Search is an issue for static HTML sites. Swiftype [3] and Algolia [4] look like solid options - has anyone used these in production?
Back-publishing thousands of articles is impractical and risky (for other reasons).
Would be curious as to why.
Also, while the pages clearly have different designs, it isn't immediately clear which is "better" (or that this betterness is due to its being dynamic versus static).
Funnily enough Drupal already has this as a contributed module, would imagine Wordpress has similar. But it's not a massively popular approach as a) it can lead to considerable complexity as an update to an atom of content may need to propagate to many parts of a site, and b) the whole web is moving towards contextual dynamic delivery.
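The propagation problem can be made concrete: if the exporter records which content items each page uses, a change to one item maps to the exact set of pages to re-render. A minimal sketch of that bookkeeping, assuming the dependency map is collected during rendering (page paths and item IDs below are made up):

```python
from collections import defaultdict


def build_reverse_index(page_deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert page -> content items into content item -> pages that use it."""
    used_by: defaultdict[str, set[str]] = defaultdict(set)
    for page, items in page_deps.items():
        for item in items:
            used_by[item].add(page)
    return used_by


def pages_to_rebuild(page_deps: dict[str, set[str]], changed_items: set[str]) -> set[str]:
    """Return every page that renders at least one changed content item."""
    used_by = build_reverse_index(page_deps)
    stale: set[str] = set()
    for item in changed_items:
        stale |= used_by.get(item, set())
    return stale
```

For example, with `{"/": {"home", "sidebar"}, "/post-1": {"post-1", "sidebar"}, "/about": {"about"}}`, editing `"sidebar"` marks `/` and `/post-1` as stale, which is exactly the "one atom touches many pages" case.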
> Funnily enough Drupal already has this as a contributed module, would imagine Wordpress has similar.
Sure, static export plug-ins do exist in plenty, but I don't think that core functionality like this should be implemented as a plug-in.
First, if it isn't enabled by default, most people won't bother to enable it. Second, who knows how long and how well the plug-in will be maintained? Third, if static export is an architectural afterthought, it will probably break unusual features and workflows.
> a) it can lead to considerable complexity as an update to an atom of content may need to propagate to many parts of a site
We shouldn't let perfect be the enemy of good. Even if some blog article from months ago won't get the updated shiny new sidebar, what's the matter? The New York Times still have content from the web stone age on their server [1] - it doesn't use the latest template, but isn't it more important that this content is still available? It probably wouldn't be had they used a dynamic CMS back in 2001.
> b) the whole web is moving towards contextual dynamic delivery.
I'm not talking about amazon.com. They can probably afford proper maintenance of their fully dynamic site.
As the creator of a static export module for Drupal, and a maintainer of another, I agree that in the end if it's not built into the core, it's not going to have much long-term viability. That said, D8 is built on services, which makes an interface for static export available basically out of the box.
> b) the whole web is moving towards contextual dynamic delivery.
But not using server-side dynamism like Drupal provides. It is moving towards sites that serve static files consisting mostly of JavaScript, which then call data services: so-called single-page sites.
In so-called single-page sites, the data services are the server-side dynamism. The only difference is that you've moved the template engine client-side, and replaced one call to the server-side application with multiple calls.
SQL injection attacks are just as possible against those kinds of apps, although slightly tougher to mount because you need to read the client-side JS to find the service URLs, tokens, etc.
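To illustrate the point: a JSON data service behind a single-page site is just as injectable as a server-rendered page if it builds SQL by string splicing. A minimal sketch using Python's built-in `sqlite3`, with a made-up `articles` table, comparing a vulnerable query with a parameterized one:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory demo table with one published and one unpublished article."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published INTEGER)")
    conn.executemany(
        "INSERT INTO articles (title, published) VALUES (?, ?)",
        [("Hello", 1), ("Draft", 0)],
    )
    return conn


def search_unsafe(conn: sqlite3.Connection, term: str) -> list[str]:
    # Vulnerable: the client-supplied term is spliced into the SQL text.
    sql = f"SELECT title FROM articles WHERE published = 1 AND title = '{term}'"
    return [row[0] for row in conn.execute(sql)]


def search_safe(conn: sqlite3.Connection, term: str) -> list[str]:
    # Parameterized: the driver passes the term as data, never as SQL.
    sql = "SELECT title FROM articles WHERE published = 1 AND title = ?"
    return [row[0] for row in conn.execute(sql, (term,))]
```

Sending `x' OR '1'='1` as the search term makes `search_unsafe` return the unpublished "Draft" row as well, while `search_safe` returns nothing. It makes no difference whether the request came from a browser form or from client-side JS.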
Drupal is used to power client-side JS-based sites. The current Tonight Show website is a recent example.
Having moved a large media co from bespoke systems which worked this way, I couldn't disagree more. Separate frontend and backend systems are an absolute maintenance nightmare and kill time to market, which is the lifeblood of any online commercial endeavour. Apart from being extremely dated (c. Vignette), that architecture is a technical conceit which doesn't put business needs first. It is easy to build and easy to secure, but that doesn't make it a useful long-term solution.
I doubt any CMS will adopt this into its core, since there are so many approaches to choose from, all with as-yet-unclear benefits. These community solutions should perhaps be marketed better, and if there is indeed a need for this functionality, the market should prove that.
There is at least one open source WCMS that uses static export at its core: http://openengine.de/ (site is in German, but docs are in English) - unfortunately, it has been unmaintained since 2010. It will export .htm or .php files depending on whether you need pre-processing on every request.
As far as I know, though, Publicis still use a closed source version of openEngine (openEngine Corporate ASP) for two clients: http://www.siemens.com/ (pretty big fish) and http://www.man-finance.de/ (search for "openEngine" in the source code).
Do we really expect every parish choir with a website to configure server file and directory permissions and turn on HTTP authentication? Seems to me that static site generation is only going to be more secure if the people setting it up know what they're doing.
Probably the best approach these days for a parish choir website is a hosted solution like Wordpress.com, Squarespace, Google Sites, or if we're talking Drupal specifically, Acquia Cloud Site Factory, or Pantheon.
I've been using Stacey for that purpose. It doesn't exactly export HTML but the surface area of attack is very reduced as it doesn't interact too much with the outside world: http://www.staceyapp.com/
I tried Stacey but had a hard time doing a few crucial things:
1) reordering posts -- posts are displayed in _alphabetical_ order, as are directories, so the suggested nomenclature is a numerical prefix before every file name. So if I wanted to rearrange things, I had to decide whether or not a linear shift was worth the benefits of file name readability.
2) the order of posts is reversed! I want new blog posts to show up at the top of my feed, but Stacey puts higher numbers underneath. I didn't want to count backwards, so I looked around and found a PHP hack for reversing this.
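Both ordering complaints come down to sorting by the numeric prefix as text rather than as a number, and then not reversing. A minimal Python sketch of the fix (not Stacey's actual code, which is PHP), assuming hypothetical file names of the form `N.slug`:

```python
import re

# Hypothetical naming scheme: a numeric prefix, then '.', '-' or '_', then a slug.
PREFIX = re.compile(r"^(\d+)[.\-_]")


def sort_key(name: str) -> tuple[int, str]:
    """Sort on the prefix as an integer; unprefixed names get -1."""
    match = PREFIX.match(name)
    number = int(match.group(1)) if match else -1
    return (number, name)


def newest_first(names: list[str]) -> list[str]:
    """Highest prefix first, so the latest post tops the feed."""
    return sorted(names, key=sort_key, reverse=True)
```

A plain `sorted()` on `["1.first-post", "2.second-post", "10.tenth-post"]` puts `10.tenth-post` between the other two, while `newest_first` yields `10`, `2`, `1`.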
This might be fun if you're hankering for some DIY PHP, but I wouldn't recommend it otherwise.
Stacey is like Kirby without a web interface. You manage content by dropping text files into directories. It's not inelegant. Admin interfaces can be just as poor for representing content as anything else (see: TinyMCE/WordPress) so once you get a client used to the workflow it isn't much of a hassle.
It's not like you're forcing them to build with Jekyll.