Good! We need to ditch all the analytics software that relies on clients to make pointless HTTP requests. They are offloading their work onto the website visitors, wasting every user's bandwidth and slowing down internet connections worldwide.
If you want to know who is visiting your site, try reading your server logs.
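For anyone wondering what "reading your server logs" looks like in practice, here is a minimal sketch. It assumes the logs are in the Apache/Nginx "combined" format; the sample lines, addresses, and the `summarize` helper are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Count successful (2xx) hits per path and collect distinct client addresses."""
    hits = Counter()
    visitors = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        if m.group("status").startswith("2"):
            hits[m.group("path")] += 1
            visitors.add(m.group("host"))
    return hits, visitors

# Hypothetical sample lines using documentation-reserved IP addresses.
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2015:13:55:40 +0000] "GET /about.html HTTP/1.1" 200 1200 "http://example.com/index.html" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Oct/2015:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.43.0"',
    '198.51.100.2 - - [10/Oct/2015:13:56:02 +0000] "GET /missing HTTP/1.1" 404 162 "-" "curl/7.43.0"',
]

hits, visitors = summarize(SAMPLE)
```

Counting distinct addresses only approximates "who is visiting", since several people can share one address.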
Server logs are missing important information like screen resolution, navigator.language, etc. Server logs also can't report on element events. Nor do they work with things like single page apps. How can developers make their apps better without having the proper analytics to do so?
We're dropping all third party domains at Userify [1] (plug: SSH key management software for EC2), but for reasons of both security and privacy.
What would happen to your website (or millions of websites) if one of the CDNs that you rely on started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?
I don't think it'd be too hard for us to add a simple API call that pokes data about screen resolution, user agent string, language, etc. to our own server upon load or login. It'd be far more efficient and private than sending random data off to GA or similar, where they frequently don't even provide us the IP addresses of our own site visitors so that we can correlate the data against our own logs.
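A rough sketch of what the receiving end of such an API call could look like, using only the Python standard library. The endpoint path `/api/hello`, the field names, and the `beacon.log` file are all assumptions for illustration, not anything we actually run.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical field names the page's own script would send as JSON.
ALLOWED_FIELDS = ("screen", "language", "user_agent")

def build_record(body: bytes, remote_addr: str) -> dict:
    """Keep only whitelisted string fields and attach the client IP address."""
    data = json.loads(body.decode("utf-8"))
    record = {"ip": remote_addr}
    for field in ALLOWED_FIELDS:
        value = data.get(field)
        if isinstance(value, str):
            record[field] = value[:200]  # cap length so clients cannot bloat the log
    return record

class BeaconHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/hello":  # hypothetical endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            record = build_record(self.rfile.read(length), self.client_address[0])
        except (ValueError, AttributeError):
            self.send_error(400)  # malformed JSON or not a JSON object
            return
        with open("beacon.log", "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
        self.send_response(204)  # nothing to send back
        self.end_headers()

def serve(port: int = 8000) -> None:
    """Run the sketch locally; a real site would sit behind its normal web server."""
    HTTPServer(("127.0.0.1", port), BeaconHandler).serve_forever()
```

Because the record carries the visitor's IP address, it can be lined up against the regular access log, which is exactly what GA won't let you do.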
The data that GA gathers is highly valuable... to Google. They only provide you visibility of the tippy tip of the iceberg.. but ultimately it's your customers' and your data, not theirs.
Don't compromise your users with third-party includes, even Google Fonts (which is still our last holdout on the website.. hm, someone should make a simple web app that gathers names and styles of fonts and provides a zip w/ pre-generated CSS.)
CDNs sound great but they're a huge privacy hole. Ask yourself: what's the profit model? Are they really just an opportunity to gather valuable data on other people's websites and browsing habits? (yes).
Same here. I never really used third party domain software. I used Google Fonts for a while, but I find it pathetic to have some font files and some CSS loaded from a different server than mine. Google Fonts (and every CDN) can be a lot slower than my own server. Sometimes, the site hangs while "waiting for google.com". Silly.
Also, Google Fonts might be Google Analytics in disguise. Who knows.
I use Piwik for tracking, but the self-hosted version. I don't even use newsletter services. I bought a cheap newsletter plugin for WordPress which I use as an autoresponder email course.
But delivery is all that matters they say. And yet, all Mailchimp and Aweber and whatnot goes 100% to my spam folder automatically. I believe the delivery argument is a myth.
The best part: Decision making is much easier. "So, your product can't be installed on my own server? Bad luck, I won't become your customer."
> I used Google Fonts for a while, but I find it pathetic to have some font files and some CSS loaded from a different server than mine
You can have the fonts and not have any requests leave your site by using something like this[1]. It downloads the Google font data so you can serve the font files and CSS from your own site.
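I don't know how the linked tool works internally, but the core idea can be sketched in a few lines: take the stylesheet the font service returns, point every remote `url(...)` at your own server, and keep a list of files to download. The function name, the `/fonts` directory, and the sample font URL below are made up for illustration.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches url(...) references that point at a remote http(s) host.
URL_REF = re.compile(r"""url\(\s*['"]?(https?://[^'")\s]+)['"]?\s*\)""")

def localize_font_css(css: str, local_dir: str = "/fonts"):
    """Rewrite remote font URLs to local paths; return (new_css, {remote: local})."""
    downloads = {}

    def swap(match):
        remote = match.group(1)
        filename = posixpath.basename(urlparse(remote).path)
        local = f"{local_dir}/{filename}"
        downloads[remote] = local  # caller still has to fetch and store the file
        return f"url({local})"

    return URL_REF.sub(swap, css), downloads

# Hypothetical stylesheet; the font URL is illustrative, not a real file.
SAMPLE_CSS = """@font-face {
  font-family: 'Open Sans';
  src: url(https://fonts.gstatic.com/s/opensans/v13/example.woff2) format('woff2');
}"""

rewritten, downloads = localize_font_css(SAMPLE_CSS)
```

This sketch assumes file names are unique; two remote files with the same base name would collide.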
In theory, the point of Google Fonts is that it does user-agent sniffing to adapt the font and css to the user's browser, to get the best rendering. You would lose that advantage by hosting the fonts yourself.
In practice, I'd set up the CSS such that modern browsers render it beautifully and not give a crap about older browsers. Or make a CSS for older browsers without that font.
> What would happen to your website (or millions of websites) if one of the CDNs that you rely on started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?
Most of ours would. Subresource Integrity means that all of our Firefox and Chrome users would get mostly blank pages if our CDN tried to pull anything. It's really hard to justify dropping our CDN when they do so much for our load times for people outside of the US (where our servers are located).
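For those who haven't used Subresource Integrity: the page carries a hash of each CDN-hosted file, and a supporting browser refuses to run the file if the hash doesn't match. A minimal sketch of generating the `integrity` value, assuming you have the file's exact bytes at build time (the helper names and example URL are made up):

```python
import base64
import hashlib

def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def script_tag(src: str, content: bytes) -> str:
    """Build a script tag pinned to the given file contents."""
    return (
        f'<script src="{src}" integrity="{sri_hash(content)}" '
        f'crossorigin="anonymous"></script>'
    )

# Hypothetical CDN URL and file contents.
tag = script_tag("https://cdn.example.com/app.js", b"console.log('hi');")
```

If the CDN serves even one changed byte, the digest no longer matches and the script is blocked, which is why our users would see mostly blank pages.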
> Ask yourself; what's the profit model?
Well, we pay them, so I kinda thought it was obvious. I suppose they could be selling data as well, but it doesn't seem like a great strategy to endanger so much of their userbase when there's already a clear and profitable monetization model.
> What would happen to your website (or millions of websites) if one of the CDNs that you rely on started quietly issuing evil code to a few, targeted users?
What would happen if EC2 started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?
What would happen if Digital Ocean started quietly issuing evil code to a few, targeted users? Would you notice? Would your users?
> hm, someone should make a simple web app that gathers names and styles of fonts and provides a zip w/ pre-generated CSS
I used http://www.localfont.com to retrieve the Open Sans font I was previously using from Google Fonts. Maybe that's the kind of webapp you are looking for.
What about application performance monitoring? Usual SaaS solutions use tracking solutions similar to Google Analytics (and are blocked by ad blockers).
All third party domains - wonderful :-). Please tell us more - would like to follow suit (am revamping my site. Where I have a different privacy protection strategy - no-one ever visits so no privacy loss)
What are you asking? How to host your own css/jquery etc?
(sorry if I'm an annoying old fart in the rest of this remark but I went back to desktop software 15 years ago (yes that sounds weird) and the last 2-4 years I have more and more trouble understanding discussions on modern web development as everything that was once considered Very Bad is now not only encouraged, but taken as the natural state of affairs - e.g. javascript for core functionality, the 'css is bad, do it in javascript' movement, 'semantic markup should not even be attempted', 'frontend frameworks', ...)
So yeah my question is not sarcastic, I'm just asking for some context.
Well partly I'm asking the same question - what am I missing? CSS and other CDN based stuff I get, but hosting my own analytics? Where does one start - what can be measured, what is worth measuring? What else is done on these 10MB JS downloads we all get these days?
I suppose there is a job of work to be done downloading the top 1 million web sites and seeing what crap comes through the door - but would be nice to know what the OP is replacing and with what so I know what is de rigueur these days
Oh OK then we're on the same page and I don't have much advice.
Maybe just that you can look at Piwik analytics - open source and you can host it yourself.
Html/dynamic pages, images, CSS, js, fonts, analytics all from the same server (or at least same domain), and as much squashed together to avoid requests, that was the 'best practice' when I was still 'current'. I don't really understand either what else there would be.
You can certainly obtain this information with your own javascript and record it on your own server. Combine it with your server logs and you can have most of the same information you would have had from GA or another hosted solution.
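The "combine it with your server logs" step could be as simple as a join on the client address. A minimal sketch, assuming both sources have already been parsed into dicts with an `ip` key (the record shapes and sample values are made up):

```python
def join_by_ip(log_hits, beacons):
    """Attach client-reported details to each server-log hit with the same IP."""
    details = {}
    for beacon in beacons:
        details[beacon["ip"]] = beacon  # last beacon per address wins
    return [{**hit, **details.get(hit["ip"], {})} for hit in log_hits]

# Hypothetical parsed records.
log_hits = [
    {"ip": "203.0.113.7", "path": "/index.html"},
    {"ip": "198.51.100.2", "path": "/index.html"},
]
beacons = [
    {"ip": "203.0.113.7", "screen": "1920x1080", "language": "en-US"},
]

joined = join_by_ip(log_hits, beacons)
```

Joining on IP alone lumps together everyone behind the same NAT, so a real setup would probably want a session identifier as well.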
Actually, GA only provides you with a [meager] subset of the data that Google gathers. Now it's their data.. not yours, and not your customers'. They won't even give up IP addresses so that you can check it against your own logs.
Yes, and although that would remedy some of the tracking/privacy concerns with GA, the (grand)parent comment's concern with bandwidth would not be solved but transformed when taking the road you describe.
> How can developers make their apps better without having the proper analytics to do so?
Of course. They can just ask people.
Let's not pretend that all this data is for developers. It's only for a) advertisers to shit more on your users, and b) sales to micromanage the site into getting more conversions, usually at the cost of utility.
That's good, you have no business knowing my screen resolution, browser language, nor a lot of other information. You just feel entitled to have them, for some unfathomable reason.
Just prepare your web{site,app} and have the webserver serve it to my browser. If it is too large for my screen, my browser has these newfangled things called scrollbars to deal with it, and I will see that my screen is too small to comfortably view content you graciously share with me.
I couldn't agree more. Why can't web developers produce websites that just serve clean standards compliant html+css(+/-javascript where really needed -- no, hijacking scrolling is not a good use for javascript) and trust my browser to display it how I like it?
> How can developers make their apps better without having the proper analytics to do so?
The same way as desktop developers do. Test things yourself and get your mother (or some non-techy person) to try it out and see how much she/he swears when attempting to use it.
As someone mentioned below, there are self-hosted alternatives to hosted analytics tools, like the free software project Piwik. Nothing in the app development process requires that you send your customer data to a third party for collection and processing.
Why don't they proxy host client-side analytics on their side, and don't put the word analytics in the endpoint names?
Most ads would be unblockable if you made the ads come from your domain and have them indistinguishable from your normal content in the URIs as far as I know.
IMO they will do this out of necessity soon enough.
This will mean advertisers can't count exact hits to their ads (or at least would be foolish to do so) so they will probably have to employ some kind of web crawler or HIT services to randomly sample the sites who are supposed to be serving their ads to make sure they are.
But eventually the blocker technology will become much better at blocking page element ads more easily and automatically. Then I guess they will have to think of something else.
This could be a huge opportunity for someone like Cloudflare. They could proxy agreed upon URLs to advertising networks and everything would originate from the same domain. Advertisers would establish a business relationship w/ Cloudflare and could trust that the traffic they are receiving is more or less legitimate.
Analytics are different from ads. When you're running analytics on your own site, you can trust your own logs. But when you are displaying adverts, the ad buyers will likely want better proof that their ad was displayed 'n' times.
They should crawl the website x times a day (sample) to check if you are displaying their ads. Or release a binary blob so that you can host their ads on your server (which reports back to their ad network once per hour).
I suggested this the last time this issue came up [1]. In the case of ads, some people are saying that it becomes a problem for advertising networks to actually verify the traffic if the connection is proxied, though I think there can be solutions developed for this given the enormous amount of money being lost. As far as analytics, this shouldn't be an issue because the primary recipient of the data is the site itself, and cheating on their own analytics would only make sense for a couple of applications (trying to sell the site based upon fraudulent analytics etc).
I may be misunderstanding this, but if my users download a pixel/whatever from my site, not from Google Analytics, then Google does not know anything but my server IP.
Or does client side mean the user downloads JS which does some investigation and reports back?
How about static ad-pictures served from the ad-network's server? The ad network can track the impression. That's how it all began back in the nineties, and it worked fine. Everyone would be happy.
For clarity, as some replies seem to have missed the point: because conforming clients will not send the cookie in the origin server request due to domain name differences.
The point is that third-party cookies will never reach your server (in this case Google Analytics' cookies), because the request they are piggy-backed on is going to Google, not to your server.
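To make the domain-name rule concrete, here is a simplified sketch of the domain-matching check from RFC 6265 that a conforming client applies before attaching a cookie to a request. It ignores path, Secure, and expiry checks, and the function name is made up.

```python
import ipaddress

def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match: would a cookie for cookie_domain go to request_host?"""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    if host == domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # suffix matching never applies to IP addresses
    except ValueError:
        pass  # not an IP address, so it is a host name
    # The cookie domain must be a suffix preceded by a dot.
    return host.endswith("." + domain)

# A Google Analytics cookie is sent to Google's host, never to yours.
to_google = domain_match("www.google-analytics.com", "google-analytics.com")
to_your_site = domain_match("www.example.com", "google-analytics.com")
```

Here `to_google` is true and `to_your_site` is false, which is the whole point: the cookie rides along only on requests to the domain that set it.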
Analytics JS is not pointless because it is part of the page/service that the visitor is requesting. It is a legitimate form of data logging just like Apache or Nginx logs and reading cookies. Logging is a necessity for troubleshooting and improving any software service.
It's no more pointless than using Adobe Typekit for fonts or pulling jquery from a CDN. Both of these use a lot more bandwidth than an analytics ping, BTW.
The point you are missing is that users don't need to send an analytics ping for the page to load. If you were to conflate your jquery load with analytics so that the page was broken, then you could prevent the use of ad-blockers (although you're just as likely to make people think your site is just broken).
The use of bandwidth is a moot point. If advertisers hadn't abused the user's good will we wouldn't be where we are today, but there's no putting the genie back in the bottle.
If we're talking about "need" then the conversation gets sticky pretty quickly.
Users don't really need webfonts; text just needs to be readable. And they don't need to grab jquery from a CDN; it can come off the server. Heck, they don't need to grab anything from a CDN--the app server can serve images just as well as Cloudflare.
Do users really need DDOS protection? No, that protects the server, not the user. And it banks their legitimate request--needlessly--through a proxy server, which certainly adds latency. Not to mention that Cloudflare can see everything they're browsing.
Do users need single-page JS apps at all? Why not just render HTML4 on the server like in the good old days?
So why do developers use these technologies? Because it makes the user experience better. And so do analytics. Without analytics, developers are flying blind. And no, Apache/Nginx logs don't capture the same data--especially with more advanced JS-heavy sites.
Users want websites that load fast and are easy to use. It is impossible to build or improve such a site without data upon which to base decisions. That's why analytics are a necessity.
That's not a valid comparison. If your site needs a custom font, the users have to download it from somewhere - the data needs to be sent. For analytics, the HTTP request can be dropped entirely, since the data can be extracted from the web server logs.
> For analytics, the HTTP request can be dropped entirely, since the data can be extracted from the web server logs.
No, it can't, at least not in general. That's what others here are trying to explain to you. It can be very useful, and in both the visitors' and the host's interests, for someone operating a site that has a lot of client-side interactivity to see what's really going on, for example.
> If you want to know who is visiting your site, try reading your server logs.