Discussing CNET’s content-pruning to improve SEO ranking (tedium.co)
141 points by thm on Aug 13, 2023 | 79 comments


One aspect of this that's overlooked in the discussion is how Google penalises pages for bad Core Web Vitals. Instead of marking a single page as bad and reducing its ranking, Google uses some opaque algorithm to bundle a bunch of pages into a group and penalises them based on the average Core Web Vitals metrics of the group. That would easily explain why old content is not worth keeping around. Older pages might not be optimised for current Web Vitals metrics, but they can bring down stuff you care about a lot.

SEO is black magic and astrology, so there's a lot of guessing going around, but I think I can vouch for this, as this particular issue happened to one of the sites I maintain. It caused the homepage to lose lots of traffic because it was bundled as related with some ancient blog posts. The homepage performed quite well according to all the metrics, but the old pages did not, and until Google started penalising groups last year it made no difference whether the old pages stayed around. Since they started penalising random groups, something that's important can be affected by something old and irrelevant, and the feedback loop is very slow.

It takes 28 days to get feedback from the web console about any potential changes to page performance, so I tried fixing the old pages in a few ways, then just gave up and deleted them. The average rose, the penalty to the whole group was gone, and the homepage again started getting a decent amount of traffic.
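
To make the arithmetic above concrete, here is a minimal sketch. Google's actual grouping logic is not public, so the page paths, the LCP numbers, and the use of a plain mean are all assumptions for illustration. The only outside fact used is the published 2.5 second "good" threshold for Largest Contentful Paint.

```python
from statistics import mean
from typing import Dict

# Published "good" threshold for Largest Contentful Paint, in seconds.
LCP_GOOD_SECONDS = 2.5


def group_passes(lcp_by_page: Dict[str, float]) -> bool:
    """Return True if the group's mean LCP is within the 'good' threshold.

    Averaging with a plain mean is an assumption made for illustration;
    it is not a description of how Google scores page groups.
    """
    return mean(lcp_by_page.values()) <= LCP_GOOD_SECONDS


# Hypothetical group: a fast homepage bundled with slow legacy posts.
group = {
    "/": 1.2,
    "/blog/2009/old-post-a": 4.0,
    "/blog/2010/old-post-b": 5.0,
}

before = group_passes(group)

# Drop the legacy posts, as described in the comment above.
pruned = {path: lcp for path, lcp in group.items() if not path.startswith("/blog/")}
after = group_passes(pruned)

print(before, after)
```

Under these made-up numbers the homepage is fast on its own, but the group only passes once the slow pages are removed from it.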


Google's relationship with blogs is downright schizophrenic.

So it seems you can be penalized for keeping old content around. You're also penalized for not generating and updating new content all the damn time.

The marketing guys I work with are always on my case about this.

"We need new content!"

"We have nothing to say on our corporate website that people will want to read."

"It doesn't matter. Write stuff anyway, for SEO."

"You're not going to get me to publish blogspam just because you think it's what Google wants."

"B-but you don't understand! You need to. Everybody else is doing it!"


Fortunately LLMs should end this idea of ranking content. It’s going to poison the well so badly search engines will have to rank on everything but content. Except links already proved too spammy and Google had to change gears to prioritize content. I wonder where that leaves them now.


It will leave them in a tarpit of their own making. Google was useful only in a world where content quality mattered.


> older pages might not be optimised for current web vitals metrics

For news sites that's unlikely to be the reason.

Generally speaking, old online pages are the same as new online pages, because all the article content is just pulled from a database, a CMS. All the rest of the page (menu, ads, footer, etc.) is modern and kept up-to-date. It's not like an article from 2007 is using the same HTML that it was in 2007.

Just because you'd left old blog pages on some ancient, clunky WordPress (?) version on your personal blog doesn't mean that's how large publishing sites operate.


It’s not that simple. Older content might not have WebP image variants available, or styling might be optimised for newer stuff and cause old pages to reflow, and many other things like that.

Key web vitals change over time and it’s often a mix of content, styling and scripting that needs to be optimised to fix CWV issues. Revisiting old content and rebuilding old assets might not be worth it.

I’m not talking from a perspective of maintaining a personal blog, but a product web site that currently gets about 1.5 million visits from organic search per month. It’s not CNET, granted, but not a trivial site either.


A large website wants to have its current content cached as much as possible—ideally very few users should be hitting the CMS directly—but it doesn't make sense to cache content from 1995 on the off chance that today is the day someone needs it. In fact, to speed up database performance on your current content it may even make sense to store your archives in a different, less beefy database.

However, if Google penalizes slow performance on any page, even archives, these strategies could damage your search rankings even for your current content.



A lot of CMS software allowed you to add custom HTML, so there are hundreds of thousands of images with "float:left" hardcoded, among much other deprecated HTML that Google Search doesn't like.


I don’t have SEO expertise, but I did dabble in it a little on an as-needed basis and listened to a few experts, etc.

And the people dismissing the idea that pruning older pages has an SEO benefit don’t know what they’re talking about, because frankly the only way to know whether it has an effect is to try it.

Also, there seems to be a segment of people laughing at CNET because Google explicitly says this will not help. Well, these people know even less, because Google absolutely lies, both by making knowingly false statements and, more often, by making statements that are correct by the letter but not in spirit.

So, for example, Google might say it does not penalize a site for having too many older pages at all. And this may be true (or it may not, but let’s assume it’s true).

But it’s entirely possible that eliminating older pages does improve CNET’s SEO scores. It’s entirely possible that while Google does not penalize old pages, it penalizes websites that have a lot of pages that look similar. So, for example, CNET probably had 15 articles which sound very similar for each version of the iPhone that is released. They have a bunch of review articles for different products in the same category that look very similar. They have articles comparing the 2021 Lenovo with the 2023 model that look similar.

And Google’s algo may start thinking it’s an SEO-targeting content farm and hurt CNET’s SEO score accordingly.


We should not be optimizing for search engines. We should be optimizing for humans. If Google wants to shoot themselves in the foot by refusing to show their users the content they're looking for, let them. Don't let Google drag you down with them by encouraging you to make your site worse.


That's all good in theory and all, but we all know that the first step for most humans is to type a search into Google, and that humans refuse to look at anything outside the top page of hits (maybe the first ~5 entries).

The name of the game, if you're trying to serve humans, is to get to the top ~5 entries on Google.


My website is https://robbyzambito.me.

Did you discover that from Google? Did you refuse to look at that link because it wasn't presented as a top 5 hit on Google? Are you not human?

My point is that information can be propagated outside of search engines very easily. Every link on Hacker News is presented to you outside of a search engine.

Optimize for humans. Tell humans that you made something. If Google doesn't want to service their users, that's not a problem you should work on fixing.


Are you seriously suggesting that manual word of mouth marketing over niche forums has the same reach as hitting the top 5 hits on Google through Search Engine Optimization?


No. I'm arguing that "reach" is not something you should strive for. Quality is.


Seriously, stop and think about what you're saying.

These are companies looking to make money to pay staff. Not die on a hill of what they wish reality was, rather than what reality actually is.


Seriously, stop and re-read the above. You think the current incentives are natural and we must accept this?

No one is telling any individual company to do a specific thing in a vacuum. The suggestion is we re-shape the incentives we have created as an industry and society. It's about values, and we should bucket the discussion because the current batch of MBAs in charge at these companies, who will likely all be gone in 18 months no matter what, are focused on the immediate? Come on. That's not how anyone should determine what their values are.


I've re-read it.

There's no sign of that suggestion in this thread. The GP was talking about today, and has never mentioned huge sweeping policy changes from multiple governments around the world to make Google's business model obsolete.


When did the conversation turn to nature?


If I were to elect you to be the new CEO of CNET tomorrow, what would be your action plan?

Because so far it’s not very promising, and “just let the company die” is not a strategy people are going to implement.


I would leave, because I do not want to be the CEO of CNET.


"Strive for quality!"

"Okay, how?"

"You'll figure it out. Bye!"


No, they won't figure it out. That's why I'd leave. I believe quality is incompatible with the size of their "target audience". You cannot make everyone happy.


That's all well and good, but CNET is an advertising-supported website, so "reach" is its revenue stream.


Okay. And do you think that's a good thing?


Ah yes, HN would be so much better if every company on Earth constantly posted links to their sites in comment threads.

Information can and does get propagated outside of search engines very easily. That’s the premise of huge industries like email marketing, social media, banner ads, and PR. Are these marketing channels universally beloved by humans? Not exactly.


I don't know why you think anyone suggested every company on Earth should post links to their sites in comment threads. I am not every company on Earth. I am me. My comment was meant to demonstrate me talking as me. Not as anyone else.

I think the disconnect here is that you are assuming everything should scale infinitely. I.e., if one company posts, every company should post. If one company sends out spam emails, every company should send out spam emails. If one company tries to make their website #1 on Google for a given search, every company should do the same.

My whole point about talking to your target audience was meant to directly contradict this. Your target audience can only be a finite set of people, because you can only talk to a finite set of people. Otherwise, you're just producing noise and hoping someone will listen to it.


The point of the joke is that what works for you, won’t work for companies. Different goals, different reactions.

The irony to this whole thread is that CNET is actually highly targeted about their audience development, which is how they decided which content to prune in the first place.


> what works for you, won’t work for companies

Citation needed. I can tell you about lots of companies that have existed for years that I never found through Google.

> Different goals, different reactions.

I don't know what you mean by "reactions" here, but I don't think that their goals are good if they end up making what they produce worse in the eyes of their target audience in order to meet that goal.

> CNET is actually highly targeted about their audience development

Some people are saying their audience is tens of millions of people, and others are saying their audience is "highly targeted". Which is it? Those two claims are extremely incompatible.


Coming in with a discussion about CNET and then talking about your own personal website seems... misguided?

I don't know if CNET is making the right call here. But I can at least understand the logic and thought process they went through, however misguided it appears.


That sounds instead like optimizing your car for the parking lot. If Google doesn't see your site, you've got to put 100x the effort into marketing that site.


And how do you propose making that happen?


1. Identify a target audience

2. Identify what that audience wants (by asking them)

3. Build what the target audience asked for if you agree that it is a sensible thing to create.

4. Tell the target audience that you have built your thing.

Note that "target audience" here is probably not Google, but on all 4 points, SEO replaces "target audience" with "Google". Pruning old information is not something that any group of humans has requested; only Google has.


If your target audience is a close set of friends, family, professional associates, or other well-defined community, then yes, reaching out through that specific network is probably the best way of reaching it.

Which in modern parlance is what is being accomplished by newsletters, mailing lists, and various social media feeds.

But if you're in the business, as in commercial enterprise, of reaching out to a large and generally undefined audience, or are responsible for government or NGO services, etc., general Web search is among your most valuable marketing mechanisms. And that sector includes literally billions of people and hundreds of billions of dollars worth of transactions or equivalent value.

And manual outreach through a small group ... really isn't effective.

I'm not saying that your specific use case is invalid. I am saying, as someone with a deep and abiding animosity to virtually all modern marketing, advertising, and SEO, that modern marketing, advertising, and SEO do in fact have a place. And that your comments here are largely asserting otherwise.


> But if you're in the business, as in commercial enterprise, of reaching out to a large and generally undefined audience

I don't think people should skip step 1 in the name of making a buck. Having a target audience should be critical for a commercial endeavor.

> And manual outreach through a small group ... really isn't effective.

Why do you think that?


Target audiences are identified in mass-marketing campaigns; the term of art is market research.

Your second question simply doesn't deserve a response.


> Your second question simply doesn't deserve a response.

Clearly it did, and quite a disrespectful one at that. I guess (and I have to guess, since you won't tell me) you think that because you can't talk to many people, and because you think it is self-evident that businesses should target many people.

I do not think that is self-evident. I think instead of few people targeting many people, many people should target few people. You have probably heard that you can't make everyone happy. That's true. But it becomes less true the smaller the group that "everyone" is. If your target audience is a handful of people, you can make them all happy. You can even potentially individualize what you produce to meet the needs of specific individuals. You cannot do that when your target audience is a billion people.

Be nicer next time.


Isn’t the most effective way to do 4 to get your thing at the top of Google?


If I want to tell my mom something, the most effective way is not to try to sneak my message into some search result of hers.

When I say "target audience" and "humans" I mean it in the most literal sense. The most effective way to tell people that you built something for them is to tell them.


Let's make this concrete. Let's say you're CNET—your target audience is millions upon millions of people who are interested in tech news. What is your alternative strategy for telling these tens of millions of people that you have a new article on topic X?

Note that only a tiny fraction of your audience is already subscribed to some form of push notification from your service.


> What is your alternative strategy for telling these tens of millions of people that you have a new article on topic X?

None. I don't know tens of millions of people.

Why do you think that CNET - a relatively small group of people - should have the power to reach and influence tens of millions of people, even if only in a small way per person? I don't think that's healthy for anyone involved.


I see. So when you refer to your mom you meant that literally—humans should only ever strive to do business with people they know directly.

I happen to disagree, but it's obvious that we're working from such completely different axioms that there's no point in arguing this further.


The premise you're suggesting is only the most effective way to tell a small number of people (i.e. tell them directly).

That's why radio, TV, and newspaper advertising was so effective for so long.

Even for a small convenience store, larger scale - mass audience, automated - advertising is almost always a necessity and a far more effective use of time.


3 days ago: https://news.ycombinator.com/item?id=37068464 ("CNET is deleting old articles to try to improve its Google Search ranking", >500 comments)


Ouch. Google's SEO PR rep really messed up the damage control. So many non-answers and links that essentially say "I'm not an engineer but trust me, the engineers I talk with say you're all wrong."


Welcome to Google's SEO PR strategy since their inception. If Google says to do something, you should try to do it. If they say NOT to do something, you should definitely do it.


I don't know much about SEO or content pruning, so I make no judgment on the conclusion of this article.

What I can say is that this is genuinely one of the worst-written pieces I've ever read. Nowhere does the author provide any sort of argument for their conclusion. At no point during that long, rambling rant did the author give a single reason why we should agree with them, nor did they propose alternative solutions to the problem that CNET is facing.

If this were written for a high-school english class it would receive a failing grade.


By any chance, did you stop reading at the transition that looks awfully like the beginning of an ad box or some sort of call to action for whatever they’re presumably selling? The article actually continues and somewhat substantiates its argument, but I agree that if you read only what appears to be the main body of the article, there’s very little there.

The whole thing isn’t terrific either, but the page layout further detracts.


It also displays a hilarious real live instance of the metaforgotten trope.

"A good marriage is like an orange"


Even if this strategy is good in terms of SEO, they could have just archived those articles on a different domain.


I wonder if also adding a redirect would still incur an SEO penalty.


Yes, sort of. 'Crawl budget' issues.


Can they not just noindex them?


Sure, and I agree. But also, CNET has gotten more attention because of this stunt than anything else in the last decade because people WON'T STOP TALKING ABOUT IT. So while this technique probably won't help anyone else the same way, for CNET specifically I bet this has absolutely increased their Google traffic.


> people WON'T STOP TALKING ABOUT IT

Maybe this is anecdotal on my part, but I’ve heard nothing else about this since the initial article [1], then this — by the same author.

On the other hand, maybe it’s anecdotal on your part, and not really that many people care, besides maybe this one author.

> has absolutely increased their Google traffic

I really doubt it, unless you have something to back the claim up.

But even if so, this type of attention won’t sustain, and long term, it will likely only hurt them.

[1] https://news.ycombinator.com/item?id=37107149


Will it though? I don’t visit CNET and I still won’t. Google isn’t going to rank any of their articles higher.


It helps them by generating a ton of links and mentions which Google still uses as a signal for authority. Getting high quality links from all sorts of places around the web is worth its weight in gold. So with these new links it will help to improve their rankings for a wide variety of keywords.


This isn't the kind of attention an (allegedly) journalistic outlet benefits from.


How the hell did we get here? Were Microsoft and Oracle and all these super rich tech giants asleep when Google was eating their lunch? Facebook too. It’s nuts.


I think Oracle is/was run by salespeople. They wanted you to pay a fortune for a database. Even if you could use a free database, they wanted you to pay a fortune for theirs. Because theirs could do a barrel roll and a hand-stand at the same time.

Then of course when sales went down, you can always threaten and extort your current customers: https://archive.is/Tbvw0

Microsoft just wanted to protect their Windows thingy and pretend the internet didn't exist. They couldn't, so they tried to embugger it. Once they had dominating market share with IE6, they simply said they weren't making money on IE and weren't going to update it any more. So they left it bug-ridden for 5 years until the next release.

Google shows up, with all their bondage and evil intentions carefully wrapped in satin and free beer on tap. But now the beer is finished and the satin is worn out. Everybody is hungover and tied up. The sound coming from the next room is like something is being strapped on.


The thing is, even though I distrust Google a lot, I distrust Microsoft, Oracle and Facebook even more. Even if Bing got 2x better than Google at being able to show me what I actually wanted to find, I'd probably prefer Google for quite some time before even thinking of giving Bing a shot.

For all I know, they do have better search engines than Google right now. But also, fuck those guys.


No, almost everybody saw Google rising. Their capture of the search market and the relatively quick translation of sales from that was a very prominent event in tech circles. And Facebook wasn't relevant during the initial establishment and rise of Google (Facebook's first billion dollar sales year was 2010, 12 years after the founding of Google).

Steve Ballmer (then CEO of Microsoft) swore he'd kill Google and bury Eric Schmidt - this is 2005 or earlier. Google's sales for 2004 for reference were $3.18 billion (for Alphabet now it's $282 billion).

https://arstechnica.com/information-technology/2005/09/1106/

https://www.cnet.com/tech/tech-industry/court-docs-ballmer-v...


I'd be surprised if search engines don't penalize websites that 404 previously crawled pages. They'd want to avoid the risk of a page disappearing and causing a poor user experience.


They clearly couldn't care less about user experience. This is all about selling ads. The web was a beautiful thing for a moment in time, but the rotting corpse that is left after "adtech" did its thing is an anti-user monstrosity, to a degree that is almost comical.


> an anti-user monstrosity

The web today has 6 billion users instead of maybe 50 million in the golden age you're reminiscing about.


Can that be attributed to advancements of the web, or just the incredible advancements and price drops in the devices used to access it?


It depends on the pages. Search engines actually reward sites for 404ing previously crawled pages, if the pages had little value in the search index.

Whether that is done on purpose, or is an unintended side effect of other ranking decisions, is up for debate.


Not necessarily 404. They could redirect (301) to other, better-ranked similar content.


That doesn't work for a news site—unless they're doing a ton of duplicate stories for some reason, "similar" isn't good enough if someone is looking for an article to go with a specific headline they found through search.


I don't know why you're being downvoted. Link rot is a cancer we should all work against.


Websites get redesigned, companies go out of business, etc. It's a nice principle that links should live on forever, but it's not very practical for a decentralized web (which many profess to be in favor of).


>Websites get redesigned, companies go out of business

These two things are not remotely the same. If you are redesigning your website then it's your responsibility to make the old links redirect to the new pages. If you don't do that, it's a broken redesign.
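
As a rough sketch of what "make the old links redirect" can look like in practice, the idea is a lookup table from pre-redesign paths to their new homes, answered with a permanent redirect. Every path below and the `resolve_legacy` helper are hypothetical, and a real site would usually do this in its web server or CMS routing rather than in a standalone function.

```python
from typing import Optional, Tuple

# Hypothetical mapping from pre-redesign paths to their new locations.
REDIRECTS = {
    "/news/2007/some-review.html": "/reviews/some-review",
    "/about.php": "/about",
}


def resolve_legacy(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a request to an old-style path.

    301 means the page moved permanently to `location`;
    404 means there is no known mapping for this path.
    """
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 404, None


print(resolve_legacy("/about.php"))
print(resolve_legacy("/never-existed"))
```

Keeping the table as data makes it easy to audit after a redesign: any old URL that still resolves to 404 is a link the redesign broke.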


Yes. Things should not die, we should all live forever, as bits and bytes.


> A news website is not a bonsai tree

well said


But if traffic originators (e.g. Google) have a bonsai section that comprises the front half of their results... you can't fault a lot of non-bonsai trees for considering how they might qualify.

At the end of the day, this is Google's mess.

The article summarizes the situation well (clarification added by me):

>> I think CNET is making a mistake, but they’ve also identified a serious problem for the digital news ecosystem: Old content is a burden to carry [because it lacks a viable economic model], and intentionally or not, Google isn’t making that lift any easier.


> you can't fault a lot of non-bonsai trees for considering how they might qualify.

I will 100% fault them for acting on it. Just like I'll fault a company for choosing terrible single use plastics that increase their bottom line a bit, and companies that use AI to reduce head count. You can disagree with a decision even if you understand why it was made.


Companies should be using AI to reduce head count whenever possible. Welfare should be provided by the state, not privatized. Productivity gains are good.

What if people are put out of work by dropping single-use plastics?


> A news website is not a bonsai tree

well said

If the big box store's garden center only sells bonsai trees, then all of the growers will switch to bonsai trees.

Google is the big box store of the internet.


The main argument here is probably that news sites should have some obligation to the historical record. Normal product companies should arguably archive old product documentation in some form but relatively few would argue that they shouldn't regularly prune marketing literature, whitepapers, and other information that gets stale and is in the way of finding current stuff.



