For a project, we want to exclude some information on a page so that it does not appear in Google search results. Above all, we want to hide the usernames in the comments, so that those users cannot be found via Google. This is mainly meant to protect the users (and also to avoid customer-care calls from people who find themselves on Google, on pages where they don't want to be found).
There are several possible solutions to this problem, and we suspect we haven't found the perfect one yet, so maybe one of our readers has some insights.
Here are our solutions:
- Remove the usernames from the page when Googlebot is detected. Could work, with no negative impact on the general visitor. Just two questions: if Googlebot disguises itself as a different user agent, we deliver the usernames to it anyway, so what does it do with them? Add them to the index? Or treat the site as “you deliver different results to Google than to your visitors. You're bad. You lost your karma”? (Which we have to avoid, of course.) I doubt such a small change will trigger that alarm or that the names will end up in the index, but nobody knows for sure (at least I didn't find anything on it).
Conclusion: Could work, but with an unknown risk of a penalty.
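A minimal sketch of this first approach, assuming a Python backend with access to the request's User-Agent header. The function names and the "Anonymous" placeholder are hypothetical. Keep in mind that anyone can send any User-Agent string. Also, serving crawlers something different from what visitors see is what Google calls cloaking, which is exactly the risk described above. Google documents a reverse DNS lookup for verifying a genuine Googlebot, and this sketch does not perform one.

```python
import html
import re
from typing import Optional

# Google's crawler sends a User-Agent containing the token "Googlebot", e.g.
# "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)".
# Anyone can send this header, so a match is only a claim, not proof.
GOOGLEBOT_RE = re.compile(r"googlebot", re.IGNORECASE)


def is_googlebot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header claims to be Googlebot."""
    return bool(user_agent) and GOOGLEBOT_RE.search(user_agent) is not None


def comment_author_html(username: str, user_agent: Optional[str]) -> str:
    """Render a comment's author, hiding the name from (self-declared) Googlebot."""
    if is_googlebot(user_agent):
        # Hypothetical placeholder shown to the crawler instead of the name.
        return '<span class="author">Anonymous</span>'
    # Regular visitors get the HTML-escaped username as before.
    return f'<span class="author">{html.escape(username)}</span>'
```

Called with the Googlebot string from the comment, `comment_author_html` returns the placeholder. With an ordinary browser User-Agent, it returns the escaped name.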
- Use images instead of text. As long as Google doesn't do OCR, this works. The general visitor can't copy and paste the name, but apart from that it works for them. Blind people, however, are left out, and accessibility is important on that site.
Conclusion: Works, but is not accessible and may look very strange.
- Use JavaScript to write the name. We already do that for email addresses as spam prevention on our website (and generally in the CMS). Mine looks something like and the function “obfscml” then deobfuscates it (as you can guess, the algorithm is not very fancy :)). I thought that would work (and it certainly does against most spam email harvesters), but when I searched Google for this email address, I was quite surprised to find that page in the top spot. So Google does execute the JavaScript on the page. It isn't just applying regular expressions or simple parsing, as I had read on other pages; the obfuscation is not simple enough to be undone that way.
So I went one step further: I moved the actual function into an external JavaScript file and, just to make sure, excluded that file from Googlebot via robots.txt. A week later, the address was gone from the index. So that worked (and if it hadn't, Google would not be playing by the robots.txt rules, and I trust them that much).
Conclusion: It works if you put the JavaScript function into an external file and exclude that file via robots.txt. People with JavaScript disabled don't see the name, but everyone else does and won't notice anything. Still not perfect.
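A minimal sketch of the server side of this setup, assuming a Python backend. The encoding scheme (UTF-8, reversed bytes, base64) is a hypothetical stand-in and not the "obfscml" algorithm. The decoder file `/js/names.js` and its `writeName()` function are also hypothetical. The HTML contains only a call with the encoded name. The function that decodes it lives in the external file, which robots.txt keeps away from Googlebot. The stdlib `urllib.robotparser` is used to check that the rule blocks only that file.

```python
import base64
from urllib.robotparser import RobotFileParser

# Hypothetical location of the external script that defines writeName().
DECODER_SCRIPT = "/js/names.js"

# robots.txt rule keeping Googlebot away from the decoder script only.
ROBOTS_TXT = f"""\
User-agent: Googlebot
Disallow: {DECODER_SCRIPT}
"""


def encode_name(name: str) -> str:
    """Obfuscate a name: UTF-8 encode, reverse the bytes, base64-encode."""
    return base64.b64encode(name.encode("utf-8")[::-1]).decode("ascii")


def decode_name(token: str) -> str:
    """Undo encode_name(); writeName() in the browser would mirror these steps."""
    return base64.b64decode(token)[::-1].decode("utf-8")


def author_script_html(name: str) -> str:
    """Emit a writeName() call instead of the plain name.

    The page is assumed to load DECODER_SCRIPT once via a <script src=...> tag.
    If that file is not fetched, writeName is undefined and no name is written.
    """
    return f'<script>writeName("{encode_name(name)}");</script>'


def googlebot_may_fetch(path: str) -> bool:
    """Check a path against ROBOTS_TXT using the stdlib robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("Googlebot", path)
```

With this rule, `googlebot_may_fetch("/js/names.js")` is `False`, while ordinary page paths stay crawlable. `decode_name(encode_name(name))` round-trips, including non-ASCII names.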
- Last but not least: a combination of images and JavaScript. You use images and write the name into the alt attribute with JavaScript. The only people who can't read the name are blind people with JavaScript disabled (which, AFAIK, is rarely the case). But still: it only works until Google does OCR, and it may look alien on your page.
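A minimal sketch of the markup for this combined approach, assuming the server has already rendered each username into a PNG. The URL `/names/<id>.png`, the `username` class and the `data-name` attribute are hypothetical. The name is encoded with a hypothetical scheme (UTF-8, reversed bytes, base64). A client-side script, kept out of Googlebot's reach via robots.txt as described above, would decode `data-name` and copy the result into `alt`. Until that happens, the empty `alt=""` makes screen readers treat the image as decorative.

```python
import base64


def encode_name(name: str) -> str:
    """Obfuscate a name: UTF-8 encode, reverse the bytes, base64-encode."""
    return base64.b64encode(name.encode("utf-8")[::-1]).decode("ascii")


def name_image_html(user_id: int, name: str) -> str:
    """Markup for a username rendered as an image, with an initially empty alt.

    The alt text is meant to be filled in by a hypothetical client-side script
    that decodes data-name; the plain name never appears in the HTML source.
    """
    return (
        f'<img class="username" src="/names/{user_id}.png" '
        f'alt="" data-name="{encode_name(name)}">'
    )
```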
That's what we came up with. None of it is really satisfying, and as far as I know there's no official solution from Google. Excluding whole pages from Google is easy, but excluding just parts of a page is almost impossible without dirty hacks.
If anyone comes up with a more elegant solution, we'd really like to hear it. The comments are open.