SEO Tip #133: How Can I Make Sure That Google Knows My Content Is Original?
Matt Cutts: I can get into a lot of interesting stuff about how to crawl the web. If you really want to know about a signal, the Nyquist rate says you want to sample it at two times that frequency, but the fact is you can always change a webpage. So the perception of being able to crawl the entire web and have a perfect copy at every instant is a little bit flawed, because at any time we can only go and fetch a certain finite number of pages.
If we tried to fetch them all, and our architecture can almost support that, then the web might crash from all of those requests. We try to crawl in a relatively polite way. We also try to prioritize based on things like the PageRank of a particular page, or maybe a site as a whole might have a lot of PageRank.
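To make the idea of a finite fetch budget concrete, here is a minimal sketch of choosing which pages to fetch when only a limited number can be crawled in one pass. It is an illustration only, not a description of how Google's crawler works. The function name `plan_crawl`, the example URLs, and the numeric scores are all hypothetical; the scores stand in for a PageRank-like priority.

```python
import heapq


def plan_crawl(pages, budget):
    """Return up to `budget` URLs, highest priority score first.

    `pages` maps a URL to a hypothetical priority score (a stand-in for
    something like PageRank). `budget` is the number of pages that can
    be fetched in this pass.
    """
    # nlargest returns the top `budget` items, sorted from highest score down.
    top = heapq.nlargest(budget, pages.items(), key=lambda item: item[1])
    return [url for url, _score in top]


if __name__ == "__main__":
    # Hypothetical sites: A scores high, B (the original author) scores low.
    scores = {
        "https://a.example/copied-article": 0.9,
        "https://b.example/original-article": 0.2,
        "https://c.example/other-page": 0.5,
    }
    print(plan_crawl(scores, budget=2))
```

With a budget of two, the low-scoring page on B is not fetched in this pass, which is how a copy on a frequently crawled site could be seen before the original.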
The question is essentially: if A is getting crawled a lot but the original article starts on B, what if A rips off B? There are ways you can help to guard against that. For example, if you do a tweet, people will see it. People may link to it, and we may follow those links faster than we would discover it the other way.
Another thing you can do is hook up things like PubSubHubbub, which will ping various places. We use PubSubHubbub only in a very limited way to help improve our crawl, and that might change over time. It's a great way to asynchronously say, "Hey, there is a new article," or "There is a new blog post."
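As a rough sketch of what such a ping looks like, the snippet below builds the form-encoded "publish" notification that PubSubHubbub hubs conventionally accept: a POST with `hub.mode=publish` and `hub.url` set to the updated feed. The hub address and feed URL are placeholders, `build_publish_ping` is a hypothetical helper name, and nothing here implies that a given hub or search engine will act on the ping.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) a PubSubHubbub publish notification.

    `hub_url` is the hub's endpoint and `topic_url` is the feed that has
    new content. Both are supplied by the caller.
    """
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URLs, assumed for illustration only.
    ping = build_publish_ping("https://hub.example.com/", "https://example.com/feed.xml")
    print(ping.get_method(), ping.full_url)
    print(ping.data.decode("ascii"))
    # To actually send it: urllib.request.urlopen(ping, timeout=10)
```

Building the request separately from sending it keeps the example runnable offline; in practice a blog platform or feed plugin usually sends this ping automatically when a post is published.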
But let's go ahead and play with this hypothetical scenario. If A has copied your article and changed the time stamp, that's a little bit deceptive, you know. It's as if they are claiming that they have written it, so you can do a couple of things.
If you have already authored that article, you can always do what is known as a Digital Millennium Copyright Act notice, where you send in a DMCA request. You can find the information at Google.com/DMCA.html, and basically what you are saying is, "This site copied me, but I'm the original author." Now this other site can counter-notify and say, "No, I wrote this page," which has some penalties to it if they are lying, or they can choose not to dispute it, and the stuff disappears off the other site. So if someone is ripping you off, you can always do a DMCA notice.
Also, if it's an auto-generated site that is ripping off or scraping a bunch of people, you can do a spam report, because that's not a high-quality site and it's not the sort of thing we want to have in our index.
But let's just play it all the way out to the corner case: it is possible that we will find an article on one site before we find it on the other site. It is definitely the case that we try hard to find out who the original creator of a particular piece of content is, but I wouldn't claim that we are perfect. We do as much as I can think of to try to figure out the ways that people can indicate that they wrote the content.
In fact, in Google News we just introduced a couple of new tags, almost as an experiment, to see how well it works to say, "Here is the original author of this content." So there are approaches we are exploring to figure out if there are other ways to do that, but at least for the time being, in theory it is possible for a copy of an article to be found before the original. In practice it tends not to happen that often, and you do have ways you can get around that or ways that you can take action, from a DMCA request all the way up to a spam report.
Hope that helps.