SEO Tip #20: How can I optimize for “deep web” crawling?
Matt Cutts: We recently published a paper at VLDB, which I believe stands for Very Large Data Bases, that talks about our criteria for crawling the deep web. We try to do it safely, so that if people don't want their forms to be crawled, we won't crawl them.
And so there are some very simple things you can do. Rather than having a text field that has to be typed in, like a zip code, if you can make it a drop-down, for example, that's much more helpful. And if you can make it so that it's not a huge form with 20 fields to fill out, but more like one or two drop-downs, that's going to be a lot easier as well.
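As a rough sketch of what that difference looks like in markup (the field names and values here are made up for illustration, not from the paper):

```html
<!-- Harder for a crawler: free text could be any of thousands of values -->
<form action="/search" method="get">
  <input type="text" name="zip" placeholder="Enter ZIP code">
</form>

<!-- Easier for a crawler: a small, enumerable set of choices -->
<form action="/search" method="get">
  <select name="region">
    <option value="north">North</option>
    <option value="south">South</option>
    <option value="east">East</option>
    <option value="west">West</option>
  </select>
</form>
```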
I definitely encourage you to go read the paper; there's nothing super-duper confidential in it. Of course, the best option is to make it so you're not part of the deep web at all: take those pages that are in your database and build an HTML sitemap, so that people can reach all the different pages on your site by clicking through categories or geographic areas. Then we don't have to fill out forms at all.
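A simple HTML sitemap page might look something like the following; the URLs and category names are hypothetical placeholders, but the idea is that every page ends up reachable by plain links:

```html
<!-- A plain-link sitemap: every database page reachable without a form -->
<h1>Browse all listings</h1>
<h2>By category</h2>
<ul>
  <li><a href="/category/books">Books</a></li>
  <li><a href="/category/music">Music</a></li>
</ul>
<h2>By region</h2>
<ul>
  <li><a href="/region/california">California</a></li>
  <li><a href="/region/texas">Texas</a></li>
</ul>
```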
Google is pretty good about being able to index the deep web through forms, but not every search engine does that. So if you can expose that database somewhere people can get to all the pages on your site just by clicking, not by submitting a form, you're going to open yourself up to an even wider audience.
So if you can do that, that's what I'd recommend. But if you can't, then I'd say check out the paper from the VLDB conference, where the team talks about it in more detail.