4 tips for technical SEO
Ian Lurie Apr 30 2012
We’re all talking a lot about content and social media these days. But visibility boils down to technical SEO: ensuring that search engines can easily find and categorize every page of your site. These are the four checks I run first when I’m auditing a site:
1: Check server response codes
You can use our server response code checker for starters. Just make sure your server is delivering a 404 for a broken link, a 200 for a page that’s just fine, and a 301 for a permanently moved page. You can learn more about server response codes in this post I wrote a couple of years ago.
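If you’d rather script the spot check yourself, here’s a minimal sketch in Python. The URLs are placeholders, and it assumes the third-party requests library is installed:

```python
# A minimal response-code spot check. The URLs below are examples;
# swap in your own pages, redirects and known-broken links.
import requests

urls = [
    "https://www.example.com/",              # a healthy page should return 200
    "https://www.example.com/old-page",      # a moved page should return 301
    "https://www.example.com/no-such-page",  # a broken link should return 404
]

for url in urls:
    # allow_redirects=False so we see the 301 itself,
    # not the 200 of the page it forwards to
    response = requests.head(url, allow_redirects=False)
    print(response.status_code, url)
```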
2: Seek and destroy duplicate content
Duplicate content hurts site quality and crawl efficiency. You need to get rid of it. We have our own crawler for testing this, but you can use Screaming Frog SEO Spider. Distilled has a fantastic article that includes detailed instructions on using Screaming Frog to find duplicates.
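If you want a quick sanity check before firing up a full crawler, hashing each page’s raw HTML will catch exact duplicates. A minimal sketch, again with placeholder URLs and the requests library; note it won’t catch near-duplicates the way a real crawler will:

```python
# A rough exact-duplicate check: fetch each page and hash its HTML.
# Identical hashes mean byte-for-byte duplicates. Near-duplicates
# (boilerplate tweaks, reordered markup) need a real crawler.
import hashlib
import requests

urls = [
    "https://www.example.com/page-a",
    "https://www.example.com/page-a/",       # trailing-slash duplicate?
    "https://www.example.com/page-a?ref=1",  # query-string duplicate?
]

seen = {}
for url in urls:
    digest = hashlib.sha1(requests.get(url).content).hexdigest()
    if digest in seen:
        print(f"Duplicate: {url} matches {seen[digest]}")
    else:
        seen[digest] = url
```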
3: Find unreachable pages
Site owners find all sorts of ways to make pages vanish. They orphan pages; they break links; they remove all possible ways of reaching a specific page. You need to find those pages.
There’s no one fantastic way I know to do this. But some of the tricks I use are:
- Search the server log files for every unique URL loaded over a 6-month period, then compare that list to all unique URLs found in a site crawl (there’s a sketch of this comparison right after this list). People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned, so those pages will still show up in your log file even though a crawler can’t reach them.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, along with the URL generated for each. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
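Here’s the log-vs-crawl comparison from the first trick as a minimal Python sketch. The file names and the combined Apache/Nginx log format are assumptions, and it expects your crawl export to list one URL path per line:

```python
# A minimal log-vs-crawl comparison. Assumes a combined-format access
# log (access.log) and a crawl export with one URL path per line
# (crawled_urls.txt); both file names are placeholders.
import re

# The requested path sits inside the quoted request line,
# e.g. "GET /some/page HTTP/1.1"
request_re = re.compile(r'"[A-Z]+ (\S+) HTTP/')

log_urls = set()
with open("access.log") as log:
    for line in log:
        match = request_re.search(line)
        if match:
            log_urls.add(match.group(1))

with open("crawled_urls.txt") as crawl:
    crawl_urls = {line.strip() for line in crawl if line.strip()}

# URLs visitors reached that your crawler never found:
# likely orphaned, blocked or unlinked pages
for url in sorted(log_urls - crawl_urls):
    print(url)
```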
4: Look for spider traps
Content management systems like WordPress have lots of extra little snippets of code they use to schedule tasks, deliver content via AJAX, handle searches and generate navigation. That’s fine, but if a search bot starts beating the poop out of some of these snippets, they can suck the life out of your server.
Here’s an example: A few weeks ago, RandFish was kind enough to tweet about a post we’d written on this very blog. That same day, I had an article go live on TechCrunch. As a result, we got about 5x our normal traffic. No big deal.
Unless, of course, you’ve already got GoogleBot rattling around between a WordPress AJAX script and a database scheduler every 15 seconds or so. Then your server coughs, sputters and flips over on its back, waiting for a tummy rub. It also locks up so badly that no amount of cursing or talking nice will get it to let you log in and fix it, by the way. In case you were wondering. And I know you were.
When I looked at our log files for the last month, I found two URLs that GoogleBot kept hitting: wp-cron.php and admin-ajax.php.
‘Kept hitting’ means ‘latched onto like a leech at a blood bank’. GoogleBot hit these files 4-5 times per minute.
We disallowed them in robots.txt, and voilà: no more crashing server.
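The post doesn’t show our exact rules, but the fix looks something like the Disallow lines embedded in this sketch, which uses Python’s standard robotparser to confirm the two scripts are actually blocked (the paths assume a default WordPress install):

```python
# Verify that robots.txt rules block the two problem scripts.
# The Disallow lines are illustrative; adjust the paths to your install.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /wp-cron.php
Disallow: /wp-admin/admin-ajax.php
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for path in ["/wp-cron.php", "/wp-admin/admin-ajax.php", "/a-normal-post/"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(path, verdict)
```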
That was a classic spider trap: Pages or scripts no bot should find, but did.
Check your log file BEFORE your site crashes and you can avoid our embarrassment.
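One way to do that, as a minimal sketch: count Googlebot requests per URL in your access log and eyeball the top of the list. The log file name and combined log format are assumptions:

```python
# Count Googlebot requests per URL in a combined-format access log.
# The URLs it hits most often are your spider-trap candidates.
import re
from collections import Counter

request_re = re.compile(r'"[A-Z]+ (\S+) HTTP/')
hits = Counter()

with open("access.log") as log:
    for line in log:
        if "Googlebot" in line:  # crude user-agent match
            match = request_re.search(line)
            if match:
                hits[match.group(1)] += 1

for url, count in hits.most_common(10):
    print(f"{count:6d}  {url}")
```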
Lots more
These four tips are just for starters. You can check for broken links, work on site speed and clean up your code, for example. But once the really easy stuff is addressed, the four ideas above should keep you busy for a while.
What do you all look for in a technical site audit?

Ian Lurie
CEO
Ian Lurie is CEO and founder of Portent Inc. He's recorded training for Lynda.com, writes regularly for the Portent Blog and has been published on AllThingsD, Forbes.com and TechCrunch. Ian speaks at conferences around the world, including SearchLove, MozCon, SIC and ad:Tech. Follow him on Twitter at portentint. He also just published a book about strategy for services businesses: One Trick Ponies Get Shot, available on Kindle.
Good to know. This is the first time I’ve heard about spider traps. I have already disallowed /wp-admin, which has admin-ajax.php, and the wp-cron.php file too :D
Thanks for the advice.
Ian,
Great tips on some technical SEO practices that largely fly under the radar. I’m all about focusing on developing and sharing great, value-adding content, but these structural and technical fixes set the stage for that content to garner the SERP ranking it deserves.
I appreciate your willingness to share!
Thanks for taking the time to discuss this. I have to tell you, you are right on. Do you mind if I link to this blog from my newsletter?