Google Plays Nice: XML sitemaps, images, and a mystery

Ian Lurie Jul 20 2010

Last month, Google announced that they’re now accepting mixed-media XML sitemaps: You can put images, video and regular page URLs all into the same map.
I saw this and rubbed my hands together while cackling maniacally. I could finally make sure that valuable images that clients paid for, or paid to have taken, would get indexed!
I won’t go into the details of Googles image, video and page sitemap spec. You can read it all in their post. But my fumbling about in code led to an interesting discovery: Google’s not all that picky.
I wrote my own little Python crawler over the weekend.
Yes, I know how pathetic that sounds.
Anyway, I wrote a Python crawler. It goes out to a site, grabs the URLs of all pages and all images, and puts ’em all into an XML sitemap. Neato! My first real use of Python. Alas, I did it horribly wrong, and generated a sitemap that munges images and page URLs together as if they were all the same. That is, I put both images and page URLs between <url> and </url> tags. Turns out, that’s wrong.
According to Google’s post, you have to use all sorts of fancy code to insert images and video into an otherwise normal XML sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.sitemaps.org/schemas/sitemap-image/1.1"
        xmlns:video="http://www.sitemaps.org/schemas/sitemap-video/1.1">
  <url> 
    <loc>http://www.example.com/foo.html</loc> 
    <image:image>
       <image:loc>http://example.com/image.jpg</image:loc> 
    </image:image>
    <video:video> 
    <video:content_loc>http://www.example.com/videoABC.flv</video:content_loc>    
      <video:title>Grilling tofu for summer</video:title> 
    </video:video>
  </url>
</urlset>

Eesh. It’s actually a good thing I didn’t see that before I started. I would’ve thrown my hands up and never learned Python.
But here’s the thing: I submitted the incorrectly-formatted sitemap before I knew I’d done it wrong. Once I saw the error of my ways, I got ready to watch Google wallop my company site, or at least ignore all of the images. But they didn’t. Instead, Google is indexing the images I included in the sitemap, even though the sitemap’s wrong.
Before I generated the sitemap, a site:portentinteractive.com search in Google images showed about 100 images. The day after, it showed 180 images. Today, it shows 193 images.
Clearly, Google’s tolerant of dorky semi-competent programmers like me. But the real question, and I’m honestly curious if anyone out there knows, is: Do we have to use the fancy formatting, or is packing images into URL elements going to work for the long term?
Comment below if you have a theory, or if you happen to be a Google engineer.

Related, recent, and whatnot

tags : conversation marketing