Adding Images to your Sitemaps
Sitemaps are an invaluable resource for search engines. They can highlight the important content on a site and allow crawlers to quickly discover it. Images are an important element of many sites and search engines could equally benefit from knowing which images you consider important. This is particularly true for images that are only accessible via JavaScript forms, or for pages that contain many images but only some of which are integral to the page content.
Now you can use a Sitemaps extension to provide Google with exactly this information. For each URL you list in your Sitemap, you can add additional information about important images that exist on that page. You don’t need to create a new Sitemap, you can just add information on images to the Sitemap you already use.
Adding images to your Sitemaps is easy. Simply follow the instructions in the Webmaster Tools Help Center or refer to the example below:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://example.com/sample.html</loc>
<image:image>
<image:loc>http://example.com/image.jpg</image:loc>
</image:image>
</url>
</urlset>
We index billions of images and see hundreds of millions of image-related queries each day. To take advantage of that traffic most effectively, take a moment to update your Sitemap file with information on the images from your site.
Multiple Sitemaps in the same directory
We've gotten a few questions about whether you can put multiple Sitemaps in the same directory. Yes, you can!
You might want to have multiple Sitemap files in a single directory for a number of reasons. For instance, if you have an auction site, you might want to have a daily Sitemap with new auction offers and a weekly Sitemap with less time-sensitive URLs. Or you could generate a new Sitemap every day with new offers, so that the list of Sitemaps grows over time. Either of these solutions works just fine.
Or, here's another sample scenario: Suppose you're a provider that supports multiple web shops, and they share a similar URL structure differentiated by a parameter. For example:
http://example.com/stores/home?id=1
http://example.com/stores/home?id=2
http://example.com/stores/home?id=3
Since they're all in the same directory, it's fine by our rules to put the URLs for all of the stores into a single Sitemap, under http://example.com/ or http://example.com/stores/. However, some webmasters may prefer to have separate Sitemaps for each store, such as:
http://example.com/stores/store1_sitemap.xml
http://example.com/stores/store2_sitemap.xml
http://example.com/stores/store3_sitemap.xml
As long as all URLs listed in the Sitemap are at the same location as the Sitemap or in a sub directory (in the above example http://example.com/stores/ or perhaps http://example.com/stores/catalog) it's fine for multiple Sitemaps to live in the same directory (as many as you want!). The important thing is that Sitemaps not contain URLs from parent directories or completely different directories -- if that happens, we can't be sure that the submitter controls the URL's directory, so we can't trust the metadata.
The above Sitemaps could also be collected into a single Sitemap index file and easily be submitted via Google webmaster tools. For example, you could create http://example.com/stores/sitemap_index.xml as follows:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<sitemap>
<loc>http://example.com/stores/store1_sitemap.xml</loc>
<lastmod>2006-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://example.com/stores/store2_sitemap.xml</loc>
<lastmod>2006-10-01</lastmod>
</sitemap>
<sitemap>
<loc>http://example.com/stores/store3_sitemap.xml</loc>
<lastmod>2006-10-05</lastmod>
</sitemap>
</sitemapindex>
Then simply add the index file to your account, and you'll be able to see any errors for each of the child Sitemaps.
If each store includes more than 50,000 URLs (the maximum number for a single Sitemap), you would need to have multiple Sitemaps for each store. In that case, you may want to create a Sitemap index file for each store that lists the Sitemaps for that store. For instance:
http://example.com/stores/store1_sitemapindex.xml
http://example.com/stores/store2_sitemapindex.xml
http://example.com/stores/store3_sitemapindex.xml
Since Sitemap index files can't contain other index files, you would need to submit each Sitemap index file to your account separately.
Whether you list all URLs in a single Sitemap or in multiple Sitemaps (in the same directory of different directories) is simply based on what's easiest for you to maintain. We treat the URLs equally for each of these methods of organization.
What’s new with Sitemaps.org?
What has the Sitemaps team been up to since we announced sitemaps.org? We've been busy trying to get Sitemaps adopted by everyone and to make the submission process as easy and automated as possible. To that end, we have three new announcements to share with you.
First, we're making the sitemaps.org site available in 18 languages! We know that our users are located all around the world and we want to make it easy for you to learn about Sitemaps, no matter what language you speak. Here is a link to the Sitemap protocol in Japanese and the FAQ in German.
Second, it's now easier for you to tell us where your Sitemaps live. We wondered if we could make it so easy that you wouldn't even have to tell us and every other search engine that supports Sitemaps. But how? Well, every website can have a robots.txt file in a standard location, so we decided to let you tell us about your Sitemap in the robots.txt file. All you have to do is add a line like
Sitemap: http://www.mysite.com/sitemap.xml
to your robots.txt file. Just make sure you include the full URL, including the http://. That's it. Of course, we still think it's useful to submit your Sitemap through Webmaster tools so you can make sure that the Sitemap was processed without any issues and you can get additional statistics about your site
Last but not least, Ask.com is now also supporting the Sitemap protocol. And with the ability to discover your Sitemaps from your robots.txt file, Ask.com and any other search engine that supports this change to robots.txt will be able to find your Sitemap file.
Duplicate content summit at SMX Advanced
The most important things to know about duplicate content are:
- Google wants to serve up unique results and does a great job of picking a version of your content to show if your sites includes duplication. If you don't want to worry about sorting through duplication on your site, you can let us worry about it instead.
- Duplicate content doesn't cause your site to be penalized. If duplicate pages are detected, one version will be returned in the search results to ensure variety for searchers.
- Duplicate content doesn't cause your site to be placed in the supplemental index. Duplication may indirectly influence this however, if links to your pages are split among the various versions, causing lower per-page PageRank.
At the summit at SMX Advanced, we asked what duplicate content issues were most worrisome. Those in the audience were concerned about scraper sites, syndication, and internal duplication. We discussed lots of potential solutions to these issues and we'll definitely consider these options along with others as we continue to evolve our toolset. Here's the list of some of the potential solutions we discussed so that those of you who couldn't attend can get in on the conversation.
Specifying the preferred version of a URL in the site's Sitemap file
One thing we discussed was the possibility of specifying the preferred version of a URL in a Sitemap file, with the suggestion that if we encountered multiple URLs that point to the same content, we could consolidate links to that page and could index the preferred version.
Providing a method for indicating parameters that should be stripped from a URL during indexing
We discussed providing this in either an interface such as webmaster tools on in the site's robots.txt file. For instance, if a URL contains sessions IDs, the webmaster could indicate the variable for the session ID, which would help search engines index the clean version of the URL and consolidate links to it. The audience leaned towards an addition in robots.txt for this.
Providing a way to authenticate ownership of content
This would provide search engines with extra information to help ensure we index the original version of an article, rather than a scraped or syndicated version. Note that we do a pretty good job of this now and not many people in the audience mentioned this to be a primary issue. However, the audience was interested in a way of authenticating content as an extra protection. Some suggested using the page with the earliest date, but creation dates aren't always reliable. Someone also suggested allowing site owners to register content, although that could raise issues as well, as non-savvy site owners wouldn't know to register content and someone else could take the content and register it instead. We currently rely on a number of factors such as the site's authority and the number of links to the page. If you syndicate content, we suggest that you ask the sites who are using your content to block their version with a robots.txt file as part of the syndication arrangement to help ensure your version is served in results.
Making a duplicate content report available for site owners
There was great support for the idea of a duplicate content report that would list pages within a site that search engines see as duplicate, as well as pages that are seen as duplicates of pages on other sites. In addition, we discussed the possibility of adding an alert system to this report so site owners could be notified via email or RSS of new duplication issues (particularly external duplication).
Working with blogging software and content management systems to address duplicate content issues
Some duplicate content issues within a site are due to how the software powering the site structures URLs. For instance, a blog may have the same content on the home page, a permalink page, a category page, and an archive page. We are definitely open to talking with software makers about the best way to provide easy solutions for content creators.
In addition to discussing potential solutions to duplicate content issues, the audience had a few questions.
Q: If I nofollow a substantial number of my internal links to reduce duplicate content issues, will this raise a red flag with the search engines?
The number of nofollow links on a site won't raise any red flags, but that is probably not the best method of blocking the search engines from crawling duplicate pages, as other sites may link to those pages. A better method may be to block pages you don't want crawled with a robots.txt file.
Q: Are the search engines continuing the Sitemaps alliance?
We launched sitemaps.org in November of last year and have continued to meet regularly since then. In April, we added the ability for you to let us know about your Sitemap in your robots.txt file. We plan to continue to work together on initiatives such as this to make the lives of webmasters easier.
Q: Many pages on my site primarily consist of graphs. Although the graphs are different on each page, how can I ensure that search engines don't see these pages as duplicate since they don't read images?
To ensure that search engines see these pages as unique, include unique text on each page (for instance, a different title, caption, and description for each graph) and include unique alt text for each image. (For instance, rather than use alt="graph", use something like alt="graph that shows Willow's evil trending over time".
Q: I've syndicated my content to many affiliates and now some of those sites are ranking for this content rather than my site. What can I do?
If you've freely distributed your content, you may need to enhance and expand the content on your site to make it unique.
Q: As a searcher, I want to see duplicates in search results. Can you add this as an option?
We've found that most searchers prefer not to have duplicate results. The audience member in particular commented that she may not want to get information from one site and would like other choices, but for that case, other sites will likely not have identical information and therefore will show up in the results. Bear in mind that you can add the "&filter=0" parameter to the end of a Google web search URL to see additional results which might be similar.
I've brought back all the issues and potential solutions that we discussed at the summit back to my team and others within Google and we'll continue to work on providing the best search results and expanding our partnership with you, the webmaster. If you have additional thoughts, we'd love to hear about them!
Source: http://googlewebmastercentral.blogspot.com/search/label/sitemaps
Updating and Deleting a Sitemap (from Google webmaster )
Updating a Sitemap
Google recommends initially submitting your Sitemap using Google Webmaster Tools. This will ensure that your Sitemap details are readily viewable on the first tab of the Sitemaps page in Webmaster Tools. In addition to Webmaster Tools, you can also submit (and resubmit) your Sitemap using the following methods:
- Sending a HTTP request to Google
- Including your Sitemap location in your robots.txt file
Resubmitting a Sitemap using Webmaster Tools
- On the Webmaster Tools Home page, click the site you want.
- Under Site configuration, click Sitemaps.
- Select the Sitemap you want to resubmit, and then click Resubmit.
Resubmitting a Sitemap by sending Google an HTTP request
If you do this, you don't need to resubmit it using Webmaster Tools. The Submitted column will continue to show the last time you manually clicked the link, but the Last Downloaded column will be updated to show the last time our system has fetched your Sitemap.
To resubmit your Sitemap using an HTTP request:
- Issue your request to the following URL:
www.google.com/webmasters/tools/ping?sitemap=sitemap_url
For example, if your Sitemap is located at http://www.example.com/sitemap.gz, your URL will become:
www.google.com/webmasters/tools/ping?sitemap=http://www.example.com/sitemap.gz
- URL encode everything after the /ping?sitemap=:
www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.gz
- Issue the HTTP request using wget, curl, or another mechanism of your choosing.
A successful request will return an HTTP 200 response code; if you receive a different response, you should resubmit your request. The HTTP 200 response code only indicates that Google has received your Sitemap, not that the Sitemap itself or the URLs contained in it were valid. To obtain status information about your Sitemap, resubmit it using Webmaster Tools account. We recommend that you resubmit a Sitemap no more than once per hour. An easy way to do this is to use the Sitemap Generator to set up an automated job to generate and submit Sitemaps on a regular basis.
Note: If you are providing a Sitemap index file, you only need to issue a single HTTP request that includes the location of the Sitemap index file; you don't need to issue individual requests for each Sitemap listed in the index.
Including your Sitemap location in your robots.txt file
To tell search engines about the location of your Sitemap, include the following line in your robots.txt file (updating with your own path and filename):
sitemap: http://www.example.com/sitemap.xml
Deleting a Sitemap
You can delete a Sitemap from your account, so that it no longer appears in Webmaster Tools. However, Google will continue to fetch your Sitemap data until you remove the Sitemap file from your webserver. If you want to keep the Sitemap on your server for use by other search engines, you can use your robots.txt file to prevent Google from accessing it.
To delete a Sitemap from your Webmaster Tools account:
- On the Webmaster Tools Home page, click the site you want.
- Under Site configuration, click Sitemaps.
- Select the Sitemap you want to delete, and then click Delete.
If you delete a site from your account, all associated Sitemaps will be deleted as well. If you delete all Sitemaps associated with a site, that site will still be listed in your account so you can view information about it.
Source: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=34609