
WordPress adds Sitemap ping to blogs.

 The WordPress email newsletter had the following information on WordPress adding sitemap pings. That is very good news, since Google Caffeine launched and has changed the way your pages are searched and ranked, so you may be seeing a drop in your blog traffic. The pings should help offset that drop and help you regain your blog traffic. Be sure you keep adding new posts and keeping the content fresh.

  

Reach Out and Ping Someone

Niall Kennedy | February 11, 2010 at 12:12 am | Tags: search, sitemap | Categories: search | URL: http://wp.me/pf2B5-Yj
“Publishing your blog on WordPress.com lets you focus on your content while we sweat the technical stuff, including helping your content reach a larger audience.

 We just turned on sitemap pings for all WordPress.com blogs. Now, immediately after you publish or delete a page or post, your WordPress.com blog sends a ping to Google, Bing, Yahoo! and Ask*. These immediate notifications help the major search engines receive your new content as quickly as possible (often within seconds), so your blog can show up in search results faster.  

Tools to make your content findable
Sitemap pings are just one of the ways WordPress.com helps your content reach a large audience moments after you hit “Publish.” Every blog includes support for webmaster validation through Google, Bing and Yahoo! webmaster portals. Post updates are sent through Ping-o-Matic!, a ping relay tool owned by the WordPress Foundation, to major feed reader and blog search engines. Our Publicize feature updates your Yahoo! and Twitter accounts with a short summary and a link back to your blog content. These are just some of the ways WordPress.com helps you find your audience. 

* Note: We only expose content to search engines for public blogs on WordPress.com.”


Creating a sitemap for Google, Bing, Yahoo and Ask

An XML sitemap (or simply “sitemap”) is a list of the pages on your website, and each search engine wants it built to its own specifications and requirements. Creating and submitting a sitemap helps make sure that Google, Yahoo, Bing and Ask know about all the pages on your site, including URLs that may not be found by the normal crawling process. Having a sitemap for your website is key to getting crawled by the various search engine bots I wrote about in a previous post on the 5 kinds of search bots. Your webmaster or agency should have all this information at their fingertips; if they do not, you may not be getting the search engine coverage you want for your site.
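For reference, the sitemaps.org format that Google, Yahoo!, Bing and Ask all accept is a small XML file: a `<urlset>` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace holding one `<url>` entry per page, each with a `<loc>` address and an optional `<lastmod>` date. Below is a minimal Python sketch that builds one from a list of pages; the example URLs are placeholders. ElementTree escapes `&` in URLs as `&amp;`, which the protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, lastmod) pairs.

    lastmod is a W3C date string such as "2010-02-11", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # "&" is written as "&amp;"
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2010-02-11"),
        ("http://www.example.com/about/", None),
    ]))
```

Save the output as `sitemap.xml` at the root of your site, then submit or ping it as described for each search engine below.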

Knowing what to ask means going beyond “Do we have a sitemap?” or “Do we have an XML feed?” The key question is whether they have done all the formatting needed for the XML sitemap to work on every search engine. Knowing what each search engine needs in order to find your site is priceless.

Here is the information I use from each search engine about what it requires and how to submit a sitemap:

____________________________________________________________________

Creating and submitting sitemaps to Google

The Google site says:

“About Sitemaps

Sitemaps are a way to tell Google about pages on your site we might not otherwise discover. In its simplest terms, a XML Sitemap—usually called Sitemap, with a capital S—is a list of the pages on your website. Creating and submitting a Sitemap helps make sure that Google knows about all the pages on your site, including URLs that may not be discoverable by Google’s normal crawling process.”

“In addition to regular Sitemaps, you can also create Sitemaps designed to give Google information about specialized web content, including video, mobile, News, Code Search, and geographical (KML) information.

When you update your site by adding or removing pages, tell Google about it by resubmitting your Sitemap. Google doesn’t recommend creating a new Sitemap for every change.”
______________________________________________________________________________

Bing Webmaster Tools

“To submit an XML-based Sitemap to Bing:

Step 1: Copy and paste the entire URL below as a single URL into the address bar of your browser:

     http://www.bing.com/webmaster/ping.aspx?sitemap=www.YourWebAddress.com/sitemap.xml

Step 2: Change “www.YourWebAddress.com” to your domain name

Step 3: Press ENTER”
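Step 1 pastes the sitemap address into Bing's query string unencoded. If you would rather build the ping URL in a script, a safer pattern is to percent-encode the full sitemap URL, scheme included, so that a `?` or `&` inside it is not mistaken for part of Bing's own query string. A minimal Python sketch, assuming Bing accepts an encoded, fully qualified URL in the `sitemap` parameter (the Ask ping format later in this post also takes an encoded URL); `www.example.com` stands in for your domain:

```python
from urllib.parse import urlencode

# Ping endpoint as given in Bing's instructions above.
BING_PING = "http://www.bing.com/webmaster/ping.aspx"


def bing_ping_url(sitemap_url):
    """Build the Bing sitemap-ping URL for a fully qualified sitemap URL."""
    # urlencode percent-encodes ':', '/', '?', '&' and '=' inside the value.
    return BING_PING + "?" + urlencode({"sitemap": sitemap_url})


if __name__ == "__main__":
    # Paste the printed URL into your browser (Step 3) or fetch it from a script.
    print(bing_ping_url("http://www.example.com/sitemap.xml"))
```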

_________________________________________________________________________

How Yahoo! supports sitemaps

“How Yahoo! Supports Sitemaps

Yahoo! supports the Sitemaps format and protocol as documented on www.sitemaps.org.

You can provide us a feed in the following supported formats. We recognize files with a .gz extension as compressed files and decompress them before parsing.

  • RSS 0.9, RSS 1.0, or RSS 2.0, for example, CNN Top Stories
  • Sitemaps, as documented on www.sitemaps.org
  • Atom 0.3, Atom 1.0, for example, Yahoo! Search Blog
  • A text file containing a list of URLs, each URL at the start of a new line. The filename of the URL list file must be urllist.txt. For a compressed file the name must be urllist.txt.gz.
  • Yahoo! supports feeds for mobile sites. Submitting mobile Sitemaps directs our mobile search crawlers to discover and crawl new content for our mobile index. When submitting a feed that points to content on the mobile web, please indicate whether the encoding of the content was done with xHTML or WML.
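If your site can't produce an XML sitemap, the plain-text URL list above is easy to generate. A minimal Python sketch that builds the file contents under the names Yahoo! requires, optionally gzip-compressed; the page list is a placeholder, and where you upload the result depends on your setup:

```python
import gzip


def build_urllist(urls, compress=False):
    """Return (filename, file_bytes) for Yahoo!'s plain-text URL list.

    One URL per line. The file must be named urllist.txt, or
    urllist.txt.gz for the gzip-compressed variant.
    """
    data = "".join(url.strip() + "\n" for url in urls).encode("utf-8")
    if compress:
        return "urllist.txt.gz", gzip.compress(data)
    return "urllist.txt", data


if __name__ == "__main__":
    pages = ["http://www.example.com/", "http://www.example.com/about/"]
    for compress in (False, True):
        name, payload = build_urllist(pages, compress)
        print(name, len(payload), "bytes")
```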

    In addition to submitting your Sitemaps to Yahoo! Search through Yahoo! Site Explorer, you can:

  • Send an HTTP request. To send an HTTP request, please send using our ping API.
  • Specify the Sitemap location in your site’s robots.txt file. This directive is independent of the user-agent line, so you can place it wherever you like in your file.
  • Yahoo! Search will retrieve your Sitemap and make the URLs available to our web crawler. Sitemaps discovered by these methods cannot be managed in Site Explorer and will not show in the list of feeds under a site.

    Note: Using Sitemap protocol supplements the other methods that we use to discover URLs. Submitting a Sitemap helps Yahoo! crawlers do a better job of crawling your site. It does not guarantee that your web pages will be included in the Yahoo! Search index.”
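Because the Sitemap directive is independent of any user-agent group, a crawler can find it anywhere in robots.txt. Python's standard-library `urllib.robotparser` (3.8 or later) exposes those lines through `site_maps()`, which makes for a quick check of what a crawler would discover. The robots.txt content below is a made-up example:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Sitemap line sits after a user-agent group,
# which is fine because the directive does not belong to that group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""


def discovered_sitemaps(robots_txt):
    """Return the Sitemap URLs found in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when absent


if __name__ == "__main__":
    print(discovered_sitemaps(ROBOTS_TXT))
```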

    ___________________________________________________________________

    Ask.com webmaster FAQ

    “Web Search

    The Ask.com search technology uses semantic and extraction capabilities to recognize the best answer from within a sea of relevant pages. Instead of 10 blue links, Ask delivers the best answer to users’ questions right at the top of the page. By using an established technique pioneered at Ask, our search technology uses click-through behavior to determine a site’s relevance and extract the answer. Rather than presenting text snippets of the destination site, this technology presents the actual answer to a user’s question without requiring an additional click-through. Underpinning these advancements are Ask.com’s innovative DADS, DAFS, and AnswerFarm technologies, which break new ground in the areas of semantic search, web extraction and ranking. These technologies index questions and answers from numerous and diversified sources across the web. Ask then applies its semantic search technology advancements in clustering, rephrasing, and answer relevance to filter out insignificant and less meaningful answer formats. In order to extract and rank exciting answers, as opposed to merely ranking web pages, Ask.com continues to develop unique algorithms and technologies that are based on new signals for evaluating relevancy specifically tuned to questions.

    The Ask Website Crawler FAQ

    Ask’s Website crawler is our Web-indexing robot (or crawler/spider). The crawler collects documents from the Web to build the ever-expanding index for our advanced search functionality at Ask and other Web sites that license the proprietary Ask search technology.

    Ask search technology is unique from any other search technology because it analyzes the Web as it actually exists — in subject-specific communities. This process begins by creating a comprehensive and high-quality index. Web crawling is an essential tool for this approach, and it ensures that we have the most up-to-date search results.

    On this page you’ll find answers to the most commonly asked questions about how the Ask Website crawler works. For these and other Webmaster FAQs, visit our Searchable FAQ Database.

    Frequently Asked Questions

    1. What is a website crawler?
    2. Why does Ask use a website crawler?
    3. How does the Ask crawler work?
    4. How frequently will the Ask Crawler index pages from my site?
    5. Can I prevent Teoma/Ask search engine from showing a cached copy of my page?
    6. Does Ask observe the Robot Exclusion Standard?
    7. Can I prevent the Ask crawler from indexing all or part of my site/URL?
    8. Where do I put my robots.txt file?
    9. How can I tell if the Ask crawler has visited my site/URL?
    10. How can I prevent the Ask crawler from indexing my page or following links from a particular page?
    11. Why is the Ask crawler downloading the same page on my site multiple times?
    12. Why is the Ask crawler trying to download incorrect links from my server? Or from a server that doesn’t exist?
    13. How did the Ask Website crawler find my URL?
    14. What types of links does the Ask crawler follow?
    15. Can I control the rate at which the Ask crawler visits my site?
    16. Why has the Ask crawler not visited my URL?
    17. Does Ask crawler support HTTP compression?
    18. How do I register my site/URL with Ask so that it will be indexed?
    19. Why aren’t the pages the Ask crawler indexed showing up in the search results?
    20. Can I control the crawler request rate from Ask spider to my site?
    21. How do I authenticate the Ask Crawler?
    22. Does Ask.com support sitemaps?
    23. How can I add Ask.com search to my site?
    24. How can I get additional information?

     

    Q: What is a website crawler?
    A: A website crawler is a software program designed to follow hyperlinks throughout a Web site, retrieving and indexing pages to document the site for searching purposes. The crawlers are innocuous and cause no harm to an owner’s site or servers.

    Q: Why does Ask use website crawlers?
    A: Ask utilizes website crawlers to collect raw data and gather information that is used in building our ever-expanding search index. Crawling ensures that the information in our results is as up-to-date and relevant as it can possibly be. Our crawlers are well designed and professionally operated, providing an invaluable service that is in accordance with search industry standards.

    Q: How does the Ask crawler work?

    • The crawler goes to a Web address (URL) and downloads the HTML page.
    • The crawler follows hyperlinks from the page, which are URLs on the same site or on different sites.
    • The crawler adds new URLs to its list of URLs to be crawled. It continually repeats this function, discovering new URLs, following links, and downloading them.
    • The crawler excludes some URLs if it has downloaded a sufficient number from the Web site or if it appears that the URL might be a duplicate of another URL already downloaded.
    • The files of crawled URLs are then built into a search catalog. These URLs are displayed as part of search results on sites powered by Ask’s search technology when a relevant match is made.
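The loop described above is a breadth-first crawl: a queue of URLs to visit, a set of URLs already seen, and a per-site limit. The Python sketch below models only those steps and is not Ask's crawler. `fetch_links` is a hypothetical callback standing in for downloading a page and pulling out its links, the default limit of 100 pages per site is an arbitrary assumption, and building the search catalog is left out:

```python
from collections import deque
from urllib.parse import urldefrag, urlsplit


def crawl(start_url, fetch_links, max_per_site=100):
    """Breadth-first crawl following the steps described above.

    fetch_links(url) returns the absolute URLs linked from a page (a stand-in
    for downloading and parsing it). Returns URLs in download order.
    """
    frontier = deque([start_url])  # URLs waiting to be crawled
    seen = {start_url}             # every URL ever queued, to skip duplicates
    per_site = {}                  # pages downloaded so far, per host
    downloaded = []
    while frontier:
        url = frontier.popleft()
        site = urlsplit(url).netloc
        if per_site.get(site, 0) >= max_per_site:
            continue  # already downloaded enough from this site
        per_site[site] = per_site.get(site, 0) + 1
        downloaded.append(url)
        for link in fetch_links(url):
            link = urldefrag(link)[0]  # "page#top" and "page" are one document
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return downloaded


if __name__ == "__main__":
    # A tiny made-up web: page -> links found on that page.
    web = {
        "http://a.example/": ["http://a.example/x", "http://b.example/"],
        "http://a.example/x": ["http://a.example/#top", "http://a.example/y"],
        "http://a.example/y": [],
        "http://b.example/": ["http://a.example/x"],
    }
    print(crawl("http://a.example/", lambda url: web.get(url, [])))
```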

     

    Q: How frequently will the Ask Crawler download pages from my site?
    A: The crawler will download only one page at a time from your site (specifically, from your IP address). After it receives a page, it will pause a certain amount of time before downloading the next page. This delay time may range from 0.1 second to hours. The quicker your site responds to the crawler when it asks for pages, the shorter the delay.

    Q: Can I prevent Teoma/Ask search engine from showing a cached copy of my page?
    A: Yes. We obey the “noarchive” meta tag. If you place the following command in your HTML page, we will not provide an archived copy of the document to the user.

    <META NAME="ROBOTS" CONTENT="NOARCHIVE">

    If you would like to specify this restriction just for Teoma/Ask, you may use “TEOMA” in place of “ROBOTS”.

    Q: Does Ask observe the Robot Exclusion Standard?
    A: Yes, we obey the 1994 Robots Exclusion Standard (RES), which is part of the Robot Exclusion Protocol. The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to robots which parts of their site should not be visited by the robot. For more information on the RES, and the Robot Exclusion Protocol, please visit http://www.robotstxt.org/wc/exclusion.html.

    Q: Can I prevent the Ask crawler from indexing all or part of my site/URL?
    A: Yes. The Ask crawler will respect and obey commands that direct it not to index all or part of a given URL. To specify that the Ask crawler visit only pages whose paths begin with /public, include the following lines:

    # Allow only specific directories
    User-agent: Teoma
    Disallow: /
    Allow: /public
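Rule order can matter here. Crawlers that apply the most specific (longest) matching rule, which is what RFC 9309 later standardized, read this example as intended. A first-match parser such as Python's standard-library `urllib.robotparser` stops at `Disallow: /` and blocks `/public` as well, so listing the more specific `Allow` line first is the safer layout. A small sketch comparing the two orders (the page URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Ask's example as given above (Disallow listed before Allow).
ASK_ORDER = ["User-agent: Teoma", "Disallow: /", "Allow: /public"]
# The same rules with the more specific Allow line first.
ALLOW_FIRST = ["User-agent: Teoma", "Allow: /public", "Disallow: /"]


def teoma_may_fetch(robots_lines, url):
    """Ask Python's first-match robots.txt parser whether Teoma may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("Teoma", url)


if __name__ == "__main__":
    page = "http://www.mysite.com/public/index.html"
    print(teoma_may_fetch(ASK_ORDER, page))    # first match is "Disallow: /"
    print(teoma_may_fetch(ALLOW_FIRST, page))  # first match is "Allow: /public"
```

With Ask's order the first call prints `False`; with the `Allow` line first the second prints `True`, while everything outside `/public` stays blocked either way.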

     

    Q: Where do I put my robots.txt file?
    A: Your file must be at the top level of your Web site. For example, if http://www.mysite.com is the name of your Web site, then the robots.txt file must be at http://www.mysite.com/robots.txt.
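Because robots.txt always lives at the root of the host, a crawler can work out which file governs any page from the page URL alone. A small Python helper showing that rule (the URLs are placeholders):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that applies to page_url.

    The file sits at the root of the scheme + host (+ port) serving the
    page, never in a subdirectory.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.mysite.com/blog/2010/02/post.html?p=1"))
```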

    Q: How can I tell if the Ask crawler has visited my site/URL?
    A: To determine whether the Ask crawler has visited your site, check your server logs. Specifically, you should be looking for the following user-agent string:

    User-Agent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
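To pull those visits out of a server log, you can filter on that user-agent string. The Python sketch below assumes the common Apache/Nginx “combined” log format, where the client IP is the first field and the user agent is the last quoted field; adjust the pattern for other formats. A match only shows that a client claimed to be Ask's crawler; the round-trip DNS check in the authentication answer further down is what confirms it.

```python
import re

# Identifying part of the user-agent string quoted in the FAQ.
ASK_UA = "Ask Jeeves/Teoma"

# Assumes the "combined" log format: client IP first, user agent last in quotes.
COMBINED = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def ask_crawler_hits(log_lines):
    """Yield (ip, line) for lines whose user agent claims to be Ask's crawler."""
    for line in log_lines:
        line = line.rstrip("\n")
        match = COMBINED.match(line)
        if match and ASK_UA in match.group("agent"):
            yield match.group("ip"), line


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - [11/Feb/2010:00:12:00 +0000] "GET / HTTP/1.1" 200 5120 '
        '"-" "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"',
        '198.51.100.7 - - [11/Feb/2010:00:13:00 +0000] "GET / HTTP/1.1" 200 5120 '
        '"-" "Mozilla/5.0"',
    ]
    for ip, _line in ask_crawler_hits(sample):
        print(ip)
```

For a real log, pass an open file instead of the sample list, e.g. `ask_crawler_hits(open("access.log"))`, where the filename is whatever your server uses.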

     

    Q: How can I prevent the Ask crawler from indexing my page or following links from a particular page?
    A: If you place the following command in the <HEAD> section of your HTML page, the Ask crawler will not index the document and, thus, it will not be placed in our search results:

    <META NAME="ROBOTS" CONTENT="NOINDEX">

    The following command tells the Ask crawler to index the document, but not follow hyperlinks from it:

    <META NAME="ROBOTS" CONTENT="NOFOLLOW">

    You may set all directives OFF by using the following:

    <META NAME="ROBOTS" CONTENT="NONE">

    See http://www.robotstxt.org/wc/exclusion.html#meta for more information.
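To check which of these directives a page actually carries, you can parse its `<meta>` tags. A minimal Python sketch using the standard-library `html.parser`; it reads tags named ROBOTS or TEOMA (the Ask-specific name from the noarchive answer above), and it expands NONE into NOINDEX plus NOFOLLOW, an assumption based on the FAQ's “set all directives OFF” wording:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> or crawler-specific tags."""

    def __init__(self, crawler_name="teoma"):
        super().__init__()
        self.names = {"robots", crawler_name.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not their values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() not in self.names:
            return
        for token in (attrs.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html, crawler_name="teoma"):
    """Return the set of lowercase robots directives found in html."""
    parser = RobotsMetaParser(crawler_name)
    parser.feed(html)
    parser.close()
    directives = set(parser.directives)
    if "none" in directives:  # assumption: NONE means NOINDEX + NOFOLLOW
        directives |= {"noindex", "nofollow"}
    return directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head></html>'
    print(robots_directives(page))
```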

    Q: Why is the Ask crawler downloading the same page on my site multiple times?
    A: Generally, the Ask crawler should only download one copy of each file from your site during a given crawl. There are two exceptions:

    • A URL may contain commands that “redirect” the crawler to a different URL. This may be done with the HTML command:
      <META HTTP-EQUIV="REFRESH" CONTENT="0; URL=http://www.your page address here.html">

      or with the HTTP status codes 301 or 302. In this case the crawler downloads the second page in place of the first one. If many URLs redirect to the same page, then this second page may be downloaded many times before the crawler realizes that all these pages are duplicates.

    • An HTML page may be a “frameset.” Such a page is formed from several component pages, called “frames.” If many frameset pages contain the same frame page as components, then the component page may be downloaded many times before the crawler realizes that all these components are the same.

     

    Q: Why is the Ask crawler trying to download incorrect links from my server? Or from a server that doesn’t exist?
    A: It is a property of the Web that many links will be broken or outdated at any given time. Whenever any Web page contains a broken or outdated link to your site, or to a site that never existed or no longer exists, Ask will visit that link trying to find the Web page it references. This may cause the crawler to ask for URLs which no longer exist or which never existed, or to try to make HTTP requests on IP addresses which no longer have a Web server or never had one. The crawler is not randomly generating addresses; it is following links. This is why you may also notice activity on a machine that is not a Web server.

    Q: How did the Ask Website crawler find my URL?
    A: The Ask crawler finds pages by following links (HREF tags in HTML) from other pages. When the crawler finds a page that contains frames (i.e., it is a frameset), the crawler downloads the component frames and includes their content as part of the original page. The Ask crawler will not index the component frames as URLs themselves unless they are linked via HREF from other pages.

    Q: What types of links does the Ask crawler follow?
    A: The Ask crawler will follow HREF links, SRC links and re-directs.
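A rough illustration of collecting HREF and SRC targets from a page, including frame sources, and resolving them to absolute URLs. This Python sketch uses the standard-library `html.parser` and `urllib.parse.urljoin`; it is not Ask's link extractor, and redirects (which arrive as HTTP responses rather than in the HTML) are outside its scope. The page URL and markup are placeholders:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect HREF and SRC targets, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Relative links such as "logo.png" become absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the HREF/SRC targets in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    html = '<a href="/about/">About</a> <img src="logo.png"> <frame src="nav.html">'
    print(extract_links(html, "http://www.example.com/blog/"))
```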

    Q: Can I control the rate at which the Ask crawler visits my site?
    A: Yes. We support the “Crawl-Delay” robots.txt directive. Using this directive you may specify the minimum delay between two successive requests from our spider to your site.
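For example, a robots.txt group like the one below asks Ask's crawler to wait between requests. The value 10 is only an illustration, and reading it as seconds is an assumption, since the FAQ does not state a unit. Python's `urllib.robotparser` can read the directive back, which is a handy way to check the file; note that it only accepts whole-number values:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Crawl-Delay for Ask's crawler, none for others.
ROBOTS_TXT = """\
User-agent: Teoma
Crawl-Delay: 10
Disallow: /cgi-bin/

User-agent: *
Disallow: /cgi-bin/
"""


def crawl_delay_for(robots_txt, agent):
    """Return the Crawl-Delay that robots_txt sets for agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "Teoma"))
```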

    Q: Why has the Ask crawler not visited my URL?
    A: If the Ask crawler has not visited your URL, it is because we did not discover any link to that URL from other pages (URLs) we visited.

    Q: Does Ask crawler support HTTP compression?
    A: Yes, it does. Both HTTP client and server should support this for the HTTP compression feature to work. When supported, it lets webservers send compressed documents (compressed using gzip or other formats) instead of the actual documents. This would result in significant bandwidth savings for both the server and the client. There is a little CPU overhead at both server and client for encoding/decoding, but it is worth it. Using a popular compression method such as gzip, one could easily reduce file size by about 75%.
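To see the effect on your own pages, you can compress a document with gzip and compare sizes. This Python sketch uses the standard-library `gzip` module; the sample page is deliberately repetitive, so its savings will be higher than a typical page's, and your own numbers will depend on your content:

```python
import gzip


def gzip_savings(document):
    """Gzip-compress document (bytes), as a server would for
    Content-Encoding: gzip. Returns (compressed_bytes, fraction_saved)."""
    compressed = gzip.compress(document)
    if not document:
        return compressed, 0.0
    return compressed, 1 - len(compressed) / len(document)


if __name__ == "__main__":
    page = b"<html><body>" + b"<p>Hello, crawler.</p>\n" * 200 + b"</body></html>"
    compressed, saved = gzip_savings(page)
    assert gzip.decompress(compressed) == page  # the client recovers the original
    print(f"{len(page)} -> {len(compressed)} bytes ({saved:.0%} smaller)")
```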

    Q: How do I register my site/URL with Ask so that it will be indexed?
    A: We appreciate your interest in having your site listed on Ask.com and the Ask.com search engine. Your best bet is to follow the open-format Sitemaps protocol, which Ask.com supports. Once you have prepared a sitemap for your site, add the sitemap auto-discovery directive to robots.txt, or submit the sitemap file directly to us via the ping URL. (For more information on this process, see Does Ask.com support sitemaps?) Please note that sitemap submissions do not guarantee the indexing of URLs.

    Create your Web site and set up your Web server to optimize how search engines look at your site’s content, and how they index and trigger based upon different types of search keywords. You’ll find a variety of resources online that provide tips and helpful information on how to best do this.

    Q: Why aren’t the pages the Ask crawler indexed showing up in the search results at Ask.com?
    A: If you don’t see your pages indexed in our search results, don’t be alarmed. Because we are so thorough about the quality of our index, it takes some time for us to analyze the results of a crawl and then process the results for inclusion into the database. Ask does not necessarily include every site it has crawled in its index.

    Q: Can I control the crawler request rate from Ask spider to my site?
    A: Yes. We support the “Crawl-Delay” robots.txt directive. Using this directive you may specify the minimum delay between two successive requests from our spider to your site.

    Q: How do I authenticate the Ask Crawler?
    A: The User-Agent is no guarantee of authenticity, as it is trivial for a malicious user to mimic the properties of the Ask Crawler. To properly authenticate the Ask Crawler, a round-trip DNS lookup is required. First, take the IP address of the Ask Crawler and perform a reverse DNS lookup, ensuring that the IP address belongs to the ask.com domain. Then perform a forward DNS lookup on the host name, ensuring that the resulting IP address matches the original.
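The round-trip check translates directly into code. Below is a minimal Python sketch using the standard-library resolvers `socket.gethostbyaddr` (reverse) and `socket.gethostbyname_ex` (forward, IPv4 only). The lookups are passed in as parameters so the logic can be exercised with stand-in resolvers; the only Ask-specific assumption is the ask.com domain named in the answer above:

```python
import socket


def is_ask_crawler(ip, reverse_lookup=socket.gethostbyaddr,
                   forward_lookup=socket.gethostbyname_ex):
    """Round-trip DNS check described in the FAQ answer above.

    1. Reverse-resolve the IP and require a host name in the ask.com domain.
    2. Forward-resolve that host name and require the original IP among results.
    """
    try:
        host, _aliases, _addresses = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    host = host.rstrip(".").lower()
    if not (host == "ask.com" or host.endswith(".ask.com")):
        return False  # e.g. "crawler.ask.com.evil.example" is rejected
    try:
        _name, _aliases, addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

In a log-processing script you would call `is_ask_crawler(ip)` with the default resolvers for each IP that claims to be Ask's crawler.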

    Q: Does Ask.com support sitemaps?
    A: Yes, Ask.com supports the open-format Sitemaps protocol. Once you have prepared the sitemap, add the sitemap auto-discovery directive to robots.txt as follows:

    SITEMAP: http://www.the URL of your sitemap here.xml

    The sitemap location should be the full sitemap URL. Alternatively, you can also submit your sitemap through the ping URL:

    http://submissions.ask.com/ping?sitemap=http%3A//www.the URL of your sitemap here.xml
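In the ping URL above, only the colon after `http` is encoded (`http%3A//`). Python's `urllib.parse.quote` with `safe="/"` produces exactly that form and also encodes any `?`, `&` or `=` inside the sitemap address. A small sketch, with `www.example.com` standing in for your site:

```python
from urllib.parse import quote

# Ping endpoint from the Ask FAQ above.
ASK_PING = "http://submissions.ask.com/ping?sitemap="


def ask_ping_url(sitemap_url):
    """Build Ask's sitemap ping URL in the format shown in the FAQ.

    quote() with safe="/" encodes ':' (giving http%3A//...) but keeps slashes;
    '?', '&' and '=' inside the sitemap URL are encoded as well.
    """
    return ASK_PING + quote(sitemap_url, safe="/")


if __name__ == "__main__":
    print(ask_ping_url("http://www.example.com/sitemap.xml"))
```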

    Please note that sitemap submissions do not guarantee the indexing of URLs. To learn more about the protocol, please visit the Sitemaps web site at http://www.sitemaps.org.

    Q: How can I add Ask.com search to my site?
    A: We’ve made this easy, you can generate the necessary code here.

    Q: How can I get additional information?
    A: Please visit our full Searchable FAQ Database.”