SiteMap: Google sitemaps in MODx
From MODx Wiki
This article is meant to provide information on what a machine-readable sitemap is and how to create one for a site running MODx. It does not concern human-readable site maps: documents listing all pages on a site that are accessible to a visitor.
I am the author of the SiteMap snippet, which creates a Google sitemap for a site running MODx. Recently, a request for a step-by-step procedure for creating a sitemap document was posted in the support thread on the MODx forums. Apparently, the instructions I had provided were not sufficient, so I decided to prepare more thorough documentation for the snippet and to briefly describe sitemaps. -- grad
What are sitemaps?
Machine-readable sitemaps provide automated user agents (search engine bots, or spiders, crawling the web) with information about the content and structure of a site. In fact, sitemaps can be thought of as another form of feed (like RSS) whose targets are various Internet services.
How do machine-readable sitemaps differ from human-readable ones?
- The document format - sitemaps for bots most often use XML or plain text, while those intended for human visitors use (X)HTML. As a consequence, sitemaps for bots contain only content and structure; they have no presentation or behavior layers.
- The scope - sitemaps intended for bots most often cover more types of content than their counterparts for humans: (X)HTML documents, various feeds, and other sitemaps.
From this point on, the term sitemaps refers only to those intended for automated user agents.
What is the purpose of sitemaps?
Sitemaps provide information in a form that Internet services such as search engines can understand.
What do these services need it for? To get the content the site provides and the meta-information describing it: not only what a document is about, but also when it was last modified, how often it is modified on average, what type of document it is, what type of information it provides, how it relates to other documents (its place in the structure), and so on.
Variations of sitemaps
There are a few different variations of sitemaps:
- Google sitemaps - see Google sitemap (Wikipedia) for a description.
On 2006-11-15 Microsoft, Google and Yahoo! announced that they would jointly support sitemaps based on the Sitemap Protocol. Since then, the primary source of information on the Sitemap Protocol has been www.sitemaps.org. -- grad
- Resources of a Resource - see Resources of a Resource (Wikipedia) for a description.
I tried to use it, but the project seems to be suspended: the team does not respond to e-mails and the site is not updated. -- grad
- Various applications of the Resource Description Framework (Wikipedia)
- Others?
Step-by-step: How to create a Google sitemap in MODx using the SiteMap snippet
To create a Google sitemap document for a site running MODx, you can use the SiteMap snippet. At its current stage of development, it can output a sitemap in two formats:
- Sitemap Protocol - the native format for Google sitemaps (although Google accepts other formats as well).
- A text file containing a list of URLs.
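To make the difference between the two formats concrete, here is a minimal Python sketch of what each output might look like for the same documents. It is not the snippet's code (SiteMap is a MODx PHP snippet); the `Entry` fields, `render_sp()` and `render_txt()` are hypothetical names. The XML uses the Sitemap Protocol namespace from www.sitemaps.org, and the optional `xsl_href` argument mimics the stylesheet link the snippet writes on line 2 of its output.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape

# Namespace defined by the Sitemap Protocol (www.sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Entry:
    """One document in the sitemap (hypothetical field names)."""
    loc: str                          # absolute URL of the document
    lastmod: Optional[str] = None     # date, e.g. "2008-02-29"
    changefreq: Optional[str] = None  # e.g. "monthly"
    priority: Optional[str] = None    # e.g. "0.5"


def render_sp(entries, xsl_href=None):
    """Render entries as a Sitemap Protocol (XML) document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>']
    if xsl_href:
        # Optional stylesheet link on line 2, as in the snippet's output.
        href = escape(xsl_href, {'"': "&quot;"})
        lines.append(f'<?xml-stylesheet type="text/xsl" href="{href}"?>')
    lines.append(f'<urlset xmlns="{SITEMAP_NS}">')
    for entry in entries:
        lines.append("  <url>")
        # escape() turns & < > into entities, e.g. "&" in query strings.
        lines.append(f"    <loc>{escape(entry.loc)}</loc>")
        # Optional tags are written only when a value is known.
        for tag in ("lastmod", "changefreq", "priority"):
            value = getattr(entry, tag)
            if value is not None:
                lines.append(f"    <{tag}>{escape(value)}</{tag}>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def render_txt(entries):
    """Render entries as a plain text list: one URL per line."""
    return "".join(entry.loc + "\n" for entry in entries)


if __name__ == "__main__":
    docs = [Entry("http://www.yourdomain.com/", "2008-02-29", "daily", "1.0")]
    print(render_sp(docs, xsl_href="sitemap.xsl"))
    print(render_txt(docs))
```

Leaving out optional tags when no value is known mirrors the documented behavior that change frequency and priority are not used when the corresponding template variables cannot be found.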
Create the sitemap.xml document and call the snippet
We will concentrate on the Sitemap Protocol:
- Create a document at the root level of your document tree (it does not have to sit at the root level of the tree, but it has to be available from the root level of the site). Use a blank (empty) document template and do not use the Rich Text Editor.
- Set the content type to text/xml.
- Set alias to sitemap.xml.
- In the document content textarea place a snippet call:
[!SiteMap? &format=`sp`!]
- Save and publish the document.
That's all! You have just published your Google sitemap. It should be accessible at the URL http://www.yourdomain.com/sitemap.xml.
Now take a look at it. It is pure XML, so it may not be easy to read; viewing the page source should help. You should notice that it lists all documents on your site, including special pages such as the 404 error page, login pages, search results and so on.
You definitely don't want to bother the Google crawler with them; you would rather keep them hidden. To achieve this, take a look at the SiteMap parameters.
Exclude unwanted documents
There are two ways to exclude documents from the sitemap:
- Excluding document templates.
- Using a template variable.
Both rules apply simultaneously! A document that is not excluded by its template can still be excluded by the template variable. Consequently, it makes no sense to attach this template variable to templates that are already excluded from the sitemap.
You can also exclude weblinks entirely.
By template
Documents may be excluded by the document template they use. This is most suitable for documents that are to be excluded permanently. Use the &excludeTemplates parameter with a comma-separated list of templates.
In the example below, all documents using any of the templates blank, empty, and hidden will be excluded from the sitemap:
[!SiteMap? &format=`sp` &excludeTemplates=`blank, empty, hidden`!]
From version 1.0.8, you can also specify template IDs in this list.
[!SiteMap? &format=`sp` &excludeTemplates=`blank, empty, hidden, 3, 4`!]
You may want to use IDs instead of names so that the exclusion is not lost if a template is renamed.
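A list that mixes names and IDs has to be split before matching. The sketch below is a hypothetical Python illustration, not the snippet's implementation; it assumes that purely numeric entries are template IDs, that all other entries are template names compared case-sensitively, and that surrounding spaces are ignored.

```python
def parse_exclude_templates(value):
    """Split an &excludeTemplates value into (names, ids).

    Assumption: purely numeric entries are template IDs; everything else
    is a template name. Surrounding spaces are ignored.
    """
    names, ids = set(), set()
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as "blank,,hidden"
        if item.isdecimal():
            ids.add(int(item))
        else:
            names.add(item)
    return names, ids


def template_excluded(template_id, template_name, names, ids):
    """True if a template matches the list by ID or by (case-sensitive) name."""
    return template_id in ids or template_name in names


names, ids = parse_exclude_templates("blank, empty, hidden, 3, 4")
print(sorted(names), sorted(ids))  # ['blank', 'empty', 'hidden'] [3, 4]
```

A consequence of this design is that a template literally named with digits only could not be excluded by name; listing its ID avoids the ambiguity.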
By template variable
Documents may also be excluded by setting the value of a template variable. This is most suitable for excluding documents temporarily. Use &excludeTV to name the template variable that serves as a per-document flag.
In the example below all documents with template variable sitemap_exclude set to 1 will be excluded from the sitemap:
[!SiteMap? &format=`sp` &excludeTV=`sitemap_exclude`!]
Excluding weblinks
(version 1.0.6+)
The optional &excludeWeblinks parameter (boolean) excludes weblinks from the sitemap. Weblinks often point to external sites (which don't belong in your sitemap) or redirect to other internal pages (which are already in the sitemap), and Google Webmaster Tools generates warnings for excessive redirects.
[!SiteMap? &excludeWeblinks=`1` !]
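The way the exclusion rules combine can be sketched as a single predicate. This is a hypothetical Python illustration: the `Doc` fields and `is_excluded()` are invented names, template variable values are assumed to arrive as strings, and rules not described in this section (for example, the non-searchable flag mentioned in the History) are left out.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical view of a MODx document (illustrative field names)."""
    id: int
    template_id: int
    template_name: str
    is_weblink: bool = False
    tv_exclude: str = "0"  # value of the template variable named by &excludeTV


def is_excluded(doc, excluded_names, excluded_ids, exclude_weblinks=False):
    """Apply all rules at once; any single rule removes the document."""
    if doc.template_name in excluded_names or doc.template_id in excluded_ids:
        return True  # rule 1: &excludeTemplates
    if doc.tv_exclude.strip() == "1":
        return True  # rule 2: per-document flag from &excludeTV
    if exclude_weblinks and doc.is_weblink:
        return True  # rule 3: &excludeWeblinks
    return False


docs = [
    Doc(1, 2, "main"),
    Doc(2, 3, "blank"),                  # excluded by template
    Doc(3, 2, "main", tv_exclude="1"),   # excluded by template variable
    Doc(4, 2, "main", is_weblink=True),  # excluded only with exclude_weblinks
]
kept = [d.id for d in docs if not is_excluded(d, {"blank"}, set(), True)]
print(kept)  # [1]
```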
Set document priority and change frequency
Priority
The Google sitemap protocol allows setting the document priority relative to other documents on the site. The lowest priority is 0.0 and the highest is 1.0. The default and neutral value is 0.5.
This value has no effect on your pages compared to pages on other sites, and only lets the search engines know which of your pages you deem most important so they can order the crawl of your pages in the way you would most like. -- Using the sitemap protocol
Setting priority is optional. Use the &priority parameter to name the template variable that holds each document's priority.
In the example below, each document in the sitemap will get the priority set in its sitemap_priority template variable:
[!SiteMap? &format=`sp` &priority=`sitemap_priority`!]
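Because the priority comes from a template variable that an editor fills in, it may be worth validating it before it reaches the sitemap. The Python function below is a hypothetical sketch: it accepts values in the 0.0 to 1.0 range, rounds them to one decimal place (like the values in the dropdown example under Parameters), and otherwise returns a caller-supplied default. Whether the snippet itself validates the value is not documented here.

```python
def normalize_priority(raw, default=None):
    """Return a priority string in [0.0, 1.0], or `default` if unusable.

    Assumption: values are rounded to one decimal place, matching the
    dropdown example (1.0, 0.7, 0.5, 0.3, 0.0).
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default  # missing or non-numeric template variable value
    if not 0.0 <= value <= 1.0:  # also rejects NaN
        return default
    return f"{value:.1f}"


for raw in ("0.7", "1", "1.5", "high", None):
    print(repr(raw), "->", normalize_priority(raw, default="0.5"))
```

Passing `default=None` lets a caller omit the priority tag entirely, which search engines treat the same as the neutral 0.5.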
Change frequency
The Sitemap Protocol allows setting how frequently a document is likely to change. As with priority, this is a hint rather than a command: it only suggests to the Google bot how often to recrawl the page.
Even though search engine crawlers consider this information when making decisions, they may crawl pages marked "hourly" less frequently than that, and they may crawl pages marked "yearly" more frequently than that. It is also likely that crawlers will periodically crawl pages marked "never" so that they can handle unexpected changes to those pages. -- Using the sitemap protocol
Setting change frequency is optional. Use the &changefreq parameter to name the template variable that holds each document's change frequency.
In the example below, each document in the sitemap will get the change frequency set in its sitemap_changefreq template variable:
[!SiteMap? &format=`sp` &changefreq=`sitemap_changefreq`!]
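The Sitemap Protocol defines seven change frequency values: always, hourly, daily, weekly, monthly, yearly and never. A hypothetical Python check that normalizes a template variable value against them might look like this; the lower-casing and the fallback default are assumptions of the sketch, not documented snippet behavior.

```python
# The seven values allowed by the Sitemap Protocol for <changefreq>.
VALID_CHANGEFREQ = frozenset(
    {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
)


def normalize_changefreq(raw, default=None):
    """Return a valid <changefreq> value, or `default` if `raw` is unusable.

    Assumption: matching is case-insensitive, so a label such as "Weekly"
    is accepted and written as "weekly".
    """
    if raw is None:
        return default
    value = raw.strip().lower()
    return value if value in VALID_CHANGEFREQ else default


print(normalize_changefreq("Weekly"))                  # weekly
print(normalize_changefreq("fortnightly", "monthly"))  # monthly
```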
Use sitemap.xsl to examine the sitemap document in a browser
The sitemap document contains XML that is rather hard to read in a browser, yet you might want to examine it to check the results. The simplest way to get a readable form is to view the source; the SiteMap snippet produces well-formatted code.
The other way is to use an XSL stylesheet. If you look at the source of your sitemap document, you will notice a link to sitemap.xsl on line 2:
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
All you have to do is provide the sitemap.xsl document. Once you do, you will be able to view your nicely formatted sitemap in a browser. Here is a step-by-step procedure:
- First, go to enarion.net and download it (the latest version is 1.5a).
- Next, log in to the manager and duplicate the sitemap.xml document.
- Change document alias to sitemap.xsl.
- Paste the contents of the downloaded file into the document content field. Remember not to use the Rich Text Editor.
- Save and publish the newly created document.
- It's done.
Now you should be able to see, in a browser, a table with links, last modification dates, change frequencies and priorities.
If you use Mozilla/Firefox, you may have to add a line to your .htaccess file:
# Mozilla requires this to handle XSL stylesheets
AddType application/xml .xsl
SiteMap snippet
Outputs a machine-readable sitemap for search engines and robots in Google's Sitemap Protocol (XML) and URL list (TXT) formats.
Details
- Author
- Grzegorz Adamiak (grad)
- Version
- 1.0.6 @ 2008-02-29 15:03:04
- License
- LGPL
- MODx
- 0.9.2.1
- Download from MODx repository
- http://modxcms.com/sitemap-711.html
- Support thread in MODx forums
- http://modxcms.com/forums/index.php?topic=5754
History
- 1.0.6
- Optional parameter to exclude weblinks from the sitemap.
- 1.0.5
- Non-searchable documents are now excluded from the sitemap.
- 1.0.4
- Added option to display the sitemap as an HTML list (ul/li).
- 1.0.3
- Added ability to specify a URL for the XSL stylesheet.
- 1.0.2
- Reworked fetching of template variable values to get the inherited value.
- 1.0.1
- Reworked fetching of template variable values: the computed value is now used instead of the nominal one, but still not the inherited value.
- 1.0
- First public release.
TODO
I'm not sure at the moment which way it will go...
Parameters
- &format
- Sets the sitemap output format.
- Accepted values
- sp - The snippet outputs a Google sitemap in Sitemap Protocol format (XML).
- txt - The snippet outputs a list of URLs in plain text.
- Default value
- sp - Google sitemap.
- &startid
- Determines which part of the site the sitemap covers.
- Accepted values
- integer - ID of the document from which the sitemap is to be created.
- Default value
- 0 - The site root. The sitemap will include documents from the whole site.
- &priority
- Sets the relative priority of each document on the site (Sitemap Protocol only).
- Accepted values
- string - The name of the template variable.
- Default value
- sitemap_priority - Ignored if the specified template variable is not found.
- Example of template variable
- Input type: Dropdown list.
- Input option values:
5==1.0||4==0.7||3==0.5||2==0.3||1==0.0
You may use more or fewer values if you wish.
- Default value: 0.5
- &changefreq
- Sets the change frequency of each document on the site (Sitemap Protocol only).
- Accepted values
- string - The name of the template variable.
- Default value
- sitemap_changefreq - Ignored if the specified template variable is not found.
- Example of template variable
- Input type: Dropdown list.
- Input option values:
Always==always||Hourly==hourly||Daily==daily||Weekly==weekly||Monthly==monthly||Yearly==yearly||Never==never
- Default value: monthly seems reasonable.
- &excludeTemplates
- Excludes documents that use the specified templates.
- Accepted values
- string - Comma-separated list of document template names (template IDs are also accepted from version 1.0.8).
- Default value
- null - No documents are excluded.
- &excludeTV
- Excludes documents by means of a template variable. Setting the value of this template variable to 1 excludes the document from the sitemap. This setting is independent of &excludeTemplates.
- Accepted values
- string - The name of the template variable.
- Default value
- sitemap_exclude - Ignored if the specified template variable is not found.
- Example of template variable
- Input type: Dropdown list.
- Input option values:
Include==0||Exclude==1
- Default value: 0.
- &excludeWeblinks
- Excludes weblinks from the sitemap.
- Accepted values
- boolean - Set to 1 to exclude weblinks.
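The Input option values in the parameter descriptions above use the MODx dropdown format Label==value||Label==value. A minimal Python parser for that format is sketched below to show how the labels an editor picks map to the values stored in the template variable. `parse_options()` is a hypothetical helper, and treating an option without `==` as both label and value is an assumption about MODx behavior.

```python
def parse_options(spec):
    """Parse a MODx-style option string 'Label==value||Label==value'.

    Returns a list of (label, value) pairs in their original order.
    Assumption: an option without '==' uses its text as both label and value.
    """
    pairs = []
    for option in spec.split("||"):
        label, sep, value = option.partition("==")
        pairs.append((label, value if sep else label))
    return pairs


priority_opts = parse_options("5==1.0||4==0.7||3==0.5||2==0.3||1==0.0")
print(priority_opts[0])  # ('5', '1.0')

# Sanity check: every stored priority value lies in the 0.0 to 1.0 range.
print(all(0.0 <= float(v) <= 1.0 for _, v in priority_opts))  # True
```

The same check can be run against the change frequency and exclusion option strings before pasting them into the template variable definition.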
Example calls
[!SiteMap!]
This call will:
- include all published documents from the whole site;
- exclude documents with the template variable named sitemap_exclude set to 1;
- use the Sitemap Protocol format (XML);
- set change frequency and priority for documents if template variables with the default names (sitemap_changefreq and sitemap_priority) exist.
[!SiteMap? &format=`txt` &startid=`28`!]
This call will:
- output a plain text list of URLs;
- include all published documents that are descendants of the document with ID 28.
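&startid restricts the sitemap to a branch of the document tree. Below is a hypothetical Python sketch of that selection. It assumes documents are known as a mapping from document ID to parent ID (with 0 standing for the site root) and that the start document itself is not included, as the "descendants" wording above suggests; `descendants()` is an invented name.

```python
from collections import defaultdict


def descendants(parent_of, start_id=0):
    """Return IDs of all documents below start_id, depth-first.

    parent_of maps document ID -> parent ID; 0 stands for the site root.
    Assumption: start_id itself is not part of the result.
    """
    children = defaultdict(list)
    for doc_id, parent_id in parent_of.items():
        children[parent_id].append(doc_id)

    result = []
    # Reverse-sorted stack, so the lowest ID is visited first.
    stack = sorted(children[start_id], reverse=True)
    while stack:
        doc_id = stack.pop()
        result.append(doc_id)
        stack.extend(sorted(children[doc_id], reverse=True))
    return result


tree = {1: 0, 2: 0, 3: 1, 4: 3, 5: 2}
print(descendants(tree))     # [1, 3, 4, 2, 5]
print(descendants(tree, 1))  # [3, 4]
```

An explicit stack is used instead of recursion so that deep document trees cannot hit Python's recursion limit.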
[!SiteMap? &excludeTemplates=`blank, hidden` &excludeTV=`hide`!]
This call will:
- output a sitemap in Sitemap Protocol format;
- exclude documents using the blank and hidden templates;
- exclude documents with the template variable named hide set to 1;
- include documents from the whole site;
- set change frequency and priority for documents if template variables with the default names (sitemap_changefreq and sitemap_priority) exist.
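Every parameter in the calls above falls back to a default when omitted. The Python sketch below resolves a call string against the defaults listed under Parameters. `resolve_params()` and the regular expression are hypothetical, the default of 0 for &excludeWeblinks is an assumption (the parameter list does not state one), and unknown names are reported because in this sketch a misspelled parameter would otherwise leave its default silently in place.

```python
import re

# Defaults as listed under Parameters; the excludeWeblinks default of "0"
# is an assumption, since the parameter list does not state one.
DEFAULTS = {
    "format": "sp",
    "startid": "0",
    "priority": "sitemap_priority",
    "changefreq": "sitemap_changefreq",
    "excludeTemplates": "",
    "excludeTV": "sitemap_exclude",
    "excludeWeblinks": "0",
}

# Matches &name=`value` pairs as written in the snippet calls above.
PARAM_RE = re.compile(r"&(\w+)=`([^`]*)`")


def resolve_params(call):
    """Return (effective parameters, unknown parameter names) for a call."""
    given = dict(PARAM_RE.findall(call))
    unknown = sorted(set(given) - set(DEFAULTS))
    return {**DEFAULTS, **given}, unknown


params, unknown = resolve_params("[!SiteMap? &format=`txt` &startid=`28`!]")
print(params["format"], params["startid"], unknown)  # txt 28 []

# A misspelled name is reported, and startid keeps its default.
params, unknown = resolve_params("[!SiteMap? &stardid=`28`!]")
print(params["startid"], unknown)  # 0 ['stardid']
```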
