META tags have two possible attributes:
- <META HTTP-EQUIV="name" CONTENT="content">
- <META NAME="name" CONTENT="content">
META tags should be placed in the head of the HTML document, between the <HEAD> and </HEAD> tags (especially important in documents using FRAMES).
HTTP-EQUIV tags
META tags with an HTTP-EQUIV attribute are equivalent to HTTP headers. Typically, they control the action of browsers, and may be used to refine the information provided by the actual headers. Tags using this form should have an equivalent effect when specified as an HTTP header, and in some servers may be translated to actual HTTP headers automatically or by a pre-processing tool.
Note: While HTTP-EQUIV META tags appear to work properly with Netscape Navigator, other browsers may ignore them, and they are ignored by Web proxies, which are becoming more widespread. Use of the equivalent HTTP header, as supported by e.g. the Apache server, is more reliable and is recommended wherever possible.
HTTP headers are defined in RFC1945 (HTTP/1.0) and RFC2068 (HTTP/1.1). Note that RFC2068 states that multiple headers with the same name may be present only if the values may be concatenated.
HTTP headers may be generated by CGI scripts, and in Apache and CERN httpd by using a side file containing metadata. Other servers may have other mechanisms to generate headers. Note that certain server-generated headers may not be overridden (such as Date), and that others are only meaningful with a non-200 status code. Using an HTTP header is preferable to using META tags, since the header will be understood by cache agents and proxies in addition to browsers, and metadata (such as PICS data) may be associated with image files, sound files, etc.
However, new HTTP headers should not be created without checking for conflict with existing ones since it is possible to interfere with server and proxy operation.
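As a rough illustration of generating headers from a CGI script, the sketch below writes header lines, a blank line, and then the document body. The function name build_cgi_response and the sample page are hypothetical; the header names are ones discussed in this document.

```python
import sys

def build_cgi_response(headers, body):
    """Return a CGI response: header lines, one blank line, then the body."""
    lines = [f"{name}: {value}" for name, value in headers]
    return "\n".join(lines) + "\n\n" + body

# Hypothetical page; the server adds its own headers (such as Date) around these.
response = build_cgi_response(
    [("Content-Type", "text/html"), ("Content-Language", "en-GB")],
    "<HTML><HEAD><TITLE>Example</TITLE></HEAD><BODY>Hello</BODY></HTML>\n",
)
sys.stdout.write(response)
```

The blank line is what separates headers from content, so anything printed before it must be a valid header line.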
Content-Disposition
Content-Type: text/comma-separated-values
Content-Disposition: inline; filename=openinexcel.csv
Expires
The date and time after which the document should be considered expired. Controls caching in HTTP/1.0. In Netscape Navigator, a request for a document whose expiry time has passed will generate a new network request (possibly with If-Modified-Since). An illegal Expires date, e.g. "0", is interpreted as "now". Setting Expires to 0 may thus be used to force a modification check at each visit.
Web robots may delete expired documents from a search engine, or schedule a revisit.
Dates must be given in GMT, in the RFC1123 format shown below rather than the older RFC850 form. E.g. (META tag):
<META HTTP-EQUIV="expires" CONTENT="Wed, 26 Feb 1997 08:21:57 GMT">
or (HTTP header):
Expires: Wed, 26 Feb 1997 08:21:57 GMT
In HTTP 1.0, an invalid value (such as "0") may be used to mean "immediately".
Note: While the Expires HTML META tag appears to work properly with Netscape Navigator, other browsers may ignore it, and it is ignored by Web proxies. Use of the equivalent HTTP header, as supported by e.g. Apache, is more reliable.
See also CacheNow for discussion about cache control, page expiry, etc.
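A date in the layout used by the Expires example above can be produced with Python's standard library. This is a minimal sketch; the helper name http_date is an assumption for illustration, and the timestamp is the one from the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def http_date(moment):
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    # Naive datetimes would be treated as local time by astimezone(),
    # so callers are assumed to pass timezone-aware values.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)

expires = http_date(datetime(1997, 2, 26, 8, 21, 57, tzinfo=timezone.utc))
print(f"Expires: {expires}")
print(f'<META HTTP-EQUIV="expires" CONTENT="{expires}">')
```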
Pragma
Controls caching in HTTP/1.0. Value must be "no-cache". Issued by browsers during a Reload request; in a document, it prevents Netscape Navigator from caching the page locally.
Content-Type
The HTTP content type may be extended to give the character set. As an HTTP/1.0 header, this unfortunately breaks older browsers. As a META tag, it causes Netscape Navigator to load the appropriate charset before displaying the page. E.g.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-2022-JP">
It is now recommended to always use this tag, even with the previously-default charset ISO-8859-1. Failure to do so may cause display problems where, for instance, the document uses UTF-8 punctuation characters but is displayed in ISO or ASCII charsets.
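To show how an agent might read the character set out of a Content-Type value, here is a small sketch using Python's standard email.message module. The function name parse_content_type is hypothetical; note that the library returns both the type and the charset in lower case.

```python
from email.message import Message

def parse_content_type(value):
    """Return (media_type, charset) for a Content-Type value; charset may be None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()

print(parse_content_type("text/html; charset=ISO-2022-JP"))
```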
Content-Script-Type
Specifies the default scripting language in a document. See MIMETYPES for applicable values. E.g.
<META HTTP-EQUIV="Content-Script-Type" CONTENT="text/javascript">
Content-Style-Type
Specifies the default style sheet language for a document. E.g.
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
Default-Style
Sets the document's preferred style sheet, taken from a style sheet specified elsewhere, e.g. in a LINK element.
Content-Language
May be used to declare the natural language of the document. May be used by robots to categorize by language. The corresponding Accept-Language header (sent by a browser) causes a server to select an appropriate natural language document. E.g.
<META HTTP-EQUIV="Content-Language" CONTENT="en-GB">
or (HTTP header)
Content-language: en-GB
Languages are specified as the pair (language-dialect); here, English-British.
Refresh
Specifies a delay in seconds before the browser automatically reloads the document. Optionally, specifies an alternative URL to load. E.g.
<META HTTP-EQUIV="Refresh" CONTENT="3;URL=http://www.some.org/some.html">
or (HTTP header)
Refresh: 3;URL=http://www.some.org/some.html
In Netscape Navigator, this has the same effect as clicking "Reload"; i.e. it issues an HTTP GET with Pragma: no-cache (and an If-Modified-Since header if a cached copy exists).
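The Refresh value is a delay followed by an optional URL, which the following sketch splits apart. The function name parse_refresh is hypothetical, and real browsers may be more lenient about the syntax than this code.

```python
def parse_refresh(content):
    """Split a Refresh value into (delay_seconds, url_or_None)."""
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    rest = rest.strip()
    url = None
    # The URL part is optional; without it the current document is reloaded.
    if rest[:4].lower() == "url=":
        url = rest[4:].strip()
    return delay, url

print(parse_refresh("3;URL=http://www.some.org/some.html"))
```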
Note: If a script is executed which reloads the current document, the action of the Refresh tag may be undefined (e.g. <body onLoad="document.location='otherdoc.doc'">).
Window-target
Specifies the named window of the current page; can be used to stop a page appearing in a frame with many (not all) browsers. E.g.
<META HTTP-EQUIV="Window-target" CONTENT="_top">
or (HTTP header)
Window-target: _top
Ext-cache
Defines the name of an alternate cache to Netscape Navigator. E.g.
<META HTTP-EQUIV="Ext-cache"
CONTENT="name=/some/path/index.db; instructions=User Instructions">
Set-Cookie
Sets a "cookie" in Netscape Navigator. Values with an expiry date are considered "permanent" and will be saved to disk on exit. E.g.
<META HTTP-EQUIV="Set-Cookie"
CONTENT="cookievalue=xxx;expires=Friday, 31-Dec-99 23:59:59 GMT; path=/">
PICS-Label
Platform for Internet Content Selection. Typically used to declare a document's rating in terms of adult content (sex, violence, etc.), although the scheme is very flexible and may be used for other purposes.
See also the PICS HOWTO.
Cache-Control
Specifies the action of cache agents. Possible values:
- Public - may be cached in public shared caches
- Private - may only be cached in private cache
- no-cache - may not be cached
- no-store - may be cached but not archived
Note that browser action is undefined using these headers as META tags.
Vary
Specifies that alternates are available. E.g.
<META HTTP-EQUIV="Vary" CONTENT="Content-language">
or (HTTP header)
Vary: Content-language
implies that if a header Accept-Language is sent an alternate form may be selected.
Lotus
The Lotus publishing tool generates Bulletin-Date and Bulletin-Text attributes. Bulletin-Text contains a document description.
NAME attributes
META tags with a name attribute are used for other types which do not correspond to HTTP headers. Sometimes the distinction is blurred; some agents may interpret tags such as "keywords" declared as either "name" or as "http-equiv".
Robots
Controls Web robots on a per-page basis. E.g.
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
Robots may traverse this page but not index it.
Altavista supports:
- NOINDEX prevents anything on the page from being indexed.
- NOFOLLOW prevents the crawler from following the links on the page and indexing the linked pages.
- NOIMAGEINDEX prevents the images on the page from being indexed but the text on the page can still be indexed.
- NOIMAGECLICK prevents the use of links directly to the images; instead there will only be a link to the page.
Google supports a NOARCHIVE extension to this scheme to request that the Google search engine not cache pages; see the Google FAQ.
See also the /robots.txt exclusion method.
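As an illustration of how a robot might read these directives, the sketch below collects the comma-separated values of a ROBOTS tag with Python's standard html.parser. The class and function names are hypothetical, and the defaults assumed here (index and follow unless told otherwise) are an assumption rather than documented behaviour of any particular search engine.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <META NAME="ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().upper())

def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<HEAD><META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW"></HEAD>'
directives = robots_directives(page)
print("may index:", "NOINDEX" not in directives)
print("may follow links:", "NOFOLLOW" not in directives)
```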
Description
A short, plain language description of the document. Used by search engines to describe your document. Particularly important if your document has very little text, is a frameset, or has extensive scripts at the top. E.g.
<META NAME="description" CONTENT="Citrus fruit wholesaler.">
Keywords
Keywords used by search engines to index your document in addition to words from the title and document body. Typically used for synonyms and alternates of title words. E.g.
<META NAME="keywords" CONTENT="oranges, lemons, limes">
Author
Typically the unqualified author's name.
Generator
Typically the name and version number of a publishing tool used to create the page. Could be used by tool vendors to assess market penetration.
Formatter
Classification
Undefined.
Copyright
Source: Publishing tools
Typically an unqualified copyright statement.
Rating
Simple content rating.
VW96.ObjectType
Based on an early version of the Dublin Core report, using a defined schema of document types such as FAQ, HOWTO.
Defined by Queen's University of Belfast; a restricted set including e.g. "Contact Information", "Image".
Dublin Core
DC.TITLE, DC.CREATOR, DC.SUBJECT, DC.DESCRIPTION, DC.PUBLISHER, DC.CONTRIBUTORS, DC.DATE, DC.TYPE, DC.FORMAT, DC.IDENTIFIER, DC.SOURCE, DC.LANGUAGE, DC.RELATION, DC.COVERAGE, DC.RIGHTS
Dublin Core Elements. See the Reference Description.
HTML 4.0
HTdig
HTdig tags. See the HTdig META page.
DC-CHEM
HTdig notification
searchBC
searchBC is a regional search engine which uses a number of common tags such as Keywords. revisit is used as a hint for scheduling revisits.
Apple META tags
Author-Corporate, Author-Personal, Publisher-Email, Identifier-URL, Identifier, Coverage, Bookmark
Kodak
IBM
Page-Enter, Page-Exit, Site-Enter, Site-Exit
Defines a special-effects transition; e.g.
<meta http-equiv="Page-Enter"
content="revealTrans(Duration=3.0,Transition=2)">
SHOE
Instance-Delegate, Instance-Key - see the SHOE Project at the University of Maryland (Simple HTML Ontology Extensions)
Microsoft Word
Microsoft Word 97 supports a number of HTML META attributes in the HTML export option. Content-Type is used to set the charset, Generator is set and various other tags may optionally be set.
SIC87
RDU
The RDU Metadata Search Engine (original URL dead) listed many tags, including the following:
- contributor
- custodian
- custodian_contact
- custodian_contact_position
- east_bounding_coordinate
- north_bounding_coordinate
- relation
- reply-to
- south_bounding_coordinate
- west_bounding_coordinate
Other Organisations
- DMV MetaData for Mathematical Papers
- Maple Square (ex. Sympatico)
Agent Markup Language
GeoCities
GILS
Government Information Locator Service - a US government initiative. See
- Washington GILS metadata attribute set
- GILS profile (version 2)
IMS
Fireball
The German search engine Fireball. See the metadata page and meta-tag generator. Supports Author, Publisher, Keywords, Description plus page-topic, page-type.
Geotags
- Geo.Region - Geographic regions from ISO3166-2
- Geo.Placename - Free Text place name
- Geo.Position - Latitude;Longitude in decimal degrees using the WGS84 datum.
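The Geo.Position value is simply latitude and longitude joined by a semicolon, as the following sketch shows. The function name geo_position_tag and the sample coordinates are hypothetical.

```python
def geo_position_tag(latitude, longitude):
    """Build a Geo.Position META tag from decimal degrees (WGS84 assumed)."""
    if not -90 <= latitude <= 90 or not -180 <= longitude <= 180:
        raise ValueError("coordinates out of range")
    return f'<META NAME="Geo.Position" CONTENT="{latitude};{longitude}">'

# Hypothetical coordinates, for illustration only.
print(geo_position_tag(49.2, -123.4))
```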
Google
- googlebot: noarchive - do not allow Google to display cached content
- googlebot: nosnippet - do not allow Google to display an excerpt or cached content
- googlebot: noindex - similar to the robots meta element
- googlebot: nofollow
MSSmartTags
- MSSmartTagsPreventParsing: TRUE - prevent Microsoft Smart Tags being applied to a page
See glassdog.com/smarttags, office.microsoft.com. However, it looks at the moment as if SmartTags have been abandoned.
UnSpam
An initiative of unspam.com to forbid compliant robots from harvesting email addresses. Usage:
<meta name="no-email-collection" value="[link to your terms]" />
Replace the [link to your terms] with a link to your terms of use page. Alternatively you may include a link to www.unspam.com/noemailcollection
See how_to_avoid_spambots
Miscellaneous
- Version
- Template
- Operator
- Creation
- Host
- Document
- Subject
- Build
- Distribution - global, local, iu
- Resource-type - document (for ALIWeb)
- Location (geographic location; from Sympatico)
- Random Text (e.g., META NAME="Tom Jones")
Web Counts
Attributes in use counted by a Web robot here.
Also counted 3 July 97.
IAFA Template Statistics from the ROADS project
Other Resources
- Resource Description Framework (RDF) - a W3C specification
- The META Generator (CGI script)
- REL, REV tags
- Resource Description (Connolly)
- Metadata Architecture (TimBL, Jan 97)
- META reference by Galactus
- HTML 3.2
- META Tagging for Search Engines at stars.com
- aditudes metatags - links, some info on search engine support
- draft-musella-html-metatag-01.txt, version 2
- HTML Writers Guild FAQ
- Jaggery the Rascal's notes.
- metadata at ERIN
- report from the May 96 Indexing Workshop
- draft-daviel-metadata-reg.txt (not.)
- Dublin Core Generator at UKOLN
- Dublin Core Metadata Template at Nordic Metadata Project
- Linking Metadata ( draft-daviel-metadata-link-00.txt)
- ADAM Quick Guide to Metadata
Thesauri
- GETTY Art & Architecture Thesaurus
- GETTY Artist Names Thesaurus
- TGN - GETTY Thesaurus of Geographic Names
- TGM I US Library of Congress Thesaurus for Graphic Materials I: Subject Terms
- TGM II US Library of Congress Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms
- LIV US Library of Congress Legislative Indexing Vocabulary
- GLIN US Library of Congress Global Legal Information Network (GLIN) Thesaurus
- ISSN (International Standard Serials Number)
- LCSH (paper/CDROM)
- LCSH, (telnet)
Other METAdata
- Metadata Search Engine
- MetaWeb - the Australian metadata project at DSTC
- The Metadata Repository Service
- Meta Content Framework using XML (Netscape)
- MCF Specification
- MCF Vocabulary
- MCF File Spec.
- NSDI MetaData
- SHOE Project
- ROADS Project
- PICS-SE (AID)
- WebDAV proposal
- GILS (US Government Information Locator Service), see also v2 differences