
Sunday, June 26, 2011

Dictionary of HTML META Tags

META tags take one of two forms, distinguished by the HTTP-EQUIV or NAME attribute:
  • <META HTTP-EQUIV="name" CONTENT="content">
  • <META NAME="name" CONTENT="content">
META tags should be placed in the head of the HTML document, between the <HEAD> and </HEAD> tags (especially important in documents using FRAMES).
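As a rough illustration of the two forms, the sketch below collects META tags from a document using Python's standard html.parser module. The class name MetaCollector, the helper collect_metas and the sample document are made up for this example; they are not part of any specification.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect META tags as (kind, name, content) tuples."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        # Attributes written without a value arrive as None; treat as "".
        attributes = {key: (value or "") for key, value in attrs}
        content = attributes.get("content", "")
        if "http-equiv" in attributes:
            self.metas.append(("http-equiv", attributes["http-equiv"].lower(), content))
        elif "name" in attributes:
            self.metas.append(("name", attributes["name"].lower(), content))


def collect_metas(html_text):
    """Return every META tag found in html_text, in document order."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.metas


# Hypothetical sample document, built from tags shown on this page.
SAMPLE = """<HTML><HEAD>
<META HTTP-EQUIV="Content-Language" CONTENT="en-GB">
<META NAME="description" CONTENT="Citrus fruit wholesaler.">
</HEAD><BODY></BODY></HTML>"""

for kind, name, content in collect_metas(SAMPLE):
    print(kind, name, content)
```

Names are lowercased here because the attribute values are usually compared without regard to case; the content is left untouched.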

HTTP-EQUIV tags

META tags with an HTTP-EQUIV attribute are equivalent to HTTP headers. Typically, they control the action of browsers, and may be used to refine the information provided by the actual headers. Tags using this form should have an equivalent effect when specified as an HTTP header, and in some servers may be translated to actual HTTP headers automatically or by a pre-processing tool.
Note: While HTTP-EQUIV META tags appear to work properly with Netscape Navigator, other browsers may ignore them, and they are ignored by Web proxies, which are becoming more widespread. Use of the equivalent HTTP header, as supported by e.g. Apache server, is more reliable and is recommended wherever possible.
HTTP headers are defined in RFC1945 (HTTP/1.0) and RFC2068 (HTTP/1.1). Note that RFC2068 states that multiple headers with the same name may be present only if the values may be concatenated.
HTTP headers may be generated by CGI scripts, and in Apache and CERN httpd by using a side file containing metadata. Other servers may have other mechanisms to generate headers. Note that certain server-generated headers may not be overridden (such as Date), and that others are only meaningful with a non-200 status code. Using an HTTP header is preferable to using META tags, since the header will be understood by cache agents and proxies in addition to browsers, and metadata (such as PICS data) may be associated with image files, sound files, etc.
However, new HTTP headers should not be created without checking for conflict with existing ones since it is possible to interfere with server and proxy operation.
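To make the CGI route concrete, here is a minimal sketch of the text a CGI script could write to its standard output: header lines, a blank line, then the document. The function name build_cgi_response and the sample header value are assumptions chosen for illustration, and a real script would also need to respect whatever its server requires.

```python
def build_cgi_response(body, extra_headers):
    """Return the text a CGI script would write to standard output.

    extra_headers is a list of (name, value) pairs to send as real
    HTTP headers instead of HTTP-EQUIV META tags.
    """
    lines = ["Content-Type: text/html"]
    for name, value in extra_headers:
        lines.append(f"{name}: {value}")
    # A blank line separates the headers from the document body.
    return "\n".join(lines) + "\n\n" + body


response = build_cgi_response(
    "<HTML><HEAD></HEAD><BODY>Hello</BODY></HTML>",
    [("Content-Language", "en-GB")],
)
print(response)
```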

Content-Disposition

Source: RFC2183 - Specify application handler (Microsoft), e.g.
Content-Type: text/comma-separated-values
Content-Disposition: inline; filename=openinexcel.csv
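A header value like the one above can be taken apart with Python's standard email.message module, which understands this MIME-style syntax. This is only a sketch; the helper name parse_disposition is invented for the example.

```python
from email.message import Message


def parse_disposition(value):
    """Return (disposition, filename) from a Content-Disposition value."""
    message = Message()
    message["Content-Disposition"] = value
    # get_content_disposition() yields the lowercased disposition type;
    # get_filename() reads the "filename" parameter, or None if absent.
    return message.get_content_disposition(), message.get_filename()


print(parse_disposition("inline; filename=openinexcel.csv"))
```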

Expires

Source: HTTP/1.1 (RFC2068)
The date and time after which the document should be considered expired. Controls caching in HTTP/1.0. In Netscape Navigator, a request for a document whose expiry time has passed will generate a new network request (possibly with If-Modified-Since). An illegal Expires date, e.g. "0", is interpreted as "now". Setting Expires to 0 may thus be used to force a modification check at each visit.
Web robots may delete expired documents from a search engine, or schedule a revisit.
Dates should be given in RFC1123 format (the format preferred by HTTP/1.1; the older RFC850 format is obsolete), in GMT. E.g. (META tag):
<META HTTP-EQUIV="expires" CONTENT="Wed, 26 Feb 1997 08:21:57 GMT">
or (HTTP header):
Expires: Wed, 26 Feb 1997 08:21:57 GMT
In HTTP 1.0, an invalid value (such as "0") may be used to mean "immediately".
Note: While the Expires HTML META tag appears to work properly with Netscape Navigator, other browsers may ignore it, and it is ignored by Web proxies. Use of the equivalent HTTP header, as supported by e.g. Apache, is more reliable.
See also CacheNow for discussion about cache control, page expiry, etc.
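The following sketch produces a date string in the same layout as the examples above, using Python's standard email.utils module. The function names expires_value and expires_meta are hypothetical, and the input is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def expires_value(moment):
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    # usegmt=True requires a UTC datetime, so convert first.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def expires_meta(moment):
    """Build the corresponding META tag (less reliable than the header)."""
    return f'<META HTTP-EQUIV="expires" CONTENT="{expires_value(moment)}">'


moment = datetime(1997, 2, 26, 8, 21, 57, tzinfo=timezone.utc)
print("Expires:", expires_value(moment))  # Expires: Wed, 26 Feb 1997 08:21:57 GMT
print(expires_meta(moment))
```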

Pragma

Controls caching in HTTP/1.0. Value must be "no-cache". Issued by browsers during a Reload request, and in a document prevents Netscape Navigator from caching a page locally.

Content-Type

Source: HTTP/1.0 (RFC1945)
The HTTP content type may be extended to give the character set. As an HTTP/1.0 header, this unfortunately breaks older browsers. As a META tag, it causes Netscape Navigator to load the appropriate charset before displaying the page. E.g.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-2022-JP">
It is now recommended to always use this tag, even with the previously-default charset ISO-8859-1. Failure to do so may cause display problems where, for instance, the document uses UTF-8 punctuation characters but is displayed in ISO or ASCII charsets.
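As a small sketch of how a tool might read the charset back out of such a value, Python's standard email.message module can split the media type from its parameters. The helper name split_content_type is an assumption for this example.

```python
from email.message import Message


def split_content_type(content):
    """Return (media_type, charset) from a Content-Type value.

    Both parts come back lowercased; charset is None when not declared.
    """
    message = Message()
    message["Content-Type"] = content
    return message.get_content_type(), message.get_content_charset()


print(split_content_type("text/html; charset=ISO-2022-JP"))
```

A None charset is the case the recommendation above is meant to avoid, since the reader's software then has to guess.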

Content-Script-Type

E.g.
<META HTTP-EQUIV="Content-Script-Type" CONTENT="text/javascript">
Source: HTML 4.0
Specifies the default scripting language in a document. See MIMETYPES for applicable values.

Content-Style-Type

E.g.
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
Source: HTML 4.0
Specifies the default style sheet language for a document.

Default-Style

Source: HTML 4.0
Sets the document's preferred style sheet, taken from a style sheet specified elsewhere, e.g. in a LINK element.

Content-Language

Source: HTTP/1.0, RFC1766
May be used to declare the natural language of the document. May be used by robots to categorize by language. The corresponding Accept-Language header (sent by a browser) causes a server to select an appropriate natural language document. E.g.
<META HTTP-EQUIV="Content-Language" CONTENT="en-GB">
or (HTTP header)
Content-language: en-GB
Languages are specified as a language-dialect pair; here, English-British.

Refresh

Source: Netscape
Specifies a delay in seconds before the browser automatically reloads the document. Optionally, specifies an alternative URL to load. E.g.
<META HTTP-EQUIV="Refresh" CONTENT="3;URL=http://www.some.org/some.html">
 
or (HTTP header)
Refresh: 3;URL=http://www.some.org/some.html
In Netscape Navigator, has the same effect as clicking "Reload"; i.e. issues an HTTP GET with Pragma: no-cache (and If-Modified-Since header if a cached copy exists).
Note: If a script is executed which reloads the current document, the action of the Refresh tag may be undefined (e.g. <body onLoad="document.location='otherdoc.doc'">).
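The Refresh value is a delay optionally followed by a URL, so it can be split as in the sketch below. The function name parse_refresh is made up, and the parsing is deliberately simple: it assumes a well-formed value like the ones shown above and does not try to mimic how any particular browser handles malformed input.

```python
def parse_refresh(content):
    """Split a Refresh value into (delay_seconds, url_or_None)."""
    delay_text, _, rest = content.partition(";")
    delay = int(delay_text.strip())
    rest = rest.strip()
    url = None
    # The "URL=" prefix is matched without regard to case.
    if rest[:4].lower() == "url=":
        url = rest[4:].strip()
    return delay, url


print(parse_refresh("3;URL=http://www.some.org/some.html"))
print(parse_refresh("10"))
```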

Window-target

Specifies the named window of the current page; can be used to stop a page appearing in a frame with many (not all) browsers. E.g.
<META HTTP-EQUIV="Window-target" CONTENT="_top">
 
or (HTTP header)
Window-target: _top

Ext-cache

Source: Netscape
Defines the name of an alternate cache to Netscape Navigator. E.g.
<META HTTP-EQUIV="Ext-cache" 
CONTENT="name=/some/path/index.db; instructions=User Instructions">

Set-Cookie

Sets a "cookie" in Netscape Navigator. Values with an expiry date are considered "permanent" and will be saved to disk on exit. E.g.
<META HTTP-EQUIV="Set-Cookie" 
CONTENT="cookievalue=xxx;expires=Friday, 31-Dec-99 23:59:59 GMT; path=/">

PICS-Label

Source: PICS
Platform for Internet Content Selection. Typically used to declare a document's rating in terms of adult content (sex, violence, etc.) although the scheme is very flexible and may be used for other purposes.
See also the PICS HOWTO.

Cache-Control

Source: HTTP/1.1
Specifies the action of cache agents. Possible values:
  • public - may be cached in public shared caches
  • private - may only be cached in a private cache
  • no-cache - a cached copy may not be reused without revalidating with the origin server
  • no-store - may not be stored by any cache
Note that browser action is undefined using these headers as META tags.
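A Cache-Control value is a comma-separated list of directives, some of which carry an argument. The sketch below splits such a value into a dictionary; the function name parse_cache_control is hypothetical, and the parser is a simplification that does not handle commas inside quoted arguments. The max-age directive used in the example is defined by HTTP/1.1 but is not in the list above.

```python
def parse_cache_control(value):
    """Map each directive name (lowercased) to its argument or None."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        # Strip optional double quotes around the argument.
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


print(parse_cache_control("Private, max-age=60"))
```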

Vary

Source: HTTP/1.1
Specifies that alternates are available. E.g.
<META HTTP-EQUIV="Vary" CONTENT="Content-language">
or (HTTP header)
Vary: Content-language
implies that if a header Accept-Language is sent an alternate form may be selected.

Lotus

The Lotus publishing tool generates Bulletin-Date and Bulletin-Text attributes. Bulletin-Text contains a document description.

NAME attributes

META tags with a name attribute are used for other types which do not correspond to HTTP headers. Sometimes the distinction is blurred; some agents may interpret tags such as "keywords" declared as either "name" or as "http-equiv".

Robots

Source: Spidering
Controls Web robots on a per-page basis. E.g.
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
Robots may traverse this page but not index it.
Altavista supports:
  • NOINDEX prevents anything on the page from being indexed.
  • NOFOLLOW prevents the crawler from following the links on the page and indexing the linked pages.
  • NOIMAGEINDEX prevents the images on the page from being indexed but the text on the page can still be indexed.
  • NOIMAGECLICK prevents the use of links directly to the images, instead there will only be a link to the page.
Google supports a NOARCHIVE extension to this scheme to request that the Google search engine not cache pages; see the Google FAQ.
See also the /robots.txt exclusion method.
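How a robot might interpret the ROBOTS value can be sketched as follows. The function name robots_policy is invented for this example, and only the NOINDEX and NOFOLLOW directives are considered; any other directives are ignored.

```python
def robots_policy(content):
    """Return whether a robot may index the page and follow its links."""
    # Directives are comma-separated and compared without regard to case.
    directives = {item.strip().upper() for item in content.split(",") if item.strip()}
    return {
        "index": "NOINDEX" not in directives,
        "follow": "NOFOLLOW" not in directives,
    }


print(robots_policy("NOINDEX,FOLLOW"))
```

For the example tag above, this reports that links may be followed but the page itself should not be indexed.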

Description

A short, plain language description of the document. Used by search engines to describe your document. Particularly important if your document has very little text, is a frameset, or has extensive scripts at the top. E.g.
<META NAME="description" CONTENT="Citrus fruit wholesaler.">

Keywords

Source: AltaVista, Infoseek.
Keywords used by search engines to index your document in addition to words from the title and document body. Typically used for synonyms and alternates of title words. E.g.
<META NAME="keywords" CONTENT="oranges, lemons, limes">

Author

Source: Publishing tools, e.g. Netscape Gold
Typically the unqualified author's name.

Generator

Source: Publishing tools, e.g. Netscape Gold, FrontPage, etc.
Typically the name and version number of a publishing tool used to create the page. Could be used by tool vendors to assess market penetration.

Formatter

Source: Publishing tools - Microsoft FrontPage

Classification

Source: Netscape Gold
Undefined.

Copyright

Source: Publishing tools
Typically an unqualified copyright statement.

Rating

Source: mk-metas, Weburbia (safe for kids)
Simple content rating.

VW96.ObjectType

Source: mk-metas
Based on an early version of the Dublin Core report, using a defined schema of document types such as FAQ, HOWTO.
Defined by Queen's University of Belfast; a restricted set including e.g. "Contact Information", "Image".

Dublin Core

Dublin Core Elements. See the Reference Description

HTML 4.0

The HTML 4.0 Specification is now available.

HTdig

HTdig tags. See the HTdig META page.

DC-CHEM

DC-CHEM. Chemical Metadata extensions

HTdig notification

searchBC

searchBC is a regional search engine which uses a number of common tags such as Keywords. revisit is used as a hint for scheduling revisits.

Apple META tags

Kodak

IBM

ABSTRACT, CC, ALIAS, OWNER - as used by IBM.

Page-Enter, Page-Exit, Site-Enter, Site-Exit

Source: Microsoft DHTML (Filters & Transitions)
Defines special effects transition; e.g.
<meta http-equiv="Page-Enter"
content="revealTrans(Duration=3.0,Transition=2)">
See e.g. Transitions Between Pages (Ruleweb)

SHOE

Instance-Delegate, Instance-Key - see the SHOE Project at the University of Maryland (Simple HTML Ontology Extensions)

Microsoft Word

Microsoft Word 97 supports a number of HTML META attributes in the HTML export option. Content-Type is used to set the charset, Generator is set and various other tags may optionally be set.

SIC87

1987 US SIC (Standard Industry Codes), used in Vancouver Webpages Classifieds. See US SIC Codes

RDU

The RDU Metadata Search Engine (original URL dead) listed many tags, including the following:

Other Organisations

Agent Markup Language

See the AML pages.

GeoCities

GILS

Government Information Locator Service - a US government initiative. See

IMS

See the IMS Project homepage.

Fireball

The German search engine Fireball. See the metadata page and meta-tag generator. Supports Author, Publisher, Keywords, Description plus page-topic, page-type.

Geotags

Google

  • googlebot: noarchive - do not allow Google to display cached content
  • googlebot: nosnippet - do not allow Google to display an excerpt or cached content
  • googlebot: noindex - similar to the robots meta element
  • googlebot: nofollow

MSSmartTags

  • MSSmartTagsPreventParsing: TRUE - prevent Microsoft Smart Tags being applied to a page
See glassdog.com/smarttags, office.microsoft.com. However, it looks at the moment as if SmartTags have been abandoned.

UnSpam

An initiative of unspam.com to forbid compliant robots from harvesting email addresses. Usage:
<meta name="no-email-collection" value="[link to your terms]" />
Replace the [link to your terms] with a link to your terms of use page. Alternatively you may include a link to www.unspam.com/noemailcollection 
See how_to_avoid_spambots

Miscellaneous

  • Version
  • Template
  • Operator
  • Creation
  • Host
  • Document
  • Subject
  • Build
  • Distribution - global, local, iu
  • Resource-type - document (for ALIWeb)
  • Location (geographic location; from Sympatico)
Deprecated:
  • Random Text (e.g., META NAME="Tom Jones")

Web Counts

Attributes in use counted by a Web robot here.
Also counted 3 July 97. 
IAFA Template Statistics from the ROADS project

Other Resources

Thesauri

Other METAdata

Any other META tags in use? Please let me know.
