How to produce XHTML 1.0 Strict markup with TinyMCE

June 17, 2009
Development

TinyMCE is a WYSIWYG rich-text editor implemented in JavaScript. Many web developers embed TinyMCE within content management systems to give clients an easy way to create and edit content. In fact, we use TinyMCE in our own content management system. However, it is dangerous to provide users a powerful content editing tool like TinyMCE without adequate training. Invalid markup added to TinyMCE can destroy an otherwise perfect website.

I lose countless development hours correcting invalid markup created within TinyMCE. A majority of the invalid markup is created when users copy and paste text from Microsoft Word or Microsoft Office; users unknowingly paste an inordinate amount of hidden Microsoft meta-data into TinyMCE that causes rendering errors in various web browsers (the current stable version of TinyMCE does not properly remove this meta-data even if using TinyMCE's Paste from Word feature). This is not the user's fault. It is unreasonable to expect a user to know how to write valid XHTML. I needed a solution using TinyMCE to convert user input into valid XHTML 1.0 Strict markup. After hours of research, here is what I found.

Project Setup

For this tutorial, I assume you are using the following directory structure:

[PROJECT ROOT]
-->index.html
-->styles.css
-->js/

Install TinyMCE

  1. Download TinyMCE to your computer
  2. Extract the TinyMCE ZIP archive into this project's js/ directory.
  3. The final directory structure should look like this:
[PROJECT ROOT]
-->index.html
-->styles.css
-->js/
---->tinymce_3_2_4_1/
------>jscripts/
-------->tiny_mce/

When you unzip the TinyMCE file, the TinyMCE directory may be named differently than shown above.

Prepare the HTML File

We will start with simple HTML and CSS files. index.html uses the XHTML 1.0 Strict doctype. It has a simple form with a textarea. styles.css contains some simple styles for TinyMCE. We will reference this CSS file later. To save time, go ahead and grab the HTML source code and CSS styles. Paste the HTML source code into index.html. Paste the CSS styles into styles.css.

Initialize TinyMCE

First, add this line into the HEAD of index.html to import the TinyMCE library.

<script src="js/tinymce_3_2_4_1/jscripts/tiny_mce/tiny_mce.js" type="text/javascript"></script>

Be sure the path to tiny_mce.js is accurate based on the TinyMCE file you downloaded and extracted. Next, we need to initialize TinyMCE. Add this code immediately below the aforementioned code.

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced"
});
</script>

This will convert the textarea within index.html into a TinyMCE editor. You can view your progress by opening index.html in a web browser. If you are using the Safari web browser on Mac OS X, use this code instead. Be sure the Safari plugin is loaded for all subsequent examples, too.

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	plugins:"safari"
});
</script>
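
If you would rather not maintain two versions of the initialization code, one option is to add the Safari plugin only when the visitor's browser looks like Safari. The following is a minimal sketch, not something from the TinyMCE documentation: isSafari and buildSettings are hypothetical helper names, and the user-agent test is a rough heuristic that assumes Safari reports "Safari" without "Chrome" in its user-agent string.

```javascript
// Hypothetical helper: rough user-agent check, not part of TinyMCE.
function isSafari(userAgent) {
	return /Safari/.test(userAgent) && !/Chrome|Chromium/.test(userAgent);
}

// Hypothetical helper: builds the settings object passed to tinyMCE.init().
function buildSettings(userAgent) {
	var settings = {
		mode: "textareas",
		theme: "advanced"
	};
	if (isSafari(userAgent)) {
		settings.plugins = "safari"; // Same plugin name used in the example above
	}
	return settings;
}

// Only initialize when running in a page where TinyMCE has been loaded.
if (typeof tinyMCE !== "undefined" && typeof navigator !== "undefined") {
	tinyMCE.init(buildSettings(navigator.userAgent));
}
```

Any further settings from the later examples would be added to the object inside buildSettings.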

Our Objectives

  1. Ensure client input is converted into XHTML 1.0 Strict markup
  2. Remove unused classes from markup
  3. Remove empty HTML elements**
  4. Remove Microsoft meta-data
  5. Encode HTML entities (<,>,&)

**We do not want to remove all empty elements. Blank div elements are sometimes used to place dynamic content. In this tutorial we will only remove empty p, em, and strong elements.

Objective 1: XHTML 1.0 Strict Markup

First, we tell the TinyMCE editor to use the XHTML 1.0 Strict doctype. To do so, we use the doctype parameter when initializing TinyMCE. The TinyMCE initialization code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>"
});
</script>

The doctype parameter accepts a string value that is the entire doctype you wish to use. We also need to define a white list of valid XHTML elements and attributes. Any elements and attributes not defined in our white list should be removed from the TinyMCE editor. We tell the TinyMCE editor which elements and attributes are valid by setting the valid_elements parameter to a string. This string must follow a specific syntax. The syntax is beyond the scope of this article, but you can read more about it on the TinyMCE Wiki. I found a preset XHTML white list on TinyMCE's Wiki and tweaked it to conform to the Strict doctype. My modified white list should be added to the TinyMCE initialization code like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements : ""
	+"a[accesskey|charset|class|coords|dir<ltr?rtl|href|hreflang|id|lang|name"
	  +"|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|rel|rev"
	  +"|shape<circle?default?poly?rect|style|tabindex|title|type],"
	+"abbr[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"acronym[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"address[class|align|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"area[accesskey|alt|class|coords|dir<ltr?rtl|href|id|lang|nohref<nohref"
	  +"|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup"
	  +"|shape<circle?default?poly?rect|style|tabindex|title|target],"
	+"base[href|target],"
	+"basefont[color|face|id|size],"
	+"bdo[class|dir<ltr?rtl|id|lang|style|title],"
	+"big[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"blockquote[cite|class|dir<ltr?rtl|id|lang|onclick|ondblclick"
	  +"|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|style|title],"
	+"body[class|dir<ltr?rtl|id|lang|link|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onload|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|onunload|style|title],"
	+"br[class|id|style|title],"
	+"button[accesskey|class|dir<ltr?rtl|disabled<disabled|id|lang|name|onblur"
	  +"|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|tabindex|title|type"
	  +"|value],"
	+"caption[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"cite[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"code[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"col[align<center?char?justify?left?right|char|charoff|class|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|span|style|title"
	  +"|valign<baseline?bottom?middle?top|width],"
	+"colgroup[align<center?char?justify?left?right|char|charoff|class|dir<ltr?rtl"
	  +"|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|span|style|title"
	  +"|valign<baseline?bottom?middle?top|width],"
	+"dd[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"del[cite|class|datetime|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"dfn[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"dir[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"div[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"dl[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"dt[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"em/i[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"embed[height|src|type|width|class|contenteditable|contextmenu|dir|draggable|id|irrelevant|lang"
	+"|ref|registrationmark|tabindex|template|title|onabort|onbeforeunload|onblur|onchange|onclick|oncontextmenu"
	+"|ondblclick|ondrag|ondragend|ondragenter|ondragleave|ondragover|ondragstart|ondrop|onerror|onfocus|onkeydown"
	+"|onkeypress|onkeyup|onload|onmessage|onmousedown|onmousemove|onmouseover|onmouseout|onmouseup|onmousewheel|onresize"
	+"|onscroll|onselect|onsubmit|onunload],"
	+"fieldset[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"form[accept|accept-charset|action|class|dir<ltr?rtl|enctype|id|lang"
	  +"|method<get?post|name|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|onreset|onsubmit"
	  +"|style|title],"
	+"h1[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h2[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h3[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h4[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h5[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h6[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"head[dir<ltr?rtl|lang|profile],"
	+"html[dir<ltr?rtl|lang|version],"
	+"img[alt=''|class|dir<ltr?rtl|height"
	  +"|id|ismap<ismap|lang|longdesc|name|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|src|style|title|usemap|width],"
	+"input[accept|accesskey|alt"
	  +"|checked<checked|class|dir<ltr?rtl|disabled<disabled|id|ismap<ismap|lang"
	  +"|maxlength|name|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|onselect"
	  +"|readonly<readonly|size|src|style|tabindex|title"
	  +"|type<button?checkbox?file?hidden?image?password?radio?reset?submit?text"
	  +"|usemap|value],"
	+"ins[cite|class|datetime|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"isindex[class|dir<ltr?rtl|id|lang|prompt|style|title],"
	+"kbd[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"label[accesskey|class|dir<ltr?rtl|for|id|lang|onblur|onclick|ondblclick"
	  +"|onfocus|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|style|title],"
	+"legend[accesskey|class|dir<ltr?rtl|id|lang"
	  +"|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"li[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title|type"
	  +"|value],"
	+"link[charset|class|dir<ltr?rtl|href|hreflang|id|lang|media|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|rel|rev|style|title|type],"
	+"map[class|dir<ltr?rtl|id|lang|name|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"meta[content|dir<ltr?rtl|http-equiv|lang|name|scheme],"
	+"noscript[class|dir<ltr?rtl|id|lang|style|title],"
	+"object[archive|class|classid"
	  +"|codebase|codetype|data|declare|dir<ltr?rtl|height|id|lang|name"
	  +"|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|standby|style|tabindex|title|type|usemap"
	  +"|width],"
	+"ol[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|start|style|title|type],"
	+"optgroup[class|dir<ltr?rtl|disabled<disabled|id|label|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"option[class|dir<ltr?rtl|disabled<disabled|id|label|lang|onclick|ondblclick"
	  +"|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|selected<selected|style|title|value],"
	+"-p[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"param[id|name|type|value|valuetype<DATA?OBJECT?REF],"
	+"pre/listing/plaintext/xmp[align|class|dir<ltr?rtl|id|lang|onclick|ondblclick"
	  +"|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|style|title|width],"
	+"q[cite|class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"s[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"samp[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"script[charset|defer|language|src|type],"
	+"select[class|dir<ltr?rtl|disabled<disabled|id|lang|multiple<multiple|name"
	  +"|onblur|onchange|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|size|style"
	  +"|tabindex|title],"
	+"small[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"span[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"strike[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"strong/b[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"style[dir<ltr?rtl|lang|media|title|type],"
	+"sub[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"sup[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"table[bgcolor|border|cellpadding|cellspacing|class"
	  +"|dir<ltr?rtl|frame|height|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|rules"
	  +"|style|summary|title|width],"
	+"tbody[char|class|charoff|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|title"
	  +"|valign<baseline?bottom?middle?top],"
	+"td[abbr|axis|bgcolor|char|charoff|class"
	  +"|colspan|dir<ltr?rtl|headers|height|id|lang|nowrap<nowrap|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|rowspan|scope<col?colgroup?row?rowgroup"
	  +"|style|title|valign<baseline?bottom?middle?top|width],"
	+"textarea[accesskey|class|cols|dir<ltr?rtl|disabled<disabled|id|lang|name"
	  +"|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|onselect"
	  +"|readonly<readonly|rows|style|tabindex|title],"
	+"tfoot[char|charoff|class|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|title"
	  +"|valign<baseline?bottom?middle?top],"
	+"th[abbr|axis|bgcolor|char|charoff|class"
	  +"|colspan|dir<ltr?rtl|headers|height|id|lang|nowrap<nowrap|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|rowspan|scope<col?colgroup?row?rowgroup"
	  +"|style|title|valign<baseline?bottom?middle?top|width],"
	+"thead[char|charoff|class|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|title"
	  +"|valign<baseline?bottom?middle?top],"
	+"title[dir<ltr?rtl|lang],"
	+"tr[abbr|bgcolor|char|charoff|class"
	  +"|rowspan|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title|valign<baseline?bottom?middle?top],"
	+"tt[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"u[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"ul[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title|type],"
	+"var[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title]"
});
</script>

For you XHTML 1.0 Strict evangelists, I did include embed in the white list. Technically, embed is not included in the XHTML 1.0 Strict DTD. However, embed is necessary to provide support for older browsers. Also, embed is part of the HTML5 specification (HTML5 itself does not define a DTD). We now have a fully functional TinyMCE editor that enforces XHTML 1.0 Strict markup (with the one exception). However, we still have four remaining objectives.
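
Much of the white list repeats the same core attributes for every element. As a rough sketch, a small helper could assemble a shorter valid_elements string from a map of element rules. buildValidElements and CORE_ATTRS are hypothetical names of my own, not part of TinyMCE, and the attribute set here is deliberately trimmed for illustration. The output follows the same syntax as the list above: attributes separated by |, allowed values listed after <, element synonyms after /, and a leading - on elements that should be dropped when empty (as I understand the Wiki).

```javascript
// Hypothetical shared attributes; trim or extend to match your own white list.
var CORE_ATTRS = ["class", "dir<ltr?rtl", "id", "lang", "style", "title"];

// Hypothetical helper: turns { elementRule: [extraAttributes] } into a
// valid_elements string such as "a[class|...|href],em/i[class|...]".
function buildValidElements(rules) {
	var parts = [];
	for (var name in rules) {
		if (Object.prototype.hasOwnProperty.call(rules, name)) {
			var attributes = CORE_ATTRS.concat(rules[name]);
			parts.push(name + "[" + attributes.join("|") + "]");
		}
	}
	return parts.join(",");
}

// Example: three rules built from the shared attribute list.
var exampleWhiteList = buildValidElements({
	"-p": [],
	"em/i": [],
	"a": ["href", "rel"]
});
```

The resulting string could then be passed as the valid_elements value. Generating it this way also makes accidental duplicates in an attribute list harder to introduce.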

Objective 2: Remove unused classes from markup

It is recommended that you provide a CSS file to style the contents of the TinyMCE editor. By styling the contents of the TinyMCE editor, the user will see what the final styled code will look like while using TinyMCE. We tell TinyMCE to use our CSS file by specifying the content_css parameter during TinyMCE initialization. This code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css"
});
</script>

NOTE: the valid_elements value is omitted for the sake of brevity. Be sure you still specify the valid elements in your own code! The value of content_css is a path to your CSS file relative to the current HTML file. Next, we tell TinyMCE to remove any classes from the client's markup that are not defined in our CSS file. This code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css",
	verify_css_classes:true
});
</script>

For example, if a client provided this markup:

<div class="myClass">...</div>

And our CSS file did NOT include the CSS class selector .myClass{}, then TinyMCE would remove the unused class and output this code:

<div>...</div>
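
To make the effect concrete, here is a standalone sketch of the same idea outside of TinyMCE. It is only an illustration of the behavior described above, not TinyMCE's actual implementation: stripUnknownClasses is a hypothetical function, the list of allowed class names is supplied by hand rather than read from a CSS file, and it only handles class attributes written with double quotes.

```javascript
// Hypothetical illustration: keep only class names found in allowedClasses.
function stripUnknownClasses(html, allowedClasses) {
	return html.replace(/\sclass="([^"]*)"/g, function (match, classList) {
		var names = classList.split(/\s+/);
		var kept = [];
		for (var i = 0; i < names.length; i++) {
			if (names[i] !== "" && allowedClasses.indexOf(names[i]) !== -1) {
				kept.push(names[i]);
			}
		}
		// Drop the whole attribute when no class name survives.
		return kept.length ? ' class="' + kept.join(" ") + '"' : "";
	});
}

// "intro" stands in for a class that styles.css defines; "myClass" is not defined.
var cleanedDiv = stripUnknownClasses('<div class="myClass">...</div>', ["intro"]);
var cleanedParagraph = stripUnknownClasses('<p class="intro myClass">Hi</p>', ["intro"]);
```

Here cleanedDiv loses its class attribute entirely, while cleanedParagraph keeps only the class that is on the allowed list.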

Objective 3: Remove empty HTML elements

We will now remove empty p, em, and strong elements from the TinyMCE editor with the cleanup_callback parameter. This parameter's value is the name of a custom JavaScript function. Let's call this function myCustomCleanup and define it now.

function myCustomCleanup(type,value){}

This function accepts two parameters: the type of callback (ignored in this tutorial, but you can read more about this parameter on the TinyMCE Wiki), and the final HTML markup of the TinyMCE editor. Let's further define the implementation of the myCustomCleanup function.

function myCustomCleanup(type,value){
	value = value + ""; //Ensure value is a string
	//Remove p, em, and strong elements that contain only whitespace
	return value.replace(/<(p|em|strong)(>|[^>]*>)\s*<\/\1>/ig,"");
}

This function uses a Regular Expression to remove all empty p, em, and strong elements from the TinyMCE editor's markup. We tell TinyMCE to call our custom cleanup method during TinyMCE initialization. Our new TinyMCE initialization code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css",
	verify_css_classes:true,
	cleanup_callback : "myCustomCleanup"
});
</script>
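
To see what the empty-element pattern does to sample markup, here is a standalone sketch that can be run outside the editor. removeEmptyElements and safeCleanup are hypothetical names, the pattern is written with single backslashes (\s, \/, \1), and the loop is my own addition so that elements which only become empty after their children are removed are also caught. The typeof check rests on an assumption: that the callback may sometimes receive a non-string value (for example a DOM node), in which case this sketch returns it untouched.

```javascript
// Matches a p, em, or strong element that contains only whitespace.
var EMPTY_ELEMENT = /<(p|em|strong)(>|[^>]*>)\s*<\/\1>/ig;

// Hypothetical helper: repeat the replacement until nothing changes,
// so <p><em></em></p> is removed completely rather than left as <p></p>.
function removeEmptyElements(html) {
	var previous;
	do {
		previous = html;
		html = html.replace(EMPTY_ELEMENT, "");
	} while (html !== previous);
	return html;
}

// Hypothetical callback variant: only touch string values.
function safeCleanup(type, value) {
	if (typeof value !== "string") {
		return value;
	}
	return removeEmptyElements(value);
}

var sample = '<p>Hello</p><p> </p><em></em><strong class="x">\n</strong><div></div>';
var result = safeCleanup("example", sample);
```

With this sample, result keeps the paragraph that has text and the empty div, and drops the empty p, em, and strong elements.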

Objective 4: Remove Microsoft Meta-data

From my own experience, most Microsoft Word and Microsoft Office meta-data is contained within HTML comments. We will define a new function to remove HTML comments from the TinyMCE editor markup. This function looks like this:

//Citation: http://www.faqts.com/knowledge_base/view.phtml/aid/21761/fid/53
function removeHtmlComments(source){
	var html = source + ""; //Ensure source is a string
	//Group 1 captures the closing ">" of a comment; script and style blocks match without it
	var regX = /<(?:!(?:--[\s\S]*?--\s*)?(>)\s*|(?:script|style|SCRIPT|STYLE)[\s\S]*?<\/(?:script|style|SCRIPT|STYLE)>)/g;
	//Drop comments (group 1 is set) and return script and style blocks unchanged
	return html.replace(regX, function(m,$1){ return $1?'':m; });
}

This method accepts the final HTML markup from the TinyMCE editor and removes all single and multi-line HTML comments except those within script and style elements. We now add this function to our myCustomCleanup method so it is called by TinyMCE. Our new myCustomCleanup method looks like this:

function myCustomCleanup(type,value){
	value = value + ""; //Ensure value is a string
	value = value.replace(/<(p|em|strong)(>|[^>]*>)\s*<\/\1>/ig,"");
	return removeHtmlComments(value);
}
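
To check what the comment-stripping pattern does before wiring it into the editor, here is a standalone sketch. stripComments is a hypothetical stand-in for removeHtmlComments with the pattern written using single backslashes, and the sample string is made up to resemble the kind of conditional comment that Word pastes in.

```javascript
// Matches either an HTML comment (capturing its closing ">") or a whole
// script/style block (no capture), so comments inside those blocks survive.
var COMMENT_OR_BLOCK = /<(?:!(?:--[\s\S]*?--\s*)?(>)\s*|(?:script|style|SCRIPT|STYLE)[\s\S]*?<\/(?:script|style|SCRIPT|STYLE)>)/g;

// Hypothetical stand-in for removeHtmlComments.
function stripComments(html) {
	return String(html).replace(COMMENT_OR_BLOCK, function (match, commentEnd) {
		// commentEnd is only set when the comment branch matched.
		return commentEnd ? "" : match;
	});
}

// Made-up sample: a Word-style conditional comment plus a style block.
var pasted = '<p>Keep me</p><!--[if gte mso 9]><xml>Word junk</xml><![endif]-->' +
	'<style>/* <!-- kept --> */</style>';
var cleaned = stripComments(pasted);
```

In this sample the conditional comment and everything inside it is removed, while the comment inside the style block is left alone.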

Note: You can force TinyMCE to run the cleanup_callback function at any time by clicking the Broom icon in the TinyMCE editor toolbar.

Objective 5: Encode HTML Entities

Last, we ensure characters like <, >, and & are encoded into their HTML entity equivalents. For example, & will become &amp;. To do this, we specify the entity_encoding parameter during TinyMCE initialization. Our TinyMCE initialization code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css",
	verify_css_classes:true,
	cleanup_callback : "myCustomCleanup",
	entity_encoding : "named"
});
</script>
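
For readers who want to see what this encoding amounts to for the three characters named above, here is a small standalone sketch. encodeBasicEntities is a hypothetical function written for illustration only. It is not how TinyMCE implements entity_encoding, and it covers only <, >, and &.

```javascript
// Hypothetical illustration: encode &, <, and > as named entities.
// The ampersand is replaced first so the entities added afterwards
// are not encoded a second time.
function encodeBasicEntities(text) {
	return String(text)
		.replace(/&/g, "&amp;")
		.replace(/</g, "&lt;")
		.replace(/>/g, "&gt;");
}

var encoded = encodeBasicEntities("Fish & Chips <b>");
```

Note that this sketch would encode text that already contains entities a second time, so it should only be applied to raw, unencoded text.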

The Final Result

We now have a TinyMCE installation that produces valid XHTML 1.0 Strict markup regardless of client input. It also removes unused classes from markup, deletes empty HTML elements, strips Microsoft meta-data, and encodes HTML entities! I hope this tutorial gets you moving in the right direction. This tutorial is not meant to be a final implementation. I am still tweaking this implementation to produce even better results. If you see room for improvement, kindly let me know by posting a comment. I'd love to hear your feedback.

Download a ZIP archive containing the final code

Comments

Tim
Hi, how can tinymce be edited to produce xhtml strict links as target _blank, _self are now obsolete? Would like it to open links in a new window using onclick="window.open(this.href); return false;"
Ed
Your cleanup function should use this regex:

value.replace(/<(p|em|strong)(>|[^>]*>)\s*<\/\1>/ig,"")
oxpeople
Both:

value.replace/<(p|em|strong)(>|[^>]*>)(\\s)*<\\/\\1>/ig,"");

var regX = /<(?:!(?:--[\\s\\S]*?--\\s*)?(>)\\s*|(?:script|style|SCRIPT|STYLE)[\\s\\S]*?<\\/(?:script|style|SCRIPT|STYLE)>)/g;


have syntax error in Java script
any suggestion for solve this.
Geoff
I'm using tinymce and i have a major problem where I'm using the following code in order to control the x browser experience...







When I edited the page that has that code in, with tinyMCE I get the following out put...







any ideas how I stop tinymce from doing that ?

Geoff.
Keny Lieou
Thanks
Tess
wow thanks. awesome
Joe Yoyoda

thanks

Will

Awesome. I am now working on a wordpress plugin which will add this functionality. :)

Jon

Two words:

1) Thanks
2) Genius

Pablo

Good article. I am wondering about class="mceNonEditable". It would be deleted since that class is not set in any style sheet. How can you avoid that?

Thank you

Jeremiah Zielonkiewicz

Wow, man, this is a salvation to me.
I was reading the T-MCE docs for ages, but as I can see it took a hell of a lot of work to keep MS stuff from messing up the markup on pages.
Thanks to your work we can all benefit. If I find additional things to do with T-MCE, I will post additional comments.
For now, bookmarking, and I will bear you in mind.

Josh Lockhart

@Chris I have experience with both TinyMCE and FCKEditor. I feel TinyMCE works best for me at the current time, but I realize both are probably the two leading candidates for JavaScript-based WYSIWYG editors. However, neither is perfect. I am glad you are able to use this for your work. Also, keep an eye on CKEditor (FCKEditor v3.0) at http://ckeditor.com/ which is currently in beta. It will be coming soon and boasts some pretty cool features. For now at least, it's TinyMCE for me. Thanks for your feedback!

Chris Duesing

I abandoned TinyMCE in favor of FCK edit for this very reason. I am back to TinyMCE because it is bundled with Java's Richfaces. Now I don't have to worry about resurrecting old bugs thanks to this fix. Great work!
