Raise your hand if you’ve ever copied and pasted text from Word into a web-based editor to update your website.
Now, raise your other hand if, after you published that page to your website, the text you pasted had weird symbols & characters, or the fonts were all screwed up.
If you only have one hand in the air, high-five it with the other hand, because you know how to paste from Word!
Okay, put your hands down, I hope nobody saw you looking silly like this.
The fact is, this is a really common issue among people who update their websites using web-based HTML editors (also called “What You See Is What You Get” or “WYSIWYG”). Today, I’m going to show you why copying & pasting text from Microsoft-based products into a web-based HTML editor can royally mess with your formatting, and how to avoid doing this in the future.
What gets copied to your clipboard
When you copy text to your clipboard from any Microsoft product (post Office ’97), you are copying all of the invisible Microsoft XML formatting that goes with the text. In other words, you’re essentially copying HTML.
When you paste that HTML into your web-based editor, you get the whole block of it…the text, the markup, and all the inline formatting for every. single. piece. of. text.
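If you're curious whether a chunk of pasted HTML came from Word, here is a minimal Python sketch that looks for a few telltale signs. The marker list and the function name `looks_like_word_html` are my own assumptions for illustration. They cover things Word commonly leaves behind (`Mso` class names, `mso-` style properties, `<o:p>` tags, Microsoft's XML namespaces), but the list is not exhaustive.

```python
import re

# Patterns commonly seen in HTML produced by Word.
# This list is illustrative only, not a complete signature set.
WORD_MARKERS = [
    re.compile(r"""class=["']?Mso""", re.IGNORECASE),       # class=MsoNormal, etc.
    re.compile(r"mso-[a-z-]+\s*:", re.IGNORECASE),          # mso-* style properties
    re.compile(r"</?o:p\b", re.IGNORECASE),                 # <o:p> Office tags
    re.compile(r"urn:schemas-microsoft-com", re.IGNORECASE) # Office XML namespaces
]


def looks_like_word_html(markup: str) -> bool:
    """Return True if the markup contains any of the Word-style markers."""
    return any(pattern.search(markup) for pattern in WORD_MARKERS)


if __name__ == "__main__":
    pasted = '<p class=MsoNormal style="mso-margin-top-alt:auto">Hi<o:p></o:p></p>'
    print(looks_like_word_html(pasted))
    print(looks_like_word_html("<p>Hi</p>"))
```

A check like this only tells you the markup is there. It doesn't remove anything.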
How Microsoft Word’s HTML markup affects your web page’s formatting
This isn’t meant to be a technical article, but you should understand that the formatting for your website (the fonts, colors, backgrounds, dimensions and even placement of certain sections) is stored in an external file called a CSS (Cascading Style Sheet) file. Your web page links to that file toward the top of the page, so that any HTML elements that follow can be formatted according to the rules you set for them in the CSS file.
So, when you paste Microsoft Word’s HTML markup into the page, its inline styles override the formatting rules you set up in the CSS file, and the text you pasted looks the way it did in Word, not the way it’s supposed to look on your web page.
4 Ways to Prevent Microsoft Word HTML markup from entering your WYSIWYG editor
- Type directly into the editor itself.
- Paste into Notepad first, then select the text in Notepad, copy it again, and paste it into the WYSIWYG editor (putting it into Notepad first strips out everything but text, so you’ll probably have to reformat your lists, bold, italics, etc.).
- Use the “Paste From Word” icon in your WYSIWYG editor toolbar instead of Ctrl+V (or Cmd+V on Mac), but be aware that this doesn’t always erase all the markup.
- Use an online Word -> Clean HTML converter like http://word2cleanhtml.com.
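For the curious, here is a rough Python sketch of the kind of cleanup a "Paste From Word" button or an online converter might do. It is not how any particular editor or converter actually works. The tag whitelist, the class `WordHtmlCleaner`, and the function `clean_word_html` are all assumptions I made up for illustration. The idea is simply to keep a short list of plain tags, drop every attribute except a link's `href`, and throw away everything else while keeping the text.

```python
import html
from html.parser import HTMLParser

# Tags kept in the output. This whitelist is an assumption for illustration.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "ul", "ol", "li", "a",
                "h1", "h2", "h3", "table", "tr", "td", "th"}
VOID_TAGS = {"br"}                                       # tags with no closing tag
SKIPPED = {"style", "script", "head", "title", "xml"}    # content dropped entirely


class WordHtmlCleaner(HTMLParser):
    """Hypothetical cleaner: keeps whitelisted tags, strips attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIPPED element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # e.g. <span>, <font>, <o:p>: tag dropped, text kept
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.out.append(f'<a href="{html.escape(href, quote=True)}">')
                return
        self.out.append(f"<{tag}>")  # every other attribute is discarded

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def clean_word_html(markup: str) -> str:
    """Return the markup with only whitelisted, attribute-free tags."""
    parser = WordHtmlCleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = ('<p class=MsoNormal style="margin:0in;font-family:Calibri">'
              '<span style="color:red">Hello</span> <b>world</b><o:p></o:p></p>')
    print(clean_word_html(sample))
```

Because this throws away all `style` and `class` attributes, any formatting you actually wanted (a colored heading, say) goes too. Your site's CSS file takes over from there, which is usually the point.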
Character Encoding – you wanted a $, but got a €
Now your text is formatted the way it’s supposed to be, but you have weird symbols or characters that weren’t there before. This is a result of a mismatch in character encoding. What that actually means is a bit complicated to explain, so just realize you need to have both your Office documents and your web page set to the same encoding, or you will always run into these weird characters and symbols when you copy & paste.
Any Office product after version 2010 uses UTF-8 as its character encoding, which is great, because most websites are set to UTF-8 as well. Here is a good resource to learn more about character encoding, and how to change it by default on your Office applications.
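To make the mismatch a little less mysterious, here is a small Python sketch that writes text in one encoding and reads it back in another. The helper name `misread` is mine, and I picked Windows-1252 (`cp1252`) as the "other" encoding because it is a common Windows default. Your own mismatch may involve a different pair.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding, then decode the bytes with another.

    Bytes that are invalid in the reading encoding become U+FFFD,
    the replacement character often displayed as a question-mark box.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    original = "it\u2019s"  # "it’s" with a curly apostrophe, as Word would type it

    # Saved as UTF-8, displayed as Windows-1252: one character becomes three.
    print(misread(original, "utf-8", "cp1252"))

    # Saved as Windows-1252, displayed as UTF-8: the apostrophe is unreadable.
    print(misread(original, "cp1252", "utf-8"))

    # Same encoding on both sides: the text survives intact.
    print(misread(original, "utf-8", "utf-8"))
```

The first case is where a stray € tends to show up: the curly apostrophe is three bytes in UTF-8, and Windows-1252 reads those three bytes as three separate characters, one of which is the euro sign.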
More Copy/Paste Surprises
These problems aren’t limited to just Microsoft Word:
- If you copy/paste from Outlook 2007+, it includes Microsoft’s Word HTML.
- If you copy/paste from a forwarded email where the original sender uses Outlook 2007+, it includes Microsoft’s Word HTML.
- Copying tables from Excel creates an actual HTML table…trust me, stripping out all the extra markup on a table will be a huge chore.
Did I miss any? Leave them in the comments!