# post HTML character entities



## Redwald (May 12, 2006)

I'm having trouble using HTML character entities when posting to the
boards.  They just plain don't seem to work.

How do I accomplish the following?

“foo”

Touché!

mother—in—law (wrong kind of dash, but who cares...)

People who have no idea what I'm talking about probably can't help me, but
here's an official reference to what the heck I'm talking about.

http://www.w3.org/TR/REC-html40/sgml/entities.html


----------



## BSF (May 12, 2006)

Generally speaking you cannot use HTML on the forum.  Take a look at the bbcode reference for the bbcode that replicates many of the functions of HTML.  Down toward the bottom of the page is a box titled "Posting Rules".  Look there to see if the subforum allows HTML, vB code, smilies and embedded images.  That is also where you can access the bbcode reference for EN World.


----------



## Redwald (May 12, 2006)

BardStephenFox said:

> Generally speaking you cannot use HTML on the forum.  Take a look at the bbcode reference for the bbcode that replicates many of the functions of HTML.  Down toward the bottom of the page is a box titled "Posting Rules".  Look there to see if the subforum allows HTML, vB code, smilies and embedded images.  That is also where you can access the bbcode reference for EN World.




I'd seen the bbcode (vB code?) reference before; it appears to have no substitute for character entities.

The forum where I had been trying to post allows vB code, smilies, and images, but not HTML.  Bummer.  Seems an oversight on the part of the programmer, as HTML character entity support is far less prone to annoying side effects than unbalanced HTML tags.

(Well, with the exception of the right-to-left mark, `&rlm;`, which can have funny consequences on browsers that implement it wrongly as opposed to just ignoring it, but I imagine the odds of inadvertent layout damage due to that are vanishingly small.)

Thanks for the quick reply.


----------



## BSF (May 12, 2006)

I think the concerns on HTML are inline cross-linking/cross-scripting issues.  It would be possible to practically hijack a board by aggressive cross-linking threads.  This would be bad for a board like EN World if somebody inlined a porn site/board in the middle of a thread.  Yeah, the mods could clean it up.  But until somebody did, it would look like EN World was hosting that content.


----------



## Redwald (May 13, 2006)

BardStephenFox said:

> I think the concerns on HTML are inline cross-linking/cross-scripting issues.  It would be possible to practically hijack a board by aggressive cross-linking threads.  This would be bad for a board like EN World if somebody inlined a porn site/board in the middle of a thread.  Yeah, the mods could clean it up.  But until somebody did, it would look like EN World was hosting that content.




Right.  I wasn't asking for full HTML support though, just for HTML character entities.  One of the web browsers I use doesn't render non-ASCII characters if they're literally inlined.

I'm not aware of a cross-site scripting vulnerability involving the use of character entities.  You may be thinking of the issues raised in Unicode Technical Report #36: Unicode Security Considerations (section 2, "Visual Security Issues"), but prohibiting the usage of HTML character entities while still permitting literal inlining of Unicode characters doesn't really do anything to guard against this kind of attack.

E.g., if you can read the following:

ｗｗｗ.ｇｏｏｇｌｅ.ｃｏｍ

and if the following is visible, looks the same as the above, and is a hyperlink:

http://ｗｗｗ.ｇｏｏｇｌｅ.ｃｏｍ/

Then neither EN World's forum software nor your browser is guarding you from this issue, and the lack of support for HTML character entities seems more like a vB design oversight than a deliberate security measure.

(For me, using Mozilla Firefox on Linux, the plain text Google lookalike displays normally, but my browser has replaced the fullwidth Latin characters in the hyperlinked text with decimal HTML character entities as recommended in UTR #36, referenced above.)
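For anyone who wants to poke at the lookalike themselves, here is a minimal Python sketch using only the standard library. The helper names `to_decimal_entities` and `fold_compatibility` are made up for illustration, and nothing here claims to reproduce what Firefox does internally; it only shows that the fullwidth string is distinct from the ASCII one, what its decimal character references look like, and that NFKC normalization folds it to plain ASCII.

```python
import unicodedata

# The fullwidth lookalike from the post, written with escapes so it
# survives whatever encoding this page is served in.
LOOKALIKE = (
    "\uff57\uff57\uff57."                    # fullwidth "www"
    "\uff47\uff4f\uff4f\uff47\uff4c\uff45."  # fullwidth "google"
    "\uff43\uff4f\uff4d"                     # fullwidth "com"
)


def to_decimal_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def fold_compatibility(text: str) -> str:
    """Apply NFKC normalization, which maps fullwidth Latin letters to ASCII."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(to_decimal_entities(LOOKALIKE))
    print(fold_compatibility(LOOKALIKE))
```

Comparing a hostname before and after `fold_compatibility` is one cheap way to notice that something other than ASCII is hiding in it.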

I offer all of this for the sake of discussion and enlightenment, not because this is a problem I expect the EN World admins to fix.  It's a limitation in the underlying software, so I reckon I'll just have to deal with it, or patch the other browser I use to pass through non-ASCII as is.  Maybe there's a configuration option for it—I'll poke through the docs.

I think the two-word answer to my question, for now at least, is “you can't.”


----------



## Jdvn1 (May 13, 2006)

é

é

í

Seems to work for me.

Oh, just don't use the HTML-characters. Use the actual characters.


----------



## Redwald (May 13, 2006)

Jdvn1 said:

> é
> 
> é
> 
> ...




Right.  That's why I said:



			
Redwald said:

> One of the web browsers I use doesn't render non-ASCII characters if they're literally inlined.




The Content-Type header that EN World's software (vBulletin) produces declares the charset as ISO-8859-1, but this is not an encoding capable of expressing much of Unicode.  The other web browser I use appears to be getting confused when it encounters codepoints outside of that.  (Interestingly, UTF-8 that is interpreted as ISO-8859-1 has a very characteristic appearance, and that's not what I see in Mozilla Firefox or this other browser.  Apparently the software is trying to be smart but only partially succeeding.)
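To illustrate the "very characteristic appearance" mentioned above, here is a small Python sketch. The function name `misread` is hypothetical; it simply encodes text one way and decodes the bytes another way, which is the mix-up being described.

```python
def misread(text: str, actual: str = "utf-8", assumed: str = "iso-8859-1") -> str:
    """Encode text as `actual`, then decode the bytes as if they were `assumed`."""
    return text.encode(actual).decode(assumed)


if __name__ == "__main__":
    # Each accented letter becomes two characters, the first of them "Ã".
    for sample in ("é", "í", "Touché!"):
        print(repr(sample), "->", repr(misread(sample)))
```

Every non-ASCII character in the Latin-1 range turns into a two-character sequence starting with "Ã" or "Â", which is the telltale sign of UTF-8 read as ISO-8859-1.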

Anyway, all of these issues can basically be ignored if people stick to ASCII and use HTML character entities for everything else -- hence my original request for how to achieve that.


----------



## Jdvn1 (May 13, 2006)

Sorry, I get lost in the lingo sometimes.

Then, yeah. "You can't."



You can go back and fill in those letters, but that could be very tedious if there are a lot of them. I suppose if you were any good at that sort of thing, you could make a script to convert it... copy/paste into notepad first, or something.
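A minimal sketch of such a script in Python, assuming the goal is to turn entity-laden text into literal characters before pasting it into the post box. The function name `entities_to_characters` is made up for illustration, and the entity names in the demo (`&ldquo;`, `&eacute;`, `&mdash;`) are a guess at what the original post contained.

```python
import html


def entities_to_characters(text: str) -> str:
    """Replace named and numeric HTML character references with literal characters."""
    return html.unescape(text)


if __name__ == "__main__":
    draft = "&ldquo;foo&rdquo; Touch&eacute;! mother&mdash;in&mdash;law"
    print(entities_to_characters(draft))
```

Whether the result displays correctly still depends on the characters being representable in the encoding the board actually uses.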


----------



## hong (May 13, 2006)

I never even knew there was an "MS Gothic" font before this.


----------



## Bront (May 13, 2006)

hong said:

> I never even knew there was an "MS Gothic" font before this.



Imagine Bill Gates with piercings and tattoos.


----------



## Jdvn1 (May 14, 2006)

Bront said:

> Imagine Bill Gates with piercings and tattoos.



No thanks.


----------



## Redwald (May 22, 2006)

*I found the problem.*

Okay, I've figured out what's going on.

EN World emits the following HTTP header for its pages (at least in the
message boards):


```
Content-Type: text/html; charset=ISO-8859-1
```

However, the actual character set used is CP1252, a.k.a. "Code Page 1252",
also known as Windows-1252.

This is not the same character set as ISO-8859-1.

The reason this works for most of the people most of the time is that like
practically every successful Western character encoding since 1968,
Windows-1252 is a superset of ASCII.  That is, the first 128 codepoints of
Windows-1252 are identical to ASCII (which only has 128 codepoints, being a
7-bit standard).
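To see exactly where the two character sets part ways, here is a short Python sketch that decodes every byte value both ways and collects the disagreements. The function name `differing_bytes` is hypothetical. It relies on Python's built-in `iso-8859-1` and `cp1252` codecs; a handful of byte values are unassigned in Python's `cp1252` codec and are recorded as `None`.

```python
def differing_bytes() -> dict:
    """Map each byte value that decodes differently to (latin-1 char, cp1252 char)."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # unassigned in Python's cp1252 codec
        if cp1252 != latin1:
            diffs[value] = (latin1, cp1252)
    return diffs


if __name__ == "__main__":
    for value, (latin1, cp1252) in differing_bytes().items():
        shown = "unassigned" if cp1252 is None else f"U+{ord(cp1252):04X}"
        print(f"0x{value:02X}: ISO-8859-1 U+{ord(latin1):04X}, Windows-1252 {shown}")
```

The disagreements are confined to the byte range 0x80 through 0x9F, where ISO-8859-1 has control characters and Windows-1252 has printable ones such as the curly quotes. Everything else, including all of ASCII, decodes identically.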

It would be helpful to people's browsers if EN World would issue a correct
charset attribute in the HTTP Content-Type header.  Is this possible?

If not, I can work around it by manually overriding the character set used to
render the page in my browser.  Kinda tedious for me, but I recognize that
I may be a minority of one as EN World users go.
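The manual override can also be sketched in code, for anyone fetching pages with a script rather than a browser. This is only an illustration under stated assumptions: the function names `declared_charset` and `decode_body` are made up, and the rule "treat a declared ISO-8859-1 as Windows-1252" is the workaround described above, not something EN World or vBulletin documents.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body, overriding a declared ISO-8859-1 with Windows-1252."""
    charset = declared_charset(content_type) or "iso-8859-1"
    if charset in ("iso-8859-1", "latin-1", "latin1"):
        charset = "cp1252"  # assumption: the page really is Windows-1252
    # Unassigned bytes become U+FFFD instead of raising an error.
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    header = "text/html; charset=ISO-8859-1"
    print(declared_charset(header))
    print(decode_body(b"\x93foo\x94", header))
```

Because Windows-1252 agrees with ISO-8859-1 everywhere outside 0x80 through 0x9F, the override costs nothing for pages that really are ISO-8859-1, unless they contain C1 control characters.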

For a lot more about this issue, see the following:
http://www.cs.tut.fi/~jkorpela/www/windows-chars.html


----------



## Redwald (May 22, 2006)

*Testing "Windows" characters.*

euro €
left and right single quotes ‘ ’
left and right double quotes “ ”
bullet •
en dash –
em dash —

Yup, this works for me.  I have to finagle things both in my text editor and
my web browser, but I can get it to work.  That's good.
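For reference, a small Python sketch that checks the test characters above against both encodings. The names `SAMPLES`, `cp1252_byte` and `fits_latin1` are made up for illustration. It shows which single byte each character occupies in Windows-1252 and whether ISO-8859-1 can represent it at all.

```python
# The characters from the test post, keyed by a short description.
SAMPLES = {
    "euro": "\u20ac",
    "left single quote": "\u2018",
    "right single quote": "\u2019",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "bullet": "\u2022",
    "en dash": "\u2013",
    "em dash": "\u2014",
}


def cp1252_byte(ch: str) -> int:
    """Return the single byte value that encodes `ch` in Windows-1252."""
    return ch.encode("cp1252")[0]


def fits_latin1(ch: str) -> bool:
    """Report whether `ch` can be encoded in ISO-8859-1 at all."""
    try:
        ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"{name}: cp1252 byte 0x{cp1252_byte(ch):02X}, "
              f"ISO-8859-1 ok: {fits_latin1(ch)}")
```

None of these characters exists in ISO-8859-1, which is why a browser that takes the declared charset literally has nothing sensible to draw for them.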


----------

