Character Limit

Wistful Dream · May 07, 2012, 12:11:30 PM

I know it's 65,300-some or something like that. However, I have a post that is skirting near there, at about 62k, and it's cutting off the end of it. I don't know why >.< Help?

jouzinka · May 07, 2012, 12:16:11 PM

It may be because it's those 65K and some _including_ BB code markup, so if it's something heavily styled, you may exceed...

Avis habilis · May 07, 2012, 12:18:00 PM

It's 65535 bytes because of the database on the back end, if I remember correctly. Ordinarily that means 65535 characters, but characters that take more than one byte will reduce the number.
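A minimal sketch of the byte-versus-character distinction, assuming UTF-8 purely for illustration (the thread does not say which multi-byte encoding Avis habilis had in mind, and `byte_length` is a hypothetical helper name):

```python
# Illustration only: UTF-8 is an assumed encoding, not necessarily the forum's.
LIMIT = 65535  # 2**16 - 1, the byte limit mentioned above


def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


plain = "a" * 100          # 100 one-byte characters
accented = "\u00e9" * 100  # 100 copies of e-acute, two bytes each in UTF-8

print(byte_length(plain), byte_length(accented))
print("room left for plain text:", LIMIT - byte_length(plain))
```

Both strings are 100 characters long, but the accented one uses twice as many bytes against the limit.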

Wistful Dream · May 07, 2012, 12:40:07 PM

It is heavily coded. And there are characters that take more than one byte?

jouzinka · May 07, 2012, 01:11:35 PM

Accents - ä é î etc... ;-)

Vekseid · May 07, 2012, 01:25:27 PM

Yes.

The quote character (") in particular takes up six bytes. Enter takes up six. The ampersand (&) takes up five. Less than (<) and greater than (>) four each. There are a few other standard encoded characters, plus any non-ISO-8859-1 character is also so encoded.

So character count as the software sees it to stick in the database won't be the same as the character count as you might see in whatever editor you're using. The actual limit is 65,535 bytes, not characters, and escape sequences can be up to nine or so bytes long for e.g. some rare Chinese characters.
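A rough sketch of how the stored size could be estimated before posting. The escape table is an assumption reconstructed from the byte counts given in this thread (six for a quote or Enter, five for the ampersand, four for less-than and greater-than), and the `&#NNNN;` form for non-ISO-8859-1 characters is likewise assumed. `stored_form` and `stored_length` are hypothetical helper names, not anything taken from the forum software:

```python
# Assumed escape table, chosen to match the byte counts quoted in the thread.
ESCAPES = {
    '"': "&quot;",   # 6 bytes
    "'": "&#039;",   # 6 bytes
    "&": "&amp;",    # 5 bytes
    "<": "&lt;",     # 4 bytes
    ">": "&gt;",     # 4 bytes
    "\n": "<br />",  # 6 bytes
}


def stored_form(text: str) -> str:
    """Approximate the escaped text that would end up in the database."""
    out = []
    for ch in text:
        if ch in ESCAPES:
            out.append(ESCAPES[ch])
        elif ord(ch) > 0xFF:
            # Outside ISO-8859-1: assume a decimal numeric character reference.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


def stored_length(text: str) -> int:
    """Byte count of the escaped text, one byte per ISO-8859-1 character."""
    return len(stored_form(text).encode("iso-8859-1"))


post = 'She said "hi" & left.\n'
print(len(post), "characters typed,", stored_length(post), "bytes stored")
```

Under these assumptions, a character beyond U+1869F (decimal 99999) expands to nine bytes, which lines up with the "nine or so" figure for rare Chinese characters.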

Sticking with ISO-8859-1 over UTF-8 is a security and performance consideration, but even switching to UTF-8 isn't going to get rid of the three most common offenders there.

The new software stores all strings in compressed form, so the above will become much less of an issue - most English text tends to average about two bits per character when compressed.
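One way to get a feel for the compression figure is to measure it. This sketch assumes zlib, since the thread does not say which algorithm the new software uses, and `bits_per_character` is a hypothetical helper. The repeated sample sentence compresses far better than real prose would, so treat the printed number as an illustration of the measurement, not as evidence for the two-bit estimate:

```python
import zlib


def bits_per_character(text: str) -> float:
    """Compressed size in bits divided by the number of characters."""
    if not text:
        return 0.0
    packed = zlib.compress(text.encode("utf-8"), 9)  # 9 = best compression
    return 8 * len(packed) / len(text)


# Artificially repetitive sample; paste in a real post for a fairer figure.
sample = "The quick brown fox jumps over the lazy dog. " * 50
print(f"{bits_per_character(sample):.2f} bits per character")
```

Pasting a real post into `sample` gives a fairer estimate of how much of the limit it would use once compressed.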

Oniya · May 07, 2012, 01:34:21 PM

Quote from: Vekseid on May 07, 2012, 01:25:27 PM
Yes.

The quote character (") in particular takes up six bytes.

Hmmm, yet another reason I can give for using 'Brit quotes'. (Really, I'm just lazy with the shift key.)

Vekseid · May 07, 2012, 01:39:20 PM

Those also take six. >_>

Wistful Dream · May 07, 2012, 01:49:55 PM

Ah well thank you for explaining :) I found a way to do what I wanted around that, interesting to learn.

Haibane · May 09, 2012, 09:52:13 AM

Quote from: Oniya on May 07, 2012, 01:34:21 PM
Hmmm, yet another reason I can give for using 'Brit quotes'. (Really, I'm just lazy with the shift key.)

Ooh, that's curious. I've never heard them called that before.

Double quotes and single quotes have two distinct grammatical functions here, the same as I assume they do in written American English. Definitely not interchangeable as far as I'm aware.

Vekseid · May 09, 2012, 10:02:16 AM

Well, they have functions in HTML, which means they tend to get escaped by some overzealous programs like SMF here.

So " gets encoded as &quot;
' gets encoded as &#039;
& gets encoded as &amp;
Return gets encoded as <br />

etc.

Beguile's Mistress · May 09, 2012, 10:04:28 AM

Does that mean that &quot; would use up 5 characters of the maximum character limit?

Oreo · May 09, 2012, 10:09:11 AM

No, it uses up six.

Vekseid · May 09, 2012, 10:09:46 AM

Six, as mentioned. &quot; is six characters.

New CMS uses compression for all meaningful strings, so most characters end up taking up about a quarter of a byte, unless you're writing in a particularly vast character space for some reason.

Beguile's Mistress · May 09, 2012, 10:12:12 AM

Okay.

I wrongly assumed the ; was punctuation.

Thank you.

Wistful Dream · May 09, 2012, 10:13:44 AM

Does this affect signatures too? Never thought of that before...

Vekseid · May 09, 2012, 10:17:42 AM

It shouldn't. Those get checked before they get processed.

Oniya · May 09, 2012, 11:36:35 AM

Quote from: Haibane on May 09, 2012, 09:52:13 AM
Ooh, that's curious. I've never heard them called that before.

Double quotes and single quotes have two distinct grammatical functions here, the same as I assume they do in written American English. Definitely not interchangeable as far as I'm aware.

I call them that because I see them used in place of " in every British-published piece of text I've ever bought to set off dialogue. Quotes inside quotes in British-published works use the ", where American-published works use '. (It's probably a colonial-rebellion thing, like all those spelling differences.)

Elliquiy Role Playing Forums
