Character Limit

Started by Wistful Dream, May 07, 2012, 12:11:30 PM

Wistful Dream

I know it's 65,300-some or something like that. However, I have a post that is skirting near there, at about 62k, and it's cutting off the end of it. I don't know why >.< Help?

jouzinka

It may be because it's those 65K and some _including_ BB code markup, so if it's something heavily styled, you may exceed...
Story status: Not Available
Life Status: Just keep swimming...
Working on: N/A

Avis habilis

It's 65535 bytes because of the database on the back end, if I remember correctly. Ordinarily that means 65535 characters, but characters that take more than one byte will reduce the number.
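
To illustrate the bytes-versus-characters point, here is a minimal Python sketch. It assumes UTF-8 storage purely for illustration (the forum's actual encoding may differ), and the names `LIMIT_BYTES`, `utf8_size` and `fits` are made up for this example.

```python
# Assumption: text is stored as UTF-8. This is illustrative only and may not
# match how the forum software actually stores posts.
LIMIT_BYTES = 65535  # 2**16 - 1, the limit mentioned above


def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits(text: str, limit: int = LIMIT_BYTES) -> bool:
    """Return True if the UTF-8 encoding of `text` is within `limit` bytes."""
    return utf8_size(text) <= limit


plain = "a" * 65535          # one byte per character in UTF-8
accented = "\u00e9" * 65535  # e-acute, two bytes per character in UTF-8

print(len(plain), utf8_size(plain), fits(plain))
print(len(accented), utf8_size(accented), fits(accented))
```

Both strings have the same character count, but only the first one fits, because the accented one needs twice as many bytes.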

Wistful Dream

It is heavily coded. And there are characters that take more than one byte?

jouzinka

Accents - ä é î etc... ;-)

Vekseid

Yes.

The quote character (") in particular takes up six bytes. Enter takes up six. The ampersand (&) takes up five. Less than (<) and greater than (>) four each. There are a few other standard encoded characters, plus any non-ISO-8859-1 character is also so encoded.

So the character count as the software sees it when storing the post in the database won't be the same as the character count you see in whatever editor you're using. The actual limit is 65,535 bytes, not characters, and escape sequences can be up to nine or so bytes long, e.g. for some rare Chinese characters.
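
As a rough illustration of how a single character can grow to eight or nine bytes, here is a small Python sketch. It assumes that characters outside ISO-8859-1 are stored as decimal numeric entities (the `&#NNNN;` form); that detail and the helper name `latin1_stored_size` are assumptions for the example, not a description of the forum's actual code. It also ignores the HTML special characters such as the quote and ampersand.

```python
def latin1_stored_size(text: str) -> int:
    """Estimate the stored size of `text` in bytes.

    Assumption: characters that exist in ISO-8859-1 take one byte each, and
    every other character is replaced by a decimal numeric entity.
    """
    # "xmlcharrefreplace" is a standard Python codec error handler that
    # substitutes &#NNNN; for characters the target encoding cannot represent.
    return len(text.encode("iso-8859-1", errors="xmlcharrefreplace"))


samples = {
    "plain letter": "e",
    "Latin-1 accent": "\u00e9",      # e-acute, representable in ISO-8859-1
    "common CJK": "\u4e2d",          # becomes &#20013;
    "rare CJK": "\U00020000",        # becomes &#131072;
}

for label, ch in samples.items():
    print(label, latin1_stored_size(ch))
```

Under these assumptions an accented Latin-1 letter still costs one byte, while characters from the supplementary planes cost nine.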

Sticking with ISO-8859-1 over UTF-8 is a security and performance consideration, but even switching to UTF-8 isn't going to get rid of the three most common offenders there.

The new software stores all strings in compressed form, so the above will become much less of an issue: most English text tends to average about two bits per character when compressed.
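
For a rough feel of what compression does to English text, here is a minimal Python sketch using the standard `zlib` module. The choice of zlib is an assumption made for illustration; the thread does not say which compressor the new software uses, and the ratio you get depends heavily on the compressor and on how long and repetitive the text is. The sample text and the name `bits_per_char` are invented for the example.

```python
import zlib


def bits_per_char(text: str) -> float:
    """Return the average number of compressed bits per character of `text`.

    Assumption: zlib at its highest level stands in for whatever compressor
    the real software uses.
    """
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return 8 * len(compressed) / len(text)


sample = (
    "The rain had been falling since early morning, and by the time the "
    "travellers reached the inn the road behind them had turned to mud. "
    "The innkeeper looked them over, counted the horses, and decided that "
    "the travellers could pay. He showed them to a table near the fire and "
    "sent the boy to fetch bread, cheese and a jug of something warm. "
    "Nobody spoke for a while. The travellers were tired, the fire was "
    "warm, and the rain kept falling on the roof above them."
)

print(round(bits_per_char(sample), 2))
```

Short samples like this one compress less well than long posts, so treat the printed figure as a lower bound on the savings rather than a prediction.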

Oniya

Quote from: Vekseid on May 07, 2012, 01:25:27 PM
Yes.

The quote character (") in particular takes up six bytes.

Hmmm, yet another reason I can give for using 'Brit quotes'.  (Really, I'm just lazy with the shift key.)
"Language was invented for one reason, boys - to woo women.~*~*~Don't think it's all been done before
And in that endeavor, laziness will not do." ~*~*~*~*~*~*~*~*~*~*~Don't think we're never gonna win this war
Robin Williams-Dead Poets Society ~*~*~*~*~*~*~*~*~*~*~*~*~*~Don't think your world's gonna fall apart
I do have a cause, though.  It's obscenity.  I'm for it.  - Tom Lehrer~*~All you need is your beautiful heart
O/O's Updated 5/11/21 - A/A's - Current Status! - Writing a novel - all draws for Fool of Fire up!
Requests updated March 17

Vekseid

Those also take six. >_>

Wistful Dream

Ah well, thank you for explaining :) I found a way around that to do what I wanted. Interesting to learn.

Haibane

Quote from: Oniya on May 07, 2012, 01:34:21 PM
Hmmm, yet another reason I can give for using 'Brit quotes'.  (Really, I'm just lazy with the shift key.)
Ooh, that's curious. I've never heard them called that before.

Double quotes and single quotes have two distinct grammatical functions here, the same as I assume they do in written American English. Definitely not interchangeable as far as I'm aware.

Vekseid

Well, they have functions in HTML, which means they tend to get escaped by some overzealous programs like SMF here.

So " gets encoded as &quot;
' gets encoded as &#039;
& gets encoded as &amp;
Return gets encoded as <br />

etc.
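
The replacements above can be sketched in Python to estimate how long a post becomes once escaped. This is a hedged approximation, not SMF's actual code: the table only covers the characters named in this thread, the single-quote entity is assumed to be the six-character `&#039;` form (consistent with "Those also take six"), and `escape_post` and `stored_length` are hypothetical names.

```python
# Replacement table based on the characters discussed in this thread.
# "&" must come first so that the ampersands introduced by the other
# entities are not escaped a second time, and the line break comes last so
# that the "<" and ">" of "<br />" are not escaped either.
REPLACEMENTS = [
    ("&", "&amp;"),
    ('"', "&quot;"),
    ("'", "&#039;"),  # assumed form of the single-quote entity
    ("<", "&lt;"),
    (">", "&gt;"),
    ("\n", "<br />"),
]


def escape_post(text: str) -> str:
    """Apply the replacements in order and return the escaped text."""
    for raw, escaped in REPLACEMENTS:
        text = text.replace(raw, escaped)
    return text


def stored_length(text: str) -> int:
    """Return the length of `text` after escaping."""
    return len(escape_post(text))


sample = 'She said "hi" & left.\n'
print(len(sample), stored_length(sample))
print(escape_post(sample))
```

A dialogue-heavy post with many quotes and line breaks can therefore hit the limit well before the editor's character count suggests it should.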

Beguile's Mistress

Does that mean that " would use up 5 characters of the maximum character limit?

Oreo


She led me to safety in a forest of green, and showed my stale eyes some sights never seen.
She spins magic and moonlight in her meadows and streams, and seeks deep inside me,
and touches my dreams. - Harry Chapin

Vekseid

Six, as mentioned. &quot; is six characters.

New CMS uses compression for all meaningful strings, so most characters end up taking up about a quarter of a byte, unless you're writing in a particularly vast character space for some reason.

Beguile's Mistress

Okay. :-)  I wrongly assumed the ; was punctuation.

Thank you.

Wistful Dream

Does this affect signatures too? Never thought of that before...

Vekseid

It shouldn't. Those get checked before they get processed.

Oniya

Quote from: Haibane on May 09, 2012, 09:52:13 AM
Ooh, that's curious. I've never heard them called that before.

Double quotes and single quotes have two distinct grammatical functions here, the same as I assume they do in written American English. Definitely not interchangeable as far as I'm aware.

I call them that because I see them used in place of " in every British-published piece of text I've ever bought to set off dialogue.  Quotes inside quotes in British-published works use the ", where American-published works use '.  (It's probably a colonial-rebellion thing, like all those spelling differences.  ;) )