Topic awaiting preservation: String size in bytes? (Page 1 of 1)

 
H][RO
Paranoid (IV) Inmate

From: Australia
Insane since: Oct 2002

posted 11-20-2006 05:40

Hi All,

I'm trying to estimate the bandwidth that I will use in sending a few emails. I have the HTML I am going to send with the email stored in a PHP string.

Is there any way I can calculate the size in bytes that will be sent, effectively the bandwidth that will be used? When you receive an email I think the size depends on the charset used, so I can probably roughly calculate that, but is that actually what the email is sending?

I'm not sure if it encodes the string before it leaves my mail server or after or what. Any ideas?


I have tried mb_strlen(mb_encode_mimeheader($newsletterRichText,'ISO-8859-1', 'Q')) which seems to give me roughly the size of the email, just not sure if I'm going about this the right way.
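A note on the attempt above: mb_encode_mimeheader() is intended for header fields such as Subject, whereas a message body usually travels under a Content-Transfer-Encoding such as quoted-printable or base64. A rough sketch of measuring the body both ways, assuming the mail is declared as ISO-8859-1 and the mbstring extension is loaded; the sample string is a stand-in, not the real newsletter:

```php
<?php
// Stand-in for the newsletter HTML; the real string comes from the application.
// "\xC3\xA9" is the UTF-8 byte sequence for "é".
$newsletterRichText = "<p>Caf\xC3\xA9 news</p>";

// Convert to the charset the mail will declare (assumed to be ISO-8859-1 here).
$latin1 = mb_convert_encoding($newsletterRichText, 'ISO-8859-1', 'UTF-8');

// strlen() counts bytes, not characters.
$rawBytes = strlen($latin1);

// Size if the body is sent quoted-printable.
$qpBytes = strlen(quoted_printable_encode($latin1));

// Size if the body is sent as base64 in 76-character lines ending in CRLF.
$b64Bytes = strlen(chunk_split(base64_encode($latin1)));

echo "raw: $rawBytes, quoted-printable: $qpBytes, base64: $b64Bytes\n";
```

Which of the three figures applies depends on the transfer encoding the mailing script or library actually uses, which the thread does not state.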

Skaarjj
Maniac (V) Mad Scientist

From: :morF
Insane since: May 2000

posted 11-20-2006 08:56

Before it leaves your mail server, it's already encoded in whatever character set you've specified. Headers are added (and more and more are added along its bounce-path, for instance if it's scanned for spam or viruses at the other end), one of which specifies the character encoding, and it's sent out via an SMTP connection, server-to-server.

Now, you can account for the headers you put on it, for the character encoding you specify, and so on, and from that you can calculate a ballpark figure of how big it will be, but I don't think you'll ever be able to be exact about it. There are just too many variables, most of which you can't really lock down data about.

If it was, for example, in UTF-8, you could say:

code:
$size = ($messagelength + $headers) *16;



But that's about as precise as you could get, I think. But I could be wrong.


Justice 4 Pat Richard

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 11-20-2006 13:19

Think you're missing a / 2, Skaarjj.

Yeah... I'd simply send a mail to myself, remove any Received headers,
and take whatever is left as an approximate upper bound on the outgoing bandwidth.
(You can't calculate the actual bandwidth. It even depends on whether you're sending
multiple mails to a single mail server or each mail requires a new connection,
and you never know what TCP retransmissions you'll need, etc.)
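A similar ballpark can be had without sending anything, by assembling the headers and body the script would hand over and counting the bytes. This is only a sketch: the addresses, subject, and body below are hypothetical, and the figure excludes anything the mail servers add later, the SMTP commands themselves, and TCP overhead.

```php
<?php
// Hypothetical headers and body; substitute whatever the script really sends.
$headers = array(
    'From: newsletter@example.com',
    'To: reader@example.com',
    'Subject: Monthly newsletter',
    'MIME-Version: 1.0',
    'Content-Type: text/html; charset=ISO-8859-1',
);
$body = "<html><body><p>Hello</p></body></html>";

// Header lines are separated by CRLF, and a blank line separates them from the body.
$message = implode("\r\n", $headers) . "\r\n\r\n" . $body;

// strlen() returns the number of bytes in the assembled message.
$estimate = strlen($message);

echo "Roughly $estimate bytes before the mail server adds its own headers\n";
```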

Skaarjj
Maniac (V) Mad Scientist

From: :morF
Insane since: May 2000

posted 11-21-2006 00:30

Uhh... yeah, I am.

Duh. I just gave him the size in bits.

code:
$size = ($messagelength + $headers) *($utf8size / 8);




Justice 4 Pat Richard

H][RO
Paranoid (IV) Inmate

From: Australia
Insane since: Oct 2002

posted 11-22-2006 01:52

Hmm,

So UTF-8 has 16 bits per character?

Is the method I used in the first post correct since it takes into account the charset?

code:
mb_strlen(mb_encode_mimeheader($newsletterRichText,'ISO-8859-1', 'Q'))

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 11-22-2006 08:24

No, it doesn't!
Sorry if you got that impression.

ISO-8859-1 has 8 bits per character.

Unicode assigns each character a code point.

UTF-8 is a method to map those code points to bits.
It encodes ASCII characters into 8 bits, extended ASCII often
to 16, and will use even more bits for an 'exotic' character.
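To make the difference concrete, here is a small sketch that stores the same four-character word in both encodings and counts bytes. It assumes the mbstring extension is available; the sample word is only an illustration.

```php
<?php
// "\xE9" is "é" as a single ISO-8859-1 byte.
$latin1 = "caf\xE9";

// The same text re-encoded as UTF-8.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

$latin1Bytes = strlen($latin1);          // bytes in ISO-8859-1
$utf8Bytes   = strlen($utf8);            // bytes in UTF-8
$characters  = mb_strlen($utf8, 'UTF-8'); // characters, independent of the byte count

echo "ISO-8859-1: $latin1Bytes bytes, UTF-8: $utf8Bytes bytes, characters: $characters\n";
```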

poi
Paranoid (IV) Inmate

From: Norway
Insane since: Jun 2002

posted 11-22-2006 08:29

UTF-8 uses 1 to 4 bytes per character.

To be more precise:

quote:
UTF-8 uses one to four bytes (strictly, octets) per character, depending on the Unicode symbol. Only one byte is needed to encode the 128 US-ASCII characters (Unicode range U+0000 to U+007F). Two bytes are needed for Latin letters with diacritics and for characters from Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac and Thaana alphabets (Unicode range U+0080 to U+07FF). Three bytes are needed for the rest of the Basic Multilingual Plane (which contains virtually all characters in common use). Four bytes are needed for characters in other planes of Unicode.

source: Wikipedia: UTF-8
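The ranges in the quote translate directly into a lookup. The helper name utf8_width() below is made up for illustration and is not a PHP built-in:

```php
<?php
// Number of bytes UTF-8 needs for one Unicode code point,
// following the ranges in the quoted passage.
function utf8_width($codePoint)
{
    if ($codePoint <= 0x7F) {
        return 1; // US-ASCII
    }
    if ($codePoint <= 0x7FF) {
        return 2; // Latin with diacritics, Greek, Cyrillic, ...
    }
    if ($codePoint <= 0xFFFF) {
        return 3; // rest of the Basic Multilingual Plane
    }
    return 4;     // other planes
}

$samples = array(
    'U+0041 (A)'         => 0x41,
    'U+00E9 (e-acute)'   => 0xE9,
    'U+20AC (euro sign)' => 0x20AC,
    'U+1F600'            => 0x1F600,
);

foreach ($samples as $label => $codePoint) {
    echo $label, ': ', utf8_width($codePoint), " byte(s)\n";
}
```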

Skaarjj
Maniac (V) Mad Scientist

From: :morF
Insane since: May 2000

posted 11-22-2006 10:08

Yeah, after my exams, and a hectic couple of weeks, I am officially braindead. I'll leave the hard-line pronouncements to the living, I think.


Justice 4 Pat Richard

H][RO
Paranoid (IV) Inmate

From: Australia
Insane since: Oct 2002

posted 11-23-2006 07:26

Hmm, yeah, I thought so. So is there a function that will calculate the size based on UTF-8, or whichever encoding I use, per character?

I don't have a problem understanding that each can use 1 - 4 bytes, but how to actually find a function that does it was the biggest problem.

So your code above

code:
$size = ($messagelength + $headers) *($utf8size / 8);



really needs to get the byte value of each character in the string and add them together for the total.
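In PHP that per-character sum is what strlen() already returns, since it counts bytes rather than characters. A sketch that does the sum by hand and compares, assuming the string is valid UTF-8 and that the old mbstring.func_overload setting is not switched on:

```php
<?php
// "\xC3\xA9" is "é" and "\xE2\x82\xAC" is the euro sign, both in UTF-8.
$text = "Caf\xC3\xA9 5\xE2\x82\xAC";

// Split into characters; the /u flag makes PCRE treat the string as UTF-8.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

// Add up the bytes used by each individual character.
$total = 0;
foreach ($chars as $char) {
    $total += strlen($char);
}

echo "characters: ", count($chars), "\n";
echo "summed bytes: ", $total, "\n";
echo "strlen(): ", strlen($text), "\n";
```

So for a string that is already in the encoding it will be sent in, strlen($string) is the byte count, and mb_strlen($string, 'UTF-8') is the character count.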


On a side note, let's say I paste some text into a Windows .txt (Notepad) document. What encoding does that use? Using my method above gives me roughly the same size as if I pasted the data into a .txt file.


So overall, if I copy the source of an email into a .txt file, is that size what I should be working out to use as an estimate before I send the email?
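On the Notepad question: Notepad of that era saves as ANSI by default (Windows-1252 on Western systems), with UTF-8 and "Unicode" (UTF-16LE with a byte order mark) as alternatives in the Save As dialog, so the file size depends on which one was picked. A sketch comparing the same text under each, assuming the mbstring extension is loaded; the sample text is only an illustration:

```php
<?php
// UTF-8 source text; "\xC3\xA9" is "é".
$text = "Caf\xC3\xA9 news";

$sizes = array(
    // Notepad's default "ANSI" on a Western system.
    'Windows-1252 (ANSI)' => strlen(mb_convert_encoding($text, 'Windows-1252', 'UTF-8')),
    // Raw UTF-8 bytes, without any byte order mark.
    'UTF-8'               => strlen($text),
    // Notepad's "Unicode" option: UTF-16LE preceded by a 2-byte byte order mark.
    'UTF-16LE + BOM'      => 2 + strlen(mb_convert_encoding($text, 'UTF-16LE', 'UTF-8')),
);

foreach ($sizes as $encoding => $bytes) {
    echo $encoding, ': ', $bytes, " bytes\n";
}
```

A saved file that is close to twice the character count is a hint that the "Unicode" option was used, though the thread does not say which setting was chosen.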

H][RO
Paranoid (IV) Inmate

From: Australia
Insane since: Oct 2002

posted 11-23-2006 07:35

Here is what I actually have in a test.

HTML string length + text-only string length = 698 characters WITHOUT headers

Outlook Express says the email is 3 KB
Pasting the email source into a .txt file makes a 2,565 byte file.

Deleting my HTML and text-only bits from the text file leaves it at 1,274 bytes
Of that, about 94 bytes were added to the headers by my antivirus.



So if those sizes relate to each other, this would suggest most of the characters are 1 byte, and some are 2.

So in theory, if I can get the byte size of those 698 characters and add, say, 1,300 bytes on for the headers, I should be pretty close?
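The arithmetic from this post, as a sketch. The byte figures are the ones reported above; $recipients is a made-up number for illustration. Note that the content occupied 2,565 - 1,274 = 1,291 bytes in the saved source, so the 698-byte assumption may be on the low side.

```php
<?php
// Figures reported in the post.
$sourceBytes       = 2565; // whole email source saved as .txt
$withoutContent    = 1274; // same file with the HTML and text parts deleted
$contentCharacters = 698;  // HTML + text-only string lengths

// Bytes the content actually occupied in the saved source.
$contentInSource = $sourceBytes - $withoutContent;

// The estimate proposed in the post: 1 byte per character plus a header allowance.
$headerAllowance = 1300;
$perMessage = $contentCharacters + $headerAllowance;

// Hypothetical mailing size, for scaling the estimate up.
$recipients = 500;
$totalBytes = $perMessage * $recipients;

echo "content in saved source: $contentInSource bytes\n";
echo "estimate per message: $perMessage bytes\n";
echo "estimate for $recipients recipients: $totalBytes bytes\n";
```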

H][RO
Paranoid (IV) Inmate

From: Australia
Insane since: Oct 2002

posted 11-23-2006 07:48

From what I can tell I should only have single-byte characters, so the content should take 698 bytes.

Actually, if I only paste my HTML and text content into a text document I get 1,388 bytes. I'm probably comparing apples with oranges here.

Help? :P
