---
date: "2004-04-07T21:41:10Z"
title: Party Like It's 1992
---

<p>
I've been using <a
href='http://msdn.microsoft.com/library/default.asp?url=/library/en-us/security/security/cryptography_objects.asp'><acronym
title='Cryptographic API Component Object Model'>CAPICOM</acronym></a>
at work.  Since most <acronym
title='Component Object Model'>COM</acronym> objects are supposed to
work with <a href='http://msdn.microsoft.com/vbasic/'><acronym
title='Visual Basic'>VB</acronym></a>, the string values returned by
<acronym title='Component Object Model'>COM</acronym> functions (in my
case <a
href='http://msdn.microsoft.com/library/default.asp?url=/library/en-us/security/security/certificate_export.asp'>CAPICOM::Certificate.Export()</a>)
have some bizarre and baroque semantics when called from C++.  One quirk
I found particularly amusing was the memory allocation behind <a
href='http://msdn.microsoft.com/library/default.asp?url=/library/en-us/automat/htm/chap6_7isy.asp'><acronym
title='Binary STRing'>BSTR</acronym></a>s; here's what <a
href='http://blogs.gotdotnet.com/ericli/permalink.aspx/853ae05f-7610-4531-ab1b-070695e61168'>"Eric's
Complete Guide to BSTR Semantics"</a> has to say about what's
happening under the hood:
</p>

<blockquote cite='http://blogs.gotdotnet.com/ericli/permalink.aspx/853ae05f-7610-4531-ab1b-070695e61168'>
<p>
COM code uses the BSTR to store a Unicode string, short for "Basic
String". (So called because this method of storing strings was developed
for OLE Automation, which was at the time motivated by the development
  of the Visual Basic language engine.)
</p>

<p>...</p>

<p>
<ol>
<li>If you write a function which takes an argument of type BSTR then
you are required to accept NULL as a valid BSTR and treat it the same as
a pointer to a zero-length BSTR.  COM uses this convention, as does
Visual Basic and VBScript, so if you want to play well with others you
have to obey this convention.  If a string variable in VB happens to be
an empty string then VB might pass it as NULL or as a zero-length buffer
-- it is entirely dependent on the internal workings of the VB
program.</li>
<li>BSTRs are always allocated and freed with SysAllocString, SysAllocStringLen, SysFreeString and so on.  The underlying memory is cached by the operating system and it is a serious, heap-corrupting error to call "free" or "delete" on a BSTR.  Similarly it is also an error to allocate a buffer with "malloc" or "new" and cast it to a BSTR.  <u>Internal operating system code makes assumptions about the layout in memory of a BSTR</u> which you should not attempt to simulate.</li>
<li>The number of characters in a BSTR is fixed.  A ten-byte BSTR contains five Unicode characters, end of story.</li>
<li>
<p>A BSTR always points to the first valid character in the buffer.
This is not legal:</p>

<pre>
<code>
BSTR bstrName = SysAllocString(L"John Doe");
BSTR bstrLast = &amp;bstrName[5]; // ERROR
</code>
</pre>

<p>
bstrLast is not a legal BSTR
</p>
</li>
</ol>
</p>

<p>....</p>

<p>
When you call SysAllocString(L"ABCDE") the operating system actually allocates sixteen bytes.  <u>The first four bytes are a 32 bit integer representing the number of valid bytes in the string</u> -- initialized to ten in this case.  The next ten bytes belong to the caller and are filled in with the data passed in to the allocator.  <u>The final two bytes are filled in with zeros</u>. You are then given a pointer to the data, not to the header.
</p>
</blockquote>

<p>(Emphasis is mine)</p>

<p>
Strings with a length prefix <em>and</em> a two-byte NUL terminator.  Now
that's what I call <em>efficient</em> use of memory!  Seriously though,
this is like some sort of programming time warp; it reminds me of both
the Pascal-style strings with a single-byte length prefix that the
<a href='http://developer.apple.com/macos/'>Mac Toolbox</a>
calls used and the associated (and equally wacky) string-conversion
functions.  Ah, history.
</p>
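
<p>
To make the layout from the quote concrete, here is a rough, portable
sketch of it.  The names (FakeBstr, fake_alloc_string, fake_byte_len,
fake_block_size, fake_free) are hypothetical stand-ins I made up for
illustration; they are not the real SysAllocString family, which also
caches memory and must never be mixed with malloc or free.  The sketch
only mimics the header, data, and terminator arrangement described
above.
</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical illustration only: NOT the real BSTR allocator.
using FakeBstr = char16_t*;

// Lays out [4-byte byte count][characters][2-byte zero terminator]
// and returns a pointer to the characters, not to the header.
FakeBstr fake_alloc_string(const char16_t* src) {
    if (src == nullptr) return nullptr;

    std::size_t chars = 0;
    while (src[chars] != 0) ++chars;

    const std::uint32_t byte_len =
        static_cast<std::uint32_t>(chars * sizeof(char16_t));
    const std::size_t total =
        sizeof(std::uint32_t) + byte_len + sizeof(char16_t);

    unsigned char* block = static_cast<unsigned char*>(std::malloc(total));
    if (block == nullptr) return nullptr;

    std::memcpy(block, &byte_len, sizeof byte_len);                  // header
    std::memcpy(block + sizeof byte_len, src, byte_len);             // data
    std::memset(block + sizeof byte_len + byte_len, 0,
                sizeof(char16_t));                                   // terminator
    return reinterpret_cast<FakeBstr>(block + sizeof byte_len);
}

// Reads the byte count stored just before the data.
// A null pointer is treated like a zero-length string.
std::uint32_t fake_byte_len(const char16_t* s) {
    if (s == nullptr) return 0;
    std::uint32_t n;
    std::memcpy(&n, reinterpret_cast<const unsigned char*>(s) - sizeof n,
                sizeof n);
    return n;
}

// Total size of the underlying block: header + data + terminator.
std::size_t fake_block_size(const char16_t* s) {
    if (s == nullptr) return 0;
    return sizeof(std::uint32_t) + fake_byte_len(s) + sizeof(char16_t);
}

// Frees the whole block by stepping back over the header first.
void fake_free(char16_t* s) {
    if (s == nullptr) return;
    std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

<p>
Under these assumptions, fake_alloc_string(u"ABCDE") produces a
sixteen-byte block: four bytes of header holding the value ten, ten
bytes of character data, and two bytes of zeros.  It also shows why a
pointer into the middle of the string is not a legal BSTR: the four
bytes in front of it are character data, not a length.
</p>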