8.3 Strings

The string type can store a sequence of characters or binary data. By default, strings store text in the UTF-8 character encoding, which means that a single Unicode character may occupy up to 4 bytes. Strings are specified by enclosing them in double quotes. As a matter of style, you should always suffix string variables with the dollar sign ($) so that a reader of your source code can easily see which variables carry strings and which carry numbers. For example:

 
a$ = "Hello World!"

This could also be written as:

 
a = "Hello World!"

But with the dollar sign at the end, the code is more readable because the reader can see at a glance that a$ holds a string.

You can concatenate strings by using the .. operator. The code above could also be written as:

 
a$ = "Hello" .. " " .. "World!"

This will concatenate three strings into one string and write it to a$. See String concatenation for details.
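
The .. operator also accepts string variables as operands, not just string literals. The following is a minimal sketch; the variable name name$ is only an illustration:

 
; name$ is a hypothetical variable used for illustration
name$ = "World"
a$ = "Hello" .. " " .. name$ .. "!"
; this will print Hello World!
DebugPrint(a$)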

If your string needs to contain a double quote, you can use the escape code \" for that, e.g.:

 
; this will print Hello, "Mr. John Doe"!
DebugPrint("Hello, \"Mr. John Doe\"!")

Escape codes are always specified after one backslash character (\). If you need to put a backslash into a string, use a backslash character as the escape code (\\). The following escape sequences are supported by Hollywood:

 
\a    Ring the system bell
\b    Back space
\f    Form feed
\n    Newline character
\r    Carriage return
\t    Horizontal tab
\v    Vertical tab
\\    Backslash
\"    Double quote
\'    Single quote
\?    Question mark
\[    Square bracket open
\]    Square bracket close
\xxx  Code point

The last escape sequence allows you to insert characters directly by simply specifying their code point value after the backslash. The code point value must be specified in decimal notation only and may occupy up to three digits. Only Latin 1 code points in the range of 0 to 255 are allowed here. Every value greater than 255 will not be accepted. Using this escape sequence, you could insert a zero character in a string:

 
a$ = "Hello\0World"

In many programming languages a zero character defines the end of the string. Not so in Hollywood. Hollywood allows you to use as many zero characters as you want in your strings. All functions of the string library are zero character safe. For example, this code prints 11 because StrLen() counts the zero character like any other character:

 
DebugPrint(StrLen("Hello\0World"))

However, that does not apply to functions that output text. The following example will print "Hello" because of the zero character:

 
; this will print "Hello" because a zero char terminates the string
DebugPrint("Hello\0World")
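
As a further illustration of the escape sequences listed above, here is a short sketch that combines several of them. The text and the path are made up for the example:

 
; \t inserts a tab, \n a line break, \\ a single backslash
DebugPrint("Name:\tJohn Doe\nPath:\tC:\\Data")
; 72 and 105 are the decimal code points of "H" and "i",
; so this will print Hi
DebugPrint("\72\105")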

If a backslash is immediately followed by a line break in the source code, Hollywood inserts a newline character into the string and continues parsing the string on the next line. For example, the following two statements create the same string:

 
a$ = "Hello\nWorld!"
a$ = "Hello\
World!"

If you are using this feature, make sure the newline character is right behind the backslash. There must be no spaces/tabs between the backslash and the newline!

Another way to specify strings is to use a pair of double square brackets. This is especially useful if you have multiple lines of text that should be placed inside the string. An example:

 
a$ = [[
<HTML>
<HEAD>
<TITLE>My HTML Page</TITLE>
</HEAD>
<BODY>
<A HREF="http://www.airsoftsoftwair.de/" TARGET="_NEW">
http://www.airsoftsoftwair.de/</A>
</BODY>
</HTML>
]]

The above string initialization is equivalent to this code:

 
a$ = "<HTML>\n<HEAD>\n<TITLE>My HTML Page</TITLE>\n</HEAD>\n" ..
     "<BODY>\n<A HREF=\"http://www.airsoftsoftwair.de/\"" ..
     " TARGET=\"_NEW\">\nhttp://www.airsoftsoftwair.de/</A>\n" ..
     "</BODY>\n</HTML>\n"

As you can see, the first version is much more readable, so the [[...]] notation is recommended for strings that span multiple lines. If a newline character immediately follows the initial [[, this newline is ignored. Carriage return characters ('\r') are never included in the long string: every line break inside it is converted to a single linefeed character ('\n'). Another advantage is that you can freely use double quotes in a string delimited by [[...]].
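
To illustrate the point about double quotes, the following sketch writes the same text once with [[...]] and once with escape codes. It assumes that a long string may also be written on a single line; both calls should then print the same text:

 
; double quotes need no escape codes inside [[...]]
a$ = [[He said: "Hello World!"]]
; the same text in a normal string requires \" for each quote
b$ = "He said: \"Hello World!\""
DebugPrint(a$)
DebugPrint(b$)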

You can also store raw binary data in strings. For example, the DownloadFile() function can be used to download a file directly into a string. When strings contain binary data, you have to be careful when calling functions of the string library because these functions normally expect the strings passed to them to contain valid UTF-8 data, which is usually not the case for raw binary data. To use such strings with the string library, you need to explicitly tell its functions not to interpret the string data as UTF-8. This is done by passing the special character encoding constant #ENCODING_RAW in the optional encoding parameter that most string library functions accept. See Character encodings for details.
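
As a rough sketch of this, the following code determines the size of a downloaded file in bytes. It assumes that StrLen() is one of the functions that accept the optional encoding parameter as their second argument; the variable name size is only an illustration:

 
data$ = DownloadFile("http://www.airsoftsoftwair.de/images/" ..
                     "products/hollywood/47_shot1.jpg")
; assumption: StrLen() takes the encoding as its second argument;
; with #ENCODING_RAW the data is not interpreted as UTF-8, so
; every byte counts as one character
size = StrLen(data$, #ENCODING_RAW)
DebugPrint(size)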

Finally, there is no string length limit: strings can be as large as system memory permits. When storing large amounts of data in a string, however, you should set the string to Nil once you no longer need it so that the garbage collector knows that it can free the memory allocated for the string. Consider the following example:

 
data$ = DownloadFile("http://www.airsoftsoftwair.de/images/" ..
                     "products/hollywood/47_shot1.jpg")
; ...do something with data$...
data$ = Nil

This will download the file at the specified URL and store the binary data in data$. Once the binary data in data$ has been processed, data$ is set to Nil to tell the garbage collector that it can release the memory occupied by data$. This is important because otherwise the memory remains occupied for as long as data$ holds the data, and the memory consumption of your script may keep growing.

