Lesson 2: Types and File Operations

Contents

In this lesson, we will introduce the concept of types and discuss file functions.

Executive Summary

When Lifelines encounters anything with a string value, it normally tries to write it to a file. Those things are: a string literal, a function that returns a string value, or a variable that has been set to a string value.

set(varb,expr) evaluates expr and associates varb with the value of the expression. set() returns no value. Variable names may contain upper- and lower-case letters, digits (but may not start with a digit), underscores, and periods.

If you can work the advanced exercise, you can probably safely skip this lesson.

What are types?

Perhaps you have heard it said, "You can't compare apples and oranges." But of course you can. It is just a matter of deciding what the basis of the comparison is. When you go to the store, perhaps apples are 79¢ per pound and oranges are 49¢ per pound. There is a comparison. No, it is not a comparison of the Platonic essence of appleness and orangeness, but you are in a store with money in your hand, so you have the basis for making the pertinent comparison. But suppose that when you get to the store, apples are 4 for $1 and oranges are 49¢ per pound. This time perhaps you cannot make the comparison you want to make without scales and a calculator. Both prices are expressed in numbers (4/$1 and 49¢/pound), but the numbers cannot be compared directly.

And so it is with digital computers. Everything is expressed in the same currency, which is sometimes called 1s and 0s, but is really something more like high/low or on/off — I will not digress into this any further. The thing is, there are not really numbers or letters or pictures or tunes in the computer, there is stuff that for want of a better term I will call "data" and it is pretty much up to users, with the help of various programs, to make a letter to an insurance company, a picture of my cat, or iPod fodder out of that data. But it is all the same stuff. To keep track of what we want a particular bunch of stuff to mean, we have data types.

In the Manual these are called value types, and now would be a good time to review the Value Types section of the Manual (section 2.7 in my version).

Lifelines reports have a large number of value types in comparison to some other programming and scripting languages, but many of these are special to the purposes of Lifelines. EVENT, INDI, FAM, and NODE are all peculiar to processing GEDCOM-like data. LIST and TABLE are special ways of arranging other types of data. Several of the others are simply subsets or supersets of one another. But we will start with a type that is fairly simple to understand and also very common in other kinds of programming: STRING.

STRING type

Near the end of the previous lesson we learned that when Lifelines finds something that looks like a string in a procedure, it will try to write that string to a file (and will prompt you for a filename if it does not know which file to write to). This suggests two questions immediately: What is a string? and What looks like a string to Lifelines? The second question is much easier to answer, but I will take a stab at the first one.

A string is a chunk of that data stuff which we have decided to interpret as letters and letter-sized symbols, or what we call characters. "Characters" is really a better word than "letters" because in some languages characters are not very much like letters. For example, @ is a character. So is #. You can have some pretty heated discussions on USENET about what the name of # is. In particular, within strings, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0 are characters. You cannot add the character 1 and the character 2 to get the character 3. They are characters (sometimes called numerals), not numbers.

Very young school children sometimes make a joke that 2 + 2 equals 22. Adults do not find this very funny, and most never think about the problem until they start to work with computers. The number 2 plus the number 2 is the number 4. The way we usually want to add strings is called concatenation, and the string "2" concatenated with the string "2" is the string "22". We run into trouble when we try to add strings like numbers or numbers like strings (although there are specific circumstances in which we do want to do something like that).
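Here is a small sketch of the difference, written as a report you can run. It assumes your version of Lifelines has concat() (joins strings), add() (does arithmetic), and d() (turns an integer into a string so print() can show it); these are in the built-in function list of the Manual I use, but check yours.

```lifelines
proc main(){
/* concat() joins strings: "2" followed by "2" */
print(concat("2","2"),nl())
/* add() does arithmetic on numbers; d() turns the number 4 into the string "4" */
print(d(add(2,2)),nl())
}
```

If those functions behave as described, the first line of output should be 22 and the second should be 4.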

Problems arise because there are many different ways to interpret that data stuff as characters, and the ways of doing it are called character sets. We are not getting any deeper into this here.

With the understanding that the underlying thing is data stuff, a string is 0 or more characters. The character set is the rule for translating between data stuff and characters.

Lifelines recognizes a string as a string if we put it in double quotes.

In the previous lesson we got Lifelines to write Hello, world! to a file this way:


/* 
 * @progname       Hello, world!
 * @version        1.1
 * @author         YOUR NAME
 * @category       programming tutorial
 * @output         text
 * @description    Tutorial example one.
*/

proc main(){

"Hello, world!\n"

}

Lifelines recognized Hello, world! as a string because it was in double quotes. This is called a literal because the string is right there, literally between the double quotes. Well, not exactly. There is that \n thing. And how about if you want to put a double quote in a string?

Single backslashes (\) do not show up, but they give the following character a special meaning if it has one. The special meaning of n, the meaning it takes on as \n, is as the newline character in Unix-like systems (BSD, Linux, even Unix). \" means this is not the end of the string, but put a double quote character in the string here. \\ means put one backslash character here. A few other backslashed characters may do something special on your system, for example \t is the tab character, at least on Unix-type systems.
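A minimal sketch of the escapes just described, assuming a Unix-like system where \n is the newline. It uses print() so that Lifelines writes to its standard output window instead of asking for a filename.

```lifelines
proc main(){
/* \" puts a double quote in the string, \\ puts in one backslash, \n ends the line */
print("She said \"hi\" and left a \\ behind.\n")
}
```

On a system where the escapes work as described, this should print: She said "hi" and left a \ behind.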

Whether any of the backslashed characters work on your non-Unix system depends upon how well Lifelines was adapted to your system. The newline character is particularly suspect. Fortunately, there are ways of writing reports that avoid this uncertainty, so you can write report programs that are likely to run on any system with Lifelines installed.

So one thing that looks like a string to Lifelines is a string literal, which is 0 or more characters enclosed in double quotes. (Yes, there is such a thing as a string that contains 0 characters. Well, no, that is actually nothing, but there is such a thing as nothing.)

Another thing that looks like a string to Lifelines is the return of a function that is supposed to return a string. Let's not get too fancy about what a function is here. Basically, you put something into a function and you get something back, and sometimes you do not even have to put something in.

One of the freebies is nl(). You do not have to put anything into it — in fact it is an error if you do put anything into it. What you get back is a string of one character (or more) that is guaranteed (as much as anything) to produce a new line, whatever system Lifelines is running on.

So let's go crazy and substitute this procedure for proc main in Hello, world!:


proc main(){
nl()
nl()
"Hello, world!"
nl()
nl()
}

Vary the number of nl() functions or even their arrangement until you are convinced that Lifelines is getting the newlines from the function. You can even put everything on one line. The newline characters in the output come from nl(), not from however you arrange the lines in proc main.

nl() provides a way of getting a new line even if you are not sure whether your report will run on a system that understands \n (or does with \n what you think \n should do). Lifelines also has qt() which returns a double quote character. Lifelines has the ability to print in any position on the page, which is much more powerful than using tab characters. So although backslashes will work to one degree or another on various systems, you can write reports without them, thereby assuring that your reports will run on any system with Lifelines.
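Here is a sketch of the same kind of output written without any backslashes. qt() and nl() are the functions described above; col(), which moves the output to a given column on the current line, is my assumption about the position-printing function in your version, so check the Manual. Because the strings are not inside print(), Lifelines will prompt you for a filename.

```lifelines
proc main(){
/* qt() supplies the double quotes and nl() the newline: no backslashes needed */
qt() "Hello, world!" qt() nl()
/* col() positions the output at column 10 instead of relying on a tab character */
col(10) "starts in column 10" nl()
}
```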

Now for a function that you have to put something into. Try this:


proc main(){nl()nl()upper("Hello, world!")nl()nl()}

The proc is hard to read but the output is not.

Moral of the story: The way you format lines in a proc doesn't matter, so do it in a way you find readable and sensible.

upper() is a function that takes a string (you have to put a string in) and returns a string. To see what happens when you do not put a string in, try this:


proc main(){nl()nl()upper(1)nl()nl()}

Wasn't pretty, was it? Just as functions return values of a certain type, functions require values of a certain type. (This is stretched a bit by saying there is a VOID type, which is nothing at all; that is what nl() takes.)
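To see that argument types and return types need not match, here is a sketch using two more string functions, assuming your version has lower() and strlen() (both are in the Manual's function list as I know it). lower() takes a STRING and returns a STRING, but strlen() takes a STRING and returns an INT, so the result goes through d() before it is printed.

```lifelines
proc main(){
/* lower() takes a STRING and returns a STRING */
print(lower("Hello, world!"),nl())
/* strlen() takes a STRING but returns an INT; d() converts the INT to a STRING */
print(d(strlen("Hello, world!")),nl())
}
```

This should print hello, world! and then 13, the number of characters in "Hello, world!" (the comma, the space, and the exclamation point count too).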

You should be playing around with these examples, trying things out. If so, you may discover I have lied to you a little bit. I have made a big deal about the difference between a number and a string. But if you try the example above with the number 0 instead of the number 1, Lifelines does not bitch and moan. Bracket the upper() function with qt() functions to show that upper(0) does not put out a string of even one character, but also that the program does not blow up, as it does if you try upper(1).

As an exercise: if you did not write and run a report to do what was suggested in the previous paragraph, at least write out a proc that would do it.

Programming languages differ on this point, but to Lifelines all zeros are equal. The number zero, the boolean false which is 0, and the empty string ("") which is zero characters are all the same. That is why upper(0) works. Well, sort of works. It does not make Lifelines throw up. 0 is the same as "" to Lifelines. But Lifelines does have a different kind of nothing: void.

Void is the most profound kind of nothing. 0 is a number. It may mean nothing, but it is still a number and you can do numbery things with it, including arithmetic operations (except dividing by it). "" is a string. It is pretty pathetic as strings go, but you can still do stringy things with it. But void is not a string or a number. It is really, no kidding, nothing.

As we have seen, upper("") does not blow up Lifelines, and neither does upper(0), because 0 is the same as "" to Lifelines. But upper(), that is, with nothing between the parentheses, does blow up. You sent upper a void argument, but upper requires one argument. Other functions require a void argument. We have met nl() and qt(), each of which must have a void argument. They will blow up if you send them 0 or "" or something that evaluates to 0. Void means no kidding nothing, not 0.

So far that is two things that Lifelines recognizes as strings: string literals (things in double quotes), and the return of functions that are supposed to return strings. The third thing is variables which have string values.

Variables are simply names that can be given to values. A common way of doing that in Lifelines is to use the set() command. This is generally a lot easier to illustrate than to explain.

Edit Hello, world! so that proc main reads like this (and run it):


proc main(){
set(message,"Hello, world!")
nl()
nl()
message
nl()
nl()
}

set() tells Lifelines that you want to call the string "Hello, world!" by the name message. set() does not cause anything to be written to the output file. It is not really a function, and more to the point, it does not return a string (or returns void, if you want to look at it that way). But Lifelines now knows that message is a string because you gave it the value of a string, namely "Hello, world!" When Lifelines comes to message in the proc, it knows message has a string value and so it writes that string to the output file. What could possibly go wrong?

Well, one thing that can go wrong is that you can use set() to give a name to any kind of value in Lifelines. There is nothing about the name message that says it is the name of a string until you use set() to give message a string value. You could have forgotten to set message or you could have set it to a number or you could have remembered to set it but then set it to something else. Also, more complicated programs may involve several procedures and functions. Setting a variable in one of them does not necessarily result in the variable being set in the others (although you can make it so).
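Here is a sketch of the "forgot to set it" problem caused by a simple typo. It relies on the behavior described later in this lesson: a name that was never set evaluates to 0, which Lifelines treats the same as "".

```lifelines
proc main(){
/* typo: the value is stored under the name mesage ... */
set(mesage,"Hello, world!")
/* ... but message was never set, so it is 0, the same as "" */
print(qt(),message,qt(),nl())
}
```

Nothing complains; the two quote marks simply come out with nothing between them, which is exactly why this kind of mistake is easy to miss.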

This should not scare you off variables because you cannot program without them. But it should make you think about using descriptive variable names to minimize the opportunities for error. For a string that will be written to output, message is a reasonable name. It is not such a reasonable name for a number. For something to be written to a file, line or page or paragraph might be better. You may think you will remember that you used x to stand for a surname and j to stand for the number of children someone had, but if you form the habit of using variable names like surname and numchildrn you will get much further.
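A short sketch of descriptive names in use. The values are made up for illustration, and the name lastname is my own stand-in for the surname idea above. It also shows that set() can give a variable a number, which then needs d() to be printed as a string.

```lifelines
proc main(){
/* hypothetical values, just to show descriptive variable names */
set(lastname,"Smith")
set(numchildren,3)
print(lastname," had ",d(numchildren)," children",nl())
}
```

This should print Smith had 3 children. Months from now, lastname and numchildren will still tell you what they hold; x and j will not.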

In the above example, we used set() to give message a string value, namely the value of the string literal "Hello, world!" You could use set() to set message to a string using a function that returns a string or even using another variable that has a string value.

As an exercise, use print() to verify that:


set(messageout,"HELLO, WORLD!")

and:


set(messageout,upper("Hello, world!"))

and:


set(messageup,upper("Hello, world!"))
set(messageout,messageup)

all result in messageout having the same value. Then do the same thing without print() to show that Lifelines recognizes the variables have string values and tries to write the values to a file. (Watch out for the parentheses.)

Secrets of set()

Read the section of the Manual on expressions (which is 2.4 in my copy, but might be something else in your version).

We have introduced a few built-in functions such as nl() and upper() without really saying what functions are (a situation we hope to remedy in the next lesson). set() looks like a function, but I think it really is not one, for reasons that do not matter here.

For set() to work you have to feed two things into it which are separated by a comma. The first thing has to be the name of a variable, and the second thing has to be an expression. (You did read the Manual section as advised, didn't you?)
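Here is a sketch of the second thing being different kinds of expression: a literal, a function call whose arguments are themselves expressions, and an expression with a number value. As in the earlier examples, concat(), add(), and d() are assumed to be available in your version.

```lifelines
proc main(){
/* a string literal */
set(greeting,"Hello")
/* a function call; its arguments are a variable and a literal */
set(message,concat(greeting,", world!"))
/* an expression whose value is a number */
set(total,add(2,3))
print(message,nl(),d(total),nl())
}
```

This should print Hello, world! and then 5.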

The Manual section should make it clear what an expression is. But the Manual never says what is legal for the name of a variable. Here is a report you can use to experiment with variable names:


/*  * @progname       Variable Names Demo
    * @version        1.7
    * @author         YOUR NAME
    * @category       programming tutorial
    * @output         STDOUT
    * @description    Tutorial
 
Experiment with variable names.

*/

proc main(){


set(Message.out1,"HELLO, WORLD!")

set(Message.out,"HELLO, kitty!")


print(nl(),Message.out1,nl())

print(nl(),Message.out,nl())

}

These are my results: a variable name must start with a lower- or uppercase letter, and may contain additional lower- or uppercase letters. Variable names are case sensitive, which means date1, Date1, and DATE1 are all different names. A variable name may contain number characters, underscores, or periods. It may not contain anything that confuses Lifelines.
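A quick sketch of case sensitivity: the three names below differ only in case, so they are three different variables holding three different values.

```lifelines
proc main(){
/* three different variables, because names are case sensitive */
set(date1,"lower")
set(Date1,"mixed")
set(DATE1,"upper")
print(date1," ",Date1," ",DATE1,nl())
}
```

This should print lower mixed upper. If the names were not case sensitive, you would see upper three times.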

For example: message-out is not a legal variable name because the hyphen is also the minus sign, so Lifelines thinks this could be an expression standing for the value of message less (minus) the value of out, which would be a perfectly sensible thing if message and out were variables with number values. Obviously a variable name cannot contain a comma, because the comma is what Lifelines uses to separate things in parentheses. set(message,out,"Hello, world!") looks to Lifelines like you have given set() three things: a variable name message, a variable name out, and a string literal "Hello, world!". Yes, there is also a comma in "Hello, world!", but that does not bother Lifelines because it is inside the string literal.
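Here is a sketch of the legal alternatives to message-out. The hyphenated version is left in a comment so the report still runs; take the comment markers off if you want to see Lifelines object to it.

```lifelines
proc main(){
/* underscores and periods are allowed in variable names */
set(message_out,"underscores are fine")
set(message.out,"periods are fine")
/* set(message-out,"no good") fails: the hyphen reads as a minus sign */
print(message_out,nl(),message.out,nl())
}
```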

Lifelines cannot permit three things in set(): set must have exactly two. With a little experimentation you should be able to see what characters can go in variable names, and as you learn more about Lifelines you may see logical reasons that some characters are not allowed. The Manual does not say, and I have not discovered, whether there is a length limit for variable names. Some languages do not allow variable names to be indefinitely long; others allow variable names to be very long, but only consider the names to be different if they differ within the first so-many characters. You can experiment with those limits if you want to, but Lifelines gives you enough flexibility to use descriptive variable names. So use descriptive variable names.

So set() evaluates the second thing it gets. The result of evaluation is, naturally enough, a value. It then associates the first thing it got, the variable name, with that value. There are many cute little analogies for what "associates" means, so I might as well give you one. It is as if set() writes the variable name on a box and puts the value in the box. After that, when Lifelines sees a variable name that matches the name on a box, it looks in the box with that name and uses the value in the box (not the box or the name on the box) for whatever it is doing. If Lifelines finds something that should be the name of a box, but there is no such box, Lifelines uses the 0 value, which is the same as "". You can see this by placing the line:


print(foo)

in a test program where foo has not been set. Since print() — print with a void argument — blows up (prove it), we know foo is not void, but is zero or the empty string.
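The box analogy also explains what happens when one variable is set from another: set() puts a copy of the value into the new box, not the old box itself. A minimal sketch, assuming string values are copied this way in your version (they are in mine); boxcopy is just an illustrative name.

```lifelines
proc main(){
set(box,"first value")
/* boxcopy gets the value that is in box, not the box itself */
set(boxcopy,box)
/* putting a new value in box does not change what is in boxcopy */
set(box,"second value")
print(box,nl(),boxcopy,nl())
}
```

This should print second value and then first value.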

As an advanced exercise: use print to show that set returns void, not 0 or the empty string.



©Copyright 2009 by Lars Eighner. Original material may be copied for personal use, but may not be sold, made available contingent on the payment of any fee or access charge and may not be bundled in any product which is sold for a fee or media charge or which requires any payment for access. In short, you cannot charge money for material I have made freely available. Software and other products mentioned may be trademarks belonging to their respective owners.

Answers to Exercises


proc main(){
nl()
qt()
upper(0)
qt()
nl()
}


/*  * @progname       Types Demo
    * @version        1.5
    * @author         YOUR NAME
    * @category       programming tutorial
    * @output         STDOUT
    * @description    Tutorial
 
To give a variable a string value you can use set() with
a string literal, a function that returns a string, or 
a variable that has a string value

*/

proc main(){

print(nl())

set(messageout,"HELLO, WORLD!")

print(messageout,nl())

set(messageout,upper("Hello, world!"))

print(messageout,nl())

set(messageup,upper("Hello, world!"))
set(messageout,messageup)

print(messageout,nl())

}


/*  * @progname       Types Demo
    * @version        1.6
    * @author         YOUR NAME
    * @category       programming tutorial
    * @output         STDOUT
    * @description    Tutorial
 
To give a variable a string value you can use set() with
a string literal, a function that returns a string, or 
a variable that has a string value

Be prepared for Lifelines to prompt you for a filename to catch the output.

*/

proc main(){

set(messageout,"HELLO, WORLD!")

messageout

nl()

set(messageout,upper("Hello, world!"))

messageout

nl()

set(messageup,upper("Hello, world!"))
set(messageout,messageup)

messageout

nl()

}

Answer to Advanced Exercise

Run a report with print(set(foo,"bar")) in it to see that it blows up. Compare with a report containing print(), which should blow up in the same way. But reports with print(0) or print("") should both run.