1.5 Character Input and Output

We are going to consider a family of related programs for processing character data. You will find that many programs are just expanded versions of the prototypes that we discuss here.

The model of input and output supported by the standard library is very simple. Text input or output, regardless of where it originates or where it goes to, is dealt with as streams of characters. A text stream is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character. It is the responsibility of the library to make each input or output stream conform to this model; the C programmer using the library need not worry about how lines are represented outside the program.

The standard library provides several functions for reading or writing one character at a time, of which getchar and putchar are the simplest. Each time it is called, getchar reads the next input character from a text stream and returns that as its value. That is, after

   c = getchar();

the variable c contains the next character of input. The characters normally come from the keyboard; input from files is discussed in Chapter 7.

The function putchar prints a character each time it is called:

   putchar(c);

prints the contents of the integer variable c as a character, usually on the screen. Calls to putchar and printf may be interleaved; the output will appear in the order in which the calls are made.
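
As a small illustration of the two functions working together, the sketch below copies a stream one character at a time. The helper name copy_stream is an assumption made for this example. It uses getc and putc, the general forms of the two functions: getchar() is equivalent to getc(stdin), and putchar(c) to putc(c, stdout). Passing the streams as arguments is only a convenience, so that the helper can be tried on something other than the keyboard and the screen.

```c
#include <stdio.h>

/* copy_stream: hypothetical helper; copies in to out one character
   at a time and returns the number of characters copied */
long copy_stream(FILE *in, FILE *out)
{
    int c;      /* int rather than char, so that it can also hold EOF */
    long n;

    n = 0;
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        ++n;
    }
    return n;
}
```

Calling copy_stream(stdin, stdout) would copy standard input to standard output.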

1.5.2 Character Counting

The next program counts characters; it is similar to the copy program.

   #include <stdio.h>

   /* count characters in input; 1st version */
   main()
   {
       long nc;

       nc = 0;
       while (getchar() != EOF)
           ++nc;
       printf("%ld\n", nc);
   }

The statement

   ++nc;

presents a new operator, ++, which means increment by one. You could instead write nc = nc + 1 but ++nc is more concise and often more efficient. There is a corresponding operator -- to decrement by 1. The operators ++ and -- can be either prefix operators (++nc) or postfix operators (nc++); these two forms have different values in expressions, as will be shown in Chapter 2, but ++nc and nc++ both increment nc. For the moment we will stick to the prefix form.
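
The difference between the two forms shows up only when the value of the expression is used. The following sketch records those values; the function names prefix_value and postfix_value are assumptions made for illustration, and neither is part of the counting program.

```c
#include <stdio.h>

/* prefix_value: hypothetical illustration; n is incremented first,
   and the incremented value is the value of the expression */
int prefix_value(int n)
{
    return ++n;
}

/* postfix_value: hypothetical illustration; the value of the expression
   is the old value of n, which is incremented afterwards */
int postfix_value(int n)
{
    return n++;
}
```

With an argument of 5, prefix_value returns 6 and postfix_value returns 5.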

The character counting program accumulates its count in a long variable instead of an int. long integers are at least 32 bits. Although on some machines, int and long are the same size, on others an int is 16 bits, with a maximum value of 32767, and it would take relatively little input to overflow an int counter. The conversion specification %ld tells printf that the corresponding argument is a long integer.
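
The actual ranges on a given machine can be checked with the constants INT_MIN, INT_MAX and LONG_MAX from the standard header <limits.h>. The sketch below is one way to do so; the function names fits_in_int and print_limits are assumptions made for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* fits_in_int: hypothetical helper; returns 1 if the count n
   could be held in an int on this machine, 0 otherwise */
int fits_in_int(long n)
{
    return n >= INT_MIN && n <= INT_MAX;
}

/* print_limits: hypothetical helper; prints the largest int and long */
void print_limits(void)
{
    printf("INT_MAX  = %d\n", INT_MAX);
    printf("LONG_MAX = %ld\n", LONG_MAX);
}
```

The values printed depend on the machine, but INT_MAX is at least 32767 and LONG_MAX is at least 2147483647.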

It may be possible to cope with even bigger numbers by using a double (double precision float). We will also use a for statement instead of a while, to illustrate another way to write the loop.

   #include <stdio.h>

   /* count characters in input; 2nd version */
   main()
   {
       double nc;

       for (nc = 0; getchar() != EOF; ++nc)
           ;
       printf("%.0f\n", nc);
   }

printf uses %f for both float and double; %.0f suppresses the printing of the decimal point and the fraction part, which is zero.
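
To compare the two conversions without reading any input, the same formats can be applied with sprintf, which writes into a character array instead of printing. This is only a sketch; the helper names format_plain and format_count are hypothetical, and buf is assumed to be large enough for the result.

```c
#include <stdio.h>

/* format_plain: hypothetical helper; formats nc with %f, which prints
   six digits after the decimal point by default */
int format_plain(char *buf, double nc)
{
    return sprintf(buf, "%f", nc);
}

/* format_count: hypothetical helper; formats nc with %.0f, so no
   decimal point or fraction digits are produced */
int format_count(char *buf, double nc)
{
    return sprintf(buf, "%.0f", nc);
}
```

For a count of 12, the first helper produces 12.000000 and the second produces 12. Each returns the number of characters written.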

The body of this for loop is empty, because all the work is done in the test and increment parts. But the grammatical rules of C require that a for statement have a body. The isolated semicolon, called a null statement, is there to satisfy that requirement. We put it on a separate line to make it visible.

Before we leave the character counting program, observe that if the input contains no characters, the while or for test fails on the very first call to getchar, and the program produces zero, the right answer. This is important. One of the nice things about while and for is that they test at the top of the loop, before proceeding with the body. If there is nothing to do, nothing is done, even if that means never going through the loop body. Programs should act intelligently when given zero-length input. The while and for statements help ensure that programs do reasonable things with boundary conditions.
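
The boundary case is easy to try if the counting loop is wrapped in a function that takes the stream as an argument. The sketch below does this under that assumption; count_chars is a hypothetical name, and getc(in) plays the role that getchar() plays for standard input.

```c
#include <stdio.h>

/* count_chars: hypothetical helper; counts the characters remaining on in */
long count_chars(FILE *in)
{
    long nc;

    nc = 0;
    while (getc(in) != EOF)     /* tested before the body ever runs */
        ++nc;
    return nc;
}
```

Given an empty stream, the test fails on the first call to getc, the body is never executed, and the function returns 0.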

1.5.3 Line Counting

The next program counts input lines. As we mentioned above, the standard library ensures that an input text stream appears as a sequence of lines, each terminated by a newline. Hence, counting lines is just counting newlines:

   #include <stdio.h>

   /* count lines in input */
   main()
   {
       int c, nl;

       nl = 0;
       while ((c = getchar()) != EOF)
           if (c == '\n')
               ++nl;
       printf("%d\n", nl);
   }

The body of the while now consists of an if, which in turn controls the increment ++nl. The if statement tests the parenthesized condition, and if the condition is true, executes the statement (or group of statements in braces) that follows. We have again indented to show what is controlled by what.

The double equals sign == is the C notation for "is equal to" (like Pascal's single = or Fortran's .EQ.). This symbol is used to distinguish the equality test from the single = that C uses for assignment. A word of caution: newcomers to C occasionally write = when they mean ==. As we will see in Chapter 2, the result is usually a legal expression, so you will get no warning.
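
A sketch of the difference, using two hypothetical functions: the first makes the intended comparison, the second shows what happens when = is written by mistake. The extra pair of parentheses in the second only marks the assignment as deliberate for this demonstration, since some compilers can optionally warn about the pattern.

```c
#include <stdio.h>

/* is_newline: the intended test; returns 1 only if c is a newline */
int is_newline(int c)
{
    return c == '\n';
}

/* mistaken_test: hypothetical example of the error; the assignment
   overwrites c, and its value, '\n', is nonzero, so the test
   succeeds whatever c was */
int mistaken_test(int c)
{
    if ((c = '\n'))
        return 1;
    return 0;
}
```

For the argument 'a', is_newline returns 0 while mistaken_test returns 1.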

A character written between single quotes represents an integer value equal to the numerical value of the character in the machine's character set. This is called a character constant, although it is just another way to write a small integer. So, for example, 'A' is a character constant; in the ASCII character set its value is 65, the internal representation of the character A. Of course, 'A' is to be preferred over 65: its meaning is obvious, and it is independent of a particular character set.

The escape sequences used in string constants are also legal in character constants, so '\n' stands for the value of the newline character, which is 10 in ASCII. You should note carefully that '\n' is a single character, and in expressions is just an integer; on the other hand, "\n" is a string constant that happens to contain only one character. The topic of strings versus characters is discussed further in Chapter 2.
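
The distinction can be checked directly. The following sketch uses a hypothetical function, char_vs_string, together with strlen from <string.h>. It also relies on the fact that a string constant is stored with a terminating '\0' character, a detail not yet covered at this point in the text.

```c
#include <stdio.h>
#include <string.h>

/* char_vs_string: hypothetical illustration; returns 1 if all three
   observations about '\n' and "\n" hold */
int char_vs_string(void)
{
    int c;

    c = '\n';                   /* a character constant is just an integer */
    return strlen("\n") == 1    /* the string holds one character ... */
        && sizeof "\n" == 2     /* ... plus a terminating '\0' in storage */
        && "\n"[0] == c;        /* and that one character is the newline */
}
```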

Exercise 1-8. Write a program to count blanks, tabs, and newlines.

Exercise 1-9. Write a program to copy its input to its output, replacing each string of one or more blanks by a single blank.

Exercise 1-10. Write a program to copy its input to its output, replacing each tab by \t, each backspace by \b, and each backslash by \\. This makes tabs and backspaces visible in an unambiguous way.

1.5.4 Word Counting

The fourth in our series of useful programs counts lines, words, and characters, with the loose definition that a word is any sequence of characters that does not contain a blank, tab or newline. This is a bare-bones version of the UNIX program wc.

   #include <stdio.h>

   #define IN   1  /* inside a word */
   #define OUT  0  /* outside a word */

   /* count lines, words, and characters in input */
   main()
   {
       int c, nl, nw, nc, state;

       state = OUT;
       nl = nw = nc = 0;
       while ((c = getchar()) != EOF) {
           ++nc;
           if (c == '\n')
               ++nl;
           if (c == ' ' || c == '\n' || c == '\t')
               state = OUT;
           else if (state == OUT) {
               state = IN;
               ++nw;
           }
       }
       printf("%d %d %d\n", nl, nw, nc);
   }

Every time the program encounters the first character of a word, it counts one more word. The variable state records whether the program is currently in a word or not; initially it is "not in a word", which is assigned the value OUT. We prefer the symbolic constants IN and OUT to the literal values 1 and 0 because they make the program more readable. In a program as tiny as this, it makes little difference, but in larger programs, the increase in clarity is well worth the modest extra effort to write it this way from the beginning. You'll also find that it's easier to make extensive changes in programs where magic numbers appear only as symbolic constants.
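
To watch the state variable at work on a known input, the word-counting part of the logic can be wrapped in a function that reads from a stream passed as an argument. This is a sketch under that assumption, not a replacement for the program; count_words is a hypothetical name.

```c
#include <stdio.h>

#define IN   1  /* inside a word */
#define OUT  0  /* outside a word */

/* count_words: hypothetical helper; returns the number of words on in */
int count_words(FILE *in)
{
    int c, nw, state;

    state = OUT;
    nw = 0;
    while ((c = getc(in)) != EOF) {
        if (c == ' ' || c == '\n' || c == '\t')
            state = OUT;
        else if (state == OUT) {
            state = IN;     /* first character of a new word */
            ++nw;
        }
    }
    return nw;
}
```

The count goes up only on a change from OUT to IN, so a run of several blanks between two words does not produce extra words.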

The line

   nl = nw = nc = 0;

sets all three variables to zero. This is not a special case, but a consequence of the fact that an assignment is an expression with a value and that assignments associate from right to left. It's as if we had written

   nl = (nw = (nc = 0));

The operator || means OR, so the line

   if (c == ' ' || c == '\n' || c == '\t')

says "if c is a blank or c is a newline or c is a tab". (Recall that the escape sequence \t is a visible representation of the tab character.) There is a corresponding operator && for AND; its precedence is just higher than ||. Expressions connected by && or || are evaluated left to right, and it is guaranteed that evaluation will stop as soon as the truth or falsehood is known. If c is a blank, there is no need to test whether it is a newline or tab, so these tests are not made. This isn't particularly important here, but is significant in more complicated situations, as we will soon see.
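
The left-to-right rule can be observed with a function that counts how often it is called. In the sketch below, calls, is_char and is_separator are hypothetical names introduced for illustration; calls is declared outside any function so that both functions can use it.

```c
#include <stdio.h>

int calls = 0;      /* hypothetical counter: number of comparisons made */

/* is_char: returns 1 if c equals target, and records that the test was made */
int is_char(int c, int target)
{
    ++calls;
    return c == target;
}

/* is_separator: the same test as in the word count program,
   made one comparison at a time */
int is_separator(int c)
{
    return is_char(c, ' ') || is_char(c, '\n') || is_char(c, '\t');
}
```

Starting with calls at zero, is_separator(' ') leaves it at 1, because evaluation stops at the first true comparison. Starting again from zero, is_separator('\t') leaves it at 3.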

The example also shows an else, which specifies an alternative action if the condition part of an if statement is false. The general form is

   if (expression)
       statement1
   else
       statement2

One and only one of the two statements associated with an if-else is performed. If the expression is true, statement1 is executed; if not, statement2 is executed. Each statement can be a single statement or several in braces. In the word count program, the one after the else is an if that controls two statements in braces.

Exercise 1-11. How would you test the word count program? What kinds of input are most likely to uncover bugs if there are any?

Exercise 1-12. Write a program that prints its input one word per line.

ch3n2k.com | Copyright (c) 2004-2020 czk.