Differences between revisions 4 and 5
Revision 4 as of 2007-07-18 08:53:13
Size: 16099
Editor: czk
Comment:
Revision 5 as of 2007-07-18 12:04:30
Size: 17912
Editor: czk
Comment:

1.5 Character Input and Output

We are going to consider a family of related programs for processing character data. You will find that many programs are just expanded versions of the prototypes that we discuss here.

The model of input and output supported by the standard library is very simple. Text input or output, regardless of where it originates or where it goes to, is dealt with as streams of characters. A text stream is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character. It is the responsibility of the library to make each input or output stream conform to this model; the C programmer using the library need not worry about how lines are represented outside the program.

The standard library provides several functions for reading or writing one character at a time, of which getchar and putchar are the simplest. Each time it is called, getchar reads the next input character from a text stream and returns that as its value. That is, after

   c = getchar();

the variable c contains the next character of input. The characters normally come from the keyboard; input from files is discussed in Chapter 7.

The function putchar prints a character each time it is called:

   putchar(c);

prints the contents of the integer variable c as a character, usually on the screen. Calls to putchar and printf may be interleaved; the output will appear in the order in which the calls are made.

1. 1.5.1 File Copying

Given getchar and putchar, you can write a surprising amount of useful code without knowing anything more about input and output. The simplest example is a program that copies its input to its output one character at a time:

read a character
while (character is not end-of-file indicator)
    output the character just read
    read a character

Converting this into C gives:

   #include <stdio.h>

   /* copy input to output; 1st version  */
   main()
   {
       int c;

       c = getchar();
       while (c != EOF) {
           putchar(c);
           c = getchar();
       }
   }

The relational operator != means "not equal to".

What appears to be a character on the keyboard or screen is of course, like everything else, stored internally just as a bit pattern. The type char is specifically meant for storing such character data, but any integer type can be used. We used int for a subtle but important reason.

The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file". We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int.

EOF is an integer defined in <stdio.h>, but the specific numeric value doesn't matter as long as it is not the same as any char value. By using the symbolic constant, we are assured that nothing in the program depends on the specific numeric value.
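
The reason for int can be made concrete with a small sketch. The two helper names below are hypothetical, and the result of squeezing a value such as 0xFF into a plain char is implementation-defined, so the second helper may answer differently from one machine to another.

```c
#include <stdio.h>

/* Hypothetical helper: what an int variable sees. A data byte is
   non-negative and EOF is negative, so the two never collide. */
int is_eof_as_int(int byte)
{
    return byte == EOF;
}

/* Hypothetical helper: what a char variable sees after the value
   has been narrowed, as `char c = getchar();` would narrow it. */
int is_eof_as_char(int byte)
{
    char c = (char)byte;    /* implementation-defined if it does not fit */

    return c == EOF;
}
```

is_eof_as_int(0xFF) is 0 everywhere. Where plain char is signed and EOF is -1 (common, but not guaranteed), is_eof_as_char(0xFF) is typically 1, so an ordinary data byte would end the copy loop early. Where plain char is unsigned, the comparison with EOF can never succeed and the loop would never end.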

The program for copying would be written more concisely by experienced C programmers. In C, any assignment, such as

   c = getchar();

is an expression and has a value, which is the value of the left hand side after the assignment. This means that an assignment can appear as part of a larger expression. If the assignment of a character to c is put inside the test part of a while loop, the copy program can be written this way:

   #include <stdio.h>

   /* copy input to output; 2nd version  */
   main()
   {
       int c;

       while ((c = getchar()) != EOF)
           putchar(c);
   }

The while gets a character, assigns it to c, and then tests whether the character was the end-of-file signal. If it was not, the body of the while is executed, printing the character. The while then repeats. When the end of the input is finally reached, the while terminates and so does main.

This version centralizes the input - there is now only one reference to getchar - and shrinks the program. The resulting program is more compact, and, once the idiom is mastered, easier to read. You'll see this style often. (It's possible to get carried away and create impenetrable code, however, a tendency that we will try to curb.)

The parentheses around the assignment within the condition are necessary. The precedence of != is higher than that of =, which means that in the absence of parentheses the relational test != would be done before the assignment =. So the statement

   c = getchar() != EOF

is equivalent to

   c = (getchar() != EOF)

This has the undesired effect of setting c to 0 or 1, depending on whether or not the call of getchar returned end of file. (More on this in Chapter 2.)
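
The difference can be observed directly. In the sketch below, next_char is a hypothetical stand-in for getchar that always delivers the character 'x', so that the result does not depend on what is typed; the other two function names are likewise only for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): always "reads" the character 'x'. */
static int next_char(void)
{
    return 'x';
}

/* Parenthesized: the assignment is done first, so c keeps the character. */
int with_parentheses(void)
{
    int c;

    if ((c = next_char()) != EOF)
        return c;
    return EOF;
}

/* Unparenthesized: != is done first, so c receives 0 or 1. */
int without_parentheses(void)
{
    int c;

    c = next_char() != EOF;
    return c;
}
```

with_parentheses() returns the character itself, while without_parentheses() returns 1, the result of the comparison.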

Exercise 1-6. Verify that the expression getchar() != EOF is 0 or 1.

Exercise 1-7. Write a program to print the value of EOF.

2. 1.5.2 Character Counting

The next program counts characters; it is similar to the copy program.

   #include <stdio.h>

   /* count characters in input; 1st version */
   main()
   {
       long nc;

       nc = 0;
       while (getchar() != EOF)
           ++nc;
       printf("%ld\n", nc);
   }

The statement

   ++nc;

presents a new operator, ++, which means increment by one. You could instead write nc = nc + 1 but ++nc is more concise and often more efficient. There is a corresponding operator -- to decrement by 1. The operators ++ and -- can be either prefix operators (++nc) or postfix operators (nc++); these two forms have different values in expressions, as will be shown in Chapter 2, but ++nc and nc++ both increment nc. For the moment we will stick to the prefix form.
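
As a brief preview of that difference, here is a minimal sketch. The helper names are hypothetical, and the helpers take a pointer (a feature not yet introduced at this point) only so that the caller can see both the updated variable and the value of the expression.

```c
/* Increment *n, returning the value of the prefix expression ++*n. */
int prefix_value(int *n)
{
    return ++*n;        /* the new value */
}

/* Increment *n, returning the value of the postfix expression (*n)++. */
int postfix_value(int *n)
{
    return (*n)++;      /* the old value */
}
```

Starting from a variable holding 5, prefix_value yields 6 and postfix_value yields 5; in both cases the variable holds 6 afterwards.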

The character counting program accumulates its count in a long variable instead of an int. long integers are at least 32 bits. Although on some machines, int and long are the same size, on others an int is 16 bits, with a maximum value of 32767, and it would take relatively little input to overflow an int counter. The conversion specification %ld tells printf that the corresponding argument is a long integer.

It may be possible to cope with even bigger numbers by using a double (double precision float). We will also use a for statement instead of a while, to illustrate another way to write the loop.

   #include <stdio.h>

   /* count characters in input; 2nd version */
   main()
   {
       double nc;

       for (nc = 0; getchar() != EOF; ++nc)
           ;
       printf("%.0f\n", nc);
   }

printf uses %f for both float and double; %.0f suppresses the printing of the decimal point and the fraction part, which is zero.

The body of this for loop is empty, because all the work is done in the test and increment parts. But the grammatical rules of C require that a for statement have a body. The isolated semicolon, called a null statement, is there to satisfy that requirement. We put it on a separate line to make it visible.

Before we leave the character counting program, observe that if the input contains no characters, the while or for test fails on the very first call to getchar, and the program produces zero, the right answer. This is important. One of the nice things about while and for is that they test at the top of the loop, before proceeding with the body. If there is nothing to do, nothing is done, even if that means never going through the loop body. Programs should act intelligently when given zero-length input. The while and for statements help ensure that programs do reasonable things with boundary conditions.
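
One way to check the boundary case is to put the counting loop in a function that reads from a stream handed to it. This is only a sketch: count_chars is a hypothetical name, and it uses FILE and getc from <stdio.h>, which belong to the file input discussed in Chapter 7 rather than to this section.

```c
#include <stdio.h>

/* Count the characters remaining on the stream in. */
long count_chars(FILE *in)
{
    long nc;

    nc = 0;
    while (getc(in) != EOF)     /* tested before the body ever runs */
        ++nc;
    return nc;
}
```

Given an empty stream, the test fails on the first call and the function returns 0 without executing the loop body. A complete program would print count_chars(stdin).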

3. 1.5.3 Line Counting

The next program counts input lines. As we mentioned above, the standard library ensures that an input text stream appears as a sequence of lines, each terminated by a newline. Hence, counting lines is just counting newlines:

   #include <stdio.h>

   /* count lines in input */
   main()
   {
       int c, nl;

       nl = 0;
       while ((c = getchar()) != EOF)
           if (c == '\n')
               ++nl;
       printf("%d\n", nl);
   }

The body of the while now consists of an if, which in turn controls the increment ++nl. The if statement tests the parenthesized condition, and if the condition is true, executes the statement (or group of statements in braces) that follows. We have again indented to show what is controlled by what.

The double equals sign == is the C notation for "is equal to" (like Pascal's single = or Fortran's .EQ.). This symbol is used to distinguish the equality test from the single = that C uses for assignment. A word of caution: newcomers to C occasionally write = when they mean ==. As we will see in Chapter 2, the result is usually a legal expression, so you will get no warning.
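
The slip is easy to reproduce. In this sketch both function names are hypothetical; the second contains the mistake deliberately, and some current compilers may flag it if asked to.

```c
/* Intended test: true only when c is a newline. */
int is_newline(int c)
{
    return c == '\n';
}

/* The slip: = stores '\n' in c, and the value of that assignment,
   which is nonzero, is what the if tests. Every character "matches". */
int is_newline_buggy(int c)
{
    if (c = '\n')
        return 1;
    return 0;
}
```

is_newline('a') is 0, as intended, but is_newline_buggy('a') is 1.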

A character written between single quotes represents an integer value equal to the numerical value of the character in the machine's character set. This is called a character constant, although it is just another way to write a small integer. So, for example, 'A' is a character constant; in the ASCII character set its value is 65, the internal representation of the character A. Of course, 'A' is to be preferred over 65: its meaning is obvious, and it is independent of a particular character set.

The escape sequences used in string constants are also legal in character constants, so '\n' stands for the value of the newline character, which is 10 in ASCII. You should note carefully that '\n' is a single character, and in expressions is just an integer; on the other hand, "\n" is a string constant that happens to contain only one character. The topic of strings versus characters is discussed further in Chapter 2.
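
A short sketch of the distinction follows. The function names are hypothetical, the value 65 assumes the ASCII character set, and the sizeof operator and the terminating '\0' of a string constant are details that the book takes up later.

```c
#include <stddef.h>

/* A character constant is just a small integer: 65 in ASCII. */
int char_constant_value(void)
{
    return 'A';
}

/* A string constant is an array: its one visible character
   plus a terminating '\0', so two bytes in all. */
size_t one_char_string_size(void)
{
    return sizeof "\n";
}
```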

Exercise 1-8. Write a program to count blanks, tabs, and newlines.

Exercise 1-9. Write a program to copy its input to its output, replacing each string of one or more blanks by a single blank.

Exercise 1-10. Write a program to copy its input to its output, replacing each tab by \t, each backspace by \b, and each backslash by \\. This makes tabs and backspaces visible in an unambiguous way.

4. 1.5.4 Word Counting

The fourth in our series of useful programs counts lines, words, and characters, with the loose definition that a word is any sequence of characters that does not contain a blank, tab or newline. This is a bare-bones version of the UNIX program wc.

   #include <stdio.h>

   #define IN   1  /* inside a word */
   #define OUT  0  /* outside a word */

   /* count lines, words, and characters in input */
   main()
   {
       int c, nl, nw, nc, state;

       state = OUT;
       nl = nw = nc = 0;
       while ((c = getchar()) != EOF) {
           ++nc;
           if (c == '\n')
               ++nl;
           if (c == ' ' || c == '\n' || c == '\t')
               state = OUT;
           else if (state == OUT) {
               state = IN;
               ++nw;
           }
       }
       printf("%d %d %d\n", nl, nw, nc);
   }

Every time the program encounters the first character of a word, it counts one more word. The variable state records whether the program is currently in a word or not; initially it is "not in a word", which is assigned the value OUT. We prefer the symbolic constants IN and OUT to the literal values 1 and 0 because they make the program more readable. In a program as tiny as this, it makes little difference, but in larger programs, the increase in clarity is well worth the modest extra effort to write it this way from the beginning. You'll also find that it's easier to make extensive changes in programs where magic numbers appear only as symbolic constants.
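
To watch the state variable at work on known inputs, the same logic can be applied to a string held in memory instead of to getchar. This is a sketch only: count_words is a hypothetical name, and walking a string with a pointer relies on features introduced in later chapters.

```c
#define IN   1  /* inside a word */
#define OUT  0  /* outside a word */

/* Count the words in the string s with the same IN/OUT state logic. */
int count_words(const char *s)
{
    int nw, state;

    nw = 0;
    state = OUT;
    for (; *s != '\0'; ++s) {
        if (*s == ' ' || *s == '\n' || *s == '\t')
            state = OUT;
        else if (state == OUT) {    /* first character of a word */
            state = IN;
            ++nw;
        }
    }
    return nw;
}
```

For example, count_words("hello world") is 2, and a string made up only of blanks, tabs, and newlines gives 0.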

The line

   nl = nw = nc = 0;

sets all three variables to zero. This is not a special case, but a consequence of the fact that an assignment is an expression with a value and assignments associate from right to left. It's as if we had written

   nl = (nw = (nc = 0));

The operator || means OR, so the line

   if (c == ' ' || c == '\n' || c == '\t')

says "if c is a blank or c is a newline or c is a tab". (Recall that the escape sequence \t is a visible representation of the tab character.) There is a corresponding operator && for AND; its precedence is just higher than ||. Expressions connected by && or || are evaluated left to right, and it is guaranteed that evaluation will stop as soon as the truth or falsehood is known. If c is a blank, there is no need to test whether it is a newline or tab, so these tests are not made. This isn't particularly important here, but is significant in more complicated situations, as we will soon see.
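
The early stop can be made visible by counting how many comparisons actually run. In this sketch, is_char, comparisons_for, and the counter tests_made are all hypothetical names introduced for the demonstration.

```c
static int tests_made;      /* how many comparisons actually ran */

/* Compare c with target, recording that the comparison took place. */
static int is_char(int c, int target)
{
    ++tests_made;
    return c == target;
}

/* Return the number of comparisons evaluated for c in the
   blank / newline / tab test. */
int comparisons_for(int c)
{
    tests_made = 0;
    (void)(is_char(c, ' ') || is_char(c, '\n') || is_char(c, '\t'));
    return tests_made;
}
```

A blank needs one comparison, a newline two, and a tab or any other character all three.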

The example also shows an else, which specifies an alternative action if the condition part of an if statement is false. The general form is

   if (expression)
       statement1
   else
       statement2

One and only one of the two statements associated with an if-else is performed. If the expression is true, statement1 is executed; if not, statement2 is executed. Each statement can be a single statement or several in braces. In the word count program, the one after the else is an if that controls two statements in braces.

Exercise 1-11. How would you test the word count program? What kinds of input are most likely to uncover bugs if there are any?

Exercise 1-12. Write a program that prints its input one word per line.

TCPL/1.05_Character_Input_and_Output (last edited 2008-02-23 15:34:13 by localhost)

ch3n2k.com | Copyright (c) 2004-2020 czk.