
To appear in the Proceedings of the 1996 International Geoscience and Remote Sensing Symposium: Remote Sensing for a Sustainable Future, Washington: IEEE Press, 1996.

Evolving Feature-Extraction Algorithms: Adapting Genetic Programming for Image Analysis in Geoscience and Remote Sensing

Jason M. Daida*, Tommaso F. Bersano-Begey*, Steven J. Ross*, and John F. Vesecky**

*The University of Michigan, Artificial Intelligence Laboratory and the Space Physics Research Laboratory

2455 Hayward Avenue, Ann Arbor, Michigan 48109-2143

(313) 747-4581 FAX (313) 764-5137 EMAIL: daida@umich.edu

**The University of Michigan, Department of Atmospheric, Oceanic and Space Sciences

2455 Hayward Avenue, Ann Arbor, Michigan 48109-2143

Abstract—This paper discusses a relatively new procedure in the computer-assisted design of pattern-extraction algorithms. The procedure involves the adaptation of genetic programming, a recent technique that has been used for automatic programming, for image processing and analysis. This paper summarizes several of the measures we have taken to develop two prototype systems that help a user to design pattern-extraction algorithms.

1. INTRODUCTION

The task of extracting patterns from remotely sensed imagery can be difficult, particularly if increasingly novel patterns need extraction or when using data from new imaging technologies. In such cases, off-the-shelf software—e.g., geographic image processing systems or image analysis software libraries—may be able to meet part, but not all, of the specifications for the pattern-extraction problem at hand. To fulfill such specifications, the best software tool may end up being code that has been designed specifically for that problem.

To assist in the development of problem-specific pattern-extraction tools, a range of options has existed from which one could choose. On one hand, one could program such tools from scratch, usually in a high-level programming language. On the other hand, one could program in an even higher-level macro-language (e.g., many geographic image processing systems and scientific visualization systems feature a macro- or scripting-language capability). In either case, the success of custom-tailored pattern-extraction tools rests not only on one’s ability to design a procedure with which to extract patterns, but also on one’s ability to program.

The overhead associated with programming can be substantial and often includes such tasks as specifying software, programming, debugging, and testing, as well as contending with the learning curves associated with new languages, new software libraries, or new programming tools. Reducing this overhead is a desirable objective, since programming often represents only a means (albeit a time-consuming one) for delivering what is needed—namely, a tool for extracting a particular pattern.

In part to address this objective, previous work has included investigations of strong and weak methods in artificial intelligence. The terms “strong” and “weak” refer not to a measure of a method’s performance, but to the amount of knowledge about a given problem that a method requires. An example of a “strong” method is an expert system, which consists of a substantial portion of code that is problem-specific and a smaller portion of code that is generic. An example of a weak method is a neural network, which consists of a substantial portion of code that is largely generic, and a smaller part that is problem-specific. Either of these methods can still involve a fair amount of programming, although one could argue that these methods can reduce programming overhead. (In particular, by making programs more adaptable, more robust, and more intelligent, less programming would be required on the part of a user.)

In the past several years, there have been new techniques in weak methods that show additional promise in reducing programming overhead. This paper describes one of these techniques and highlights how this particular technique can be used for creating problem-specific pattern-extraction algorithms for use with remotely sensed image data.

2. GENETIC PROGRAMMING

This paper specifically discusses our use of genetic programming, an unsupervised technique that generates computer code. Originally introduced in 1989 by Koza [5], genetic programming has since been applied in a variety of domains, ranging from molecular biology to robotics to economics. Genetic programming has been shown to solve a number of problems that have served as benchmarks for neural nets [4, 6]. It has also been shown to solve problems in which neural nets might not be the most appropriate technology to use (e.g., symbolic regression) [6].

Genetic programming does have several attributes that potentially justify its use with remotely sensed image data. First, it has been shown to work with imperfect or incomplete problem specifications, while still yielding reasonable solutions [2]. Second, it has produced extremely robust classification code with accuracies comparable to those obtained with manually produced algorithms [4]. Third, it has been able to work with linear or nonlinear problems with little or no change to its core (generic) portion [6, 7]. Fourth, it has been shown to work with a wide range of data types, including multi-channel image data [2]. Fifth, it has been able to output several types of code, ranging from macros (which require some type of interpreter) [6] to assembler [9]. Sixth, it has been used to automatically produce code with features such as subroutines, registers, and iterative loops [8]. Seventh, it has been routinely used to manipulate both symbolic and numeric data [6, 7].

Basic genetic programming can be briefly described as follows. A genetic programming run starts with a set of randomly created programs that have been generated from the components that a user has supplied. We presume that somewhere in this random collection of programs are the building blocks necessary for the desired solution. The trick, of course, is to sift through this random code in order to synthesize a desired solution from these building blocks. To implement this trick, genetic programming uses two operations that are modeled after biological processes: natural selection and genetic crossover. In this case, natural selection means that only the most fit individuals reproduce and have offspring. In terms of an operator, natural selection means that these random programs need to be ranked by performance. Ranking is accomplished by means of a fitness function, which has been specified by a user before a run. A fitness function is supplied as a subroutine that tests how close a program comes to a known result given a known set of inputs. Reproduction and the bearing of offspring refer to the biological process by which offspring are the genetic composite of both parents—i.e., through genetic crossover. In terms of an operator, crossover means that a portion of code from one program is replaced with a portion of code from another program. The resulting composite program is analogous to an “offspring.” Crossover occurs mostly among the fittest programs (the pairing of prospective parents is stochastic, but probabilities favor the fittest programs) and continues until a new population of offspring is attained. The operations corresponding to natural selection and crossover are then repeated for this new population. A genetic programming run continues for subsequent populations until a candidate program obtains the best score allowable under the user-supplied fitness function. Detailed descriptions of genetic programming can be found in [6, 7].
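
A minimal sketch of this cycle is shown below. The sketch is written in Python rather than in the LISP or C kernels discussed later, and its toy symbolic-regression fitness function, component sets, and identifiers are illustrative assumptions, not part of the systems described in this paper.

import random, copy

FUNCS = ['+', '-', '*']          # user-supplied function set
TERMS = ['x', 1.0, 2.0]          # user-supplied terminal set

def random_tree(depth=3):
    # Grow a random program from the user-supplied components.
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    return [random.choice(FUNCS), random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(tree, x):
    # Execute a candidate program on one input value.
    if tree == 'x':
        return x
    if not isinstance(tree, list):
        return tree
    op, a, b = tree
    a, b = evaluate(a, x), evaluate(b, x)
    return a + b if op == '+' else a - b if op == '-' else a * b

def fitness(tree, cases):
    # User-supplied fitness: total error against known (input, output) test points.
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def subtrees(tree, path=()):
    # Enumerate (path, subtree) pairs; used to pick crossover points.
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_subtree(tree, path, new):
    # Return a copy of `tree` with the subtree at `path` replaced by `new`.
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(new)
    return tree

def crossover(a, b):
    # Replace a random subtree of parent a with a random subtree of parent b.
    path_a, _ = random.choice(list(subtrees(a)))
    _, subtree_b = random.choice(list(subtrees(b)))
    return replace_subtree(a, path_a, subtree_b)

def run(cases, pop_size=200, generations=30):
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, cases))   # rank by fitness
        if fitness(population[0], cases) == 0:              # best allowable score reached
            break
        parents = population[:pop_size // 2]                # selection favors the fittest
        offspring = [crossover(random.choice(parents), random.choice(parents))
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return min(population, key=lambda t: fitness(t, cases))

# Example: evolve a program matching y = x*x + 1 at a few known test points.
best_program = run([(x, x * x + 1) for x in range(-3, 4)])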

3. PROBLEMS

In spite of its apparent advantages and benefits, genetic programming has only recently appeared in the geoscience and remote sensing literature. Part of this relatively late appearance can be traced to two difficulties. One difficulty involves computational overhead. In Section 2, we noted that basic genetic programming involves the creation and testing of many individual candidate programs. A typical genetic programming run may involve the creation and testing of several thousand such candidate programs. For many problems previously addressed by genetic programming, testing each candidate solution is relatively quick and inexpensive (e.g., each program uses several tens of kilobytes of memory and executes in under a minute). For problems involving remotely sensed imagery, candidate solutions could easily consume large blocks of memory and minutes to hours of CPU time.

The other difficulty is that the canonical genetic programming system, which has been freely available for some time, is written in LISP. (The code for the canonical genetic programming system can be obtained at ftp://…/pub/genetic-programming.) While LISP is arguably the most intuitive language in which to do genetic programming, LISP has a few idiosyncrasies that can hinder the processing of remotely sensed imagery.

4. ADAPTATIONS

This section summarizes our experiences to date in adapting genetic programming for processing remotely sensed imagery [1, 2, 3]. Our particular experience involves the extraction of low-contrast ridge and rubble patterns in low-resolution ERS synthetic aperture radar images of arctic sea ice. (See [3] in these conference proceedings.) Our problem has been relatively typical of those that involve multiple channels of image data (in our case, several textural channels).

To solve our problem, we have implemented six adaptations to genetic programming. The first three have helped in reducing computation time (for the LISP version of our system) to several CPU hours on a Sun SPARCstation 20 workstation. A fourth adaptation has enabled the LISP version of our system to process megabyte images. A fifth adaptation has helped to further reduce computation time to tens of minutes. A sixth adaptation has proven necessary when using our methods for directly processing images with genetic programming.

4.1 Preprocessing

We preprocessed as much of the data as was possible to reduce genetic programming run time. For our problem, this has meant that the texture measures were computed beforehand. In particular, our problem was one in which it was possible to use texture channels, where each channel represents a different filtered version of the image data to be processed. (Two of the channels we specified corresponded to layers in an image pyramid: mean images filtered with kernel sizes of 3×3 and 5×5, respectively. The other two channels corresponded to edge detection: a Laplacian image of kernel size 5×5 and a Laplacian, 5×5, of a mean, 3×3, image.) In a sense, what was left for genetic programming to do was to formulate an algorithm (a rule set) that governs how the data in each channel was to be combined with the others.

Note that a complete algorithm produced by genetic programming would then consist not only of a rule set, but would also include the subroutines implicit with each filtered channel. While this may seem obvious, there may be unintentional side effects if one simply converted the algorithm produced by genetic programming to a standalone application and then used a different software implementation to generate each filtered channel.
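
A minimal sketch of this kind of preprocessing is given below. It is written in Python using SciPy’s standard filters and an assumed 5×5 Laplacian kernel; it illustrates the channel layout described above, not the filter implementations used in our system.

import numpy as np
from scipy.ndimage import uniform_filter, convolve

# An assumed 5x5 Laplacian-style kernel (the paper does not give the kernel used).
LAPLACIAN_5X5 = np.array([[ 0,  0, -1,  0,  0],
                          [ 0, -1, -2, -1,  0],
                          [-1, -2, 16, -2, -1],
                          [ 0, -1, -2, -1,  0],
                          [ 0,  0, -1,  0,  0]], dtype=float)

def texture_channels(image):
    # Precompute the four filtered channels once, before any genetic programming run.
    image = image.astype(float)
    mean3 = uniform_filter(image, size=3)             # 3x3 mean image (pyramid layer)
    mean5 = uniform_filter(image, size=5)             # 5x5 mean image (pyramid layer)
    lap5 = convolve(image, LAPLACIAN_5X5)             # 5x5 Laplacian edge channel
    lap5_of_mean3 = convolve(mean3, LAPLACIAN_5X5)    # 5x5 Laplacian of the 3x3 mean image
    return np.stack([mean3, mean5, lap5, lap5_of_mean3])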

4.2 Test Points

We used test points in an image that had been manually interpreted to serve as programming benchmarks. Each candidate algorithm produced by genetic programming is executed at these test points to assess that algorithm’s accuracy in extracting a desired pattern (i.e., in the context of a fitness function). The number of test points that needs to be provided does not have to be large. For our problem, the best algorithm that we have obtained for processing full-sized low-resolution ERS SAR data products was developed using only 53 test points.

Note that we used test points, rather than subimage patches, to serve as “training sets” for genetic programming. We have found that the use of subimage patches was too constraining, did not help, and even slowed processing. See [2].
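
A minimal sketch of how such test points can drive a fitness evaluation follows. The names, the pointwise candidate interface, and the simple hit count are illustrative assumptions; the actual scoring used in [1, 2] differs in detail.

def fitness_from_test_points(candidate, channels, test_points):
    # Count how many manually interpreted test points the candidate classifies correctly.
    # `test_points` is a list of ((row, col), label) pairs, e.g. label 1 for ridge/rubble present.
    hits = 0
    for (row, col), label in test_points:
        prediction = candidate(channels[:, row, col])   # candidate sees the channel values at one pixel
        hits += int(prediction == label)
    return hits                                         # e.g., a perfect score over 53 points is 53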

4.3 Dynamic Fitness

Instead of requiring all candidate algorithms to meet a fixed standard, we opted to use a sliding standard. By that we mean that the number of test points needing to be solved at the outset of a genetic programming run is smaller than the number of test points needing to be solved towards the end of that run. We have found that this technique results in better-quality algorithms than requiring genetic programming to solve for the entire test-point set at the outset. See [1, 2] for implementation details.
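
A minimal sketch of one such sliding standard follows. The linear schedule and the names are assumptions made for illustration; they are not the implementation described in [1, 2].

def required_test_points(generation, max_generations, test_points, start_fraction=0.25):
    # Sliding standard: the subset of test points that candidates must solve grows
    # from a fraction of the set at the outset to the full set by the end of the run.
    fraction = start_fraction + (1.0 - start_fraction) * generation / max_generations
    count = max(1, int(round(fraction * len(test_points))))
    return test_points[:count]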

4.4 Chunking

LISP requires a fairly sizable amount of processing overhead, which limits the size of an image that can be processed at any one time. As a workaround intended mostly for LISP systems, we chunked low-resolution ERS SAR image data into smaller subimages, processed the subimages, and then integrated the processed subimages to obtain a whole derived data product. The nature of the operators that we used for genetic programming allowed for seamless integration of subimages.

Our first prototype was built around the canonical genetic programming kernel, which means that our prototype system was implemented in LISP.
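
A minimal sketch of the chunking workaround follows. The tile size, the names, and the assumption that the evolved algorithm can be applied to each subimage independently are illustrative; they are not taken from either prototype.

import numpy as np

def process_in_chunks(image, evolved_algorithm, chunk=512):
    # Apply the evolved algorithm to chunk x chunk subimages and tile the results
    # back into a whole derived data product.
    output = np.empty(image.shape, dtype=float)
    rows, cols = image.shape
    for r in range(0, rows, chunk):
        for c in range(0, cols, chunk):
            tile = image[r:r + chunk, c:c + chunk]
            output[r:r + chunk, c:c + chunk] = evolved_algorithm(tile)
    return output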

4.5 C-Language Port

As well suited to LISP as genetic programming is, several reasons have prompted us to design our second prototype in C: larger array sizes, faster run times, a broader user community, and a wider range of programming styles and tools. We have developed our second prototype around a recently introduced C-language version of genetic programming. (The code for the C-language version of the genetic programming kernel is available at …/GA/software/lil-gp/.) Early tests have shown that our second prototype runs about an order of magnitude faster than our LISP version.

4.6 Scaffolding

More often than not, in cases where custom-designed code is needed, one is not able to start with an exact specification of a pattern. There is often uncertainty about what actually constitutes the desired pattern. This uncertainty can show up as inappropriately interpreted test points, or even as inappropriately chosen test points. Several iterations of trying and interpreting different test points are usually the norm.

For this and other reasons described in [1, 2], we have designed our system with the deliberate intent of involving the user. In a sense, we have designed our system so that genetic programming facilitates the testing of a user’s hypotheses about what a desired pattern should be. The user learns to specify a desired pattern in a consistent manner, while the computer assumes the role of a human programmer who logically extends a user’s specification into code. This cooperative relationship has been referred to in the education and technology literature as scaffolding.

5. CONCLUSIONS

This paper has highlighted some of the adaptations used to incorporate genetic programming in the design of algorithms that extract features from remotely sensed images. In particular, this paper has summarized six adaptations: preprocessing, test points, dynamic fitness, chunking (for LISP versions), C-language port, and scaffolding. Although many applications have employed genetic programming as an unsupervised method, we have integrated genetic programming as part of an interactive system that serves as a tool for computer-assisted algorithm design. The overall system has been developed to help a user to focus more on the problem at hand and less on programming detail.

Additional references on genetic programming include [6, 7]. One can also obtain information and software about genetic programming at the following URLs:

• …/research/genprog/

• ftp://…/pub/genetic-programming

ACKNOWLEDGMENTS

This research has been partially funded with grants from the Office of Naval Research, the Naval Research Laboratory (Stennis) and the Space Physics Research Laboratory (U-M). We gratefully acknowledge the following individuals for their invaluable assistance: I. Kristo, S. Daida, J. Hommes, J. Koza, R. Onstott, R. Riolo, E. Soloway, and A. Wu.

BIBLIOGRAPHY

[1] Daida, J.M., T.F. Bersano-Begey, S.J. Ross, and J.F. Vesecky, “Computer-Assisted Design of Image Analysis Algorithms: Dynamic and Static Fitness Evaluations in a Scaffolded Environment,” Submitted to Genetic Programming ’96.

[2] Daida, J.M., J.D. Hommes, T.F. Bersano-Begey, S.J. Ross, and J.F. Vesecky, “Algorithm Discovery Using the Genetic Programming Paradigm: Extracting Low-Contrast Curvilinear Features from SAR Images of Arctic Ice,” Advances in Genetic Programming II, P. Angeline and K. Kinnear, Jr. (eds.), Cambridge: The MIT Press, 1996. In Press.

[3] Daida, J.M., R.G. Onstott, T.F. Bersano-Begey, S.J. Ross, and J.F. Vesecky, “Ice Roughness Classification and ERS SAR Imagery of Arctic Sea Ice: Evaluation of Feature-Extraction Algorithms by Genetic Programming,” Proceedings of IGARSS ’96, IEEE Press. In Press.

[4] Francone, F.D., P. Nordin, and W. Banzhaf, “Benchmarking the Real-World Generalization Capabilities of an Advanced, Compiling Genetic Programming System Using Sparse Data Sets,” Submitted to Genetic Programming ’96.

[5] Koza, J.R., “Hierarchical Genetic Algorithms Operating on Populations of Computer Programs,” Proceedings of the 11th International Joint Conference on Artificial Intelligence, Volume 1, Morgan Kaufmann, 1989.

[6] Koza, J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection, Cambridge: The MIT Press, 1992.

[7] Koza, J.R., Genetic Programming II: Automatic Discovery of Reusable Programs, Cambridge: The MIT Press, 1994.

[8] Koza, J.R., and D. Andre, “Evolution of Iteration in Genetic Programming,” Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, Cambridge: The MIT Press. In Press.

[9] Nordin, P., “A Compiling Genetic Programming System that Directly Manipulates the Machine Code,” Advances in Genetic Programming, K. Kinnear, Jr. (ed.), Cambridge: The MIT Press, pp. 311–331, 1994.
