
Acrophile: An Automated Acronym Extractor and Server

Leah S. Larkey, Paul Ogilvie, M. Andrew Price

Department of Computer Science

University of Massachusetts

Amherst, MA 01003

Email: {larkey, pogil, maprice}@https://www.wendangku.net/doc/e37501495.html,

Brenden Tamilio

School of Cognitive Science

Hampshire College

Amherst, MA 01002 Email: bat96@https://www.wendangku.net/doc/e37501495.html,

ABSTRACT

We implemented a web server for acronym and abbreviation lookup, containing a collection of acronyms and their expansions gathered from a large number of web pages by a heuristic extraction process. Several different extraction algorithms were evaluated and compared. The corpus resulting from the best algorithm is comparable to a high-quality hand-crafted site, but has the potential to be much more inclusive as data from more web pages are processed.

KEYWORDS: Acronyms, information extraction

INTRODUCTION

Acronyms are everywhere; we read and hear them but rarely think about them, except when we do not know what they mean. Every content domain has its own acronyms and abbreviations. In many of these areas, particularly those that are highly technical or bureaucratic, acronyms occur frequently enough to make it difficult for outsiders to comprehend text.

Many acronym and abbreviation dictionaries are available, both in printed form and on the World Wide Web. Some attempt to be all inclusive, others are specialized for particular domains. There are searchable databases and simple lists. Some general problems in building such collections, or any dictionaries, are getting comprehensive coverage and keeping the collection current. New abbreviations continually come into use. To keep their dictionaries growing, some maintainers allow users to submit new acronyms and definitions. This openness, however, can result in poor-quality data.

Acrophile is an automated system that builds and serves a searchable database of acronyms and abbreviations using information retrieval techniques and heuristic extraction. It was developed and built by students during an NSF REU (Research Experience for Undergraduates) summer program. The current version, available on the web at https://www.wendangku.net/doc/e37501495.html,/ciirdemo/acronym/, contains a set of acronyms and expansions that were extracted from a large static collection of web pages. The system can crawl the web for additional pages, extract additional acronym/expansion pairs, and collect them in a file. Periodically, the database can be rebuilt, incorporating the additional new pairs.

Another important goal of this project was to evaluate the quality of our automatically-built acronym and abbreviation databases. We developed evaluation techniques to compare different extraction algorithms and to compare the quality of our automatically-built databases with manually collected databases.

Our evaluation goals were to test the following hypotheses:

1. It should be possible to use IR techniques and heuristic extraction to collect a set of acronyms and expansions which is at least as good and as comprehensive as carefully constructed manually built lists available on the web.

2. In order to collect as many correctly expanded acronyms as possible from an essentially unlimited corpus like the web, one should choose a strict algorithm that accepts few errors, even at the cost of missing some cases in specific documents. It should be possible to pick up those missed definitions from other contexts by processing more text, and the resulting lists should have higher precision than a similar-sized list produced by a less strict algorithm.

3. We should be able to increase the coverage of our collection more efficiently by searching for acronyms than by processing random pages.

Related Work

Many acronym and abbreviation dictionaries have been compiled and published in books and many lists are available on the web, such as Acronym Finder [1] and the World Wide Web Acronym and Abbreviation Server (WWWAAS) [17]. The Opaui Guide to Lists of Acronyms, Abbreviations, and Initialisms [13] has 124 links to acronym and abbreviation lists, some of them general, and some as specialized as the Dog fanciers acronym list [4] or the Mad Cow disease list [10].

All of these web-based lists appear to be built manually rather than by automatic extraction. The lists range in size from a few dozen items to over 127,000 acronym definitions in Acronym Finder [1]. The accuracy seems to vary widely. The primary problem with many large lists on the web is that they allow people to submit expansions. Some sites screen submissions carefully [12], others do not.

As far as we can determine, no previous automatic extraction efforts have resulted in publicly searchable online databases of acronyms and none have received thorough evaluations. IBM advertises a tool for abbreviation extraction, IBM Intelligent Miner for Text, which allows corporations to process and categorize text documents [16]. Among the product’s features is the ability to extract abbreviation phrases. But the paper does not present any information on the heuristics used, nor does it present data on the quality of the results.

To appear in DL00.

Copyright © 2000 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page or initial screen of the document. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to

Two small-scale acronym extraction projects have been described and received limited evaluation in unrefereed literature. AFP (Acronym Finding Program) [15] is an acronym extraction algorithm which considers strings of 3 to 10 uppercase letters as acronyms, and looks for candidate expansions in windows of twice the number of letters in the acronym before and after the acronym. It only looks for matching letters occurring at word beginnings (or after hyphens) but allows some mismatch between the letters of the acronyms and the sequence of initial letters of the words. AFP was tested on 17 documents from the Department of Energy. It attained 93% recall and 98% precision on acronyms in this set with length of three or greater, and 86% recall and 98% precision when two-character acronyms were included.

TLA (Three Letter Acronyms) [18] was developed at the University of Waikato. It has no case requirements for acronyms, so that any token is a candidate acronym. The token is accepted if a matching sequence is found by taking up to three characters from adjacent words. TLA was evaluated on ten computer science technical reports, on which it obtained 91% recall and 68% precision. A newer approach by the same researchers uses compression models to identify acronyms and definitions [19]. This approach is less ad-hoc than a purely heuristic approach like ours, but requires a corpus of hand-marked training data.

None of these extraction systems have been used to process a large corpus of text and compile a searchable dictionary of acronyms.

Automated extraction projects for extracting non-acronym text relations bear some interesting similarities to the acronym problem. Several email extractors such as Atomic Harvester 98 [4] and EmailSiphon [6] can be found on the web. They crawl through every web page at a given site and extract every email address they can find, to compile lists to sell commercially. Extracting email addresses is simpler than finding acronyms and expansions because it does not require relating pairs of segments found in text. It is sufficient to search for the general pattern username@location. Higher accuracy can be gained by checking the suffix of an address for the existence of common domains such as .edu, .com, and .gov.

The processes of extracting hyponyms [9] and citations [8] are more similar to the acronym task in that they require extracting a relation from text. Hearst’s hyponym extractor [9] finds pairs of noun phrases NP1 and NP2 such that NP1 is a kind of NP2, for example, nutmeg is a hyponym of spice. Her system finds hyponyms in text by looking for some simple patterns like “spices, such as nutmeg,” “spices, including nutmeg and sage”, or “such spices as nutmeg and sage.” As we find for acronyms, these heuristics provide reliable but not foolproof methods of finding hyponyms. Hearst ran the extraction algorithm on an encyclopedia, and found many correct hyponyms which could be added to WordNet [1].

CiteSeer [8] is a system that extracts bibliographical citations and references. Like Acrophile, it uses a set of heuristics to index information extracted from web pages. CiteSeer searches for pages that might contain PostScript documents and keywords such as PostScript and publication. Once the documents are retrieved, the system verifies that they are legitimate publications by searching for the presence of a references section. When documents are parsed, the system saves the title, author, year of publication, page numbers, and citation tag, a shorthand abbreviation for identifying the cited paper in the text body. It uses a set of canonical rules, for example, that citation tags always appear at the beginning of references, author information generally precedes the title, and the publisher usually follows the title. The developers of CiteSeer provide a facility on the web called ResearchIndex where users can search for references and citation information [14].

Terminology

An acronym is “a word formed from the initial letters or parts of a word, such as PAC for political action committee.” An abbreviation is “a shortened form of a word or phrase used chiefly in writing…” [3]. Thus, an acronym is an abbreviation whose letters are read as a word. This definition excludes abbreviations such as FBI and NAACP, which are pronounced by saying the individual letters in the abbreviation.

Our project covers a subset of abbreviations which is larger than the set of acronyms, but smaller than the set of all abbreviations. We include abbreviations that do not form words, as long as their letters come from the words in the phrase. We also include abbreviations with numbers such as 4WD and 3M, although our expansion algorithms can only deal successfully with cases where the digit stands for a spelled-out number (Four Wheel Drive), or acts as a multiplier (Minnesota Mining and Manufacturing). They cannot handle cases like Y2K. We exclude abbreviations composed of letters that are not in the words (lb.), and abbreviations for single words rather than multiword phrases.

We use the term “expansion” for the phrase an acronym stands for.

In the remainder of this paper we describe the Acrophile system, then the acronym extraction algorithms; finally, we evaluate the algorithms and compare the resulting collections with some hand-crafted acronym collections on the web.

SYSTEM DESCRIPTION

The core of Acrophile is a large collection of acronyms and expansions, which was automatically extracted from web pages and indexed using Inquery 3.2, a probabilistic information retrieval system developed at the University of Massachusetts. Users can submit an acronym such as IRS, and see a list of expansions for that acronym, or they can submit words (such as Internal Revenue or revenue) and see the acronyms whose expansions contain those words. The system returns lists of acronyms and expansions, ranked by a quality score. One can also submit a URL to the acronym extractor and get a list of acronyms and expansions found on the page.

We first describe this collection and how it was created, then we describe the lookup system on the web.

Building and updating the database

Figure 1 outlines the process by which the database was created and how it can grow. A static collection of around 1 million (936,550) military and government web pages, comprising around 5 gigabytes of text, was processed in the manner illustrated in Figure 1.

First, a Perl script performs some simple filtering on the web pages, to remove all HTML tags. The resulting stream of text is fed into our acronym extractor, a C program using flex and yacc, which incorporates the best of the four algorithms we tested in the evaluation reported below. The acronym extractor produces a list of acronym and expansion occurrences. These pairs are marked to indicate whether they came from a parenthetical form such as DUI (Driving under the Influence), information which is later used in computing a confidence rating for the expansion. Acronym/expansion sets are then sorted and merged, accumulating counts of occurrences. Occurrences in parentheses are also counted separately. The output is tagged, creating pseudo-documents for indexing. Each pseudo-document has an acronym as its title and an expansion as its text. A confidence score is also placed in a tagged field. These data files are then indexed, with no stopping or stemming performed on acronyms or expansions. The indexing process creates a searchable Inquery database.
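The sort-and-merge step can be illustrated with a minimal Python sketch. It is not the authors' implementation (which the text describes only as a Perl filter feeding a C extractor); the function name `merge_occurrences` and the tuple layout are assumptions made for illustration, and the example occurrences are made up.

```python
from collections import Counter

def merge_occurrences(occurrences):
    """Merge (acronym, expansion, in_parens) occurrences into counts.

    Returns a dict mapping (acronym, expansion) to a pair
    (total occurrences, occurrences found in a parenthetical form),
    with keys in sorted order. The tuple layout is hypothetical.
    """
    totals, parens = Counter(), Counter()
    for acronym, expansion, in_parens in occurrences:
        key = (acronym, expansion)
        totals[key] += 1
        if in_parens:
            parens[key] += 1
    return {key: (totals[key], parens[key]) for key in sorted(totals)}

# Toy occurrences, for illustration only.
sample = [
    ("IRS", "Internal Revenue Service", False),
    ("DUI", "Driving under the Influence", True),
    ("DUI", "Driving under the Influence", False),
]
merged = merge_occurrences(sample)
```

Keeping the parenthetical count separate from the total is what later allows the two kinds of occurrence to be weighted differently when a confidence score is computed.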

The shaded path through Figure 1 shows how the database can be expanded by crawling for web pages containing known acronyms. A list of acronyms is submitted as individual queries to AltaVista, using a modified version of GNU’s wget. For each query, we retrieve the top n matching pages, returned from AltaVista ten at a time. Each results page is piped through a hand-coded filter which attempts to remove all content except the URLs of the found documents. These URLs are then crawled in sets of 10 by another instance of the crawler. This crawling accumulates a new collection of web pages, which are processed like the static set, to extract an add-on set of acronym/expansion pairs. These can be added to the original set, and the database rebuilt.

The Search System

The search process is illustrated in Figure 2, below.

The system uses a client/server architecture which could accommodate multiple servers across a network, although at present our client and (single) server run on the same Unix system. The user types an acronym or phrase into a text box on the Acrophile search page. This query is submitted via CGI to a custom Inquery web client developed for Acrophile. The client creates a network connection to the Inquery connection server, which issues search commands to an available Inquery server, which retrieves acronym/expansion pairs from the database. A ranked list is then returned through the connection server and back to the client. A confidence score is computed for each expansion, based on the stored occurrence counts. The list is sorted by this confidence score, filtered, and formatted for display in the user’s web browser. The user may select how many expansions they would like to see.

Figure 1: Building and updating Acrophile

Figure 2: Searching for acronyms

Extraction Demonstration

In addition to the searchable acronym collection, the Acrophile splash page also contains a link to an online extraction demonstration, which accepts a URL from the user and extracts acronyms and expansions from the submitted document in real time. Currently, the results of this extraction are not added to the online database.

EXTRACTION ALGORITHMS

The Acrophile extraction algorithms use flex, a lexical analyzer, and yacc, a parser, to process a text document to extract acronyms. Expansions for the acronyms are found in the text using a combination of document context and canonical rules, which match patterns in which acronyms are commonly defined in standard written English.

We developed several different versions of extraction algorithms and tested four of them. All versions work on the general principle of hypothesizing that a sequence is an acronym if it fits certain patterns, and confirming it as an acronym if a plausible expansion for it is found nearby. For all four algorithms, some normalization is performed after extraction. Two acronyms are considered equivalent if they differ only in capitalization. Two expansions for an acronym are considered equivalent if they differ only in capitalization or in the presence or absence of periods, hyphens, or spaces.
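The normalization rules can be sketched in a few lines of Python. This is an illustrative reading of the equivalence rules above, not the authors' code; the function names are hypothetical.

```python
import re

def normalize_acronym(acronym):
    """Acronyms are equivalent if they differ only in capitalization."""
    return acronym.upper()

def normalize_expansion(expansion):
    """Expansions are equivalent if they differ only in capitalization
    or in the presence or absence of periods, hyphens, or spaces."""
    return re.sub(r"[.\-\s]", "", expansion).lower()

def same_pair(pair_a, pair_b):
    """Compare two (acronym, expansion) pairs after normalization."""
    return (normalize_acronym(pair_a[0]) == normalize_acronym(pair_b[0])
            and normalize_expansion(pair_a[1]) == normalize_expansion(pair_b[1]))
```

For example, under this sketch `same_pair(("DoD", "Department of Defense"), ("DOD", "department of defense"))` holds, so the two occurrences would be merged into one entry with a combined count.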

Our four algorithms, called contextual, canonical, canonical/contextual, and simple canonical, differ in what patterns are taken to indicate potential acronyms, what forms expansions can be found in, and what text patterns indicate a possible acronym/expansion pair. The contextual, canonical, and canonical/contextual algorithms are all related and arose by modifying an earlier contextual algorithm. The simple canonical algorithm was designed independently to try a more limited approach that might yield higher precision on the acronyms it found. We did some initial tuning of algorithms based on their performance on a small pilot set of 12,380 Wall Street Journal articles from 1989.

The simple canonical algorithm (also called simple) is the strictest of the four. It finds only those acronym/expansion pairs which fit a small set of canonical forms, such as “expansion (ACRONYM)”, or “ACRONYM or expansion”. The contextual algorithm, on the opposite end of the strictness continuum, looks for an expansion in the vicinity of the potential acronym without requiring any canonical pattern (“or”, parentheses, commas, etc.) indicating their relationship. The canonical/contextual and canonical algorithms fall in between the other two. The four algorithms are contrasted in Table 1, which lists their major characteristics. The columns of the table summarize the four different algorithms. The top half of the table lists properties of hypothesized acronyms. The bottom half covers properties of the expansions. All four algorithms are described below.

Finding Acronyms

The algorithms identify potential acronyms by scanning text for the patterns shown in the row labeled Acronym Patterns in Table 1. This row uses a pseudo-regular-expression notation in which superscript + indicates one or more occurrences of a symbol, * indicates 0 or more occurrences, and numbered superscripts indicate a specific number or range of occurrences. U stands for an uppercase letter, L a lowercase letter, D a digit, S an optional final s or 's, {sep} is a period or a period followed by a space, and {dig} is a number between 1 and 9, optionally followed by a hyphen. Terms in square brackets are alternatives.

The contextual algorithm accepts acronyms that are all uppercase (USA), with periods (U.S.A.) or which have a sequence of lowercase characters either at the end of the pattern following at least three uppercase characters (COGSNet), or internally following at least 2 uppercase characters (AChemS). An uppercase pattern can also have any number of digits, anywhere.

The canonical/contextual and canonical algorithms accept a wider range of acronym patterns. They have less constraint on lower case sequences, to allow patterns like DoD. Slashes and hyphens are allowed in acronyms, to get patterns like AFL-CIO and 3-D. Acronyms are not allowed to end with lower case characters except for s, and only 1 digit is allowed in an acronym.

The simple canonical algorithm takes a minimalist approach, excluding acronyms with digits, periods, and spaces. An acronym must begin with an uppercase letter, followed by zero to 8 upper or lowercase letters, slashes, or dashes, and ending in an uppercase letter.
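As an illustration, the simple canonical acronym pattern can be written as an ordinary regular expression. This is a sketch of one reading of the prose description (two to ten characters in total), written in Python rather than the flex notation the system actually uses; the names below are hypothetical.

```python
import re

# An uppercase letter, then zero to 8 letters, slashes, or dashes,
# then a final uppercase letter. No digits, periods, or spaces.
SIMPLE_ACRONYM = re.compile(r"[A-Z][A-Za-z/-]{0,8}[A-Z]")

def is_simple_acronym(token):
    """True if the whole token fits the simple canonical acronym pattern."""
    return SIMPLE_ACRONYM.fullmatch(token) is not None
```

Under this reading, tokens such as `USA`, `DoD`, and `AFL-CIO` are candidate acronyms, while `3D` (digit), `U.S.A.` (periods), and `Dod` (final lowercase letter) are rejected.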

Acronym Expansion

Contextual Algorithm. The contextual algorithm finds expansions by matching from the last character of the acronym to the front. It always saves the twenty most recent words scanned, so when a potential acronym is identified, it tries to find the expansion in this saved buffer. If no expansion is found there, it looks for the expansion in the text following the acronym. It requires no canonical forms, so it can successfully deal with text like, “… is three dimensional. In 3D images…”

The expansion rules can refer to a list of 35 noise words like and, for, of, and the, which are often skipped in acronyms, as in CIIR (Center for Intelligent Information Retrieval). The algorithm tries to find a sequence of words such that the initial 1 to 4 characters from each non-noise word match the characters of the acronym, as in Bureau of Personnel (BUPERS). In addition:

• One initial character of a noise word can match an internal acronym character, as in Department of Defense (DOD).

• A noise word can be skipped, as in Research Experience for Undergraduates (REU).

• The initial character and the 4th, 5th, or 6th characters of potential expansion terms could be matched to acronym characters as in PostScript (PS). This is an attempt to simulate a crude morphemic decomposition, but without any knowledge of English prefixes.
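The core of these matching rules can be sketched as a small backtracking matcher. This is a simplified illustration in Python, not the authors' flex/yacc implementation: it matches front to back over a fixed word list, uses a four-word noise list standing in for the paper's 35 noise words, treats "internal" as neither the first nor the last acronym character, and omits the third rule (4th to 6th character matching). All names are hypothetical.

```python
NOISE_WORDS = {"and", "for", "of", "the"}  # small stand-in for the 35-word list

def matches(acronym, words, noise=NOISE_WORDS):
    """True if the words can be read as an expansion of the acronym.

    Each non-noise word must contribute its first 1 to 4 characters.
    A noise word may be skipped, or may contribute its initial
    character to an internal acronym character.
    """
    acr = acronym.lower()

    def go(i, w):  # i: acronym characters consumed, w: words consumed
        if w == len(words):
            return i == len(acr)
        word = words[w].lower()
        if word in noise:
            if go(i, w + 1):  # skip the noise word
                return True
            return (0 < i < len(acr) - 1 and acr[i] == word[0]
                    and go(i + 1, w + 1))
        for k in range(1, min(4, len(word), len(acr) - i) + 1):
            if acr[i:i + k] == word[:k] and go(i + k, w + 1):
                return True
        return False

    return go(0, 0)
```

With this sketch, `matches("BUPERS", ["Bureau", "of", "Personnel"])`, `matches("DOD", ["Department", "of", "Defense"])`, and `matches("REU", ["Research", "Experience", "for", "Undergraduates"])` all succeed. It also accepts `["National", "Institute", "of", "Standards"]` for NIST by taking two characters from Standards, which shows how permissive multi-character matching can be.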

Table 1: Properties of acronyms and expansions for four different algorithms

The contextual algorithm scans for an expansion until another acronym pattern is encountered, at which point the old acronym is forgotten and the new one becomes the source for matching, or until the expansion is found or fails.

If a digit n is found in the acronym, the acronym receives some special handling. The algorithm tries replacing the digit and the following or preceding character with n repetitions of the character, as in MMM for 3M. If it cannot find an expansion for this transformed acronym, it then tries matching the digit with the spelled out number, as in three dimensional for 3D. Periods in acronyms are ignored in looking for expansions.

One of the major problems with the contextual algorithm was its greediness in trying to match more than one initial character from expansion terms. This would lead it to expand NIST as National Institute of Standards, taking the t from Standards, rather than as National Institute of Standards and Technology. A second problem, particularly with two letter acronyms, was the unacceptably high likelihood of finding a sequence of lower case words with a spurious match for the acronym, as in story from for SF.

Contextual-Canonical. The canonical/contextual algorithm is a modification of the contextual algorithm to address the above two problems. First, canonical rules were added to constrain when lower case words are accepted for expansions. Only if an acronym/expansion pair is found in a form in the row labeled Canonical Definition in Table 1, is a lower case expansion allowed. An expansion found via the contextual rules must be capitalized, except for noise words. Second, the algorithm tries conservatively, rather than greedily, to match multiple characters in an expansion term, addressing the problem illustrated with NIST, above. In addition, hyphens and slashes are allowed in acronyms, and are passed over silently in expanding them. If an expansion term is hyphenated, such as Real-Time from CRICCS (Center for Real-Time and Intelligent Complex Computing Systems), the algorithm can either treat Real-Time as two words, or as a single word, not requiring a T in the acronym.
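The special handling of digits described above for the contextual algorithm can be sketched as two rewriting helpers, tried in the order the text gives: multiplier forms first, then the spelled-out number. This Python sketch handles only the first digit in an acronym and uses hypothetical names; how the rewritten forms are then matched against text is left to the expansion rules.

```python
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 7: "seven", 8: "eight", 9: "nine"}

def multiplier_forms(acronym):
    """Replace digit n and an adjacent character with n copies of it."""
    forms = []
    for i, ch in enumerate(acronym):
        if ch in "123456789":
            n = int(ch)
            if i + 1 < len(acronym):  # digit multiplies the following character
                forms.append(acronym[:i] + acronym[i + 1] * n + acronym[i + 2:])
            if i > 0:                 # digit multiplies the preceding character
                forms.append(acronym[:i - 1] + acronym[i - 1] * n + acronym[i + 1:])
            break                     # sketch: only the first digit is handled
    return forms

def spelled_tokens(acronym):
    """Replace each digit with its number word; other characters stay as-is."""
    return [NUMBER_WORDS[int(ch)] if ch in "123456789" else ch for ch in acronym]
```

For 3M the multiplier form is `MMM`, which can then be matched against Minnesota Mining and Manufacturing; for 3D the spelled tokens are `three` and `D`, which correspond to three dimensional.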

Canonical. The canonical algorithm was derived from the canonical/contextual algorithm by filtering the output list so that only acronym/expansion pairs that were found in canonical form were retained.

Simple Canonical. The simple canonical algorithm was an attempt to do away with most of the complexity of the contextual algorithm and its derivatives. Like the canonical algorithm, the simple canonical algorithm requires that the acronym be found in certain textual contexts, but it accepts fewer canonical patterns for acronym/expansion pairs, and fewer acronym patterns. The algorithm searches for the forms in the Canonical Definition row of Table 1 in the order they are listed.

When checking the validity of a potential expansion, the algorithm has a few acronym/expansion matching schemes. Each of these schemes recursively checks shorter expansions first. The matching schemes are performed as follows:

1) Uppercase strict: each letter in the acronym must be represented, in order, by an uppercase letter in the expansion. The expansion must begin with the first letter of the acronym.

2) Lowercase strict: each letter in the acronym must be represented, in order, by the first letter of a word in the expansion. The expansion must begin with the first letter of the acronym and must not contain uppercase letters.

3) Uppercase loose: the first word must begin with the first letter of the acronym and the last word must begin with a letter in the acronym. This scheme is extremely loose, and can result in expansions where some letters in the acronym are not matched at all.

The functions that check shorter expansions first remove words from the end of the expansion farthest from the acronym, then call themselves with the modified expansion. That is, each function removes a word from the end of the expansion if the expansion follows the acronym, or from the beginning of the expansion if the expansion precedes the acronym. If the shorter expansion passes the requirements, the algorithm returns the short expansion with the acronym as valid. For example, Air Carrier Access Act (ACAA) fits the pattern “expansion (ACRONYM).” Since Air Carrier Access Act passes the uppercase strict test, it is returned as the valid expansion for ACAA. While Access Act would pass the uppercase loose test for ACAA, it would not be returned because the uppercase strict test is performed first.
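The three schemes and the shortest-first check can be sketched in Python as follows. This is an interpretation of the prose, not the authors' C code: "represented, in order" is read as a subsequence test, the recursion is replaced by an equivalent loop that tries the candidate nearest the acronym first, and all function names are hypothetical.

```python
def _in_order(needle, haystack):
    """True if the items of needle occur in haystack in the same order."""
    it = iter(haystack)
    return all(ch in it for ch in needle)

def uppercase_strict(acronym, words):
    caps = [c for w in words for c in w if c.isupper()]
    return words[0][0] == acronym[0] and _in_order(acronym, caps)

def lowercase_strict(acronym, words):
    if any(c.isupper() for w in words for c in w):
        return False
    initials = [w[0] for w in words]
    return initials[0] == acronym[0].lower() and _in_order(acronym.lower(), initials)

def uppercase_loose(acronym, words):
    return words[0][0] == acronym[0] and words[-1][0] in acronym

def find_expansion(acronym, words, expansion_precedes=True):
    """Try each scheme in turn; within a scheme, try shorter expansions first."""
    for scheme in (uppercase_strict, lowercase_strict, uppercase_loose):
        for n in range(1, len(words) + 1):
            # Keep the n words nearest the acronym.
            candidate = words[-n:] if expansion_precedes else words[:n]
            if scheme(acronym, candidate):
                return " ".join(candidate)
    return None
```

For the text "the Air Carrier Access Act (ACAA)", calling `find_expansion("ACAA", "the Air Carrier Access Act".split())` returns `Air Carrier Access Act` from the uppercase strict scheme. The candidate `Access Act` would satisfy `uppercase_loose`, but that scheme is never reached.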

EVALUATION OF ALGORITHMS

In order to evaluate how well our algorithms correctly find all the acronyms that are explicitly defined in a set of documents, we use standard information retrieval measures. Precision, that is, found correct/found total, measures the accuracy of extraction, and recall, that is, found correct/known correct, measures the completeness of the extraction. For this evaluation we started with the 1M set, that is, the 936,550 military and government web pages that we processed for the Acrophile web database. From this set, we selected at random 170 pages that contained text and manually found all the acronyms with explicit definitions. These documents contain 353 defined acronyms, 10 with an ampersand or slash, and none with numbers or dashes. Variations in expansions that were accepted as correct were the omission or addition of an ‘s’, and differences in punctuation.

Table 2 shows recall and precision values for the four algorithms on the 353 acronyms in the test set and on a subset containing the 328 acronyms of length three or higher.

Table 2: Recall and precision on 170 sample docs

There were sixteen cases missed by all our algorithms because the expansion was too far (more than twenty words) away from the acronym. We do not expect any algorithm to get these, and other researchers do not include such cases [15][18]. The results excluding these cases can be seen in Table 3.

Table 3: Excluding distances > 20 words

Precision is very high, especially on acronyms longer than two characters. Recall is considerably higher for the canonical/contextual algorithm than the other three algorithms but with lower precision. As expected, the two canonical algorithms have lower recall but higher precision. The contextual algorithm has lower recall, and slightly higher precision than the canonical/contextual algorithm, in a pattern indicating that a preponderance of its errors are on 2 letter acronyms. These results cannot be directly compared to the .93 recall and .98 precision found for acronyms longer than 2 characters in [15], and .91 recall and .68 precision in [18], and roughly .80 recall and .90 precision in [19], because these studies are based on different text, and possibly different criteria for correctness.

COMPARISON WITH HAND CRAFTED LISTS

Two web collections were chosen for the comparison. We tried to use the largest and best quality sites from which we could easily get and parse lists of acronyms and expansions. We used WWWAAS, the World Wide Web Acronym and Abbreviation server at University College Cork in Ireland [17] and Acronym Finder, Mountain Data Systems’s acronym database [1]. From WWWAAS, the smaller of the two sites, we could extract the entire database by submitting a “.” as a query. The output was converted from HTML to our format with lex. The items were not added to our database. For Acronym Finder, the larger site, we were not able to dump the entire database, but we were able to collect all the expansions for a test set to be described in the next section.

Size

First, Table 4 shows how our collection compares with the others in overall size. WWWAAS contains far fewer acronyms and expansions than our set. Acronym Finder contains more acronyms, but fewer expansions than we extracted from the 1M set described above. Processing additional pages outside of the military and government domain would undoubtedly find more acronyms.

Algorithm     # Acronyms    # Exps     Avg Exps/Acro
Contextual    44,241        143,620    3.25
CanCon        51,726        161,686    3.13
Canonical     41,832        117,746    2.81
Simple        40,073        119,081    2.97
WWWAAS        12,108        17,753     1.47
Ac.Finder     60,000        127,000    2.17

Table 4: Number of acronyms and expansions extracted from 1M pages by each algorithm, and at 2 web sites

Evaluation method

To go beyond size and compare the correctness of different collections is much more difficult than comparing algorithms on a fixed set of data. A major challenge was in defining “correct.” A usable criterion was to require that we could find the acronym in use on the web. By taking a random sample of 200 acronyms from each of our lists, we were able to determine that virtually all the acronyms in all the sets were real acronyms, that is, we were able to find them used as an acronym somewhere on the web. However, it looked as though some expansions might be erroneous, and we devised the following method to evaluate the accuracy of the set of expansions listed for an acronym.

The test samples of acronyms and expansions. We initially selected a sample of 55 acronyms for evaluating expansions. Forty acronyms were chosen to mimic the distribution of acronym length found in the small Wall Street Journal collection. Acronyms with length 2, 3, and 4 were generated randomly, while others were selected at random from a longer list of acronyms of that type. We added 5 acronyms containing numbers, 5 known to have a large number of expansions, and 5 with dashes or slashes.

For each of the 55 acronyms, we collected a pool of expansions from the two reference databases on the Web, and from our four algorithms, run on the 1M set. We also added all the additional expansions that came up in the crawling experiments discussed below. We later found that for 10 of the 55 acronyms, none of the systems found any expansions. These 10 were removed from the evaluation, leaving 45 acronyms in the test set.¹

Criteria for correct expansions. Our criterion for a correct expansion was similar to that for a correct acronym, that is, that we could find at least one example on the web defining that expansion for that acronym. We hired evaluators to examine pages returned from an AltaVista [2] search for a query consisting of the acronym and the expansion. If they could find the acronym defined with the target expansion on any web page, using a list of explicitly defined criteria, it was accepted as correct. Otherwise, it was incorrect.

Scoring. We defined recall for this context as the number of correct expansions for an acronym found by one algorithm or system divided by the number of known correct expansions for that acronym found by all algorithms or systems evaluated. Similarly, we defined precision as the number of correct expansions for an acronym found by one algorithm or system, divided by the number of expansions, correct or incorrect, found by that algorithm or system. We then averaged across acronyms.
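The scoring definitions can be made concrete with a short Python sketch. The function name and data layout are hypothetical, the expansions are toy placeholders rather than real judgments, and one choice is an assumption not stated in the text: an acronym for which a system returned no expansions is left out of that system's precision average, since its precision would be undefined.

```python
def recall_precision(found, correct):
    """Average per-acronym recall and precision for one system.

    found:   acronym -> set of expansions returned by this system
    correct: acronym -> set of expansions judged correct, pooled
             over all systems evaluated
    """
    recalls, precisions = [], []
    for acronym, good in correct.items():
        mine = found.get(acronym, set())
        hits = len(mine & good)
        if good:
            recalls.append(hits / len(good))
        if mine:  # assumption: skip acronyms with no returned expansions
            precisions.append(hits / len(mine))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(recalls), mean(precisions)

# Toy data: placeholders e1..e4 stand for expansions.
found = {"AAA": {"e1", "e2"}, "BBB": {"e3"}}
correct = {"AAA": {"e1", "e4"}, "BBB": {"e3"}}
scores = recall_precision(found, correct)
```

Here the system finds one of two correct expansions for AAA (recall 0.5, precision 0.5) and the only correct expansion for BBB (recall 1, precision 1), so both averages are 0.75.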

To obtain a range of recall/precision points, we ranked the expansions by a confidence score, which was a function of how many times the expansion was found for an acronym, and another factor which we found highly related to reliability – whether an occurrence is in one of the two canonical forms “expansion (ACRONYM)” or “ACRONYM (expansion)”. Pilot research with the 1989 Wall Street Journal corpus showed that acronym/expansion pairs extracted from this frame were about five times more likely to be correct than pairs extracted from any other form. Therefore, we gave occurrences in this form more weight than occurrences in other forms by counting them as five occurrences.

¹ Several patterns in our results make us doubt that our test set of 45 acronyms is representative. First, the average number of expansions per acronym is much higher in the test set than in the complete set. We are in the process of judging a better corpus of 200 acronyms. This list includes most of the 45 acronyms from the present test set, plus acronyms chosen randomly from a list of acronyms found in the evaluated systems. These judgments will allow a more reliable evaluation.

An acronym’s expansion with a count of 1 in a very large corpus is somewhat likely to be erroneous. Expansions with a count of 10 are much more likely to be correct and expansions with counts of 30 are even more likely to be correct. The higher the count we require, the better accuracy (precision) we can obtain. However, requiring higher counts also causes more legitimate expansions to be missed. We can therefore get higher precision by requiring some threshold number of counts in order to accept an expansion for an acronym, but at the cost of lower recall. By varying this threshold, we obtain a range of recall-precision points for our evaluation below. The confidence scores are also used in the online search system, but they are transformed to C/(C+2), in order to range between 0 and 1.

Note that weighting the count does not bias our measurements of recall and precision; it only affects how acronyms are grouped by confidence to get a range of recall/precision values.
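As a rough illustration of the confidence score described above, the following Python sketch counts each canonical-form occurrence as five, filters expansions by a threshold, and applies the C/(C+2) transform. The function names are hypothetical, and reading "requiring some threshold number of counts" as "weighted count greater than or equal to the threshold" is an assumption.

```python
CANONICAL_WEIGHT = 5  # from the text: a canonical-form occurrence counts as five


def weighted_count(canonical_occurrences, other_occurrences):
    """Weighted occurrence count C for one acronym/expansion pair."""
    return CANONICAL_WEIGHT * canonical_occurrences + other_occurrences


def accept(expansion_counts, threshold):
    """Keep expansions whose weighted count reaches the threshold.

    expansion_counts -- dict mapping expansion -> weighted count
    """
    return {e for e, c in expansion_counts.items() if c >= threshold}


def display_confidence(c):
    """Transform a weighted count C to C/(C+2), which lies between 0 and 1."""
    return c / (c + 2)


c = weighted_count(canonical_occurrences=1, other_occurrences=2)
print(c, display_confidence(c))
```

With this transform a single non-canonical occurrence (C = 1) maps to 1/3, a single canonical occurrence (C = 5) maps to 5/7, and the value approaches 1 as the count grows.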

Table 5 shows the total number of expansions found for the 45-acronym test set by each of the 4 tested algorithms and the two web sites. It also shows recall and precision. The contextual and canonical/contextual algorithms find the largest number of expansions for the test acronyms. Consistent with the analysis on the 170 documents, the simple and canonical algorithms have higher precision and lower recall. Acronym Finder has performance similar to our algorithms. A more complete picture of the situation can be seen in Figure 3.

Algorithm         # Exps   Precision   Recall
Contextual          1172      .75        .28
CanCon              1055      .76        .34
Canonical            573      .79        .21
Simple               344      .81        .25
WWWAAS                90      .84        .09
Acronym Finder       450      .76        .31

Table 5: Number of expansions, precision, and recall for each system, measured on 45 test acronyms

Figure 3 shows recall and precision curves for the four algorithms, evaluated on the 45 test acronyms whose expansions were all judged. The points on each curve show recall and precision at thresholds of 1, 2, 3, 4, 5, 10, 15, 20, 25, and 30, computed as described above. The recall and precision values in Table 5 correspond to the threshold 1 points on Figure 3. The graph shows that for all algorithms, it is possible to attain precision values in the .95-.97 range, but only at very low recall levels, that is, for the acronyms we have the most confidence in. The worst-performing algorithm is the contextual, with substantially lower values than the other algorithms all along the recall-precision curve. The canonical/contextual algorithm and the simple algorithm perform the best across most of the curve, except at the high recall end, where the canonical/contextual algorithm attains higher recall. In other words, the contextual rules of the canonical/contextual algorithm allow us to find more acronyms and/or expansions than we can find using canonical rules alone, but this non-canonical set also has more errors in it. The canonical algorithm falls between simple canonical and contextual algorithms in recall and precision.
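The way a recall-precision curve arises from the threshold list above can be sketched in Python as follows. This is an illustrative reconstruction, not the evaluation code used for Figure 3: the function name and toy data are hypothetical, and acronyms with no expansion surviving a threshold are left out of the precision average, which is an assumption.

```python
THRESHOLDS = [1, 2, 3, 4, 5, 10, 15, 20, 25, 30]  # thresholds listed in the text


def recall_precision_curve(system_counts, correct_pool, thresholds=THRESHOLDS):
    """Return a list of (threshold, avg_precision, avg_recall) points.

    system_counts -- dict: acronym -> {expansion: weighted count} for one system
    correct_pool  -- dict: acronym -> set of expansions judged correct
    """
    points = []
    for t in thresholds:
        precisions, recalls = [], []
        for acronym, pool in correct_pool.items():
            counts = system_counts.get(acronym, {})
            kept = {e for e, c in counts.items() if c >= t}
            hits = kept & pool
            if kept:  # precision is undefined when nothing is kept
                precisions.append(len(hits) / len(kept))
            if pool:
                recalls.append(len(hits) / len(pool))
        avg_p = sum(precisions) / len(precisions) if precisions else None
        avg_r = sum(recalls) / len(recalls) if recalls else None
        points.append((t, avg_p, avg_r))
    return points


# Hypothetical toy data: one acronym, one frequent correct expansion,
# one rare incorrect expansion, and one correct expansion the system missed.
counts = {"ABC": {"a b c": 12, "bad": 1}}
pool = {"ABC": {"a b c", "other"}}
for point in recall_precision_curve(counts, pool, thresholds=[1, 10]):
    print(point)
```

In the toy data, raising the threshold from 1 to 10 drops the rare incorrect expansion, so precision rises while recall stays the same; on real data recall would also fall as rare but correct expansions are discarded.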

The unconnected points on Figure 3 show the recall and precision values we measured for the handcrafted web sites, on the 45 test acronyms. Each site contributes a single point to the graph rather than a curve, because we have no way to vary a threshold.

WWWAAS, the smaller site, falls at the low end of recall, with a recall of .09 and precision of .84. Although precision (.84) appears good, compared to the other values in Table 5 (all in the .70’s), we see from the more complete recall-precision curves that this value is comparable to our worst-performing algorithm – the contextual – at a threshold of 3. Our best algorithm, the canonical/contextual, has recall of .25 at the comparable value of precision, and precision of .96 at the comparable recall level.

Acronym Finder, the larger site, had recall and precision values of .26 and .76, comparable to our best algorithm, the canonical/contextual, at a threshold of 1. These results confirm the hypothesis that our algorithms can create a corpus of acronyms and expansions that is comparable in quality to the best manually built site that we could evaluate.

Note that precision and especially recall values here are substantially lower than what we found in evaluating the extraction from 170 web pages. The difference is due to the different pool of expansions which were considered correct. We are certain that some of the acronym expansions we counted as incorrect were in fact correct, but were not found in the AltaVista search, resulting in lower precision.

Figure 3: Recall and Precision on 45-acronym test (axis label: Recall)

The pool of correct expansions has an even larger effect in reducing recall. An expansion is counted as missed if any other evaluated system found the expansion, whether or not it was present in the set of documents input to the acronym extractors. This makes the set of correct expansions a moving target that grows the more we search. The crawling experiments below show that the same acronyms are used in many domains, and if we go beyond our military and government 1M set, more expansions will be found.

PROCESSING ADDITIONAL PAGES

This analysis addresses the extent to which we can find more expansions by searching the web for acronyms. We used the 55 test acronyms, submitted them as queries to AltaVista, and ran our extraction algorithms on the top 30 and 100 pages that were returned for each query. This process found many new expansions for the target acronyms.

As an illustrative example, Table 6 shows all the expansions for the acronym EWI, as found by all the systems mentioned. WWWAAS does not appear in the table because it did not include the acronym EWI. The other manual site, Acronym Finder (AF), had three expansions listed, two correct and one incorrect. All four of our algorithms, run on the 1 million web pages, found the two correct expansions for EWI listed by Acronym Finder, and did not get the incorrect expansion. In addition, our algorithms found a third correct expansion, and all but the contextual algorithm found another incorrect expansion. The additional pages found by searching and crawling more than doubled the number of correct expansions. When 30 pages were processed for each acronym query, four new correct expansions and one incorrect expansion were found. When 100 pages were processed for each acronym query, another two correct expansions were found.

Table 6: Expansions for acronym EWI

In addition to finding more expansions for the target acronyms, extraction from the crawled pages found some new acronyms that had not been extracted before. For 1M+30, 318 new acronyms were found, and for 1M+100, 1120 new acronyms were found. None of these were the 55 acronyms targeted by the search.

Figure 4 shows the recall-precision curves for the canonical/contextual algorithm, processing 30 crawled pages per acronym in addition to the basic 1M set, and 100 additional crawled pages per acronym, along with the old curve for the 1M set. This targeted crawling results in a huge increase in recall, without dropping precision except at the very highest recall levels – thresholds of 1. At a threshold of 2, precision (.75) is not appreciably lower than the precision for the 1M pages alone at a threshold of 1 (.76), but recall has almost doubled from .28 to .54.

As a control, we also measured the performance of the canonical/contextual algorithm on comparably sized sets, consisting of the 1M set with the addition of either 55x30=1650, or 55x100=5500 randomly selected documents. We did not include these results on the graph in Figure 4, however. The results were so similar to that of the 1M set alone that they could not be seen separately on the graph.

CONCLUSIONS

We were able to build, in a largely automated manner, a searchable dictionary of acronyms and expansions which rivals the quality of a good manually constructed dictionary of acronyms, by extracting acronyms and expansions from a large corpus of static web pages. We showed that we can increase the precision (accuracy) of our extraction by raising a threshold. Although this results in lower recall (coverage), we can increase recall by processing more pages. We can increase recall dramatically without loss of precision by processing web pages that are returned by a search for the acronyms that we have already found. This two-stage strategy results in a collection that is superior to any manually built database, and it can be kept up-to-date in an automated manner.

Figure 4: Adding source pages by searching for target acronyms (axis label: Recall)

FUTURE WORK

Dynamic Extraction

Given the great efficiency and success of finding additional expansions for an acronym by searching for the acronym and extracting expansions from the top 100 web pages returned, we are planning to add a facility to do this online. This will not replace the static database, however. There are some acronyms which spell existing words (IS, TIDES) for which the acronym may not occur in the top 100 pages returned from a search.

We would like to have the system automatically crawl for pages containing known acronyms, to continue to find new expansions for our acronyms, and to discover new acronyms. We have found that our confidence scores get distorted by this process because the same web pages may be processed many times. Presently we remove duplicate URLs from the set of pages for one acronym, but we do not keep a master list to prevent processing the same page again in a later run.
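One possible shape for the master list mentioned above is a persistent set of already-processed URLs that is consulted before each run. The Python sketch below is an assumption about how this could be done, not a description of the existing system; the function names and the one-URL-per-line file format are hypothetical, and URLs are compared as exact strings with no normalization.

```python
from pathlib import Path


def load_seen(path):
    """Load the master list of processed URLs (one per line), if it exists."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(p.read_text(encoding="utf-8").splitlines())


def save_seen(path, seen):
    """Write the master list back to disk for use by later runs."""
    Path(path).write_text("\n".join(sorted(seen)), encoding="utf-8")


def new_pages(urls, seen):
    """Return the URLs not processed before, in order, and record them in `seen`."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Hypothetical URLs for two successive runs sharing one master list.
seen = set()
print(new_pages(["u1", "u2", "u1"], seen))  # first run
print(new_pages(["u2", "u3"], seen))        # later run skips u2
```

Only pages returned by `new_pages` would contribute to occurrence counts, so a page retrieved for several acronyms or in several runs would be counted once.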

HTML Parsing

Our simple method of ignoring material inside HTML tags could be improved. We lose several occurrences of acronym/expansion pairs defined within the ALT property of <IMG> tags, as in: Library of Congress (LOC).
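A minimal sketch of how ALT text could be recovered before tags are discarded, using Python's standard html.parser module. The class name and the sample markup (including the file name loc.gif) are hypothetical; the collected strings would then be passed to the same extraction rules as ordinary page text.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect the values of ALT attributes found in any start tag."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so "ALT" arrives as "alt".
        for name, value in attrs:
            if name == "alt" and value:
                self.alt_texts.append(value)


collector = AltTextCollector()
collector.feed('<p>See <IMG SRC="loc.gif" ALT="Library of Congress (LOC)"></p>')
print(collector.alt_texts)
```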

We also do not take advantage of the <ACRONYM> and <ABBR> tags, which allow a web author to declare acronyms and abbreviations as follows: REM or Year 2000. These tags are not yet in common usage, but if they become more widely used, we would want our extraction algorithms to be able to extract acronyms and abbreviations from them.

ACKNOWLEDGMENTS

This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement numbers EEC-9209623 and EIA-9820309. Any opinions, findings and conclusions or recommendations expressed in this material are the authors’ and do not necessarily reflect those of the sponsor.

We thank Mike Molloy for information about Acronym Finder, and Morris Hirsch for an early version of the contextual algorithm. Thanks also to Don Byrd for his comments on a draft of this paper.

REFERENCES

1. Acronym Finder. https://www.wendangku.net/doc/e37501495.html,.

2. AltaVista. https://www.wendangku.net/doc/e37501495.html,.

3. The American Heritage College Dictionary, Third Edition. Boston: Houghton Mifflin Company, 1993.

4. Atomic Harvester. https://www.wendangku.net/doc/e37501495.html,/atomic.htm.

5. Dog fanciers acronym list. http://mx.nsu.ru/FAQ/F-dogs-acronym-list/Q0-0.html.

6. EmailSiphon is known by the evidence it leaves when it crawls archives for email addresses, purportedly for spamming purposes. See discussion in https://www.wendangku.net/doc/e37501495.html,/list-moderators/9802.

7. Fellbaum, Christiane. WordNet: An Electronic Lexical Database, Cambridge: MIT Press, 1998.

8. Giles, C. Lee, Bollacker, Kurt D., and Lawrence, Steve. CiteSeer: An Automatic Citation Indexing System, in Digital Libraries 98, New York: ACM Press, 1998, pp. 89-98.

9. Hearst, Marti. Automatic Acquisition of Hyponyms from Large Text Corpora, in Proceedings of the Fourteenth International Conference on Computational Linguistics (Nantes, France, July 1992).

10. Mad Cow disease list. https://www.wendangku.net/doc/e37501495.html,/animalh/bse/glossary.html.

11. MetaCrawler. https://www.wendangku.net/doc/e37501495.html,.

12. Molloy, Michael (Acronym Finder), personal communication. February, 2000.

13. Opaui Guide to Lists of Acronyms, Abbreviations, and Initialisms (https://www.wendangku.net/doc/e37501495.html,.mx/~smarin/acro.html).

14. ResearchIndex. https://www.wendangku.net/doc/e37501495.html,.

15. Taghva, Kazem, and Gilbreth, Jeff. Recognizing Acronyms and their Definitions. Technical Report 95-03, ISRI (Information Science Research Institute) UNLV, June, 1995. https://www.wendangku.net/doc/e37501495.html,/ir/publications/Taghva95-03.ps

16. Tkach, Daniel, ed. Text Mining Technology: Turning Information into Knowledge. IBM White Paper, 1998. https://www.wendangku.net/doc/e37501495.html,/data/miner/fortext/download/whiteweb.html.

17. World Wide Web Acronym and Abbreviation Server (WWWAAS). http://www.ucc.ie/cgi-bin/acronym.

18. Yeates, Stuart. Automatic extraction of acronyms from text. In Proceedings of the Third New Zealand Computer Science Research Students’ Conference. Hamilton, New Zealand, April 1999, University of Waikato, pages 117-124. https://www.wendangku.net/doc/e37501495.html,/~syeates/pubs/acroPaper.ps.gz

19. Yeates, Stuart, Bainbridge, David, and Witten, Ian. Using Compression to identify acronyms in text. Submitted to Data Compression Conference, DCC2000.

福建省福州格致中学2014-2015学年高一上学期期中考试英语试题 Word版无答案

福州格致中学2014-2015学年第一学段 高一英语《必修一》模块考试 第二部分:英语知识运用 单项填空 从每题所给的A、B、C、D四个选项中,选出可以填入空白处的最佳答案,并在答题卡上将该选项涂黑。 21.In _______ distance is a tall tree which is said to have ______ history of over 500 years. A. the; a B. /; / C. a; a D. the; / 22.--- Look! The telephone is broken. Someone damaged it _______ purpose. --- That may be right. But perhaps it was broken ________ accident. A. on; on B. on; by C. by; by D. by; on 23.It might be very difficult to find _______ of the information. A. cause B. resource C. source D. course 24. Shall we go there by bus or by taxi? The ______ seems to be quicker. A. late B. latter C. later D. lately 25. After living in the United States for fifty years, he returned to the small town where he ______. A. brought up B. grew up C. gave up D. turned up 26. The engine of the ship was out of order and the bad weather ________ the helplessness of the crew(船员) at sea. A. added to B. added up to C. added up D. was added to 27. --- Daddy, I don’t want to go to Jimmy’s birthday. --- You had better go. ______ you make a promise, you have to keep it. A. Even B. Once C. Unless D. In case 28. --- Won’t you stay for lunch? --- No, thanks, I _______ my uncle at the airport at 10:30. A. met B. have met C. meet D. am meeting 29. ________ out for food, Some work in the nest as guards or workers. A. All the bees not go B. Both the bees don’t go C. Not all the bees go D. All the bees go 30. The old town has narrow streets and small houses _______ are built close to each other. A. they B. where C. what D. that 31. --- Don’t you know our town at all? --- No, It is the first time I _______ here. A. was B. am coming C. came D. have come 32. Can you make sure _____ the gold ring? A. where Alice had put B. where had Alice put C. where Alice has put D. where has Alice put 33. He decided to help the poor girl _______ were killed in the earthquake. A. which parents B. parents of them C. whose parents D. whom parents 34. 
The settlement is home to nearly to 1,000 people, many of _______ left their village homes for a better life in the city. A. whom B. which C. them D. who 35. ---We could invite John and Barbara to the Friday night party. --- Yes, _______ ? I’ll give them a call right now. A. why not B. what for C. why D. what 第二节完形填空(共20 小题;每小题一分,满分20分) 阅读下面短文,从短文后各题所给的四个选项(A,B,C,D)中选出可以填入相应空白处的最佳选项,并在答题卡上将该选项涂黑。 This little story I’m going to tell you happened when I was about 11 years old and I’II never forget it. I was at my friend Jenny’s __36__ after school one day, and we were doing (or not doing) homework.. __37__ I was there, Jenny’s mom came over to visit. I don’t remember her name__38__ what her face looked like. I just remember her hands, her voice and the__39__ she taught me. I can still see her hand __40__ for mine in our introduction. __41__ were so

TPO1 独立写作范文

TPO-1 Independent writing task At universities and colleges, sports and social activities are just as important as classes and libraries and should receive equal financial support. 范文为博主根据考生习作修订而成,为博主和考生的共同原创作品,转载请务必注明出处谢谢! 博主修订作文的原则为:忠实原文和充分考虑作者原有水平,简约、实用、易模仿。所以追求华丽文风的同学请绕道。 如果有同学需要作文批改润色服务,可联系博主 ———————————————————————————————————————————————— Some people claim that universities and colleges should not spend a great deal of money on sports and social activities, as their budgets should prioritize classes and libraries. However, as educational institutions, universities and colleges have an obligation to provide a wide range of knowledge to their students to help them develop academically and socially. More and more businesses require people who are not only specialists in their fields, but also skilled in social interaction. In the rapidly changing environments of science and the economy, no task can be done by a single person. Given the need for collaboration, social skills and team spirit are necessary qualities for any successful person. Clearly, sports and social activities in universities and colleges provide students with the perfect settings to practice their social skills. By playing football on a team or joining in a protest, for instance, students can develop a sense of belonging and experience team spirit, both of which may prove very important in building a career. Aside from fostering social skills and team spirit, sports and social activities can greatly improve students’ efficien cy. Students need not play sports or participate in social activities at the expense of their academic study. Indeed, for some students, playing sports and joining in social activities are good stress reducers. Even the most enthusiastic people feel tired if they concentrate on one thing for too long. Frustrated and tired people can do nothing well if they do not recover from their bad situations. 
Students who have been frustrated in their academic work would be well-advised to play some sports or join in social activities.

英语中的缩略词

相信许多人在书写英语、阅读英语文章或者进行英语翻译时,都会遇到缩写的情况。那么,英语中的缩写有什么规律吗?相信熟悉英语的人,会遇到各种各样、形式各异的缩写。你对英语中的缩写有把握吗?如果你需要在书写中运用缩略语,你知道其中的规律吗?本文就和大家详细聊一下英语中的缩写。通过总结分析,我们发现,大家对缩写存在不确定的情况主要集中在以下几个方面: 缩写需要用大写字母吗? 缩写需要用句号吗? 什么时候可以使用缩略符呢? 英语中的缩略通常分为几种,而缩略词的写法也通常取决于它到底属于缩略中的哪一类。那么,我们就来看看英语中的缩略词分类: ?Acronyms ?Contractions ?Initialisms ?Shortenings 英语中的缩略表达主要分为四大类,我们先来说第一种,Acronyms。 Acronyms are words formed from the initial letters of other words and pronounced as they are spelled, not as separate letters. Examples include: Acronyms就是指以各个组合词首字母组成的缩写词,简称首字母组合词。Acronym是最大特点是,按照组合的首字母进行拼写,而不是将各个首字母单独读出。

?Most acronyms can be written as capital letters or with only an initial capital letter. ?绝大多数首字母缩写词可以用大写字母表示,也可以仅仅首字母大写。 ?Some acronyms are so established that they are now ‘normal’words, generally used without conscious awareness of their original full form. These words should be written in lower-case letters. ?有些首字母缩写词已经被非常广泛的使用并从而被当做是‘常规’词汇,而人们在使用这类‘常规’的首字母缩写词时,往往不会想起它们最初的完整表达形式。那么,在这种情况下,这些词应该采用小写。比如, ?

常见的英文缩写

英文缩写 VCD 视频高密光盘 IT是指信息技术,即英文Information Technology 的缩写 BT是一种P2P共享软件,全名叫"BitTorrent",中文全称:"比特流"又名"变态下载" DIY是每个电脑爱好者熟悉的新名词,是英文Do It Yourself的首字母缩写,自己动手制作的意思,硬件爱好者也被俗称DIYer. OEM是英文Original Equipment Manufacturer的缩写,意思是原设备制造商。 BBS是英文Bulletin Board System的缩写,中文意思是电子公告板系统,现在国内统称做论坛。 XP,是英文Experience(体验)的缩写, 自从微软发布Office XP后,成为软件流行命名概念. 论坛上常见文章标有zt字样,新手不知所云,其实不过是"转帖"的拼音缩写而已. ps是什么意思?在网上,常用软件一般都用缩写代替photoshop简称ps,DreamWeaver简称dw. ID是英文IDentity的缩写,ID是身份标识号码的意思. 为了使Internet上的众多电脑主机在通信时能够相互识别,Internet上的每一台主机都分配有一个唯一的32位地址,该地址称为IP地址,也称作网际地址。IP地址由4个数组成,每个数可取值0~255,各数之间用一个点号“.” MSN 即MICROSOFT NETWORK, 是微软公司的一个门户站点. MSN作为互联网上最受欢迎的一个门户, 具备了为用户提供了在线调查、浏览和购买各种产品和服务的能力. DJ是DISCO JOCIKEY(唱片骑士)的英文缩写,以DISCO为主,DJ这两个字现在已经代表了最新、最劲、最毒、最HIGH的Muisc。 URL是英文Uniform Resoure Locator的缩写,即统一资源定位器,它是WWW网页的地址 OVA是英文录象带的缩写. MC的意思是Micphone Controller的意思,翻译差不多是“控制麦克风的人”。也可以理解为Rapper,很多Rap都在自己的艺名前面加上“MC”,比如台湾的MChotdog,香港的MCYan,美国的MC Hammer等。 CS是非常流行的网络游戏,中文名是反恐精英。 SOHO,是SMALLOFFICEHOMEOFFICER的简称,意思是“在家办公”。还有就是SOHO网啦! BANNER是横幅广告,logo是图标广告. FTP是英文File Transfer Protocol的缩写,即文本传输协议。 pm是什么意思 Private Messages 论坛短信 在英语中,bug表示“臭虫”的意思。但在电脑行业却把电脑内部发生的小故障也称为 “bug”,如程序运行不畅等,这种叫法也许与臭虫不无关系。有人猜测,之所以用bug,是因为它非常简洁明快。其次,臭虫也确实使人连休息也不得安宁,如同电脑中的小故障一样,它虽小,但麻烦还是很大的。 国家: PRC 中国: People's Republic of China国际组织、机构、公司: CAAC 中国民航: Civil Aviation Administration of China SARS 非典: Severe Acute Respiratory Syndrome BSE 疯牛病: Bovine Spongiform Encephalopathy BBC British Broadcasting Corporation英国广播公司 CNN美国有线新闻网络Cable News Network 考试: NMET 全国普通高等学校入学考试: National Matriculation Entrance Test CET 大学英语等级考试: College English Test PETS 全国公共英语等级考试: The Public English Test System TOEFL 托福: Test of English as a Foreign Language IELTS 雅思: International English Language Testing System GRE (Graduate Record Examination) 美国研究生入学考试 电子、通讯: IT 信息技术: Information Technology VCD 激光视盘: Video Compact Disc GPS 全球定位系统: Global Positioning System

春风拂面范文

春风拂面 每每想起老师,仿佛总有一股春风从面前拂过。它那样温暖,那样和煦,像润物无声的细雨,慢慢地催开了我心底那美丽的花儿。 老师,您是否还记得那次午休?同学们都伏在课桌上睡觉,后来外面下起了雨,阵阵凉凉的风吹进教室,吹乱了您的头发,也吹乱了同学的衣裳。您正想起身做点什么,突然您好像想起了什么,又坐了下来——噢!只见您脱下了您的高跟鞋,轻轻地走到窗户边,悄悄地把每一扇窗户都关上了。啊!老师,您是怕吵醒了那些可爱的熟睡的孩子呀!老师,您知道吗?那一刻,我觉得自己就像是一个孩子,生活在母亲的怀抱。那一刻,我觉得您就像一 股春风,悄悄从我们身边吹过。 老师,您可曾想起那个炎热的下午?我们在楼梯口相遇,我道了一句“老师好”,您甜甜一笑,走过来,轻轻地拉了拉我的衣领:“热不热?”突然,您发现我的脖子上那些红肿的小疱,您关切地问:“呀!这是怎么回事,蚊子咬的吗?”我点了点头。没想到第二天早上,您就给我带来了驱蚊花露水。那一刻,我又惊讶又感动。望着同学们羡慕的眼光,我的心底仿佛拂过一缕春风,温暖而幸福。 老师,您是否记得那次晨读?同学们都在教室里认真地读书,忽然,我感觉有个身影从我身旁走过。一抬头,我才发现桌子上多了一瓶牛奶,上面还有几个有趣的字“喝了长高噢”。再朝那个身影望去——乌黑笔直的秀发飘在肩后,我知道,是老师您。 老师!您知道吗?在住校的那段时间,我时常会记起您,我时常会抱着那花露水瓶子睡觉,我时常会穿着您送我的带着香味的衣裳…… 一阵春风轻轻拂过,我仿佛又看到了您那熟悉的身影。 春风拂面 岁月是爱人的,每年,即使春已逝去,也会在某一个时候给人以春风拂面的感动。 我平时喜欢写作文,语文老师就鼓励我向报刊投稿。“我?能行吗?”看着老师坚定的眼睛,我下定了决心。但是一次又一次的石沉大海,我又开始灰心了。 “来,文章见报了……”声音温和,是语文老师。似乎早有预感,庆幸却又疑惑。我已忘了自己还死僵僵地坐在座位上。老师满脸笑容地离开了。“真没礼貌!”身旁的男生嘀咕道。这才醒悟:我,就是一个忽视了最简单的礼貌的人。 仿佛为我的坏心情应景一般,风儿大了,调皮地飞驰而过,掠走了我的微笑。上课后,老师就就一些难题向我们指点迷津。教材第110页……”黑板上显现出一行醒目的大字。还没等老师说完,我就心急火燎地开始复制克隆,笔走龙蛇。不料被老师撞见,批评了一句:“做题时要看清题目,再去对照答案……当时,真是无地自容,还是科代表呢!名不副实。这才明白:我,就是一个冒冒失失的人。 下课的时候,狂风骤起,刹那间将我无奈的寂寞卷入云层。学习上的退步,甚至让我怀疑自己就是一个超低智商的人。于是,一次次地摔跤、失落、放弃、流泪。感觉生活就是一座幽深的城堡,欲离去却始终找不着方向。可是,在晚饭时,当妈妈把熟悉的饭盒递到我手中,接着是匆忙的问候,我忘记了吃饭,那一刻,忽然感觉到如春风拂面,幸福无比。 我总是这样,预先准备了对待苦难的从容,却没学会从苦难中体验快活,总是注重自己的感受,却忽视了身边一直伴随提醒我的亲人朋友。同学的良言规劝是幸福,老师的悉心开导是幸福,亲人馈赠的欣悦是幸福……这样的幸福太多太多。 一路上,被人提醒,如沐春风。 [点评]这篇文章在选材时注重了“小处着眼”,写生活中时时处处有人提醒:忽视礼节,冒冒失失,忘记吃饭,这些琐碎的生活片段,写出了对生活深层次的感悟,紧扣“提醒”,揭示主题:原来,生活中有了提醒,是一种莫大的幸福。 春风拂面 春风,在我心目中是最纯洁、最高尚的。它从容淡定、缥缈虚无,同时又是世间自由不羁的精神所在,亦如那些从历史里缓缓走来的身影…· 那不羁的风,我想,是李白吧!他有"飞流直下三千尺,疑是银河落九天"的气魄,他有"仰天大笑出门去,我辈岂是蓬蒿人"的潇洒,他有"安能摧眉折腰事权贵,使我不得开心颜"

福建省福州格致中学高考英语备考听力训练二十五试题 含答案

25. 第一部分听力(共两节。满分30分) 第一节(共5小题;每小题l.5分,满分7.5分) 听下面5段对话。每段对话后有一个小题,从题中所给的A、B、C三个选项中选出最佳选项,并标在试卷的相应位置。听完每段对话后,你都有l0秒钟的时间来回答有关小题和阅读下一小题。每段对话仅读一遍。 1.What is the probable relationship between the speakers? A. Father and daughter. B. Doctor and patient. C. Teacher and student. 2.Why can’t the machine work according to the woman? A.The power may have been cut off. B.There is something wrong with it. C.Both of the speakers can’t operate it. 3.What does the woman mean? A She is tired of keeping pets. B She wants to have a dog. C She won’t have a dog as a friend. 4.What can we learn from the conversation? A.The man will invite Mary to dinner. B The man will buy his daughter a gift. C.Mary has a lovely girl. 5.Which of the following does the woman like best? A. Fishing.13.Swimming.C.Climbing. 第二节(共15小题;每小题l.5分,满分22.5分) 听下面5段对话或独白。每段对话或独自后有几个小题,从题中所给的A、B、C 三个选项中选出最佳选项,并标在试卷的相应位置。听每段对话或独白前,你将有时间阅读各个小题,每小题5秒钟;听完后,各小题将给出5秒钟的作答时间。每段对话或独自读两遍。 听第6段材料。回答第6,7题。 6.What season is it now? A Spring. B Summer. C Autumn. 7.What will the speakers do? A. Drive to San Diego. B.Have take—away food in the park. C. See an outdoor movie. 听第7段材料。回答第8,9题。 8.What color is the tie the man is looking for? A. Green. B. Blue. C. Brown.

托福独立写作机经范文5篇

As the economy and technology develop at an incredible speed in today’s society, there are an increasing number of people believing that the most important problems affecting our society today could be solved within our lifetime. In my view, however, this is out of the question and the three most significant problems affecting the society, namely war, environmental destruction and disease, will still influence us and may not be entirely resolved forever. War is a problem afflicting humans since they took into being in the first place. Wars take place when different countries have conflicting benefits, which is unlikely to be eliminated as long as the boundaries between countries still exist. For instance, in the Middle East where water is extremely scarce, many countries are in conflict in contention for water resources. Since every country acts for the sake of its own benefit, it seems that people in the region are incapable of living together in harmony. Eternal world peace, therefore, is unlikely to be achieved in our lifetime. Environmental destruction emerged not long before but it is becoming increasingly severe and can not be ignored. With the large-scale utilization of fossil fuels and rapid development of industry, huge amounts of pollutants are being produced, contaminating the environment to a large degree. The disposal of these pollutants being a tough task, it is hardly possible that we can completely get rid of them within a short period of time. Furthermore, even if we can come up with optimal methods for dealing with these substances, there remains the problem that these methods may cost too much and obstruct the development of economy. Last but not least, the well-being of people around the world is threatened by a variety of diseases, ranging from AIDS to cancer, the cures for which have not been discovered yet. Diseases set off panics among humans, affect their normal life and leave people badly off with the high medical expenses. 
There are thousands of scientists devoting themselves to finding cure for diseases, but new types of diseases keep emerging and there is no eliminating all of them. In summary, the problems with the most significance today are going to be passed on to our offspring. To eliminate these problems thoroughly, there is still a long way to go.

【IT专家】提取单词的第一个字符以创建首字母缩略词

本文由我司收集整编,推荐下载,如有疑问,请与我司联系 提取单词的第一个字符以创建首字母缩略词 提取单词的第一个字符以创建首字母缩略词[英]Extract first character of word to create acronym How do i use split or stringtokenizer to get only the 1st character of each word to create an acronym? It would also include the ‘‘ symbol. And it isn’t case sensitive 我如何使用split 或stringtokenizer 来获取每个单词的第一个字符来创建首字母缩 略词?它还包括’‘符号。它不区分大小写 exmaple: Some Kind Of Long String --- SKOLS 某种长串--- SKOLS another Kind of Long String --- AKOLS 另一种长串--- AKOLS string string --- s s string string --- s s The reason for this is because i have a query that populates a table, and since the column name are 3 or more words each. it stretches the table, even with a scroll bar placed, 100+ columns with long names would make it look really long. So i would like to reduce space by using only acronyms and generating a legend. 原因是因为我有一个填充表的查询,因为列名每个都是3 个或更多的单词。它延伸 了桌子,即使放置了一个滚动条,100 多个长名称的列也会让它看起来很长。因此我想 通过只使用首字母缩略词并生成一个图例来减少空间。 2 First you need to split the String at either ““ or ““. 首先,您需要将字符串拆分为””或“”。 You can use the “split”method for String. docs.oracle/javase/1.4.2/docs/api/java/lang/String.html#split(https://www.wendangku.net/doc/e37501495.html,ng.String) 您可以对String 使用“split”方法。 docs.oracle/javase/1.4.2/docs/api/java/lang/String.html#split(https://www.wendangku.net/doc/e37501495.html,ng.String) The regular expression would be either space or ampersand. Then you would use the

EHS-(环境、健康、安全的英文首字母缩写)介绍

EHS (环境、健康、安全的英文首字母缩写) 作用 EHS方针是企业对其全部环境、职业健康安全行为的原则与意图的声明,体现了企业在环境、职业健康安全保护方面的总方向和基本承诺。因此可以说EHS方针是企业在环境、职业健康安全保护方面总的指导方向和行动原则,也反映最高管理者对环境、职业健康安全行为的一个总承诺。EHS方针也是企业环境、职业健康安全领域一切活动的驱动力,涉及所有为组织或代表组织工作的人员,并可为公众所获取。 一个积极的、切实可行的EHS方针,将为企业确定环境、职业健康安全管理方面总的指导方向和行动准则,并为建立更加具体的环境、职业健康安全目标指标提供一个总体框架。方案 EHS管理体系的目标指标是针对重要的环境因素、重大的危险因素或者需要控制的因素而制定的量化控制指标。目标指标可以是保持维持型的指标,如,控制年度工伤率在千分之几以下。也可以是改进提高型,如,将某种资源的利用率提高多少个百分点。管理方案是指实现目标指标的具体行动方案。 主要内容 1、工厂平面图、营业执照; 2、建筑安全合格证/消防安全合格证; 3、工厂应急程序(地震,火灾,化学品泄露,污水泄露); 4、消防疏散演习记录; 5、公司的健康安全委员会架构及健康安全政策; 6、急救员证/药物清单及使用记录;

7、注册安全工程师证书; 8、工伤处理程序及记录; 9、特种设备使用许可证/检验合格证; 10、特种作业操作证(电工、焊工、高处作业等)特种设备操作许可证(叉车驾驶员、锅炉操作工、电梯操作工、行车操作工等); 11、食堂卫生许可证/厨工健康证; 12、环保文件/环评报告/排污许可证; 13、危险废物转移单/危险废物运输商资格证/危险废物处理商资格证; 14、饮用水测试报告/车间空气噪音测试报告//污水测试报告; 15、工人平时常规体检报告/相关工人的职业病健康体检报告; 16、一些化学品的MSDS; 17、化学品储存记录等等。[1] 含义 EHS管理体系是环境管理体系(EMS)和职业健康安全管理体系(OHSAS)两体系的整合。环境、职业健康安全管理体系,简称EHS管理体系,EHS是环境Environment、健康Health、安全Safety的缩写。 承诺 答:一是对遵守适用EHS法律、法规及其他要求的承诺;二是对事故预防、保护员工安全健康的承诺;三是对持续改进的承诺。 管理体系

福建省福州格致中学(鼓山校区)高一英语上学期第五次月考(期末)试题

福州格致中学2015级高一学段第一学期质量评定 高一年级第五次月考英语试卷 时间:100分钟分值120分 ★祝考试顺利★(完型填空启用备用卷题号不同答题卡区域注意区别) 第Ⅰ卷选择题(共两部分,满分70分) 第一部分阅读理解(共两节,满分40分) 第一节、完形填空(共20小题;每小题1.5分,满分30分) 阅读下面短文,从短文后所给各题的四个选项(A、B、C和D)中,选出可以填入空白处的最佳选项。 Most of us are highly aware of various channels through which we can obtain information on food safety. A majority of us have shown much __36___ in experts and authorities and taken a(n) __37___ part in science activities organized by such experts. __38_, we do not always form an accurate picture When there is __39_news about one brand, our trust in all brands or similar products tends to be __40___ . We would doubt not only the brand in __41___ but also similar brands when a safety issue (问题)_42___ in the news. At the same time,our purchase __43___ change as food safety incidents occur. We are becoming _44___ confident in domestic(国内的)food companies ,for they have done too little in publishing and sharing food safety __45__ so far. As a result ,we would turn to __46__ brands more often. Food safety incidents in China have attracted a lot of attention. As a matter of fact ,we only have a very _47__ knowledge base on the issue. In developed countries,there have been relatively _48__ measures and response system toward safety issues. Therefore, the __49___ in those countries are less likely to become over-panicked and form serious _50 about all brands. Food companies should pay more attention to our insights, listen to our voices, _51___ the opinions of experts and authorities ,have effective strategies for response _52____ various media channels, and __53___ information in time and face the public honestly. In addition ,both the government and an independent third - party should have a role to play in providing information about food companies__54___ in raw material selection, production and distribution (流通) However, this might require several years and we still need to learn more about food __55___ 36. A. respect B.enthusiasm C. distrust D. confidence 37. A. 
slight B. broad C. active D. natural 38. A. However B. Consequently C. Meanwhile D. Furthermore 39. A. negative B. exciting C. detailed D. special 40. A. protected B. affected C. increased D. preserved

