

USP-IBM-1 and USP-IBM-2: The ILP-based Systems for Lexical Sample WSD in SemEval-2007

Lucia Specia, Maria das Graças Volpe Nunes

ICMC - University of São Paulo

Trabalhador São-Carlense, 400, São Carlos, 13560-970, Brazil

{lspecia,gracan}@icmc.usp.br

Ashwin Srinivasan, Ganesh Ramakrishnan

IBM India Research Laboratory

Block 1, Indian Institute of Technology, New Delhi 110016, India

{ashwin.srinivasan,ganramkr}@in.ibm.com

Abstract

We describe two systems participating in the English Lexical Sample task in SemEval-2007. The systems make use of Inductive Logic Programming for supervised learning in two different ways: (a) to build Word Sense Disambiguation (WSD) models from a rich set of background knowledge sources; and (b) to build interesting features from the same knowledge sources, which are then used by a standard model-builder for WSD, namely, Support Vector Machines. Both systems achieved comparable accuracy (0.851 and 0.857), which considerably outperforms the most frequent sense baseline (0.787).

1 Introduction

Word Sense Disambiguation (WSD) aims to identify the correct sense of ambiguous words in context. Results from the last edition of the Senseval competition (Mihalcea et al., 2004) have shown that, for supervised learning, the best accuracies are obtained with a combination of various types of features, together with traditional machine learning algorithms based on feature-value vectors, such as Support Vector Machines (SVMs) and Naive Bayes. While the features employed by these approaches are mostly considered to be "shallow", that is, extracted from corpora or provided by shallow syntactic tools like part-of-speech taggers, it is generally thought that significant progress in automatic WSD would require a "deep" approach in which access to a substantial body of linguistic and world knowledge could assist in resolving ambiguities. Although access to large amounts of knowledge is now possible due to the availability of lexicons like WordNet, parsers, etc., the incorporation of such knowledge has been hampered by the limitations of the modelling techniques usually employed for WSD. Using certain sources of information, mainly relational information, is beyond the capabilities of such techniques, which are based on feature-value vectors. Arguably, Inductive Logic Programming (ILP) systems provide an appropriate framework for dealing with such data: they make explicit provisions for the inclusion of background knowledge of any form; the richer representation language used, based on first-order logic, is powerful enough to capture contextual relationships; and the modelling is not restricted to being of a particular form (e.g., classification).

We describe the investigation of the use of ILP for WSD in the Lexical Sample task of SemEval-2007 in two different ways: (a) the construction of models that can be used directly to disambiguate words; and (b) the construction of interesting features to be used by a standard feature-based algorithm, namely, SVMs, to build disambiguation models. We call the systems resulting from the two different approaches "USP-IBM-1" and "USP-IBM-2", respectively. The background knowledge comes from 10 different sources of information extracted from corpora, lexical resources and NLP tools.

In the rest of this paper we first present the specification of ILP implementations that construct ILP models and features (Section 2) and then describe the experimental evaluation on the SemEval-2007 Lexical Sample task data (Section 3).

2 Inductive Logic Programming

Inductive Logic Programming (ILP) (Muggleton, 1991) employs techniques from Machine Learning and Logic Programming to build first-order theories or descriptions from examples and background knowledge, which are also represented by first-order clauses. Functionally, ILP can be characterised by two classes of programs. The first, predictive ILP, is concerned with constructing models (in this case, sets of rules) for discriminating accurately amongst positive and negative examples. The partial specifications provided by (Muggleton, 1994) form the basis for deriving programs in this class:

• B is background knowledge consisting of a finite set of clauses = {C1, C2, ...}

• E is a finite set of examples = E+ ∪ E−, where:

  – Positive Examples. E+ = {e1, e2, ...} is a non-empty set of definite clauses

  – Negative Examples. E− = {f1, f2, ...} is a set of Horn clauses (this may be empty)

• H, the output of the algorithm given B and E, is acceptable if these conditions are met:

  – Prior Satisfiability. B ∪ E− ⊭ □

  – Posterior Satisfiability. B ∪ H ∪ E− ⊭ □

  – Prior Necessity. B ⊭ E+

  – Posterior Sufficiency. B ∪ H ⊨ e1 ∧ e2 ∧ ...

The second category of ILP programs, descriptive ILP, is concerned with identifying relationships that hold amongst the background knowledge and examples, without a view of discrimination. The partial specifications for programs in this class are based on the description in (Muggleton and De Raedt, 1994):

• B is background knowledge

• E is a finite set of examples (this may be empty)

• H, the output of the algorithm given B and E, is acceptable if the following condition is met:

  – Posterior Sufficiency. B ∪ H ∪ E ⊭ □
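The acceptability conditions above can be illustrated with a small propositional toy. A real ILP system checks first-order entailment; in this sketch a clause body is simply a set of ground facts, an example is the set of facts true of one instance, and all predicate names are invented for illustration, not taken from the paper.

```python
# Propositional toy illustration of the ILP acceptability conditions.
# A clause body is a set of required ground facts; an example is the
# set of facts true of one disambiguation instance.

def covers(body, example):
    # A clause body covers an example if all its required facts hold there.
    return body <= example

def acceptable(hypothesis, positives, negatives):
    # Posterior Sufficiency: some clause in H accounts for every e in E+.
    sufficient = all(any(covers(b, e) for b in hypothesis) for e in positives)
    # Posterior Satisfiability: H together with E- stays consistent,
    # i.e. no clause covers a negative example.
    consistent = not any(covers(b, f) for b in hypothesis for f in negatives)
    return sufficient and consistent

hypothesis = [{"expr(ask_out)"}, {"pos(right1,prep)"}]
positives = [{"expr(ask_out)", "bag(dinner)"}, {"pos(right1,prep)"}]
negatives = [{"bag(lunch)"}]
print(acceptable(hypothesis, positives, negatives))  # True
```

This is only the coverage-based special case of the entailment conditions, but it is the view most rule-learning implementations of predictive ILP effectively operate under.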

The intuition behind the idea of exploiting a feature-based model constructor that uses first-order features is that certain sources of structured information that cannot be represented by feature vectors can, by a process of "propositionalization", be identified and converted in such a way that they can be accommodated in such vectors, allowing traditional learning techniques to be employed. Essentially, this involves two steps: (1) a feature-construction step that identifies all the features, that is, a set of clauses H, that are consistent with the constraints provided by the background knowledge B (descriptive ILP); and (2) a feature-selection step that retains some of the features based on their utility in classifying the examples; for example, each clause must entail at least one positive example (predictive ILP). In order to be used by SVMs, each clause h_i in H is converted into a boolean feature f_i that takes the value 1 for any individual for which the body of the clause is true, and 0 otherwise. Thus, the set of clauses H gives rise to a boolean vector for each individual in the set of examples. The features constructed may express conjunctions over different knowledge sources. For example, the following boolean feature, built from a clause for the verb "ask", tests whether the sentence contains the expression "ask out" and the word "dinner". More details on the specifications of predictive and descriptive ILP for WSD can be found in (Specia et al., 2007):

f1(X) = 1 if expr(X, 'ask out') ∧ bag(X, dinner); 0 otherwise
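The propositionalization step can be sketched as follows: each constructed clause body becomes one boolean feature, and every example is mapped to the vector of clause-body truth values. The sentences and predicates below are invented for illustration.

```python
# Minimal sketch of propositionalization: clause bodies -> boolean features.

def make_feature(body):
    # body: set of (predicate, argument) conditions that must all hold.
    return lambda facts: int(body <= facts)

# Facts extracted from two hypothetical sentences containing "ask":
examples = [
    {("expr", "ask out"), ("bag", "dinner"), ("bag", "tonight")},
    {("expr", "ask about"), ("bag", "price")},
]

# Two clause bodies (H) found by the descriptive-ILP step:
clauses = [
    {("expr", "ask out"), ("bag", "dinner")},  # analogue of f1 above
    {("bag", "price")},
]

features = [make_feature(b) for b in clauses]
vectors = [[f(facts) for f in features] for facts in examples]
print(vectors)  # [[1, 0], [0, 1]]
```

Each row of `vectors` is the boolean vector handed to the SVM for one example, alongside the shallow features.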

3 Experiments

We investigate the performance of two kinds of ILP-based models for WSD:

1. ILP models (USP-IBM-1 system): models constructed by an ILP system for predicting the correct sense of a word.

2. ILP-assisted models (USP-IBM-2 system): models constructed by SVMs for predicting the correct sense of a word that, in addition to existing shallow features, use features built by an ILP system according to the specification for feature construction in Section 2.

The data for the English Lexical Sample task in SemEval-2007 consists of 65 verbs and 35 nouns. Examples containing those words were extracted from the WSJ Penn Treebank II and the Brown corpus. The number of training/test examples varies from 19/2 to 2,536/541 (average = 222.8/48.5). The senses of the examples were annotated according to OntoNotes tags, which are groupings of WordNet senses, and therefore are more coarse-grained. The number of senses used in the training examples for a given word varies from 1 to 13 (average = 3.6). First-order clauses representing the following background knowledge sources, which were automatically extracted from corpora and lexical resources or provided by NLP tools, were used to describe the target words in both systems:

B1. Unigrams consisting of the 5 words to the right and left of the target word.

B2. 5 content words to the right and left of the target word.

B3. Part-of-speech tags of 5 words to the right and left of the target word.

B4. Syntactic relations with respect to the target word. If that word is a verb, subject and object syntactic relations are represented. If it is a noun, the representation includes the verb of which it is a subject or object, and the verb/noun it modifies.

B5. 12 collocations with respect to the target word: the target word itself, the 1st preposition to the right, the 1st and 2nd words to the left and right, and the 1st noun, 1st adjective, and 1st verb to the left and right.

B6. A relative count of the overlapping words in the sense inventory definitions of each of the possible senses of the target word and the words surrounding that target word in the sentence, according to the sense inventories provided.
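The overlap count in B6 can be sketched as a Lesk-style comparison between each sense definition and the sentence context. The definitions and sense labels below are invented for illustration; they are not from the OntoNotes inventory.

```python
# Sketch of a B6-style relative overlap count: for each candidate sense,
# count words shared between its inventory definition and the sentence
# context, then normalise across senses.

def overlap_scores(definitions, context_words):
    """Relative overlap between each sense definition and the context."""
    context = set(context_words)
    counts = {sense: len(set(gloss.split()) & context)
              for sense, gloss in definitions.items()}
    total = sum(counts.values()) or 1   # avoid division by zero
    return {sense: c / total for sense, c in counts.items()}

definitions = {
    "ask.1": "request an answer to a question",
    "ask.2": "invite someone to a social event",
}
context = ["she", "wanted", "to", "invite", "him", "to", "dinner"]
print(overlap_scores(definitions, context))
```

A real implementation would at least remove stopwords before counting; the relative (normalised) form is what makes the counts comparable across words with different gloss lengths.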

B7. If the target word is a verb, its selectional restrictions, defined in terms of the semantic features of its arguments in the sentence, as given by LDOCE. WordNet relations are used to make the verification more generic, and a hierarchy of feature types is used to account for different levels of specificity in the restrictions.

B8. If the target word is a verb, the phrasal verbs possibly occurring in the sentence, according to the list of phrasal verbs given by dictionaries.

B9. Pairs of words in the sentence that occur frequently in the corpus, related by verb-subject/object or subject/verb/object-modifier relations.

B10. Bigrams consisting of adjacent words in a sentence occurring frequently in the corpus.
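The window-based sources (B1-B3) amount to slicing tokens and tags around the target position. A minimal sketch, with an invented sentence and hand-assigned POS tags:

```python
# Sketch of extracting B1-style unigram and B3-style POS windows
# around a target word. Sentence and tags are illustrative only.

def window(items, target_index, size=5):
    """Items up to `size` positions to the left and right of the target."""
    left = items[max(0, target_index - size):target_index]
    right = items[target_index + 1:target_index + 1 + size]
    return left, right

tokens = ["she", "decided", "to", "ask", "him", "out", "for", "dinner"]
pos    = ["PRP", "VBD", "TO", "VB", "PRP", "RP", "IN", "NN"]
target = tokens.index("ask")

b1_left, b1_right = window(tokens, target)   # B1: surrounding unigrams
b3_left, b3_right = window(pos, target)      # B3: surrounding POS tags
print(b1_left)   # ['she', 'decided', 'to']
print(b1_right)  # ['him', 'out', 'for', 'dinner']
```

B2 would apply the same slicing after filtering the token list down to content words, and B5 picks specific positions and categories out of these windows.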

Of these 10 sources, B1-B6 correspond to the so-called "shallow features", in the sense that they can be straightforwardly represented by feature vectors.

A feature vector representation of these sources is built to be used by the feature-based model constructor. Clausal definitions for B1-B10 are directly used by the ILP system.

We use the Aleph ILP system (Srinivasan, 1999) to construct disambiguation models in USP-IBM-1 and to construct features to be used in USP-IBM-2. Feature-based model construction in the USP-IBM-2 system is performed by a linear SVM (the SMO implementation in WEKA).

In the USP-IBM-1 system, for each target word, equipped with examples and background knowledge definitions (B1-B10), Aleph constructs a set of clauses in line with the specifications for predictive ILP described in Section 2. Positive examples are provided by the correct sense of the target word; negative examples are generated automatically using all the other senses. 3-fold cross-validation on the training data was used to obtain unbiased estimates of the predictive accuracy of the models for a set of relevant parameters. The best average accuracies were obtained with the greedy induction strategy, in conjunction with a minimal clause accuracy of 2. The constructed clauses were used to predict the senses in the test data following the order of their production, in a decision-list-like manner, with a default rule appended at the end assigning the majority sense to those cases not covered by any other rule.
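The decision-list application of the induced clauses can be sketched as follows: rules are tried in order of production, and a final default rule assigns the majority sense. The rule bodies and sense labels are invented examples.

```python
# Sketch of applying induced clauses as an ordered decision list
# with a majority-sense default rule at the end.

def predict(rules, facts, majority_sense):
    for body, sense in rules:       # try clauses in order of production
        if body <= facts:           # clause body holds for this instance
            return sense
    return majority_sense           # default rule: majority sense

rules = [
    ({("expr", "ask out")}, "ask.2"),
    ({("bag", "question")}, "ask.1"),
]

print(predict(rules, {("expr", "ask out"), ("bag", "dinner")}, "ask.1"))  # ask.2
print(predict(rules, {("bag", "movie")}, "ask.1"))                        # ask.1
```

Ordering by production rather than re-sorting the rules preserves Aleph's greedy search bias: earlier clauses were induced on the full example set, later ones on the residue.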

In the USP-IBM-2 system, for constructing the "good" features for each target word from B1-B10 (the "ILP-based features"), we first selected, in Aleph, the clauses covering at least 1 positive example. 3-fold cross-validation on the training data was performed in order to obtain the best model possible using SVM with the features in B1-B6 and the ILP-based features. A feature selection method based on information gain with various percentages of features to be selected (1/64, ..., 1/2) was used, which resulted in different numbers of features for each target word.
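Information-gain feature selection over the boolean feature vectors can be sketched as below; the paper does not specify its exact implementation, so this follows the standard entropy-reduction definition, with invented toy data.

```python
# Rough sketch of information-gain ranking used to keep a fraction
# (1/64, ..., 1/2) of the boolean features.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """Entropy reduction from splitting on one boolean feature column."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        if subset:
            gain -= (len(subset) / n) * entropy(subset)
    return gain

def select(vectors, labels, fraction):
    n_features = len(vectors[0])
    gains = [info_gain([v[j] for v in vectors], labels)
             for j in range(n_features)]
    keep = max(1, int(n_features * fraction))
    return sorted(range(n_features), key=lambda j: -gains[j])[:keep]

vectors = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = ["s1", "s1", "s2", "s2"]
print(select(vectors, labels, 0.5))  # [0]: feature 0 perfectly predicts the sense
```

Running the selection at several fractions and keeping the one with the best cross-validated SVM accuracy reproduces the per-word variation in feature counts the paper reports.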

Accuracy of the USP-IBM-1 system:

Nouns 0.882
Verbs 0.817
All   0.851
