
Machine learning: Trends, perspectives, and prospects

M. I. Jordan (Department of Electrical Engineering and Computer Sciences and Department of Statistics, University of California, Berkeley, CA, USA) and T. M. Mitchell (Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA)

Despite practical challenges, we are hopeful that informed discussions among policy-makers and the public about data and the capabilities of machine learning will lead to insightful designs of programs and policies that can balance the goals of protecting privacy and ensuring fairness with those of reaping the benefits to scientific research and to individual and public health. Our commitments to privacy and fairness are evergreen, but our policy choices must adapt to advance them, and support new techniques for deepening our knowledge.


Machine learning is a discipline focused on two interrelated questions: How can one construct computer systems that automatically improve through experience? And what are the fundamental statistical-computational-information-theoretic laws that govern all learning systems, including computers, humans, and organizations? The study of machine learning is important both for addressing these fundamental scientific and engineering questions and for the highly practical computer software it has produced and fielded across many applications.

Machine learning has progressed dramatically over the past two decades, from laboratory curiosity to a practical technology in widespread commercial use. Within artificial intelligence (AI), machine learning has emerged as the method of choice for developing practical software for computer vision, speech recognition, natural language processing, robot control, and other applications. Many developers of AI systems now recognize that, for many applications, it can be far easier to train a system by showing it examples of desired input-output behavior than to program it manually by anticipating the desired response for all possible inputs. The effect of machine learning has also been felt broadly across computer science and across a range of industries concerned with data-intensive issues, such as consumer services, the diagnosis of faults in complex systems, and the control of logistics chains. There has been a similarly broad range of effects across the empirical sciences, from biology to cosmology to social science, as machine-learning methods have been developed to analyze high-throughput experimental data in novel ways. See Fig. 1 for a depiction of some recent areas of application of machine learning.

A learning problem can be defined as the problem of improving some measure of performance when executing some task, through some type of training experience. For example, in learning to detect credit-card fraud, the task is to assign a label of “fraud” or “not fraud” to any given credit-card transaction. The performance metric to be improved might be the accuracy of this fraud classifier, and the training experience might consist of a collection of historical credit-card transactions, each labeled in retrospect as fraudulent or not. Alternatively, one might define a different performance metric that assigns a higher penalty when “fraud” is labeled “not fraud” than when “not fraud” is incorrectly labeled “fraud.” One might also define a different type of training experience, for example, by including unlabeled credit-card transactions along with labeled examples.
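To make the decomposition into task, performance metric, and training experience concrete, here is a minimal sketch; the toy labels, the hypothetical classifier outputs, and the cost values are all invented for illustration, and the asymmetric metric mirrors the higher penalty for labeling “fraud” as “not fraud.”

```python
# Sketch: one learning task, two different performance metrics (toy data).
labels      = [0, 0, 0, 1, 0, 1, 0, 0]   # historical truth: 1 = fraud
predictions = [0, 0, 1, 0, 0, 1, 0, 0]   # a hypothetical classifier's output

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def asymmetric_cost(y_true, y_pred, missed_fraud_cost=10.0, false_alarm_cost=1.0):
    # Penalize "fraud" labeled "not fraud" more heavily than the reverse.
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += missed_fraud_cost
        elif t == 0 and p == 1:
            cost += false_alarm_cost
    return cost

print(accuracy(labels, predictions))        # 0.75
print(asymmetric_cost(labels, predictions)) # 11.0: one missed fraud + one false alarm
```

Changing only the metric changes which classifier counts as better, even on identical data.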

A diverse array of machine-learning algorithms has been developed to cover the wide variety of data and problem types exhibited across different machine-learning problems (1, 2). Conceptually, machine-learning algorithms can be viewed as searching through a large space of candidate programs, guided by training experience, to find a program that optimizes the performance metric. Machine-learning algorithms vary greatly, in part by the way in which they represent candidate programs (e.g., decision trees, mathematical functions, and general programming languages) and in part by the way in which they search through this space of programs (e.g., optimization algorithms with well-understood convergence guarantees and evolutionary search methods that evaluate successive generations of randomly mutated programs). Here, we focus on approaches that have been particularly successful to date.

Many algorithms focus on function approximation problems, where the task is embodied in a function (e.g., given an input transaction, output a “fraud” or “not fraud” label), and the learning problem is to improve the accuracy of that function, with experience consisting of a sample of known input-output pairs of the function. In some cases, the function is represented explicitly as a parameterized functional form; in other cases, the function is implicit and obtained via a search process, a factorization, an optimization procedure, or a simulation-based procedure. Even when implicit, the function generally depends on parameters or other tunable degrees of freedom, and training corresponds to finding values for these parameters that optimize the performance metric.
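As a minimal sketch of the explicitly parameterized case, the code below represents the fraud function as a logistic model with two tunable parameters and trains it by gradient descent; the single feature, the learning rate, and the iteration count are arbitrary assumptions, not details from the article.

```python
import math, random

random.seed(0)
# Invented data: one feature (a scaled transaction amount); label 1 = fraud.
data = [(random.gauss(3.0, 1.0), 1) for _ in range(50)] + \
       [(random.gauss(0.0, 1.0), 0) for _ in range(50)]

w, b = 0.0, 0.0                      # the tunable degrees of freedom
for _ in range(500):                 # training = optimizing the parameters
    gw = gb = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))   # predicted P(fraud)
        gw += (p - y) * x                          # gradient of the log loss
        gb += (p - y)
    w -= 0.1 * gw / len(data)
    b -= 0.1 * gb / len(data)

accuracy = sum((1 / (1 + math.exp(-(w * x + b))) > 0.5) == bool(y)
               for x, y in data) / len(data)
print(w, b, accuracy)                # the learned parameters and training accuracy
```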

Whatever the learning algorithm, a key scientific and practical goal is to theoretically characterize the capabilities of specific learning algorithms and the inherent difficulty of any given learning problem: How accurately can the algorithm learn from a particular type and volume of training data? How robust is the algorithm to errors in its modeling assumptions or to errors in the training data? Given a learning problem with a given volume of training data, is it possible to design a successful algorithm, or is this learning problem fundamentally intractable? Such theoretical characterizations of machine-learning algorithms and problems typically make use of the familiar frameworks of statistical decision theory and computational complexity theory. In fact, attempts to characterize machine-learning algorithms theoretically have led to blends of statistical and computational theory in which the goal is to simultaneously characterize the sample complexity (how much data are required to learn accurately) and the computational complexity (how much computation is required) and to specify how these depend on features of the learning algorithm, such as the representation it uses for what it learns (3–6). A specific form of computational analysis that has proved particularly useful in recent years has been that of optimization theory, with upper and lower bounds on rates of convergence of optimization procedures merging well with the formulation of machine-learning problems as the optimization of a performance metric (7, 8).
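One standard sample-complexity statement of the kind referenced here, included as an illustrative example rather than a result from this article, is the finite-hypothesis-class bound obtained from Hoeffding's inequality and a union bound: with probability at least $1-\delta$ over a sample of $n$ training examples,

```latex
\operatorname{err}(\hat f) \;\le\; \widehat{\operatorname{err}}(\hat f)
  \;+\; \sqrt{\frac{\log\lvert\mathcal{H}\rvert + \log(2/\delta)}{2n}}
```

where $\mathcal{H}$ is the space of candidate programs, $\widehat{\operatorname{err}}$ is training error, and $\operatorname{err}$ is true error; the $\log\lvert\mathcal{H}\rvert$ term is one concrete way the learner's representation enters the sample-complexity analysis.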

As a field of study, machine learning sits at the crossroads of computer science, statistics, and a variety of other disciplines concerned with automatic improvement over time, and inference and decision-making under uncertainty. Related disciplines include the psychological study of human learning, the study of evolution, adaptive control theory, the study of educational practices, neuroscience, organizational behavior, and economics. Although the past decade has seen increased crosstalk with these other fields, we are just beginning to tap the potential synergies and the diversity of formalisms and experimental methods used across these multiple fields for studying systems that improve with experience.

Drivers of machine-learning progress

The past decade has seen rapid growth in the ability of networked and mobile computing systems to gather and transport vast amounts of data, a phenomenon often referred to as “Big Data.” The scientists and engineers who collect such data have often turned to machine learning for solutions to the problem of obtaining useful insights, predictions, and decisions from such data sets. Indeed, the sheer size of the data makes it essential to develop scalable procedures that blend computational and statistical considerations, but the issue is more than the mere size of modern data sets; it is the granular, personalized nature of much of these data. Mobile devices and embedded computing permit large amounts of data to be gathered about individual humans, and machine-learning algorithms can learn from these data to customize their services to the needs and circumstances of each individual. Moreover, these personalized services can be connected, so that an overall service emerges that takes advantage of the wealth and diversity of data from many individuals while still customizing to the needs and circumstances of each. Instances of this trend toward capturing and mining large quantities of data to improve services and productivity can be found across many fields of commerce, science, and government. Historical medical records are used to discover which patients will respond best to which treatments; historical traffic data are used to improve traffic control and reduce congestion; historical crime data are used to help allocate local police to specific locations at specific times; and large experimental data sets are captured and curated to accelerate progress in biology, astronomy, neuroscience, and other data-intensive empirical sciences. We appear to be at the beginning of a decades-long trend toward increasingly data-intensive, evidence-based decision-making across many aspects of science, commerce, and government.

Fig. 1. Applications of machine learning. Machine learning is having a substantial effect on many areas of technology and science; examples of recent applied success stories include robotics and autonomous vehicle control (top left), speech processing and natural language processing (top right), neuroscience research (middle), and applications in computer vision (bottom). [The middle panel is adapted from (29). The images in the bottom panel are from the ImageNet database; object recognition annotation is by R. Girshick.]

With the increasing prominence of large-scale data in all areas of human endeavor has come a wave of new demands on the underlying machine-learning algorithms. For example, huge data sets require computationally tractable algorithms, highly personal data raise the need for algorithms that minimize privacy effects, and the availability of huge quantities of unlabeled data raises the challenge of designing learning algorithms to take advantage of it. The next sections survey some of the effects of these demands on recent work in machine-learning algorithms, theory, and practice.

Core methods and recent progress

The most widely used machine-learning methods are supervised learning methods (1). Supervised learning systems, including spam classifiers of e-mail, face recognizers over images, and medical diagnosis systems for patients, all exemplify the function approximation problem discussed earlier, where the training data take the form of a collection of (x, y) pairs and the goal is to produce a prediction y* in response to a query x*. The inputs x may be classical vectors or they may be more complex objects such as documents, images, DNA sequences, or graphs. Similarly, many different kinds of output y have been studied. Much progress has been made by focusing on the simple binary classification problem in which y takes on one of two values (for example, “spam” or “not spam”), but there has also been abundant research on problems such as multiclass classification (where y takes on one of K labels), multilabel classification (where y is labeled simultaneously by several of the K labels), ranking problems (where y provides a partial order on some set), and general structured prediction problems (where y is a combinatorial object such as a graph, whose components may be required to satisfy some set of constraints). An example of the latter problem is part-of-speech tagging, where the goal is to simultaneously label every word in an input sentence x as being a noun, verb, or some other part of speech. Supervised learning also includes cases in which y has real-valued components or a mixture of discrete and real-valued components.

Supervised learning systems generally form their predictions via a learned mapping f(x), which produces an output y for each input x (or a probability distribution over y given x). Many different forms of mapping f exist, including decision trees, decision forests, logistic regression, support vector machines, neural networks, kernel machines, and Bayesian classifiers (1). A variety of learning algorithms has been proposed to estimate these different types of mappings, and there are also generic procedures such as boosting and multiple kernel learning that combine the outputs of multiple learning algorithms. Procedures for learning f from data often make use of ideas from optimization theory or numerical analysis, with the specific form of machine-learning problems (e.g., that the objective function or function to be integrated is often the sum over a large number of terms) driving innovations. This diversity of learning architectures and algorithms reflects the diverse needs of applications, with different architectures capturing different kinds of mathematical structures, offering different levels of amenability to post hoc visualization and explanation, and providing varying trade-offs between computational complexity, the amount of data, and performance.
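As a hedged sketch of how different forms of the mapping f sit behind a common train-and-predict interface, the snippet below fits three of the forms named above on synthetic data; it assumes the scikit-learn library, which the article itself does not mention, and the dataset and hyperparameters are arbitrary.

```python
# Sketch assuming scikit-learn; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "logistic regression": LogisticRegression(max_iter=1000),
    "boosting": GradientBoostingClassifier(),   # combines many weak trees
}
for name, model in models.items():
    model.fit(X_tr, y_tr)                       # estimate the mapping f
    print(name, model.score(X_te, y_te))        # held-out accuracy
```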

One high-impact area of progress in supervised learning in recent years involves deep networks, which are multilayer networks of threshold units, each of which computes some simple parameterized function of its inputs (9, 10). Deep learning systems make use of gradient-based optimization algorithms to adjust parameters throughout such a multilayered network based on errors at its output. Exploiting modern parallel computing architectures, such as graphics processing units originally developed for video gaming, it has been possible to build deep learning systems that contain billions of parameters and that can be trained on the very large collections of images, videos, and speech samples available on the Internet. Such large-scale deep learning systems have had a major effect in recent years in computer vision (11) and speech recognition (12), where they have yielded major improvements in performance over previous approaches (see Fig. 2). Deep network methods are being actively pursued in a variety of additional applications, from natural language translation to collaborative filtering.

Fig. 2. Automatic generation of text captions for images with deep networks. A convolutional neural network is trained to interpret images, and its output is then used by a recurrent neural network trained to generate a text caption (top). The sequence at the bottom shows the word-by-word focus of the network on different parts of the input image as it generates the caption. [Adapted with permission from (30)]
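The sketch below is a deliberately tiny illustration of these ideas, not a production system: a two-layer network whose parameters are adjusted by gradient-based optimization (backpropagation) on a toy XOR problem, using only NumPy; the layer width, learning rate, and iteration count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)   # toy inputs
y = np.array([[0], [1], [1], [0]], float)               # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)           # hidden-layer parameters
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)           # output-layer parameters
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                            # forward pass
    p = sigmoid(h @ W2 + b2)
    # Backpropagate the output error (log loss) through both layers.
    dp = p - y
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = (dp @ W2.T) * (1 - h**2)
    dW1, db1 = X.T @ dh, dh.sum(0)
    for P, G in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        P -= 0.1 * G                                    # gradient step

print(np.round(p.ravel(), 2))                           # typically close to [0, 1, 1, 0]
```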

The internal layers of deep networks can be viewed as providing learned representations of the input data. While much of the practical success in deep learning has come from supervised learning methods for discovering such representations, efforts have also been made to develop deep learning algorithms that discover useful representations of the input without the need for labeled training data (13). The general problem is referred to as unsupervised learning, a second paradigm in machine-learning research (2).

Broadly, unsupervised learning generally involves the analysis of unlabeled data under assumptions about structural properties of the data (e.g., algebraic, combinatorial, or probabilistic). For example, one can assume that data lie on a low-dimensional manifold and aim to identify that manifold explicitly from data. Dimension reduction methods, including principal components analysis, manifold learning, factor analysis, random projections, and autoencoders (1, 2), make different specific assumptions regarding the underlying manifold (e.g., that it is a linear subspace, a smooth nonlinear manifold, or a collection of submanifolds). Another example of dimension reduction is the topic modeling framework depicted in Fig. 3. A criterion function is defined that embodies these assumptions, often making use of general statistical principles such as maximum likelihood, the method of moments, or Bayesian integration, and optimization or sampling algorithms are developed to optimize the criterion. As another example, clustering is the problem of finding a partition of the observed data (and a rule for predicting future data) in the absence of explicit labels indicating a desired partition. A wide range of clustering procedures has been developed, all based on specific assumptions regarding the nature of a “cluster.” In both clustering and dimension reduction, the concern with computational complexity is paramount, given that the goal is to exploit the particularly large data sets that are available if one dispenses with supervised labels.
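As a minimal sketch of the linear-subspace assumption, the code below implements principal components analysis via a singular value decomposition and recovers a two-dimensional subspace from invented three-dimensional data; the data-generating model is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented data: 200 points near a 2D plane embedded in 3D, plus noise.
latent = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 3))
X = latent @ A + 0.05 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)                 # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                       # coordinates on the learned 2D subspace

explained = (S[:2]**2).sum() / (S**2).sum()
print(f"variance captured by 2 components: {explained:.3f}")  # close to 1.0
```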

A third major machine-learning paradigm is reinforcement learning (14, 15). Here, the information available in the training data is intermediate between supervised and unsupervised learning. Instead of training examples that indicate the correct output for a given input, the training data in reinforcement learning are assumed to provide only an indication as to whether an action is correct or not; if an action is incorrect, there remains the problem of finding the correct action. More generally, in the setting of sequences of inputs, it is assumed that reward signals refer to the entire sequence; the assignment of credit or blame to individual actions in the sequence is not directly provided. Indeed, although simplified versions of reinforcement learning known as bandit problems are studied, where it is assumed that rewards are provided after each action, reinforcement learning problems typically involve a general control-theoretic setting in which the learning task is to learn a control strategy (a “policy”) for an agent acting in an unknown dynamical environment, where that learned strategy is trained to choose actions for any given state, with the objective of maximizing its expected reward over time. The ties to research in control theory and operations research have increased over the years, with formulations such as Markov decision processes and partially observed Markov decision processes providing points of contact (15, 16). Reinforcement-learning algorithms generally make use of ideas that are familiar from the control-theory literature, such as policy iteration, value iteration, rollouts, and variance reduction, with innovations arising to address the specific needs of machine learning (e.g., large-scale problems, few assumptions about the unknown dynamical environment, and the use of supervised learning architectures to represent policies). It is also worth noting the strong ties between reinforcement learning and many decades of work on learning in psychology and neuroscience, one notable example being the use of reinforcement learning algorithms to predict the response of dopaminergic neurons in monkeys learning to associate a stimulus light with subsequent sugar reward (17).
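The following is a small self-contained sketch of these ideas rather than any specific algorithm discussed above: tabular Q-learning on an invented five-state chain in which only the final state yields reward, so credit must propagate backward through a sequence of actions; all constants are arbitrary.

```python
import random

random.seed(0)
N = 5                                   # states 0..4; reward only at state 4

def step(s, a):                         # invented MDP: a=1 moves right, a=0 left
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N - 1 else 0.0)

Q = [[0.0, 0.0] for _ in range(N)]      # learned action values
for _ in range(2000):                   # episodes
    s = 0
    while s != N - 1:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        a = random.randrange(2) if random.random() < 0.1 \
            else max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # Q-learning update: propagate delayed reward backward along the chain.
        Q[s][a] += 0.5 * (r + 0.9 * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N)]
print(policy)                           # learned policy: move right everywhere
```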

Fig. 3. Topic models. Topic modeling is a methodology for analyzing documents, where a document is viewed as a collection of words, and the words in the document are viewed as being generated by an underlying set of topics (denoted by the colors in the figure). Topics are probability distributions across words (leftmost column), and each document is characterized by a probability distribution across topics (histogram). These distributions are inferred based on the analysis of a collection of documents and can be viewed to classify, index, and summarize the content of documents. [From (31). Copyright 2012, Association for Computing Machinery, Inc. Reprinted with permission]

Although these three learning paradigms help to organize ideas, much current research involves blends across these categories. For example, semi-supervised learning makes use of unlabeled data to augment labeled data in a supervised learning context, and discriminative training blends architectures developed for unsupervised learning with optimization formulations that make use of labels. Model selection is the broad activity of using training data not only to fit a model but also to select from a family of models, and the fact that training data do not directly indicate which model to use leads to the use of algorithms developed for bandit problems and to Bayesian optimization procedures. Active learning arises when the learner is allowed to choose data points and query the trainer to request targeted information, such as the label of an otherwise unlabeled example. Causal modeling is the effort to go beyond simply discovering predictive relations among variables, to distinguish which variables causally influence others (e.g., a high white-blood-cell count can predict the existence of an infection, but it is the infection that causes the high white-cell count). Many issues influence the design of learning algorithms across all of these paradigms, including whether data are available in batches or arrive sequentially over time, how data have been sampled, requirements that learned models be interpretable by users, and robustness issues that arise when data do not fit prior modeling assumptions.
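As a toy illustration of the bandit view of model selection mentioned above, the sketch below allocates evaluation trials among three hypothetical models with an epsilon-greedy rule; the models' true accuracies and the exploration rate are invented, and each “pull” simulates evaluating the chosen model on a fresh example.

```python
import random

random.seed(0)
true_accuracy = [0.70, 0.80, 0.90]      # invented; unknown to the algorithm
counts = [0, 0, 0]
means = [0.0, 0.0, 0.0]

for t in range(3000):
    if random.random() < 0.1:                        # explore a random model
        arm = random.randrange(3)
    else:                                            # exploit current best estimate
        arm = max(range(3), key=lambda a: means[a])
    reward = 1.0 if random.random() < true_accuracy[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]   # running average

print(counts)                            # most trials go to the best model
print([round(m, 2) for m in means])
```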

Emerging trends

The field of machine learning is sufficiently young that it is still rapidly expanding, often by inventing new formalizations of machine-learning problems driven by practical applications. (An example is the development of recommendation systems, as described in Fig. 4.) One major trend driving this expansion is a growing concern with the environment in which a machine-learning algorithm operates. The word “environment” here refers in part to the computing architecture; whereas a classical machine-learning system involved a single program running on a single machine, it is now common for machine-learning systems to be deployed in architectures that include many thousands or tens of thousands of processors, such that communication constraints and issues of parallelism and distributed processing take center stage. Indeed, as depicted in Fig. 5, machine-learning systems are increasingly taking the form of complex collections of software that run on large-scale parallel and distributed computing platforms and provide a range of algorithms and services to data analysts.

The word “environment” also refers to the source of the data, which ranges from a set of people who may have privacy or ownership concerns, to the analyst or decision-maker who may have certain requirements on a machine-learning system (for example, that its output be visualizable), and to the social, legal, or political framework surrounding the deployment of a system. The environment also may include other machine-learning systems or other agents, and the overall collection of systems may be cooperative or adversarial. Broadly speaking, environments provide various resources to a learning algorithm and place constraints on those resources. Increasingly, machine-learning researchers are formalizing these relationships, aiming to design algorithms that are provably effective in various environments and explicitly allow users to express and control trade-offs among resources.

As an example of resource constraints, let us suppose that the data are provided by a set of individuals who wish to retain a degree of privacy. Privacy can be formalized via the notion of “differential privacy,” which defines a probabilistic channel between the data and the outside world such that an observer of the output of the channel cannot infer reliably whether particular individuals have supplied data or not (18). Classical applications of differential privacy have involved ensuring that queries (e.g., “what is the maximum balance across a set of accounts?”) to a privatized database return an answer that is close to that returned on the nonprivate data. Recent research has brought differential privacy into contact with machine learning, where queries involve predictions or other inferential assertions (e.g., “given the data I've seen so far, what is the probability that a new transaction is fraudulent?”) (19, 20). Placing the overall design of a privacy-enhancing machine-learning system within a decision-theoretic framework provides users with a tuning knob whereby they can choose a desired level of privacy that takes into account the kinds of questions that will be asked of the data and their own personal utility for the answers. For example, a person may be willing to reveal most of their genome in the context of research on a disease that runs in their family but may ask for more stringent protection if information about their genome is being used to set insurance rates.
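A standard concrete instance of such a probabilistic channel is the Laplace mechanism of (18); the sketch below privatizes a simple counting query (chosen because a count has sensitivity 1), with the account data and the epsilon value invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
balances = rng.integers(0, 10_000, size=1000)       # invented account data

def private_count_over(threshold, epsilon=0.5):
    """Differentially private count of accounts above a threshold.

    Adding or removing one person's data changes a count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon yields
    epsilon-differential privacy.
    """
    true_count = int((balances > threshold).sum())
    return true_count + rng.laplace(scale=1.0 / epsilon)

print(private_count_over(9000))   # a noisy answer, close to the true count
```

Smaller values of epsilon give a stronger privacy guarantee and a noisier answer, which is exactly the tuning knob described above.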

Communication is another resource that needs to be managed within the overall context of a distributed learning system. For example, data may be distributed across distinct physical locations because their size does not allow them to be aggregated at a single site or because of administrative boundaries. In such a setting, we may wish to impose a bit-rate communication constraint on the machine-learning algorithm. Solving the design problem under such a constraint will generally show how the performance of the learning system degrades as communication bandwidth decreases, but it can also reveal how the performance improves as the number of distributed sites (e.g., machines or processors) increases, trading off these quantities against the amount of data (21, 22). Much as in classical information theory, this line of research aims at fundamental lower bounds on achievable performance and specific algorithms that achieve those lower bounds.
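One very simple point in this design space, sketched below on invented data, is one-shot averaging: each site fits a local estimate and transmits only its parameters, so the communication cost is a few numbers per site rather than the raw data. This is a simplification in the spirit of the divide-and-average schemes studied in work such as (22), not the actual protocols analyzed there.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 4.2
sites = [true_mean + rng.normal(size=10_000) for _ in range(8)]  # data stays local

# Each site communicates one number (its local estimate), not 10,000 points.
local_estimates = [s.mean() for s in sites]
global_estimate = np.mean(local_estimates)

print(global_estimate)                 # matches pooling here: sites are equal-sized
print(np.concatenate(sites).mean())    # centralized answer, for comparison
```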

A major goal of this general line of research is to bring the kinds of statistical resources studied in machine learning (e.g., number of data points, dimension of a parameter, and complexity of a hypothesis class) into contact with the classical computational resources of time and space. Such a bridge is present in the “probably approximately correct” (PAC) learning framework, which studies the effect of adding a polynomial-time computation constraint on this relationship among error rates, training data size, and other parameters of the learning algorithm (3). Recent advances in this line of research include various lower bounds that establish fundamental gaps in the performance achievable in certain machine-learning problems (e.g., sparse regression and sparse principal components analysis) via polynomial-time and exponential-time algorithms (23). The core of the problem, however, involves time-data trade-offs that are far from the polynomial/exponential boundary. The large data sets that are increasingly the norm require algorithms whose time and space requirements are linear or sublinear in the problem size (number of data points or number of dimensions). Recent research focuses on methods such as subsampling, random projections, and algorithm weakening to achieve scalability while retaining statistical control (24, 25). The ultimate goal is to be able to supply time and space budgets to machine-learning systems in addition to accuracy requirements, with the system finding an operating point that allows such requirements to be realized.
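As a small sketch of the random-projection idea, the code below compresses invented 10,000-dimensional data to 200 dimensions with a random linear map and checks that a pairwise distance is approximately preserved; both dimensions are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10_000))                 # invented high-dimensional data

k = 200                                           # reduced dimension (arbitrary)
R = rng.normal(size=(10_000, k)) / np.sqrt(k)     # random projection matrix
Y = X @ R                                         # cheap sketch of the data

# Johnson-Lindenstrauss-style check: distances survive the projection.
i, j = 3, 17
print(np.linalg.norm(X[i] - X[j]))                # original distance
print(np.linalg.norm(Y[i] - Y[j]))                # roughly the same
```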

Opportunities and challenges

Despite its practical and commercial successes, machine learning remains a young field with many underexplored research opportunities. Some of these opportunities can be seen by contrasting current machine-learning approaches to the types of learning we observe in naturally occurring systems such as animals, organizations, economies, and biological evolution. For example, whereas most machine-learning algorithms are targeted to learn one specific function or data model from one single data source, humans clearly learn many different skills and types of knowledge, from years of diverse training experience, supervised and unsupervised, in a simple-to-more-difficult sequence (e.g., learning to crawl, then walk, then run). This has led some researchers to begin exploring the question of how to construct computer lifelong or never-ending learners that operate nonstop for years, learning thousands of interrelated skills or functions within an overall architecture that allows the system to improve its ability to learn one skill based on having learned another (26–28). Another aspect of the analogy to natural learning systems suggests the idea of team-based, mixed-initiative learning. For example, whereas current machine-learning systems typically operate in isolation to analyze the given data, people often work in teams to collect and analyze data (e.g., biologists have worked as teams to collect and analyze genomic data, bringing together diverse experiments and perspectives to make progress on this difficult problem). New machine-learning methods capable of working collaboratively with humans to jointly analyze complex data sets might bring together the abilities of machines to tease out subtle statistical regularities from massive data sets with the abilities of humans to draw on diverse background knowledge to generate plausible explanations and suggest new hypotheses. Many theoretical results in machine learning apply to all learning systems, whether they are computer algorithms, animals, organizations, or natural evolution. As the field progresses, we may see machine-learning theory and algorithms increasingly providing models for understanding learning in neural systems, and we may see machine learning benefit from ongoing studies of these other types of learning systems.

Fig. 4. Recommendation systems. A recommendation system is a machine-learning system that is based on data that indicate links between a set of users (e.g., people) and a set of items (e.g., products). A link between a user and a product means that the user has indicated an interest in the product in some fashion (perhaps by purchasing that item in the past). The machine-learning problem is to suggest other items to a given user that he or she may also be interested in, based on the data across all users.
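To make the learning problem in Fig. 4 concrete, the sketch below factorizes a tiny invented user-item interaction matrix with a truncated singular value decomposition and scores unobserved items for one user; real recommendation systems treat missing entries and implicit feedback far more carefully than this.

```python
import numpy as np

# Invented user-item matrix: 1 = user showed interest, 0 = no interaction yet.
R = np.array([[1, 1, 0, 0, 1],
              [1, 0, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 1, 0]], float)

U, S, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                          # latent dimension (arbitrary)
scores = U[:, :k] @ np.diag(S[:k]) @ Vt[:k]    # low-rank reconstruction

user = 1
unseen = np.where(R[user] == 0)[0]
best = unseen[np.argmax(scores[user, unseen])]
print(f"recommend item {best} to user {user}")  # e.g., an item liked by similar user 0
```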

As with any powerful technology, machine learning raises questions about which of its potential uses society should encourage and discourage. The push in recent years to collect new kinds of personal data, motivated by its economic value, leads to obvious privacy issues, as mentioned above. The increasing value of data also raises a second ethical issue: Who will have access to, and ownership of, online data, and who will reap its benefits? Currently, much data are collected by corporations for specific uses leading to improved profits, with little or no motive for data sharing. However, the potential benefits that society could realize, even from existing online data, would be considerable if those data were to be made available for public good.

To illustrate, consider one simple example of how society could benefit from data that are already online today by using these data to decrease the risk of global pandemic spread from infectious diseases. By combining location data from online sources (e.g., location data from cell phones, from credit-card transactions at retail outlets, and from security cameras in public places and private buildings) with online medical data (e.g., emergency room admissions), it would be feasible today to implement a simple system to telephone individuals immediately if a person they were in close contact with yesterday was just admitted to the emergency room with an infectious disease, alerting them to the symptoms they should watch for and precautions they should take. Here, there is clearly a tension and trade-off between personal privacy and public health, and society at large needs to decide how to make this trade-off. The larger point of this example, however, is that, although the data are already online, we do not currently have the laws, customs, culture, or mechanisms to enable society to benefit from them, if it wishes to do so. In fact, much of these data are privately held and owned, even though they are data about each of us. Considerations such as these suggest that machine learning is likely to be one of the most transformative technologies of the 21st century. Although it is impossible to predict the future, it appears essential that society begin now to consider how to maximize its benefits.

REFERENCES

1. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York, 2011).
2. K. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, MA, 2012).
3. L. Valiant, Commun. ACM 27, 1134–1142 (1984).
4. V. Chandrasekaran, M. I. Jordan, Proc. Natl. Acad. Sci. U.S.A. 110, E1181–E1190 (2013).
5. S. Decatur, O. Goldreich, D. Ron, SIAM J. Comput. 29, 854–879 (2000).
6. S. Shalev-Shwartz, O. Shamir, E. Tromer, Using more data to speed up training time, in Proceedings of the Fifteenth Conference on Artificial Intelligence and Statistics, Canary Islands, Spain, 21 to 23 April 2012.
7. S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, in Foundations and Trends in Machine Learning 3 (Now Publishers, Boston, 2011), pp. 1–122.
8. S. Sra, S. Nowozin, S. Wright, Optimization for Machine Learning (MIT Press, Cambridge, MA, 2011).
9. J. Schmidhuber, Neural Netw. 61, 85–117 (2015).
10. Y. Bengio, in Foundations and Trends in Machine Learning 2 (Now Publishers, Boston, 2009), pp. 1–127.
11. A. Krizhevsky, I. Sutskever, G. Hinton, Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).
12. G. Hinton et al., IEEE Signal Process. Mag. 29, 82–97 (2012).
13. G. E. Hinton, R. R. Salakhutdinov, Science 313, 504–507 (2006).
14. V. Mnih et al., Nature 518, 529–533 (2015).
15. R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA, 1998).
16. E. Yaylali, J. S. Ivy, Partially observable MDPs (POMDPs): Introduction and examples, in Encyclopedia of Operations Research and Management Science (John Wiley, New York, 2011).
17. W. Schultz, P. Dayan, P. R. Montague, Science 275, 1593–1599 (1997).
18. C. Dwork, F. McSherry, K. Nissim, A. Smith, in Proceedings of the Third Theory of Cryptography Conference, New York, 4 to 7 March 2006, pp. 265–284.
19. A. Blum, K. Ligett, A. Roth, J. ACM 20 (2013).
20. J. Duchi, M. I. Jordan, J. Wainwright, J. ACM 61, 1–57 (2014).
21. M.-F. Balcan, A. Blum, S. Fine, Y. Mansour, Distributed learning, communication complexity and privacy, in Proceedings of the 29th Conference on Computational Learning Theory, Edinburgh, UK, 26 June to 1 July 2012.
22. Y. Zhang, J. Duchi, M. Jordan, M. Wainwright, in Advances in Neural Information Processing Systems 26, L. Bottou, C. Burges, Z. Ghahramani, M. Welling, Eds. (Curran Associates, Red Hook, NY, 2014), pp. 1–23.
23. Q. Berthet, P. Rigollet, Ann. Stat. 41, 1780–1815 (2013).
24. A. Kleiner, A. Talwalkar, P. Sarkar, M. I. Jordan, J. R. Stat. Soc. B 76, 795–816 (2014).
25. M. Mahoney, Found. Trends Machine Learn. 3, 123–224 (2011).
26. T. Mitchell et al., in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, TX, 25 to 30 January 2015.
27. M. Taylor, P. Stone, J. Mach. Learn. Res. 10, 1633–1685 (2009).
28. S. Thrun, L. Pratt, Learning To Learn (Kluwer Academic Press, Boston, 1998).
29. L. Wehbe et al., PLOS ONE 9, e112575 (2014).
30. K. Xu et al., in Proceedings of the 32nd International Conference on Machine Learning, vol. 37, Lille, France, 6 to 11 July 2015, pp. 2048–2057.
31. D. Blei, Commun. ACM 55, 77–84 (2012).

10.1126/science.aaa8415
