Computer Science Location: MEC 339
Computer Science Colloquium - Steven Skiena
Date: 2019-11-08, 11:00 (America/New_York)

Word and Graph Embeddings for Machine Learning

Abstract:

Distributed word embeddings (word2vec) provide a powerful way to reduce large text corpora to concise features readily applicable to a variety of problems in natural language processing (NLP) and data science. I will introduce word embeddings and review several of our recent efforts to apply them to NLP, including the Polyglot system (entity recognition, POS tagging, and sentiment analysis) for over 100 different languages.

DeepWalk is an approach we have developed to construct vertex embeddings: vector representations of vertices that can be applied to a very general class of problems in data mining and information retrieval. DeepWalk exploits an appealing analogy between sentences as sequences of words and random walks as sequences of vertices to transfer deep learning (unsupervised feature learning) techniques from natural language processing to network analysis. DeepWalk has become extremely popular, having been cited by over 2000 research papers since its publication at KDD 2014. In this talk, I will introduce the notion of graph embeddings, explain how DeepWalk constructs them, and demonstrate why they make such powerful features for machine learning applications.
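
The sentence/random-walk analogy can be illustrated with a minimal Python sketch. This is not the speaker's implementation: the toy graph, the function names (random_walk, build_corpus, skipgram_pairs), and the parameter values are all assumptions chosen for illustration. It only shows how walks over a graph can be treated as "sentences" of vertices and turned into the (center, context) pairs that a word2vec-style skip-gram model would consume; the embedding training itself is omitted.

```python
import random

def random_walk(graph, start, length, rng):
    """Return one truncated random walk of up to `length` vertices."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def build_corpus(graph, walks_per_vertex, walk_length, seed=0):
    """Start several walks at every vertex; each walk acts as a sentence."""
    rng = random.Random(seed)
    corpus = []
    vertices = list(graph)
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # fresh random order of start vertices per pass
        for v in vertices:
            corpus.append(random_walk(graph, v, walk_length, rng))
    return corpus

def skipgram_pairs(walk, window):
    """Yield (center, context) vertex pairs within `window` steps of each other."""
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, walk[j]

# Toy undirected graph as adjacency lists (made-up example data).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

corpus = build_corpus(toy_graph, walks_per_vertex=2, walk_length=5)
pairs = [p for walk in corpus for p in skipgram_pairs(walk, window=2)]
```

In a fuller pipeline, the walks in corpus would be handed to a skip-gram trainer in place of tokenized sentences, so that vertices appearing in similar walk contexts end up with similar vectors.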

About the speaker: 

Steven Skiena is Distinguished Teaching Professor of Computer Science and Director of the Institute for AI-Driven Discovery and Innovation at Stony Brook University. His research interests include data science, bioinformatics, and algorithms. He is the author of six books, including "The Algorithm Design Manual," "The Data Science Design Manual," and "Who's Bigger: Where Historical Figures Really Rank."

Skiena received his B.S. in Computer Science from the University of Virginia (Wahoo-Wa!) and his Ph.D. in Computer Science from the University of Illinois in 1988. He is the author of over 150 technical papers. He is a Fellow of the American Association for the Advancement of Science (AAAS), a former Fulbright scholar, and recipient of the ONR Young Investigator Award and the IEEE Computer Science and Engineering Teaching Award. More info is available at http://www.cs.stonybrook.edu/~skiena/.

Host: Vicente Ordonez-Roman