Comprehensive Course on Natural Language Processing

This eighth-semester course in Computer Science & Engineering covers the fundamentals of Natural Language Processing (NLP) including basics, modeling techniques, syntactic and semantic parsing, information extraction, and machine translation. Prerequisites include knowledge of English grammar, theoretical foundations of Computer Science, and familiarity with Python and open-source tools. The course dives deep into NLP tasks, language models, part-of-speech tagging, sequence labeling, grammar formalisms, word sense disambiguation, named entity recognition, machine translation, and more. Text and reference books by renowned authors are recommended for further study. The course outcomes include the ability to differentiate various NLP tasks, understand ambiguity, and model and preprocess language effectively.



Presentation Transcript


  1. Natural Language Processing [05 hours/week, 09 Credits] [Theory] Eighth Semester: Computer Science & Engineering. Dr. M. B. Chandak, hodcs@rknec.edu, www.mbchandak.com

  2. Course Contents. The course is divided into the following major components: basics of Natural Language Processing and language modeling techniques; syntactic and semantic parsing; and NLP applications: information extraction and machine translation. Total units: 6. Units 1 and 2: basics and modeling techniques. Units 3 and 4: syntactic and semantic parsing. Units 5 and 6: information extraction and machine translation.

  3. Course Prerequisites: basic knowledge of English grammar; theoretical foundations of Computer Science [TOFCS]; extension of language processing; Python and open-source tools; active class participation and regularity.

  4. Unitized course. Unit I: Introduction. NLP tasks in syntax, semantics, and pragmatics. Key issues and applications such as information extraction, question answering, and machine translation. The problem of ambiguity. The role of machine learning. Brief history of the field. Unit II: N-gram Language Models. Role of language models. Simple N-gram models. Estimating parameters and smoothing. Evaluating language models. Part-of-Speech Tagging and Sequence Labeling. Lexical syntax. Hidden Markov Models. Maximum Entropy models.
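
As a companion to the N-gram material in Unit II, here is a minimal sketch (not from the course slides) of a bigram language model with add-one smoothing in Python; the toy corpus and variable names are invented purely for illustration.

    # Minimal bigram language model with add-one (Laplace) smoothing.
    from collections import Counter

    corpus = [["<s>", "i", "made", "her", "duck", "</s>"],
              ["<s>", "she", "made", "a", "toy", "duck", "</s>"]]

    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        unigram_counts.update(sentence[:-1])              # histories exclude the final </s>
        bigram_counts.update(zip(sentence, sentence[1:]))

    vocabulary_size = len(set(word for sent in corpus for word in sent))

    def bigram_probability(previous, word):
        # P(word | previous) with add-one smoothing, so unseen bigrams keep non-zero mass.
        return (bigram_counts[(previous, word)] + 1) / (unigram_counts[previous] + vocabulary_size)

    print(bigram_probability("made", "her"))   # seen bigram
    print(bigram_probability("made", "duck"))  # unseen bigram, still non-zero

With only two training sentences the estimates are crude, but the sketch shows the two steps named on the slide: estimating parameters from counts and smoothing them.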

  5. Unitized course. Unit III: Grammar formalisms and treebanks. Efficient parsing for context-free grammars (CFGs). Statistical parsing and probabilistic CFGs (PCFGs). Lexicalized PCFGs. Unit IV: Lexical semantics and word-sense disambiguation. Compositional semantics. Semantic role labeling and semantic parsing. Unit V: Named entity recognition and relation extraction. IE using sequence labeling. Automatic summarization. Subjectivity and sentiment analysis. Unit VI: Basic issues in MT. Statistical translation, word alignment, phrase-based translation, and synchronous grammars.
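
To make the PCFG topic of Unit III concrete, the following is a hedged sketch using NLTK's PCFG class and ViterbiParser; the toy grammar and its rule probabilities are invented for illustration and are not part of the course material.

    # Probabilistic parsing of an ambiguous sentence with a toy PCFG.
    import nltk

    grammar = nltk.PCFG.fromstring("""
    S  -> NP VP   [1.0]
    NP -> 'I'     [0.3]
    NP -> Det N   [0.3]
    NP -> 'her'   [0.2]
    NP -> 'duck'  [0.2]
    VP -> V NP    [0.6]
    VP -> V NP NP [0.4]
    Det -> 'her'  [1.0]
    N  -> 'duck'  [1.0]
    V  -> 'made'  [1.0]
    """)

    parser = nltk.ViterbiParser(grammar)
    # The Viterbi parser returns the most probable tree for the ambiguous sentence,
    # so the rule probabilities decide between the "her duck" readings.
    for tree in parser.parse("I made her duck".split()):
        print(tree)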

  6. Text and Reference Books. 1. D. Jurafsky and J. Martin, Speech and Language Processing, 2nd edition, Pearson Education, 2009. 2. J. Allen, Natural Language Understanding, 2nd edition, Benjamin/Cummings, 1995. 3. E. Charniak, Statistical Language Learning, MIT Press, 1993. 4. Web resources.

  7. Course Outcomes. CO1: Ability to differentiate various NLP tasks and understand the problem of ambiguity [Unit 1]. CO2: Ability to model and preprocess language [Unit 2]. CO3: Ability to perform syntactic parsing using different grammars [Unit 3]. CO4: Ability to perform semantic parsing and word sense disambiguation [Unit 4]. CO5: Ability to perform information extraction and machine translation [Units 5, 6].

  8. Grading Scheme: Internal Examination. Total: 40 marks. Three tests, best two counted [15 x 2 = 30 marks]; generally the third test will be more complex. 10 marks distribution: (i) class participation: 03 marks [may include attendance]; (ii) Assignment 1: 04 marks [design/coding] {after T1}; (iii) Assignment 2: 03 marks [objective/coding] {after T2}; (iv) challenging problems [individual]: 07 marks.

  9. Introduction: Basics. Natural Language Processing (NLP) is the study of the computational treatment of natural (human) language; in other words, teaching computers how to understand (and generate) human language. It is a field at the intersection of Computer Science, Artificial Intelligence, and Computational Linguistics. Natural language processing systems take strings of words (sentences) as their input and produce structured representations capturing the meaning of those strings as their output. The nature of this output depends heavily on the task at hand.

  10. Introduction: NLP tasks. Processing language is a complex task, so a modular approach is followed. Conferences: ACL/NAACL, EMNLP, SIGIR, AAAI/IJCAI, Coling, HLT, EACL/NAACL, AMTA/MT Summit, ICSLP/Eurospeech. Journals: Computational Linguistics, TACL, Natural Language Engineering, Information Retrieval, Information Processing and Management, ACM Transactions on Information Systems, ACM TALIP, ACM TSLP. University centers: Berkeley, Columbia, Stanford, CMU, JHU, Brown, UMass, MIT, UPenn, USC/ISI, Illinois, Michigan, UW, Maryland, Toronto, Edinburgh, Cambridge, Sheffield, Saarland, Trento, Prague, QCRI, NUS, and many others. Industrial research sites: Google, MSR, Yahoo!, FB, IBM, SRI, BBN, MITRE, AT&T Labs. The ACL Anthology: http://www.aclweb.org/anthology. The ACL Anthology Network (AAN): http://clair.eecs.umich.edu/aan/index.php
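
As an illustration of the modular approach mentioned above, here is a small sketch of a pipeline built from NLTK components (tokenizer, POS tagger, named-entity chunker), where each stage consumes the output of the previous one; the example sentence is invented, and the resource names passed to nltk.download are NLTK's own identifiers, which may vary slightly by NLTK version.

    # A minimal modular pipeline: tokenize -> tag -> chunk named entities.
    import nltk
    # One-time model downloads (names may differ between NLTK versions):
    # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
    # nltk.download("maxent_ne_chunker"); nltk.download("words")

    sentence = "Google opened a research lab at Stanford."

    tokens = nltk.word_tokenize(sentence)   # module 1: tokenization
    tagged = nltk.pos_tag(tokens)           # module 2: part-of-speech tagging
    entities = nltk.ne_chunk(tagged)        # module 3: named-entity chunking

    print(tagged)
    print(entities)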

  11. Why NLP is complex. Natural language is extremely rich in form and structure, and very ambiguous: it is hard to decide how to represent meaning and which surface structures map to which meaning structures. One input can mean many different things, and ambiguity can occur at different levels: lexical (word-level) ambiguity -- different meanings of words; syntactic ambiguity -- different ways to parse the sentence; interpreting partial information -- how to interpret pronouns; contextual information -- the context of the sentence may affect its meaning.

  12. Example: Ambiguity. Consider the sentence: "I made her duck." It is ambiguous at various levels. How many different interpretations does this sentence have? What are the reasons for the ambiguity? The categories of knowledge of language can be thought of as ambiguity-resolving components. How can each ambiguous piece be resolved? Does speech input make the sentence even more ambiguous? Yes: word boundaries must also be decided.

  13. Example: Ambiguity. Some interpretations of "I made her duck": 1. I cooked duck for her. 2. I cooked the duck belonging to her. 3. I created a toy duck which she owns. 4. I caused her to quickly lower her head or body. 5. I used magic and turned her into a duck. "Duck" is morphologically and syntactically ambiguous: noun or verb. "Her" is syntactically ambiguous: dative or possessive. "Make" is semantically ambiguous: cook or create. "Make" is also syntactically ambiguous: it can take a single object or an object plus a verbal complement.

  14. Example: Ambiguity Resolution. Part-of-speech tagging -- deciding whether "duck" is a verb or a noun. Word-sense disambiguation -- deciding whether "make" means create or cook. Lexical disambiguation -- resolution of part-of-speech and word-sense ambiguities are two important kinds of lexical disambiguation. Syntactic disambiguation -- "her duck" is an example of syntactic ambiguity, and can be addressed by probabilistic parsing. Ambiguity resolution is possible by modeling language, for example as in the sketch below.
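
A minimal sketch of that example, assuming NLTK is installed: a statistical part-of-speech tagger must commit to one tag for each ambiguous word in "I made her duck", which is one form of lexical disambiguation.

    # POS tagging the ambiguous sentence; the tagger commits to a single tag for
    # "her" and "duck", resolving the lexical ambiguity one way based on context.
    # Requires NLTK's tokenizer and tagger models (names may vary by version):
    # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
    import nltk

    tokens = nltk.word_tokenize("I made her duck")
    print(nltk.pos_tag(tokens))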

  15. Language: Knowledge components. Phonology concerns how words are related to the sounds that realize them. Morphology concerns how words are constructed from more basic meaning units called morphemes; a morpheme is the primitive unit of meaning in a language. Syntax concerns how words can be put together to form correct sentences, determining what structural role each word plays in the sentence and what phrases are subparts of other phrases. Semantics concerns what words mean and how these meanings combine in sentences to form sentence meaning; it is the study of context-independent meaning.
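
As a small illustration of the morphology component, here is a sketch using NLTK's stemmer and lemmatizer; the word choices are illustrative, and the lemmatizer needs the WordNet data (nltk.download("wordnet")).

    # Two ways of reducing a surface word to a base form.
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    print(stemmer.stem("ducks"))                  # crude suffix stripping -> "duck"
    print(lemmatizer.lemmatize("made", pos="v"))  # dictionary-based lemma -> "make"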

  16. Language: Knowledge components. Pragmatics concerns how sentences are used in different situations and how use affects the interpretation of the sentence. Discourse concerns how the immediately preceding sentences affect the interpretation of the next sentence, for example when interpreting pronouns or the temporal aspects of the information. World knowledge includes general knowledge about the world and what each language user must know about the other's beliefs and goals.
