Comprehensive Course on Natural Language Processing

Natural Language Processing
[05 hours/week, 09 Credits] [Theory]
Eighth Semester: Computer Science & Engineering
Dr. M. B. Chandak
hodcs@rknec.edu, www.mbchandak.com
Course Contents
The course is divided into the following major components:
Basics of Natural Language Processing and language modeling
techniques.
Syntactic and Semantic Parsing
NLP Applications: Information Extraction & Machine Translation
Total units: 6
Unit 1 and 2: Basics and modeling techniques
Unit 3 and 4: Syntactic and Semantic Parsing
Unit 5 and 6: Information Extraction & Machine Translation
Course Pre-requisite
Basic knowledge of English Grammar
Theoretical foundations of Computer Science
[TOFCS]
Extension of Language Processing
Python and Open Source tools
Active Class participation and Regularity
Unitized course
Unit-I:
Introduction: NLP tasks in syntax, semantics, and pragmatics. Key
issues and applications such as information extraction, question
answering, and machine translation. The problem of ambiguity. The
role of machine learning. Brief history of the field.
Unit-II:
N-gram Language Models: role of language models. Simple N-gram
models. Estimating parameters and smoothing. Evaluating language
models. Part-of-Speech Tagging and Sequence Labeling: lexical
syntax. Hidden Markov Models. Maximum Entropy models.
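To make the Unit-II topics concrete, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing. The tiny corpus and the helper names are invented for illustration; they are not part of the course material.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s>/</s>."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, vocab

def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    """Add-one (Laplace) smoothed estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

corpus = ["I made her duck", "I made her a cake", "her duck swam away"]
unigrams, bigrams, vocab = train_bigram(corpus)

# A bigram seen in training gets a higher probability than an unseen one,
# but smoothing keeps the unseen bigram's probability non-zero.
p_seen = bigram_prob("her", "duck", unigrams, bigrams, vocab)    # seen twice
p_unseen = bigram_prob("duck", "cake", unigrams, bigrams, vocab)  # never seen
```

This is the core of "estimating parameters and smoothing": without the `+1` terms, any sentence containing an unseen bigram would receive probability zero.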
Unit-III:
Grammar formalisms and tree banks.  Efficient parsing for context-free
grammars (CFGs). Statistical parsing and probabilistic CFGs (PCFGs).
Lexicalized PCFGs.
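One standard efficient algorithm for parsing CFGs, as covered in Unit-III, is CYK. The sketch below is a bare recognizer over a toy grammar in Chomsky normal form; the grammar, lexicon, and sentences are invented for illustration.

```python
from itertools import product

# Toy CFG in Chomsky normal form: every rule is A -> B C or A -> 'word'.
GRAMMAR = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "saw": {"V"}, "chased": {"V"},
}

def cyk_recognize(words):
    """CYK recognition: chart[i][j] holds nonterminals covering words[i:j+1]."""
    n = len(words)
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                 # length-1 spans from lexicon
        chart[i][i] = set(LEXICON.get(w, set()))
    for span in range(2, n + 1):                  # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                 # every split point
                for B, C in product(chart[i][k], chart[k + 1][j]):
                    chart[i][j] |= GRAMMAR.get((B, C), set())
    return "S" in chart[0][n - 1]
```

A probabilistic CFG extends exactly this chart: each cell stores the best probability per nonterminal instead of a bare set, and lexicalized PCFGs further annotate each nonterminal with its head word.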
Unit-IV:
Lexical semantics and word-sense disambiguation. Compositional
semantics. Semantic Role Labeling and Semantic Parsing.
Unit-V:
Named entity recognition and relation extraction. IE using sequence
labeling. Automatic summarization. Subjectivity and sentiment analysis.
Unit-VI:
Basic issues in MT. Statistical translation, word alignment, phrase-based
translation, and synchronous grammars.
Text and Reference books
1. D. Jurafsky and J. Martin; Speech and Language Processing; 2nd
edition, Pearson Education, 2009.
2. James Allen; Natural Language Understanding; Second Edition,
Benjamin/Cummings, 1995.
3. Eugene Charniak; Statistical Language Learning; MIT Press, 1993.
4. Web Resources
Course Outcomes
CO 1: Ability to differentiate various NLP tasks and understand the problem of ambiguity. [Unit 1]
CO 2: Ability to model and preprocess language. [Unit 2]
CO 3: Ability to perform syntactical parsing using different grammars. [Unit 3]
CO 4: Ability to perform semantic parsing and word sense disambiguation. [Unit 4]
CO 5: Ability to perform Information Extraction and Machine Translation. [Units 5, 6]
Grading Scheme: Internal Examination
Total: 40 marks
Three tests; best two count [15 x 2 = 30 marks]
The third test is generally more complex.
10 marks distribution:
(i) Class participation: 03 marks [may include attendance]
(ii) Assignment 1: 04 marks [Design/Coding] {After T1}
(iii) Assignment 2: 03 marks [Objective/Coding] {After T2}
(iv) Challenging problems [Individual]: 07 marks
Introduction: Basics
Natural Language Processing (NLP) is the study of the computational
treatment of natural (human) language. In other words, it is about
teaching computers how to understand (and generate) human language.
It is a field at the intersection of Computer Science, Artificial
Intelligence, and Computational Linguistics.
Natural language processing systems take strings of words (sentences)
as their input and produce structured representations capturing the
meaning of those strings as their output. The nature of this output
depends heavily on the task at hand.
Introduction: NLP tasks
Processing language is a complex task, so a modular approach is followed.
Conferences:
ACL/NAACL, EMNLP, SIGIR, AAAI/IJCAI, Coling, HLT, EACL/NAACL, AMTA/MT Summit,
ICSLP/Eurospeech
Journals:
Computational Linguistics, TACL, Natural Language Engineering, Information Retrieval,
Information Processing and Management, ACM Transactions on Information Systems,
ACM TALIP, ACM TSLP
University centers:
Berkeley, Columbia, Stanford, CMU, JHU, Brown, UMass, MIT, UPenn, USC/ISI, Illinois,
Michigan, UW, Maryland, etc.
Toronto, Edinburgh, Cambridge, Sheffield, Saarland, Trento, Prague, QCRI, NUS, and
many others
Industrial research sites:
Google, MSR, Yahoo!, FB, IBM, SRI, BBN, MITRE, AT&T Labs
The ACL Anthology
http://www.aclweb.org/anthology
The ACL Anthology Network (AAN)
http://clair.eecs.umich.edu/aan/index.php
Why NLP is complex
Natural language is extremely rich in form and structure, and very
ambiguous. Two core questions arise: how to represent meaning, and
which surface structures map to which meaning structures.
One input can mean many different things, and ambiguity can arise at
different levels:
Lexical (word-level) ambiguity
  -- different meanings of words
Syntactic ambiguity
  -- different ways to parse the sentence
Interpreting partial information
  -- how to interpret pronouns
Contextual information
  -- the context of a sentence may affect its meaning
Example: Ambiguity
Consider the sentence: "I made her duck"
It is ambiguous at various levels:
How many different interpretations does this sentence have?
What are the reasons for the ambiguity?
The categories of knowledge of language can be thought of as
ambiguity-resolving components.
How can each ambiguous piece be resolved?
Does speech input make the sentence even more ambiguous?
Yes -- it adds the problem of deciding word boundaries.
Example: Ambiguity
Some interpretations of "I made her duck":
1. I cooked duck for her.
2. I cooked the duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.
duck -- morphologically and syntactically ambiguous: noun or verb.
her -- syntactically ambiguous: dative or possessive.
make -- semantically ambiguous: cook or create.
make -- syntactically ambiguous.
Example: Ambiguity Resolution
Ambiguity resolution is possible by modeling language. For example:
part-of-speech tagging
  -- deciding whether duck is a verb or a noun.
word-sense disambiguation
  -- deciding whether make means create, cook, or another action.
lexical disambiguation
  -- resolution of part-of-speech and word-sense ambiguities are two
important kinds of lexical disambiguation.
syntactic disambiguation
  -- her duck is an example of syntactic ambiguity, and can be
addressed by probabilistic parsing.
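The simplest statistical take on the part-of-speech step above is a unigram tagger: for each word, pick the tag it carried most often in a hand-tagged corpus. The counts below are invented purely for illustration and are not from any real corpus.

```python
# Hypothetical word/tag counts, as if collected from a small hand-tagged
# corpus; the numbers are invented for illustration only.
tag_counts = {
    "duck": {"NOUN": 12, "VERB": 3},
    "her":  {"PRP$": 9, "PRP": 5},  # possessive vs. dative/object pronoun
    "made": {"VERB": 20},
}

def most_likely_tag(word):
    """Unigram tagging: choose the tag seen most often with this word,
    ignoring the surrounding context entirely."""
    counts = tag_counts.get(word.lower(), {})
    return max(counts, key=counts.get) if counts else "UNK"

tags = [most_likely_tag(w) for w in "I made her duck".split()]
# With these counts, "duck" resolves to NOUN and "her" to possessive,
# i.e. the "cooked her duck" reading.
```

A real tagger, such as the Hidden Markov Models of Unit-II, would also condition on neighboring tags; that context is what could instead prefer the VERB reading of duck after an object-pronoun reading of her.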
Language: Knowledge components
Phonology concerns how words are related to the sounds that realize them.
Morphology concerns how words are constructed from more basic
meaning units called morphemes. A morpheme is the primitive unit of
meaning in a language.
Syntax concerns how words can be put together to form correct
sentences, what structural role each word plays in the sentence, and
what phrases are subparts of other phrases.
Semantics concerns what words mean and how these meanings combine
in sentences to form sentence meaning: the study of context-independent
meaning.
Language: Knowledge components
Pragmatics concerns how sentences are used in different situations
and how use affects the interpretation of the sentence.
Discourse concerns how the immediately preceding sentences affect
the interpretation of the next sentence. For example, interpreting
pronouns and interpreting the temporal aspects of the information.
World Knowledge includes general knowledge about the world: what
each language user must know about the other's beliefs and goals.