Lecture 1:
Course Introduction, Setup
Ling 1330/2330 Computational Linguistics
Na-Rae Han, 8/27/2019
Objectives
Course Introduction
Student survey
Syllabus: http://www.pitt.edu/~naraehan/ling1330/index.html
CourseWeb
Computational linguistics: what is, what we will learn
Introduction to Python 3
Using Python Interactive Shell (ex. IDLE, IPython)
Setup, housekeeping
A warm-up: processing a string
Language engineering applications
Language engineering is at the center of many essential tools and applications:
Internet search engines (Google, Yahoo, Bing...)
Automatic (machine) translation systems
Voice and speech recognition (ASR telephone systems)
Smart assistants (Alexa, Siri, Cortana, Google Assistant)
Automatic essay scoring systems (e-rater)
Spelling and grammatical error correction
... and many more.
Language engineering & AI
Computational linguistics: an interdisciplinary field dealing with modeling of natural language from a computational perspective.
A “natural language”?
Any real-world human language.
cf. constructed, artificial, or computer languages such as: formal (logic) language, computer-programming languages, Klingon, etc.
Areas: natural language processing (NLP), natural language understanding, speech processing and synthesis, dialogue systems…
part of AI (Artificial Intelligence)
Two types of architecture
Approach 1: rule-based, symbolic
A finite set of hand-written rules is composed by trained linguists
A system applies these rules in a deterministic fashion
that is a beautiful work → noun
my parents work too much → verb
“work is: (1) a noun if following an adjective or preposition, (2) a verb if following a plural noun.”
Logic and semantic relations such as entailment are programmed in.
Ex. If "Every X does Y" is true, "Some X does Y" is also true.
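The rule for "work" quoted above can be sketched in a few lines of Python. This is only a toy illustration: the tag names (ADJ, PREP, NNS, NN, VB), the tiny lexicon, and the function name `tag_work` are assumptions made for this sketch, not part of any real tagger.

```python
# Toy rule-based disambiguation of "work", following the rule quoted above.
# The lexicon below is hand-written and covers only the example sentences.
PREVIOUS_WORD_TAGS = {
    "beautiful": "ADJ",
    "of": "PREP",
    "parents": "NNS",
}

def tag_work(previous_word):
    """Return 'NN' or 'VB' for "work" based on the word right before it."""
    previous_tag = PREVIOUS_WORD_TAGS.get(previous_word.lower())
    if previous_tag in ("ADJ", "PREP"):
        return "NN"   # rule (1): noun after an adjective or preposition
    if previous_tag == "NNS":
        return "VB"   # rule (2): verb after a plural noun
    return None       # no rule applies, so the system gives no answer

for sentence in ["that is a beautiful work", "my parents work too much"]:
    words = sentence.split()
    position = words.index("work")
    print(sentence, "->", tag_work(words[position - 1]))
```

The rules are applied deterministically: the same input always yields the same tag, and an input that no rule covers gets no tag at all.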
Approach 2: statistical, data-driven
First, a huge collection of human-annotated text (“corpus”) is built
An algorithm automatically extracts statistical patterns from the corpus:
If work follows an adjective (ADJ), it is a noun x% of the time.
If work follows a plural noun (NNS), it is a verb y% of the time.
To process a new text, a system applies these rules in a probabilistic fashion, making a prediction
ex: mountainous/ADJ work/?
→ NN (probability 0.8) vs. VB (0.2)
Known as machine learning
This is the current dominant approach
Fueled by BIG DATA, aka corpora
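A minimal sketch of the data-driven idea, assuming a tiny invented set of observations in place of a real annotated corpus. The counts are made up so that the ADJ case reproduces the 0.8 vs. 0.2 split used in the example above; the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented toy data: (tag of the previous word, tag of "work") pairs.
# A real corpus would supply many thousands of such observations.
observations = [
    ("ADJ", "NN"), ("ADJ", "NN"), ("ADJ", "NN"), ("ADJ", "NN"), ("ADJ", "VB"),
    ("NNS", "VB"), ("NNS", "VB"), ("NNS", "VB"), ("NNS", "NN"),
]

# Count how often each tag of "work" occurs after each previous tag.
counts = defaultdict(Counter)
for previous_tag, work_tag in observations:
    counts[previous_tag][work_tag] += 1

def tag_probabilities(previous_tag):
    """Relative frequency of each tag for "work", given the previous tag."""
    total = sum(counts[previous_tag].values())
    return {tag: n / total for tag, n in counts[previous_tag].items()}

def predict(previous_tag):
    """Pick the most probable tag, or None if the context was never seen."""
    probabilities = tag_probabilities(previous_tag)
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)

print(tag_probabilities("ADJ"))   # probabilities for mountainous/ADJ work/?
print(predict("ADJ"))
```

Unlike the hand-written rules of Approach 1, nothing here is specific to "work": the same counting code would extract patterns for any word once the corpus provides the observations.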
Corpus
Corpus (pl: corpora): a collection of electronic text
Some corpora are just a collection of raw texts; others contain richer linguistic annotation (part-of-speech, syntactic structure, named entities, and more)
Is an essential resource for:
data-oriented linguistic research
language engineering
These days, corpora are subsumed under …
Data. (big, small, naturally occurring…)
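To make the raw vs. annotated distinction concrete, here is a small sketch of how the two kinds of corpus data might be held in Python. The sentence is borrowed from the earlier slide; the tag labels are informal ones chosen for this sketch, not an official tagset.

```python
# Raw corpus: plain text only.
raw_text = "my parents work too much"

# Annotated corpus: each word paired with a part-of-speech tag.
# Tags are hand-assigned, informal labels used only for illustration.
tagged_text = [
    ("my", "PRON"), ("parents", "NNS"), ("work", "VB"),
    ("too", "ADV"), ("much", "ADV"),
]

# With raw text we can only match strings.
raw_words = raw_text.split()
print("work" in raw_words)

# With annotation we can ask linguistic questions, e.g. "which words are verbs?"
verbs = [word for word, tag in tagged_text if tag == "VB"]
print(verbs)
```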
Role of linguists
What is the role of linguists in all of these?
Linguistic expertise intersecting with engineering is in high demand. Tasks include:
Build linguistic infrastructures: grammars, lexicons, ontologies (knowledge bank)
Design and manage linguistic annotation projects
Consult for quality enhancement (machine translation systems, search results, etc.)
Build NLP systems, train models
Set standards and guidelines for localization
Working knowledge of computational linguistics is essential
Computational linguistics
What is the goal of computational linguistics?
Represent human language in terms of information processing.
Theoretical computational linguistics
Investigates the computability of human language and linguistic theories.
Formal language theory and automata theory.
Applied computational linguistics
The focus is on building (working) models of human language.
Natural language processing (NLP) and engineering (NLE), human language technologies (HLT), artificial intelligence (AI)
Often centers around building practical applications: spell checkers, machine translation systems, automatic graders, etc.
Two distinct flavors: symbolic vs. statistical approaches
Computational linguistics course
Old LING 1330 course description:
The class was about theoretical and symbolic computational linguistics.
In both linguistics and computer science, we need to study languages and their grammar from a mathematical point of view. This course is an introduction to the mathematical theory of language and its applications. The first half will deal mainly with elements of the theory of automata and its relation to grammars. The second half will survey ways in which this theory can be applied to English grammar and to the design of programming languages. We will concentrate on syntax, but will also pay some attention to theories of meaning.
What we learn in this class
Theoretical computational linguistics
Formal language theory
Automata, the Chomsky hierarchy
We will cover only the basic ideas.
Applied computational linguistics
Text and corpus processing
Statistical methods using NLTK
Examples of natural language processing applications:
Spelling correction, POS tagger, machine translation, text classification
How to do well in this course
1. Be hard-working. If you are fairly new to programming, expect to put in 4 credits' worth of work.
2. Be detail-oriented. Do not skim for "important" stuff! You should learn everything.
Pay attention to instructions & details when doing homework.
3. Be organized. Class moves fast, and there are a lot of assignments.
4. Be persistent. Coding has frustrating moments. Keep at it!
5. Ask for help. When you have had enough hair-pulling, email me or come to office hours.
Python: script vs. shell
1. A Python script is a plain-text file with a .py extension.
2. The Python shell is an interactive environment where Python responds to each individual Python command.
Valuable tool for learning!
IDLE shell (Python interpreter)
Python SCRIPT file
Following up a script in shell
Variable names defined in the script remain available in the shell for subsequent commands
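As a sketch of what this looks like in practice, suppose a script contains the lines below. The file name greeting.py and the variable names are made up for this example. The commented lines show the kind of follow-up you could then type in the IDLE shell after running the script with Run → Run Module.

```python
# Contents of a hypothetical script file, e.g. greeting.py
greeting = 'Hello, world!'
words = greeting.split()
print(greeting)

# After the script has run, the shell still knows `greeting` and `words`,
# so you can keep exploring them interactively:
#
# >>> greeting
# 'Hello, world!'
# >>> len(words)
# 2
# >>> greeting.upper()
# 'HELLO, WORLD!'
```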
Command history
Every command you type in is stored as command history.
You can quickly bring up and edit previously entered commands using these shortcuts (IDLE):
Windows: Alt+p / Alt+n
Mac: Ctrl+p / Ctrl+n (previous/next)
But these key combos are cumbersome! Let's remap to:
(Up arrow: previous command)
(Down arrow: next command)
>>> print('Hello, world!')
Hello, world!
>>> print('Hello, worl
(You can edit the command line.)
Customize command history keys
Changing command history key bindings to:
(Up arrow: previous command)
(Down arrow: next command)
Instructions:
1. From the menu go to "Options → Configure IDLE"
2. Click "Keys" tab
3. Scroll down and click on the line starting with "history-next". Click button "Get New Keys for Selection"
4. Scroll down to find "Down Arrow" and click on it.
5. The new key is now set to "<Key-Down>". Press OK.
Arrows are commonly used in shell
6. You are prompted to name your custom key scheme. Give any name.
7. Now repeat the above process for "history-previous". But this time select "Up Arrow".
Housekeeping (1)
New to programming/Python? Follow the setup below.
Use the Python.org (https://www.python.org/) distribution, version 3.7.
Windows installation & configuration (do BOTH!): http://www.pitt.edu/~naraehan/python3/getting_started_win_install.html
http://www.pitt.edu/~naraehan/python3/getting_started_win_config.html
Mac installation: http://www.pitt.edu/~naraehan/python3/getting_started_mac_install.html
Housekeeping (2)
Designated folder for coding work:
Create a folder within Documents called “ling1330” or “ling2330”, and store all your coding work there.
So this folder will be:
C:\Users\yourname\Documents\ling1330 (Windows)
/Users/yourname/Documents/ling1330 (Mac)
Verify your default "current working directory":
NOT good: things like C:\Program Files (x86)\Python37-32
Good! sensible default CWD
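One common way to check the current working directory from the Python shell is the standard library's `os.getcwd()`. The sketch below assumes the course folder is named as suggested above; the variable names are chosen for illustration only.

```python
import os

# Ask Python which folder it is currently working in.
current_dir = os.getcwd()
print(current_dir)

# Hypothetical check: is the CWD the designated course folder?
# The folder names come from the slide; adjust them to your own setup.
in_course_folder = os.path.basename(current_dir) in ("ling1330", "ling2330")
print(in_course_folder)
```

If the printed path points into the Python installation folder rather than somewhere under Documents, the default needs to be changed as described in the configuration instructions.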
Warm-up: fox in sox
Let's practice string processing.
Download this template script: http://www.pitt.edu/~naraehan/ling1330/fox_in_sox.TEMPLATE.py
Complete [1] – [5].
10 minutes
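The template's own tasks are not reproduced here. As a generic sketch of the kind of string processing involved, the following uses a made-up sample sentence and a few built-in string and list methods; all variable names are illustrative.

```python
# A made-up sample string, not the text used in the template.
text = "A fox in sox sat on a box."

lowered = text.lower()                       # normalize case
tokens = lowered.replace(".", "").split()    # drop the period, split on spaces
ox_words = [w for w in tokens if w.endswith("ox")]   # words ending in "ox"
a_count = tokens.count("a")                  # how many times "a" occurs

print(tokens)
print(ox_words)
print(a_count)
```

Each of these steps is easy to try out one at a time in the shell before committing it to the script.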
Speed-coding fox_in_sox.py
Video of me completing the script, at 5x speed!
http://www.pitt.edu/~naraehan/ling1330/code_in_shell.mp4
Watch closely! This may well be the most important video yet… It will change your life.
How to program efficiently
The rookie way:
You are doing all your coding in the script window, blindly.
You only use the IDLE shell to verify the output of your script.
The seasoned Python programmer way:
After a script is run, all the variables and the objects in it are still available in IDLE shell for you to tinker with.
Experimenting in SHELL is much quicker – instant feedback!
You should go back and forth between SCRIPT and SHELL windows, successively building up your script.
Wrap-up
Exercise #1 out
Due Thursday 9am, on CourseWeb
Next class (Thu):
Writing structured programs