Upload: dwain-dennis

Post on 31-Dec-2015

Practical Text Mining With Perl

Database Laboratory

Kim Min-heum (김민흠)

3.7 Two Text Applications

This section discusses two applications, which are easy to program in Perl thanks to hashes.

The first illustrates an important property of most texts, one that has consequences later in this book.

The second develops some tools that are useful for certain types of word games.

3.7.1 Zipf’s Law for A Christmas Carol

Program 3.2 A concordance program that finds matches for a regular expression. The file name, regex, and text extract radius are given as command line arguments.
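
The caption above describes a concordance program whose listing is not reproduced in this transcript. The following is a minimal sketch in that spirit, not the book's Program 3.2. The sub name `concordance`, the sample text, and the choice to treat the text as one string are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every match of $regex in $text together with
# up to $radius characters of context on each side.
sub concordance {
    my ($text, $regex, $radius) = @_;
    my @extracts;
    while ($text =~ /$regex/g) {
        my $start = $-[0] - $radius;               # offset where the extract begins
        $start = 0 if $start < 0;                  # clamp at the start of the text
        my $end = $+[0] + $radius;                 # offset just past the extract
        $end = length($text) if $end > length($text);
        push @extracts, substr($text, $start, $end - $start);
    }
    return @extracts;
}

# Sample text (an assumption); a real run would read the file named on the command line.
my $sample = "Marley was dead--to begin with. Old Marley was as dead as a door-nail.";
print "$_\n" for concordance($sample, '--', 10);
```

The match offsets come from Perl's built-in `@-` and `@+` arrays, which avoids copying the text before and after each match.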

3.7.1 Zipf’s Law for A Christmas Carol

As discussed in sections 2.4.2 and 2.4.3, hyphens and apostrophes cause problems.

Using program 3.2, we can find all instances of potentially problematic punctuation.

These cases enable us to decide how to handle the punctuation so that the words in the novel change as little as possible.

3.7.1 Zipf’s Law for A Christmas Carol

First, dashes.

Using command-line arguments:

C:\>perl 78ex.pl A_Christmas_Carol.txt -- 30

3.7.1 Zipf’s Law for A Christmas Carol

Second, single hyphens.

C:\>perl 78ex.pl A_Christmas_Carol.txt "\w-\w" 30

3.7.1 Zipf’s Law for A Christmas Carol

Third, apostrophes.

Apostrophes are used for quotes within quotations as well as for possessive nouns.

The latter produces one ambiguity due to possessives of plural nouns ending in s, for example, seven years'.

Another possible ambiguity is a contraction with an apostrophe at either the beginning or the end of a word.

3.7.1 Zipf’s Law for A Christmas Carol

C:\>perl 78ex.pl A_Christmas_Carol.txt "\w'\W" 30

C:\>perl 78ex.pl A_Christmas_Carol.txt "\W'\w" 30

Program 3.3 This program counts the frequency of each word in A Christmas Carol. The output is sorted by decreasing frequencies.

The output is printed as a CSV (comma-separated values) file.
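
The listing for Program 3.3 is not part of this transcript. As a rough sketch of the idea (a hash of counts, sorted by decreasing frequency, printed as comma-separated lines), something like the following could be used. The sub name `word_counts`, the tokenizing regex, and the sample sentence are assumptions, not the book's code.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each lowercased word occurs in $text.
sub word_counts {
    my ($text) = @_;
    my %freq;
    # Treat runs of letters, optionally joined by internal apostrophes or
    # hyphens, as words; this tokenization is an assumption.
    while ($text =~ /([a-z]+(?:['-][a-z]+)*)/gi) {
        $freq{ lc($1) }++;
    }
    return %freq;
}

my %freq = word_counts("The ghost saw the door-nail. The Ghost's chain rattled.");

# Print "word,count" lines, most frequent first; ties are broken alphabetically.
for my $word (sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq) {
    print "$word,$freq{$word}\n";
}
```

How hyphens and apostrophes are tokenized is exactly the decision the concordance searches earlier in the section are meant to inform, so the regex here would need adjusting for the real novel.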

3.7.2.1 An Aid to Crossword Puzzles

Finds words that fit a crossword puzzle. The regex works because CROSSWD.TXT contains one word per line.

C:\>perl 85ex.pl "^\w{2}j\w{2}n\w$"

Using ^ and $ in the regex restricts matches to words of exactly 7 characters.
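
A minimal sketch of the same search against an in-memory word list follows. The sub name `crossword_matches` and the five sample words are assumptions; the program on the slide reads CROSSWD.TXT instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words in @$words that match $pattern.
sub crossword_matches {
    my ($pattern, $words) = @_;
    my $regex = qr/$pattern/;
    return grep { /$regex/ } @$words;
}

# A tiny stand-in for CROSSWD.TXT (an assumption); each entry is one word.
my @words = qw(abjures conjoin majoring adjoint rejoins);

# Seven letters with j in position 3 and n in position 6.
print "$_\n" for crossword_matches('^\w{2}j\w{2}n\w$', \@words);
```

Without the anchors, the eight-letter word in the list would still be rejected here, but longer words that merely contain a matching stretch would slip through.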

3.7.2.2 Word Anagrams

Builds an anagram dictionary, in which each word is listed under its letters sorted in alphabetical order.

Example: bdac is indexed under the string abcd.
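
The indexing idea can be sketched with a hash of arrays. The sub names `signature` and `anagram_dictionary` and the sample words are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the letters of a word in alphabetical order.
sub signature {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort @letters;
}

# Hypothetical helper: map each signature to the list of words sharing it.
sub anagram_dictionary {
    my @words = @_;
    my %dict;
    push @{ $dict{ signature($_) } }, $_ for @words;
    return %dict;
}

my %dict = anagram_dictionary(qw(listen silent enlist tinsel stone notes));
for my $key (sort keys %dict) {
    print "$key: @{ $dict{$key} }\n";
}
```

Two words are anagrams exactly when their signatures are equal, so one hash lookup finds every anagram of a given word.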

3.7.2.3 Finding Words in a Set of Letters

Considers not only the whole group of letters but also its subgroups. Example: 8 letters yield 255 nonempty subsets (2^8 - 1).

Program 3.6 This program finds all words formed from subsets of a group of letters.
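
The listing for Program 3.6 is not included in this transcript. One way to sketch the idea is to enumerate subsets with a bit mask and look each one up in an anagram dictionary. The sub names `letter_key` and `words_from_letters` and the word list are assumptions, and the book's program may take a different approach.

```perl
use strict;
use warnings;

# Hypothetical helper: the sorted-letter key used to index the anagram dictionary.
sub letter_key {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort @letters;
}

# Hypothetical helper: every word in @$words that can be spelled from a
# nonempty subset of the letters in $letters.
sub words_from_letters {
    my ($letters, $words) = @_;
    my %dict;
    push @{ $dict{ letter_key($_) } }, $_ for @$words;

    my @pool = split //, letter_key($letters);   # sorted, so each subset is sorted too
    my $n    = @pool;
    my %found;
    for my $mask (1 .. 2**$n - 1) {              # each bit pattern selects one subset
        my @picked = grep { $mask & (1 << $_) } 0 .. $#pool;
        my $key    = join '', map { $pool[$_] } @picked;
        $found{$_} = 1 for @{ $dict{$key} || [] };
    }
    return sort keys %found;
}

my @word_list = qw(at cat act tack talk);        # stand-in word list (an assumption)
print join(' ', words_from_letters('tack', \@word_list)), "\n";
```

For a group of 8 letters the loop runs 255 times, matching the subset count on the slide.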

3.8.1 References and Pointers

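Example:

$wordref stores the memory location of "the".

Dereferencing: retrieving the value stored at that location.

Prefix the reference with $ or use ->.

Reference: the location where a variable or other value is stored.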
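
The slide's own example code is not in this transcript. A small sketch of these ideas, with variable names chosen to follow the slide where possible and the rest assumed, might look like this. The arrow form applies to elements reached through an array or hash reference, so an assumed array reference `$listref` is included to show it.

```perl
use strict;
use warnings;

my $word    = "the";
my $wordref = \$word;               # the backslash takes a reference to $word

print $$wordref, "\n";              # dereference by prefixing the reference with $

$$wordref = "a";                    # assigning through the reference changes $word
print $word, "\n";

my $listref = [ "the", "ghost" ];   # reference to an anonymous array (assumed example)
print $listref->[1], "\n";          # -> reaches an element through the reference
print $$listref[1], "\n";           # same element using the $ prefix form
```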

3.8.1 References and Pointers

Ways to create a reference: a backslash, or square brackets [ ] (anonymous array).

Dereferencing an array or an associative array: prefix the reference with @ or %, respectively.
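
A short sketch of both ways of creating a reference and of the @ and % prefixes follows. All variable names and values are assumptions made for illustration.

```perl
use strict;
use warnings;

my @words   = qw(the cat sat);
my $named   = \@words;               # backslash: reference to a named array
my $unnamed = [ qw(on the mat) ];    # square brackets: reference to an anonymous array

my %count   = (the => 2, cat => 1);
my $hashref = \%count;               # the backslash works for hashes as well

my @array_copy = @$named;            # @ in front of the reference yields the whole array
my %hash_copy  = %$hashref;          # % in front of the reference yields the whole hash

print scalar(@array_copy), " words; first anonymous element: $unnamed->[0]\n";
print "the occurs $hash_copy{the} times\n";
```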

3.8.1 References and Pointers

Hashes (associative arrays): => can be used in place of a comma.

Anonymous hashes use curly braces { }.
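
As an illustration of both points, the sketch below builds the same hash with arrows and with commas, then builds an anonymous hash. The keys and values are assumptions.

```perl
use strict;
use warnings;

# => behaves like a comma but also quotes a bare word on its left.
my %with_arrows = (scrooge => 1, marley => 2);
my %with_commas = ('scrooge', 1, 'marley', 2);

# Curly braces build an anonymous hash and return a reference to it.
my $anon = { scrooge => 1, marley => 2 };

print join(' ',
    $with_arrows{marley},    # ordinary hash lookup
    $with_commas{marley},    # same contents, built with commas
    $anon->{marley},         # lookup through the reference with ->
    ${$anon}{marley},        # lookup through the reference with the $ prefix
), "\n";
```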

3.8.2 Arrays of Arrays and Beyond

Arrays of arrays: a list of anonymous arrays.

All three are equivalent expressions.

Putting $data[0] inside @{ } dereferences it.

An arrow (->) can be written between the two bracketed subscripts [ ] [ ].
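
The slide's code is not reproduced in this transcript, so the three expressions it compares are not known for certain. The sketch below shows three standard, equivalent ways to reach one element of an array of arrays; the contents of `@data` are an assumption.

```perl
use strict;
use warnings;

# Each row is an anonymous array, so @data holds a list of array references.
my @data = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

my $explicit = ${ $data[0] }[1];   # dereference $data[0], then index into it
my $arrow    = $data[0]->[1];      # arrow between the two subscripts
my $implicit = $data[0][1];        # arrow omitted between adjacent subscripts

my @first_row = @{ $data[0] };     # @{ } dereferences the whole first row

print "$explicit $arrow $implicit\n";
print "@first_row\n";
```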

3.8.2 Arrays of Arrays and Beyond

Code 3.31: $#data gives the last index of @data.
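
Code 3.31 itself is not part of this transcript. A small sketch of how the last index can drive a loop over an array of arrays follows; the data and the `@lines` accumulator are assumptions.

```perl
use strict;
use warnings;

my @data = ( [ 1, 2, 3 ], [ 4, 5 ], [ 6 ] );

# $#data is the last index of @data; $#{ $data[$i] } is the last index of row $i.
my @lines;
for my $i (0 .. $#data) {
    for my $j (0 .. $#{ $data[$i] }) {
        push @lines, "row $i, column $j: $data[$i][$j]";
    }
}
print "$_\n" for @lines;
```

Because each row's last index is looked up separately, the rows do not need to have the same length.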

3.8.2 Arrays of Arrays and Beyond

Code 3.32

3.8.2 Arrays of Arrays and Beyond

Code 3.33

Thank you.