Chapters 22-25 are about interactions with the environment:
- 22 is about reading,
- 23 is about writing,
- 24 is about feeling (perception), and
- 25 is about moving (robotics).
Probabilistic Language
Natural languages are ambiguous in meaning, spelling, and so on, so everything should be treated probabilistically.
N-Grams
Given a corpus (a very large sequence of characters), $P(c_{1:n})$ is the frequency of the character sequence $c_{1:n}$ in the corpus. An N-gram model is a Markov chain of order $N-1$: each character depends only on the $N-1$ characters before it.\[P(c_{i} = c \mid c_{1:i-1}) = P(c_{i} = c \mid c_{i-N+1:i-1})\]
which, by the Markov property and the chain rule, gives
\[P(c_{1:n}) = \prod_{i=1}^{n} P(c_{i} \mid c_{i-N+1:i-1}) \](We brush aside the edge case of what to do with $c_i$ when $i < 1$; a cheap solution is to pad with whitespace.)
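Here is a minimal Python sketch of the above: a character N-gram model with whitespace padding on the left edge. The toy corpus and the small probability floor for unseen N-grams are my own assumptions, not anything from the book.

```python
from collections import Counter

def ngram_counts(corpus, N):
    """Count length-N character windows of the corpus, padding the left edge with spaces."""
    padded = " " * (N - 1) + corpus
    grams = Counter(padded[i:i + N] for i in range(len(corpus)))
    contexts = Counter(g[:-1] for g in grams.elements())   # counts of the (N-1)-char contexts
    return grams, contexts

def cond_prob(c, context, grams, contexts):
    """Relative-frequency estimate of P(c | context), floored so unseen events are not zero."""
    if contexts[context] == 0:
        return 1e-12                                        # arbitrary floor (an assumption)
    return max(grams[context + c] / contexts[context], 1e-12)

def sequence_prob(text, grams, contexts, N):
    """P(c_{1:n}) = prod_i P(c_i | c_{i-N+1:i-1}), padding the left edge with spaces."""
    padded = " " * (N - 1) + text
    p = 1.0
    for i in range(len(text)):
        context, c = padded[i:i + N - 1], padded[i + N - 1]
        p *= cond_prob(c, context, grams, contexts)
    return p

grams, contexts = ngram_counts("the rain in spain stays mainly in the plain", N=3)
print(sequence_prob("rain in spain", grams, contexts, N=3))
```

The snippets further down reuse `ngram_counts` and `cond_prob` from this sketch.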
Bayesian language classification / text categorization
\[l^* = \underset{l}{\operatorname{argmax}}P(l)P(c_{1:n} | l) \]\[P(c_{1:n} | l) = \prod_{i=1}^{n} P(c_{i} | c_{i-N+1:i-1}; l)\]
This can be used for spam filtering, language identification (hard for Nynorsk vs Bokmål, though), etc.
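A hedged sketch of the argmax rule above, reusing `ngram_counts` and `cond_prob` from the previous snippet and working in log space to avoid underflow. The toy spam/ham corpora and the corpus-size priors are made up for illustration.

```python
import math

def log_likelihood(text, grams, contexts, N):
    """log P(c_{1:n} | l) under one class's N-gram model."""
    padded = " " * (N - 1) + text
    return sum(math.log(cond_prob(padded[i + N - 1], padded[i:i + N - 1], grams, contexts))
               for i in range(len(text)))

def classify(text, corpora, N=2):
    """l* = argmax_l  log P(l) + log P(c_{1:n} | l); priors from relative corpus sizes (an assumption)."""
    total = sum(len(c) for c in corpora.values())
    models = {l: ngram_counts(c, N) for l, c in corpora.items()}
    def score(l):
        grams, contexts = models[l]
        return math.log(len(corpora[l]) / total) + log_likelihood(text, grams, contexts, N)
    return max(corpora, key=score)

corpora = {"spam": "buy now cheap pills buy now cheap prize",
           "ham":  "see you at the meeting tomorrow about the report"}
print(classify("cheap pills now", corpora))   # should come out as "spam"
```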
Perplexity
Given a sequence of characters/words $c_{1:n}$, the raw number $P(c_{1:n})$ (the probability that this particular sequence is the one you get if you pick at random among all sequences of $n$ characters/words) is usually too small to be numerically stable, so we define perplexity instead.\[\operatorname{Perp}(c_{1:n}) = P(c_{1:n})^{-\frac{1}{n}}\]
Its log is, in a way, the average entropy per symbol of the sequence. High entropy means high randomness (Pinkie Pie speech); low entropy means low randomness (Applejack speech).
\[\log(\operatorname{Perp}(c_{1:n})) = -\frac{1}{n}\sum_{i=1}^{n} \log P(c_i \mid c_{i-N+1: i-1})\]
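The same helpers give the log-perplexity directly; the example strings below are arbitrary, and the "high" case mostly measures the unseen-N-gram floor rather than a real entropy estimate.

```python
import math

def log_perplexity(text, grams, contexts, N):
    """-(1/n) * sum_i log P(c_i | c_{i-N+1:i-1}); lower means the model is less surprised."""
    padded = " " * (N - 1) + text
    total = sum(math.log(cond_prob(padded[i + N - 1], padded[i:i + N - 1], grams, contexts))
                for i in range(len(text)))
    return -total / len(text)

grams, contexts = ngram_counts("the rain in spain stays mainly in the plain", N=3)
print(log_perplexity("rain in spain", grams, contexts, N=3))   # low: in-domain text
print(log_perplexity("zqxv wkj qpf", grams, contexts, N=3))    # high: mostly unseen trigrams
```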
Compression Classification
If a message is spam, it should compress well when appended to a corpus of spam. This is the idea behind Spam Filtering Using Statistical Data Compression Models (2006). It works quite well even with standard general-purpose compression algorithms, suggesting a relation between compression and intelligence.
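A quick sketch of the idea with a general-purpose compressor (zlib); the toy corpora below are my own, not data from the paper, and note that zlib's 32 KB window only lets the appended message "see" the tail of a large corpus.

```python
import zlib

def compressed_size(text):
    return len(zlib.compress(text.encode("utf-8"), level=9))

def classify_by_compression(message, corpora):
    """Pick the label whose corpus grows the least (in compressed bytes) when the message is appended."""
    def extra_bytes(label):
        return compressed_size(corpora[label] + " " + message) - compressed_size(corpora[label])
    return min(corpora, key=extra_bytes)

corpora = {
    "spam": "win money now click here to claim your free prize limited offer act now",
    "ham":  "the meeting is moved to thursday please review the attached quarterly report",
}
print(classify_by_compression("claim your free prize now", corpora))   # should come out as "spam"
```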
Information Retrieval
Use a query language to get results from a corpus. Examples: the Boolean keyword model, BM25, PageRank, HITS, IBM's Watson on Jeopardy!...
\[\text{Precision} = \frac{\text{relevant retrieved}}{\text{retrieved}}\]
\[\text{Recall} = \frac{\text{relevant retrieved}}{\text{relevant}}\]
Note the similarity with type 1 and type 2 errors...
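In code, with `retrieved` and `relevant` represented as sets of document ids (my assumption about the representation):

```python
def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)                       # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7"}))   # (0.5, 0.666...)
```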
Machine reading: too much technical stuff I won't need, skipped.
A cool quote
Natural languages abhor absolute synonyms just as nature abhors a vacuum. ---- Lexical Semantics (1986), D. Alan Cruse
Chapter 23 is about speaking and writing
*nods nods*
The horse raced past the barn fell. ---- Last words of the horse raced past the barn.
Grammar, syntax, semantics, etc. All of them are computational linguistics stuff that I won't need. Skipped.
S Bitch wreck ignition
Or speech recognition. Back in 2009, when this book was written, an error rate of 10% was standard. Now there is human parity, as achieved by Microsoft in 2017.