Tuesday, October 30, 2018

Let's read: Norvig's AI, chap 22, 23

Continuing from last post.

Chapters 22-25 are about interactions with the environment:

  • 22 is about reading,
  • 23 is about writing,
  • 24 is about feeling (perception), and
  • 25 is about moving (robotics).

Listening and speaking are not touched upon, but that's not a problem, considering that current technologies for speech-to-text and text-to-speech are pretty much perfect.

Probabilistic Language


Languages are ambiguous in meaning, spelling, and so on, so everything should be treated probabilistically.

N-Grams

Given a corpus (a very large sequence of characters), we estimate $P(c_{1:n})$, the probability of a character sequence $c_{1:n}$, from its frequency in the corpus. An $N$-gram model is a Markov chain of order $N-1$:
\[P(c_{i} = c | c_{1:i-1}) = P(c_{i} = c | c_{i-N+1 : i-1})\]
which by the Markov property gives
\[P(c_{1:n}) = \prod_{i=1}^{n} P(c_{i} | c_{i-N+1:i-1})\]
(We brush aside the edge case of what to do with $c_i$ when $i < 1$. A cheap solution is to pad with whitespace.)
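
Here is a minimal character-level N-gram sketch in Python (my own illustration, not the book's code); the toy corpus and the unsmoothed relative-frequency estimate are assumptions:

```python
from collections import Counter

def train_char_ngram(corpus, N=3):
    """Count N-grams and their (N-1)-character contexts, padding the start with whitespace."""
    padded = " " * (N - 1) + corpus
    ngrams = Counter(padded[i:i + N] for i in range(len(padded) - N + 1))
    contexts = Counter(padded[i:i + N - 1] for i in range(len(padded) - N + 1))
    return ngrams, contexts

def cond_prob(model, context, c):
    """P(c | context), estimated by relative frequency (no smoothing)."""
    ngrams, contexts = model
    return ngrams[context + c] / contexts[context] if contexts[context] else 0.0

model = train_char_ngram("the cat sat on the mat", N=3)
print(cond_prob(model, "th", "e"))   # 1.0 in this tiny corpus: 'th' is always followed by 'e'
```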

Bayesian language classification / text categorization

\[l^* = \underset{l}{\operatorname{argmax}}P(l)P(c_{1:n} | l) \]
\[P(c_{1:n} | l) = \prod_{i=1}^{n} P(c_{i} | c_{i-N+1:i-1}; l)\]
This can be used for spam filtering, language identification (hard for Nynorsk vs Bokmål, though), etc.
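
A sketch of this classifier, again my own Python illustration: one character model per label, scored in log space, with add-one smoothing (an assumption, not part of the formulas above) to avoid zero probabilities. The two training sentences are placeholders.

```python
import math
from collections import Counter

def train(corpus, N=3):
    """Character N-gram counts and their (N-1)-character context counts."""
    padded = " " * (N - 1) + corpus
    ngrams = Counter(padded[i:i + N] for i in range(len(padded) - N + 1))
    contexts = Counter(padded[i:i + N - 1] for i in range(len(padded) - N + 1))
    return ngrams, contexts

def log_likelihood(text, model, N=3, V=128):
    """log P(c_{1:n} | l) with add-one smoothing over a vocabulary of size V."""
    ngrams, contexts = model
    padded = " " * (N - 1) + text
    return sum(math.log((ngrams[padded[i - N + 1:i] + padded[i]] + 1) /
                        (contexts[padded[i - N + 1:i]] + V))
               for i in range(N - 1, len(padded)))

def classify(text, models, priors):
    """l* = argmax_l  log P(l) + log P(c_{1:n} | l)."""
    return max(models, key=lambda l: math.log(priors[l]) + log_likelihood(text, models[l]))

models = {"en": train("the quick brown fox jumps over the lazy dog"),
          "no": train("den raske brune reven hopper over den late hunden")}
priors = {"en": 0.5, "no": 0.5}
print(classify("the lazy fox", models, priors))   # 'en', most likely
```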

Perplexity

Given a sequence of characters/words $c_{1:n}$, the raw number $P(c_{1:n})$, the probability the model assigns to this particular sequence, is usually too small to be numerically stable, so we define the perplexity instead.
\[Perp(c_{1:n} ) = P(c_{1:n})^{-\frac{1}{n}}\]
Its log is, in a sense, the average per-symbol entropy of the sequence. High entropy means high randomness (Pinkie Pie speech); low entropy means low randomness (Applejack speech).
\[\log(Perp(c_{1:n})) = -\frac{1}{n}\sum_{i=1}^{n} \log P(c_i|c_{i-N+1: i-1})\]
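
A small sanity-check sketch (my example, not the book's): a uniform model over a 27-symbol alphabet assigns $P = 1/27$ to every symbol, so its perplexity should come out as exactly 27.

```python
import math

def log_perplexity(cond_probs):
    """-(1/n) * sum_i log P(c_i | c_{i-N+1:i-1}), given the per-symbol conditional probabilities."""
    return -sum(math.log(p) for p in cond_probs) / len(cond_probs)

probs = [1 / 27] * 100                     # uniform over 27 symbols (a-z plus space)
print(math.exp(log_perplexity(probs)))     # 27.0, up to floating-point error
```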

Compression Classification

If a message is spam, it should compress well when appended to a corpus of spam. This is the idea behind Spam Filtering Using Statistical Data Compression Models (2006). It works quite well even with standard general-purpose compression algorithms, suggesting a relation between compression and intelligence.
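
A toy sketch of the idea with a general-purpose compressor (zlib here; the paper itself uses adaptive statistical compression models, so this is only an approximation). The two corpora are placeholder strings:

```python
import zlib

def extra_bytes(corpus, message):
    """How many extra compressed bytes the message costs when appended to the corpus."""
    base = len(zlib.compress(corpus.encode()))
    combined = len(zlib.compress((corpus + message).encode()))
    return combined - base

def classify(message, spam_corpus, ham_corpus):
    """Assign the message to whichever corpus it compresses better with."""
    return "spam" if extra_bytes(spam_corpus, message) < extra_bytes(ham_corpus, message) else "ham"

# Placeholder corpora; a real filter would use large collections of past messages.
spam_corpus = "buy cheap pills now limited offer click here " * 50
ham_corpus = "meeting moved to thursday please review the attached draft " * 50
print(classify("cheap pills, click here for a limited offer", spam_corpus, ham_corpus))   # spam
```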

Information Retrieval

Use a query language to get results from a corpus.
Boolean keyword model, BM25, PageRank, HITS, IBM's Jeopardy-winning Watson...

\[\text{Precision} = \frac{\text{relevant retrieved}}{\text{retrieved}}\]
\[\text{Recall} = \frac{\text{relevant retrieved}}{\text{relevant}}\]
Note the similarity with type 1 and type 2 errors...
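
For concreteness, a tiny precision/recall computation over made-up document IDs:

```python
def precision_recall(retrieved, relevant):
    """Precision = |relevant ∩ retrieved| / |retrieved|; recall = |relevant ∩ retrieved| / |relevant|."""
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

retrieved = {1, 2, 3, 4, 5}   # documents the system returned
relevant = {2, 4, 6, 8}       # documents that are actually relevant
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)      # 0.4 0.5
```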

Machine reading: too much technical stuff I won't need, skipped.

A cool quote

Natural languages abhor absolute synonyms just as nature abhors a vacuum. ---- Lexical Semantics (1986), D. Alan Cruse

Chapter 23 is about speaking and writing 

*nods nods*
The horse raced past the barn fell. ---- Last words of the horse raced past the barn.
Grammar, syntax, semantics, etc. All of them are computational linguistics stuff that I won't need. Skipped.

S Bitch wreck ignition

Or speech recognition.

Back in 2009, when this book was written, an error rate of 10% was standard. Now there is human parity, as achieved by Microsoft in 2017.
