CS Theory


subset of all strings over an alphabetcan be finite or infiniteexample: L ={all binary strings with even number of 0s}To define a language with a DFA, we can define language L as L(a) = {w|δ+(q0,w) is in F}

alphabet and strings

denoted by: Σa finite set of symbols (e.g. Σ = {0,1})string: finite sequence of symbols from ΣΣ* denotes the language of all strings with symbols in Σ.ε denotes the empty string, |ε| = 0

Deterministic Finite Automata

1) Q= finite sets of state2) Σ = finite set of input symbols3) δ = transition function s.t. δ: Q X Σ -> Q4) q0 = starting state5) F = set of final statesnotation: A = (Q,Σ,δq0,F)

Extended transition function

δ+ = takes state q and string w, and returns state pDefined by induction

Regular language

A language L is regular if L=L(A) for some finite automaton AAll finite languages are regularSuppose L is a subset of binary strings of length at most n. You could have a state for each string of length<n, and then make the accepting states be the ones in L. Then you can have one additional state to once you see more than n symbols.

Non-deterministic Finite Automata

Same with DFA, only difference is the transition function.δ : Q X Σ -> P(Q) P(Q) denotes the "power set", this includes the empty set øNFA with a diagram= can have 0, 1, or more arrows coming out from each state for each symbol

ε Transitions for NFAs

Moves done spontaneously, in between processing symbolsAcceptance: accepted by NFA is there exists at least one accepting. Difference with NFA is only the δ, Now the transition function also takes epsilonδ: Q X (Σ U {ε}) -> P(Q) The output of δ(q,ε) is the set of states that one can transition from q before processing any new symbols from the alphabet

ε-Closure of a state q

set of all states that can be reached by starting at q and following a path of ε transitionscomputed inductively: q is always in ECLOSE(q). if p is in ECLOSE(q), and there is an ε transition from p to r, then r also belongs to ECLOSE(q)

The subset construction

by treating sets of states of an NFA as states of a DFA, it's possible to convert any NFA to a a DFA

Regular Expressions

user-friendly" alternative to NFA notation


Main idea: consider an automata with RE labelling edges. The language accepted by this is the union over all paths from start to accepting state. Starting point: Assu``me it has 1 accepting state or make ε transitions from the old accepting states. Then, eliminate intermediate states one at a time by adjusting the REs labelling the transitions between the states. End result: RE labelling the transition between start and accept.

Algebraic property of union (+)

Taking a union of languages is1) Commutative: L U M = M U L for any languages L, M. So R+S = S+R for RE2) Associative: L U (M U N) = (L U M) U N, same with RE3) Has identity ϕ: L U ϕ = L. R + ϕ = R4) Idempotent: L U L = L, so R + R = R

Algebraic property of concatenation (.)

Taking a concatentation of lannguages is1) Associative: L.(M.N) = (L.M).N, so R.(S.T) = (R.S).T for RE2) Has identity {ε}: L.{ε} = L, so R.ε = R3) Has Annihilator ϕ: L.ϕ=ϕ ( you need an element from set L and ϕ, but nothing is in ϕ) so R.ϕ = ϕ

Proof of distributive law

L(M U N) = LM U LN so R(S+T) = RS + RTProof:Suppose a string w is in L(M U N). String can be written w = jk, j is in L and k is in M U N. If k is in M, then w is in LM. If k is in N, then w is in LN. Either way, the element is in LM U LN

Laws involving *

ϕ* = {ε}ε* = ε (R) = R*

Testing proposed laws for regular expression

E.g. (E+T) = (ET)Think of each expression as a variable and replace each variable by a distinct symbol. The statement becomes (a+b) = (ab). We can see that both sides describes all strings using symbols a,b.

Closure property of regular languages

union, concatenation, *, intersection, complementApplying this operation to objects in the set always yields an object that is also in the set.To prove that a property is closed, find a regular expression that corresponds to the operator. This is because RE = ε-NFA = NFA = DFA

Proof of closure under complement

Take a DFA that accepts the strings in L. Make each accepting state into a non-accepting state and vice versa. Now there's a DFA that accepts the strings in L complement.Closure under intersection also deals with complement: L n M = complement(complement L U complement M)Difference:L-M = L n complement M

Closure under reversal

two proofs:1) Start with a DFA for L. Reverse all the arcs in the transition diagram. Turn start state to accepting state. Make a new start state with ε-transitions to all the previously accepting states.2) base case: R(ε) = ε, R(ϕ) =ϕ and R(a) = aInduction, suppose R(S) and R(T) is true. If E = S+T, then R(E) = R(S) + R(T). If E = ST, then R(E) = R(T)R(S), and R(E) = R(E)


Homomorphism is a function on strings that substitutes a particular string for each symbol.If h is a homomorphosis on alphabet Σ, w = a1a2...an, then we apply h to each symbol of w and concatenate in order.h(w) = h(a1)h(a2)...h(an)Applying homomorphism to a LANGUAGE - apply it to the alphabet of the language.

The Pumping Lemma

Suppose the DFA has n states and its processing a string with length > n. Some states repeat (pigeon hole principle), which means there's a portion of the path that can be traveled again and again. String w = xyz, |xy|=<n, y!=ε, xy^kz ∈ L for all k.

Contrapositive of the pumping lemma

To prove a language is not regular if for EVERY n =>0, there is some w in L of length >=n s.t. for every way of writing w as w = xyz for |xy|<=n, y!=ε, such that xy^kz is NOT in L.

Closure property to prove a language is NOT regular

L = L1 ∩ L2. If we know L is not regular and L1 is regular => L2 is NOT regular. Proof: if L2 is regular, then L should be regular because closure under interserctions

String equivalence

Two strings x and y are distinguishable with respect to L if and only if there exists a string z s.t. one of xz, yz is in L and the other is not. This partitions strings into equivalence classes

Myhill-Nerode Theorem

Theorem: language L is regular if and only if L has finite index. Index of a language = number of equivalence classes = min #states in a DFA for L. Infinite index implies irregularity.

Proof of the Myhill-Nerode Theorem

Some set of string S = {x1,x2...} are pairwise distinguishable. Means for EVERY two distinct strings in S, there is a z s.t. one of xiz, xjz is in L and one is not. Then, processing xi and xj must lead to different states (for all i, j). So the DFA must have |S| different states. E.g. L = 0^n1^n | n>=1. Consider S = {0^n|n>=0}. Each pair of 0i and 0j is distinguishable by the string 1i because 0i1i is in L, and 0j1i is not .

Testing emptiness

Representation E of language L, to decide if L = the empty set:1) if E is DFA/NFA/ε-NFA, we just need to determine if there's a path from the start state to the accept state. Takes O(n^2) time for n=|states|2) if E is RE, we test: R = R1 + R2 is empty if R1 AND R2 are empty. R = R1R2 is empty if R1 OR R2 is empty. R* is never empty because it contains ε.

Testing universality

Take the complement and see if it's empty. For DFA, complement is O(n). For NFA, RE, taking complement is exponential.

Reducing DFAs

1) Eliminate any state that can't be reached from the start2) partition remaining states into equivalence classes3) Make new DFA with one state for each of these classes.Property: minimum number of states among all DFA that accept the same language.

Context Free Grammar (CFA)

Consists of:1) Terminal symbols (e.g. the alphabet, 0, 1)2) Variables - denoted by capital letters3) Start symbol denoted by S4) a set of production rules of the form:variable-< string of variables and terminals

Language of a CFG

Set of strings of terminals that can be derived from S. L = {w|S=>*w} notation means s.t. w can be derived from S. We can derive them leftmost/rightmost

Parse Trees

show grouping of terminals to substrings.How to construct:1. Each interior node is labeled by a variable in V. Root is the start symbol.2. Leaves are terminals. Children are the consequences of a production rule.3. Leaves form a string, called "yield". We take the leaves from left to right.

Ambiguity in grammars

CFG is ambiguous if there exists a string with two different parse trees. Source of ambiguity: 1) precedence of operators not respected2) Need to group like operators from left to rightSolution: produce different variable for different levels of binding

Sentential form

any string in (V U T)* that can be derived from S

Inherently Ambiguous Language

If we find 1 language that is unambigious, then L is unambiguous. L is inherently ambiguous if all of its grammars are ambiguos

Push Down Automata ( PDA)

It is like an ε-NFA with a stack. Stack = infinite memory used Last in - first out. Transition of a PDA: In one step, decides where to go based on 1)current state 2) current input symbol/ε 3)symbol on top of the stack. When PDA consumes the input symbol it moves on to the next state, pop top of the stack, have the option to push another string.In a transition Diagram: nodes/ovals correspond to the states, arrow to the start state, double circles accepting states. Arcs labeled by: A,B/C: input A, top of the stack B, push C.

2 Types of Acceptance in PDA

1) Emptying the stack. PDA stops and accepts when stack is empty. N(A) = {w∈Σ|(q0,w,Z0)|----*(p,ε,ε) for some p in Q}2) Final State acceptance {w∈Σ|(q0,w,Z0)|----*(p,ε,X), for some p in F}

Theorem for Empty Stack and Final State acceptance

L(A) = language described by PDA A that accepts final stateN(B) = PDA B that accepts by empty stack. Theorem: a language L = L(A) if and only if L = N(B) for some PDA B

Deterministic PDAs

For a non-deterministic PDA, δ(q,a,X) = set of pairs (p,α).Deterministic PDA, δ(q,a,X) = only one (p,α). If δ(q,a,X) is non-empty, that means δ(q,ε,X) = empty. We either have an epsilon or input symbol transitions- never both. The acceptance is by final state. It is weaker than PDAs. You can simlate a DFA with a DPDA that just ignores the stack.

DPDAs acceptance by empty stack vs final state

-Empty stack is weaker for DPDAs - once a DPDA empties a stack, it cannot do anything else-If string is accepted, then it can't accept any more extension, so empty stack is prefix free. E.g. {0}* is not prefix free.-L = N(P) for some DPDA P only if L is prefix-free

DPDA -> Unambiguous CFG

If L is accepted by a DPDA, then L has an unambiguous CFG.Proof: Construction from PDA to CFG yields an unambiguous grammar,

Inclusion of language classes

CFL = PDA > unambiguous CFL > DPDA> regular

Pumping Lemma for CFLs

For every CFL L, there exists an integer n s.t. for every z in L with |z|>=n, we can write uvwxy=z with |vwx|<=n, |vx|>0, s.t. for all K>=0, uv^kwx^ky is also in L. You can pump two substrings where at least one of them not empty.

Proof of Pumping Lemma for CFL

Take a CFG G for L. let m = #variables. Choose n =b^(m+2). (b= max length of a rhs of a production rule) Suppose z in L is of length at least n. Take a parse tree for z with minimum number of nodes. All internal nodes have degree<=b. If tree height = h, then #leaves <=b^h. #leaves>=|z|>=nheight of tree>=m+2 There exists a path in the tree of length>= m + 2. It repeats a variable in the last m+1 steps.