Security:
What is computer security:
collection of tools and techniques designed to protect data and
thwart hackers
Network security:
protecting data during transmission
No firm boundary between them:
computer viruses may first compromise computer security on an
end-host and then spread to other machines.
RFC 2828 definitions (Internet security glossary):
Security Threat:
A potential for violation of security, which exists when there is a
circumstance, capability, action, or event that could breach
security and cause harm. That is, a thread is a possible
danger that might exploit a vulnerability.
Attack:
An assault on system security that derives from an intelligent
threat; that is, an intelligent act that is a deliberate attempt
to evade security services and violate the security policy of the
system.
Components of computer security:
Confidentiality
Integrity
Availability
Differences with cryptography
Is absolute security possible?
Consider an example of a lock on your house to protect the contents.
Does it guarantee security?
Why or why not? What are the assumptions?
How do you tell the difference between a salesman selling a security product
versus a security practitioner/researcher?
Course outline:
We will discuss various building blocks for improving security and
for doing security analysis. We will take a look at various papers
in the area.
security spans cryptography, avoiding vulnerabilities in software and
hardware, network protocols,
Newsgroup: We will use phorum.eecs.umich.edu for course communication.
Crypto background:
Cryptography: greek words meaning "secret writing"
Cryptanalysis: art of breaking codes
Cryptosystem:
Defines a mapping from plaintexts to ciphertexts using a set of keys and
a converse mapping.
Formally, it is a 5-tuple (E, D, M, K, C), where M is
the set of plaintext messages, C is the set of ciphertext messages,
K is a set of keys, and E: MxK -> C (enciphering functions) and D: CxK -> M
(deciphering functions)
Example: Caesar cipher. Shifts letters
M = {sequence of roman letters}
C = {sequence of roman letters}
K = {i: i is between 0 and 25}
E = {Ek: k belongs to K and for all m in M, c = (m + k) mod 26}
D = {Ek: k belongs to K and for all c in C, m = (c - k) mod 26}
Goal: to protect plaintext.
Attacks to break a ciphertext:
-- Ciphertext only attack: only ciphertext available to the adversary.
-- Known plaintext attack: a plaintext/ciphertext pair available to the
the adversary.
-- Chosen plaintext attack: adversary can choose the plaintext and ask
for encrypted versions. Goal: find the key used.
Attacks on Casear cipher:
Known and chosen plaintext attack: Easy. Difference gives k.
Ciphertext only attack: Can use exhaustive search. Small key space.
Classical Cryptosystems:
They are also called single-key or symmetric cryptosystems.
Main property: Given a key k and Ek, Dk is simply the inverse of Ek.
Caesar cipher is a symmetric cryptosystem.
Substitution cipher:
A map from characters in the plaintext to the ciphertext.
Casear cipher is an example of a substition cipher.
Attack: frequency analysis
CipherText: KHOOR ZRUOG
Frequency of letters in encrypted text:
G = 0.1; H = 0.1; K = 0.1; O = 0.3; R = 0.2; U = 0.1; Z = 0.1
Compute cross-correlation with English letter frequencies p('A'),
p('B'), ....,p('Z') for various values of key i.
cross-correlation(i) = 0.1 x p ('G'-i) + 0.1 x p ('H'-i) ...
Test out large values of cross-correlation(i) (starting with maximum)
to see if they work.
For example, if cross-correlation(10) is large, it indicates that i
has a good chance of being 10.
If that does not work, try the next largest value.
(Of course, in this case, one could have simply tried all keys for
decryption since there are only 26 keys, ciphertext is small,
and decryption is fast!)
Side Note: Where does cross-correlation function come from? You may
remember Covariance from your probability class.
COV(X, Y) = AVERAGE(XY) - AVERAGE(X)AVERAGE(Y)
(X, Y are series like x1, x2, x3, ..., and y1, y2, y3, ...)
AVERAGE(XY) requires adding x1y1, x2y2, etc. We are essentially
maximizing this sum in our case, thereby maximizing covariance
One-time pad:
-------------
Choose a random bit string as the encryption key.
Ciphertext = plaintext XOR key
Plaintext = key XOR ciphertext
It turns out that as long as key is truly random, ciphertext
is also random -- no frequency analysis possible.
Provably impossible to break.
Why must it be used only once?
Stream ciphers versus block ciphers:
-----------------------------------
Block ciphers:
-------------
More common. Most cryptosystems use block ciphers.
They encrypt a fixed size block.
Typical block sizes: 64, 128, 256 bits.
If encrypting a bit string that is not a multiple of the block size,
it must be padded.
The pad must be removed upon decryption.
Formally:
Given a plaintext sequence b1.b2.....bn
Where each bi is a fixed-length bit string
and a key K
ciphertext for bi: ci = Ek(bi)
Ciphertext c for m = E(k,b0).E(k,b1).E(k.b2)....
Problem:
-- Padding issue
-- Frequency analysis possible on bi
-- Chosen/known plain-text attack can help
decipher some of the mappings.
Solution: Use position of bi to change the mapping
Padding:
(A good reference on this is http://www.di-mgt.com.au/cryptopad.html)
Suppose block size = 8 bytes
Suppose last block of data = 3 bytes
Method 1: Pad with zeros/spaces?
Only add pad if not exact pad size. Else don't add.
Problem?
(only works with data such as ASCII that does not
normally contain the special character)
-- Will not work with .exe or binary executable files.
Method 2: Pad with 0x80 followed by zero bytes.
(If 0x80 exists in input, then another 0x80 must be added).
This is pretty commonly used method.
Method 3: Pad with zeros, but make the last byte contain the # of padding bytes.
Better.
But, if the input is exact length, then must add another
block.
Method 4: Pad with bytes of all the same value as the # of padding bytes.
Similar to above.
On decrypting, make sure that the pad value is in range.
Also, check that all the padded bytes have the same value.
(This is what is recommended in standards and perhaps the most common
method)
Method 5: For small messages:
-- it may be better to use random padding instead of 0's
or pad length. The last byte would contain the length
of the pad.
Advantage: same message will get encrypted differently,
reducing the risk of frequency analysis.
(But, other techniques, such as initialization vectors can
achieve the same thing).
Dealing with frequency analysis
--------------------------------
-- Find a way to make encryption dependent on position of the data.
-- Should work for arbitrary length data
Basic Mode is called ECB mode or Electronic Codebook Mode.
Plaintext: chopped into blocks (padding last block if necessary)
Encrypt/decrypt each block separately.
CBC mode: (Cipherblock chaining mode)
C0 = E(k, P0 XOR IV)
C1 = E(k, P1 XOR C0)
...
C(i+1) = E(k, P(i+1) XOR C(i))
Notice that something is required to initialize. Could use all zeros.
But then, if P0 is the same for different messages, C0 will be the
same.
IV is often public, but should use a different value for different
messages.
Decryption:
P0 = D(k, C0) XOR IV
P1 = D(k, C1) XOR C0
...
Decryption of any Pi can be done without decrypting all the previous
blocks.
For maximum security in CBC mode, it is a good idea to protect
the IV as well. Why? (Think of an attack).