Slight Introduction to Number Theory

  • A prime number is an integer greater than 1 that can be divided without remainder only by positive and negative values of itself and 1. Prime numbers play a critical role both in number theory and in cryptography.
  • Discrete logarithms are fundamental to a number of public-key algorithms. Discrete logarithms are analogous to ordinary logarithms, but operate over modular arithmetic.
  • 1 is not considered a prime number.

Prime Numbers

  • Fundamental Theorem of Arithmetic:
    • Any integer a > 1 can be factorized in a unique way as: a = p1^(a1) × p2^(a2) × … × pt^(at),
    • where p1 < p2 < … < pt are prime numbers and where each ai is a positive integer.
  • The value of any given positive integer can be specified by simply listing all the nonzero exponents in the foregoing formulation: The integer 12 is represented by {a2 = 2, a3 = 1}.
  • Multiplication of two numbers is equivalent to adding the corresponding exponents. Given a = Π_p p^(a_p) and b = Π_p p^(b_p), define k = ab. We know that the integer k can be expressed as the product of powers of primes: k = Π_p p^(k_p). It follows that k_p = a_p + b_p for all p.
    • Example: 12 × 18 = 216; since 12 = 2^2 × 3 and 18 = 2 × 3^2, we get 216 = 2^(2+1) × 3^(1+2) = 2^3 × 3^3.

  • Any integer of the form p^n can be divided only by an integer that is a lesser or equal power of the same prime number, p^j with j ≤ n. Thus, we can say the following:
    • Given a = Π_p p^(a_p) and b = Π_p p^(b_p): if a|b, then a_p ≤ b_p for all p.
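  • These rules are easy to check numerically. A minimal Python sketch (trial-division factorization; the helper names are ours, not from the source):

```python
from collections import Counter

def factorize(n):
    """Return the prime factorization of n as a Counter {prime: exponent}."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

# Multiplying two numbers adds the corresponding exponents:
a, b = 12, 18                       # 12 = 2^2 * 3, 18 = 2 * 3^2
assert factorize(a * b) == factorize(a) + factorize(b)   # 216 = 2^3 * 3^3

# a | b exactly when every exponent of a is <= the matching exponent of b:
def divides(a, b):
    fa, fb = factorize(a), factorize(b)
    return all(fa[p] <= fb[p] for p in fa)

assert divides(12, 216)
assert not divides(8, 12)           # 8 = 2^3, but 12 only has 2^2
```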

Fermat’s Theorem

  • Fermat’s theorem states the following:
    • If p is prime and a is a positive integer not divisible by p, then: a^(p−1) ≡ 1 (mod p).
    • If p is prime and a is any positive integer, then: a^p ≡ a (mod p).
  • Euler’s Totient Function:
    • Euler’s totient function, written φ(n), is defined as the number of positive integers less than n and relatively prime to n. By convention, φ(1) = 1.
    • It should be clear that for a prime number p, φ(p) = p − 1.
    • Now suppose that we have two prime numbers p and q, with p ≠ q. Then we can show that for n = pq:
      • φ(n) = φ(pq) = φ(p) × φ(q) = (p − 1) × (q − 1).
  • Euler’s Theorem:
    • Euler’s theorem states that for every a and n that are relatively prime: a^φ(n) ≡ 1 (mod n).
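  • Both theorems can be verified numerically with Python's built-in modular exponentiation; the primes and the base below are arbitrary examples:

```python
from math import gcd

def phi(n):
    """Euler's totient: count of positive integers < n relatively prime to n
    (phi(1) = 1 by convention)."""
    if n == 1:
        return 1
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

p, q = 11, 13
assert phi(p) == p - 1                      # phi of a prime
assert phi(p * q) == (p - 1) * (q - 1)      # phi(pq) = (p-1)(q-1)

# Fermat: a^(p-1) ≡ 1 (mod p) when p does not divide a.
assert pow(7, p - 1, p) == 1
# Euler: a^phi(n) ≡ 1 (mod n) when gcd(a, n) == 1.
n = p * q
assert pow(7, phi(n), n) == 1
```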

 

Discrete Logarithms

  • Discrete logarithms are fundamental to a number of public-key algorithms, including Diffie-Hellman key exchange and the digital signature algorithm (DSA).
  • More generally, we can say that the highest possible exponent to which a number can belong (mod n) is φ(n). If a number is of this order, it is referred to as a primitive root of n. The importance of this notion is that if a is a primitive root of n, then its powers are distinct (mod n) and are all relatively prime to n. In particular, for a prime number p, if a is a primitive root of p, then a, a^2, …, a^(p−1) are distinct (mod p).
  • For any integer b and a primitive root a of prime number p, we can find a unique exponent i such that:

    b ≡ a^i (mod p), where 0 ≤ i ≤ p − 1.

  • This exponent i is referred to as the discrete logarithm of the number b for the base a (mod p). We denote this value as dloga,p(b).
  • Properties of Discrete Logarithms:
    • dloga,p(1) = 0, because a^0 mod p = 1
    • dloga,p(a) = 1, because a^1 mod p = a
    • dloga,p(xy) = [dloga,p(x) + dloga,p(y)] mod φ(p)
    • dloga,p(y^r) = [r × dloga,p(y)] mod φ(p)
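  • A brute-force sketch makes the definition concrete (feasible only for tiny primes; the prime 19 and primitive root 2 are arbitrary choices):

```python
def dlog(a, p, b):
    """Brute-force discrete log: smallest i with a^i ≡ b (mod p), or None."""
    for i in range(p - 1):
        if pow(a, i, p) == b % p:
            return i
    return None

p, a = 19, 2                      # 2 is a primitive root of 19
assert dlog(a, p, 1) == 0         # a^0 mod p = 1
assert dlog(a, p, a) == 1         # a^1 mod p = a

# The product rule: dlog(xy) = [dlog(x) + dlog(y)] mod (p - 1)
x, y = 3, 5
lhs = dlog(a, p, x * y % p)
rhs = (dlog(a, p, x) + dlog(a, p, y)) % (p - 1)
assert lhs == rhs
```

Note that the reduction is mod p − 1 here because φ(p) = p − 1 for a prime p.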

The Least Mean-Square Algorithm

  • The Least Mean Square (LMS) algorithm is an online learning algorithm developed by Widrow and Hoff in 1960.
  • Rosenblatt’s perceptron was the first learning algorithm for solving linearly separable pattern-classification problems.
  • LMS algorithm was the first linear adaptive-filtering algorithm for solving problems such as prediction and communication-channel equalization.
  • Advantages behind LMS Algorithm:
    • Computationally efficient: its complexity is linear in the number of adjustable parameters.
    • Simple to code and easy to build.
    • Robust with respect to external disturbances.
  • The Gauss-Newton method strikes a balance between computational complexity and convergence behavior.
  • Diagonal loading concept.
  • Stabilizer term concept.
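  • The LMS update rule can be sketched as follows, in a simple system-identification setting. The step size, tap count, and synthetic data are illustrative assumptions, not from the source:

```python
import random

def lms_fit(xs, ds, mu=0.01, n_taps=3):
    """Minimal LMS: adapt weights w so that w . x(i) tracks the desired d(i).

    Update rule: w <- w + mu * e(i) * x(i), where e(i) = d(i) - w . x(i).
    """
    w = [0.0] * n_taps
    for x, d in zip(xs, ds):
        y = sum(wi * xi for wi, xi in zip(w, x))    # filter output
        e = d - y                                   # error signal
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w

# Identify a known linear system w* = [0.5, -0.3, 0.2] from its input/output.
random.seed(0)
true_w = [0.5, -0.3, 0.2]
xs = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(5000)]
ds = [sum(wi * xi for wi, xi in zip(true_w, x)) for x in xs]
w = lms_fit(xs, ds, mu=0.1)
assert all(abs(wi - ti) < 0.01 for wi, ti in zip(w, true_w))
```

Note the linear cost per sample: each update touches each tap exactly once, which is the "computationally efficient" property listed above.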
  • A stochastic process has the Markov property if the conditional probability distribution of future states of the process depends only upon the present state; that is, given the present, the future does not depend on the past. A process with this property is called a Markov process. The strong Markov property is similar, except that "present" is defined in terms of a certain type of random variable known as a stopping time, which may be specified in terms of the outcomes of the stochastic process itself.
  • A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved states. An HMM can be considered the simplest dynamic Bayesian network.
  • Questions:
    • 128 - In the Gauss-Newton method, why do we take the sum over squares of the error signal?
      • To make the value positive.
    • Why not just take the sum of the error itself?
    • 128 - "Linearize the dependence of e(i) on w": does that mean trying to find a linear function that maps the dependence between e(i) and w?
    • 128 - How does equation 3.18 linearize e(i) with respect to w?
    • 129 - Why is it called the Jacobian matrix?
    • 129 - How is equation 3.20 derived?
    • 129 - What is meant by nonsingular matrix multiplication?
    • 129 - Last paragraph, which discusses the Jacobian matrix conditions.

Confidentiality using Symmetric Encryption

  • In a distributed environment, encryption devices can be placed to support:
    • Link encryption:
      • Each vulnerable communications link is equipped on both ends with an encryption device.
    • End-to-end encryption:
      • The encryption process is carried out at the two end systems.
  • Traffic padding is a countermeasure for traffic analysis attacks; it involves sending random bits during periods when no encrypted data are available for transmission.
  • Key distribution is the function that delivers a key to two parties who wish to exchange secure encrypted data.
  • Key distribution often involves the use of master keys, which are infrequently used and are long lasting, and session keys, which are generated and distributed for temporary use between two parties.
  • Confidentiality ensures that information is accessible only to those authorized to have access.
  • Authentication means verifying the identity of a user logging onto a network. Passwords, digital certificates, smart cards and biometrics can be used to prove the identity of the user to the network.

Placement of Encryption Function

  • This section examines the potential locations of security attacks and then looks at the two major approaches to encryption placement: link and end to end.

Potential Locations for Security Attacks

  • For active attacks, the attacker needs to gain physical control of a portion of the link and be able to insert and capture transmissions.
  • For a passive attack, the attacker merely needs to be able to observe transmissions.
  • Twisted pair and coaxial cable can be attacked using either:
    • Invasive taps:
      • Allows both active and passive attacks
    • Inductive devices:
      • Monitor electromagnetic emanations and are helpful only for passive attacks.
  • Neither type of tap is as effective with optical fiber.
  • Active attacks on microwave and satellite are also possible, although they are more difficult technically and can be quite expensive.
  • An attack can take the form of attempts to modify the hardware or software, to gain access to the memory of the processor, or to monitor the electromagnetic emanations.
  • Thus, there are a large number of locations at which an attack can occur. Furthermore, for wide area communications, many of these locations are not under the physical control of the end user.
  • If encryption is to be used to counter these attacks, then we need to decide:
    • What to encrypt.
    • Where the encryption gear should be located.
  • As Figure 7.2 indicates, there are two fundamental alternatives: link encryption and end-to-end encryption:

  • In link encryption,
    • All traffic over all communications links is secured.
    • The message must be decrypted each time it enters a switch (such as a frame relay switch) because the switch must read the address (logical connection number) in the packet header in order to route the frame. Thus, the message is vulnerable at each switch. If working with a public network, the user has no control over the security of the nodes.
    • Many encryption devices and keys must be provided, one for each link.
    • For this strategy to be effective:
      • All the potential links in a path from source to destination must use link encryption.
      • Each pair of nodes that share a link should share a unique key, with a different key used on each link.
  • With end-to-end encryption,
    • The user data are secure.
    • However, the traffic pattern is not, because packet headers are transmitted in the clear.
    • On the other hand, end-to-end encryption does provide a degree of authentication.
      • If two end systems share an encryption key, then a recipient is assured that any message that it receives comes from the alleged sender, because only that sender shares the relevant key. Such authentication is not inherent in a link encryption scheme.
  • To achieve greater security, both link and end-to-end encryption are needed.

  • In terms of the Open Systems Interconnection (OSI) model, link encryption occurs at either the physical or link layers.
  • See figure below. Note that the terms red and black are frequently used: red data are sensitive or classified data in the clear; black data are encrypted data.

Traffic Confidentiality

  • Knowledge about the number and length of messages between nodes may enable an opponent to determine who is talking to whom.
  • Types of information that can be derived from a traffic analysis attack:
    • Identities of partners.
    • How frequently the partners are communicating.
    • Message pattern, message length, or quantity of messages that suggest important information is being exchanged.
    • The events that correlate with special conversations between particular partners.
  • Another concern related to traffic is the use of traffic patterns to create a covert channel.
  • A covert channel is a means of communication in a fashion unintended by the designers of the communications facility. Typically, the channel is used to transfer information in a way that violates a security policy.
  • Link Encryption approach to counter traffic analysis:
    • Traffic padding produces ciphertext output continuously, even in the absence of plaintext. A continuous random data stream is generated. When plaintext is available, it is encrypted and transmitted. When input plaintext is not present, random data are encrypted and transmitted. This makes it impossible for an attacker to distinguish between true data flow and padding and therefore impossible to deduce the amount of traffic.
  • End-To-End approach to counter traffic analysis:
    • One technique that might prove useful is to pad out data units to a uniform length at either the transport or application level.
    • In addition, null messages can be inserted randomly into the stream.
    • These tactics deny an opponent knowledge about the amount of data exchanged between end users and obscure the underlying traffic pattern.

Key Distribution

  • Key distribution technique: the means of delivering a key to two parties who wish to exchange data, without allowing others to see the key.
  • Ways of key distribution:
    • A can select a key and physically deliver it to B.
    • Third party can select the key and physically deliver it to A and B.
    • If A and B have previously and recently used a key, one party can transmit the new key to the other, encrypted using the old key.
    • If A and B each has an encrypted connection to a third party C, C can deliver a key on the encrypted links to A and B.

    Key Distribution Center

  • The use of a key distribution center (KDC) is based on the use of a hierarchy of keys.
  • At a minimum, two levels of keys are used (Figure 7.8).
  • Communication between end systems is encrypted using a temporary key, often referred to as a session key.
  • Typically, the session key is used for the duration of a logical connection, such as a frame relay connection or transport connection, and then discarded.
  • Each session key is obtained from the key distribution center over the same networking facilities used for end-user communication.
  • Accordingly, session keys are transmitted in encrypted form, using a master key that is shared by the key distribution center and an end system or user.
  • For each end system or user, there is a unique master key that it shares with the key distribution center.

  • A Key Distribution Scenario: page 212-214.
  • It is not necessary to limit the key distribution function to a single KDC. Indeed, for very large networks, it may not be practical to do so.
  • Advantages of hierarchical scheme:
    • Minimizes the effort involved in master key distribution, because most master keys are those shared by a local KDC with its local entities.
    • Limits the damage of a faulty or subverted KDC to its local area only.

     

    Session Key Lifetime

  • The more frequently session keys are exchanged, the more secure they are.
  • On the other hand, the distribution of session keys delays the start of any exchange and places a burden on network capacity.
  • A security manager must try to balance these competing considerations in determining the lifetime of a particular session key.

Automated and Decentralized Key Distribution

 

  • In figure 7.10 the security service is called “Session Security Module” (SSM).
  • The automated key distribution approach provides the flexibility and dynamic characteristics needed to allow a number of terminal users to access a number of hosts and for the hosts to exchange data with each other.
  • The use of a key distribution center imposes the requirement that the KDC be trusted and be protected from subversion. This requirement can be avoided if key distribution is fully decentralized. Although full decentralization is not practical for larger networks using symmetric encryption only, it may be useful within a local context.
  • A decentralized approach requires that each end system be able to communicate in a secure manner with all potential partner end systems for purposes of session key distribution.

Controlling Key Usage

  • Types of session keys:
    • Data-encrypting key, for general communication across a network.
    • PIN-encrypting key, for personal identification numbers (PINs) used in electronic funds transfer and point-of-sale applications.
    • File-encrypting key, for encrypting files stored in publicly accessible locations.
  • The 8 extra tag bits in each 64-bit DES key (the bits ordinarily reserved for parity) identify how the key may be used, with the following interpretation:
    • One bit indicates whether the key is a session key or a master key.
    • One bit indicates whether the key can be used for encryption.
    • One bit indicates whether the key can be used for decryption.
    • The remaining bits are spares for future use.
  • Drawback of embedding key tag within key:
    • The tag length is limited to 8 bits, limiting its flexibility and functionality.
    • Because the tag is not transmitted in clear form, it can be used only at the point of decryption, limiting the ways in which key use can be controlled.
  • A more flexible scheme is the control vector. In this scheme, each session key has an associated control vector consisting of a number of fields that specify the uses and restrictions for that session key. The length of the control vector may vary.
  • The control vector is cryptographically coupled with the key at the time of key generation at the KDC.
  • Coupling/Decoupling processes are illustrated below:

  • Advantages of using the control vector approach:
    • There is no restriction on the length of the control vector, which enables arbitrarily complex controls to be imposed on key use.
    • The control vector is available in clear form at all stages of operation. Thus, control of key use can be exercised in multiple locations.
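  • The coupling described above (hash the control vector, XOR the result with the master key, and use that to encrypt the session key) can be sketched as follows. This is a toy illustration only: SHA-256 stands in for the hashing function and plain XOR for the encryption step, which in the real scheme would be a block cipher:

```python
import hashlib

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def couple(master_key: bytes, control_vector: bytes, session_key: bytes) -> bytes:
    """'Encrypt' the session key under (master key XOR hash of control vector)."""
    h = hashlib.sha256(control_vector).digest()[:len(master_key)]
    return _xor(_xor(master_key, h), session_key)   # XOR stands in for encryption

def decouple(master_key: bytes, control_vector: bytes, blob: bytes) -> bytes:
    h = hashlib.sha256(control_vector).digest()[:len(master_key)]
    return _xor(_xor(master_key, h), blob)

km = b"master-key-16byt"                # hypothetical 16-byte master key
cv = b"session|encrypt-only"            # hypothetical control vector fields
ks = b"random-session-k"                # hypothetical 16-byte session key
blob = couple(km, cv, ks)
assert decouple(km, cv, blob) == ks
# Presenting a tampered control vector fails to recover the session key:
assert decouple(km, b"session|any-use", blob) != ks
```

The last assertion shows why the scheme works: changing the control vector changes the effective decryption key, so the restrictions cannot be stripped off without destroying the session key.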

 

Random Number Generation

  • Usages of random numbers:
    • Reciprocal authentication.
    • Session key generation.
    • Generation of keys for the RSA public-key encryption algorithm.
  • Requirements for sequence of random number:
    • Randomness:
      • Randomness criteria:
        • Uniform distribution:
          • The distribution of numbers in the sequence should be uniform; that is, the frequency of occurrence of each of the numbers should be approximately the same.
        • Independence:
          • No one value in the sequence can be inferred from the others.
    • Unpredictability.
  • Pseudorandom numbers are deterministically generated, but the resulting sequences will pass many reasonable tests of randomness.

Multiple Encryption and Triple DES

  • Multiple encryption is a technique in which an encryption algorithm is used multiple times. In the first instance, plaintext is converted to ciphertext using the encryption algorithm.
  • Triple DES makes use of three stages of the DES algorithm, using a total of two or three distinct keys.
  • A mode of operation is a technique for enhancing the effect of a cryptographic algorithm or adapting the algorithm for an application, such as applying a block cipher to a sequence of data blocks or a data stream.
  • Five modes of operation have been standardized for use with symmetric block ciphers such as DES and AES:
    • Electronic codebook mode.
    • Cipher block chaining mode.
    • Cipher feedback mode.
    • Output feedback mode.
    • Counter mode.
  • A stream cipher is a symmetric encryption algorithm in which ciphertext output is produced bit-by-bit or byte-by-byte from a stream of plaintext input. The most widely used such cipher is RC4.
  • Multiple Encryption Techniques:
    • Double DES.
      • Double encryption with DES provides far less security than its key length suggests: the meet-in-the-middle attack described below breaks it with an effort on the order of a single 56-bit key search.
    • Triple DES with 2 keys.
    • 3DES with 3 keys.
    • Meet-in-the-middle attack on double DES: given a known pair (P, C), with C = E(K2, E(K1, P)), the attack proceeds as follows.
      • First, encrypt P for all 2^56 possible values of K1. Store these results in a table and then sort the table by the values of X = E(K1, P).
      • Next, decrypt C using all 2^56 possible values of K2. As each decryption is produced, check the result against the table for a match.
      • If a match occurs, then test the two resulting keys against a new known plaintext-ciphertext pair. If the two keys produce the correct ciphertext, accept them as the correct keys.
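    • The meet-in-the-middle idea can be sketched with a toy 8-bit "cipher" invented purely for illustration (real DES has 2^56 keys per stage, so each table would have 2^56 entries rather than 256):

```python
def toy_encrypt(k, p):
    """Toy 8-bit 'block cipher' standing in for DES: XOR with key, rotate left 3."""
    x = (p ^ k) & 0xFF
    return ((x << 3) | (x >> 5)) & 0xFF

def toy_decrypt(k, c):
    x = ((c >> 3) | (c << 5)) & 0xFF        # rotate right by 3
    return (x ^ k) & 0xFF

def meet_in_the_middle(p, c):
    """Recover candidate (k1, k2) with c == E(k2, E(k1, p))."""
    table = {}
    for k1 in range(256):                   # forward: encrypt p under every k1
        table.setdefault(toy_encrypt(k1, p), []).append(k1)
    candidates = []
    for k2 in range(256):                   # backward: decrypt c under every k2
        mid = toy_decrypt(k2, c)
        for k1 in table.get(mid, []):       # a match meets in the middle
            candidates.append((k1, k2))
    return candidates

k1, k2, p = 0x3A, 0xC5, 0x42
c = toy_encrypt(k2, toy_encrypt(k1, p))
assert (k1, k2) in meet_in_the_middle(p, c)
```

The work is 2 × 256 operations instead of 256² for exhaustive search over both keys; the candidate list may contain false positives, which is why the text tests matches against a second known pair.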
  • As an alternative, Tuchman proposed a triple encryption method that uses only two keys. The function follows an encrypt-decrypt-encrypt (EDE) sequence.
  • Currently, there are no practical cryptanalytic attacks on 3DES.
  • Attacks on 3DES:
    • Attack #1:
      • First serious proposal came from Merkle and Hellman.
      • Their plan involves finding plaintext values that produce a first intermediate value of A = 0 and then using the meet-in-the-middle attack to determine the two keys.
      • The level of effort is on the order of 2^56, but the technique requires 2^56 chosen plaintext-ciphertext pairs, a number unlikely to be provided by the holder of the keys.
    • Attack #2:
      • This method is an improvement over the chosen-plaintext approach but requires more effort.
      • The attack is based on the observation that if we know A and C, then the problem reduces to that of an attack on double DES.
      • Of course, the attacker does not know A, even if P and C are known, as long as the two keys are unknown.
      • However, the attacker can choose a potential value of A and then try to find a known (P, C) pair that produces A. The attack proceeds as follows.
        • Obtain n (P, C) pairs. This is the known plaintext. Place these in Table 1, sorted on the values of P.
        • Pick an arbitrary value a for A, and create a second table with entries defined in the following fashion. For each of the 2^56 possible keys K1 = i, calculate the plaintext value Pi that produces a.
          • For each Pi that matches an entry in Table 1, create an entry in Table 2 consisting of the K1 value and the value of B that is produced for the (P, C) pair from Table 1, assuming that value of K1.
          • At the end of this step, sort Table 2 on the values of B.
        • We now have a number of candidate values of K1 in Table 2 and are in a position to search for a value of K2. For each of the 2^56 possible keys K2 = j, calculate the second intermediate value Bj = D(j, a) for our chosen value of a.
        • At each step, look up Bj in Table 2. If there is a match, then the corresponding key i from Table 2 plus this value of j are candidate values for the unknown keys (K1, K2). Why? Because we have found a pair of keys (i, j) that produce a known (P, C) pair.
        • Test each candidate pair of keys (i, j) on a few other plaintext-ciphertext pairs. If a pair of keys produces the desired ciphertext, the task is complete. If no pair succeeds, repeat from step 1 with a new value of a.
  • Block Cipher Modes of Operation:

  • Electronic Codebook:
  • The ECB method is ideal for a short amount of data, such as an encryption key. Thus, if you want to transmit a DES key securely, ECB is the appropriate mode to use.
  • The most significant characteristic of ECB is that the same b-bit block of plaintext, if it appears more than once in the message, always produces the same ciphertext.
  • For lengthy messages, the ECB mode may not be secure. If the message is highly structured, it may be possible for a cryptanalyst to exploit these regularities.
  • Cipher Block Chaining Mode:
    • We would like to invent a technique in which the same plaintext block, if repeated, produces different ciphertext blocks. A simple way to satisfy this requirement is the cipher block chaining (CBC) mode.
    • To produce the first block of ciphertext, an initialization vector (IV) is XORed with the first block of plaintext. On decryption, the IV is XORed with the output of the decryption algorithm to recover the first block of plaintext. The IV is a data block that is the same size as the cipher block.
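    • A toy sketch of CBC chaining in Python. The "block cipher" here is just XOR with the key, an invented stand-in used only to exercise the mode; real CBC would use DES or AES:

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(block_encrypt, key, iv, blocks):
    """CBC: each plaintext block is XORed with the previous ciphertext (IV first)."""
    out, prev = [], iv
    for p in blocks:
        c = block_encrypt(key, xor(p, prev))
        out.append(c)
        prev = c
    return out

def cbc_decrypt(block_decrypt, key, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(xor(block_decrypt(key, c), prev))
        prev = c
    return out

# Toy 8-byte "block cipher" (XOR with the key) purely to exercise the mode.
enc = dec = lambda k, b: bytes(x ^ y for x, y in zip(k, b))

key, iv = b"8bytekey", b"initvec8"
pt = [b"ABCDEFGH", b"ABCDEFGH"]            # identical plaintext blocks...
ct = cbc_encrypt(enc, key, iv, pt)
assert ct[0] != ct[1]                      # ...yield different ciphertext blocks
assert cbc_decrypt(dec, key, iv, ct) == pt
```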
  • Cipher Feedback Mode:
    • It is possible to convert DES into a stream cipher, using either the cipher feedback (CFB) or the output feedback mode.
    • A stream cipher eliminates the need to pad a message to be an integral number of blocks.
    • It also can operate in real time. Thus, if a character stream is being transmitted, each character can be encrypted and transmitted immediately using a character-oriented stream cipher.
    • One desirable property of a stream cipher is that the ciphertext be of the same length as the plaintext.
  • Output Feedback Mode:
    • The output feedback (OFB) mode is similar in structure to that of CFB. As can be seen, it is the output of the encryption function that is fed back to the shift register in OFB, whereas in CFB the ciphertext unit is fed back to the shift register.
    • One advantage of the OFB method is that bit errors in transmission do not propagate.
    • The disadvantage of OFB is that it is more vulnerable to a message stream modification attack than is CFB.
  • Counter Mode:
    • A counter equal in size to the plaintext block is used.
    • For encryption, the counter is encrypted and then XORed with the plaintext block to produce the ciphertext block; there is no chaining.
    • Advantages of CTR Mode:
      • Hardware efficiency:
        • CTR mode can be done in parallel on multiple blocks of plaintext or ciphertext.
        • For the chaining modes, the algorithm must complete the computation on one block before beginning on the next block.
      • Software efficiency.
      • Preprocessing:
        • The encryption of the counter values can be done in advance, before the plaintext or ciphertext is presented.
      • Random Access:
        • The ith block of plaintext or ciphertext can be processed in random-access fashion.
      • Provable security:
        • It can be shown that CTR is at least as secure as the other modes discussed in this article.
      • Simplicity:
        • Unlike ECB and CBC modes, CTR mode requires only the implementation of the encryption algorithm and not the decryption algorithm.
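  • A sketch of CTR mode using the same kind of toy XOR "cipher" (invented for illustration; a real implementation would encrypt the counter blocks with AES). Note that encryption and decryption are the same operation and that the keystream can be precomputed:

```python
def ctr_keystream(block_encrypt, key, nonce, n_blocks, block_size=8):
    """Generate keystream blocks by encrypting successive counter values."""
    for ctr in range(n_blocks):
        counter_block = nonce + ctr.to_bytes(block_size - len(nonce), "big")
        yield block_encrypt(key, counter_block)

def ctr_crypt(block_encrypt, key, nonce, data, block_size=8):
    """CTR: XOR the data with the keystream; no chaining between blocks."""
    n_blocks = -(-len(data) // block_size)          # ceiling division
    ks = b"".join(ctr_keystream(block_encrypt, key, nonce, n_blocks, block_size))
    return bytes(d ^ k for d, k in zip(data, ks))   # zip trims the last block

# Toy block cipher (XOR with the key) just to exercise the mode.
enc = lambda k, b: bytes(x ^ y for x, y in zip(k, b))

key, nonce = b"8bytekey", b"non\x00"
msg = b"counter mode needs no padding"
ct = ctr_crypt(enc, key, nonce, msg)
assert ctr_crypt(enc, key, nonce, ct) == msg        # same operation decrypts
```

Because each counter block is independent, the loop in `ctr_keystream` could run in parallel or ahead of time, which is exactly the hardware-efficiency and preprocessing advantage listed above.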

Stream Ciphers and RC4

  • Stream Cipher Structure:

  • The output of the generator, called a keystream, is combined one byte at a time with the plaintext stream using the bitwise exclusive-OR (XOR) operation.
  • The stream cipher is similar to the one-time pad. The difference is that a one-time pad uses a genuine random number stream, whereas a stream cipher uses a pseudorandom number stream.
  • Important design considerations for a stream cipher:
    • The encryption sequence should have a large period (generator period).
    • The keystream should approximate the properties of a true random number stream as close as possible.
    • To guard against brute-force attacks, the key needs to be sufficiently long (a key length of at least 128 bits is desirable).
  • The primary advantage of a stream cipher is that stream ciphers are almost always faster and use far less code than do block ciphers.
  • For applications that require encryption/decryption of a stream of data, such as over a data communications channel or a browser/Web link, a stream cipher might be the better alternative.
  • For applications that deal with blocks of data, such as file transfer, e-mail, and database, block ciphers may be more appropriate.

The RC4 Algorithm

  • It is a variable key-size stream cipher with byte-oriented operations.
  • The algorithm is based on the use of a random permutation. Analysis shows that the period of the cipher is overwhelmingly likely to be greater than 10^100.
  • RC4 is used in the SSL/TLS (Secure Sockets Layer/Transport Layer Security) standards that have been defined for communication between Web browsers and servers.
  • It is also used in the WEP (Wired Equivalent Privacy) protocol and the newer WiFi Protected Access (WPA) protocol that are part of the IEEE 802.11 wireless LAN standard.
  • Working Mechanism of RC4:
    • A variable-length key of from 1 to 256 bytes (8 to 2048 bits) is used to initialize a 256-byte state vector S, with elements S[0], S[1], …, S[255]. At all times, S contains a permutation of all 8-bit numbers from 0 through 255.
    • For encryption and decryption, a byte k (see Figure 6.8) is generated from S by selecting one of its 256 entries in a systematic fashion.
    • As each value of k is generated, the entries in S are once again permuted.
    • Once the S vector is initialized, the input key is no longer used. Stream generation involves cycling through all the elements of S[i], and, for each S[i], swapping S[i] with another byte in S according to a scheme dictated by the current configuration of S.
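  • RC4 is short enough to state in full. The Python version below follows the initialization (KSA) and stream-generation (PRGA) steps described above, and checks a widely published test vector:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """RC4: key-scheduling (KSA) followed by pseudorandom generation (PRGA)."""
    # KSA: initialize S with 0..255, then permute it using the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # PRGA: keep permuting S, emitting one keystream byte per data byte.
    i = j = 0
    out = bytearray()
    for b in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        k = S[(S[i] + S[j]) % 256]
        out.append(b ^ k)
    return bytes(out)

ct = rc4(b"Key", b"Plaintext")
assert ct.hex() == "bbf316e8d940af0ad3"
# Encryption and decryption are the same XOR operation:
assert rc4(b"Key", ct) == b"Plaintext"
```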

  • A number of papers have been published analyzing methods of attacking RC4 (e.g., [KNUD98], [MIST98], [FLUH00], [MANT01]). None of these approaches is practical against RC4 with a reasonable key length, such as 128 bits. A more serious problem is reported in [FLUH01]. The authors demonstrate that the WEP protocol, intended to provide confidentiality on 802.11 wireless LAN networks, is vulnerable to a particular attack approach. In essence, the problem is not with RC4 itself but the way in which keys are generated for use as input to RC4. This particular problem does not appear to be relevant to other applications using RC4 and can be remedied in WEP by changing the way in which keys are generated. This problem points out the difficulty in designing a secure system that involves both cryptographic functions and protocols that make use of them.

Advanced Encryption Standards

  • AES is a block cipher intended to replace DES for commercial applications. It uses a 128-bit block size and a key size of 128, 192, or 256 bits.
  • AES does not use a Feistel structure. Instead, each full round consists of four separate functions:
    • Byte substitution.
    • Permutation.
    • Arithmetic operations over a finite field.
    • XOR with a key.
  • Advantages of 3DES:
    • Its 168-bit key length overcomes the vulnerability to brute-force attack of DES.
    • Its underlying encryption algorithm (DES) has been subjected to lengthy scrutiny.
  • Disadvantages of 3DES:
    • The algorithm is sluggish in software and does not lend itself to efficient code.
    • It uses a 64-bit block size. For reasons of both efficiency and security, a larger block size is desirable.
  • Initial criteria of evaluating candidate security algorithms:
    • Security:
      • Refers to effort required to cryptanalyze an algorithm.
      • Aspects:
        • Actual security: compared to other algorithms (at the same key and block size).
        • Randomness: the extent of randomness in the algorithm output.
        • Soundness of the mathematical basis of the algorithm’s security.
    • Cost:
      • The algorithm should have high computational efficiency, so as to be usable in high-speed applications.
      • Aspects:
        • Licensing requirements: the algorithm and any sub-algorithms it uses should be available on a worldwide, non-exclusive basis.
        • Computational efficiency: the speed of the algorithm, in both software and hardware.
        • Memory requirements: the memory required to implement the algorithm, in both software and hardware.
    • Algorithm and implementation characteristics:
      • Flexibility: flexibility can be evaluated from the following:
        • Ability to accommodate additional key and block sizes.
        • The range of platforms on which the algorithm can run.
        • The algorithm can be implemented as stream cipher, message authentication code generator, pseudorandom number generator, hashing algorithm etc…
      • Hardware and software suitability: the ability to be implemented efficiently in both hardware and software.
      • Simplicity: relative simplicity of its design.
  • Advanced criteria of evaluating candidate algorithms:
    • General security:
      • Resistance to known cryptanalytic (mathematical) attacks, such as differential and linear cryptanalysis.
    • Software implementations:
      • Execution speed, performance across variety of platforms, variation of speed with key-size.
    • Restricted-space environments.
    • Hardware implementations:
      • The amount of hardware required to implement the algorithm efficiently.
    • Attacks on implementations:
      • Resistance to physical attacks carried out during algorithm execution to gather information about quantities such as keys.
        • Examples of such attacks are:
        • Timing attacks.
        • Power analysis:
          • Observation that the power consumed by a smart card at any particular time during the cryptographic operation is related to the instruction being executed and to the data being processed.
          • For example, multiplication consumes more power than addition.
          • Writing 1s consumes more power than writing 0s.
    • Encryption versus decryption:
      • If encryption and decryption differ, then extra space is needed for decryption.
    • Key agility:
      • Ability to change keys quickly and with a minimum of resources.
    • Other versatility and flexibility:
      • Parameter flexibility: ease of support for other key and block sizes, and of increasing the number of rounds in order to cope with newly discovered attacks.
      • Implementation flexibility: possibility of optimizing cipher elements for particular environments.
    • Potential for instruction-level parallelism.

The AES Cipher

  • In the Rijndael proposal, block and key lengths can be independently specified as 128 (most commonly used), 192, or 256 bits; the AES standard fixes the block length at 128 bits.
  • Characteristics of AES Cipher:
    • Resistance against all known attacks.
    • Speed and code compactness over wide range of platforms.
    • Design simplicity.
  • Notes about AES structure
    • It is not a Feistel cipher; the entire data block is processed in parallel during each round.
    • The provided key is expanded into an array of forty-four 32-bit words, w[i]. Four distinct words (128 bits) serve as a round key for each round.
    • Four different stages are used, one of permutation and three of substitution:
      • Substitute bytes: Uses an S-box to perform a byte-by-byte substitution of the block.
      • ShiftRows: A simple permutation.
      • MixColumns: A substitution that makes use of arithmetic over GF(2^8).
      • AddRoundKey: A simple bitwise XOR of the current block with a portion of the expanded key.
    • Cipher Stages:
      • An initial AddRoundKey stage.
      • Followed by 9 rounds, each of which includes all four stages.
      • A tenth round that uses only three stages (MixColumns is omitted).

 

  • Each stage is easily reversible.
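Of the four stages, AddRoundKey is the simplest one to illustrate this reversibility with; a minimal Python sketch (the state is represented here as a flat 16-byte value for simplicity, not the spec's 4x4 column-major matrix, and the sample values are made up):

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """AddRoundKey: bitwise XOR of the 16-byte state with the round key."""
    return bytes(s ^ k for s, k in zip(state, round_key))

state = bytes(range(16))           # toy 128-bit state
key = bytes([0xAA] * 16)           # toy round key
once = add_round_key(state, key)
twice = add_round_key(once, key)   # XOR is self-inverse, so the stage undoes itself
assert twice == state
```

Applying the same round key twice recovers the original state, which is exactly why this stage is trivially reversible.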
  • The decryption algorithm is not identical to the encryption algorithm. This is a consequence of the particular structure of AES.
  • AES uses arithmetic in the finite field GF(2^8), with the irreducible polynomial m(x) = x^8 + x^4 + x^3 + x + 1.
  • The MixColumns step, along with the ShiftRows step, is the primary source of diffusion in the Rijndael cipher.
  • The expanded key can be seen as an array of 32-bit words (columns) numbered from 0 to 43.
  • For details about AES working mechanism see this video.
  • One of the prerequisites of the MixColumns stage is knowing how to multiply in the field GF(2^8).

Finite Fields

  • A field is a set of elements on which two arithmetic operations (addition and multiplication) have been defined and which has the properties of ordinary arithmetic, such as closure, associativity, commutativity, distributivity, and having both additive and multiplicative inverses
  • Modular arithmetic is a kind of integer arithmetic that reduces all numbers to one of a fixed set [0…n – 1] for some number n.
    • Any integer outside this range is reduced to one in this range by taking the remainder after division by n.
  • The greatest common divisor of two integers is the largest positive integer that exactly divides both integers.
  • A finite field is simply a field with a finite number of elements. It can be shown that the order of a finite field (the number of elements in the field) must be a power of a prime, p^n, where n is a positive integer.
  • Finite fields of order p can be defined using arithmetic mod p.
  • Finite fields of order p^n, for n > 1, can be defined using arithmetic over polynomials.
  • The Advanced Encryption Standard (AES) and elliptic curve cryptography rely heavily on properties of finite fields.
  • Groups, rings, and fields are the fundamental elements of a branch of mathematics known as abstract algebra, or modern algebra.
  • In abstract algebra, we are concerned with sets on whose elements we can operate algebraically; that is, we can combine two elements of the set, perhaps in several ways, to obtain a third element of the set.
  • Groups:
    • A group G, is a set of elements with a binary operation, denoted by ·, that associates to each ordered pair (a, b) of elements in G an element (a · b) in G, such that the following axioms are obeyed:
      • (A1)Closure: If a and b belong to G, then a · b is also in G.
      • (A2)Associative: a · (b · c) = (a · b) · c for all a, b, c in G.
      • (A3)Identity element: There is an element e in G such that a · e = e · a = a for all a in G.
      • (A4) Inverse element: For each a in G there is an element a’ in G such that a · a’ = a’ · a = e.
    • If a group has a finite number of elements, it is referred to as a finite group, and the order of the group is equal to the number of elements in the group. Otherwise, the group is an infinite group.
    • A group is said to be abelian if it satisfies the following additional condition:
      • (A5) Commutative: a · b = b · a for all a, b in G.
    • A group G is cyclic if every element of G is a power a^k (k an integer) of a fixed element a in G. The element a is said to generate the group G, or to be a generator of G.
    • A cyclic group is always abelian, and may be finite or infinite.
    • The additive group of integers is an infinite cyclic group generated by the element 1. In this case, powers are interpreted additively, so that n is the nth power of 1.
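As a concrete illustration of a generator (a small made-up example, not from the notes): the powers of 3 modulo 7 run through every nonzero residue, so 3 generates the multiplicative group of Z7:

```python
p = 7
powers = [pow(3, k, p) for k in range(1, p)]   # 3^1, 3^2, ..., 3^6 mod 7
# the six powers hit all six nonzero residues exactly once
assert sorted(powers) == [1, 2, 3, 4, 5, 6]
```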
  • Rings:
    • A ring R, sometimes denoted by {R, +, x}, is a set of elements with two binary operations, called addition and multiplication, such that for all a, b, c in R the following axioms are obeyed:
      • R is an abelian group with respect to addition; that is, R satisfies axioms A1 through A5. For the case of an additive group, we denote the identity element as 0 and the inverse of a as –a.
      • (M1)Closure under multiplication: If a and b belong to R, then ab is also in R.
      • (M2)Associativity of multiplication: a(bc) = (ab)c for all a, b, c in R
      • (M3)Distributive laws:
        • a(b + c) = ab + ac for all a, b, c in R.
        • (a + b)c = ac + bc for all a, b, c in R.
    • A ring is said to be commutative if it satisfies the following additional condition:
      • (M4)Commutativity of multiplication: ab = ba for all a, b in R
    • Integral domain, is a commutative ring that obeys the following axioms:
      • (M5)Multiplicative identity: There is an element 1 in R such that a1 = 1a = a for all a in R
      • (M6)No zero divisors: If a, b in R and ab = 0, then either a = 0 or b = 0.
  • Field:
    • A field F, sometimes denoted by {F, +, x}, is a set of elements with two binary operations, called addition and multiplication, such that for all a, b, c in F the following axioms are obeyed:
      • F is an integral domain; that is, F satisfies axioms A1 through A5 and M1 through M6.
      • (M7) Multiplicative inverse: For each a in F, except 0, there is an element a^-1 in F such that a(a^-1) = (a^-1)a = 1.
    • Division is defined with the following rule: a/b = a(b^-1).
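The rule a/b = a(b^-1) can be checked concretely in the field Z7; a sketch using Python's three-argument pow, which computes modular inverses (Python 3.8+):

```python
p = 7
a, b = 3, 5
b_inv = pow(b, -1, p)          # multiplicative inverse of 5 mod 7
assert (b * b_inv) % p == 1
quotient = (a * b_inv) % p     # "a divided by b" inside Z_7
assert (quotient * b) % p == a # multiplying back by b recovers a
```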

  • 101-Modular Arithmetic Equation.
  • Remainder r is often referred to as a residue.
  • Two integers a and b are said to be congruent modulo n if (a mod n) = (b mod n). This is written as a ≡ b (mod n).
  • We say that a nonzero b divides a if a = mb for some m; this is written b|a. That is, b divides a if there is no remainder on division. We say that b is a divisor of a.
  • The following relations hold:
    • If a|1, then a = ±1.
    • If a|b and b|a, then a = ±b
    • Any b ≠ 0 divides 0.
    • If b|g and b|h, then b|(mg + nh) for arbitrary integers m and n.
  • Properties of congruence modulo n:
    • a ≡ b (mod n) if n|(a – b).
    • a ≡ b (mod n) implies b ≡ a (mod n).
    • a ≡ b (mod n) and b ≡ c (mod n) imply a ≡ c (mod n).
  • Properties of modulo arithmetic:
    • [(a mod n) ± (b mod n)] mod n = (a ± b) mod n, and [(a mod n) × (b mod n)] mod n = (a × b) mod n.
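These identities can be brute-force checked over a small range (a sanity check, not a proof; Python's % already returns nonnegative residues for a positive modulus):

```python
n = 11
for a in range(-20, 20):
    for b in range(-20, 20):
        # reducing the operands first, then the result, equals reducing once
        assert ((a % n) + (b % n)) % n == (a + b) % n
        assert ((a % n) - (b % n)) % n == (a - b) % n
        assert ((a % n) * (b % n)) % n == (a * b) % n
```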
  • 103-Finding result of modulus with exponentiation.
  • 104-Arithmetic modulo table
  • Additive Inverse (Negative) of integer x is y such that (x + y) mod n = 0.
  • Multiplicative Inverse (reciprocal) of integer x is y such that (x × y) mod n = 1.
  • Each integer in Zn (Zn = {0, 1, 2, …, (n-1)}) represents a residue class. We can label the residue classes modulo n as [0], [1], [2],…,[n – 1]
  • Of all the integers in a residue class, the smallest nonnegative integer is the one usually used to represent the residue class. Finding the smallest nonnegative integer to which k is congruent modulo n is called reducing k modulo n.
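In Python, reducing k modulo n is just the % operator, which returns the smallest nonnegative representative whenever the modulus is positive, even for negative k:

```python
assert 17 % 5 == 2       # 17 and 2 lie in the same residue class [2] mod 5
assert -7 % 5 == 3       # -7 ≡ 3 (mod 5); Python's % is nonnegative for n > 0
assert (17 - 2) % 5 == 0 # congruent numbers differ by a multiple of n
```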
  • 105-Differences between modular arithmetic and ordinary arithmetic.
  • Two integers are relatively prime if their only common positive integer factor is 1.
  • If an integer and the modulus n are relatively prime, multiplication by that integer maps the residues in Zn one-to-one onto themselves (no many-to-one mapping), which is what guarantees a multiplicative inverse exists.
  • The positive integer c is said to be the greatest common divisor of a and b if:
    • c is a divisor of a and of b.
    • Any divisor of a and b is a divisor of c.
  • 108-Euclidean Algorithm for finding GCD.
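A minimal sketch of the Euclidean algorithm, which rests on the fact that gcd(a, b) = gcd(b, a mod b):

```python
def gcd(a: int, b: int) -> int:
    """Euclidean algorithm: repeatedly replace (a, b) with (b, a mod b)."""
    while b:
        a, b = b, a % b
    return a

assert gcd(1970, 1066) == 2   # classic textbook worked example
assert gcd(12, 18) == 6
```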
  • It can be shown that the order of a finite field (the number of elements in the field) must be a power of a prime, p^n, where n is a positive integer.
  • The finite field of order p^n is generally written GF(p^n); GF stands for Galois field.
  • For each w in Zp, w ≠ 0, there exists a z in Zp such that w × z ≡ 1 (mod p); that is, every nonzero element of Zp has a multiplicative inverse.
  • 111-Extended Euclidean Algorithm for finding GCD and Multiplicative Inverse.
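A sketch of the extended Euclidean algorithm: it returns gcd(a, b) together with coefficients x, y satisfying ax + by = gcd(a, b); when gcd(a, n) = 1, x mod n is the multiplicative inverse of a mod n:

```python
def egcd(a: int, b: int):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

g, x, y = egcd(550, 1759)            # textbook example: invert 550 mod 1759
assert g == 1 and 550 * x + 1759 * y == 1
inv = x % 1759                       # multiplicative inverse of 550 mod 1759
assert (550 * inv) % 1759 == 1       # inv == 355
```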
  • Classes of polynomial arithmetic:
    • Ordinary polynomial arithmetic, using the basic rules of algebra.
    • Polynomial arithmetic in which the arithmetic on the coefficients is performed modulo p; that is, the coefficients are in GF(p).
    • Polynomial arithmetic in which the coefficients are in GF(p), and the polynomials are defined modulo a polynomial m(x) whose highest power is some integer n.
  • Ordinary Polynomial Arithmetic:
    • 113-Form.
    • Where the ai are elements of some designated set of numbers S, called the coefficient set.
    • A zeroth-degree polynomial is called a constant polynomial and is simply an element of the set of coefficients.
    • An nth-degree polynomial is said to be a monic polynomial if a_n = 1.
  • Polynomial Arithmetic with Coefficients in Zp:
    • A polynomial ring is the set of polynomials in which the coefficients are elements of some field F.
    • 115-divison under this type of polynomials.
    • Equivalently, we can say that g(x) is a factor of f(x) or g(x) is a divisor of f(x) if g(x)|f(x).
    • A polynomial f(x) over a field F is called irreducible if and only if f(x) cannot be expressed as a product of two polynomials, both over F and both of degree lower than that of f(x).
    • By analogy to integers, an irreducible polynomial is also called a prime polynomial.
    • 117-Steps to find GCD between 2 arithmetic polynomials.
    • 118-Example of finding GCD between 2 arithmetic polynomials.
  • For convenience and for implementation efficiency, we would like to work with integers that fit exactly into a given number of bits, with no wasted bit patterns. That is, we wish to work with integers in the range 0 through 2^n – 1, which fit into an n-bit word. (Illustrative example page # 119).
  • Modular Polynomial Arithmetic (MPA):
    • Definition of MPA:
      • Arithmetic follows the ordinary rules of polynomial arithmetic.
      • Arithmetic on coefficients is performed modulo p.
      • If multiplication results in a polynomial of degree greater than n – 1, then the polynomial is reduced modulo some irreducible polynomial m(x) of degree n. This is done by dividing by m(x) and keeping the remainder.
    • The AES uses arithmetic in the finite field GF(2^8) with the irreducible polynomial m(x) = x^8 + x^4 + x^3 + x + 1.
    • All finite fields of a given order are isomorphic; that is, any two finite fields of a given order have the same structure, but the representation, or labels, of the elements may be different.
  • The extended Euclidean algorithm can be adapted to find the greatest common divisor of two polynomials.
  • Every polynomial in GF(2^n) can be represented by an n-bit number.
  • Addition (and subtraction) between two polynomials in GF(2^n) corresponds to a bitwise XOR operation.
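Multiplication in GF(2^8) with the AES irreducible polynomial x^8 + x^4 + x^3 + x + 1 (bit pattern 1 0001 1011, i.e. 0x11B) can be sketched as shift-and-XOR with conditional reduction:

```python
def gf_mul(a: int, b: int, mod: int = 0x11B) -> int:
    """Multiply a and b in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:            # lowest bit of b set: add (XOR) the current a
            result ^= a
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree reached 8: reduce modulo m(x)
            a ^= mod
        b >>= 1
    return result

assert 0x57 ^ 0x83 == 0xD4            # addition in GF(2^8) is XOR: {57}+{83}={d4}
assert gf_mul(0x57, 0x01) == 0x57     # 1 is the multiplicative identity
assert gf_mul(0x57, 0x02) == 0xAE     # "xtime" with no reduction needed
assert gf_mul(0x80, 0x02) == 0x1B     # high bit set, so reduction kicks in
assert gf_mul(0x57, 0x83) == 0xC1     # worked example from FIPS-197: {57}·{83}={c1}
```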
  • Questions:
    • What’s closure?
    • What are differences between associativity, commutativity?
    • Finite fields of order p can be defined using arithmetic mod p?
    • Finite fields of order p^n, for n > 1, can be defined using arithmetic over polynomials.
    • 98-Why is S_n not abelian?
    • What’s meant by 2 binary operations?! And what’s the difference between 1 and 2 binary operations?
      • In essence, a field is a set in which we can do addition, subtraction, multiplication, and division without leaving the set.
      • What’s meant by: without leaving the set”?
    • Familiar examples of fields are the rational numbers, the real numbers, and the complex numbers. Note that the set of all integers is not a field, because not every element of the set has a multiplicative inverse; in fact, only the elements 1 and -1 have multiplicative inverses in the integers.
      • How do only 1 and -1 have multiplicative inverses?! Also, 2 * ½ = 1.
        • Because the inverse must itself belong to the set of integers, and ½ is not an integer.
    • 101- (q + 1)n > a what’s the usefulness of this condition?
    • Understanding properties of congruent modulo.
    • 105-explaining reducing k modulo n.
    • What are differences between modular arithmetic and ordinary arithmetic?
    • 106-strange reason of not accepting result of modulo arithmetic.
    • 108-Proof of Euclidian Algorithm.
    • It can be shown that the order of a finite field (the number of elements in the field) must be a power of a prime, p^n, where n is a positive integer. Why must it be?
    • Equations in page 112.
    • 111-How to calculate multiplicative inverse?
    • Examples of fields include the real numbers, rational numbers, and Zp for p prime. Note that the set of all integers is not a field and does not support polynomial division. How?!
    • 115-Thus, division is not exact over the set of integers. Why?!
    • 117-why f(x) is reducible?
    • 120-Explanation of example.

Model Building through Regression

 

 

  • Linear regression is a special form of function approximation, used to model a given set of random variables.
  • Linear regression is about finding the relationship between a set of random variables so as to be able to predict new values.
  • In linear regression we have the following scenario:
    • One of the random variables is considered to be of particular interest; that random variable is referred to as a dependent variable or response.
    • The remaining random variables are called independent variables, or regressors; their role is to explain or predict the statistical behavior of the response.
    • The dependence of the response on the regressors includes an additive error term, to account for uncertainties in the manner in which this dependence is formulated. This term is called the expectational error or explanatory error.
  • Such a model is called a regression model.
  • In mathematics and statistics, particularly in the fields of machine learning and inverse problems, regularization involves introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
  • Classes of regression models:
    • Linear: The dependence of the response on the regressors is defined by a linear function.
    • Non-linear: The dependence of the response on the regressors is defined by a non-linear function.
  • Methods for estimating the parameters of the linear regression model:
    • Bayesian Theory:
      • Used to derive the maximum a posteriori (MAP) estimate of the vector that parameterizes a linear regression model.
    • Method of Least Square:
      • This is the oldest parameter estimation procedure.
      • It was introduced by Gauss in the early part of the 19th century.
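For the simplest case (one regressor, d = w0 + w1·x + ε), least squares has a closed form; a pure-Python sketch on made-up sample values (the data are illustrative, not from the text):

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ds = [1.1, 2.9, 5.2, 6.8, 9.1]   # roughly d = 1 + 2x plus noise

n = len(xs)
mean_x = sum(xs) / n
mean_d = sum(ds) / n
# closed-form least-squares estimates: slope from centered cross/auto sums,
# intercept from the means
w1 = sum((x - mean_x) * (d - mean_d) for x, d in zip(xs, ds)) / \
     sum((x - mean_x) ** 2 for x in xs)
w0 = mean_d - w1 * mean_x

assert abs(w1 - 2.0) < 0.2 and abs(w0 - 1.0) < 0.3   # close to the true line
```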

  • The model order is the common dimension of the regressor vector and the fixed weight vector.
  • Problem of interest here is stated as follows:
    • Given the joint statistics of the regressor X and the corresponding response D, estimate the unknown parameter vector w.
  • Maximum a posteriori (MAP) is more profound than maximum-likelihood (ML) estimation because:
    • MAP exploits all the conceivable information about parameter vector w.
    • ML relies solely on the observation model (d, x) and may therefore lead to a nonunique solution. To enforce uniqueness and stability, the prior has to be incorporated into the formulation of the estimator.
  • Assumptions considered in parameter estimation process:
    • Samples are statistically independent and identically distributed (i.i.d.).
      • Statistical independence means the covariance matrix has no off-diagonal (cross-covariance) terms.
      • Identically distributed means the remaining diagonal entries (the variances) are all equal.
    • Gaussianity.
      • The environment responsible for generating the training samples is Gaussian-distributed.
    • Stationarity.
      • The environment is stationary, which means that the parameter vector w is fixed, but unknown, throughout the N trials of the experiment.
  • 103 ->106-Finding MAP of weight vector.
  • In improving the stability of the maximum likelihood estimator through the use of regularization (i.e. the incorporation of prior knowledge), the resulting maximum a posteriori estimator becomes biased.
  • In short, we have a tradeoff between stability and bias.
  • The ordinary least-squares estimator minimizes the squared expectational errors summed over the experimental trials on the environment.
  • When the regularization parameter λ = 0, we have complete confidence in the observation model exemplified by the training samples. At the other extreme, λ → ∞ means we have no confidence in the observation model.
  • The regularized least-squares (RLS) solution is obtained by minimizing the regularized cost function with respect to the parameter vector w.
  • This solution is identical to MAP estimate solution.
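A scalar sketch of the regularized (ridge) solution, minimizing the sum of squared errors plus λ·w² (no intercept; data and λ values are made up for illustration):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ds = [2.1, 3.9, 6.2, 7.8]   # roughly d = 2x plus noise

def ridge_w(lam: float) -> float:
    # closed-form minimizer of sum_i (d_i - w*x_i)^2 + lam * w^2
    return sum(x * d for x, d in zip(xs, ds)) / (sum(x * x for x in xs) + lam)

w_ols = ridge_w(0.0)    # lam = 0: complete confidence in the observations
w_reg = ridge_w(10.0)   # larger lam pulls w toward the prior mean of zero
assert abs(w_ols) > abs(w_reg)   # regularization shrinks (biases) the estimate
```

This makes the stability/bias tradeoff above concrete: the regularized estimate is more stable but biased toward zero.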
  • The representation of a stochastic process by a linear model may be used for synthesis or analysis.
  • In synthesis, we generate a desired time series by assigning a formulated set of values to the parameters of the model and feeding it with white noise (of zero mean and prescribed variance). The resulting model is called a generative model.
    • Note: I bet that synthesis is similar to simulation; is that right or wrong?!
  • In analysis, we estimate the parameters of the model by processing a given time series of finite length, using the Bayesian approach or regularized method of least square.
  • In estimation, we need to pick the most suitable model to process the data. This is called the model-selection problem.
  • The Minimum-Description-Length (MDL) principle is a model-selection method pioneered by Rissanen.
  • MDL was inspired by:
    • Kolmogorov complexity theory. Kolmogorov defined complexity as follows:
      • “The algorithmic complexity of a data sequence is the length of the shortest binary computer program that prints out the sequence and then halts”.
    • Regularity itself may be identified with the “ability to compress”.
  • MDL states: “Given a set of hypotheses H and a data sequence d, we should try to find the particular hypothesis, or some combination of hypotheses in H, that compresses the data sequence d the most”.
  • The MDL principle has many versions; one of the oldest and most simplistic is called the simplistic two-part code MDL principle for probabilistic modeling.
    • By “simplistic” we mean that the codelengths under consideration are not determined in an optimal fashion.
    • The terms “code” and “codelength” pertain to the process of encoding the data sequence in the shortest or least redundant manner.
  • 111-In MDL we want to identify the model that best explains an unknown environment responsible for generating the training sample {(x_i, d_i)}, i = 1, …, N, where x_i is the stimulus and d_i is the corresponding response.
  • The above description is called the model-order selection problem.
  • 111-Description of MDL selection process.
  • Attributes of MDL Principle:
    • The MDL principle implements a precise form of Occam’s razor, which states a preference for simple theories: “Accept the simplest explanation that fits data”.
    • The MDL principle is a consistent model selection estimator in the sense that it converges to the true model order as the sample size increases.
  • Questions:
    • 99-What is meant by stationary environment?
      • The statistics of the system don’t change with time.
    • 100-Why in joint statistics we need correlation matrix of regressors and variances of D?
      • To exploit the dependency of the response on the variances of the regressors and variances of D.
    • 100-Why X and D means are assumed to be zero?
      • To normalize data and put them in the same space.
    • 101-What’s meant by: P(w, d|x), P(w| d, x).
      • P(w, d|x): Joint Probability of w & d given x. Where x is given and w & d are unknown.
      • P(w | d, x): probability of w given d and x, where both d and x are known.
    • 101-Why we’ve made the series of these equations?
    • 101-How equation 2.5 is valid?
    • 102-What’s meant by equation 2.10, 2.11
    • 103-What’s meant by identically distributed? A life example.
      • Follows same distribution type.
    • 103-What’s meant by single-shot realization?
      • Instantiation value.
    • 103-Why expectational error is with zero mean? (Assumption #2)
    • 104-Equation 2.19; why there’s no division on the number of the elements?
      • Because the data are already normalized.
    • 107-Why the ½ appeared?
      • To simplify the calculations; the ½ cancels when differentiating the squared error.
    • 107-How is OLSE the same as MLE?
      • Because they have the same cost equation.
    • 107-What’s the problem of “having distinct possibility of obtaining a solution that lacks uniqueness and stability”?
      • Because the estimation is only based on the observed data.
    • 107-How “This solution is identical to MAP estimate solution”?!
      • Because they have the same cost equation.
    • 110-Is MDL similar to MLE?
    • 110-What we mean by “p should be encoded” (last paragraph).
    • 111-What’s Occam’s razor theorem? (in short)