Project 2

Assign 2018-11-28   Due 2018-12-17

Huffman encoding/decoding

Problem description: Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, lengths of the assigned codes are based on the frequencies of corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.

The variable-length codes assigned to input characters are Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to one character is not prefix of code assigned to any other character.

Steps to build Huffman Tree

The input is an array of unique characters along with their frequency of occurrences and the output is the Huffman Tree.

1. Create a leaf node for each unique character and build a min heap of all leaf nodes (The Min Heap is used as a priority queue. The value of the frequency field is used to compare two nodes in the min heap. Initially, the least frequent character is at the root.)

2. Extract two nodes with the minimum frequency from the min heap.

3. Create a new internal node with the frequency equal to the sum of the two nodes’ frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.

4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and the tree is complete.

Example:

Character   Frequency    Huffman Code

   a            5                1100

   b            9                1101

   c           12                100

   d           13                101

   e           16                111

   f           45                0

Submit:

1. File to submit: CS206_studentID_name(CN).h

2. The following class definition is included in your header file. You should complete the methods of the class in the .cpp file.

class huffmanTree;
// node for binary tree
class btnode{
4.public:
char data; // character
int frequency; // frequency
string code; // code for the node: if node is left child, code is 0. right child is 1.
 btnode *lchild,*rchild;
.};
10.
class huffmanTree{
private:
13.    btnode *t; // t is root of the huffman tree
14.    btnode **pq;//priority queue
15.    int size_pq;
16.    
17.public:
18.    huffmanTree(int n);
19.    void generate_leaf_node(int n, char *arr, int *frequency);
20.    void insert_pq(btnode *tnode);// insert one node into priority queue
21.    btnode* extractMin_pq();// extract mininmum weight node from priority queue
22.    void display_pq();
23.    void generate_huffmanTree(); // generate huffman tree
24.    btnode* get_tree(); // get the huffman tree
25.    void huffman_code(btnode *tnode,string code); //display huffman code of  leaf nodes
26.
27.};

1. The description of the class huffmanTree is as follows:

a) The method “insert_pq(btnode *tnode)”is to insert one node into the priority queue “pq”.

Note: The min Heap is used as a priority queue. The priority queue “pq” should be a heap after one node is inserted.

b) The method “generate_leaf_node(int n, char *arr, int *frequency)” is to generate leaf nodes and they are inserted into the priority queue “pq” one by one.

c) The method “extractMin_pq()” is to extract the minimum frequency node from the priority queue “pq”.

Note: After removing the minimum, “pq” is still a min heap.

d) “display_pq()” is to output all elements of the priority queue “pq”.

e) “generate_huffmanTree()” is to generate the Huffman tree. “t” is the root of the tree.

f) “get_tree()” is to get the root “t” of the Huffman tree.

g) huffman_code(btnode *tnode,string code)” is to output Huffman codes of all leaf nodes. You can use a recursion or stack to get the Huffman codes. Also, the function could be overloaded.

2. Score

a) The method “display_pq()” can output “character”, “frequency” of each node in priority queue “pq”. (10%)

b) After the method “insert_pq(btnode *tnode)” is implemented, “pq” must be a heap. (20%)

c) extractMin_pq() extracts the minimum frequency node from “pq”. And the priority queue “pq” is still a heap. (30%)

d) The Huffman tree should be generated by “generate_huffmanTree()” correctly. “t” is the root of the tree. (20%)

e) Get Huffman codes of all characters correctly. (20%)

3. Submit by email:

a) Email subject : CS206_studentID_studentName(CN)

b) Send email to TA after attaching your files:

i. header file and source file

ii. README.txt

– What you have implemented and what you have not

– How much time you have spent and talk about your experience of the project

c) TA’s (Gao Mengyi) email address: 474330542@qq.com

4. Due date: 2018-12-17

a) No acceptance after 5 days.

b) 10% penalty per day up to 5 days.

5. The following file can be used to test your code.

int main(){
char arr1[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
int freq1[] = { 5, 9, 12, 13, 16, 45 };
char arr2[] = { 's', 't', 'u', 'v', 'w', 'x'};
int freq2[] = { 5, 19, 12, 13, 16, 45 };

huffmanTree ht1(6),ht2(6);
 ht1.generate_leaf_node(6, arr1, freq1);
cout << "the first test priority queue is "<<endl;
 ht1.display_pq();
ht1.generate_huffmanTree();
 btnode *t1 = ht1.get_tree();
 cout << "the first test Huffman code is "<<endl;
string code1;
ht1.huffman_code(t1,code1);
// huffman_code could be overloaded
// ht1.huffman_code();

 ht2.generate_leaf_node(6, arr2, freq2);
 cout << "the second test priority queue is "<<endl;
ht2.display_pq();
 ht2.generate_huffmanTree();
btnode *t2 = ht2.get_tree();
    cout << "the second test Huffman code is "<<endl;
 string code2;
ht2.huffman_code(t2,code2);
// huffman_code could be overloaded
// ht2.huffman_code();
 return 0;
}