Huffmana kodēšanas algoritms

Šajā apmācībā jūs uzzināsiet, kā darbojas Huffman Coding. Jūs atradīsit arī Huffman kodēšanas piemērus C, C ++, Java un Python.

Huffman Coding ir datu saspiešanas paņēmiens, lai samazinātu tā lielumu, nezaudējot nevienu no detaļām. Vispirms to izstrādāja Deivids Hafmans.

Huffman kodēšana parasti ir noderīga, lai saspiestu datus, kuros bieži sastopamas rakstzīmes.

Kā darbojas Huffman Coding?

Pieņemsim, ka zemāk esošā virkne ir jānosūta pa tīklu.

Sākotnējā virkne

Katrs raksturs aizņem 8 bitus. Iepriekš minētajā virknē kopā ir 15 rakstzīmes. Tādējādi 8 * 15 = 120šīs virknes nosūtīšanai ir nepieciešami kopējie biti.

Izmantojot Huffman Coding tehniku, mēs varam saspiest virkni mazākā izmērā.

Hafmana kodēšana vispirms izveido koku, izmantojot rakstzīmes frekvences, un pēc tam ģenerē kodu katrai rakstzīmei.

Kad dati ir kodēti, tie ir jāatkodē. Dekodēšana tiek veikta, izmantojot to pašu koku.

Huffman kodēšana novērš jebkādas neskaidrības dekodēšanas procesā, izmantojot prefiksa koda jēdzienu, ti. kodam, kas saistīts ar rakstzīmi, nevajadzētu būt neviena cita koda prefiksā. Iepriekš izveidotais koks palīdz uzturēt īpašumu.

Huffman kodēšana tiek veikta ar šādu darbību palīdzību.

  1. Aprēķiniet virknes katras rakstzīmes biežumu. Stīgu biežums
  2. Kārtojiet rakstzīmes biežuma palielināšanas secībā. Tie tiek saglabāti prioritārajā rindā Q. Rakstzīmes sakārtotas atbilstoši biežumam
  3. Izveidojiet katru unikālo rakstzīmi kā lapu mezglu.
  4. Izveidojiet tukšu mezglu z. Piešķiriet minimālo frekvenci z kreisajam zīdainim un piešķiriet otro minimālo frekvenci z labajam zīdainim. Iestatiet z vērtību kā iepriekš minēto divu minimālo frekvenču summu. Vismazāko skaitļu summas iegūšana
  5. Noņemiet šīs divas minimālās frekvences no Q un pievienojiet summu frekvenču sarakstā (* apzīmē iekšējos mezglus attēlā iepriekš).
  6. Ievietojiet z mezglu kokā.
  7. Atkārtojiet 3. līdz 5. darbību visām rakstzīmēm. Atkārtojiet 3. līdz 5. darbību visām rakstzīmēm. Atkārtojiet 3. līdz 5. darbību visām rakstzīmēm.
  8. Katram mezglam bez lapas piešķiriet 0 kreisajai malai un 1 labajai malai. Piešķiriet 0 kreisajai malai un 1 labajai malai

Lai nosūtītu iepriekšminēto virkni tīklā, mums ir jānosūta koks, kā arī iepriekš minētais saspiestais kods. Kopējais lielums ir norādīts zemāk esošajā tabulā.

Raksturs Biežums Kods Izmērs
A 5 11 5 * 2 = 10
B 1 100 1 * 3 = 3
C 6 0 6 * 1 = 6
D 3 101 3 * 3 = 9
4 * 8 = 32 biti 15 biti 28 biti

Bez kodēšanas virknes kopējais izmērs bija 120 biti. Pēc kodēšanas izmērs tiek samazināts līdz 32 + 15 + 28 = 75.

Koda dekodēšana

Lai atšifrētu kodu, mēs varam paņemt kodu un šķērsot koku, lai atrastu rakstzīmi.

Ļaujiet dekodēt 101, mēs varam šķērsot sakni, kā parādīts attēlā zemāk.

Atkodēšana

Huffmana kodēšanas algoritms

izveidojiet prioritāru rindu Q, kas sastāv no katra unikālā rakstura. kārtot pēc tam to frekvenču augošā secībā. visām unikālajām rakstzīmēm: izveidojiet newNode minimālo vērtību no Q un piešķiriet to leftChild of newNode izvelciet minimālo vērtību no Q un piešķiriet to rightChild of newNode aprēķiniet šo divu minimālo vērtību summu un piešķiriet to newNode ieliktņa vērtībai šis newNode kokā atgriež rootNode

Python, Java un C / C ++ piemēri

Python Java C C ++
 # Huffman Coding in python string = 'BCAADDDCCACACAC' # Creating tree nodes class NodeTree(object): def __init__(self, left=None, right=None): self.left = left self.right = right def children(self): return (self.left, self.right) def nodes(self): return (self.left, self.right) def __str__(self): return '%s_%s' % (self.left, self.right) # Main function implementing huffman coding def huffman_code_tree(node, left=True, binString=''): if type(node) is str: return (node: binString) (l, r) = node.children() d = dict() d.update(huffman_code_tree(l, True, binString + '0')) d.update(huffman_code_tree(r, False, binString + '1')) return d # Calculating frequency freq = () for c in string: if c in freq: freq(c) += 1 else: freq(c) = 1 freq = sorted(freq.items(), key=lambda x: x(1), reverse=True) nodes = freq while len(nodes)> 1: (key1, c1) = nodes(-1) (key2, c2) = nodes(-2) nodes = nodes(:-2) node = NodeTree(key1, key2) nodes.append((node, c1 + c2)) nodes = sorted(nodes, key=lambda x: x(1), reverse=True) huffmanCode = huffman_code_tree(nodes(0)(0)) print(' Char | Huffman code ') print('----------------------') for (char, frequency) in freq: print(' %-4r |%12s' % (char, huffmanCode(char)))
 // Huffman Coding in Java import java.util.PriorityQueue; import java.util.Comparator; class HuffmanNode ( int item; char c; HuffmanNode left; HuffmanNode right; ) // For comparing the nodes class ImplementComparator implements Comparator ( public int compare(HuffmanNode x, HuffmanNode y) ( return x.item - y.item; ) ) // IMplementing the huffman algorithm public class Huffman ( public static void printCode(HuffmanNode root, String s) ( if (root.left == null && root.right == null && Character.isLetter(root.c)) ( System.out.println(root.c + " | " + s); return; ) printCode(root.left, s + "0"); printCode(root.right, s + "1"); ) public static void main(String() args) ( int n = 4; char() charArray = ( 'A', 'B', 'C', 'D' ); int() charfreq = ( 5, 1, 6, 3 ); PriorityQueue q = new PriorityQueue(n, new ImplementComparator()); for (int i = 0; i 1) ( HuffmanNode x = q.peek(); q.poll(); HuffmanNode y = q.peek(); q.poll(); HuffmanNode f = new HuffmanNode(); f.item = x.item + y.item; f.c = '-'; f.left = x; f.right = y; root = f; q.add(f); ) System.out.println(" Char | Huffman code "); System.out.println("--------------------"); printCode(root, ""); ) )
 // Huffman Coding in C #include #include #define MAX_TREE_HT 50 struct MinHNode ( char item; unsigned freq; struct MinHNode *left, *right; ); struct MinHeap ( unsigned size; unsigned capacity; struct MinHNode **array; ); // Create nodes struct MinHNode *newNode(char item, unsigned freq) ( struct MinHNode *temp = (struct MinHNode *)malloc(sizeof(struct MinHNode)); temp->left = temp->right = NULL; temp->item = item; temp->freq = freq; return temp; ) // Create min heap struct MinHeap *createMinH(unsigned capacity) ( struct MinHeap *minHeap = (struct MinHeap *)malloc(sizeof(struct MinHeap)); minHeap->size = 0; minHeap->capacity = capacity; minHeap->array = (struct MinHNode **)malloc(minHeap->capacity * sizeof(struct MinHNode *)); return minHeap; ) // Function to swap void swapMinHNode(struct MinHNode **a, struct MinHNode **b) ( struct MinHNode *t = *a; *a = *b; *b = t; ) // Heapify void minHeapify(struct MinHeap *minHeap, int idx) ( int smallest = idx; int left = 2 * idx + 1; int right = 2 * idx + 2; if (left size && minHeap->array(left)->freq array(smallest)->freq) smallest = left; if (right size && minHeap->array(right)->freq array(smallest)->freq) smallest = right; if (smallest != idx) ( swapMinHNode(&minHeap->array(smallest), &minHeap->array(idx)); minHeapify(minHeap, smallest); ) ) // Check if size if 1 int checkSizeOne(struct MinHeap *minHeap) ( return (minHeap->size == 1); ) // Extract min struct MinHNode *extractMin(struct MinHeap *minHeap) ( struct MinHNode *temp = minHeap->array(0); minHeap->array(0) = minHeap->array(minHeap->size - 1); --minHeap->size; minHeapify(minHeap, 0); return temp; ) // Insertion function void insertMinHeap(struct MinHeap *minHeap, struct MinHNode *minHeapNode) ( ++minHeap->size; int i = minHeap->size - 1; while (i && minHeapNode->freq array((i - 1) / 2)->freq) ( minHeap->array(i) = minHeap->array((i - 1) / 2); i = (i - 1) / 2; ) minHeap->array(i) = minHeapNode; ) void buildMinHeap(struct MinHeap *minHeap) ( int n = minHeap->size - 1; int i; for (i = (n - 1) / 2; i>= 0; --i) minHeapify(minHeap, i); ) int isLeaf(struct MinHNode *root) ( return !(root->left) && !(root->right); ) struct MinHeap *createAndBuildMinHeap(char item(), int freq(), int size) ( struct MinHeap *minHeap = createMinH(size); for (int i = 0; i array(i) = newNode(item(i), freq(i)); minHeap->size = size; buildMinHeap(minHeap); return minHeap; ) struct MinHNode *buildHuffmanTree(char item(), int freq(), int size) ( struct MinHNode *left, *right, *top; struct MinHeap *minHeap = createAndBuildMinHeap(item, freq, size); while (!checkSizeOne(minHeap)) ( left = extractMin(minHeap); right = extractMin(minHeap); top = newNode('$', left->freq + right->freq); top->left = left; top->right = right; insertMinHeap(minHeap, top); ) return extractMin(minHeap); ) void printHCodes(struct MinHNode *root, int arr(), int top) ( if (root->left) ( arr(top) = 0; printHCodes(root->left, arr, top + 1); ) if (root->right) ( arr(top) = 1; printHCodes(root->right, arr, top + 1); ) if (isLeaf(root)) ( printf(" %c | ", root->item); printArray(arr, top); ) ) // Wrapper function void HuffmanCodes(char item(), int freq(), int size) ( struct MinHNode *root = buildHuffmanTree(item, freq, size); int arr(MAX_TREE_HT), top = 0; printHCodes(root, arr, top); ) // Print the array void printArray(int arr(), int n) ( int i; for (i = 0; i < n; ++i) printf("%d", arr(i)); printf(""); ) int main() ( char arr() = ('A', 'B', 'C', 'D'); int freq() = (5, 1, 6, 3); int size = sizeof(arr) / sizeof(arr(0)); printf(" Char | Huffman code "); printf("--------------------"); HuffmanCodes(arr, freq, size); )
 // Huffman Coding in C++ #include using namespace std; #define MAX_TREE_HT 50 struct MinHNode ( unsigned freq; char item; struct MinHNode *left, *right; ); struct MinH ( unsigned size; unsigned capacity; struct MinHNode **array; ); // Creating Huffman tree node struct MinHNode *newNode(char item, unsigned freq) ( struct MinHNode *temp = (struct MinHNode *)malloc(sizeof(struct MinHNode)); temp->left = temp->right = NULL; temp->item = item; temp->freq = freq; return temp; ) // Create min heap using given capacity struct MinH *createMinH(unsigned capacity) ( struct MinH *minHeap = (struct MinH *)malloc(sizeof(struct MinH)); minHeap->size = 0; minHeap->capacity = capacity; minHeap->array = (struct MinHNode **)malloc(minHeap->capacity * sizeof(struct MinHNode *)); return minHeap; ) // Swap function void swapMinHNode(struct MinHNode **a, struct MinHNode **b) ( struct MinHNode *t = *a; *a = *b; *b = t; ) // Heapify void minHeapify(struct MinH *minHeap, int idx) ( int smallest = idx; int left = 2 * idx + 1; int right = 2 * idx + 2; if (left size && minHeap->array(left)->freq array(smallest)->freq) smallest = left; if (right size && minHeap->array(right)->freq array(smallest)->freq) smallest = right; if (smallest != idx) ( swapMinHNode(&minHeap->array(smallest), &minHeap->array(idx)); minHeapify(minHeap, smallest); ) ) // Check if size if 1 int checkSizeOne(struct MinH *minHeap) ( return (minHeap->size == 1); ) // Extract the min struct MinHNode *extractMin(struct MinH *minHeap) ( struct MinHNode *temp = minHeap->array(0); minHeap->array(0) = minHeap->array(minHeap->size - 1); --minHeap->size; minHeapify(minHeap, 0); return temp; ) // Insertion void insertMinHeap(struct MinH *minHeap, struct MinHNode *minHeapNode) ( ++minHeap->size; int i = minHeap->size - 1; while (i && minHeapNode->freq array((i - 1) / 2)->freq) ( minHeap->array(i) = minHeap->array((i - 1) / 2); i = (i - 1) / 2; ) minHeap->array(i) = minHeapNode; ) // BUild min heap void buildMinHeap(struct MinH *minHeap) ( int n = minHeap->size - 1; int i; for (i = (n - 1) / 2; i>= 0; --i) minHeapify(minHeap, i); ) int isLeaf(struct MinHNode *root) ( return !(root->left) && !(root->right); ) struct MinH *createAndBuildMinHeap(char item(), int freq(), int size) ( struct MinH *minHeap = createMinH(size); for (int i = 0; i array(i) = newNode(item(i), freq(i)); minHeap->size = size; buildMinHeap(minHeap); return minHeap; ) struct MinHNode *buildHfTree(char item(), int freq(), int size) ( struct MinHNode *left, *right, *top; struct MinH *minHeap = createAndBuildMinHeap(item, freq, size); while (!checkSizeOne(minHeap)) ( left = extractMin(minHeap); right = extractMin(minHeap); top = newNode('$', left->freq + right->freq); top->left = left; top->right = right; insertMinHeap(minHeap, top); ) return extractMin(minHeap); ) void printHCodes(struct MinHNode *root, int arr(), int top) ( if (root->left) ( arr(top) = 0; printHCodes(root->left, arr, top + 1); ) if (root->right) ( arr(top) = 1; printHCodes(root->right, arr, top + 1); ) if (isLeaf(root)) ( cout 

Huffman Coding Complexity

The time complexity for encoding each unique character based on its frequency is O(nlog n).

Extracting minimum frequency from the priority queue takes place 2*(n-1) times and its complexity is O(log n). Thus the overall complexity is O(nlog n).

Huffman Coding Applications

  • Huffman coding is used in conventional compression formats like GZIP, BZIP2, PKZIP, etc.
  • For text and fax transmissions.

Interesanti raksti...