插值搜尋演算法



插值搜尋是二分搜尋的一種改進變體。這種搜尋演算法基於所需值的探測位置。為了使該演算法正常工作,資料集合應以排序且均勻分佈的形式存在。

二分搜尋在時間複雜度方面相較於線性搜尋具有巨大優勢。線性搜尋的最壞情況複雜度為 Ο(n),而二分搜尋為 Ο(log n)。

在某些情況下,目標資料的儲存位置可以提前知道。例如,在電話簿中,如果我們想搜尋“Morpheus”的電話號碼。在這裡,線性搜尋甚至二分搜尋都會顯得緩慢,因為我們可以直接跳轉到儲存以'M'開頭的名稱的記憶體空間。

二分搜尋中的定位

在二分搜尋中,如果未找到所需資料,則其餘列表將被分成兩部分,較低部分和較高部分。搜尋將在其中一個部分進行。

Positioning_in_Binary_Search divided_in_two_parts positioning desired_data

即使資料已排序,二分搜尋也不會利用探測所需資料位置的優勢。

插值搜尋中的位置探測

插值搜尋透過計算探測位置來查詢特定項。最初,探測位置是集合中最中間項的位置。

Position_Probing_in_Interpolation_Search

probe_position

如果找到匹配項,則返回該項的索引。為了將列表分成兩部分,我們使用以下方法:

$$mid\, =\, Lo\, +\, \frac{\left ( Hi\, -\, Lo \right )\ast \left ( X\, -\, A\left [ Lo \right ] \right )}{A\left [ Hi \right ]\, -\, A\left [ Lo \right ]}$$

其中:

A = list
Lo = Lowest index of the list
Hi = Highest index of the list
A[n] = Value stored at index n in the list

如果中間項大於該項,則在中間項右側的子陣列中再次計算探測位置。否則,在中間項左側的子陣列中搜索該項。此過程也將在子陣列上繼續,直到子陣列的大小減小到零。

由於它是現有 BST 演算法的改進,因此我們提到了使用位置探測搜尋“目標”資料值索引的步驟:

1. Start searching data from middle of the list. 
2. If it is a match, return the index of the item, and exit. 
3. If it is not a match, probe position. 
4. Divide the list using probing formula and find the new middle. 
5. If data is greater than middle, search in higher sub-list. 
6. If data is smaller than middle, search in lower sub-list. 
7. Repeat until match.

虛擬碼

A → Array list
N → Size of A
X → Target Value

Procedure Interpolation_Search()

   Set Lo → 0
   Set Mid → -1
   Set Hi → N-1

   While X does not match
      if Lo equals to Hi OR A[Lo] equals to A[Hi]
         EXIT: Failure, Target not found
      end if

      Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])

      if A[Mid] = X
         EXIT: Success, Target found at Mid
      else
         if A[Mid] < X
            Set Lo to Mid+1
         else if A[Mid] > X
            Set Hi to Mid-1
         end if
      end if
   End While
End Procedure

分析

插值搜尋演算法的執行時複雜度為 Ο(log (log n)),而在有利情況下,BST 的執行時複雜度為 Ο(log n)

示例

為了理解插值搜尋中涉及的逐步過程,讓我們看一個示例並圍繞它進行操作。

考慮下面給出的已排序元素陣列:

array_of_sorted_elements

讓我們搜尋元素 19。

解決方案

與二分搜尋不同,在這種方法中,中間點是使用以下公式選擇的:

$$mid\, =\, Lo\, +\, \frac{\left ( Hi\, -\, Lo \right )\ast \left ( X\, -\, A\left [ Lo \right ] \right )}{A\left [ Hi \right ]\, -\, A\left [ Lo \right ]}$$

因此,在此給定的陣列輸入中,

Lo = 0, A[Lo] = 10
Hi = 9, A[Hi] = 44
X = 19

應用公式查詢列表中的中間點,我們得到

$$mid\, =\, 0\, +\, \frac{\left ( 9\, -\, 0 \right )\ast \left ( 19\, -\, 10 \right )}{44\, -\, 10}$$

$$mid\, =\, \frac{9\ast 9}{34}$$

$$mid\, =\, \frac{81}{34}\,=\,2.38$$

由於 mid 是索引值,因此我們只考慮小數的整數部分。即,mid = 2。

at_index_2

將給定的鍵元素(即 19)與存在於中間索引處的元素進行比較,發現這兩個元素匹配。

因此,該元素位於索引 2 處。

實現

插值搜尋是二分搜尋的一種改進變體。這種搜尋演算法基於所需值的探測位置。為了使該演算法正常工作,資料集合應以排序且均勻分佈的形式存在。

#include<stdio.h>
#define MAX 10

// array of items on which linear search will be conducted.
int list[MAX] = { 10, 14, 19, 26, 27, 31, 33, 35, 42, 44 };
int interpolation_search(int data){
   int lo = 0;
   int hi = MAX - 1;
   int mid = -1;
   int comparisons = 1;
   int index = -1;
   while(lo <= hi) {
      printf("Comparison %d \n" , comparisons ) ;
      printf("lo : %d, list[%d] = %d\n", lo, lo, list[lo]);
      printf("hi : %d, list[%d] = %d\n", hi, hi, list[hi]);
      comparisons++;
      
      // probe the mid point
      mid = lo + (((double)(hi - lo) / (list[hi] - list[lo])) * (data - list[lo]));
      printf("mid = %d\n",mid);
      
      // data found
      if(list[mid] == data) {
         index = mid;
         break;
      } else {
         if(list[mid] < data) {
            
            // if data is larger, data is in upper half
            lo = mid + 1;
         } else {
            
            // if data is smaller, data is in lower half
            hi = mid - 1;
         }
      }
   }
   printf("Total comparisons made: %d", --comparisons);
   return index;
}
int main(){
   
   //find location of 33
   int location = interpolation_search(33);
   
   // if element was found
   if(location != -1)
      printf("\nElement found at location: %d" ,(location+1));
   else
      printf("Element not found.");
   return 0;
}

輸出

Comparison 1 
lo : 0, list[0] = 10
hi : 9, list[9] = 44
mid = 6
Total comparisons made: 1
Element found at location: 7
#include<iostream>
using namespace std;
#define MAX 10

// array of items on which linear search will be conducted.
int list[MAX] = { 10, 14, 19, 26, 27, 31, 33, 35, 42, 44 };
int interpolation_search(int data){
   int lo = 0;
   int hi = MAX - 1;
   int mid = -1;
   int comparisons = 1;
   int index = -1;
   while(lo <= hi) {
      cout << "Comparison " << comparisons << endl;
      cout << "lo : " << lo << " list[" << lo << "] = " << list[lo] << endl;
      cout << "hi : " << hi << " list[" << hi << "] = " << list[hi] << endl;
      comparisons++;
      
      // probe the mid point
      mid = lo + (((double)(hi - lo) / (list[hi] - list[lo])) * (data - list[lo]));
      cout << "mid = " << mid;
      
      // data found
      if(list[mid] == data) {
         index = mid;
         break;
      } else {
         if(list[mid] < data) {
            
            // if data is larger, data is in upper half
            lo = mid + 1;
         } else {
            
            // if data is smaller, data is in lower half
            hi = mid - 1;
         }
      }
   }
   cout << "\nTotal comparisons made: " << (--comparisons);
   return index;
}
int main(){
   
   //find location of 33
   int location = interpolation_search(33);
   
   // if element was found
   if(location != -1)
      cout << "\nElement found at location: " << (location+1);
   else
      cout << "Element not found.";
   return 0;
}

輸出

Comparison 1
lo : 0 list[0] = 10
hi : 9 list[9] = 44
mid = 6
Total comparisons made: 1
Element found at location: 7
import java.io.*;
public class InterpolationSearch {
   static int interpolation_search(int data, int[] list) {
      int lo = 0;
      int hi = list.length - 1;
      int mid = -1;
      int comparisons = 1;
      int index = -1;
      while(lo <= hi) {
         System.out.println("Comparison " + comparisons);
         System.out.println("lo : " + lo + " list[" + lo + "] = " + list[lo]);
         System.out.println("hi : " + hi + " list[" + hi + "] = " + list[hi]);
         comparisons++;
         
         // probe the mid point
         mid = lo + (((hi - lo) * (data - list[lo])) / (list[hi] - list[lo]));
         System.out.println("mid = " + mid);
         
         // data found
         if(list[mid] == data) {
            index = mid;
            break;
         } else {
            if(list[mid] < data) {
               
               // if data is larger, data is in upper half
               lo = mid + 1;
            } else {
               
               // if data is smaller, data is in lower half
               hi = mid - 1;
            }
         }
      }
      System.out.println("Total comparisons made: " + (--comparisons));
      return index;
   }
   public static void main(String args[]) {
      int[] list = { 10, 14, 19, 26, 27, 31, 33, 35, 42, 44 };
      
      //find location of 33
      int location = interpolation_search(33, list);
      
      // if element was found
      if(location != -1)
         System.out.println("Element found at location: " + (location+1));
      else
         System.out.println("Element not found.");
   }
}

輸出

Comparison 1
lo : 0 list[0] = 10
hi : 9 list[9] = 44
mid = 6
Total comparisons made: 1
Element found at location: 7
def interpolation_search( data, arr):
   lo = 0
   hi = len(arr) - 1
   mid = -1
   comparisons = 1
   index = -1
   while(lo <= hi):
      print("Comparison ", comparisons)
      print("lo : ", lo)
      print("list[", lo, "] = ")
      print(arr[lo])
      print("hi : ", hi)
      print("list[", hi, "] = ")
      print(arr[hi])
      comparisons = comparisons + 1

      #probe the mid point
      mid = lo + (((hi - lo) * (data - arr[lo])) // (arr[hi] - arr[lo]))
      print("mid = ", mid)

      #data found
      if(arr[mid] == data):
         index = mid
         break
      else:
         if(arr[mid] < data):
            
            #if data is larger, data is in upper half
            lo = mid + 1
         else:

            #if data is smaller, data is in lower half
            hi = mid - 1
   print("Total comparisons made: ")
   print(comparisons-1)
   return index

arr = [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
#find location of 33
location = interpolation_search(33, arr)

#if element was found
if(location != -1):
   print("Element found at location: ", (location+1))
else:
   print("Element not found.")

輸出

Comparison  1
lo :  0
list[ 0 ] = 
10
hi :  9
list[ 9 ] = 
44
mid =  6
Total comparisons made: 
1
Element found at location:  7
廣告