如何在 Python 函式中消除重複行？

Python 伺服器端程式設計程式設計

在本文中，我們將討論如何在 Python 中刪除重複的多行。如果檔案很小並且只有幾行，則可以手動執行刪除重複行的過程。但是，在處理大型檔案時，Python 可以提供幫助。

使用檔案處理方法

Python 具有用於建立、開啟和關閉檔案的內建方法，這使得處理檔案變得更容易。Python 還允許在檔案開啟時執行多種檔案操作，例如讀取、寫入和追加資料。

要從 Python 文字檔案或函式中刪除重複行，我們使用 Python 中的檔案處理方法。文字檔案或函式必須與包含 Python 程式的 .py 檔案位於同一目錄中。

演算法

以下是消除 Python 函式中重複行的步驟

由於我們只需要讀取此檔案的內容，因此首先以只讀模式開啟輸入檔案。
現在，要將內容寫入此檔案，請以寫入模式開啟輸出檔案。
逐行讀取輸入檔案，然後檢查輸出檔案以檢視是否已寫入任何與該行類似的行。
如果沒有，則將此行新增到輸出檔案並在集合中儲存該行的雜湊值。我們不會檢查和儲存整行，而是檢查每行的雜湊值。在處理大型檔案時，這更有效並且佔用更少的空間。
如果雜湊值已新增到集合中，則跳過該行。
完成後，輸出檔案將包含輸入檔案中的每一行，而沒有任何重複。

在這裡，輸入檔案即“File.txt”包含以下資料：

Welcome to TutorialsPoint.
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
eliminate repeated lines.
eliminate repeated lines.
Skip the line.

示例

以下是如何消除 Python 函式中重複行的示例：

import hashlib
# path of the input and output files
OutFile = 'C:\Users\Lenovo\Downloads\Work TP\pre.txt'
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
# holding the line which is already seen
lines_present = set()
# opening the output file in write mode to write in it
The_Output_File = open(OutFile, "w")

# loop for opening the file in read mode
for l in open(InFile, "r"):
   # finding the hash value of the current line
      # Before performing the hash, we remove any blank spaces and new lines from the end of the line.
      # Using hashlib library determine the hash value of a line.
      hash_value = hashlib.md5(l.rstrip().encode('utf-8')).hexdigest()
      if hash_value not in lines_present:
         The_Output_File.write(l)
         lines_present.add(hash_value)
# closing the output text file
The_Output_File.close()

輸出

我們可以在以下輸出中看到，輸入檔案中的所有重複行都已從輸出檔案中刪除，輸出檔案包含如下所示的唯一資料：

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.

示例

以下是另一個消除 Python 函式中重複行的示例：

# path of the input and output files
# Create the output file in write mode
OutFile = open('C:\Users\Lenovo\Downloads\Work TP\pre.txt',"w")
11
# Create an input file in read mode
InFile = open('C:\Users\Lenovo\Downloads\Work TP\File.txt', "r")
# holding the line which is already seen
lines_present = set()
# iterate every line present in the file
for l in InFile:
   # check whether the lines are unique
   if l not in lines_present:
      # writing all the unique lines in the output file
      OutFile.write(l)
      # adding unique lines in the lines_present
      lines_present.add(l)
# closing the output text files
OutFile.close()
InFile.close()

輸出

我們可以在以下輸出中看到，輸入檔案中的所有重複行都已從輸出檔案中刪除，輸出檔案包含如下所示的唯一資料

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.

Sarika Singh

更新於： 2023-09-09

5K+ 瀏覽量

開啟你的職業生涯

透過完成課程獲得認證

開始學習

如何在 Python 函式中消除重複行？

使用檔案處理方法

演算法

示例

輸出

示例

輸出

開啟你的 職業生涯

開啟你的職業生涯