How can the mmap function improve file reading performance in Python?
Introduction...
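mmap stands for memory mapping. When a file is mapped, the operating system's virtual memory is used to access the data on the file system directly, instead of going through the ordinary I/O functions. This can improve I/O performance, because it does not need a separate system call for each access and does not need to copy data between buffers.
In general, data that already lives in memory is faster to work with than data on disk; for example, a SQLite database created in memory usually outperforms the same database on disk.
A memory-mapped file can be treated either as a mutable bytes-like object (similar to a bytearray) or as a file-like object, depending on what you need.
mmap objects support many methods, such as close(), flush(), read(), readline(), seek(), tell(), and write(), and they work well with slicing and even with regular expressions.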
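To make the "bytes-like or file-like" point concrete, here is a minimal sketch that slices a mapping, reads a line from it, and runs a regular expression over it. The sample text, the temporary file, and the variable names are assumptions made for illustration only.

```python
import mmap
import os
import re
import tempfile

# Hypothetical sample data written to a temporary file for this sketch.
sample = b"Lorem ipsum dolor sit amet\nsecond line\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

try:
    with open(sample_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            first_word = m[:5]                # slicing returns bytes
            first_line = m.readline()         # file-like read from the current position
            position = m.tell()               # offset just past the first line
            match = re.search(rb"d\w+r", m)   # bytes pattern applied directly to the mapping
            found = match.group(0) if match else None
finally:
    os.remove(sample_path)

print(first_word, first_line, position, found)
```

Note that slicing does not move the read position, while readline() does; the pattern has to be a bytes pattern because the mapping holds bytes, not text.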
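How to do it...
1. Suppose you have a text file with the following content. You can get text like this by searching Google for sample text. Copy the content into a file named input.txt.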
Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.
Id porro facete cum. No est veritus detraxit facilisis, sit ea clita decore essent. Ut eam labores fuisset menandri, ex sit brute viderer eleifend, altera argumentum vel ex. Duo at zril sensibus, eu vim ullum assentior, quando possit at his.
Te nam tempor posidonium scripserit, eam mundi reprimique dissentias ne. Vim te soleat offendit democritum. Nam an diam elaboraret, quaeque dissentias an has. Autem legendos dignissim ad vis, sea ex amet petentium reprehendunt, inermis constituam philosophia ne mel. Esse noster lobortis usu ne.
Nec reque postea urbanitas ut, mea in nulla invidunt ocurreret. Ei duo iuvaret numquam. Ferri nemore audire te est, mel et detracto noluisse. Nec eu habeo justo, id pro posse apeirian volutpat. Mea sonet quaestio ne.
Atqui quaeque alienum te vim. Graeco aliquip liberavisse pro ut. Te similique reformidans usu, te mundi aliquando ius. Meis scripta minimum quo no, meis prima fabellas eu eam, laoreet delicata forensibus ut vim. Et quo vocibus mediocritatem, atqui summo an eam.
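2. We will use the mmap() function to create a memory-mapped file. Its first argument is a file descriptor (not a file name), which you can get from the fileno() method of a file object or from os.open().
Note: it is the caller's responsibility to open the file before calling mmap() and to close it afterwards.
The second argument to mmap() is a length in bytes that says how much of the file to map. If the value is 0, the whole file is mapped. There is also an optional access argument: ACCESS_READ for read-only access, ACCESS_WRITE for write-through access, and ACCESS_COPY for copy-on-write access.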
import mmap

input_text = """Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.
Id porro facete cum. No est veritus detraxit facilisis, sit ea clita decore essent. Ut eam labores fuisset menandri, ex sit brute viderer eleifend, altera argumentum vel ex. Duo at zril sensibus, eu vim ullum assentior, quando possit at his.
Te nam tempor posidonium scripserit, eam mundi reprimique dissentias ne. Vim te soleat offendit democritum. Nam an diam elaboraret, quaeque dissentias an has. Autem legendos dignissim ad vis, sea ex amet petentium reprehendunt, inermis constituam philosophia ne mel. Esse noster lobortis usu ne.
Nec reque postea urbanitas ut, mea in nulla invidunt ocurreret. Ei duo iuvaret numquam. Ferri nemore audire te est, mel et detracto noluisse. Nec eu habeo justo, id pro posse apeirian volutpat. Mea sonet quaestio ne.
Atqui quaeque alienum te vim. Graeco aliquip liberavisse pro ut. Te similique reformidans usu, te mundi aliquando ius. Meis scripta minimum quo no, meis prima fabellas eu eam, laoreet delicata forensibus ut vim. Et quo vocibus mediocritatem, atqui summo an eam.
"""

# Create an input file with some text
input_file = 'input.txt'
with open(input_file, "w") as f:
    f.write(input_text)

# Open the file in binary read mode and map the whole file read-only
with open(input_file, 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        print(f"Output \n*** Output first 5 bytes of the {input_file} is {m.read(5)} ")
        print(f"*** Output Next 10 bytes of the {input_file} is {m.read(10)} ")
Output
*** Output first 5 bytes of the input.txt is b'Lorem'
*** Output Next 10 bytes of the input.txt is b' ipsum dol'
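3. We opened the file, mapped it into memory, and read the first 5 bytes with .read(). After that first read, the position in the mapping has moved forward by 5 bytes. If you now read again, for example with read(10), you get bytes 6 through 15.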
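The way the read position advances, and the effect of the length and access arguments, can be checked with a small sketch. It maps only the first 16 bytes of a hypothetical temporary file, read-only; the file content and all variable names are assumptions for illustration.

```python
import mmap
import os
import tempfile

# Hypothetical sample file created just for this sketch.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Lorem ipsum dolor sit amet")
    demo_path = tmp.name

try:
    with open(demo_path, "rb") as f:
        # Map only the first 16 bytes of the file, read-only.
        with mmap.mmap(f.fileno(), 16, access=mmap.ACCESS_READ) as m:
            mapped_size = len(m)
            first = m.read(5)               # bytes 1-5
            pos_after_first = m.tell()
            second = m.read(10)             # bytes 6-15
            pos_after_second = m.tell()
            m.seek(0)                       # rewind to the start of the mapping
            pos_after_seek = m.tell()
            try:
                m[0:5] = b"LOREM"           # writing to an ACCESS_READ map is not allowed
                write_rejected = False
            except TypeError:
                write_rejected = True
finally:
    os.remove(demo_path)

print(mapped_size, first, pos_after_first, second, pos_after_second,
      pos_after_seek, write_rejected)
```

tell() reports the current offset, so it is a convenient way to confirm where the next read() will start.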
4. To set up a memory-mapped file for updates, open it in 'r+' mode (not 'w') before mapping it.
The following example shows how to modify part of a line in place.
import mmap
import shutil

input_file = 'input.txt'
input_copy = input_file.replace('input', 'input_copy')

# Make a copy of the file so that the original stays unmodified.
shutil.copyfile(input_file, input_copy)

# Word to look for
word = b'ipsum'
# Modified word: the same bytes reversed, so the length is unchanged
modified_word = word[::-1]

# Open the copy so that it can receive updates
with open(input_copy, 'r+') as f:
    with mmap.mmap(f.fileno(), 0) as m:
        print(f"output \n *** Line before updates \n {m.readline().rstrip()}")
        # Rewind using seek
        m.seek(0)
        # Find the word and reverse it
        loc = m.find(word)
        m[loc:loc + len(word)] = modified_word
        m.flush()
        # Rewind using seek
        m.seek(0)
        print(f" \n *** Line after updates \n {m.readline().rstrip()}")
        f.seek(0)
        print(f" \n *** Final file \n {f.readline().rstrip()}")
Output
*** Line before updates
b'Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.'

*** Line after updates
b'Lorem muspi dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.'

*** Final file
Lorem muspi dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.
5. The word "ipsum" in the middle of the first line has been replaced with "muspi", both in memory and in the file on disk.
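The in-place replacement above works because the reversed word has the same length as the original. As a minimal sketch of what happens otherwise, the code below tries to assign a shorter value to a slice and then pads the replacement to the original width. The temporary file, the replacement b"xyz", and the padding choice are assumptions for illustration; the exception type is IndexError in the CPython versions I am aware of, so the sketch catches ValueError as well to stay on the safe side.

```python
import mmap
import os
import tempfile

# Hypothetical sample file created just for this sketch.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Lorem ipsum dolor sit amet")
    edit_path = tmp.name

try:
    with open(edit_path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            loc = m.find(b"ipsum")
            try:
                m[loc:loc + 5] = b"xyz"       # 3 bytes into a 5-byte slot
                size_mismatch_rejected = False
            except (IndexError, ValueError):
                size_mismatch_rejected = True
            # Pad the replacement with spaces so it matches the original width.
            m[loc:loc + 5] = b"xyz".ljust(5)
            m.flush()
    with open(edit_path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(edit_path)

print(size_mismatch_rejected, on_disk)
```

Slice assignment never grows or shrinks the mapping, so the file size stays the same; edits that change the length need a different approach, such as rewriting the file.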
6. If for any reason you want to see the changes in memory without updating the file on disk, use ACCESS_COPY.
import mmap
import shutil

input_file = 'input.txt'
input_copy = input_file.replace('input', 'input_copy')

# Make a copy of the file so that the original stays unmodified.
shutil.copyfile(input_file, input_copy)

# Word to look for
word = b'ipsum'
# Modified word: the same bytes reversed, so the length is unchanged
modified_word = word[::-1]

# Open the copy and map it copy-on-write
with open(input_copy, 'r+') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
        print(f"output \n *** Line before updates \n {m.readline().rstrip()}")
        # Rewind using seek
        m.seek(0)
        # Find the word and reverse it
        loc = m.find(word)
        m[loc:loc + len(word)] = modified_word
        m.flush()
        # Rewind using seek
        m.seek(0)
        print(f" \n *** Line after updates \n {m.readline().rstrip()}")
        f.seek(0)
        print(f" \n *** Final file \n {f.readline().rstrip()}")
Output
*** Line before updates
b'Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.'

*** Line after updates
b'Lorem muspi dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.'

*** Final file
Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.
7. Notice that the content of the file on disk has not changed; the update was applied only to the copy in memory.
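The difference between write-through and copy-on-write access can be seen side by side with a short sketch. The helper edit_and_reread, the temporary file, and the sample text are hypothetical names and data introduced only for this comparison.

```python
import mmap
import os
import tempfile

def edit_and_reread(access):
    """Reverse b'ipsum' through a mapping opened with `access`.

    Returns a tuple (bytes seen in memory, bytes found on disk afterwards).
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"Lorem ipsum dolor")
        path = tmp.name
    try:
        with open(path, "r+b") as f:
            with mmap.mmap(f.fileno(), 0, access=access) as m:
                loc = m.find(b"ipsum")
                m[loc:loc + 5] = b"muspi"
                if access == mmap.ACCESS_WRITE:
                    m.flush()               # push the change to the file
                in_memory = m[:]            # copy of the mapping's current content
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.remove(path)
    return in_memory, on_disk

write_result = edit_and_reread(mmap.ACCESS_WRITE)
copy_result = edit_and_reread(mmap.ACCESS_COPY)
print(write_result)
print(copy_result)
```

With ACCESS_WRITE both values show the reversed word, while with ACCESS_COPY only the in-memory value changes and the file keeps the original text.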