How to store and query data in a SQLite database using Python and pandas?


Introduction

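In real business environments, most data is unlikely to be stored in text or Excel files. SQL-based relational databases such as Oracle, SQL Server, PostgreSQL, and MySQL are widely used, and many alternative databases have also become very popular.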

The choice of database usually depends on the application's performance, data integrity, and scalability requirements.

How to do it

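Loading data from SQL into a DataFrame is fairly straightforward, and pandas has some functions that simplify the process.

In this example, we will see how to create a sqlite3 database. The sqlite3 module is part of the Python standard library, so it needs no further installation. If you are unsure whether it is available, try the following. We will also import pandas.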

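import sqlite3
import pandas as pd

# sqlite3.version is the version of the sqlite3 module itself. It is deprecated
# since Python 3.12; sqlite3.sqlite_version reports the SQLite library version.
print(sqlite3.version)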

Output

2.6.0
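
On newer Python versions the check above may warn or fail, because sqlite3.version is deprecated. A minimal alternative sketch is shown below. It assumes an in-memory database (":memory:") so that nothing is written to disk, and the names check_conn and answer are illustrative only.

```python
import sqlite3

# Version of the SQLite library that this Python build is linked against.
print(f"SQLite library version: {sqlite3.sqlite_version}")

# Smoke test: open a throwaway in-memory database and run a trivial query.
check_conn = sqlite3.connect(":memory:")
try:
    answer = check_conn.execute("select 1 + 1").fetchone()[0]
finally:
    check_conn.close()

print(f"Smoke test result: {answer}")
```

If both lines print without an error, the sqlite3 module is installed and working.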

Create the connection object and the customers data.

# connection object
conn = sqlite3.connect("example.db")

# customers data
customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
print(f"*** Customers info -\n{customers}")

Output

*** Customers info -
  customerID firstName state
0         a1   Person1   VIC
1         b1   Person2   NSW
2         c1   Person3   QLD
3         d1   Person4    WA

# orders data
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})
print(f"*** orders info -\n{orders}")

Output

*** orders info -
  customerID    productName
0         a1      road bike
1         a1  mountain bike
2         a1         helmet
3         d1         gloves
4         c1      road bike
5         c1        glasses

# write to the db; if_exists="replace" drops and recreates an existing table
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", con=conn, if_exists="replace", index=False)
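
The write step prints nothing, so it can help to confirm that the table really exists. The sketch below is one way to do that, under the assumption of an in-memory database and an illustrative connection name demo_conn. It also contrasts if_exists="replace" with if_exists="append", and reads SQLite's built-in sqlite_master catalog.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory database, so the sketch leaves no file behind.
demo_conn = sqlite3.connect(":memory:")

orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})

# "replace" drops and recreates the table; "append" adds rows to the existing one.
orders.to_sql("orders", demo_conn, if_exists="replace", index=False)
orders.head(2).to_sql("orders", demo_conn, if_exists="append", index=False)

# sqlite_master lists the schema objects stored in the database.
tables = pd.read_sql_query(
    "select name from sqlite_master where type = 'table' order by name",
    demo_conn,
)
row_count = demo_conn.execute("select count(*) from orders").fetchone()[0]

print(tables["name"].tolist())
print(row_count)

demo_conn.close()
```

Here the six original rows plus the two appended rows give a count of eight, which shows that "append" does not deduplicate.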

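Build the SQL query that fetches the data.

# frame an SQL query to fetch the data.
# every selected non-aggregate column is listed in group by, which keeps the query standard SQL
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
  on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName;
"""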

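Run the query and load the result into a DataFrame.

# run the SQL; outside a notebook the result must be printed to be seen
result = pd.read_sql_query(q, con=conn)
print(result)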
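
As a cross-check, the same join and count can be computed in pandas without touching the database. This is only a sketch of an equivalent computation, not part of the original recipe; the names joined and summary are illustrative.

```python
import pandas as pd

customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})

# Same shape as the SQL: start from orders and left join customers.
joined = orders.merge(customers, on="customerID", how="left")

# Count rows per customer, mirroring count(*) with group by.
summary = (
    joined.groupby(["customerID", "firstName"])
    .size()
    .reset_index(name="productQuantity")
)
print(summary)
```

If the SQL result and this summary disagree, the join keys or the grouping columns are the first things to inspect.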

Example

Putting it all together.

import sqlite3
import pandas as pd

# module version (sqlite3.version is deprecated since Python 3.12)
print(sqlite3.version)

# connection object
conn = sqlite3.connect("example.db")

# customers data
customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
print(f"*** Customers info -\n{customers}")

# orders data
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})
print(f"*** orders info -\n{orders}")

# write to the db
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", con=conn, if_exists="replace", index=False)

# frame an SQL query to fetch the data.
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
  on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName;
"""

# run the SQL and print the result.
print(pd.read_sql_query(q, con=conn))

# close the connection to release the database file
conn.close()

Output

2.6.0
*** Customers info -
  customerID firstName state
0         a1   Person1   VIC
1         b1   Person2   NSW
2         c1   Person3   QLD
3         d1   Person4    WA
*** orders info -
  customerID    productName
0         a1      road bike
1         a1  mountain bike
2         a1         helmet
3         d1         gloves
4         c1      road bike
5         c1        glasses
  customerID firstName  productQuantity
0         a1   Person1                3
1         c1   Person3                2
2         d1   Person4                1
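
The result lists only three customers: Person2 (b1) placed no orders, and because the query starts from the orders table, that customer never appears. If customers without orders should be reported with a count of zero, one possible variation is to start from customers instead. The sketch below assumes an in-memory database and uses the illustrative names demo_conn, q_all, and all_customers.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory database, so the sketch leaves no file behind.
demo_conn = sqlite3.connect(":memory:")

customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
})
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})
customers.to_sql("customers", demo_conn, if_exists="replace", index=False)
orders.to_sql("orders", demo_conn, if_exists="replace", index=False)

# Start from customers so every customer is kept; count(orders.customerID)
# ignores the NULLs produced by the left join, giving 0 for customers without orders.
q_all = """
select customers.customerID, customers.firstName,
       count(orders.customerID) as productQuantity
from customers
left join orders
  on orders.customerID = customers.customerID
group by customers.customerID, customers.firstName
order by customers.customerID;
"""
all_customers = pd.read_sql_query(q_all, con=demo_conn)
print(all_customers)

demo_conn.close()
```

Counting a column from the joined table rather than using count(*) is the design choice that matters here: count(*) would report 1 for a customer whose only row is the unmatched one.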

Updated on: 09-11-2020
