MySQL - 處理重複資料



資料庫中的表或結果集通常包含重複記錄。雖然通常允許重複,但在某些情況下需要防止重複。在這種情況下,必須從資料庫表中識別並刪除重複記錄。

處理 MySQL 重複資料的重要性

處理資料庫中重複資料的原因有很多。主要原因之一是,組織資料庫中存在重複資料會導致邏輯錯誤。此外,我們需要處理冗餘資料以防止以下後果:

  • 重複資料佔用儲存空間,降低資料庫使用效率並增加儲存成本。
  • 處理重複記錄會消耗額外的資源,從而增加維護資料庫的總成本。
  • 資料庫中的重複資料會導致資料邏輯錯誤,影響儲存資訊的完整性和可靠性。

防止重複條目

您可以使用表中具有相應欄位的主鍵 (PRIMARY KEY)唯一 (UNIQUE) 索引來防止將重複記錄條目插入表中。

示例

下表不包含此類索引或主鍵,因此它允許first_namelast_name 的重複記錄。

CREATE TABLE CUSTOMERS (
   first_name CHAR(20),
   last_name CHAR(20),
   sex CHAR(10)
);

為防止在此表中建立具有相同 first_name 和 last_name 值的多個記錄,請向其定義中新增主鍵 (PRIMARY KEY)。執行此操作時,還需要將索引列宣告為NOT NULL,因為主鍵 (PRIMARY KEY) 不允許NULL 值:

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

使用 INSERT IGNORE 查詢:

表中存在唯一索引通常會在嘗試插入與索引列中現有記錄重複的記錄時導致錯誤。

為了處理這種情況而不產生錯誤,您可以使用“INSERT IGNORE”命令。當記錄不是重複記錄時,MySQL 會照常插入它。但是,如果記錄是重複的,“IGNORE”關鍵字會指示 MySQL 丟棄它,而不會產生錯誤。

提供的示例不會導致錯誤,並且它還確保不會插入重複記錄:

INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) 
VALUES ('Jay', 'Thomas'), ('Jay', 'Thomas');

我們將獲得以下輸出:

Query OK, 1 row affected, 1 warning (0.01 sec)
Records: 2  Duplicates: 1  Warnings: 1

使用 REPLACE 查詢:

考慮使用 REPLACE 命令代替 INSERT 命令。處理新記錄時,它會像 INSERT 一樣插入。但是,如果它是重複的,新記錄將替換舊記錄。

REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) 
VALUES ( 'Ajay', 'Kumar'), ( 'Ajay', 'Kumar');

以下是上述程式碼的輸出:

Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

您在 INSERT IGNORE 和 REPLACE 命令之間的選擇應取決於您希望實現的特定重複處理行為。INSERT IGNORE 命令保留第一組重複記錄並丟棄其餘記錄。另一方面,REPLACE 命令保留最後一組重複項並刪除任何較早的例項。

使用 UNIQUE 約束:

強制唯一性的另一種方法是向表新增唯一 (UNIQUE) 索引,而不是主鍵 (PRIMARY KEY):

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);

計數和識別重複項

您可以使用 COUNT 函式和 GROUP BY 子句根據特定列計數和識別重複記錄。

示例

以下是計算表中 first_name 和 last_name 的重複記錄的查詢:

SELECT COUNT(*) as repetitions, last_name, first_name
FROM CUSTOMERS
GROUP BY last_name, first_name
HAVING repetitions > 1;

此查詢將返回 CUSTOMERS 表中所有重複記錄的列表。一般來說,要識別重複的值集,請遵循以下步驟。

  • 確定哪些列可能包含重複值。

  • 將這些列包含在列選擇列表中,以及 COUNT(*)。

  • 還在 GROUP BY 子句中列出這些列。

  • 應用 HAVING 子句以透過要求組計數大於一來過濾唯一值。

從查詢結果中消除重複項

您可以將 DISTINCT 命令與 SELECT 語句一起使用來查詢表中可用的唯一記錄。

SELECT DISTINCT last_name, first_name
FROM CUSTOMERS
ORDER BY last_name;

除了DISTINCT命令,還可以新增GROUP BY子句,指定要選擇的列。這種方法可以消除重複項,只檢索指定列中唯一的值組合。

SELECT last_name, first_name
FROM CUSTOMERS
GROUP BY (last_name, first_name);

使用表替換刪除重複項

如果表中存在重複記錄,並且想要刪除表中的所有重複記錄,請按照以下步驟操作:

CREATE TABLE tmp AS 
SELECT DISTINCT last_name, first_name, sex
FROM CUSTOMERS;

DROP TABLE CUSTOMERS;
ALTER TABLE tmp RENAME TO CUSTOMERS;

使用客戶端程式處理重複項

我們也可以使用客戶端程式來處理重複資料。

語法

要透過PHP程式處理重複值,需要使用mysqli函式query()執行“INSERT IGNORE”語句,如下所示:

$sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
$mysqli->query($sql);

要透過JavaScript程式處理重複值,需要使用mysql2庫的query()函式執行“INSERT IGNORE”語句,如下所示:

sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
con.query(sql);

要透過Java程式處理重複值,需要使用JDBC函式execute()執行“INSERT IGNORE”語句,如下所示:

String sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
statement.execute(sql);

要透過Python程式處理重複值,需要使用MySQL Connector/Pythonexecute()函式執行“INSERT IGNORE”語句,如下所示:

sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
cursorObj.execute(sql)

示例

以下是程式示例:

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$db = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); $sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; if($mysqli->query($sql)){ printf("Table created successfully...!\n"); } //let's insert some records $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("First record inserted successfully...!\n"); } $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n"); } //now lets insert duplicate record with IGNORE keyword $sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("Duplicate record inserted successfully using IGNORE keyword...!\n"); } $sql = "SELECT * from person_tbl"; if($result = $mysqli->query($sql)){ printf("Table records: \n"); while($row = mysqli_fetch_array($result)){ printf("First Name: %s, Last name: %s, Sex: %s", $row['first_name'], $row['last_name'], $row['sex']); printf("\n"); } } //lets insert a duplicate record $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if(!$mysqli->query($sql)){ printf("You can't insert any duplicate records...!\n"); } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();

輸出

獲得的輸出結果如下所示:

Table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Duplicate record inserted successfully using IGNORE keyword...!
Table records:
First Name: Thomas, Last name: Jay, Sex:
First Name: Smith, Last name: John, Sex:
PHP Fatal error:  Uncaught mysqli_sql_exception: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' in D:\test\handlingduplicates.php:48    
var mysql = require('mysql2');
var con = mysql.createConnection({
host:"localhost",
user:"root",
password:"password"
});

 //Connecting to MySQL
 con.connect(function(err) {
 if (err) throw err;
//   console.log("Connected successfully...!");
//   console.log("--------------------------");
 sql = "USE TUTORIALS";
 con.query(sql);
 //create table
 sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))";
 con.query(sql, function(err, result){
    if (err) throw err;
    console.log("Table created successfully....!");
    });
//now let's insert some records
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
con.query(sql, function(err, result){
    if (err) throw err;
    console.log("First record inserted successfully...!");
    });
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";
con.query(sql, function(err, result){
    if (err) throw err;
    console.log("Second record inserted successfully...!");
    });
//now lets insert duplicate record with IGNORE keyword
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
    con.query(sql, function(err, result){
        console.log("Insert duplicate record with IGNORE keyword")
    if (err) throw err;
    console.log("Duplicate record inserted successfully with the help of IGNORE keyword");
    });
//lets insert a duplicate record
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";
    con.query(sql, function(err, result){
        console.log("Insert duplicate record");
    if (err) throw err;
    console.log("You can't insert the duplicate record because columns are primary key");
    });
sql = "SELECT * FROM person_tbl";
con.query(sql, function(err, result){
    console.log("Table records(with ID auto_increment sequence).");
    if (err) throw err;
    console.log(result);
    });
});   

輸出

獲得的輸出結果如下所示:

Table created successfully....!
First record inserted successfully...!
Second record inserted successfully...!
Insert duplicate record with IGNORE keyword
Duplicate record inserted successfully with the help of IGNORE keyword
Insert duplicate record
Insert duplicate record
D:\test1\duplicate.js:43
    if (err) throw err;              ^

Error: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class HandleDuplicates {
   public static void main(String[] args) {
      String url = "jdbc:mysql://:3306/TUTORIALS";
      String user = "root";
      String password = "password";
      ResultSet rs;
      try {
         Class.forName("com.mysql.cj.jdbc.Driver");
            Connection con = DriverManager.getConnection(url, user, password);
            Statement st = con.createStatement();
            //System.out.println("Database connected successfully...!");
            //create table
            String sql = "CREATE TABLE person_tbl ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))";
            st.execute(sql);
            System.out.println("Table created successfully....!");
            //let's insert some records
            String sql1 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
            st.execute(sql1);
            System.out.println("Record inserted successfully...!");
            String sql2 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
            st.execute(sql2);
            System.out.println("Duplicate record inserted successfully...!");
            //lets print the table records
            String sql3 = "SELECT * FROM PERSON_TBL";
            rs = st.executeQuery(sql3);
            System.out.println("Table records: ");
            while(rs.next()) {
               String fname = rs.getString("first_name");
               String lname = rs.getString("last_name");
               System.out.println("First name: " + fname + ", Last name: " + lname);
            }
      }catch(Exception e) {
         e.printStackTrace();
      }
   }
}   

輸出

獲得的輸出結果如下所示:

Table created successfully....!
Record inserted successfully...!
Duplicate record inserted successfully...!
Table records: 
First name: Thomas, Last name: Jay
import mysql.connector
# Connecting to MySQL
con = mysql.connector.connect(
    host="localhost",
    user="root",
    password="password",
    database="tut"
)
# Creating a cursor object
cursorObj = con.cursor()
# Creating the table
create_table_query = """
CREATE TABLE person_tbl (
    first_name CHAR(20) NOT NULL,
    last_name CHAR(20) NOT NULL,
    sex CHAR(10),
    PRIMARY KEY (last_name, first_name)
)
"""
cursorObj.execute(create_table_query)
print("Table 'person_tbl' is created successfully!")
# Inserting some records
first_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
print("First record inserted successfully!")
cursorObj.execute(first_record)

second_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"
print("Second record inserted successfully!")
cursorObj.execute(second_record)
# Insert duplicate record with IGNORE keyword
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
print("Duplicate record inserted successfully with the help of IGNORE keyword")
cursorObj.execute(sql)
# Insert a duplicate record (this will throw an error)
try:
    cursorObj.execute("INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')")
    print("Insert duplicate record")
except mysql.connector.Error as err:
    print("Insert duplicate record error:", err)
con.commit()
# Retrieving records
cursorObj.execute("SELECT * FROM person_tbl")
records = cursorObj.fetchall()
# Printing the records
print("Table records.")
for record in records:
    print(record)
# Closing the connection
cursorObj.close()
con.close()   

輸出

獲得的輸出結果如下所示:

First record inserted successfully!
Second record inserted successfully!
Duplicate record inserted successfully with the help of IGNORE keyword
Insert duplicate record error: 1062 (23000): Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
Table records.
('Thomas', 'Jay', None)
('Smith', 'John', None)
廣告