Delete duplicate rows from table using join

I have two tables that hold the states (state_table) and cities (city_table) of countries.

The city table has a state_id column that associates each city with a row in state_table.

Both tables already have data in them.

Now the problem

The city table contains multiple records for the same city within one state, and other states may or may not have a city with the same name.

For example, cityone may appear 5 times in the city table with stateone and 2 times with statetwo.

So, how do I write a query that keeps one row per city in each state and deletes the rest?

The schema follows:

    CREATE TABLE IF NOT EXISTS `city_table` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `state_id` int(11) NOT NULL,
      `city` varchar(25) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

    CREATE TABLE IF NOT EXISTS `state_table` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `state` varchar(15) NOT NULL,
      `country_id` smallint(5) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

This is sample data:

    id  state_id  city
    1   1         city_one
    2   1         city_two
    3   1         city_one
    4   1         city_two
    5   2         city_one
    6   3         city_three
    7   3         city_one
    8   3         city_three
    9   4         city_four
    10  4         city_five

The original table has 152,451 rows.
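Before deleting anything, it can help to see which (state_id, city) pairs are actually duplicated. The following is a minimal sketch, not part of the original question: it loads the sample rows into an in-memory SQLite database (standing in for MySQL, which is an assumption made only so the example is self-contained) and lists every pair that occurs more than once. The helper name find_duplicates is hypothetical; the GROUP BY ... HAVING query itself should run unchanged on MySQL.

```python
import sqlite3

# Sample rows from the question: (id, state_id, city)
SAMPLE_ROWS = [
    (1, 1, "city_one"), (2, 1, "city_two"), (3, 1, "city_one"),
    (4, 1, "city_two"), (5, 2, "city_one"), (6, 3, "city_three"),
    (7, 3, "city_one"), (8, 3, "city_three"), (9, 4, "city_four"),
    (10, 4, "city_five"),
]


def find_duplicates(rows):
    """Return (state_id, city, copies) for every pair stored more than once."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute(
        "CREATE TABLE city_table ("
        "id INTEGER PRIMARY KEY, state_id INTEGER NOT NULL, city TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO city_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT state_id, city, COUNT(*) AS copies "
        "FROM city_table "
        "GROUP BY state_id, city "
        "HAVING COUNT(*) > 1 "
        "ORDER BY state_id, city"
    ).fetchall()
    conn.close()
    return result


duplicates = find_duplicates(SAMPLE_ROWS)
print(duplicates)
```

On the sample data this reports three duplicated pairs, each stored twice, so three rows are candidates for deletion.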

2 answers

If you want to delete duplicated cities that share the same state_id (duplicate records), you can group the rows by city and state_id and keep only the row with the MIN (or MAX) id in each group.

Before running the delete query, your table looks like this:

    | ID | STATE_ID | CITY       |
    ------------------------------
    | 1  | 1        | city_one   |
    | 2  | 1        | city_two   |
    | 3  | 1        | city_one   |
    | 4  | 1        | city_two   |
    | 5  | 2        | city_one   |
    | 6  | 3        | city_three |
    | 7  | 3        | city_one   |
    | 8  | 3        | city_three |
    | 9  | 4        | city_four  |
    | 10 | 4        | city_five  |

You can use the following query to remove duplicate entries:

    DELETE city_table
    FROM city_table
    LEFT JOIN (
        SELECT MIN(id) AS IDs
        FROM city_table
        GROUP BY city, state_id
    ) A ON city_table.ID = A.IDs
    WHERE A.IDs IS NULL;

After applying the above query, your table will look like this:

    | ID | STATE_ID | CITY       |
    ------------------------------
    | 1  | 1        | city_one   |
    | 2  | 1        | city_two   |
    | 5  | 2        | city_one   |
    | 6  | 3        | city_three |
    | 7  | 3        | city_one   |
    | 9  | 4        | city_four  |
    | 10 | 4        | city_five  |
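With 152,451 rows at stake, it may be worth previewing what the delete would remove. One way, sketched below under assumptions, is to run the same LEFT JOIN ... IS NULL anti-join as a SELECT first. The example uses an in-memory SQLite database in place of MySQL (SQLite does not support the multi-table DELETE syntax, so only the SELECT form is shown), and the helper name rows_to_delete and the alias keep_id are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, state_id, city)
SAMPLE_ROWS = [
    (1, 1, "city_one"), (2, 1, "city_two"), (3, 1, "city_one"),
    (4, 1, "city_two"), (5, 2, "city_one"), (6, 3, "city_three"),
    (7, 3, "city_one"), (8, 3, "city_three"), (9, 4, "city_four"),
    (10, 4, "city_five"),
]


def rows_to_delete(rows):
    """Return the ids that are NOT the lowest id of their (city, state_id) group."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute(
        "CREATE TABLE city_table ("
        "id INTEGER PRIMARY KEY, state_id INTEGER NOT NULL, city TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO city_table VALUES (?, ?, ?)", rows)
    # Same join as the DELETE: rows without a match in the "keep" list
    # come back with keep_id NULL, and those are the duplicates.
    doomed = conn.execute(
        """
        SELECT c.id
        FROM city_table AS c
        LEFT JOIN (
            SELECT MIN(id) AS keep_id
            FROM city_table
            GROUP BY city, state_id
        ) AS a ON c.id = a.keep_id
        WHERE a.keep_id IS NULL
        ORDER BY c.id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in doomed]


ids_to_delete = rows_to_delete(SAMPLE_ROWS)
print(ids_to_delete)
```

For the sample data the preview lists ids 3, 4 and 8, which are exactly the rows missing from the "after" table above.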

See this SQLFiddle

See the MySQL DELETE syntax documentation for more details.

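Another option is NOT IN. MySQL rejects a subquery that reads directly from the table being deleted from (error 1093, "You can't specify target table ... for update in FROM clause"), so the MIN(id) subquery is wrapped in a derived table:

    DELETE FROM city_table
    WHERE id NOT IN (
        SELECT id FROM (
            SELECT MIN(id) AS id
            FROM city_table
            GROUP BY state_id, city
        ) AS keep_ids
    );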

If you find this query too slow, you can create a temporary table, save the output of the subquery in it, then truncate the original table and refill it from the temporary table. This is a somewhat dirty solution, since you will need to set the values of the auto_increment column explicitly.
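The temporary-table route could look roughly like the sketch below. It is an illustration under assumptions, not the answerer's exact procedure: it runs against an in-memory SQLite database, uses DELETE FROM where MySQL would use TRUNCATE, and the names keep_rows and rebuild_without_duplicates are hypothetical. The ids are inserted explicitly so the surviving rows keep their original values.

```python
import sqlite3

# Sample rows from the question: (id, state_id, city)
SAMPLE_ROWS = [
    (1, 1, "city_one"), (2, 1, "city_two"), (3, 1, "city_one"),
    (4, 1, "city_two"), (5, 2, "city_one"), (6, 3, "city_three"),
    (7, 3, "city_one"), (8, 3, "city_three"), (9, 4, "city_four"),
    (10, 4, "city_five"),
]


def rebuild_without_duplicates(conn):
    """Keep the lowest id per (state_id, city) by rebuilding the table contents."""
    with conn:  # commits on success, rolls back the DELETE/INSERT on error
        # 1. Save the rows to keep in a temporary table.
        conn.execute(
            "CREATE TEMP TABLE keep_rows AS "
            "SELECT MIN(id) AS id, state_id, city "
            "FROM city_table GROUP BY state_id, city"
        )
        # 2. Empty the original table (MySQL would use TRUNCATE here).
        conn.execute("DELETE FROM city_table")
        # 3. Refill it, passing id explicitly so the original ids survive.
        conn.execute(
            "INSERT INTO city_table (id, state_id, city) "
            "SELECT id, state_id, city FROM keep_rows"
        )
    conn.execute("DROP TABLE keep_rows")


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE city_table ("
    "id INTEGER PRIMARY KEY, state_id INTEGER NOT NULL, city TEXT NOT NULL)"
)
conn.executemany("INSERT INTO city_table VALUES (?, ?, ?)", SAMPLE_ROWS)
conn.commit()

rebuild_without_duplicates(conn)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM city_table ORDER BY id")]
print(remaining_ids)
```

On a real MySQL table, TRUNCATE resets the AUTO_INCREMENT counter, which is why the ids have to be written back explicitly as in step 3.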
