Replace the value of one column with a value from another column in MySQL

Let’s say that you need to import a large set of student data together with the campus (string), faculty (string) and course information (string).

Your main table, let’s call it `students` must have ID links to the separate tables, `campuses` (ID), `faculties` (ID) and `qualifications` (ID).

Your problem: you received the data in a single flat file, where each student record is in a line – thus, non-normalized data. How do you go about this? Here is a way:

  1. First import each separate sub-table (if I may call it that), for example, `qualifications`. This will give each string entry a primary key and name (depending on the fields and setup of your table).
  2. Next, in your `students` table, set the field types of the foreign keys to the a string variant (or type matching your data; of course, these are not keys yet at this point – currently just regular columns in the table).
  3. Now import the entire flat file (including the text values for the columns that need to be referencing separate tables.
  4. Update the `students` table with the appropriate ID from its sub-table, for example, `qualifications` [Query follows below].
  5. Once the string values are replaced by IDs, simply set the field types for these IDs in the `student` table to the same type as the sub-table’s ID (typically INT).
  6. Finally set your foreign keys if you need to.

Here is the query you will need to do replace the values (as explained in Step 4 above) by matching with the string value for the sub-table in the `students` table with the `name` field in the sub-table:

UPDATE `students` s
SET s.qualification_id =
(SELECT `id` FROM `qualifications` q WHERE q.name = s.qualification_id)

Do this for each separate sub-table, and you’re done.

Select all columns except one from a MySQL table

I am not sure how often you would stumble upon this problem, however, I did recently, and after some research, found a nifty way to overcome it.

Here is the scenario:

  • You want to import a large CSV file into a database table.
  • The files have potential to be very large, therefore you do not want to get rid of the old data before you have imported the new data successfully into a temporary table.

How do you do this?

  • You import your data into a temporary table, let’s call it `tempdata`. This will take some time with large files, especially if you are processing each line before inserting.
  • Once the import is successful, you simply and copy from the `tempdata` to the `realdata` table by doing a simple query: INSERT INTO `realdata` SELECT * FROM `tempdata`. While this would work when there are no keys defined, you run into problems when you have UNIQUE or PRIMARY keys defined, because the key columns in the `tempdata` and `realdata` tables will overlap.

Oops. Now what?

You have two possible solutions:

  • Select only the columns that you want to import from `tempdata` and import into the same columns into `realdata`, or
  • You can be a little adventurous if you do not know beforehand what all the column names are (for example when you create tables on the fly).

Here is how you bring the adventure into your SQL mission:

First, you need to get the columns from the `realdata` table from the INFORMATION SCHEMA table:

SELECT `column_name` FROM information_schema.columns WHERE `table_name` = '" . $table_from . "' and column_name != 'id'

Now, this is assuming that your primary key or unique key’s name is “id”.

The result of this query will then contain all the columns in the `realdata` table except the ID column.

Now, with some PHP magic, you build up your query using these names and some string concatenation, and you can simply do this:

INSERT INTO `realdata` (**COLSTRING**) SELECT **COLSTRING** FROM `tempdata`

Voila!

Finding duplicate records in a MySQL table

So, you imported a large CSV file. You do not have a UNIQUE index on one of the fields, for example, the “student_number” field. Your client asks you to find a list of the duplicates. What do you do?

You write an SQL query, of course!

Assuming we want to de-duplicate the entire student database from the “students” table, by finding duplicates of the field “student_number”. Here is how you go about it:

First you write the INNER query:

SELECT `student_number` FROM `students` GROUP BY `student_number` HAVING COUNT(`id`) > 1

You now assign the alias “duplicate” to this query:

(SELECT `student_number` FROM `students` GROUP BY `student_number` HAVING COUNT(`id`) > 1) duplicate

Next, you write the OUTER query, and specify the match fields:

SELECT `first_name`, `last_name`, students.student_number FROM `students`
INNER JOIN __________ ON students.student_number = duplicate.student_number

Finally, you drop in the INNER query into the _____ in the OUTER query, leaving you with:

SELECT `first_name`, `last_name`, students.student_number FROM `students`
INNER JOIN (SELECT `student_number` FROM `students` GROUP BY `student_number` HAVING COUNT(`id`) > 1) duplicate ON students.student_number = duplicate.student_number

The OUTER query contain the following elements:

* The fields that you want to show when displaying the records (first_name, last_name)
* The field that you want to de-duplicate (student_number)

The INNER query does the actual magic by joining the same table with itself on the student_number field, finding all student_numbers that have more than one ID.

There you have it. Simple, right?