Let’s say that you need to import a large set of student data together with the campus (string), faculty (string) and course information (string).
Your main table, let’s call it `students` must have ID links to the separate tables, `campuses` (ID), `faculties` (ID) and `qualifications` (ID).
Your problem: you received the data in a single flat file, where each student record is in a line – thus, non-normalized data. How do you go about this? Here is a way:
- First import each separate sub-table (if I may call it that), for example, `qualifications`. This will give each string entry a primary key and name (depending on the fields and setup of your table).
- Next, in your `students` table, set the field types of the foreign keys to the a string variant (or type matching your data; of course, these are not keys yet at this point – currently just regular columns in the table).
- Now import the entire flat file (including the text values for the columns that need to be referencing separate tables.
- Update the `students` table with the appropriate ID from its sub-table, for example, `qualifications` [Query follows below].
- Once the string values are replaced by IDs, simply set the field types for these IDs in the `student` table to the same type as the sub-table’s ID (typically INT).
- Finally set your foreign keys if you need to.
Here is the query you will need to do replace the values (as explained in Step 4 above) by matching with the string value for the sub-table in the `students` table with the `name` field in the sub-table:
UPDATE `students` s
SET s.qualification_id =
(SELECT `id` FROM `qualifications` q WHERE q.name = s.qualification_id)
Do this for each separate sub-table, and you’re done.