Recently I was tasked with evaluating several possible solutions for a SQL Server search engine to search a database of students. For a proof of concept, I had a small snapshot of approximately 200,000 students. I needed to expand this data to simulate a much larger data set of 1,750,000 students for performance and load testing.
Using a Common Table Expression to retrieve the students in random order, I combined data from multiple random students.
In the example below, I was able to “mix” two random students’ first and last names and student IDs. To combine more students, simply repeat the join to studentCTE.
WITH studentCTE (FirstName, LastName, StudentId, R)
SELECT FirstName, LastName, StudentId, ROW_NUMBER() over (order by NEWID()) AS R
INSERT INTO dbo.Students (Id, FirstName, LastName, StudentId )
SELECT NewId(), FirstName, LastName, StudentId
WHEN a.R%2 = 0 THEN a.LastName
WHEN a.R%2 = 0 THEN b.FirstName
, SUBSTRING(a.StudentId, 1,3) + SUBSTRING(c.StudentId, 4,2) + SUBSTRING(b.StudentId,6,4) [StudentId]
FROM studentCTE a
inner join studentCTE b
on a.R + 1 = b.R
inner join studentCTE c
on b.R + 1 = c.R
WHERE randomStudents.StudentId NOT IN (SELECT StudentId FROM Students)
Using his method, I was able to greatly expand my available data starting from a snapshot of real student data. In the case of my client, I combined phone area codes, phone numbers (with randomized line number), email domains and address information to provide a greater set of test data.