Cutting a database down to size

Say you have a database that has been serving the company for 10 years. It is 500 GB in size and has many tables, stored procedures and triggers.

Now say you want to create a cut-down version of the database to serve as a test bed for integration testing, and as sandbox instances that individual testers and developers can deploy.

In general terms, how would you approach this task?

In case it matters, this is a SQL Server 2008 database.

Edit: removed "unit testing" because, of course, unit tests should not test database integration.

+7
6 answers

If your tables held unrelated data, you could simply select X random records from each table. I suspect the problem is that the tables are NOT unrelated, so if, say, table A includes a foreign key reference to table B, and you just pulled 10% of the records from table A and 10% of the records from table B, you would end up with a whole bunch of invalid references from A to B.

I don't know a general solution to this problem. It depends on the exact structure of your database. I often find that my databases consist of a small number of "central" tables that many other tables reference. That is, I typically find that I have, say, an Order table, and then there's an Order Line table that points to Order, a Customer table that Order points to, a Shipment table that points to Order or maybe to Order Line, etc., but everything seems to center on Order. In that case, you can randomly select some number of Order records, then find all the Customers for those orders, all the Order Lines for those orders, and so on. I usually also have a number of "code lookup" tables, like a list of all the "order status" codes, another list of all the "customer type" codes, etc. These are usually small, so I just copy them over completely.

If your database is less organized than that, i.e. if it has no clear centers and the links form a maze of interconnections, it can be much more complicated. But I think the same principle applies: pick SOME starting point, select some records, then collect all the records related to those records, and so on.
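For illustration, here is a minimal T-SQL sketch of that approach. All the names (Orders, OrderLines, Customers, OrderStatusCodes, SmallDb) are hypothetical stand-ins for your own schema:

    -- 1. Pick a random sample of the "central" records.
    SELECT TOP 1000 OrderId
    INTO #SampleOrders
    FROM dbo.Orders
    ORDER BY NEWID();

    -- 2. Copy the sampled orders plus everything that hangs off them.
    SELECT o.*
    INTO SmallDb.dbo.Orders
    FROM dbo.Orders o
    JOIN #SampleOrders s ON s.OrderId = o.OrderId;

    SELECT ol.*
    INTO SmallDb.dbo.OrderLines
    FROM dbo.OrderLines ol
    JOIN #SampleOrders s ON s.OrderId = ol.OrderId;

    SELECT c.*
    INTO SmallDb.dbo.Customers
    FROM dbo.Customers c
    WHERE c.CustomerId IN (SELECT o.CustomerId
                           FROM dbo.Orders o
                           JOIN #SampleOrders s ON s.OrderId = o.OrderId);

    -- 3. Small "code lookup" tables get copied wholesale.
    SELECT *
    INTO SmallDb.dbo.OrderStatusCodes
    FROM dbo.OrderStatusCodes;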

+3

How about shrinking the transaction log file? Make sure you back up the source database first.

USE db;
GO
-- Truncate the log by changing the database recovery model to SIMPLE.
ALTER DATABASE db SET RECOVERY SIMPLE;
GO
-- Shrink the truncated log file to 1 MB.
DBCC SHRINKFILE (db_log, 1);
GO
-- Reset the database recovery model.
ALTER DATABASE db SET RECOVERY FULL;
GO
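One caveat: db and db_log above are placeholders. The logical log file name differs per database, so you can look it up before shrinking:

    -- Find the logical name of the log file to pass to DBCC SHRINKFILE.
    USE db;
    SELECT name, type_desc, size
    FROM sys.database_files;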

I have also had great success with rebuilding and defragmenting indexes.

Tara Kizer published the script below, and it has proven very helpful for maintaining our databases. Thanks, Tara Kizer, if you're reading this!

 -- required table
 IF OBJECT_ID('DefragmentIndexes') IS NULL
 CREATE TABLE DefragmentIndexes
 (
     DatabaseName nvarchar(100) NOT NULL,
     SchemaName nvarchar(100) NOT NULL,
     TableName nvarchar(100) NOT NULL,
     IndexName nvarchar(100) NOT NULL,
     DefragmentDate datetime NOT NULL,
     PercentFragmented decimal(4, 2) NOT NULL,
     CONSTRAINT PK_DefragmentIndexes PRIMARY KEY CLUSTERED
     (
         DatabaseName, SchemaName, TableName, IndexName, DefragmentDate
     )
 )
 GO

 IF OBJECT_ID(N'[dbo].[isp_ALTER_INDEX]') IS NOT NULL
     DROP PROC [dbo].[isp_ALTER_INDEX]
 GO

 SET ANSI_NULLS ON
 GO
 SET QUOTED_IDENTIFIER ON
 GO

 -------------------------------------------------------------------------------------------
 -- OBJECT NAME  : isp_ALTER_INDEX
 --
 -- AUTHOR       : Tara Kizer
 --
 -- INPUTS       : @dbName - name of the database
 --                @statsMode - LIMITED, SAMPLED or DETAILED
 --                @defragType - REORGANIZE (INDEXDEFRAG) or REBUILD (DBREINDEX)
 --                @minFragPercent - minimum fragmentation level
 --                @maxFragPercent - maximum fragmentation level
 --                @minRowCount - minimum row count
 --                @logHistory - whether or not to log what got defragmented
 --                @sortInTempdb - whether or not to sort the index in tempdb;
 --                                recommended if your tempdb is optimized (see BOL for details)
 --
 -- OUTPUTS      : None
 --
 -- DEPENDENCIES : DefragmentIndexes, sys.dm_db_index_physical_stats, sys.objects, sys.schemas,
 --                sys.indexes, sys.partitions, sys.index_columns, INFORMATION_SCHEMA.COLUMNS
 --
 -- DESCRIPTION  : Defragments indexes
 /*
 EXEC isp_ALTER_INDEX
     @dbName = 'QHOSClient1',
     @statsMode = 'SAMPLED',
     @defragType = 'REBUILD',
     @minFragPercent = 10,
     @maxFragPercent = 100,
     @minRowCount = 1000,
     @logHistory = 1,
     @sortInTempdb = 1
 */
 /*
 http://weblogs.sqlteam.com/tarad/archive/2009/03/27/Defragmenting-Indexes-in-SQL-Server-2005Again.aspx
 Bug Fix - added SET QUOTED_IDENTIFIER ON to the script
 Feature - added logging feature

 http://weblogs.sqlteam.com/tarad/archive/2009/06/23/DefragmentingRebuilding-Indexes-in-SQL-Server-2005.aspx
 Bug Fix - initialized @lobData to 0 for each pass through the loop
 Bug Fix - checked for LOB data in included columns of non-clustered indexes
 Feature - added SORT_IN_TEMPDB option

 http://weblogs.sqlteam.com/tarad/archive/2009/08/31/DefragmentingRebuilding-Indexes-in-SQL-server-2005-and-2008.aspx
 Bug Fix - added index_level = 0 to sys.dm_db_index_physical_stats query

 http://weblogs.sqlteam.com/tarad/archive/2009/11/03/DefragmentingRebuilding-Indexes-in-SQL-Server-2005-and-2008Again.aspx
 Bug Fix - for SQL Server 2008, @indexType could be 'XML INDEX' or 'PRIMARY XML INDEX' for XML indexes
 */
 -------------------------------------------------------------------------------------------
 CREATE PROC [dbo].[isp_ALTER_INDEX]
 (
     @dbName sysname,
     @statsMode varchar(8) = 'SAMPLED',
     @defragType varchar(10) = 'REORGANIZE',
     @minFragPercent int = 25,
     @maxFragPercent int = 100,
     @minRowCount int = 0,
     @logHistory bit = 0,
     @sortInTempdb bit = 0
 )
 AS

 SET NOCOUNT ON

 IF @statsMode NOT IN ('LIMITED', 'SAMPLED', 'DETAILED')
 BEGIN
     RAISERROR('@statsMode must be LIMITED, SAMPLED or DETAILED', 16, 1)
     RETURN
 END

 IF @defragType NOT IN ('REORGANIZE', 'REBUILD')
 BEGIN
     RAISERROR('@defragType must be REORGANIZE or REBUILD', 16, 1)
     RETURN
 END

 DECLARE
     @i int, @objectId int, @objectName sysname, @indexId int, @indexName sysname,
     @schemaName sysname, @partitionNumber int, @partitionCount int,
     @sql nvarchar(4000), @edition int, @parmDef nvarchar(500),
     @allocUnitType nvarchar(60), @indexType nvarchar(60), @online bit,
     @disabled bit, @dataType nvarchar(128), @charMaxLen int,
     @allowPageLocks bit, @lobData bit, @fragPercent float

 SELECT @edition = CONVERT(int, SERVERPROPERTY('EngineEdition'))

 SELECT
     IDENTITY(int, 1, 1) AS FragIndexId,
     [object_id] AS ObjectId,
     index_id AS IndexId,
     avg_fragmentation_in_percent AS FragPercent,
     record_count AS RecordCount,
     partition_number AS PartitionNumber,
     index_type_desc AS IndexType,
     alloc_unit_type_desc AS AllocUnitType
 INTO #FragIndex
 FROM sys.dm_db_index_physical_stats (DB_ID(@dbName), NULL, NULL, NULL, @statsMode)
 WHERE avg_fragmentation_in_percent > @minFragPercent
 AND avg_fragmentation_in_percent < @maxFragPercent
 AND index_id > 0
 AND index_level = 0
 ORDER BY ObjectId

 -- LIMITED does not include data for record_count
 IF @statsMode IN ('SAMPLED', 'DETAILED')
     DELETE FROM #FragIndex WHERE RecordCount < @minRowCount

 SELECT @i = MIN(FragIndexId) FROM #FragIndex

 SELECT
     @objectId = ObjectId, @indexId = IndexId, @fragPercent = FragPercent,
     @partitionNumber = PartitionNumber, @indexType = IndexType,
     @allocUnitType = AllocUnitType
 FROM #FragIndex
 WHERE FragIndexId = @i

 WHILE @@ROWCOUNT <> 0
 BEGIN
     -- get the table and schema names for the index
     SET @sql = '
         SELECT @objectName = o.[name], @schemaName = s.[name]
         FROM ' + QUOTENAME(@dbName) + '.sys.objects o
         JOIN ' + QUOTENAME(@dbName) + '.sys.schemas s
         ON s.schema_id = o.schema_id
         WHERE o.[object_id] = @objectId'

     SET @parmDef = N'@objectId int, @objectName sysname OUTPUT, @schemaName sysname OUTPUT'

     EXEC sp_executesql @sql, @parmDef,
         @objectId = @objectId,
         @objectName = @objectName OUTPUT,
         @schemaName = @schemaName OUTPUT

     -- get index information
     SET @sql = '
         SELECT @indexName = [name], @disabled = is_disabled, @allowPageLocks = allow_page_locks
         FROM ' + QUOTENAME(@dbName) + '.sys.indexes
         WHERE [object_id] = @objectId
         AND index_id = @indexId'

     SET @parmDef = N'
         @objectId int, @indexId int, @indexName sysname OUTPUT,
         @disabled bit OUTPUT, @allowPageLocks bit OUTPUT'

     EXEC sp_executesql @sql, @parmDef,
         @objectId = @objectId,
         @indexId = @indexId,
         @indexName = @indexName OUTPUT,
         @disabled = @disabled OUTPUT,
         @allowPageLocks = @allowPageLocks OUTPUT

     SET @lobData = 0

     -- for clustered indexes, check for columns in the table that use a LOB data type
     IF @indexType = 'CLUSTERED INDEX'
     BEGIN
         -- CHARACTER_MAXIMUM_LENGTH column will equal -1 for max size or xml
         SET @sql = '
             SELECT @lobData = 1
             FROM ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
             WHERE TABLE_SCHEMA = @schemaName
             AND TABLE_NAME = @objectName
             AND (DATA_TYPE IN (''text'', ''ntext'', ''image'') OR CHARACTER_MAXIMUM_LENGTH = -1)'

         SET @parmDef = N'@schemaName sysname, @objectName sysname, @lobData bit OUTPUT'

         EXEC sp_executesql @sql, @parmDef,
             @schemaName = @schemaName,
             @objectName = @objectName,
             @lobData = @lobData OUTPUT
     END
     -- for non-clustered indexes, check for LOB data type in the included columns
     ELSE IF @indexType = 'NONCLUSTERED INDEX'
     BEGIN
         SET @sql = '
             SELECT @lobData = 1
             FROM ' + QUOTENAME(@dbName) + '.sys.indexes i
             JOIN ' + QUOTENAME(@dbName) + '.sys.index_columns ic
             ON i.object_id = ic.object_id
             JOIN ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
             ON ic.column_id = c.ORDINAL_POSITION
             WHERE c.TABLE_SCHEMA = @schemaName
             AND c.TABLE_NAME = @objectName
             AND i.name = @indexName
             AND ic.is_included_column = 1
             AND (c.DATA_TYPE IN (''text'', ''ntext'', ''image'') OR c.CHARACTER_MAXIMUM_LENGTH = -1)'

         SET @parmDef = N'@schemaName sysname, @objectName sysname, @indexName sysname, @lobData bit OUTPUT'

         EXEC sp_executesql @sql, @parmDef,
             @schemaName = @schemaName,
             @objectName = @objectName,
             @indexName = @indexName,
             @lobData = @lobData OUTPUT
     END

     -- get partition information for the index
     SET @sql = '
         SELECT @partitionCount = COUNT(*)
         FROM ' + QUOTENAME(@dbName) + '.sys.partitions
         WHERE [object_id] = @objectId
         AND index_id = @indexId'

     SET @parmDef = N'@objectId int, @indexId int, @partitionCount int OUTPUT'

     EXEC sp_executesql @sql, @parmDef,
         @objectId = @objectId,
         @indexId = @indexId,
         @partitionCount = @partitionCount OUTPUT

     -- Developer and Enterprise have the ONLINE = ON option for REBUILD.
     -- Indexes, including indexes on global temp tables, can be rebuilt online with the following exceptions:
     -- disabled indexes, XML indexes, indexes on local temp tables, partitioned indexes,
     -- clustered indexes if the underlying table contains LOB data types (text, ntext, image, varchar(max),
     -- nvarchar(max), varbinary(max) or xml), and
     -- nonclustered indexes that are defined with LOB data type columns.
     -- When reorganizing and page locks are disabled for the index, we'll switch to rebuild later on,
     -- so we need to get set up with the proper online option.
     IF @edition = 3 AND (@defragType = 'REBUILD' OR (@defragType = 'REORGANIZE' AND @allowPageLocks = 0))
     BEGIN
         SET @online =
             CASE
                 WHEN @indexType IN ('XML INDEX', 'PRIMARY XML INDEX') THEN 0
                 WHEN @indexType = 'NONCLUSTERED INDEX' AND @allocUnitType = 'LOB_DATA' THEN 0
                 WHEN @lobData = 1 THEN 0
                 WHEN @disabled = 1 THEN 0
                 WHEN @partitionCount > 1 THEN 0
                 ELSE 1
             END
     END
     ELSE
         SET @online = 0

     -- build the ALTER INDEX statement
     SET @sql =
         'ALTER INDEX ' + QUOTENAME(@indexName) +
         ' ON ' + QUOTENAME(@dbName) + '.' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@objectName) +
         CASE
             WHEN @defragType = 'REORGANIZE' AND @allowPageLocks = 0 THEN ' REBUILD'
             ELSE ' ' + @defragType
         END

     -- WITH options
     IF @online = 1 OR @sortInTempdb = 1
     BEGIN
         SET @sql = @sql + ' WITH (' +
             CASE
                 WHEN @online = 1 AND @sortInTempdb = 1 THEN 'ONLINE = ON, SORT_IN_TEMPDB = ON'
                 WHEN @online = 1 AND @sortInTempdb = 0 THEN 'ONLINE = ON'
                 WHEN @online = 0 AND @sortInTempdb = 1 THEN 'SORT_IN_TEMPDB = ON'
             END + ')'
     END

     IF @partitionCount > 1 AND @disabled = 0 AND @indexType <> 'XML INDEX'
         SET @sql = @sql + ' PARTITION = ' + CAST(@partitionNumber AS varchar(10))

     -- run the ALTER INDEX statement
     EXEC (@sql)

     -- log some information into a history table
     IF @logHistory = 1
         INSERT INTO DefragmentIndexes
             (DatabaseName, SchemaName, TableName, IndexName, DefragmentDate, PercentFragmented)
         VALUES
             (@dbName, @schemaName, @objectName, @indexName, GETDATE(), @fragPercent)

     SELECT @i = MIN(FragIndexId) FROM #FragIndex WHERE FragIndexId > @i

     SELECT
         @objectId = ObjectId, @indexId = IndexId, @fragPercent = FragPercent,
         @partitionNumber = PartitionNumber, @indexType = IndexType,
         @allocUnitType = AllocUnitType
     FROM #FragIndex
     WHERE FragIndexId = @i
 END
 GO

The original post is here:

http://weblogs.sqlteam.com/tarad/archive/2009/08/31/DefragmentingRebuilding-Indexes-in-SQL-server-2005-and-2008.aspx

Besides rebuilding and defragmenting indexes, you can simply delete or archive old data. If you have int/bigint identity columns as PKs, this also lets you reseed them with DBCC CHECKIDENT (tablename, RESEED, value).
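For example (the table name is hypothetical), after purging rows you can reseed so that the next insert gets identity 1001:

    -- The next row inserted into dbo.Orders will get identity value 1001.
    DBCC CHECKIDENT ('dbo.Orders', RESEED, 1000);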

You can use ALTER INDEX ALL ON MyTable REBUILD to rebuild all of the indexes on a table.
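If you want to do that for every table at once, the undocumented (but widely used) sp_MSforeachtable procedure is a common shortcut:

    -- "?" is replaced with each table name in turn.
    EXEC sp_MSforeachtable 'ALTER INDEX ALL ON ? REBUILD';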

+3

I would script out the database, including tables, indexes, triggers and stored procedures, then create a new empty database from that script. You can then add data to the database as needed for your integration tests.
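A minimal sketch of that flow (the database and script file names are hypothetical; the schema script itself would come from SSMS's Generate Scripts task):

    -- Create the empty shell database.
    CREATE DATABASE IntegrationTestDb;
    GO
    -- Then apply the generated schema script, e.g. from a command prompt:
    --   sqlcmd -S localhost -d IntegrationTestDb -i schema.sql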

You can use tools like http://code.google.com/p/ndbunit/ to load test data, so that the data is part of the test and is deleted once the test ends. I would also run the tests against SQL Express on each developer's local machine, so the tests don't fail when several developers are working at the same time.

+2

Check out http://jailer.sourceforge.net/. It is a tool that can extract a subset of data from a database while keeping it referentially consistent. I have not used it myself, but I have been meaning to.

+2

Under no circumstances would I let developers develop against a smaller database when the code has to run against one of this size. You will hit problems that only show up in production, and that is a foolish position to be in. Queries that perform well on small data sets are not necessarily the queries that perform well on large data sets. The time wasted rewriting queries that can't run against production volumes is one reason it's unwise to let developers work against a small data set.

+1

In my opinion, subsets of "real data" should not be used for unit tests. Unit tests should be independent of the initial contents of the database. They should create the data needed for a particular test, perform the test, and then delete the data. Alternatively, the entire test should run inside a transaction that is rolled back at the end.

If you don't, your tests will break when someone deletes or modifies the data they depend on, and you will waste a huge amount of time figuring out why your tests suddenly failed.
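A minimal sketch of the transaction-rollback pattern (the Customers table and its columns are hypothetical):

    BEGIN TRANSACTION;

    -- Arrange: create exactly the data this test needs.
    INSERT INTO dbo.Customers (CustomerId, Name)
    VALUES (9999, 'Test customer');

    -- Act / Assert: run the code under test and check the results here.

    -- Clean up: the rollback leaves the database exactly as it was.
    ROLLBACK TRANSACTION;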

For a QA or integration system, you should be able to create a subset of your data based on your knowledge of the table relationships.

+1
