Wednesday, March 7, 2012

routine to archive records

Howdy,
I'm working on a routine to archive records. It is
the basic copy a massive amount of records from table
a to table b and then delete the records copied from
table a. I plan to do this using a trigger for delete
on table a. Neither table has a clustered index tho
they both have some non clustered indexes.
Please look at the outline and let me know if this is
a good plan. Is there anything that I might do to
improve? I know that the trigger will affect the
delete query performance ... but I'm not sure how best
to make sure that the deleted records are copied to
the archive database.
First I'm going to create a following trigger on
the source table Active_DB.dbo.Important_Records
CREATE TRIGGER ArchiveRecord ON [dbo].[Important_Records]
FOR DELETE
AS
INSERT INTO Archive_DB.dbo.Important_Records
(...columns...)
SELECT ...columns...
FROM deleted
The actual delete routine will be something like:
USE Archive_DB
ALTER DATABASE Archive_DB
SET RECOVERY SIMPLE
--DISABLE TRANSACTION LOG BACKUP JOB
USE Active_DB
ALTER DATABASE Active_DB
SET RECOVERY SIMPLE
--DISABLE TRANSACTION LOG BACKUP JOB
DECLARE @.DayCount int
SET @.DayCount = 576 --(for example)
DECLARE @.RowsDeleted int
SET @.RowsDeleted = -1
SET ROWCOUNT 1000
WHILE @.RowsDeleted <> 0
BEGIN
DELETE
FROM dbo.Important_Records
WHERE EXISTS (SELECT OCCURRED WITH (NOLOCK) FROM
dbo.Important_Records WHERE < (getdate() - @.DayCount))
SET @.RowsDeleted = @.@.ROWCOUNT
END
END
SET ROWCOUNT 0
--sp_updatestats @.resample = 'resample'
CHECKPOINT
--ENABLE TRANSACTION LOG BACKUP JOB
ALTER DATABASE Active_DB
SET RECOVERY BULK_LOGGED
USE Archive_DB
--sp_updatestats @.resample = 'resample'
CHECKPOINT
--ENABLE TRANSACTION LOG BACKUP JOB
ALTER DATABASE Archive_DB
SET RECOVERY BULK_LOGGED
--Full Backup of Active_DB
--Full Backup of Archive_DBA faster way would be:
1. bcp the data out to a flat file
2. delete the data in batches (10000 rows each would be fine)
3. bulk insert the data in in batches
-oj
<NTuser_Man@.msn.com> wrote in message
news:1125450831.524829.115380@.z14g2000cwz.googlegroups.com...
> Howdy,
> I'm working on a routine to archive records. It is
> the basic copy a massive amount of records from table
> a to table b and then delete the records copied from
> table a. I plan to do this using a trigger for delete
> on table a. Neither table has a clustered index tho
> they both have some non clustered indexes.
> Please look at the outline and let me know if this is
> a good plan. Is there anything that I might do to
> improve? I know that the trigger will affect the
> delete query performance ... but I'm not sure how best
> to make sure that the deleted records are copied to
> the archive database.
> First I'm going to create a following trigger on
> the source table Active_DB.dbo.Important_Records
> CREATE TRIGGER ArchiveRecord ON [dbo].[Important_Records]
> FOR DELETE
> AS
> INSERT INTO Archive_DB.dbo.Important_Records
> (...columns...)
> SELECT ...columns...
> FROM deleted
> The actual delete routine will be something like:
> USE Archive_DB
> ALTER DATABASE Archive_DB
> SET RECOVERY SIMPLE
> --DISABLE TRANSACTION LOG BACKUP JOB
> USE Active_DB
> ALTER DATABASE Active_DB
> SET RECOVERY SIMPLE
> --DISABLE TRANSACTION LOG BACKUP JOB
> DECLARE @.DayCount int
> SET @.DayCount = 576 --(for example)
> DECLARE @.RowsDeleted int
> SET @.RowsDeleted = -1
> SET ROWCOUNT 1000
> WHILE @.RowsDeleted <> 0
> BEGIN
> DELETE
> FROM dbo.Important_Records
> WHERE EXISTS (SELECT OCCURRED WITH (NOLOCK) FROM
> dbo.Important_Records WHERE < (getdate() - @.DayCount))
> SET @.RowsDeleted = @.@.ROWCOUNT
> END
> END
> SET ROWCOUNT 0
> --sp_updatestats @.resample = 'resample'
> CHECKPOINT
> --ENABLE TRANSACTION LOG BACKUP JOB
> ALTER DATABASE Active_DB
> SET RECOVERY BULK_LOGGED
> USE Archive_DB
> --sp_updatestats @.resample = 'resample'
> CHECKPOINT
> --ENABLE TRANSACTION LOG BACKUP JOB
> ALTER DATABASE Archive_DB
> SET RECOVERY BULK_LOGGED
> --Full Backup of Active_DB
> --Full Backup of Archive_DB
>|||oj, thanks for the advice.
I haven't used bcp. This morning I experimented with the northwind db
to get a feel for bcp. I would like to know how can I ensure that the
records I bcp from the dbo.Important_Records are the very same records
that I delete from dbo.Important_Records? That is the concern that
leads me to want to use a trigger for delete.
oj wrote:
> A faster way would be:
> 1. bcp the data out to a flat file
> 2. delete the data in batches (10000 rows each would be fine)
> 3. bulk insert the data in in batches
> --
> -oj
>
> <NTuser_Man@.msn.com> wrote in message
> news:1125450831.524829.115380@.z14g2000cwz.googlegroups.com...|||bcp can take a query. That would be the way to extract only the desired
rows.
e.g.
bcp "select * from Northwind..Orders where OrderID<10250" queryout
"c:\Orders.txt" -T -w
-oj
<NTuser_Man@.msn.com> wrote in message
news:1125500222.595819.253710@.z14g2000cwz.googlegroups.com...
> oj, thanks for the advice.
> I haven't used bcp. This morning I experimented with the northwind db
> to get a feel for bcp. I would like to know how can I ensure that the
> records I bcp from the dbo.Important_Records are the very same records
> that I delete from dbo.Important_Records? That is the concern that
> leads me to want to use a trigger for delete.
> oj wrote:
>|||Thanks again for the input.
I have the bcp part working ... it copies data to text files and then
uploads to the destination beautifully.
But I'm having a performance issue on the delete query ( appended below
). The select statement runs very quickly and uses a clustered index
s. But when the delete action is held up by sorts on three non
clustered indexes. The sorts account for 76% of the query cost.
I'm wondering if it would make sense to drop the non clustered indexes
before running the delete. The table itself has 28 million records so
dropping and rebuilding the non clustered indexes might be quite a
chore in itself. My other thought was to modify the non clustered
indexes to include the clustered index ...
Query:
CREATE PROCEDURE dbo.DeleteImportant_Record
@.DayCount int
AS
SET NOCOUNT ON
SET ROWCOUNT 1000
WHILE 1=1
BEGIN
DELETE
FROM dbo.Important_Record
WHERE EXISTS (SELECT OCCURRED FROM dbo.Important_Record WITH
(NOLOCK) WHERE RecordDate < (getdate() - @.DayCount))
IF @.@.ROWCOUNT = 0
BREAK
END
SET ROWCOUNT 0
SET NOCOUNT OFF
GO
oj wrote:
> bcp can take a query. That would be the way to extract only the desired
> rows.
> e.g.
> bcp "select * from Northwind..Orders where OrderID<10250" queryout
> "c:\Orders.txt" -T -w|||There is really no need for the subquery. Try:
declare @.dt datetime
set @.dt=(getdate() - @.DayCount)
WHILE 1=1
BEGIN
DELETE
FROM dbo.Important_Record
WHERE RecordDate < @.dt
IF @.@.ROWCOUNT = 0
BREAK
END
-oj
<NTuser_Man@.msn.com> wrote in message
news:1125524729.117694.268840@.g49g2000cwa.googlegroups.com...
> Thanks again for the input.
> I have the bcp part working ... it copies data to text files and then
> uploads to the destination beautifully.
> But I'm having a performance issue on the delete query ( appended below
> ). The select statement runs very quickly and uses a clustered index
> s. But when the delete action is held up by sorts on three non
> clustered indexes. The sorts account for 76% of the query cost.
> I'm wondering if it would make sense to drop the non clustered indexes
> before running the delete. The table itself has 28 million records so
> dropping and rebuilding the non clustered indexes might be quite a
> chore in itself. My other thought was to modify the non clustered
> indexes to include the clustered index ...
>
> Query:
> CREATE PROCEDURE dbo.DeleteImportant_Record
> @.DayCount int
> AS
> SET NOCOUNT ON
> SET ROWCOUNT 1000
> WHILE 1=1
> BEGIN
> DELETE
> FROM dbo.Important_Record
> WHERE EXISTS (SELECT OCCURRED FROM dbo.Important_Record WITH
> (NOLOCK) WHERE RecordDate < (getdate() - @.DayCount))
> IF @.@.ROWCOUNT = 0
> BREAK
> END
> SET ROWCOUNT 0
> SET NOCOUNT OFF
>
> GO
>
> oj wrote:
>|||Hey, thanks for all the help. Now the query deletes over 10,000
records/minute. Now I have only 27188456 recods to go.
oj wrote:
> There is really no need for the subquery. Try:
> declare @.dt datetime
> set @.dt=(getdate() - @.DayCount)
> WHILE 1=1
> BEGIN
> DELETE
> FROM dbo.Important_Record
> WHERE RecordDate < @.dt
> IF @.@.ROWCOUNT = 0
> BREAK
> END
>
>
> --
> -oj
>
> <NTuser_Man@.msn.com> wrote in message
> news:1125524729.117694.268840@.g49g2000cwa.googlegroups.com...

No comments:

Post a Comment