CHECKSUM to determine record changes

I've searched the forum for uses of CHECKSUM and havent found a
satisfactory answer.
I need to know when a row changes and I dont care what it was or is, I
just need to know it changed.
Detail:
Hourly I select rows where the record as a create date or change date
of the last 3 days. I am only interested when the address or name
changes, not other columns.
Currently, I might select 30,000 rows that are new or changed, but
turns out that only 100 have address changes and 500 are new. I want
to process the 600, not the 30,000.
My main concern is if some columns change, but the resulting checksum
doesnt. Then I would have missed processing that record.
The column types that I will be tracking are:
"forenames" VARCHAR(50) NULL,
"surname" VARCHAR(51) NULL,
"ADDRESS1" VARCHAR(60) NULL,
"ADDRESS2" VARCHAR(60) NULL,
"ADDRESS4" VARCHAR(50) NULL,
"STATE" VARCHAR(10) NULL,
"ZIP" VARCHAR(20) NULL,
"email" VARCHAR(50) NULL,
"telephone" VARCHAR(10) NULL
(I left off the keys, these are the only fields that I will be using
for the checksum()).
Am I safe? I read about 32 bit CRC and some data changes would go
unnoticed, but not sure if this layout would qualifiy.

TIA
Rob
rcamarda [ Di, 22 April 2008 00:22 ] [ ID #1946804 ]

Re: CHECKSUM to determine record changes

Im using SQL Server 2005 currently patched.
rcamarda [ Di, 22 April 2008 00:33 ] [ ID #1946806 ]

Re: CHECKSUM to determine record changes

"rcamarda" <robert.a.camarda [at] gmail.com> wrote in message
news:2d88cd48-2853-4f01-b797-c5c166bd167d [at] k13g2000hse.google groups.com...
> Im using SQL Server 2005 currently patched.

CHECKSUM isn't a reliable way to detect change because it's quite common to
find different rows with the same CHECKSUM value. You could use a ROWVERSION
column instead. ROWVERSION is guaranteed to increment when the row data is
updated.

Another alternative is to use a hash. The HashBytes function will return a
secure hash of a binary value with a very high probability of uniqueness.
Duplicate hashes are theoretically possible but are incredibly unlikely to
occur unintentionally. If you are extremely paranoid then you can use two
different hashes.

--
David Portas
David Portas [ Di, 22 April 2008 08:25 ] [ ID #1947602 ]

Re: CHECKSUM to determine record changes

David Portas wrote:
>
> "rcamarda" <robert.a.camarda [at] gmail.com> wrote in message
> news:2d88cd48-2853-4f01-b797-c5c166bd167d [at] k13g2000hse.google groups.com...
> > Im using SQL Server 2005 currently patched.
>
> CHECKSUM isn't a reliable way to detect change because it's quite common to
> find different rows with the same CHECKSUM value. You could use a ROWVERSION
> column instead. ROWVERSION is guaranteed to increment when the row data is
> updated.
>
> Another alternative is to use a hash. The HashBytes function will return a
> secure hash of a binary value with a very high probability of uniqueness.
> Duplicate hashes are theoretically possible but are incredibly unlikely to
> occur unintentionally. If you are extremely paranoid then you can use two
> different hashes.
>
> --
> David Portas

I agree about the advice for use ROWVERSION.

However, CHECKSUM is also just a hash. Whether the chance of an
unnotices change is lower if you use HashBytes function instead of
CHECKSUM depends on your data.

Although adding a second hash value will lower the chance of a missed
change, it would be silly to do so. In the end you would need as many
hash bytes as there are bytes in the data.

--
Gert-Jan
Gert-Jan Strik [ Mi, 23 April 2008 18:42 ] [ ID #1948521 ]
Datenbanken » comp.databases.ms-sqlserver » CHECKSUM to determine record changes

Vorheriges Thema: How to find records with length greater than 17
Nächstes Thema: using LEN to find a first word greater than 17