Tuesday, March 20, 2012

adding 140,000 records requires index rebuild. make sense?

hello all,
i have a table thats getting about 140,000 new records a day from a program
reading data from a socket (about 300 every 3 minutes). in addition to the
primary key it has an 8 byte date/time field thats indexed. there are also
two full-text indexed columns with change tracking and update in background
enabled.
as we started to load the database we noticed after some hours 'insert'
operations began to fail with a timeout and queries began timing out.
When i rebuild the index based on date/time it cleared up and queries were
fast again. This cycle has been going on for a couple of ws. I haven't
had to do anything to the full-text portion, these problem orbit around the
date/time index.
Does this make sense? It really sounds like the indexes have exceeded some
internal resource and are being added to in an inefficient manner causing
the timeout.
is there a body of knowledge about defining indexes for tables that grow
like this that i should be aware of?
thanks,
john mottJohn,
I have not heard of any issues like this. Maybe a blocking issue?
That said...you might consider decreasing the fillfactor for the index (note
will require more space) to see if that helps. Also, you might consider
scheduling a rebuild of the index as a nightly job (off peak hours or after
hours).
HTH
Jerry
"John Mott" <johnmott59@.hotmail.com> wrote in message
news:OzWjq1QxFHA.2652@.TK2MSFTNGP14.phx.gbl...
> hello all,
> i have a table thats getting about 140,000 new records a day from a
> program
> reading data from a socket (about 300 every 3 minutes). in addition to the
> primary key it has an 8 byte date/time field thats indexed. there are also
> two full-text indexed columns with change tracking and update in
> background
> enabled.
> as we started to load the database we noticed after some hours 'insert'
> operations began to fail with a timeout and queries began timing out.
> When i rebuild the index based on date/time it cleared up and queries were
> fast again. This cycle has been going on for a couple of ws. I haven't
> had to do anything to the full-text portion, these problem orbit around
> the
> date/time index.
> Does this make sense? It really sounds like the indexes have exceeded some
> internal resource and are being added to in an inefficient manner causing
> the timeout.
> is there a body of knowledge about defining indexes for tables that grow
> like this that i should be aware of?
> thanks,
> john mott
>
>|||Is the table more empty than full? Perhaps the statistics are out of whack.
140,000 rows is a considerable number, but it is not a tremendous amount.
Then again, that statement is relative to your environment. If your table
only has 300,000 rows you are adding 50% to the table.
How large is your database? How much room for growth do you have? SQL
Server might be expanding (growing) the database size during the mass
insert. This could cause reduced performance which would lead to query
timeouts.
How are you adding the rows? Are you performing 140,000 INSERT INTO
statements? Can you look into using BCP/Bulk Insert/DTS to get the data in
to the database? You might find that this option is faster.
Keith
"John Mott" <johnmott59@.hotmail.com> wrote in message
news:OzWjq1QxFHA.2652@.TK2MSFTNGP14.phx.gbl...
> hello all,
> i have a table thats getting about 140,000 new records a day from a
> program
> reading data from a socket (about 300 every 3 minutes). in addition to the
> primary key it has an 8 byte date/time field thats indexed. there are also
> two full-text indexed columns with change tracking and update in
> background
> enabled.
> as we started to load the database we noticed after some hours 'insert'
> operations began to fail with a timeout and queries began timing out.
> When i rebuild the index based on date/time it cleared up and queries were
> fast again. This cycle has been going on for a couple of ws. I haven't
> had to do anything to the full-text portion, these problem orbit around
> the
> date/time index.
> Does this make sense? It really sounds like the indexes have exceeded some
> internal resource and are being added to in an inefficient manner causing
> the timeout.
> is there a body of knowledge about defining indexes for tables that grow
> like this that i should be aware of?
> thanks,
> john mott
>
>|||Hey John,
Is the date-time index a clustered index? If not, do you have one, and
if so, what datatype is the index associated with? Tables of this size
really ned a clustered index to help in the retrieval process; a
clustered index should be built on a monotonically increasing value
(like a datetime, assuming that the INSERT is inserting data in a
relatively sequential manner). A common mistake is to use a
uniqueidentifier for a primary key, and using that as a clustered
index, which will lead to fragmentation (because the data is being
inserted out of order).
Sorry; not feeling well today, so my answers may be less than cogent.
However, your first step is to determine if your clustered index is on
thewrong column.
HTH,
Stu|||There is a good body of knowledge on www.msdn.com
Here is what a few minutes of searching churned up:
Try to discover the root cause of why your queries are timing out:
INF: Understanding and Resolving SQL Server 7.0 or 2000 Blocking Problems
http://support.microsoft.com/defaul...kb;EN-US;224453
Planning and Creating Indexes
http://www.microsoft.com/technet/pr...s/c0618260.mspx
DBCC SHOWCONTIG will reveal if index fragmentation is an issue and DBCC
INDEXDEFRAG can be periodically issued to help minimize it. Defragmenting is
faster than re-indexing, and based on your description, this application
sounds like it is running in an environment where you don't want the system
down for extended periods of time.
Microsoft SQL Server 2000 Index Defragmentation Best Practices
http://www.microsoft.com/technet/pr...n/ss2kidbp.mspx
Rather than performing 300 inserts per minute, you can initally write the
records to a tab delimited text file and then bulk insert perhaps every 5
minutes
SQL Server 2000 Incremental Bulk Load Case Study
http://www.microsoft.com/technet/pr...calability.mspx
"John Mott" <johnmott59@.hotmail.com> wrote in message
news:OzWjq1QxFHA.2652@.TK2MSFTNGP14.phx.gbl...
> hello all,
> i have a table thats getting about 140,000 new records a day from a
> program
> reading data from a socket (about 300 every 3 minutes). in addition to the
> primary key it has an 8 byte date/time field thats indexed. there are also
> two full-text indexed columns with change tracking and update in
> background
> enabled.
> as we started to load the database we noticed after some hours 'insert'
> operations began to fail with a timeout and queries began timing out.
> When i rebuild the index based on date/time it cleared up and queries were
> fast again. This cycle has been going on for a couple of ws. I haven't
> had to do anything to the full-text portion, these problem orbit around
> the
> date/time index.
> Does this make sense? It really sounds like the indexes have exceeded some
> internal resource and are being added to in an inefficient manner causing
> the timeout.
> is there a body of knowledge about defining indexes for tables that grow
> like this that i should be aware of?
> thanks,
> john mott
>
>|||Thank you for your response.
"Keith Kratochvil" <sqlguy.back2u@.comcast.net> wrote in message
news:eit6m%23QxFHA.3740@.tk2msftngp13.phx.gbl...
> Is the table more empty than full? Perhaps the statistics are out of
whack.
> 140,000 rows is a considerable number, but it is not a tremendous amount.
> Then again, that statement is relative to your environment. If your table
> only has 300,000 rows you are adding 50% to the table.
At this point there about 2,300,000 records.

> How large is your database? How much room for growth do you have? SQL
> Server might be expanding (growing) the database size during the mass
> insert. This could cause reduced performance which would lead to query
> timeouts.
Currently its about 23 Meg, but its fair to say that when i created it it
created with a default size with instructions to grow at 10% at a time.
Should i re-allocate this? would that lead to bad organization?

> How are you adding the rows? Are you performing 140,000 INSERT INTO
> statements? Can you look into using BCP/Bulk Insert/DTS to get the data
in
> to the database? You might find that this option is faster.
i am indeed doing a set of inserts, but the speed of insertion is
satisfactory. I'd heard about BCP before; would that have a different impact
on how the indexes were managed?

> --
> Keith
>|||Thank you for responding.
"Stu" <stuart.ainsworth@.gmail.com> wrote in message
news:1128010124.506187.10780@.g49g2000cwa.googlegroups.com...
> Hey John,
> Is the date-time index a clustered index? If not, do you have one, and
> if so, what datatype is the index associated with? Tables of this size
> really ned a clustered index to help in the retrieval process; a
> clustered index should be built on a monotonically increasing value
> (like a datetime, assuming that the INSERT is inserting data in a
> relatively sequential manner). A common mistake is to use a
> uniqueidentifier for a primary key, and using that as a clustered
> index, which will lead to fragmentation (because the data is being
> inserted out of order).
The date/time is not clustered, and i did exactly what you indicated was a
common mistake, created a generic primary key that is clustered. Since the
date/time is really how the data is retrieved (always searching last 'n'
units of time) it should be 'appending' the data as it add it. Of course, in
this case newer data goes 'at the end' so its not dissimilar to having it be
that way in a large sense; there is a correlation between the order as
defined by the normal primary key and the order defined by the date/time. I
can see, however, that i have essentially a wasted key since i really only
care about the date/time, i've used two keys where one would have done.|||Thank you for responding. These look like the kind of meaty stuff i should
have read before hand :-) I've always been able to treat the SQL Server as a
black box but this is the largest database app i've created and i'm flying
solo on the tuning and configuration front...
john
"JT" <someone@.microsoft.com> wrote in message
news:u7y2ImRxFHA.1028@.TK2MSFTNGP12.phx.gbl...
> There is a good body of knowledge on www.msdn.com
> Here is what a few minutes of searching churned up:
> Try to discover the root cause of why your queries are timing out:
> INF: Understanding and Resolving SQL Server 7.0 or 2000 Blocking Problems
> http://support.microsoft.com/defaul...kb;EN-US;224453
> Planning and Creating Indexes
> http://www.microsoft.com/technet/pr...s/c0618260.mspx
> DBCC SHOWCONTIG will reveal if index fragmentation is an issue and DBCC
> INDEXDEFRAG can be periodically issued to help minimize it. Defragmenting
is
> faster than re-indexing, and based on your description, this application
> sounds like it is running in an environment where you don't want the
system
> down for extended periods of time.
> Microsoft SQL Server 2000 Index Defragmentation Best Practices
>
http://www.microsoft.com/technet/pr...n/ss2kidbp.mspx
> Rather than performing 300 inserts per minute, you can initally write the
> records to a tab delimited text file and then bulk insert perhaps every 5
> minutes
> SQL Server 2000 Incremental Bulk Load Case Study
>
http://www.microsoft.com/technet/pr...ncbulkload.mspxed">
> With 140k inserts per day, I don't know what the total row count is, but
> consider vertical paritioning of rows into multiple tables and perhaps
> partitioned views.
> Scalability and Very Large Database (VLDB) Resources
> http://www.microsoft.com/sql/techin...calability.mspx
>
> "John Mott" <johnmott59@.hotmail.com> wrote in message
> news:OzWjq1QxFHA.2652@.TK2MSFTNGP14.phx.gbl...
the
also
were
haven't
some
causing
>|||Sorry John, let me try to clarify (hang on, it's gonna be rough):
First, a key is not an index; a key is used to identify and enforce
relational validity between tables. The primary key uniquely
identifies a row, and can be used as a foreign key to enforce a
relationship with another table. When you create a primary key on a
table, SQL Server will create an index associated with that key; by
default, that index is clustered (if there is not already a clustered
index on the table).
So what's an index? An index is simply a collection of pointers to the
location of a row within a table. Indexes are used to improve
retreival time, because the SQL Server optimizer should use the index
to quickly identify rows that meet the query requirements. Indexes may
be clustered or nonclustered; only one clustered index can exist on a
table. The cluster status refers to the physical ordering of data
within a page; a simple way of understanding it is that a clustered
index sorts the data by the indexed value and physically remembers that
sort (not really, but the analogy is close).
On a large table with a lot of inserts, you should always use a
clustered index on a monotonically increasing value (like an IDENTITY
integer or a date/time value). This will greatly reduce performance
problems because as data comes in, it always gets appended to the end
of the table, rather than having to be inserted somewhere in the
middle. For example, if you have a clustered index on a LastName
column, and you already have some data like so:
LastName
Ainsworth
Smith
Thomas
West
and you want to insert a row with the name Bice in it, SQL Server has
to split the page holding the data to insert that one record.
Continued page splits can lead to fragmentation. If your clustered
index is on a increasing value (like a datetime variable), fewer page
splits occur.
So, using your scenario, I would create a PRIMARY KEY constraint on the
unique row identifier, and created a clustered index on the datetime
column. Just to be clear, you can have as many nonclustered indexes on
a table as you need, but you can only have one clustered index.
Hope that clarifies.
Stu

No comments:

Post a Comment