Tuesday, March 20, 2012

Addicted to Stored Procedures: A time series problem. Is it possible?

It's been about two weeks since I first began using SQL Stored procedures and now I am thinking already about changing many of my plans and doing those operations in them. There is a problem I want to address and see if it is possible to accomplish it in SQL Server instead of the client C# code.
I have a time series. In simplified presentation it is a table of two columns. The first one is float, the second is DateTime. There might be quite a few rows in the table:

A1 ... DT1
A2 ... DT2
................
An.... DTn
What I need to do is to find patters among A1,A2,....An elements. They vary in magnitude in a rather random order. They constitute what some would call a Markov chain. I demonstrate what I want in a couple of real life examples that I would not have a problem to run in my C# code at all.

(1) element Ak may be only 90% of the element in the previous row Ak-1. I want to catch it. I want to get the DateTime stamp when it happened. The next step is: I want to go down the chain of rows and find the element Am which rebounds to the magnitude of the element Ak. In other words Am==Ak. I want to remember its DateTime stamp.Then I want to find all elements in the chain of rows that follow the same pattern.

How can I do it in SQL language? Is it possible? I've been thinking about creating a bunch of temporary tables with rows shifted up by one and and destroying them after a pattern has been found in the same rows of many tables.

(2) I want to collect a distribution of many such patterns and analyze then in C# code but the elementary block is the operation I described in point (1). If I could get a cue as to how to go about it I could figure out the rest, I hope.

I learn by examples, by looking at code samples. So far I haven't seen anything that fits this task. If anyone could send me in such direction it would be great. Most of the things I've seen is "SELECT FROM CUSTOMERS," etc.

A somewhat related question. Years ago I stumbled in MSN help on a set of Math routines (they could have been part of a class) that were designed specifically for analyzing patters. It is not the Math class, I have looked into it. It is something different. I cannot find them now. I did not save a bookmark back then.

Does anyone know what I am talking about: math functions for time series analysis?

Thanks.

Time Series Analysis is actually a part of Data Mining for SQL Server, I just found out, but since I have an SQLEXPRESS I probably cannot use it. Right? I have to get the full server for that?|||

OK, I've been trying to write a stored procedure with some SQL statements that conceivably could handle the task at hand.

This is what I've come up with. It is assumed that there is a table tick_M, it contains two columns: bid and dateTimed, I create a temporary table tempTable1 with three columns. In addition to bid and dateTimed there will be a RowNumber column. Thus we end up with 3 columns. Then I want to create yet another table from temporary tempTable1 which will consist of all the rows of the first table starting from row # 2. Then I want to make a LEFT INNER JOIN ON rownumber1 = rownumber2.

This is the code.

WITH tempTable1 AS
( SELECT ROW_NUMBER() OVER (ORDER BY dateTimed) AS 'RowNumber1',
bid OVER AS 'bid1',
dateTimed OVER AS 'dateTimed1'
FROM tick_M )
GO
WITH tempTable2 AS
( SELECT ROW_NUMBER() OVER (ORDER BY dateTimed1) AS 'RowNumber2',
bid1 OVER AS bid2,
dateTimed1 OVER AS dateTimed2
FROM tempTable1 WHERE RowNumber1 > 1)
GO
SELECT tempTable1.bid1, tempTable1.dateTimed1, tempTable1.RowNumber1,
tempTable2.bid2, tempTable2.dateTimed2
FROM tempTable1 AS Result
LEFT OUTER JOIN tempTable2 ON tempTable1.RowNumber1 =
tempTable2.RowNumber2
GO

The trouble is I get compilation errors. Could anyone help me here?

Thanks.

|||

Alex,

I neither understood nor thoroughly checked your code however at first glance I see some syntax errors. Remove "go" spread in your SQL. CTEs do not have a go after them. ie:

with tempTable1 as ( ... ),
tempTable2 as ( ... ) ...

|||

CetinBasoz wrote:

Alex,

I neither understood nor thoroughly checked your code however at first glance I see some syntax errors. Remove "go" spread in your SQL. CTEs do not have a go after them. ie:

with tempTable1 as ( ... ),
tempTable2 as ( ... ) ...

Cetin hi,

It gives me an error even before a GO. It says "Incorrect syntax near OVER" pointing to the first OVER. I removed the GOs.

Thanks for your help.

|||

Yes I'd error too:) Why do you have multiple over there (you might - but then you should also have an partiotion and/or order by clause to use with over).

Anyway Alex,

I think you should rephrase your original question with sampling (and prefarably with ready to generate code for sample data). I didn't understand it and probably no one did.

Just cleared code a bit w/o checking extensively.
WITH tempTable1 AS

( SELECT ROW_NUMBER() OVER (ORDER BY dateTimed) AS 'RowNumber1',

bid AS 'bid1',

dateTimed AS 'dateTimed1'

FROM tick_M )

WITH tempTable2 AS

( SELECT ROW_NUMBER() OVER (ORDER BY dateTimed1) AS 'RowNumber2',

bid1 AS bid2,

dateTimed1 AS dateTimed2

FROM tempTable1 WHERE RowNumber1 > 1)

SELECT tempTable1.bid1, tempTable1.dateTimed1, tempTable1.RowNumber1,

tempTable2.bid2, tempTable2.dateTimed2

FROM tempTable1 AS Result

LEFT OUTER JOIN tempTable2 ON tempTable1.RowNumber1 =

tempTable2.RowNumber2

|||

CetinBasoz wrote:

Yes I'd error too:) Why do you have multiple over there (you might - but then you should also have an partiotion and/or order by clause to use with over).

Anyway Alex,

I think you should rephrase your original question with sampling (and prefarably with ready to generate code for sample data). I didn't understand it and probably no one did.

Cetin hi,

Thank you for the code. Unfortnately it did not comply. I reworked the code entirely and came up with something that did:

USE hist_Tick
GO
DROP TABLE #tempTable_1
GO
SELECT ROW_NUMBER() OVER (ORDER BY dateTimed) AS 'RowNumber1',
bid AS 'bid1',
dateTimed AS 'dateTimed1'
INTO #tempTable_1
FROM dbo.tick_MSFT
GO
DROP TABLE #tempTable_2
SELECT ROW_NUMBER() OVER (ORDER BY dateTimed1) AS 'RowNumber2',
bid1 AS 'bid2',
dateTimed1 AS 'dateTimed2'
INTO #tempTable_2
FROM #tempTable_1
WHERE RowNumber1 > 1
GO
DROP TABLE #tempTable_3
GO
SELECT DISTINCT bid1, dateTimed1, RowNumber1, bid2, dateTimed2
INTO #tempTable_3
FROM #tempTable_1, #tempTable_2
WHERE RowNumber1 = RowNumber2
ORDER BY dateTimed1
GO
SELECT * FROM #tempTable_3
GO

The idea was to create a JOIN table with values from two original tables in such a way that the second table has all rows (values) shifted by one row forward (down). The combination table naturally should be one row shorter than the original tables. I got displayed #tempTable_3. It looks perfect.

Thanks.

|||

I see what you mean now:) It could be done even with VFP SQL using a single query joining the table with itself :) Would this do?

select
t1.bid as bid1, t1.datetimed as datetimed1, t2.bid as bid2, t2.datetimed as datetimed2
from timedSeries t1 left join timedSeries t2 on t1.DateTimed < t2.dateTimed
where t2.DateTimed = (select min(datetimed) from timedSeries where dateTimed > t1.dateTimed) or t2.dateTimed is null

Or this one with CTE:

with tSeries as ( select row_number() over (order by dateTimed) as rowNum, bid, datetimed from timedSeries )
select
t1.bid as bid1, t1.datetimed as datetimed1, t2.bid as bid2, t2.datetimed as datetimed2
from tSeries t1 left join tSeries t2 on t1.rowNum = t2.rowNum - 1

|||

Cetin hi,

Thank you very much. I switched to SQLEXPRESS because I could not handle a simple problem in C# dealing with VFP: I could not figure out how to write into the tables. I could read OK, could write into VFP tables, update them using XML but could not write directly. The mapping of columns killed me. Now I do not regret that I switched to SQL Server. I think, overall, it will be a more appropriate server for me.

But I will look into your code. I am also discovering additional commands in Transact-SQL which possibly will be even more helpful for me in this task.

Incidentally, I ran into a problem dropping the temporary tables in that query that I showed. When the query runs the first time DROP TABLE commmand causes an error. Conditional statement IF OBJECT_ID ('#tempTable_1') IS NOT NULL before it did not do anything. It appears SQL server ignores it during execution.

Thanks.

|||

Writing to VFP tables are straight forward but I didn't mean why you chose SQLEXPRESS:) I too like MSSQL and trying to learn it in depth. I meant that SQL would even work within VFP and supported since years (remember VFP supports much less SQL syntax than MSSQL does).

I don't think you need to drop a temp table there and also I don't think you need a temp table.

|||

Well, in this case, I am done with that stored procedure (DROPs aren't needed is a good news). I am actually having second thoughts on that. I think I can do FETCH row in a loop with cursors. I did not realize it until today. Now I know. It is all I need. I will rectify my knowledge in that area, write a few queries and, hopefully, it will take off. I am especially pleased that a bulk of calculations could be done at the database.

Now, with VFP I still do not understand how I can use it in here since now all my calculations, data retreival from a network server is done in C#. I use MSSQL for data storage and hopefully for most of analysis of historical data. Some analysis will be done in C# but that will be real time data analysis. I need to interact with the SQL server almost all the time but I could not easy interact with VFP database--it was all onesided. Although I do have another project still unfinished. I may use VFP database for the other one all the way. That project is also in C# but over there the data flow is slow and interactive and the volume is small.

Thanks.

|||

If data flow is slow then you don't need VFP:)

We are having a communication problem here. I didn't mean "do it in VFP". I meant those 2 SQL code would work better than your temptable approach. Try them and choose one.

No comments:

Post a Comment