This is a strange request that has been submitted as "critical" feature of
an application, and I am not sure of how to approach the problem. The
application has a simple, perfectly good Full Text Search capability (it is
searching on "resumes" and "letters". Now we have been asked to allow the
user to specify the minimum number of "hits" for a particular search
criteria...
For example:
CREATE TABLE [dbo].[EmployeeResume]
(
[EmployeeId] [int] NOT NULL,
EmployeeId [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_EmployeeResume] PRIMARY KEY CLUSTERED
(
[EmployeeId] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
go
INSERT INTO EmployeeResume
VALUES(100, 'Resume containing words such as POWERPOINT')
INSERT INTO EmployeeResume
VALUES(200, 'Resume containing words such as POWERPOINT EXCEL ACCESS')
INSERT INTO EmployeeResume
VALUES(300, 'Resume containing words such as POWERPOINT EXCEL EXCEL ACCESS')
For sake of simplicity I omit all the code related to the creation of Full
Text catalog on the table and column "Resume"...
Now, the statement...
SELECT EmployeeId FROM EmployeeResume WHERE CONTAINS (Resume, 'EXCEL')
would return 2 rows (EmployeeID 200, 300)
Now we want to allow the user to specify that the query should return only
rows where the word "EXCEL" appears a minimum of 2 times -- in other words,
the query above should only return EmployeeId 300 because the "count" of the
word "EXCEL" is >= 2.
Strange, I know. Anyone has a suggestion?
Thank you so much in advance.
Giorgio
The feature you are looking for is called the hit count and ships with most
of Microsoft's search products but not SQL FTS. What you need to do is to
run a query on your results set that does something like this (untested).
SELECT EmployeeId,
hitcount=(len(resume)-len(replace(Resume,'Excel','')) )/len('excel')FROM
EmployeeResume WHERE CONTAINS (Resume, 'EXCEL')
and (len(resume)-len(replace(Resume,'Excel',''))/len('excel')
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Giorgio Vidali" <gvidali@.earthlink.net> wrote in message
news:%23aHTotT$GHA.396@.TK2MSFTNGP05.phx.gbl...
> This is a strange request that has been submitted as "critical" feature of
> an application, and I am not sure of how to approach the problem. The
> application has a simple, perfectly good Full Text Search capability (it
> is searching on "resumes" and "letters". Now we have been asked to allow
> the user to specify the minimum number of "hits" for a particular search
> criteria...
>
> For example:
>
> CREATE TABLE [dbo].[EmployeeResume]
> (
> [EmployeeId] [int] NOT NULL,
> EmployeeId [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
> CONSTRAINT [PK_EmployeeResume] PRIMARY KEY CLUSTERED
> (
> [EmployeeId] ASC
> )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
> ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
> go
>
> INSERT INTO EmployeeResume
> VALUES(100, 'Resume containing words such as POWERPOINT')
>
> INSERT INTO EmployeeResume
> VALUES(200, 'Resume containing words such as POWERPOINT EXCEL ACCESS')
>
> INSERT INTO EmployeeResume
> VALUES(300, 'Resume containing words such as POWERPOINT EXCEL EXCEL
> ACCESS')
>
> For sake of simplicity I omit all the code related to the creation of Full
> Text catalog on the table and column "Resume"...
>
> Now, the statement...
>
> SELECT EmployeeId FROM EmployeeResume WHERE CONTAINS (Resume, 'EXCEL')
>
> would return 2 rows (EmployeeID 200, 300)
>
> Now we want to allow the user to specify that the query should return only
> rows where the word "EXCEL" appears a minimum of 2 times -- in other
> words, the query above should only return EmployeeId 300 because the
> "count" of the word "EXCEL" is >= 2.
>
> Strange, I know. Anyone has a suggestion?
>
> Thank you so much in advance.
>
> Giorgio
>
>
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment