Help identify forum spammers through an SQL query?

I would like a simple query that I can run against the database to flag abnormalities in how frequently users post on our forum. Suppose I have the following database structure:

    ThreadId | UserId | PostAuthor | PostDate
    1        | 1000   | Spammer    | 2010-11-14 02:52:50.093
    2        | 1000   | Spammer    | 2010-11-14 02:53:06.893
    3        | 1000   | Spammer    | 2010-11-14 02:53:22.130
    4        | 1000   | Spammer    | 2010-11-14 02:53:37.073
    5        | 2000   | RealUser   | 2010-11-14 02:53:52.383
    6        | 1000   | Spammer    | 2010-11-14 02:54:07.430
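
For anyone who wants to try the answers against this data, a setup script might look roughly like the sketch below. The table name `Posts` is borrowed from the answers further down; the column types are assumptions, since the question does not state them.

```sql
-- Hypothetical setup: table name taken from the answers, column types assumed.
CREATE TABLE Posts (
    ThreadId   INT         NOT NULL,
    UserId     INT         NOT NULL,
    PostAuthor VARCHAR(50) NOT NULL,
    PostDate   DATETIME    NOT NULL
);

-- Sample rows from the question; the ISO 8601 'T' format avoids
-- language-dependent date parsing.
INSERT INTO Posts (ThreadId, UserId, PostAuthor, PostDate) VALUES (1, 1000, 'Spammer',  '2010-11-14T02:52:50.093');
INSERT INTO Posts (ThreadId, UserId, PostAuthor, PostDate) VALUES (2, 1000, 'Spammer',  '2010-11-14T02:53:06.893');
INSERT INTO Posts (ThreadId, UserId, PostAuthor, PostDate) VALUES (3, 1000, 'Spammer',  '2010-11-14T02:53:22.130');
INSERT INTO Posts (ThreadId, UserId, PostAuthor, PostDate) VALUES (4, 1000, 'Spammer',  '2010-11-14T02:53:37.073');
INSERT INTO Posts (ThreadId, UserId, PostAuthor, PostDate) VALUES (5, 2000, 'RealUser', '2010-11-14T02:53:52.383');
INSERT INTO Posts (ThreadId, UserId, PostAuthor, PostDate) VALUES (6, 1000, 'Spammer',  '2010-11-14T02:54:07.430');
```

Some answers query a differently named table (`Post2Forum`, `LogTable`); rename accordingly when testing those.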

I would like to set a threshold: for example, if 3 posts by the same user fall within 1 minute, the poster may be spamming the forums. The query should then return the user "Spammer" along with the number of posts made within that time window.

In the above example, Spammer sent 4 messages over a period of 1 minute, so the query result might look like this:

    UserId | PostAuthor | PostCount | DateStart               | DateEnd
    1000   | Spammer    | 4         | 2010-11-14 02:52:50.093 | 2010-11-14 02:53:37.073

Any suggestions on the format of the returned data are welcome. The format does not matter to me, as long as it correctly identifies forum abusers.

+6
sql-server tsql
5 answers

The output isn't everything you need, but it's a start:

(In words: give me every post that is followed by 2 or more other posts from the same user within one minute.)

    Select Spammer = PostAuthor,
           NumberOfPosts = (Select Count(*)
                            From Posts As AllPosts
                            Where AllPosts.UserID = Posts.UserID)
    From Posts
    Where 2 <= (Select Count(*)
                From Posts As OtherPosts
                Where OtherPosts.UserID = Posts.UserID
                  And OtherPosts.PostDate > Posts.PostDate
                  And OtherPosts.PostDate < DateAdd(Minute, 1, Posts.PostDate))
+1

Self-join solution:

    Select T1.UserId,
           T1.PostAuthor,
           T1.PostDate      As DateStart,
           Max(T2.PostDate) As DateEnd,
           Count(*)         As PostCount
    From Posts T1
    Inner Join Posts T2
       On T1.UserId = T2.UserId
      And T2.PostDate Between T1.PostDate And DateAdd(minute, 1, T1.PostDate)
    Group By T1.UserId, T1.PostAuthor, T1.PostDate
    Having Count(*) >= 3
+1

I tried this myself and came up with the following (I think it gives almost the same result as Stu's, although without the post count). It identifies users who have 3 posts within 1 minute, so a user with 5 posts in that minute is listed 3 times.

    DECLARE @threshold INT;
    SET @threshold = 3;
    ;WITH postCTE AS (
        SELECT Userid, PostAuthor, PostDate,
               RowNumber = ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY PostDate ASC)
        FROM Posts
    )
    SELECT p1.UserId,
           p1.PostAuthor,
           p1.PostDate AS StartTime,
           p2.PostDate AS EndTime
    FROM postCTE p1
    JOIN postCTE p2
      ON p1.UserId = p2.UserId
     AND p1.RowNumber = p2.RowNumber - (@threshold - 1)
    WHERE DATEDIFF(MINUTE, p1.PostDate, p2.PostDate) <= 1

It returns the following result set:

    UserId  PostAuthor  StartTime                EndTime
    1000    Spammer     2010-11-14 02:52:50.093  2010-11-14 02:53:22.130
    1000    Spammer     2010-11-14 02:53:06.893  2010-11-14 02:53:37.073
    1000    Spammer     2010-11-14 02:53:22.130  2010-11-14 02:54:07.430
0

I believe Sadhir is on the right track, but I have a few corrections to the script. The first concerns the use of minute units in DATEDIFF. With minute units, the query can return false matches from George's example, because DATEDIFF counts the minute boundaries crossed rather than the elapsed time, so I changed minute to second. I also formatted the output to show the number of posts recorded within the minute, calculated from the difference between the row numbers in the CTE. Although George did not ask for it, I added a parameter to control how many days to look back, since I doubt anyone would want to scan the whole table every time.

    DECLARE @threshold INT;
    SET @threshold = 3;
    DECLARE @lookbackdays int;
    SET @lookbackdays = 2;
    ;WITH postCTE AS (
        SELECT Userid, PostAuthor, PostDate,
               RowNumber = ROW_NUMBER() OVER (ORDER BY UserId, PostDate ASC)
        FROM Post2Forum
        WHERE PostDate > GETDATE() - @lookbackdays
    )
    SELECT p1.PostAuthor AS [PostAuthor],
           p2.RowNumber - p1.RowNumber + 1 AS [PostCount],
           p1.UserId,
           p1.PostDate AS [DateStart],
           p2.PostDate AS [DateEnd]
    FROM postCTE p1
    INNER JOIN postCTE p2
       ON p1.UserId = p2.UserId
      AND p1.RowNumber = p2.RowNumber - (@threshold)
    WHERE DATEDIFF(second, p1.PostDate, p2.PostDate) <= 60

The result of the query in my testing is:

    PostAuthor  PostCount  UserId  DateStart                DateEnd
    Spammer     4          1000    2010-11-14 02:52:50.093  2010-11-14 02:53:37.073
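
The DATEDIFF correction described in this answer can be checked with a quick experiment. The sketch below uses two made-up timestamps (they are not from the sample data) that are almost two minutes apart yet have only one minute boundary between them:

```sql
-- Illustrative values only: 118 seconds apart, but only one
-- minute boundary (02:53:00) lies between them.
DECLARE @a DATETIME, @b DATETIME;
SET @a = '2010-11-14T02:52:01';
SET @b = '2010-11-14T02:53:59';

SELECT DATEDIFF(MINUTE, @a, @b) AS MinuteBoundaries,  -- 1
       DATEDIFF(SECOND, @a, @b) AS ElapsedSeconds;    -- 118
```

A filter of `DATEDIFF(MINUTE, ...) <= 1` would accept this pair, while `DATEDIFF(SECOND, ...) <= 60` rejects it, which is why the second-based comparison is the safer choice for a one-minute window.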
0

Not quite what you want, but will serve the purpose more or less ...

    SELECT UserId,
           PostAuthor,
           COUNT(*) AS [PostCount],
           YEAR(PostDate),
           MONTH(PostDate),
           DAY(PostDate),
           DATEPART(hh, PostDate),
           DATEPART(mi, PostDate)
    FROM LogTable
    GROUP BY UserId, PostAuthor,
             YEAR(PostDate), MONTH(PostDate), DAY(PostDate),
             DATEPART(hh, PostDate), DATEPART(mi, PostDate)
    HAVING COUNT(*) >= 3
    ORDER BY YEAR(PostDate), MONTH(PostDate), DAY(PostDate),
             DATEPART(hh, PostDate), DATEPART(mi, PostDate)
-1
