Should SQL format the output or just retrieve the raw data?

Question

Should SQL format the output or just retrieve the raw data?

Generally speaking, the SQL queries that I write return unformatted data, and I leave it at the presentation level, web page or Windows application to format the data as needed. Other people I work with, including my boss, will insist that it is more efficient to have a database. I’m not sure that I am buying this and I think that even if the measuring performance were achieved due to the fact that the database did this, you have more good reasons to avoid this altogether.

For example, I will post my queries at the data access level with the goal of potentially reusing queries when possible. With this in mind, I make sure that queries can be reused more often if the data remains in their native type rather than converting the data to a string and applying formatting functions to them, for example, formatting a date column in DD Format -MMM-YYYY for display. Of course, if SQL returned dates as formatted strings, you could cancel the process to return the value back to the date data type, but this seems inconvenient due to the lack of a better word. Furtehrmore, when it comes to formatting other data, for example, the serial number of a machine consisting of a prefix, base and suffix with a dash and leading zeros removed in each subfield, you risk that you cannot correctly return to the original serial number when go in the other direction. This may be a bad example, but I hope you see the direction I'm going ...

To take it a step further, I see that people write VERY complex SQL queries because they essentially write what I would call presentation logic in SQL instead of returning simple data and then applying that presentation logic at the presentation level . In my opinion, this leads to a very complex, complicated to maintain and more fragile SQL that is less adaptable to changes.

Take the following real-life example of what I found on our system and tell me what you think. Reasonable for this approach was that it simplified the web application for page rendering, since the following 1-line fragment of the classic ADO logic in the classic ASP web application was used to process returned strings.

oRS.GetString ( , , "</td>" & vbCrLf & "<td style=""font-size:x-small"" nowrap>" ,"</td>" & vbCrLf & "</tr>" & vbCrLf & "<tr>" & vbCrLf & _ "<td style=""font-size:x-small"" nowrap>" ,"&nbsp;" ) & "</td>" & vbCrLf & "</tr>" & vbCrLf & _

Here is SQL itself. Although I appreciate the author’s ability to write complex SQL, I feel that this is a nightmare to maintain. Am I nuts? SQL returns a list of programs that currently work against our database and the status of each of them:

Since SQL did not display with CR / LF when I pasted here, I decided to put SQL on an empty personal Google site. Please feel free to comment. Thanks.

By the way, this SQL was actually built using VB Script embedded in the classic ASPIN page but not calling the stored procedure, so you have the added complexity of inline concatenations and cited markup if you know what I mean, not to mention lack of formatting. The first thing I did when I was asked to help debug SQL was to add debug.print to the SQL output and pass it through the SQL formatter I just found. Some formatting was lost when pasting at the following link:

Edit (Andomar): copied inline: (removed external link, thanks-Chad)

 SELECT Substring(Datename("dw",start_datetime),1,3) + ', ' + Cast(start_datetime AS VARCHAR) "Start Time (UTC/GMT)" ,program_name "Program Name" ,run_sequence "Run Sequence" ,CASE WHEN batchno = 0 THEN Char(160) WHEN batchno = NULL THEN Char(160) ELSE Cast(batchno AS VARCHAR) END "Batch #" /* ,Replace(Replace(detail_log ,'K:\' ,'file://servernamehere/DiskVolK/') ,'\' ,'/') "log"*/ /* */ ,Cast('<a href="GOIS_ViewLog.asp?Program_Name=' AS VARCHAR(99)) + Cast(program_name AS VARCHAR) + Cast('&Run_Sequence=' AS VARCHAR) + Cast(run_sequence AS VARCHAR) + Cast('&Page=1' AS VARCHAR) + '' + Cast('">' + CASE WHEN end_datetime >= start_datetime THEN CASE WHEN end_datetime <> 'Jan 1 1900 2:00 PM' THEN CASE WHEN (success_code = 10 OR success_code = 0) AND exit_code = 10 THEN CASE WHEN errorcount = 0 THEN 'Completed Successfully' ELSE 'Completed with Errors' END WHEN success_code = 100 AND exit_code = 10 THEN 'Completed with Errors' ELSE CASE WHEN program_name <> 'FileDepCheck' THEN 'Failed' ELSE 'File not found' END END ELSE CASE WHEN success_code = 10 AND exit_code = 0 THEN 'Failed; Entries for Input File Missing' ELSE 'Aborted' END END ELSE CASE WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240) OR ((SELECT Count(* ) FROM MASTER.dbo.sysprocesses a(nolock) INNER JOIN gcsdwdb.dbo.update_log b(nolock) ON a.program_name = b.program_name WHERE a.program_name = update_log.program_name AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0)) THEN 'Processing...' ELSE 'Aborted without end date' END END + '</a>' AS VARCHAR) "Status / Log" ,Cast('<a href="' AS VARCHAR) + Replace(Replace(detail_log,'K:\','file://servernamehere/DiskVolK/'), '\','/') + Cast('" title="Click to view Detail log text file"' AS VARCHAR(99)) + Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\DetailLog.bmp" border="0"></a>' AS VARCHAR(999)) + Char(160) + Cast('<a href="' AS VARCHAR) + Replace(Replace(summary_log,'K:\','file://servernamehere/DiskVolK/'), '\','/') + Cast('" title="Click to view Summary log text file"' AS VARCHAR(99)) + Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\SummaryLog.bmp" border="0"></a>' AS VARCHAR(999)) "Text Logs" ,errorcount "Error Count" ,warningcount "Warning Count" ,(totmsgcount - errorcount - warningcount) "Information Message Count" ,CASE WHEN end_datetime > start_datetime THEN CASE WHEN Cast(Datepart("hh",(end_datetime - start_datetime)) AS INT) > 0 THEN Cast(Datepart("hh",(end_datetime - start_datetime)) AS VARCHAR) + ' hr ' ELSE ' ' END + CASE WHEN Cast(Datepart("mi",(end_datetime - start_datetime)) AS INT) > 0 THEN Cast(Datepart("mi",(end_datetime - start_datetime)) AS VARCHAR) + ' min ' ELSE ' ' END + CASE WHEN Cast(Datepart("ss",(end_datetime - start_datetime)) AS INT) > 0 THEN Cast(Datepart("ss",(end_datetime - start_datetime)) AS VARCHAR) + ' sec ' ELSE ' ' END ELSE CASE WHEN end_datetime = start_datetime THEN '< 1 sec' ELSE CASE WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240) OR ((SELECT Count(* ) FROM MASTER.dbo.sysprocesses a(nolock) INNER JOIN gcsdwdb.dbo.update_log b(nolock) ON a.program_name = b.program_name WHERE a.program_name = update_log.program_name AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0)) THEN 'Running ' + Cast(Datediff(mi,start_datetime,Getdate()) AS VARCHAR) + ' min' ELSE '&nbsp;' END END END "Elapsed Time" /* ,end_datetime "End Time (UTC/GMT)" ,datepart("hh" , (end_datetime - start_datetime)) "Hr" ,datepart("mi" ,(end_datetime - start_datetime)) "Mins" ,datepart("ss" ,(end_datetime - start_datetime)) "Sec" ,datepart("ms" ,(end_datetime - start_datetime)) "mSecs" ,datepart("dw" ,start_datetime) "dp" ,case when datepart("dw" ,start_datetime) = 6 then ' Fri' when datepart("dw" ,start_datetime) = 5 then ' Thu' else '1' end */ ,totalrows "Total Rows" ,inserted "Rows Inserted" ,updated "Rows Updated" /* ,success_code "succ" ,exit_code "exit" */ FROM update_log WHERE start_datetime >= '5/29/2009 16:15' ORDER BY start_datetime DESC

+4

sql

Chadd May 30, '09 at 8:20

source share

7 answers

SQL should not be formatted, period. This is relational algebra for extracting (when using SELECT ) data from a database.

Getting DBMS to format data for you is not what you need to do, and it should be left to your own code (outside the DBMS). DBMS, as a rule, is at a sufficient load, since it does not need to perform your presentation work for you. It is also optimized for data retrieval, not presentation.

I know database administrators who will require my immediate execution if I try to do something like this :-)

+6

paxdiablo May 30, '09 at 8:26

source share

The concept of formatting output in SQL makes a kind of gap for the whole concept of separation of presentation and data, and not only that, but there are a number of conditions that may arise:

What if you need to localize your date formats? The UK uses a different date format in the USA, for example, are you going to internationalize all the way to the data layer?
What if formatting rules change? That is, should some text be formatted differently to fit the new corporate policy? Again, you will need to return to the data layer.
If we take the web context, how do you decide to avoid the meanings? It is advisable that you go to a web page or in JSON or elsewhere.

Not only that, but also SQL string management functions are generally not very dangerous.

+5

Kazar May 30, '09 at 8:28

source share

I am the developer responsible for reporting on my company. In simple words, the engine works by creating an XML data document to go to a report from a database, and then transform the XML into any way to create a web page or PDF or Word document based on user requirements.

When I started five years ago, I had a database formatting the output, although I’m glad to say that I didn’t write anything, it’s just as terrible as with the questions. Over time, I moved the other way, and now XML only stores raw data, and this gets tidy during the presentation.

Our software uses Traffic Lights as a quick status indicator, so we have many char fields in the database where "R", "A", "G", "U" are stored to represent red, yellow, green and unknown. I had several tricks, such as SELECTS with built-in CASE statements to convert single character codes to their English counterparts:

 SELECT CASE status WHEN 'R' THEN 'Red' WHEN 'G' THEN 'Green' ...etc...

Sorting cannot be performed on internal codes; Users expect that everything will be in two directions: red, amber, green or green, amber, red; so I also had corresponding SORT columns

 SELECT CASE status WHEN 'R' THEN 'Red' WHEN 'G' THEN 'Green' WHEN 'A' THEN 'Amber' END as status, CASE status WHEN 'R' THEN 0 WHEN 'A' THEN 1 WHEN 'G' THEN 2 END as sort FROM table ORDER BY sort

This is just a brief example. I had other tricks for formatting dates, assembling names, etc.

This, of course, led to problems making the application multilingual, as English was boiled into the database. I need to find the localization of the client and write many multilingual CASES to support other languages. Not good. There were also problems. Americans love their mm / dd dates and Europeans do dd / mm.

This also led to other duplication issues. If someone added a fourth or fifth traffic light option, I need to change all my SQL when the new status is already presented in the code as a Java rename or something else that I could search as soon as I read one character from the database .

In my case, it has become much simpler just for the database to return the raw data, and I should write a set of comparators and formatters to represent the data in the document in my native language and user encoding. If I started again today, I would have done it.

+2

banjollity May 30, '09 at 8:42

source share

I think there is room for some kind of transformation on the way out of SQL, and it depends on the expectations of the calling program.

For example, if datetime is appropriate, it should be returned initially. On the other hand, if you return only a year in the datetime field (or a quarter, for example, 1/1, 4/1, 7/1, 10/1), and it is expected that the client will analyze the information, put it in a separate column ( e.g. year = 2008 or quarter = '2008Q1). Some code translations from code to description (deleting a column of code and only issuing a description). There are reasonable cases where concatenation and line building are appropriate.

In your specific example, this is the place where it is unacceptable, and on the surface it looks like a weaker connection (only changing the SP in the database), it can actually create a stronger connection, forcing additional SPs to be written for different applications instead of several. User interfaces can use the same SP. And then several SPs may need to be changed synchronously as the system evolves.

+1

Cade roue May 30, '09 at 8:34

source share

When considering whether to format data on behalf of a presentation layer, consider that your “presentation level” may be a web service or other program. You can start by formatting on behalf of a part of the user interface code, only to later require the same request that will be used by the web service, which will have different requirements.

My favorite was a set of stored procedures that all formatted a date / time. In the local time zone. This does not work well when calling a web service from a different time zone. It worked even less frequently when the regional settings of the database server changed, changing the format of the date and time. Oh, and that didn't work at midnight, as he truncated “00:00” at the end.

OTOH, it was very user-friendly.

+1

John saunders May 30, '09 at 11:28

source share

Most of the people I know disagree with me here, but I like this approach. Therefore, I will list some advantages:

SQL is very powerful: how many C # lines will this query have?
SQL is very easy to update. I believe this code is stored in a stored procedure, which you can change with a simple ALTER PROC. This can significantly reduce the time to fix errors.
SQL fast; I have seen cases where implementing an ORM layer slowed down the application.
SQL is easily debugged, and errors are easy to reproduce. Just run the query. Checking your fix is a matter of starting a new request.
SQL like this is not so difficult to maintain when it is properly formatted. I can’t understand how much SQL I can’t understand in 5-10 minutes; but C # layered solutions can take a very long time, especially if you need to figure out which layer abstraction breaks.

I'm sure other people will list the flaws of the SQL approach.

0

Andomar May 30, '09 at 8:40

source share

anon · Accepted Answer · 2009-05-30T08:26:54+0000

The answer, obviously, is "just returning the result." Formatting on the SQL server has the following problems:

it increases network traffic from SQL server
SQL has very poor string handling functions
SQL servers are not optimized for string manipulation
You use server processor cycles that are best used to process requests.
it can make life difficult (or impossible) for the query optimizer
you need to write many more requests to support various formatting.
You may need to write different requests to support formatting in different browsers.
You cannot reuse queries for different purposes.

I am sure there are many more.

Should SQL format the output or just retrieve the raw data?

More articles: