Whose Responsibility for Data Validation?

Question


I am confused as to whether the caller or the callee is responsible for checking the legality of the data.

Should the callee check that the passed-in arguments are not null and meet any other requirements, so that the callee method can execute normally and successfully, and catch any potential exceptions? Or is it the caller's responsibility to do this?

+51

language-agnostic

hiway Jun 19 '13 at 9:38


13 answers

For an API, the callee should always perform proper validation and throw a descriptive exception for invalid data.

For any client with IO overhead, the client should also perform basic validation ...

+19

Thihara Jun 19 '13 at 9:44

Validation: Caller vs. Called

The TL;DR version is: both.

The long version involves who, why, when, how, and what.

Both

Both should be prepared to answer the question "can this data be operated on reliably?" Do we know enough about this data to do something meaningful with it? Many will suggest that the reliability of data should never be trusted, but that only leads to a chicken-and-egg problem. Chasing it endlessly from both ends will not provide meaningful value, but to some degree it is important.

Both should validate the form of the data to ensure basic usability. If either one does not recognize or understand the form of the data, there is no way to know how to handle it further with any reliability. Depending on the environment, the data may need to be of a particular "type", which is often an easy way to validate form. We often consider types that show evidence of common lineage back to a particular ancestor, and that retain the crucial traits, to have the right form. Other characteristics may be important if the data is something other than an in-memory structure, for example if it is a stream or some other resource external to the running context.

Many languages include validation of data form as a built-in language feature, through type or interface checking. However, when favoring composition over inheritance, providing a good mechanism for checking that the required traits exist falls on the implementer. One strategy to achieve this is through dynamic programming, or in particular through type introspection, inference, or reflection.

Called

The called must check the domain (the set of inputs) of the given context on which it will operate. The design of the called always assumes that it can handle only so many cases of input. Typically these values are broken down into certain subclasses or categories of input. We verify the domain in the called because the called is intimate with the localized constraints. It knows better than anyone else what is good input and what is not.

Normal values: These values of the domain map to a range. For each foo there is one and only one bar.
Out-of-range / out-of-scope values: These values are part of the general domain, but will not map to a range in the context of the called. No defined behavior exists for these values, and therefore no valid output is possible. Out-of-range checking often involves ranges, limits, or allowed characters (or digits, or composite values). A cardinality check (multiplicity) and then a presence check (null or empty) are special forms of range checking.
Values that lead to illogical or undefined behavior: These values are special values or edge cases that are otherwise normal, but may lead to unexpected results because of the design of the algorithm and known environmental constraints. For example, a function that works with numbers should guard against division by zero, accumulators that will overflow, or unintended loss of precision. Sometimes the operating environment or the compiler can warn that these situations may occur, but relying on the runtime or the compiler is not good practice, because it cannot always deduce what is possible and what is not. This stage should largely be verification, through secondary validation, that the caller provided good, usable, meaningful input.
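
As a rough illustration of the three categories above, here is a minimal Java sketch of a called method that guards its own domain. The class `DomainChecks`, the method `mean`, and the rule that samples must be non-negative are all made up for this example and are not from the answer.

```java
final class DomainChecks {
    private DomainChecks() {}

    /** Integer mean of non-negative samples; the called checks its own domain. */
    static int mean(int[] samples) {
        // Presence check: a null or empty array has no defined mean.
        if (samples == null || samples.length == 0) {
            throw new IllegalArgumentException("samples must be non-null and non-empty");
        }
        // Edge case: a long accumulator cannot overflow when summing int values.
        long sum = 0;
        for (int sample : samples) {
            // Range check: negative samples are outside this method's domain.
            if (sample < 0) {
                throw new IllegalArgumentException("negative sample: " + sample);
            }
            sum += sample;
        }
        // The length was checked above, so this cannot divide by zero.
        // Integer division truncates; that loss of precision is part of the contract.
        return (int) (sum / samples.length);
    }
}
```

With this sketch, `DomainChecks.mean(new int[] {2, 4})` returns 3, while an empty array or a negative sample is rejected with an `IllegalArgumentException` before any arithmetic happens.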

Caller

The caller is special. The caller has two situations in which it should validate data.

The first situation is on assignment or explicit state changes, where a change happens to at least one element of data by some explicit mechanism, internally or externally, on something within its container. This is somewhat out of scope for the question, but something to keep in mind. The important thing is to consider the context when a state change occurs and one or more elements that describe the state are affected.

Self / referential integrity: Consider using an internal mechanism to validate state if other actors can reference the data. When data has no consistency checks, it is only safe to assume that it is in an indeterminate state. That is not intermediate, but indeterminate. Know thyself. When you do not use a mechanism to check internal consistency on a state change, the data is not reliable, and that leads to problems in the second situation. Make sure the caller's data is in a known good state; alternatively, in a known transition / recovery state. Do not make the call until you are ready.

The second situation is when the data calls a function. The caller can expect only so much from the called. The caller must know and respect that the called recognizes only a certain domain. The caller must also be self-interested, as it may continue and persist long after the called completes. This means the caller must help the called be not only successful, but also appropriate for the task: bad data in produces bad data out. By the same token, even good data in and out with respect to the called may not be appropriate for the next thing in terms of the caller. The good data out may actually be bad data in for the caller. The output of the called may be invalid for the caller given the caller's current state.

OK, enough commentary; what specifically should a caller validate?

Logical and normal: Given the data, is the called a good strategy that fits the purpose and intent? If we know it will fail with certain values, there is no point in making the call without the appropriate guards. If we know that the called cannot handle zero, do not ask it to, as it will never succeed. What is more expensive and harder to manage: a [redundant (do we know?)] guard clause, or an exception [that occurs late in a possibly long-running process that depends on external resources]? Implementations can change, and change suddenly. Providing the protection in the caller reduces the impact and risk of changing that implementation.
Return values: Check for unsuccessful completion. This is something that a caller may or may not need to do. Before using or relying on the returned data, check for alternative outcomes, if the design of the system incorporates success and failure values that may accompany the actual return value.

Footnote: In case it was not clear. Null is a domain issue. It may or may not be logical and normal, so it depends. If null is a natural input to the function, and the function can reasonably be expected to produce meaningful output, then leave it to the caller to use it. If the domain of the caller is such that null is not logical, then guard against it in both places.

An important question: if you pass null to the called, and the called produces something, is that not a hidden creational pattern, creating something from nothing?

+11

JustinC Jun 19 '13 at 20:39

It is all about the "contract". It is the callee that decides which parameters are acceptable and which are not. You can state in the documentation that a "null" parameter is invalid, and then throwing a NullPointerException or InvalidArgumentException is fine.

If returning a result for a null parameter makes sense, say so in the documentation. Usually, though, such a situation is a sign of poor design: create an overloaded method with fewer parameters instead of accepting null.

Just remember to throw descriptive exceptions. As a rule of thumb:

If the caller passed wrong arguments, different from what is described in the documentation (i.e. null, id < 0, etc.), throw an unchecked exception (NullPointerException or InvalidArgumentException).
If the caller passed correct arguments, but there may be an expected business case that makes it impossible to process the call, you may want to throw a checked, descriptive exception. For example, for getPermissionsForUser(Integer userId) the caller passes a userId without knowing whether such a user exists, but it is a non-null Integer. Your method may return a list of permissions or throw a UserNotFoundException. This may be a checked exception.
If the parameters are correct according to the documentation, but they cause an internal processing error, you may throw an unchecked exception. This usually means that your method is not well tested ;-)
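
The first two rules might look roughly like this in Java. This is only a sketch under assumptions: the map-backed `PermissionService` is invented for illustration, and it throws `IllegalArgumentException`, which is the standard-library class for invalid arguments (the Java standard library has no `InvalidArgumentException`).

```java
import java.util.List;
import java.util.Map;

/** Checked: an expected business case that the caller is forced to handle. */
class UserNotFoundException extends Exception {
    UserNotFoundException(int userId) {
        super("no user with id " + userId);
    }
}

/** Hypothetical service backed by an in-memory map, for illustration only. */
final class PermissionService {
    private final Map<Integer, List<String>> permissionsByUser;

    PermissionService(Map<Integer, List<String>> permissionsByUser) {
        this.permissionsByUser = permissionsByUser;
    }

    List<String> getPermissionsForUser(Integer userId) throws UserNotFoundException {
        // Rule 1: arguments that break the documented contract -> unchecked exceptions.
        if (userId == null) {
            throw new NullPointerException("userId must not be null");
        }
        if (userId < 0) {
            throw new IllegalArgumentException("userId must be >= 0, got " + userId);
        }
        // Rule 2: valid argument, but an expected business case -> checked exception.
        List<String> permissions = permissionsByUser.get(userId);
        if (permissions == null) {
            throw new UserNotFoundException(userId);
        }
        return permissions;
    }
}
```

Because `UserNotFoundException` is checked, the compiler makes every caller either catch it or declare it, whereas the contract violations surface as unchecked exceptions that point at a bug in the calling code.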

+9

Piotr Gwiazda Jun 19 '13 at 10:04

Well ... it depends.

If you can be sure how to handle invalid data inside your callee, then do it there.

If you are not sure (for example, because your method is fairly general and is used in several different places and ways), let the caller decide.

For example, imagine a DAO method that has to retrieve a specific entity, and you do not find it. Can you decide whether to throw an exception, maybe roll back a transaction, or just consider it okay? In cases like this, it is definitely up to the caller to decide how to handle it.
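
One way to sketch the DAO case in Java is to have the lookup return an `Optional`, so that it reports absence without judging it. The names `UserDao`, `findName`, and `Callers` are hypothetical and exist only for this example.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical DAO: it reports a missing entity but does not decide what that means. */
final class UserDao {
    private final Map<Integer, String> namesById;

    UserDao(Map<Integer, String> namesById) {
        this.namesById = namesById;
    }

    Optional<String> findName(int id) {
        return Optional.ofNullable(namesById.get(id));
    }
}

final class Callers {
    private Callers() {}

    /** For this caller a missing user is acceptable: fall back to a default. */
    static String displayName(UserDao dao, int id) {
        return dao.findName(id).orElse("guest");
    }

    /** For this caller a missing user is an error: fail loudly. */
    static String requireName(UserDao dao, int id) {
        return dao.findName(id)
                .orElseThrow(() -> new IllegalStateException("user " + id + " must exist"));
    }
}
```

The same lookup serves both callers; each one applies its own policy to the missing-entity case.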

+4

Marco Forberg Jun 19 '13 at 9:47

Both. It is a matter of good software development on both sides, regardless of environment (C/S, web, internal API) and language.

The callee should validate all parameters against a well-documented list of parameters (you did document it, right?). Depending on the environment and architecture, good error messages or exceptions should be implemented to give a clear indication of what is wrong with the parameters.

The caller should ensure that only appropriate parameter values are passed in the API call. Any invalid values should be caught as soon as possible and reflected to the user in some way.

As often happens in life, neither side should simply assume that the other guy will do the right thing and ignore the potential problem.

+4

cdkMoose Jun 19 '13 at 17:04

I am going to take a different point of view on this. Working inside a self-contained application, both the caller and the callee are in the same code. In that case, any validation required by the callee's contract should be done by the callee.

So, you have written a function, and your contract says, "Does not accept NULL values." You should check that NULL values have not been sent, and raise an error if they have. This ensures that your code is correct, and if someone else's code is doing something it should not, they will find out about it sooner.

In addition, if you assume that other code will call your method correctly, and it does not, it will be harder to track down the source of potential bugs.

This is essential for "Fail Early, Fail Often", where the idea is to raise an error condition as soon as a problem is detected.

+3

Chris Jun 19 '13 at 12:49

It depends on whether you program nominally, defensively, or totally.

If you program defensively (my personal favorite for most Java methods), you validate the input in the method. You throw an exception (or fail in another way) when validation fails.
If you program nominally, you do not validate the input (but expect the client to make sure that the input is valid). This approach is useful when validation would hurt performance because it would take a long time (for example, a time-consuming search).
If you program totally (my personal favorite for most Objective-C methods), you validate the input in the method, but you change invalid input into valid input (for example, by snapping values to the nearest valid value).
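
To make the three styles concrete, here is a small Java sketch that converts a percentage (0 to 100) into a level (0 to 255) in each style. The class `VolumeLevel` and its methods are invented for this example.

```java
final class VolumeLevel {
    private VolumeLevel() {}

    /** Defensive: validate the input and fail fast when it is invalid. */
    static int defensive(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 0..100, got " + percent);
        }
        return percent * 255 / 100;
    }

    /**
     * Nominal: the precondition 0 <= percent <= 100 is documented but not checked.
     * Invalid input silently yields a level outside 0..255.
     */
    static int nominal(int percent) {
        return percent * 255 / 100;
    }

    /** Total: snap invalid input to the nearest valid value, then convert. */
    static int total(int percent) {
        int clamped = Math.max(0, Math.min(100, percent));
        return clamped * 255 / 100;
    }
}
```

For an input of 150, `defensive` throws, `nominal` returns 382 (an out-of-range level that nothing flags), and `total` returns 255.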

In most cases you would program defensively (fail-fast) or totally (fail-safe). Nominal programming is risky, IMO, and should be avoided when expecting input from an external source.

Of course, do not forget to document everything (especially when programming nominally).

+3

Randy Marsh Jun 26 '13 at 12:43

The callee is responsible for checking that the data it receives is valid. This is because only the callee knows what is valid. It is also a good security practice.

+2

Grzegorz Żur Jun 19 '13 at 9:41


It should be done on both the client side and the server side (caller and callee).

Client:

This is the most effective option.
Checking on the client saves one request to the server.
It reduces bandwidth traffic.
It saves time (if the server responds with a delay).

Server:

Do not trust UI data (because of hackers).
Mostly, backend code will be reused, so we do not know whether the data will be null, etc.; therefore we must validate in both the callee and the caller methods.

In general: 1. If the data comes from the UI, it is always better to validate at the UI layer and perform a double check at the server layer. 2. If the data is passed within the server layer itself, we need to validate in the callee, and for a double check we should validate on the caller side as well.

Thanks

+2

Hariharan Jun 19 '13 at 9:54

In my humble opinion, and in a few more words explaining why: it is most often the callee's responsibility, but that does not mean the caller is always off the hook.

The reason is that the callee is in the best position to know what it needs to do its work, because it is the one doing the work. It is therefore good encapsulation for an object or method to be self-validating. If the callee cannot work with a null pointer, that is an invalid argument and should be rejected as such. If there are arguments out of range, that is easy to guard against as well.

However, "ignorance of the law is no defense". It is not a good pattern for a caller to simply shove everything it has been given into its helper function and let the callee sort it out. For one thing, the caller adds no value when it does this, especially if what the caller passes to the callee is data it was itself given by its own caller, which means this layer of the call stack is probably redundant. It also makes the code of both the caller and the callee very complex, since both sides "defend" against unwanted behavior by the other (the callee tries to salvage something workable and tests everything, and the caller wraps the call in try-catch statements that attempt to correct the call).

The caller should therefore validate whatever it can know about the requirements for the data it passes. This is especially true when there is a time overhead inherent in making the call, for example when invoking a service proxy. If you have to wait a significant part of a second to find out that your parameters are wrong, when it would take a few ticks to run the same check on the client side, the advantage is obvious. Guard clauses are exactly that.

+2

KeithS 19 . '13 15:32


-, . , , . - , . Java InvalidArgumentException .

. , , . , , . , , , . . , , , .

0

André Stannek 19 . '13 9:57


, , , . .

, (), , .

, , .

0

wobblycogs 19 . '13 12:26


duffymo · Accepted Answer · 2013-06-19 09:40

Validate on both the consumer (client) side and the provider (API) side.

Clients should do it because it means a better experience. For example, why make a network round trip just to be told that you have one bad text field?

Providers should do it because they should never trust clients (e.g., XSS and man-in-the-middle attacks). How do you know that the request was not intercepted? Validate everything.

There are several levels of validity:

1. All required fields are present and correctly formatted. This is what the client checks.
2. # 1 plus valid relationships between the fields (for example, if X is present, then Y is required).
3. # 1 plus # 2 plus business validity: the request complies with all business rules for proper processing.

Only the provider side can perform # 2 and # 3.
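
As a rough sketch of how the three levels could be layered on the provider side, here is a small Java validator. The request fields (`email`, `couponCode`, `discountPercent`) and the 50 percent business limit are assumptions invented for this example, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical request object; the fields are illustrative only. */
final class OrderRequest {
    final String email;            // required
    final String couponCode;       // optional
    final Integer discountPercent; // required when couponCode is present

    OrderRequest(String email, String couponCode, Integer discountPercent) {
        this.email = email;
        this.couponCode = couponCode;
        this.discountPercent = discountPercent;
    }
}

final class OrderValidator {
    // Assumed business rule for the sketch: discounts are capped at 50 percent.
    private static final int MAX_DISCOUNT_PERCENT = 50;

    private OrderValidator() {}

    /** Returns every violation found; an empty list means the request is valid. */
    static List<String> validate(OrderRequest request) {
        List<String> errors = new ArrayList<>();
        // Level 1: required fields and formats (a client can check this too).
        if (request.email == null || !request.email.contains("@")) {
            errors.add("email is required and must contain '@'");
        }
        // Level 2: relationships between fields.
        if (request.couponCode != null && request.discountPercent == null) {
            errors.add("discountPercent is required when couponCode is present");
        }
        // Level 3: business rules.
        if (request.discountPercent != null && request.discountPercent > MAX_DISCOUNT_PERCENT) {
            errors.add("discountPercent exceeds the maximum of " + MAX_DISCOUNT_PERCENT);
        }
        return errors;
    }
}
```

A client could reuse the level 1 check for quick feedback, but the provider still runs all three levels on every request because it cannot trust what arrives.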
