What does the term “canonical form” or “canonical representation” mean in Java?

I have often heard this term used, but I have never understood it.

What does this mean, and can someone give some examples / point me to some links?

EDIT: Thanks everyone for the answers. Can you also tell me how a canonical representation is useful when implementing equals(), as stated in Effective Java?

+66
java
Nov 11 '08 at 5:36
9 answers

Wikipedia covers this under the term Canonicalization:

The process of converting data that has more than one possible representation into a “standard” canonical representation. This can be done to compare different representations for equivalence, to count the number of different data structures, to increase the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sort order.

The Unicode example made the most sense to me:

Variable-length encodings in the Unicode standard, in particular UTF-8, have more than one possible encoding for most common characters. This makes string validation more complicated, since every possible encoding of each string character must be considered. A software implementation which does not consider all character encodings runs the risk of accepting strings considered invalid in the application design, which could cause bugs or allow attacks. The solution is to allow a single encoding for each character. Canonicalization is then the process of translating every string character to its single allowed encoding. An alternative is for software to determine whether a string is canonicalized, and then reject it if it is not. In this case, in a client/server context, the canonicalization would be the responsibility of the client.

So, in short: a standard form for representing data. From that form you can convert to any other representation you might need.
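A related case that is easy to try in Java is Unicode normalization, where the same visible character can be written as different code point sequences. The sketch below is only an illustration: it assumes NFC is the chosen canonical form, and the class and method names are made up for the example. It works at the level of code points, not UTF-8 bytes as in the quote above.

```java
import java.text.Normalizer;

class UnicodeCanonicalDemo {
    // Convert to the canonical form chosen for this sketch (NFC is an assumption;
    // any normalization form works as long as it is applied consistently).
    static String canonical(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // e with acute accent as one code point
        String decomposed = "e\u0301"; // 'e' followed by a combining acute accent

        // The raw strings differ, their canonical forms do not.
        System.out.println(composed.equals(decomposed));                       // false
        System.out.println(canonical(composed).equals(canonical(decomposed))); // true
    }
}
```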

+46
Nov 11 '08 at 5:51

I believe there are two related uses of "canonical": forms and instances.

A canonical form means that values of a particular type of resource can be described or represented in multiple ways, and one of those ways is chosen as the favored, canonical form. (That form is canonized, like the books that made it into the Bible, while the other forms are not.) A classic example of a canonical form is paths in a hierarchical file system, where a single file can be referenced in a number of ways:

    myFile.txt                                   # in current working dir
    ../conf/myFile.txt                           # relative to the CWD
    /apps/tomcat/conf/myFile.txt                 # absolute path using symbolic links
    /u1/local/apps/tomcat-5.5.1/conf/myFile.txt  # absolute path with no symlinks

The classic definition of the canonical representation of that file would be the last path. With local or relative paths, you cannot globally identify the resource without contextual information. With absolute paths, you can identify the resource, but you cannot tell whether two paths refer to the same entity. With two or more paths converted to their canonical forms, you can do all of the above, plus determine whether two resources are the same or not, if that matters to your application (that is, solve the aliasing problem).
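In Java this can be sketched with `Path.toRealPath()`, which resolves `.`, `..` and symbolic links for an existing file. The example below is a minimal sketch: the class name, the `sameFile` helper and the temporary directory layout are all hypothetical, and it only demonstrates the `..` case, not symlinks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CanonicalPathDemo {
    // Two paths refer to the same file if their canonical (real) paths are equal.
    // toRealPath() requires the file to exist.
    static boolean sameFile(Path a, Path b) throws IOException {
        return a.toRealPath().equals(b.toRealPath());
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway layout: <tmp>/conf/myFile.txt
        Path dir = Files.createTempDirectory("canonical-demo");
        Path conf = Files.createDirectory(dir.resolve("conf"));
        Path direct = Files.createFile(conf.resolve("myFile.txt"));

        // A second, roundabout spelling of the same location.
        Path roundabout = conf.resolve("..").resolve("conf").resolve("myFile.txt");

        System.out.println(direct.equals(roundabout));    // false: compared as written
        System.out.println(sameFile(direct, roundabout)); // true: compared in canonical form
    }
}
```

`java.io.File.getCanonicalPath()` and `Files.isSameFile()` are older or more direct alternatives for the same question.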

Note that being the canonical form is not a quality of that particular form itself; there can be multiple possible canonical forms for a given type such as file paths (say, the lexicographically first of all possible absolute paths). One form is simply selected as the canonical form for a particular application reason, or perhaps arbitrarily, so that everyone speaks the same language.

Forcing objects into their canonical instances is the same basic idea, but instead of determining one "best" representation of a resource, it arbitrarily chooses one instance out of a class of instances with the same "content" as the canonical reference, then converts all references to equivalent objects to use that one canonical instance.

This can be used as a technique for optimizing both time and space. If there are multiple instances of equivalent objects in an application, then by forcing them all to resolve to the single canonical instance for a particular value, you can eliminate all but one of each value, saving space and possibly time, since you can now compare those values by reference identity (==) as opposed to object equivalence (the equals() method).

A classic example of optimizing performance with canonical instances is collapsing strings with the same content. Calling String.intern() on two strings with the same character sequence is guaranteed to return the same canonical String object for that text. If you pass all your strings through that canonicalizer, you know that equivalent strings are actually identical object references, i.e. aliases.
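A small sketch of the String.intern() behavior described above (the class name is made up for the example):

```java
class InternDemo {
    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String a = new String("canonical");
        String b = new String("canonical");

        System.out.println(a == b);                   // false: two different objects
        System.out.println(a.equals(b));              // true: same character sequence
        System.out.println(a.intern() == b.intern()); // true: one canonical instance
    }
}
```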

Enum types in Java 5.0+ force all instances of a particular enum value to use the same canonical instance within a VM, even if the value is serialized and deserialized. That is why you can use if (day == Days.SUNDAY) with impunity in Java if Days is an enum type. Doing this for your own classes is certainly possible, but takes care. Read Effective Java by Josh Bloch for details and advice.
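The serialization claim can be checked with a short sketch. The `Days` enum mirrors the name used above; the `roundTrip` helper and the class name are hypothetical. A plain String is included for contrast, since deserializing it produces a new object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class EnumIdentityDemo {
    enum Days { SUNDAY, MONDAY }

    // Serialize a value to an in-memory buffer and read it back.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // The enum constant comes back as the very same instance.
        System.out.println(roundTrip(Days.SUNDAY) == Days.SUNDAY); // true

        // An ordinary object comes back as an equal but distinct copy.
        String text = new String("SUNDAY");
        Object copy = roundTrip(text);
        System.out.println(copy == text);      // false
        System.out.println(copy.equals(text)); // true
    }
}
```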

+50
Dec 12 '08 at 3:46 a.m.

The word "canonical" is simply a synonym for "standard" or "usual". It has no Java-specific meaning.

+20
Nov 11 '08 at 5:48

A good example for understanding "canonical form/representation" is the XML Schema datatype definition of "boolean":

  • the "lexical representation" of a boolean can be any one of {true, false, 1, 0}, whereas
  • the "canonical representation" can only be one of {true, false}

This essentially means that

  • "true" and "1" both map to the canonical representation "true", and
  • "false" and "0" both map to the canonical representation "false".

See the W3C XML Schema datatype definition for boolean.
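The mapping above is small enough to write out directly. This is a minimal sketch with a made-up class and method name; it ignores whitespace handling and is not taken from any XML library.

```java
class XsdBooleanDemo {
    // Map any permitted lexical form of a boolean to its canonical form.
    static String canonical(String lexical) {
        switch (lexical) {
            case "true":
            case "1":
                return "true";
            case "false":
            case "0":
                return "false";
            default:
                throw new IllegalArgumentException("Not a boolean lexical form: " + lexical);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonical("1"));     // true
        System.out.println(canonical("false")); // false

        // Two different lexical forms are equivalent once canonicalized.
        System.out.println(canonical("0").equals(canonical("false"))); // true
    }
}
```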

+17
Sep 12 '12 at 12:33

reduced to the simplest and most significant form without loss of generality

+14
Jul 29 '10 at 8:13

An easy way to remember it is the way "canonical" is used in theological circles: canonical truth is the real truth, so if two people find it, they have found the same truth. The same goes for a canonical instance. If you think you have found two of them (i.e. a.equals(b)), you really have only one (i.e. a == b). So equality implies identity in the case of canonical objects.

Now for the comparison. You have the choice of using a == b or a.equals(b), since they give the same answer for canonical instances. However, a == b is a reference comparison, which the JVM can do extremely fast because the two references are just fixed-width bit patterns (32 or 64 bits, depending on the JVM), whereas a.equals(b) is a method call and involves more overhead.
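One way to get this property for your own class is to hide the constructor and hand out instances from a pool. The sketch below is an illustration under assumptions, not the approach from Effective Java verbatim: `CanonicalPoint`, its `of` factory and the pool are hypothetical names, and the pool never releases entries, which a real implementation would need to consider.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CanonicalPoint {
    // Pool of canonical instances, keyed by value.
    private static final Map<CanonicalPoint, CanonicalPoint> POOL = new ConcurrentHashMap<>();

    private final int x;
    private final int y;

    private CanonicalPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // The only way to obtain an instance: returns the one canonical object for (x, y).
    static CanonicalPoint of(int x, int y) {
        CanonicalPoint candidate = new CanonicalPoint(x, y);
        CanonicalPoint existing = POOL.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CanonicalPoint)) return false;
        CanonicalPoint p = (CanonicalPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        CanonicalPoint a = CanonicalPoint.of(1, 2);
        CanonicalPoint b = CanonicalPoint.of(1, 2);
        System.out.println(a.equals(b)); // true
        System.out.println(a == b);      // true: equality implies identity here
    }
}
```

Because every instance comes from `of`, callers may compare with == and get the same answer as equals().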

+4
Aug 02 '12 at 11:18

Another good example might be: you have a class that supports Cartesian (x, y, z), spherical (r, theta, phi) and cylindrical (r, phi, z) coordinates. To establish equality (the equals method), you would probably want to convert all representations to one "canonical" representation of your choice, for example spherical coordinates. (Or maybe you would want to do this in general, i.e. use a single internal representation.) I am not an expert, but this occurred to me as a good concrete example.
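A minimal sketch of the "one internal representation" variant follows. It assumes Cartesian as the canonical form rather than spherical, because angles are ambiguous (phi and phi + 2π name the same point). All names are hypothetical, angles are in radians, and the comparison uses a tolerance because the conversions involve floating-point rounding; a tolerance-based check is not transitive, so it is kept separate from equals().

```java
final class Position3D {
    // Canonical internal representation: Cartesian coordinates (an assumption of this sketch).
    private final double x;
    private final double y;
    private final double z;

    private Position3D(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    static Position3D cartesian(double x, double y, double z) {
        return new Position3D(x, y, z);
    }

    // r: distance from the z axis, phi: azimuth angle.
    static Position3D cylindrical(double r, double phi, double z) {
        return new Position3D(r * Math.cos(phi), r * Math.sin(phi), z);
    }

    // r: distance from the origin, theta: polar angle from the +z axis, phi: azimuth angle.
    static Position3D spherical(double r, double theta, double phi) {
        return new Position3D(
            r * Math.sin(theta) * Math.cos(phi),
            r * Math.sin(theta) * Math.sin(phi),
            r * Math.cos(theta));
    }

    // Compare two positions in canonical form, within a tolerance.
    boolean isCloseTo(Position3D other, double eps) {
        return Math.abs(x - other.x) <= eps
            && Math.abs(y - other.y) <= eps
            && Math.abs(z - other.z) <= eps;
    }

    public static void main(String[] args) {
        Position3D a = cartesian(0, 1, 2);
        Position3D b = cylindrical(1, Math.PI / 2, 2);
        System.out.println(a.isCloseTo(b, 1e-9)); // true: same point, different input forms
    }
}
```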

+2
Jun 15 '12 at 9:11

Canonical representation can also mean treating differently styled renderings of a character as the same character: for example, if I write the letter A, another person may write the letter A in a different style :)

This is in the context of optical character recognition.

0
Sep 14 2018-10-10T00:

Canonical form means the single, natural representation of an element.

0
Feb 17 '16 at 20:22


