What is the difference between a catalog and a schema in a relational database?

I thought that the schema was the "top wrapper" object in front of the database itself. I mean DB.schema.<what_ever_object_name_under_schema>.

Well, the catalog "wrapper" is now quite confusing. Why do we need a catalog? For what purpose should the catalog be used?

+53
database schema
Aug 11 '11 at 8:13
2 answers

From a relational point of view:

The catalog is the place where, among other things, all of the various schemas (external, conceptual, internal) and all of the corresponding mappings (external/conceptual, conceptual/internal) are kept.

In other words, the catalog contains detailed information (sometimes called descriptive information or metadata) regarding the various objects that are of interest to the system itself.

For example, the optimizer uses catalog information about indexes and other physical storage structures, as well as much other information, to help it decide how to implement user requests. Likewise, the security subsystem uses catalog information about users and security constraints to grant or deny such requests in the first place.

An Introduction to Database Systems, 7th ed., C.J. Date, pp. 69-70.




From the SQL standard's point of view:

Catalogs are named collections of schemas in an SQL-environment. An SQL-environment contains zero or more catalogs. A catalog contains one or more schemas, but always contains a schema named INFORMATION_SCHEMA that contains the views and domains of the Information Schema.

Database Language SQL, (Proposed revised text of DIS 9075), p. 45




From an SQL platform's point of view:

The catalog is often synonymous with the database. In most SQL DBMSs, if you query the information_schema views, you will find that the values in the table_catalog column map to the name of a database.

If you find your platform using catalog in a broader way than any of these three definitions, it might be referring to something larger than a database, such as a database cluster, a server, or a server cluster. But I doubt that, since you would have found it easily in your platform's documentation.

+49
Aug 11 2018-11-11T00:

Mike Sherrill gave an excellent answer. I will add just one example: Postgres.

Cluster = Postgres installation

When you install Postgres on a machine, that installation is called a cluster. "Cluster" here is not meant in the hardware sense of several computers working together. In Postgres, cluster refers to the fact that you can have multiple unrelated databases all up and running on the same Postgres server engine.

The word cluster is also defined by the SQL standard in the same way as in Postgres. Adhering closely to the SQL standard is a primary goal of the Postgres project.

The SQL-92 specification says:

A cluster is an implementation-defined collection of catalogs.

and

Exactly one cluster is associated with an SQL-session

That is an obtuse way of saying that a cluster is a database server (each catalog is a database).

Cluster > Catalog > Schema > Table > Columns and Rows

So, in Postgres and SQL Standard, we have this containment hierarchy:

  • A computer may have one cluster or several.
  • A database server is a cluster.
  • A cluster has catalogs. (Catalog = Database)
  • Catalogs have schemas. (Schema = namespace of tables, and security boundary)
  • Schemas have tables.
  • Tables have rows.
  • Rows have values, defined by columns. These values are the business data that your applications and users care about, such as a person's name, an invoice due date, a product price, or a gamer's high score. The column defines the data type of the values (text, date, number, and so on).

[Diagram: nesting boxes showing how connecting on a port gets you to a cluster (a database server), which contains one or more catalogs (databases), each of which contains one or more schemas (namespaces), each of which contains tables, each of which has rows.]

Multiple clusters

The diagram above represents a single cluster. In the case of Postgres, you can have more than one cluster per host computer (or virtual OS). Multiple clusters are commonly run for testing and deploying new versions of Postgres (for example: 9.0, 9.1, 9.2, 9.3, 9.4, 9.5).

If you have multiple clusters, imagine the diagram above is duplicated.

Different port numbers allow multiple clusters to live side by side, all up and running at the same time. Each cluster is assigned its own port number. The usual 5432 is only the default and can be changed by you. Each cluster listens on its own assigned port for incoming database connections.
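As a rough sketch of how a client picks a cluster by its port, the helper below builds libpq-style connection URIs. The function name `connection_uri`, the host `localhost`, and the second port 5433 are assumptions made for illustration only.

```python
def connection_uri(port, database, host="localhost"):
    """Build a libpq-style URI: the port selects the cluster,
    the path selects one database (catalog) inside that cluster."""
    return f"postgresql://{host}:{port}/{database}"


# Two clusters side by side on one machine (port 5433 is an assumed choice).
print(connection_uri(5432, "postgres"))  # cluster on the default port
print(connection_uri(5433, "postgres"))  # a second cluster on its own port
```

The same database name can appear in both URIs because each cluster keeps its own independent set of catalogs.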

Scenario example

For example, a company could have two different software development teams. One writes software to manage the warehouses, while the other team builds software to manage sales and marketing. Each development team has its own database, blissfully unaware of the other's.

But the IT operations team decided to run both databases on a single computer (Linux, Mac, whatever). So on that box they installed Postgres: one database server (database cluster). In this cluster, they create two catalogs, one for each development team: one named "warehouse" and one named "sales".

Each development team uses many dozens of tables with different purposes and access roles. So each development team organizes its tables into schemas. By coincidence, both development teams do some tracking of accounting data, so each team happens to have a schema named "accounting". Using the same schema name is not a problem, because each catalog has its own namespace, so there is no collision.

Furthermore, each team eventually creates a table for accounting purposes named "ledger". Again, no naming collision.

You can imagine this example as a hierarchy ...

  • Computer (hardware box or virtualized server)
    • Postgres 9.2 cluster (installation)
      • warehouse catalog (database)
        • inventory schema
          • [... some tables]
        • accounting schema
          • ledger table
          • [... some other tables]
      • sales catalog (database)
        • selling schema
          • [... some tables]
        • accounting schema (coincidentally the same name as above)
          • ledger table (coincidentally the same name as above)
          • [... some other tables]
    • Postgres 9.3 cluster
      • [... other schemas and tables]
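
To make the "no collision" point concrete, here is a minimal sketch that models the 9.2 cluster from the example as nested Python dictionaries. It is only an illustrative model, not anything Postgres provides; the table names `stock` and `orders` are hypothetical stand-ins for the elided tables.

```python
# Illustrative model of one cluster: catalog -> schema -> list of tables.
# "stock" and "orders" are hypothetical placeholders for the elided tables.
cluster = {
    "warehouse": {
        "inventory": ["stock"],
        "accounting": ["ledger"],
    },
    "sales": {
        "selling": ["orders"],
        "accounting": ["ledger"],
    },
}


def qualified_names(cluster):
    """Return every table as a fully qualified catalog.schema.table name."""
    return [
        f"{catalog}.{schema}.{table}"
        for catalog, schemas in cluster.items()
        for schema, tables in schemas.items()
        for table in tables
    ]


names = qualified_names(cluster)
for name in names:
    print(name)

# Both catalogs hold accounting.ledger, yet every qualified name is unique.
assert len(names) == len(set(names))
```

Because the catalog is part of the full name, the two `accounting.ledger` tables never clash.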

Each development team's software makes a connection to the cluster. When doing so, it must specify which catalog (database) is theirs. Postgres requires that you connect to one catalog. That initial catalog is the default, used when your SQL statements omit the catalog name.

So if a development team ever needs access to the other team's tables, they can get it if the database administrator has granted them privileges to do so. Access is named explicitly in the pattern catalog.schema.table. So if the warehouse team needs to see the other team's (the sales team's) ledger, they write SQL statements with sales.accounting.ledger. To access their own ledger, they merely write accounting.ledger. If they access both ledgers in the same piece of source code, they may choose to avoid confusion by including their own (optional) catalog name, warehouse.accounting.ledger versus sales.accounting.ledger. Note that Postgres itself accepts a three-part name only when the catalog is the database you are connected to; reaching a table in another database requires a facility such as postgres_fdw or dblink.
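
The naming rule above can be sketched in a few lines of plain Python. This models only how an omitted catalog name falls back to the one chosen at connection time; the function name `qualify` is hypothetical and is not part of any database API, and the sketch says nothing about whether a given DBMS permits the cross-catalog access.

```python
def qualify(name, default_catalog):
    """Expand schema.table or catalog.schema.table into a 3-part tuple."""
    parts = name.split(".")
    if len(parts) == 2:
        # Catalog omitted: fall back to the catalog named at connection time.
        return (default_catalog, *parts)
    if len(parts) == 3:
        return tuple(parts)
    raise ValueError(f"expected schema.table or catalog.schema.table: {name!r}")


# The warehouse team connected with "warehouse" as its catalog.
print(qualify("accounting.ledger", "warehouse"))
print(qualify("sales.accounting.ledger", "warehouse"))
```

Writing `warehouse.accounting.ledger` explicitly yields the same tuple as the short form, which is why the catalog prefix is optional for a team's own tables.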




By the way ...

You may hear the word schema used in a more general sense, meaning the entire design of a particular database's table structure. By contrast, in the SQL standard the word means specifically that particular level in the Cluster > Catalog > Schema > Table hierarchy.

Postgres uses both the word database and the word catalog in various places, such as the CREATE DATABASE command.

Not all database systems provide this full hierarchy of Cluster > Catalog > Schema > Table. Some have only a single catalog (database). Some have no schema, just one set of tables. Postgres is an exceptionally powerful product.

+107
Jul 30 '13 at 9:56 on