Enable query cache in PostgreSQL for better performance

My application puts a heavy load on the database, and I'm trying to reduce it. I use PostgreSQL as the RDBMS and Python as the programming language. I already use caching in the application: a server-side cache and a browser cache. I am now trying to set up a PostgreSQL query cache that matches the characteristics of the queries running on the server.

Questions:

  • Can I fine tune the query cache at the database level?
  • Can I fine tune the query cache based on a table?
  • Can you recommend a tutorial for learning about the query cache in PostgreSQL?
python database caching postgresql
2 answers

Tuning PostgreSQL involves much more than just setting up caches. The primary high-level settings are shared_buffers (think of it as the main cache for data and indexes) and work_mem.

shared_buffers helps with both reads and writes. You want to give it a decent size, but it applies to the entire cluster: you cannot configure it per table, let alone per query. Importantly, it does not store query results; it stores table pages, index pages, and other data. In an ACID-compliant database, caching query results is not particularly efficient or useful anyway.
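As a minimal sketch of how this cluster-wide setting is inspected and changed (the '4GB' value is only an illustrative starting point, often quoted as roughly 25% of RAM, not a recommendation for your workload):

```sql
-- Inspect the current setting (cluster-wide; cannot be set per table or per query)
SHOW shared_buffers;

-- ALTER SYSTEM writes the value to postgresql.auto.conf;
-- alternatively, edit postgresql.conf directly.
ALTER SYSTEM SET shared_buffers = '4GB';
-- shared_buffers only takes effect after a server restart,
-- e.g.: pg_ctl restart -D /path/to/data
```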

work_mem is used for sorting query results in memory, so the query does not have to spill to disk. Depending on your queries, this setting can be as important as the buffer cache, and it is easier to configure: before executing a query that performs a large sort, you can run a SET command such as "SET work_mem = '256MB';".
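A sketch of scoping the setting to a single transaction, so only the expensive sort gets the extra memory (the `orders` table and its columns are hypothetical, purely for illustration):

```sql
BEGIN;
-- SET LOCAL applies only until COMMIT/ROLLBACK; plain SET lasts for the session
SET LOCAL work_mem = '256MB';

SELECT customer_id, sum(amount) AS total   -- hypothetical table and columns
FROM orders
GROUP BY customer_id
ORDER BY total DESC;

COMMIT;
```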

As others have suggested, you can figure out WHY a query is slow using EXPLAIN. I would personally suggest studying the access path PostgreSQL uses to reach your data. That is more involved but, frankly, a better use of your time than just thinking about "caching results".
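A sketch of inspecting the access path this way (again with a hypothetical `orders` table; the ANALYZE option actually executes the query):

```sql
-- ANALYZE runs the query and reports real row counts and timings;
-- BUFFERS shows shared-buffer hits versus reads from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders                    -- hypothetical table
WHERE customer_id = 42;
-- In the output, "Seq Scan" vs "Index Scan" reveals the access path chosen,
-- and a line like "Sort Method: external merge  Disk: ..." indicates that
-- work_mem was too small for an in-memory sort.
```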

You can honestly get bigger improvements from data design, using features such as partitioning, functional (expression) indexes, and other techniques.
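A brief sketch of both techniques, using hypothetical `users` and `orders` tables (declarative partitioning requires PostgreSQL 10 or later):

```sql
-- Expression (functional) index: lets WHERE lower(email) = '...' use an index
CREATE INDEX idx_users_email_lower ON users (lower(email));

-- Range partitioning: queries filtering on order_date can skip whole partitions
CREATE TABLE orders (
    order_id   bigint,
    order_date date NOT NULL,
    amount     numeric
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
```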

Another thing is that you can improve performance by writing better queries. For example, WITH clauses (CTEs) can prevent the PostgreSQL optimizer from fully optimizing a query. The optimizer itself also has parameters that can be adjusted so that the database spends more (or less) time optimizing a query before it is executed, which can make a difference.
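To make this concrete: in PostgreSQL versions before 12, a WITH clause always acted as an optimization fence; since 12 you can control inlining explicitly, and planner parameters such as `join_collapse_limit` trade planning time for plan quality. A sketch (the `orders` table is hypothetical):

```sql
-- PostgreSQL 12+: NOT MATERIALIZED lets the planner inline the CTE into the
-- outer query instead of treating it as an optimization fence.
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM orders                     -- hypothetical table
    WHERE order_date > now() - interval '7 days'
)
SELECT count(*) FROM recent WHERE amount > 100;

-- Let the planner consider more join orderings (default is 8):
SET join_collapse_limit = 12;
```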

You can also write queries in ways that help the optimizer. One such technique is using bind variables (parameter placeholders): the optimizer then sees the same statement again and again with different data passed in, so the query structure does not need to be evaluated repeatedly. Query plans can be cached this way.
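In plain SQL this corresponds to prepared statements (most drivers do the equivalent automatically when you use placeholders); a sketch with a hypothetical `orders` table:

```sql
-- PREPARE parses and analyzes the statement once; each EXECUTE reuses it.
-- After several executions PostgreSQL may switch to a cached generic plan.
PREPARE orders_by_customer (int) AS
    SELECT * FROM orders WHERE customer_id = $1;   -- hypothetical table

EXECUTE orders_by_customer(42);
EXECUTE orders_by_customer(43);

DEALLOCATE orders_by_customer;   -- prepared statements are per-session
```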

Without seeing some of your queries, your table and index design, and the EXPLAIN output, it is difficult to give a specific recommendation.

In general, you need to find the queries that are not as efficient as you expect and figure out where the contention occurs. It is probably disk access, but the underlying reason is ultimately the most important part: do I have to go to disk to sort? Did the planner choose the wrong access path, so that it reads data which could easily have been filtered out at an earlier stage of the query?

I have been an Oracle certified administrator for over 20 years, and PostgreSQL is definitely different, but many of the same techniques apply when diagnosing query performance problems. Although PostgreSQL does not let you provide optimizer hints, you can still rewrite queries or tune certain parameters to get better performance; in general, I have found PostgreSQL easier to tune in the long run. If you can provide some details, such as a query and its EXPLAIN output, I would be happy to give you specific recommendations. Unfortunately, "cache tuning" alone is unlikely to provide the speed you are looking for.


I once developed a system that cached query results to speed up data served by a web application. Below is essentially what it did:

The following are the generic tables and cache-handling functions.

 CREATE TABLE cached_results_headers (
     cache_id    serial NOT NULL PRIMARY KEY,
     date        timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
     last_access timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
     relid       regclass NOT NULL,
     query       text NOT NULL,
     rows        int NOT NULL DEFAULT 0
 );

 CREATE INDEX ON cached_results_headers (relid, md5(query));

 CREATE TABLE cached_results (
     cache_id int NOT NULL,
     row_no   int NOT NULL
 );

 CREATE OR REPLACE FUNCTION f_get_cached_results_header (
     p_cache_table     text,
     p_source_relation regclass,
     p_query           text,
     p_max_lifetime    interval,
     p_clear_old_data  interval
 ) RETURNS cached_results_headers AS
 $BODY$
 DECLARE
     _cache_id int;
     _rows     int;
 BEGIN
     IF p_clear_old_data IS NOT NULL THEN
         DELETE FROM cached_results_headers
         WHERE date < CURRENT_TIMESTAMP - p_clear_old_data;
     END IF;

     _cache_id := cache_id
         FROM cached_results_headers
         WHERE relid = p_source_relation
           AND md5(query) = md5(p_query)   -- indexed md5 lookup, then exact match
           AND query = p_query
           AND date > CURRENT_TIMESTAMP - p_max_lifetime;

     IF _cache_id IS NULL THEN
         INSERT INTO cached_results_headers (relid, query)
         VALUES (p_source_relation, p_query)
         RETURNING cache_id INTO _cache_id;

         -- p_query is expected to return a single composite (row-typed) column
         EXECUTE $$ INSERT INTO $$ || p_cache_table || $$
             SELECT $1, row_number() OVER (), q.*
             FROM ($$ || p_query || $$) q $$
         USING _cache_id;

         GET DIAGNOSTICS _rows = ROW_COUNT;
         UPDATE cached_results_headers SET rows = _rows
         WHERE cache_id = _cache_id;
     ELSE
         UPDATE cached_results_headers SET last_access = CURRENT_TIMESTAMP
         WHERE cache_id = _cache_id;
     END IF;

     RETURN (SELECT h FROM cached_results_headers h WHERE cache_id = _cache_id);
 END;
 $BODY$ LANGUAGE plpgsql SECURITY DEFINER;

Below is an example of using the tables and functions above for a view named my_view, with a key field that is filtered over a range of integer values. Replace my_view with your own table, view, or function, and adjust the filtering parameters as needed.

 -- create a query with your data, with one of the integer columns
 -- in the result serving as the "key" to filter by
 CREATE VIEW my_view AS SELECT ...;

 CREATE TABLE cached_results_my_view (
     row my_view NOT NULL,
     PRIMARY KEY (cache_id, row_no),
     FOREIGN KEY (cache_id) REFERENCES cached_results_headers ON DELETE CASCADE
 ) INHERITS (cached_results);

 CREATE OR REPLACE FUNCTION f_get_my_view_cached_rows (
     p_filter1  int,
     p_filter2  int,
     p_row_from int,
     p_row_to   int
 ) RETURNS SETOF my_view AS
 $BODY$
 DECLARE
     _cache_id int;
 BEGIN
     -- cache for 15 minutes max since creation time;
     -- delete all cached data older than 1 day
     _cache_id := cache_id
         FROM f_get_cached_results_header(
             'cached_results_my_view',
             'my_view'::regclass,
             'SELECT r FROM my_view r WHERE key BETWEEN ' || p_filter1::text ||
                 ' AND ' || p_filter2::text || ' ORDER BY key',
             '15 minutes'::interval,
             '1 day'::interval);

     RETURN QUERY
         SELECT (row).*
         FROM cached_results_my_view
         WHERE cache_id = _cache_id
           AND row_no BETWEEN p_row_from AND p_row_to
         ORDER BY row_no;
 END;
 $BODY$ LANGUAGE plpgsql;

Example: extract rows 1 to 2000 from the cached my_view results filtered by key BETWEEN 30044 AND 10610679. The first run executes the query, caches its results in the cached_results_my_view table, and returns the first 2000 records. Run it again soon afterwards, and the results are read directly from the cached_results_my_view table without re-executing the query.

 SELECT * FROM f_get_my_view_cached_rows(30044, 10610679, 1, 2000); 
