PostgreSQL vs MySQL: SELECT * FROM [table] performance

I have a MySQL database that I am migrating to PostgreSQL (due to GIS features).

Many of the tables contain hundreds of thousands of rows, so I need to keep performance in mind.

My problem is that PostgreSQL seems awfully slow ...

For example, if I run a simple SELECT * FROM [table] on a particular table in the MySQL database, say one with 113,000 rows, the query takes about 2 seconds to return the data. In PostgreSQL, the identical query on the same table takes almost 10 seconds.

Similarly, I have another table with fewer rows (88,000), and it is even worse: MySQL takes 1.3 seconds, while PostgreSQL takes 30 seconds!

Is this what I can expect from PostgreSQL, or is there something I can do to make it better?

My OS is Windows XP and I am running a 2.7 GHz dual core with 3 GB of RAM. The MySQL database is version 5.1 with its default settings. The PostgreSQL database is version 8.4, and I edited the configuration as follows:

shared_buffers = 128MB
effective_cache_size = 512MB
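(For reference, those are the two settings as they would appear in postgresql.conf; changing shared_buffers requires a server restart. The running values can be confirmed from any session:)

```sql
-- Confirm the settings the server is actually using:
SHOW shared_buffers;
SHOW effective_cache_size;
```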

Thanks!

Here is the structure of the second table, which contains about 88,000 rows:

CREATE TABLE nodelink
(
  nodelinkid serial NOT NULL,
  workid integer NOT NULL,
  modifiedbyid integer,
  tabulardatasetid integer,
  fromnodeid integer,
  tonodeid integer,
  materialid integer,
  componentsubtypeid integer,
  crosssectionid integer,
  "name" character varying(64) NOT NULL,
  description character varying(256) NOT NULL,
  modifiedbyname character varying(64) NOT NULL, -- Contains the values from the old engine ModifiedBy field, since they don't link with any user
  linkdiameter double precision NOT NULL DEFAULT 0, -- The diameter of the Link
  height double precision NOT NULL,
  width double precision NOT NULL,
  length double precision NOT NULL,
  roughness double precision NOT NULL,
  upstreaminvert double precision NOT NULL,
  upstreamloss double precision NOT NULL,
  downstreaminvert double precision NOT NULL,
  downstreamloss double precision NOT NULL,
  averageloss double precision NOT NULL,
  pressuremain double precision NOT NULL,
  flowtogauge double precision NOT NULL,
  cctvgrade double precision NOT NULL,
  installdate timestamp without time zone NOT NULL,
  whencreated timestamp without time zone NOT NULL,
  whenmodified timestamp without time zone NOT NULL,
  ismodelled boolean NOT NULL,
  isopen boolean NOT NULL,
  shapenative geometry,
  shapewgs84 geometry,
  CONSTRAINT nodelink_pk PRIMARY KEY (nodelinkid),
  CONSTRAINT componentsubtype_nodelink_fk FOREIGN KEY (componentsubtypeid)
      REFERENCES componentsubtype (componentsubtypeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT crosssection_nodelink_fk FOREIGN KEY (crosssectionid)
      REFERENCES crosssection (crosssectionid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fromnode_nodelink_fk FOREIGN KEY (fromnodeid)
      REFERENCES node (nodeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT material_nodelink_fk FOREIGN KEY (materialid)
      REFERENCES material (materialid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT tabulardataset_nodelink_fk FOREIGN KEY (tabulardatasetid)
      REFERENCES tabulardataset (tabulardatasetid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT tonode_nodelink_fk FOREIGN KEY (tonodeid)
      REFERENCES node (nodeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT user_nodelink_fk FOREIGN KEY (modifiedbyid)
      REFERENCES awtuser (userid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT work_modellink_fk FOREIGN KEY (workid)
      REFERENCES "work" (workid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);
ALTER TABLE nodelink OWNER TO postgres;
COMMENT ON TABLE nodelink IS 'Contains all of the data that describes a line between any two nodes.';
COMMENT ON COLUMN nodelink.modifiedbyname IS 'Contains the values from the old engine ModifiedBy field, since they don''t link with any user';
COMMENT ON COLUMN nodelink.linkdiameter IS 'The diameter of the Link';

I played around with the select statement a little. If I just do "SELECT NodeLinkID FROM NodeLink", the query runs much faster: less than a second to get all 88,000 rows. If I do "SELECT * FROM NodeLink", the query takes a long time, about 8 seconds. Does this shed any light on what I'm doing wrong?
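One way to see where the time goes (a sketch, using the table from the question) is EXPLAIN ANALYZE, which executes the query and reports the time spent on the server only:

```sql
EXPLAIN ANALYZE SELECT * FROM nodelink;
EXPLAIN ANALYZE SELECT nodelinkid FROM nodelink;
```

If both plans report similar runtimes, the difference observed in the client is spent transferring and rendering the result rows, not executing the query.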


Other findings:

CREATE INDEX nodelink_lengthIDX ON nodelink (length);

ANALYZE nodelink;

- Query: SELECT * FROM nodelink WHERE length BETWEEN 0 AND 3.983. Total query execution time: 3109 ms. Found 10,000 rows.

- Query: SELECT nodelinkid FROM nodelink WHERE length BETWEEN 0 AND 3.983. Total query execution time: 125 ms. Found 10,000 rows.

In MySQL, the first query executes in approximately 120 ms and the second in approximately 0.02 ms.
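To check whether the range predicate is actually using the new index, EXPLAIN can be run on the same query (a sketch; note that PostgreSQL folds the unquoted index name to lowercase):

```sql
EXPLAIN SELECT nodelinkid FROM nodelink WHERE length BETWEEN 0 AND 3.983;
-- An "Index Scan using nodelink_lengthidx" node in the plan confirms the index is used;
-- a "Seq Scan on nodelink" means the planner chose to read the whole table instead.
```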



Resolution of the issue:

Well guys, it looks like it was a storm in a teacup ...

mjy is correct:

"How did you measure these timings: in your application, or in the corresponding command-line interfaces?"

To test this theory, I put together a simple console application that ran the same query against the MySQL database and the PostgreSQL database. Here are the results:

Running MySQL query: [SELECT * FROM l_model_ldata]
MySQL duration = [2.296875]
Running PGSQL query: [SELECT * FROM nodelink]
PGSQL duration = [2.875]

So the results are comparable. It seems that the pgAdmin tool that comes with PostgreSQL is simply pretty slow at displaying results. Thanks to everyone for their suggestions and help!
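The same comparison can be made without writing a console application: the psql command-line client reports per-query wall-clock time once its timing toggle is enabled (a sketch):

```sql
-- In psql:
\timing
SELECT * FROM nodelink;
-- psql then prints a "Time: ... ms" line after each query,
-- measuring execution plus transfer, without pgAdmin's rendering overhead.
```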

mjy, if you want to post this as an answer, I'll mark it as the accepted answer for future reference.

6 answers

Here's a helpful post on setting up Postgres. It has definitions and a few tips.

This performance tuning article offers a pretty decent overview with some specific optimization methods.


Did you have the GIS features in MySQL? IIRC, that means you were using MyISAM rather than a transaction-safe storage engine, which means you really aren't comparing apples to apples.

Also, is your application actually going to do this, a completely unqualified SELECT of all rows? If not, you would do better to benchmark what you are actually going to do, which will probably involve at least a WHERE clause. (Although that, of course, still isn't a fair comparison against a non-crash-safe, non-transactional engine.)


PostgreSQL uses an MVCC architecture, which means it uses a more complex format for storing data on disk than MySQL. It is slower for single-user access and faster for multi-user access.

a) Check whether your tables are vacuumed; see the VACUUM statement.
b) Use indexes. PostgreSQL has a larger repertoire of index types than MySQL, so use it: there are GiST and GIN indexes.
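For example (a sketch; the index name is made up, and the GiST index assumes PostGIS, which the geometry columns in the question imply, and only helps queries that actually filter on the geometry):

```sql
-- Reclaim space from dead rows and refresh planner statistics:
VACUUM ANALYZE nodelink;

-- A GiST index on a PostGIS geometry column, for spatial predicates:
CREATE INDEX nodelink_shapenative_gist ON nodelink USING GIST (shapenative);
```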


It looks like you are suffering from table bloat. Have you done many updates without vacuuming? Are you updating indexed columns, so that HOT updates cannot be used?

What is the output of SELECT relpages, reltuples FROM pg_class WHERE relname = 'nodelink';? This will show you how many disk pages your tuples occupy.

@Pavel: PostgreSQL certainly has more flexible indexing, but an index won't help in this case, since the query selects everything in the table.

Many of the tables contain hundreds of thousands of rows, so I need to keep performance in mind.

These are not particularly large tables ...

Is this what I can expect from PostgreSQL, or is there something I can do to make it better?

... so perhaps you are doing something else wrong.


If you have a table with hundreds, let alone hundreds of thousands, of records, what possible reason is there to run a SELECT * FROM query on it? Perhaps you should think about what data you actually need and how to fetch only the relevant rows from the database.


That is far too long for an ordinary 100,000-row table, so I suspect the problem lies with PostGIS rather than PostgreSQL. Try fetching all the rows without the shapenative and shapewgs84 columns; if that is much faster, PostGIS is probably responsible for the slowdown.
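A quick way to test this (a sketch; any representative subset of the non-geometry columns from the table definition in the question will do):

```sql
-- Everything of interest except the two geometry columns:
SELECT nodelinkid, workid, "name", description, length, installdate
FROM nodelink;
```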

