Working with Large Datasets in Postgres
Let’s say you want to run a BIG query with gigabytes of results. How do you do it?
Plain Query
With a plain query, you load the entire result set into client RAM. If that's even possible, you are either risking the Out-Of-Memory killer terminating your process, or you are oversizing your clients with far more RAM than they normally need.
Server-Side Cursor (WITHOUT HOLD)
The first thing you find after searching for better ways of doing this is server-side cursors, which sound like exactly what you need. You start using them in the default mode, WITHOUT HOLD. Now you've got a nice cursor you can iterate through, loading only a portion of the result set into client RAM at a time.
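Here's a minimal sketch of the pattern (the events table and cursor name are just illustrative):

```sql
BEGIN;

-- WITHOUT HOLD is the default; the cursor only exists inside this transaction.
DECLARE big_cursor CURSOR FOR
    SELECT * FROM events;

-- Pull rows down in chunks; repeat until FETCH returns no rows.
FETCH FORWARD 1000 FROM big_cursor;

CLOSE big_cursor;
COMMIT;
```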
But because cursors created WITHOUT HOLD only exist for the duration of the transaction, the transaction must be kept open. If the result set takes a while to work through, the transaction might be open for several minutes or even hours.
In Postgres, an UPDATE writes an entirely new row version to replace the old one, and a DELETE only marks rows for deletion at the time of the query. Later, it is the job of the autovacuum workers to actually remove those “dead” rows which are no longer visible to any transaction.
When a transaction is held open for a long time to use the cursor, the autovacuum workers are unable to do their cleanup work. They slowly scan entire tables but find only “dead” rows that must remain visible to the long-running transaction. They end up constantly retrying, consuming entire cores checking again and again until the transaction ends.
Because the vacuum operations cannot do their cleanup work, tables become bloated with a high proportion of “dead” rows, taking up more space than needed and slowing down sequential access.
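If you suspect this is happening, one way to find the culprit is pg_stat_activity. A sketch of a query that lists sessions ordered by how long their current transaction has been open:

```sql
-- List sessions by transaction age; very old transactions block vacuum cleanup.
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;
```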
I’m sure they have their uses, but because a transaction must be held open, with all the negative effects that entails, I avoid using cursors WITHOUT HOLD.
Server-Side Cursor (WITH HOLD)
Since long-running transactions are bad news, what about creating the cursor WITH HOLD and ending the transaction? I’m so glad you asked.
“In the current implementation, the rows represented by a [WITH HOLD] cursor are copied into a temporary file or memory area so that they remain available for subsequent transactions.” — Postgres DECLARE
When the transaction ends with COMMIT, the entire result set must be computed and written to disk, which may take some time. The heavy write IO can delay read IO and cause a spike in SELECT query latency.
With this method, the cursor’s lifetime changes from being contained within the transaction to lasting until either a CLOSE command or the end of the connection. With pooling and connection reuse, that may be a long time, so it’s important that application code reliably sends the CLOSE command to release cursor resources, even in the face of errors. It’s also not possible to monitor open cursors across sessions to verify CLOSE is used (the pg_cursors view only shows the current session’s cursors).
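Putting it together, a sketch of the WITH HOLD flow (names are illustrative):

```sql
BEGIN;
DECLARE big_cursor CURSOR WITH HOLD FOR
    SELECT * FROM events;
-- COMMIT materializes the full result set to a temporary file or memory area.
COMMIT;

-- The cursor now survives across transactions; fetch at your leisure.
FETCH FORWARD 1000 FROM big_cursor;

-- Must be sent explicitly (even on error paths), or the cursor's resources
-- are held until the connection ends.
CLOSE big_cursor;
```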
Server-side cursors WITH HOLD are a good option if you understand these caveats.
Client-Side Cursors / Multiple Queries
If you can tolerate data being added or removed from the result set by other clients, breaking up the large query into multiple small ones may be your best bet.
For example, if you’re doing a large scan on a table with a numeric id, consider using WHERE id > ? ORDER BY id LIMIT 100, where the client keeps track of the last id it saw. If the sort column has duplicates, use FETCH FIRST … ROWS WITH TIES instead of LIMIT so that rows sharing the boundary value aren’t skipped, as sketched below.
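A sketch of both variants (table and column names are illustrative; $1 is the boundary value the client tracks between queries):

```sql
-- Keyset pagination on a unique numeric id; start with $1 = 0 and set it to
-- the last id seen after each batch.
SELECT *
FROM events
WHERE id > $1
ORDER BY id
LIMIT 100;

-- If the sort column has duplicates, WITH TIES (Postgres 13+) returns every
-- row tied with the last one, so the strict '>' on the next query skips nothing.
SELECT *
FROM events
WHERE created_at > $1
ORDER BY created_at
FETCH FIRST 100 ROWS WITH TIES;
```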
Another option is combining OFFSET and LIMIT to grab any part of a re-computed result set, but modifications to the underlying data can cause missed or duplicate rows to be sent to the client. If you do this you MUST use an ORDER BY clause, otherwise you will get inconsistent results.
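For example (illustrative names):

```sql
-- Third page of 100 rows. ORDER BY makes the paging deterministic, but
-- concurrent inserts/deletes can still shift rows between pages.
SELECT *
FROM events
ORDER BY id
LIMIT 100 OFFSET 200;
```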
If you have very wide result rows, another trick is to retrieve the primary keys of the entire result set up front, then fetch the rest of each row as needed.
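A sketch of that two-phase approach (names are illustrative):

```sql
-- Phase 1: fetch only the keys of the full result set; this stays small
-- even when the full rows are very wide.
SELECT id FROM events WHERE status = 'pending';

-- Phase 2: hydrate the wide rows in batches, where $1 is an array holding
-- the next batch of ids kept by the client.
SELECT * FROM events WHERE id = ANY($1);
```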
Conclusion
Overall, it’s best to change your access pattern so you can iteratively query the database for more data as needed. If that’s not possible, consider using server-side cursors WITH HOLD.
Notes
Thank you to David Orr for reviewing this post.