That Guy From Delhi: workaround


28 Apr 2024

Boost Database Security: Restrict Users to Read Replicas

When you're working with large databases in production, it is very common to use read-replicas to improve performance. A read-replica is a copy of your primary (main) database that lets your applications offload read-heavy queries, which in turn reduces strain on the primary and makes the application faster and snappier.

Sometimes, you may want to restrict specific database users so they can connect ONLY to these read-replicas, and not to the primary database server. This can be tricky to implement, since any permissions configured for this use-case, whether at the user, database, schema or even table level, are quickly replicated to the read-replicas too. A restriction applied on the primary therefore applies identically on every replica, and would not work as expected.

This guide shows how to configure a database user that can log in successfully only on a read-replica. The only requirement is to enable the pg_tle extension [3] on your PostgreSQL database. This is simple to do on your Ubuntu-based laptop (see how to do that here [2]) or on virtual machines offered by your favourite cloud provider. Furthermore, you could write your login rules in PL/pgSQL, PL/v8 or even PL/Rust (see here [1]).

Why Restrict Access?

There are many good reasons for restricting users to read-replicas:

Performance: You can dedicate your primary database server to handling write operations (like updating data), ensuring those operations happen as fast as possible.
Reporting / Analytics: Production environments often have dedicated users for ancillary tasks, such as monitoring, reporting dashboards, read-only tenants etc. Restricting these database users to read-replicas helps reduce extra load on the primary database.
Security: In some cases, granting direct access to the primary database might be considered a security risk. Further, you may not be able to enforce login hygiene for all your database users, and a lockdown mechanism that rejects logins by those users on the primary then becomes crucial for application rollout.

Prerequisites

An existing PostgreSQL database instance with at least one read-replica.
- You could also try this on your own Postgres database with pg_tle extension. Read here [2] for more on how to install pg_tle on your Ubuntu system.
Basic understanding of users and permissions within a database.

Steps

Identify Target Database and Users: First we need to decide which users (and which database) are to be restricted to read-replica logins. In the example below, we restrict the user standby_only_user so that it can log in only to Standbys / Read-Replicas on the database prod_db.

psql <<SQL
  \c prod_db
  CREATE EXTENSION pg_tle;
SQL

Ensure that shared_preload_libraries is properly set to allow pg_tle. Also make sure that the pgtle.clientauth_db_name is appropriately set to the desired database (here prod_db):

cat <<EOL >> data/postgresql.conf
  shared_preload_libraries='pg_tle'
  pgtle.enable_clientauth=require
  pgtle.clientauth_db_name=prod_db
  pgtle.clientauth_users_to_skip=robins
  pgtle.clientauth_databases_to_skip=''
EOL
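Before restarting, it can help to confirm that the settings actually landed in the config file. Here is a minimal sketch, assuming the file uses the name=value form shown above; missing_pgtle_settings is a hypothetical helper name, and it only checks that each setting is present, not that its value is sensible.

```shell
# Hypothetical helper: print each required pg_tle setting that is missing
# from the given config file. Assumes "name = value" lines; commented-out
# lines (starting with #) do not count as present.
missing_pgtle_settings() {
  conf_file="$1"
  for name in shared_preload_libraries pgtle.enable_clientauth pgtle.clientauth_db_name; do
    # Split each line on "=", strip whitespace from the key, compare exactly.
    if ! awk -F= -v want="$name" '
          { key = $1; gsub(/[[:space:]]/, "", key); if (key == want) found = 1 }
          END { exit !found }' "$conf_file"; then
      echo "$name"
    fi
  done
}
```

Running missing_pgtle_settings data/postgresql.conf should print nothing once all three settings are in place. Keep in mind that a change to shared_preload_libraries only takes effect after a server restart.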

Secret Sauce:

Next we create the key pg_tle function, which lets the user standby_only_user log in only if the server is a standby / read-replica:

SELECT pgtle.install_extension (
  'standbyusercheck',
  '1.0',
  'Allow some users to login only to standby / read-replicas',
$_pgtle_$
  CREATE SCHEMA standbycheck_schema;

  REVOKE ALL ON SCHEMA standbycheck_schema FROM PUBLIC;
  GRANT USAGE ON SCHEMA standbycheck_schema TO PUBLIC;

  CREATE OR REPLACE FUNCTION standbycheck_schema.standbycheck_hook(port pgtle.clientauth_port_subset, status integer)
  RETURNS void AS $$
    DECLARE
      is_standby bool := TRUE;
    BEGIN
      IF port.user_name = 'standby_only_user' THEN
        SELECT pg_is_in_recovery()
          INTO is_standby;
        IF is_standby THEN
          RAISE NOTICE 'User allowed to login';
        ELSE
          RAISE EXCEPTION 'User can only login to Standby / Read-Replicas';
        END IF;
      END IF;
    END
  $$ LANGUAGE plpgsql SECURITY DEFINER;

  GRANT EXECUTE ON FUNCTION standbycheck_schema.standbycheck_hook TO PUBLIC;
  SELECT pgtle.register_feature('standbycheck_schema.standbycheck_hook', 'clientauth');
  REVOKE ALL ON SCHEMA standbycheck_schema FROM PUBLIC;
$_pgtle_$
);

Now that the extension is registered with pg_tle, CREATE EXTENSION creates the function and binds it to future login attempts.

CREATE EXTENSION standbyusercheck;
SHOW pgtle.clientauth_db_name;

Test Connection:

Attempting to connect as a privileged user (here robins) to either the primary or a read-replica should succeed.

Logging into Replica as robins
 login  | current_database | pg_is_in_recovery
--------+------------------+-------------------
 robins | prod_db          | t
(1 row)

Logging into Primary as robins
 login  | current_database | pg_is_in_recovery
--------+------------------+-------------------
 robins | prod_db          | f
(1 row)

However, the user standby_only_user should NOT be able to log in to the primary.

Logging into Primary as standby_only_user
psql: error: connection to server at "localhost" (127.0.0.1), port 6432 failed: FATAL:  User can only login to Standby / Read-Replicas

On any read-replica, however, the same user (standby_only_user) should be able to log in.

Logging into Replica as standby_only_user
       login       | current_database | pg_is_in_recovery
-------------------+------------------+-------------------
 standby_only_user | prod_db          | t
(1 row)
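The four checks above can be scripted as a small smoke test. This is only a sketch: check_login is a hypothetical helper, and it assumes psql can authenticate without prompting (for example via ~/.pgpass). Note that it treats any failed connection as a denial, so a wrong password or an unreachable host looks the same as the hook rejecting the login.

```shell
# Hypothetical smoke test for the login restriction.
# Usage: check_login HOST PORT USER EXPECTED   (EXPECTED is "allow" or "deny")
check_login() {
  host="$1"; port="$2"; user="$3"; expected="$4"
  # -A -t -c: unaligned, tuples-only output of a single query.
  # psql exits non-zero when the connection is refused.
  if psql -h "$host" -p "$port" -U "$user" -d prod_db \
          -Atc 'SELECT pg_is_in_recovery()' >/dev/null 2>&1; then
    actual=allow
  else
    actual=deny
  fi
  if [ "$actual" = "$expected" ]; then
    echo "ok: $user@$host:$port $actual"
  else
    echo "FAIL: $user@$host:$port expected $expected, got $actual"
    return 1
  fi
}
```

With the primary on port 6432 as in the error message above, check_login localhost 6432 standby_only_user deny should report ok, while the same call against your replica's port with allow should also report ok.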

Other important aspects of this feature

You could force clientauth for all logins by setting the parameter pgtle.enable_clientauth = require
You could configure some users to always be allowed to log in to either the Primary or a Read-replica in cases of emergency, by adding them to pgtle.clientauth_users_to_skip. Ideally you would want to add your admin database roles to this list.
Orthogonally, you could configure some databases to always skip clientauth by setting the pgtle.clientauth_databases_to_skip parameter.
Note that both clientauth_users_to_skip and clientauth_databases_to_skip can be used together. This is a good way to ensure that some set of database users (and some databases) are exempt from such a login restriction.
If pgtle.enable_clientauth is set to on or require and the database named in pgtle.clientauth_db_name is not configured correctly, postgres complains with the message FATAL: pgtle.enable_clientauth is set to require, but pg_tle is not installed or there are no functions registered with the clientauth feature. This is a good engine check, helping us avoid basic misconfigurations.
If you're anticipating connection storms, you can also increase the workers (that would help enforce the login restriction) by setting the pgtle.clientauth_num_parallel_workers parameter to greater than 1.

Conclusion

By following the above steps, you've now successfully configured your PostgreSQL environment to restrict certain users to only login to the read-replicas. This helps not just optimize your database performance, but also bolster security.

Let me know if you'd like to explore more advanced scenarios or discuss IAM integration for fine-grained access control!

Reference

[1] Clientauth Hook Documentation - https://github.com/aws/pg_tle/blob/main/docs/04_hooks.md
[2] Install pg_tle On Ubuntu - https://www.thatguyfromdelhi.com/2024/04/installing-pgtle-on-ubuntu-quick-guide.html
[3] Unlock PostgreSQL Super Powers with pg_tle - https://www.thatguyfromdelhi.com/2024/04/unlock-postgresql-superpowers-with-pgtle.html

12 Aug 2017

Redshift support for psql

I'm sure you know that psql doesn't go out of its way to support Postgres forks natively. I obviously understand the reasoning, and it left a gap that I could fill here.

The existing psql features that work with a Postgres fork (like Redshift) work entirely because it is a fork of Postgres. Since I use psql heavily at work, last week I decided to begin maintaining a Postgres fork that better supports (Postgres forks, but initially) Redshift. As always, unless explicitly mentioned, this is entirely an unofficial effort.

The 'redshift' branch of this Postgres code-base, is aimed at supporting Redshift in many ways:

Support Redshift related artifacts

Redshift specific SQL Commands / variations
Redshift Libraries

Support AWS specific artifacts

For e.g. AWS Regions

Support Redshift specific changes

For e.g. "\d table" etc.

The idea is:

Maintain this branch for the long-term

At least as long as I have an accessible Redshift cluster

Down the line look at whether other Postgres forks (for e.g. RDS Postgres) need such special attention

Although nothing much stands out yet

Except some rare exceptions like this or this, which do need to go through an arduous long wait / process of refinement.

Change the default port to 5439 (or whatever the flavour supports)

...with an evil grin ;)

Additionally, as far as possible:

Keep submitting Postgres related patches back to Postgres master
Keep this branch up to date with Postgres master

Update (31st August 2017)

Currently this branch supports most Redshift specific SQL commands such as

CREATE LIBRARY
CREATE TABLE (DISTKEY / DISTSTYLE / ...)
Returns non-SQL items like

ENCODINGs (a.k.a. Compressions like ZSTD / LZO etc )
REGIONs (for e.g. US-EAST-1 etc.)

Of course, some complex variants (for e.g. GRANT SELECT, UPDATE ON ALL TABLES IN SCHEMA TO GROUP xxx) don't automatically come up with the tab-complete feature. This is primarily because psql's tab-completion isn't powerful enough to cater to all such scenarios, which in turn is because it isn't a full-fledged parser to begin with.
In a nutshell, this branch is now in a pretty good shape to auto-complete the most common Redshift specific SQL Syntax.
The best part is that this still merges perfectly with Postgres mainline!

Let me know if you find anything that needs inclusion, or if I missed something.

====================================

$ psql -U redshift_user -h localhost -E -p 5439 db
psql (client-version:11devel, server-version:8.0.2, engine:redshift)
Type "help" for help.

db=#

3 Aug 2017

Reducing Wires

Recently got an additional monitor for my workstation@home and found that the following wires were indispensable:

USB Mouse
Monitor VGA / HDMI / DVI cable
USB Hub cable (Pen Drive etc.)

I was lucky that this ($20 + used) Dell monitor was an awesome buy since it came with a Monitor USB Hub (besides other goodies such as vertical rotate etc).

After a bit of rejigging, this is how things finally panned-out:

1 USB Wire (from the laptop) for the MUH (Monitor USB Hub)

This is usually something like this.

Use a USB->DVI converter and use that to connect MUH -> Monitor DVI port

This is usually something like this.

Plug USB Mouse to MUH
With things working so well, I also plugged a Wireless Touchpad dongle to the MUH

So now when I need to do some office work, connecting 1 USB wire gets me up and running!

#LoveOneWires :)

Now only if I could find a stable / foolproof Wireless solution here ;)

21 Jul 2017

Using generate_series() in Redshift

Considering that Redshift clearly states that it doesn't support (the commonly used postgres function) generate_series(), it gets very frustrating if you just want to fill a table with a lot of rows and can't without a valid data-source.

Solution (Generates a billion integers on my test-cluster):

--INSERT INTO tbl
WITH x AS (
SELECT 1
FROM stl_connection_log a, stl_connection_log b, stl_connection_log c
-- LIMIT 100
)
SELECT row_number() over (order by 1) FROM x;

For a Redshift server with even a basic level of login activity, this should generate enough rows. For example, on my test cluster, where I am the only user, this currently generates 4034866688 (about 4 billion) rows :) !

Interestingly, despite what the documentation says, generate_series() actually does work on Redshift:

# select b from generate_series(1,3) as a(b);
┌───┐
│ b │
├───┤
│ 1 │
│ 2 │
│ 3 │
└───┘
(3 rows)

The reason why this wouldn't let you insert any rows into your table, though, is that this is a Leader-Node-Only function, whereas INSERTs (on any Redshift cluster that isn't single-node) are run on the Compute Nodes (which don't know about this function).

The reason why the above works, is ROW_NUMBER() and CROSS JOIN allow us to generate a large number of rows, but for that, the initial data-set (here the STL_CONNECTION_LOG System Table) should have at least some rows to multiply on! You could use any other system table (that is available on Compute Nodes) if required, for some other purpose.
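To get a feel for how many rows to expect, note that joining a table with n rows to itself three times yields n * n * n rows. Here is a rough sketch of that arithmetic (cross_join_rows is a hypothetical helper, and it assumes a shell with 64-bit arithmetic). As an aside, the 4034866688 figure quoted above is exactly 1592 cubed, which suggests (my inference, not something I measured separately) that the log table held 1592 rows at the time.

```shell
# Hypothetical helper: rows produced by cross joining a table with itself.
# Usage: cross_join_rows ROWS WAYS   (WAYS = how many copies are joined)
cross_join_rows() {
  rows="$1"; ways="$2"
  total=1
  i=0
  while [ "$i" -lt "$ways" ]; do
    total=$(( total * rows ))
    i=$(( i + 1 ))
  done
  echo "$total"
}

cross_join_rows 1592 3   # prints 4034866688
```

So if your system table is small, adding a fourth copy to the FROM list multiplies the row count by n again, and the commented-out LIMIT is there for when you need fewer rows.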

Play On!