Showing posts with label gpl. Show all posts
Showing posts with label gpl. Show all posts

14 Aug 2025

Testing PostgreSQL on Debian/Hurd: A Windows + QEMU Adventure

Curiosity often leads to the most interesting technical adventures. This time, I decided to explore something off the beaten path: running Debian GNU/Hurd inside a virtual machine on my Windows 11 host and compiling PostgreSQL from source.

This post is part 1 of a multi-part series documenting the process, challenges, and discoveries along the way. Future parts will dive deeper into advanced topics, automation, and ongoing compatibility work—so if you're interested in PostgreSQL, alternative operating systems, or open source testing, stay tuned!

What is Debian?
Debian is one of the oldest and most respected Linux distributions, known for its stability, vast software repositories, and commitment to free software principles. While most people associate Debian with the Linux kernel, it’s actually a complete operating system that can run on different kernels.

What is GNU/Hurd?
GNU/Hurd is an alternative kernel developed by the GNU Project. Unlike Linux, GNU/Hurd is built on a microkernel architecture (specifically GNU Mach), aiming for greater modularity and flexibility. While GNU/Hurd is still experimental and not as mature or widely used as Linux, it represents a fascinating approach to operating system design.

Debian GNU/Hurd combines the familiar Debian userland (tools, package management, etc.) with the GNU/Hurd kernel, offering a unique environment for open source enthusiasts and OS tinkerers.

My goal for this experiment was to see how far I could get with a modern database stack—specifically, compiling and running PostgreSQL—on this unusual platform.



Setting Up the VM

Instead of the CD image, I used the pre-built disk image available here. After downloading and extracting the .img file, I launched the VM with QEMU using the following command:

qemu-system-x86_64.exe -machine type=pc,accel=whpx,kernel-irqchip=off -boot d -m 4096 -usb -display default,show-cursor=on -drive file=".\debian-hurd-i386-20250807.img",cache=writeback

Explanation of the command:

  • qemu-system-x86_64.exe: Runs QEMU for 64-bit x86 systems (works for 32-bit guests too).
  • -machine type=pc,accel=whpx,kernel-irqchip=off: Specifies a PC-type machine, enables Windows Hypervisor Platform acceleration (WHPX), and disables kernel IRQ chip emulation for compatibility.
  • -boot d: Boots from the first hard disk.
  • -m 4096: Allocates 4GB of RAM to the VM.
  • -usb: Enables USB support.
  • -display default,show-cursor=on: Uses the default display and ensures the mouse cursor is visible.
  • -drive file=".\debian-hurd-i386-20250807.img",cache=writeback: Uses the extracted Hurd disk image as the hard drive and enables writeback caching for better disk performance.

This boots directly into the installed Debian/Hurd system with improved performance and usability on a Windows 11 host.

Preparing to Build PostgreSQL

Debian/hurd is minimal out of the box, so the first step was to install all the build tools and libraries required for compiling PostgreSQL:

sudo apt-get update
sudo apt-get install build-essential git libxml2-dev libxslt-dev autotools-dev automake libreadline-dev zlib1g-dev bison flex libssl-dev libpq-dev ccache

This command installs the compiler, linker, version control tools, XML and SSL libraries, autotools, and all other dependencies PostgreSQL may need for a successful build and test cycle.

Downloading and Compiling PostgreSQL

Instead of downloading a release tarball, I cloned the official PostgreSQL git repository and compiled the master branch:

git clone https://github.com/postgres/postgres.git
cd postgres
./configure --prefix=~/proj/localpg
make
make install

This approach ensures you're building the latest development version of PostgreSQL directly from source, and installs it locally to your user's ~/proj/localpg directory.

Setting Up the Database Cluster

PostgreSQL needs a data directory (cluster) to store its databases. Since the installation was local to my user, I simply initialized the cluster and started the server using the full path to the binaries (since they're not in my PATH):

~/proj/localpg/bin/initdb -D ~/proj/localpg/pgdata
~/proj/localpg/bin/pg_ctl -D ~/proj/localpg/pgdata -l logfile start

Connecting and Creating a Table

With the server running, I connected to the database and created a sample table:

~/proj/localpg/bin/psql -d postgres

Inside psql:

CREATE TABLE test_table (id SERIAL PRIMARY KEY, name TEXT);
INSERT INTO test_table (name) VALUES ('Hello from Debian/Hurd!');
SELECT * FROM test_table;

Example output:

CREATE TABLE
INSERT 0 1
 id |         name         
----+----------------------
  1 | Hello from Debian/Hurd!
(1 row)

Running the Test Suite

To ensure the build was solid, I went back to the source directory and ran:

cd ~/postgres
make check

This runs PostgreSQL's regression tests, verifying that the core features work as expected—even on Hurd. This ran mostly fine (except for a few tests that failed - more to be researched on that failure).

Quick QEMU Tip

When working with QEMU, remember that Ctrl-Alt-G is your friend—it releases the mouse and keyboard from the VM window, making it much easier to switch back to your host system.

Adding a Separate Volume for More Disk Space

The base Debian/Hurd image is quite small and can easily run out of space, especially when compiling large projects or running make check. I frequently hit disk full errors during testing.

Solution:

  1. Shutdown the VM.

  2. Resize the disk image:

    qemu-img resize debian-hurd-i386-20250807.img +10G
    

    This adds 10GB to the existing disk image.

  3. Restart the VM.

  4. Create a new partition:

    • Use fdisk /dev/hd0 (or the appropriate device) to create a new partition in the extra space.
  5. Format the new partition:

    mkfs.ext4 /dev/hd0s3
    

    (Note: On my setup, the original root partition was /dev/hd0s2, so the new partition created for extra space was /dev/hd0s3. Adjust the device name as needed for your configuration.)

    Although the root volume is of ext2 type (!!!), Debian/Hurd works fine with ext4—so feel free to use ext4 for the new partition.

  6. Mount the new volume:

    mkdir -p /mnt/newvol
    mount /dev/hd0s3 /mnt/newvol
    
  7. Grant non-root user access:

    • As root, change ownership:
      chown robins:robins /mnt/newvol
      
    • Now your non-root user (e.g., robins) can use /mnt/newvol for compiling PostgreSQL and running make check without running out of disk space.

Why use a non-root user for PostgreSQL? PostgreSQL is designed to run as a non-root user for security reasons. Running the database server or its tests as root can expose your system to unnecessary risks and may even cause certain operations to fail. Always use a dedicated non-root user for installation, testing, and day-to-day database operations.

This approach made it possible to complete the build and test cycle without disk space issues.

Final Thoughts

Running Debian/Hurd in a VM on Windows 11 was surprisingly smooth, though some packages and features are less mature than on Linux. Compiling PostgreSQL from scratch was a great way to explore the system's capabilities and compatibility. If you're looking for a fun, geeky weekend project, give Debian/Hurd a try!

Next Steps & What's Still Pending

This is only part 1 of a multi-part series. In future installments, I'll cover:

  • Setting up the PostgreSQL buildfarm for automated testing on Debian/Hurd
  • Deeper investigation into SMP/multi-core support (currently not working)
  • More QEMU optimization and compatibility testing
  • Additional performance tuning and disk management strategies
  • Troubleshooting Perl module installation issues (e.g., LWP::Protocol::https, LWP::Simple, Net::SSLeay), which currently fail to install—more research is needed to understand and resolve these problems.
  • Investigating why make check did not complete successfully (failed on a few tests)—this requires further research.

Some features, like multi-core support, full buildfarm integration, reliable Perl module installation, and passing all PostgreSQL regression tests, are not yet working or fully tested. These will be explored in detail in future posts. Stay tuned!

6 Aug 2025

Pi-hole Part 2: Going Fully Independent with Unbound

After my positive first impressions with Pi-hole, I decided to take the next logical step: eliminating my dependence on external DNS resolvers entirely. While Quad9 served me well, there's something unsettling about routing all my DNS queries through a third party, no matter how trustworthy they appear.

The solution? Unbound - a validating, recursive, caching DNS resolver that can operate completely independently, resolving domain names directly from authoritative sources without relying on upstream DNS providers.

Why Make the Switch?

My motivation goes beyond just privacy paranoia (though that's certainly part of it):

Privacy First: Every DNS query reveals your browsing patterns. Even with Quad9's privacy promises, I prefer keeping that data entirely within my network.

Security Enhancement: Unbound performs DNSSEC validation by default, ensuring the authenticity of DNS responses and protecting against DNS spoofing attacks.

Reduced Latency: While counterintuitive, eliminating the round-trip to external resolvers can actually improve response times for frequently accessed domains through better local caching.

True Independence: No more wondering about logging policies, data retention, or potential government requests to DNS providers.

The Downsides to Consider

Before diving in, it's worth acknowledging the trade-offs:

Increased Complexity: You're now responsible for maintaining and troubleshooting your own DNS infrastructure. When things break, you can't just blame your ISP's DNS servers.

Initial Query Delays: Cold cache scenarios will be slower as Unbound has to walk the entire DNS hierarchy from root servers. First visits to new domains will take noticeably longer.

Resource Usage: While minimal, you're now running an additional service that consumes memory and CPU cycles on your Pi.

Potential Connectivity Issues: If your Pi goes down, your entire network loses DNS resolution. External resolvers provide redundancy that you're giving up.

CDN Sub-optimization: Content delivery networks may not route you to the geographically closest servers, potentially affecting streaming and download performance. Many large DNS providers like Cloudflare and Google use Anycast networks, where the same IP address is announced from multiple geographic locations, automatically routing you to the nearest server. When you run your own recursive resolver, you lose this geographic optimization and might connect to CDN endpoints that are further away.

False Sense of Security: While your DNS queries are now private, it's important to remember that all your actual web traffic (HTTP/HTTPS requests) still flows through your ISP and is visible in their logs. You've privatized the "phone book lookup" but not the actual "conversation" - your ISP can still see which IP addresses you're connecting to, just not the domain names that resolved to them.

Maintenance Overhead: The root hints file should be updated periodically (though it changes infrequently), and you're responsible for keeping your Unbound configuration current and secure.

The Verdict: Benefits Outweigh the Costs

After weighing these trade-offs, the privacy and security benefits still made a compelling case for proceeding. The complexity is manageable for anyone comfortable with basic Linux administration, and the performance impacts are largely theoretical for typical home usage. Most importantly, the peace of mind from knowing exactly where my DNS queries go and how they're handled proved worth the additional overhead.

The Setup Process

Installing and configuring Unbound alongside Pi-hole turned out to be surprisingly straightforward.

Step 1: Install Unbound

sudo apt update
sudo apt install unbound -y

Step 2: Configure Unbound for Security and Performance

I created a custom configuration at /etc/unbound/unbound.conf.d/pi-hole.conf:

sudo nano /etc/unbound/unbound.conf.d/pi-hole.conf

Here's my security-focused configuration:

server:
    # Basic settings
    port: 5335
    do-ip4: yes
    do-ip6: yes
    do-udp: yes
    do-tcp: yes
    
    # Security settings
    trust-anchor-file: "/var/lib/unbound/root.key"
    auto-trust-anchor-file: "/var/lib/unbound/root.key"
    val-clean-additional: yes
    val-permissive-mode: no
    val-log-level: 1
    
    # Privacy settings
    hide-identity: yes
    hide-version: yes
    harden-glue: yes
    harden-dnssec-stripped: yes
    harden-below-nxdomain: yes
    harden-referral-path: yes
    use-caps-for-id: yes
    
    # Performance settings (security-first approach)
    cache-min-ttl: 300
    cache-max-ttl: 86400
    prefetch: yes
    prefetch-key: yes
    
    # Interface settings
    interface: 127.0.0.1
    access-control: 127.0.0.1/32 allow
    access-control: ::1 allow
    
    # Logging
    verbosity: 1
    log-queries: no
    log-replies: no
    
    # Root hints
    root-hints: "/var/lib/unbound/root.hints"

Step 3: Root Hints Management

Root hints are essential files that tell Unbound where to find the DNS root servers - the starting point for all DNS resolution. When installing Unbound via package manager, root hints are included and should be updated automatically through regular system updates.

Note: The root hints file changes infrequently (typically a few times per year when root servers are added, removed, or have IP changes). Rather than manual updates, it's best to keep your system updated through your package manager, which will handle root hints updates appropriately - roughly every 6 months is more than sufficient for most users.

Step 4: Initialize DNSSEC Root Key (Usually Optional)

Modern Unbound packages typically handle DNSSEC root key initialization automatically. However, if you encounter DNSSEC validation errors or want to ensure the key is properly configured, you can initialize it manually:

sudo unbound-anchor -a "/var/lib/unbound/root.key"
sudo chown unbound:unbound /var/lib/unbound/root.key

Note: If this step fails or the file already exists, it's likely already configured correctly by the package installation.

Step 5: Start and Enable Unbound

sudo systemctl enable unbound
sudo systemctl start unbound

Step 6: Test Unbound

Before integrating with Pi-hole, I verified Unbound was working with both valid and invalid domain lookups:

Testing with an existing domain:

robins@pi4:~ $ dig @127.0.0.1 -p 5335 google.com | egrep -w 'status|^google'
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58713
google.com.             269     IN      A       142.250.70.206

Testing with a non-existent domain:

dig @127.0.0.1 -p 5335 asdfasdsfasfdfsafasd | egrep -w 'status|flags'
robins@pi4:~ $ dig @127.0.0.1 -p 5335 asdfasdsfasfdfsafasd | egrep -w 'status|flags'
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 19691
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
; EDNS: version: 0, flags:; udp: 1232

Both tests confirm Unbound is working correctly. The existing domain test shows successful resolution with NOERROR status, while the non-existent domain test returns NXDOMAIN (Non-eXistent DOMAIN), which is the proper response when a domain doesn't exist. Note the ad flag in the non-existent domain response, indicating that even failed lookups are DNSSEC-validated - Unbound cryptographically verified that this domain truly doesn't exist rather than just accepting an unvalidated negative response.

Step 7: Configure Pi-hole to Use Unbound

In the Pi-hole admin interface:

  1. Navigate to SettingsDNS
  2. Uncheck all upstream DNS servers
  3. Add 127.0.0.1#5335 as a custom DNS server
  4. Enable DNSSEC validation
  5. Save settings

The Results: Worth the Effort

After 24 hours of operation with the new setup, the benefits are clear:

Complete Privacy: All DNS queries now resolve independently without touching external servers (except for the initial root server queries, which reveal no specific domain information).

DNSSEC Validation: Every response is cryptographically verified, providing protection against DNS manipulation.

Improved Cache Efficiency: Unbound's more sophisticated caching algorithm seems to provide better hit rates for our household's browsing patterns.



Performance Impact: Negligible. The Pi 4 handles both Pi-hole and Unbound without breaking a sweat:

robins@pi4:~ $ uptime
 21:22:25 up 2 days,  9:47,  3 users,  load average: 0.00, 0.00, 0.00

Security Considerations

My configuration prioritizes security over raw performance:

  • Conservative TTL settings: Shorter cache times mean more frequent validation
  • Strict DNSSEC validation: No permissive mode that might accept invalid responses
  • Minimal logging: No query logging to preserve privacy even locally
  • Restricted access: Only localhost can query Unbound directly

Final Thoughts

Setting up Unbound with Pi-hole represents the logical conclusion of taking control over your DNS infrastructure. While the privacy and security benefits are the primary motivators, the technical satisfaction of running a completely self-contained DNS resolution system is considerable.

The setup process is more involved than simply pointing Pi-hole at an external resolver, but the configuration is straightforward and well-documented. For anyone concerned about DNS privacy or wanting to reduce external dependencies, this combination provides an excellent solution.

The Pi 4 continues to prove itself as the perfect platform for this type of network infrastructure project - handling both Pi-hole and Unbound with resources to spare. As I mentioned in part 1, even a Pi 2 would likely suffice for most households, making this an accessible upgrade for anyone looking to enhance their home network's privacy and security posture.

5 Aug 2025

Pi-hole First Impressions: Home DNS Caching Done Right

(This is first of a 2 part series, where I explore pi-hole as a DNS caching solution-Here's Part 2)

I've had a Raspberry Pi 4 sitting idle for months, and finally decided to put it to good use by setting up Pi-hole for network-wide DNS caching and ad blocking. After less than a day of operation, I'm genuinely impressed with both the installation process and the results.

The Installation: Surprisingly Smooth

Setting up Pi-hole turned out to be one of the smoothest software installations I've experienced in recent memory. The entire process is handled by a single bash script:


curl -sSL https://install.pi-hole.net | bash


That's it! What impressed me most wasn't just the simplicity, but the thoroughness of the pre-installation checks. Before the script even attempted to install anything, it ran through a comprehensive validation process:

  • Network connectivity tests
  • DNS resolution verification
  • Package manager status checks
  • System requirements validation
  • Port availability confirmation

This rigorous validation gave me real confidence that the installation would succeed. Too often, installation scripts fail halfway through, leaving you with a partially configured mess. Pi-hole's approach of "check everything first, then install" meant that once the green lights were all showing, the installation proceeded flawlessly.

Interestingly, what took the most time during the entire setup wasn't Pi-hole itself, but my initial attempt to use Docker. I'd assumed Docker would be the quick and easy route (as it usually is), but the Pi had some leftover Docker installations that needed cleaning up first - removing old docker, docker.io, and docker-compose packages was a messy affair. Then the Pi seemed to have issues with the Docker installation process itself. In the end, I abandoned the Docker approach and went with the native installation, which ironically turned out to be much faster and cleaner.

One lesser-known tip that made the physical setup even smoother: if you're running a Google mesh network, the routers with both WAN and LAN ports can easily be used as wifi-to-LAN bridges. Rather than connecting the Pi 4 via WiFi, I simply plugged it into the LAN port of one of the mesh routers. This eliminated the need for additional network switches and reduced latency by a few milliseconds by avoiding the WiFi overhead entirely.

Another major convenience: you don't need to update DNS settings on each individual device. A single configuration change at the router level does the job for the entire household. In Google mesh, I simply changed the DNS setting from "Automatic" to "Custom" and set the Pi 4's IP as the primary DNS server. This immediately moved all devices on the network to use Pi-hole without touching any individual device settings. The only exceptions are devices that explicitly force their own DNS (looking at you, certain smart TVs and streaming devices), but those are relatively rare.

The Results: Better Than Expected

After 23 hours of operation protecting our home network, the statistics are eye-opening:



DNS Query Volume: 100,000 requests in 23 hours

  • That's roughly 4,350 DNS queries per hour from our household
  • Averages out to about 72 queries per minute
  • Shows just how chatty modern devices really are

Cache Performance: 70% cache hit rate

  • Pi-hole is successfully caching DNS responses locally
  • Reduces latency for frequently accessed domains
  • Decreases load on upstream DNS servers
  • Currently using Quad9 (9.9.9.9) as the upstream resolver - the only non-US DNS service I still trust in this day and age

Blocking Effectiveness: 8.5% of requests blocked

  • More than 1 in 12 DNS requests were for ad/tracking domains
  • That's 8,500 blocked requests in just 23 hours
  • Represents a significant reduction in unwanted network traffic

System Performance: Negligible Impact

One concern with running Pi-hole on a Pi 4 was whether it would impact system performance. For context, here are the system specs:


robins@pi4:~ $ uname -a
Linux pi4 6.12.34+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.12.34-1+rpt1~bookworm
  (2025-06-26) aarch64 GNU/Linux

robins@pi4:~ $ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       463Mi       1.7Gi        34Mi       1.6Gi       3.3Gi
Swap:          511Mi          0B       511Mi


The performance numbers are reassuring:

CPU Usage: Essentially zero load:


robins@pi4:~ $ uptime
 10:39:33 up 23:04,  3 users,  load average: 0.00, 0.00, 0.00

  • Zero computational overhead even after 23+ hours of operation
  • Pi 4 is more than capable of handling the workload
  • Plenty of headroom for additional services

Memory Usage: Minimal footprint

  • Pi-hole runs efficiently even on modest hardware
  • No noticeable impact on system responsiveness

Honestly, the Pi 4 seems like overkill for this task. Given the minimal resource requirements, a Pi 2 (with the older Cortex-A7) would likely handle Pi-hole just fine, leaving the Pi 4 (with the more capable Cortex-A72) free for more power-hungry side projects. The beauty of Pi-hole is that it's so lightweight, you can run it on practically any Raspberry Pi model and still have resources to spare.

The Web Interface: Polished and Responsive

Pi-hole's web dashboard deserves special mention. It's:

  • Fast and Responsive: Page loads are snappy, even on the Pi 4
  • Live Updates: The dashboard shows real-time query statistics
  • Well-Designed: Clean interface that makes data easy to understand
  • Comprehensive: Detailed logs, statistics, and configuration options



The live dashboard is particularly satisfying to watch - seeing blocked queries in real-time gives you a visceral sense of how much unwanted traffic Pi-hole is filtering out.

Real-World Benefits

Beyond the statistics, the practical benefits are noticeable:

  • Faster Browsing: Pages load quicker without ad network delays
  • Cleaner Experience: Websites feel less cluttered
  • Privacy Improvement: Reduced tracking across devices
  • Bandwidth Savings: Less unwanted traffic on the network

One particular benefit I'm looking forward to testing: my network occasionally suffers from stuttering ping latencies. Instead of the usual 10ms responses, I'll sometimes see 100ms or even 1-second ping times that persist for minutes. During these episodes, streaming mostly works and browsing barely functions, but what really hurts is DNS resolution - especially for websites requiring multiple round-trips. With Pi-hole's 70% cache hit rate, those problematic periods should be noticeably less laggy since most DNS queries won't need to traverse the struggling network connection. More testing needed, but the potential is promising.

The ad filtering aspect wasn't my primary motivation, but it doesn't hurt either. For websites I frequent that provide genuinely good content - like Phoronix with their constant stream of quality tech updates - I've set up pass-throughs so they can rightfully earn the advertising revenue they deserve for their work. For projects like Pi-hole itself, I prefer the direct contribution route via donations. It's about supporting the creators and maintainers who provide value, whether through allowing ads or direct financial support.

Worth Supporting

The Pi-hole project has created something genuinely useful that "just works" out of the box. The quality of both the software and the installation experience makes this a project worth supporting financially. I'll definitely be making a donation to help ensure continued development.

Final Thoughts

Pi-hole represents the best kind of open-source software: it solves a real problem elegantly, installs without drama, and delivers measurable benefits immediately. If you have a spare Raspberry Pi lying around, setting up Pi-hole is an excellent way to put it to work improving your entire network's performance and privacy.

The fact that it blocked 8,500 unwanted requests in less than a day while using virtually no system resources makes it a clear win. Sometimes the best technology is the kind you set up once and then forget about - until you look at the statistics and realize how much work it's quietly doing in the background.

16 Jun 2024

Compiling latest gcc to test more architectures

Off late, I've had two separate needs to compile GCC by hand and although my first foray into compiling gcc from git took patience, stumbling over the basics was interesting to say the least.

The first time I realised that an old GCC version could matter, was this feedback [1] that one of my buildfarm members was running an old (for its arch) gcc version, something that I almost never paid much attention to. The other being that that led me to newer architectures (more on that below) and how this could repeat itself if / when I end up playing with more architectures.

So finally, I can say I have a framework that frequently checks / recompiles gcc and ensures all my local tests are using the latest and the greatest gcc :) . (I am happy with how this has taken shape on my home server, and once I am able to port it to my other machines, don't see why this shouldn't land on github).

Now admittedly, compiling gcc on a nightly basis was already an overkill, but then what the heck - I went and did this hourly basis just because well-why-not. My personal ask was to:

  1. Incrementally learn how compiling gcc unfolds
  2. Have some fun scripting while at it
  3. ... but most importantly, see whether I could utilize this experience in other experiments where the idea is to forewarn database developers about upcoming changes.

A little more on point 3 above, I oversee a few machines on the postgres buildfarm and they differ in some aspects:

  1. Different archictures:
    1. aarch64: Gravitons
    2. x86-64: A vanilla off-the-shelf dell workstation
    3. armv7l - Raspberry Pi4
  2. Different GCCs:
    1. 8.3.0 (default in pi4)
    2. 7.3.1 (default on most ALs)
    3. 13.2 (default on Ubuntu)
    4. 14.0.1 (naive attempt at compiling whatever cleared make check)
    5. gcc (experimental nightly)
  3. (Internally I also run some different fuzzing workloads but that's besides the point)
  4. (Future plans - add some form of randomizer to test odd combinations of compilation flags, but more on that in an upcoming post)


Now if all goes to plan, I should also add to the mix, 2 new architectures. They wouldn't be the snappiest processors in the market (at least not in the pricing-level I am after), but hey it should be fun to play with!

  1. loongarch64 - (cough) Recent (but sure as butterflies, a promising) entrant - the MIPS64 Loongson has been around for some time now, but my interest has grown off late owing to sporadic reports that its becoming somewhat competitive, which should be interesting to review.
  2. riscv64 - Another interesting arch that should be fun to try out. Again, am not holding my breath that it'd top any charts, but could still end up being interesting nonetheless.

On the GCC front, getting the setup ready and stable, clears the path to now focus upgrading my buildfarm animals one-by-one and basically start focussing beyond this hurdle.

gcseb02 20240615_1700 - git checkout successful.
gcseb02 20240615_1700 - git pull successful.
gcseb02 20240615_1700 - No change in gcc version. Quitting.

gcsa36f 20240615_1800 - git checkout successful.
gcsa36f 20240615_1800 - git pull successful.
gcsa36f 20240615_1800 - gcc has changed - [471fb092601] vs [57af69d56e7]. Recompiling.
gcsa36f 20240615_1800 - make successful
gcsa36f 20240615_1800 - make install successful.
gcsa36f 20240615_1800 - gcc version string has changed from [15.0.0 20240615 (experimental) - 471fb092601] to [15.0.0 20240615 (experimental) - 57af69d56e7]

gcsf66a 20240615_1900 - git checkout successful.
gcsf66a 20240615_1900 - Unable to git pull. Are we connected? Quitting.
gcsf66a 20240615_1900 - git switched back to 57af69d56e7.

gcs629f 20240615_2000 - git checkout successful.
gcs629f 20240615_2000 - git pull successful.
gcs629f 20240615_2000 - gcc has changed - [57af69d56e7] vs [6762d5738b0]. Recompiling.
gcs629f 20240615_2000 - make successful
gcs629f 20240615_2000 - make install successful.
gcs629f 20240615_2000 - gcc version string has changed from [15.0.0 20240615 (experimental) - 57af69d56e7] to [15.0.0 20240615 (experimental) - 6762d5738b0]

.
.

gcsc115 20240616_0400 - git checkout successful.
gcsc115 20240616_0400 - git pull successful.
gcsc115 20240616_0400 - No change in gcc version. Quitting.

Reference

2. Compilation script source - https://github.com/robins/gcc_compile

28 Apr 2024

Boost Database Security: Restrict Users to Read Replicas

Only Allow Login to Read-Replicas and Standbys

When you're working with large databases in production, it is incredibly common to use read-replicas to improve performance. These read-replicas are a copy of your primary (main) database and let your applications offload read-heavy queries, which in-turn reduces strain on your primary database, effectively making the application faster and snappier.

Sometimes, you may want to restrict specific database users so they can connect ONLY to these read-replicas, and not to the primary database server. This can be tricky to implement, since any permissions configured for this use-case, whether on the user-level, the database level, the schema-level or even the table level would be quickly replicated to the read-replicas and thus would not work as expected.

This guide will show how to configure a database user to only login successfully on a read-replica. The only requirement is to enable the pg_tle extension [3] on your PostgreSQL database. This is simple to do on your Ubuntu based Laptop (see how to do that here [2]) or virtual-machines offered by your favourite cloud-provider. Furthermore, you could apply your login rules using Pl/PgSQL, PL/v8 or even PL/Rust - See here[1].

Why Restrict Access?

There are many good reasons for restricting users to read-replicas:

  • Performance: You can dedicate your primary database server to handling write operations (like updating data), ensuring those operations happen as fast as possible.

  • Reporting / Analytics: Production environments often have dedicated users for ancillary tasks, such as monitoring, reporting dashboards, read-only tenants etc. Restricting these database users to read-replica helpsreducing extra load on the primary database.

  • Security: In some cases, granting direct access to the primary database might be considered a security risk. Further, you may not be able to force login hygeine for all your database users, and then having a lockdown system to reject those database users to login to primary is crucial for application rollout.

Prerequisites

  • An existing PostgreSQL database instance with at least one read-replica.
    • You could also try this on your own Postgres database with pg_tle extension. Read here [2] for more on how to install pg_tle on your Ubuntu system.
  • Basic understanding of users and permissions within a database.

Steps

  1. Identify Target Database and Users: First we need to define how to implement the restriction. i.e. Which users (and database) are to be restricted to login only to read-replica. In the example below, we would restrict the user standby_only_user to only be able to login to Standbys / Read-Replicas on database prod_db.
psql <<SQL
  \c prod_db
  CREATE EXTENSION pg_tle;
SQL 
  1. Ensure that shared_preload_libraries is properly set to allow pg_tle. Also make sure that the pgtle.clientauth_db_name is appropriately set to the desired database (here prod_db):
cat <<EOL >> data/postgresql.conf
  shared_preload_libraries='pg_tle'
  pgtle.enable_clientauth=require
  pgtle.clientauth_db_name=prod_db
  pgtle.clientauth_users_to_skip=robins
  pgtle.clientauth_databases_to_skip=''
EOL
  1. Secret Sauce:

Next we create the key pg_tle function that restricts the user standby_only_user to login successfully only if this is a standby / read-replica:

SELECT pgtle.install_extension (
  'standbyusercheck',
  '1.0',
  'Allow some users to login only to standby / read-replicas',
$_pgtle_$
  CREATE SCHEMA standbycheck_schema;

  REVOKE ALL ON SCHEMA standbycheck_schema FROM PUBLIC;
  GRANT USAGE ON SCHEMA standbycheck_schema TO PUBLIC;

  CREATE OR REPLACE FUNCTION standbycheck_schema.standbycheck_hook(port pgtle.clientauth_port_subset, status integer)
  RETURNS void AS $$
    DECLARE
      is_standby bool := TRUE;
    BEGIN
      IF port.user_name = 'standby_only_user' THEN
        SELECT pg_is_in_recovery()
          INTO is_standby;
        IF is_standby THEN
          RAISE NOTICE 'User allowed to login';
        ELSE
          RAISE EXCEPTION 'User can only login to Standby / Read-Replicas';
        END IF;
      END IF;
    END
  $$ LANGUAGE plpgsql SECURITY DEFINER;

  GRANT EXECUTE ON FUNCTION standbycheck_schema.standbycheck_hook TO PUBLIC;
  SELECT pgtle.register_feature('standbycheck_schema.standbycheck_hook', 'clientauth');
  REVOKE ALL ON SCHEMA standbycheck_schema FROM PUBLIC;
$_pgtle_$
);

And now that the function is defined,CREATE EXTENSION would install the function and bind it to future login attempts.

CREATE EXTENSION standbyusercheck;
SHOW pgtle.clientauth_db_name;
  1. Test Connection:
  • Attempting to connect as a privileged user (here robins) to either of primary or read-replica should succeed.
Logging into Replica as robins
 login  | current_database | pg_is_in_recovery
--------+------------------+-------------------
 robins | prod_db          | t
(1 row)

Logging into Primary as robins
 login  | current_database | pg_is_in_recovery
--------+------------------+-------------------
 robins | prod_db          | f
(1 row)
  • However, the user standby_only_user should NOT be able to login to the primary.
Logging into Primary as standby_only_user
psql: error: connection to server at "localhost" (127.0.0.1), port 6432 failed: FATAL:  User can only login to Standby / Read-Replicas
  • While the user (standby_only_user) should only be able to login to any read-replica.
Logging into Replica as standby_only_user
       login       | current_database | pg_is_in_recovery
-------------------+------------------+-------------------
 standby_only_user | prod_db          | t
(1 row)

Other important aspects of this feature

  • You could force clientauth for all logins by setting the parameter pgtle.enable_clientauth = require

  • You could configure some users to always be allowed to login to either of Primary / Read-replica in cases of emergency, by adding that user to the pgtle.clientauth_users_to_skip. Ideally you would want your admin database roles to this list.

  • Orthogonally, you could configure some databases to always allow users to skip clientauth by setting the pgtle.clientauth_databases_to_skip feature.

  • Note, that both clientauth_databases_to_skip and clientauth_databases_to_skip can be utilised together. This is a good way to ensure that some set of database users (and some databases) are exempt from such a login restriction.

  • If pgtle.enable_clientauth is set to on or require and if the database mentioned in pgtle.clientauth_db_name is not configured correctly, postgres would complain with the messsage FATAL: pgtle.enable_clientauth is set to require, but pg_tle is not installed or there are no functions registered with the clientauth feature. This is a good engine check, helping us avoid basic misconfigurations.

  • If you're anticipating connection storms, you can also increase the workers (that would help enforce the login restriction) by setting the pgtle.clientauth_num_parallel_workers parameter to greater than 1.

Conclusion

By following the above steps, you've now successfully configured your PostgreSQL environment to restrict certain users to only login to the read-replicas. This helps not just optimize your database performance, but also bolster security.

Let me know if you'd like to explore more advanced scenarios or discuss IAM integration for fine-grained access control!

Reference

  1. Clientauth Hook Documentation - https://github.com/aws/pg_tle/blob/main/docs/04_hooks.md'
  2. Install pg_tle On Ubuntu - https://www.thatguyfromdelhi.com/2024/04/installing-pgtle-on-ubuntu-quick-guide.html
  3. Unlock PostgreSQL Super Powers with pg_tle - https://www.thatguyfromdelhi.com/2024/04/unlock-postgresql-superpowers-with-pgtle.html

20 Nov 2017

Update: RDS Prewarm script updated to fetch FSM / VM chunks

(This post is in continuation to my previous post regarding Initializing RDS Postgres Instance)

This simple SQL "Initializes" the EBS volume linked to an RDS Instance, something which isn't possible to do without sending workload (and experience high Latency in the first run).

Key scenarios, where this is really helpful are:

  • Create a Read-Replica (or Hot Standby in Postgres terms)
  • Restore a new RDS Instance from a Snapshot


Update: The Script, now also does the following:

  • Now also fetches disk blocks related to FSM / VM of all tables
  • Now fetches all Indexes

Limitations that still exist:
  • TOAST tables are still directly inaccessible in RDS
    • Indexes for TOAST columns also fall under this category
    • Trying hard to see if this last hurdle can be worked around
      • Anyone with any ideas?!
  • Script needs to be run once per Database Owner
    • Not sure if there is any magic around this
      • Object ownership is a Postgres property
        • RDS Postgres does not give Superuser access
    • I'll try to ease this in the future
      • By creating a script to list the Users that this needs to run as
      • The other possibility is to use DBLink to run this for separate Users in a single run

I'll update here, in case I make any significant changes.

Sample Run



-[ RECORD 1 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:40:08.291891-05
table_size          | 13 GB
freespace_map_size  | 3240 kB
visibility_map_size | 408 kB
blocks_prefetched   | 1639801
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts
-[ RECORD 2 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:43:37.703711-05
table_size          | 2142 MB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 274194
current_database    | pgbench
schema_name         | public
table_name          | pgbench_accounts_pkey
-[ RECORD 3 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.899115-05
table_size          | 440 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 59
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers
-[ RECORD 4 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.901088-05
table_size          | 240 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 30
current_database    | pgbench
schema_name         | public
table_name          | pgbench_tellers_pkey
-[ RECORD 5 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.905107-05
table_size          | 40 kB
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 5
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches_pkey
-[ RECORD 6 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907089-05
table_size          | 40 kB
freespace_map_size  | 24 kB
visibility_map_size | 8192 bytes
blocks_prefetched   | 9
current_database    | pgbench
schema_name         | public
table_name          | pgbench_branches
-[ RECORD 7 ]-------+------------------------------
clock_timestamp     | 2017-11-19 15:44:12.907142-05
table_size          | 0 bytes
freespace_map_size  | 0 bytes
visibility_map_size | 0 bytes
blocks_prefetched   | 0
current_database    | pgbench
schema_name         | public
table_name          | pgbench_history

14 Oct 2017

First alpha release of PsqlForks - Menon

Primer: PsqlForks aims to support all DB Engines that (even partially) speak Postgres (psqlforks = psql for Postgres forks).

Given that PsqlForks has been in development for a few weeks, it's time to stabilize a bit and towards that, we finally have Menon, PsqlForks first Alpha Release. Being an alpha, by definition it isn't ready for production, but it feels stable enough ... feel free to test it out!

Importantly, this fork is synced with postgres/master regularly, and should ideally sport all recent psql developments. Further, I am not a C expert and am just barely comprehending Postgres, so let me know of any 18-wheelers that I didn't see.

The release title - 'Menon', is a common sub-Caste in South-Indian state of Kerala. Selecting this nomenclature emanates from the idea of popularizing (heh!) common names and places from Kerala... and that it doesn't hurt to have an identifiable name (and while at it, add character) to a Release :) 

This release includes: 

  • Decent support for Redshift:
    • SQL tab completion for Redshift related variations
    • \d etc. now support Redshift specifics - ENCODINGs / SORTKEYs / DISTKEY / COMPRESSION etc.
    • Support Temporary Credentials using IAM Authentication (via AWS CLI)
    • View detailed progress here.
  • Basic support / Recognition semantics for:
    • CockroachDB - view progress here
    • PipelineDB
    • PgBouncer
    • RDS PostgreSQL
You could read more here:

For the interested:

Testing PostgreSQL on Debian/Hurd: A Windows + QEMU Adventure

Curiosity often leads to the most interesting technical adventures. This time, I decided to explore something off the beaten path: running D...