To help understand why a complex query with many joins comes back empty, I like to provide helper debug functions that build the query up step by step. Each block checks one prerequisite and tells the user what's missing:
```sql
-- PL/pgSQL sketch: each check verifies one prerequisite and reports
-- the first one that fails.
CREATE FUNCTION debug_user(p_user_id int) RETURNS text
LANGUAGE plpgsql STABLE AS $$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM "user" u WHERE u.user_id = p_user_id) THEN
    RETURN 'user does not exist';
  END IF;
  IF NOT EXISTS (SELECT 1 FROM "user" u
                 JOIN user_addr ua ON ua.user_id = u.user_id
                 WHERE u.user_id = p_user_id) THEN
    RETURN 'user has no address';
  END IF;
  -- keep adding joins until you find the break
  RETURN 'all prerequisites satisfied';
END;
$$;
```
because when you have a query that involves 6 relations and you don't get results, it is nice to know why.
FrancoisBosun 23 minutes ago [-]
I do the "same thing", but using PostgreSQL `EXPLAIN ANALYZE`. EXPLAIN ANALYZE has information about the number of rows returned, which means I know exactly which node in the query plan failed to return expected values.
ericHosick 11 minutes ago [-]
Don't mean to move the goalposts (I didn't mention the following in the post above), but these functions can also be exposed in admin UIs, making things easier for general admin and support users.
Another very useful use case is when row level security (RLS) is enabled.
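A sketch of the RLS case, assuming the `user` table from upthread: a `SECURITY DEFINER` function runs with its owner's privileges, so it can tell a support user whether a row is missing entirely or merely hidden by a policy:
```sql
CREATE FUNCTION debug_user_visibility(p_user_id int) RETURNS text
LANGUAGE plpgsql STABLE SECURITY DEFINER AS $$
BEGIN
  -- Runs as the function owner, so the caller's RLS policies
  -- don't apply to this lookup.
  IF NOT EXISTS (SELECT 1 FROM "user" u WHERE u.user_id = p_user_id) THEN
    RETURN 'user does not exist';
  END IF;
  RETURN 'user exists; an RLS policy is likely hiding it from you';
END;
$$;
```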
iTokio 22 minutes ago [-]
You can also run a detailed explain plan, and with a suitable visualizer it's trivial to see where the query fell apart. You get to see the actual data distribution this way.
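For the visualizer route, a JSON-format plan is what most tools ingest; a sketch, reusing the tables from upthread:
```sql
-- Paste the JSON output into a plan visualizer (e.g. explain.dalibo.com)
-- to see per-node actual row counts and timings graphically.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.user_id
FROM "user" u
JOIN user_addr ua ON ua.user_id = u.user_id;
```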
jameshart 35 minutes ago [-]
Counterpoint: you should not be connecting to your production database, you should not be running non-critical queries on production database servers, and you probably shouldn’t have permission to see all this data about your users.
Obviously your mileage may vary, your scale is your own, your trade-offs are your own trade-offs.
But be aware that there comes an operational scale where this is not an acceptable way - operationally, legally, privacy-wise - to investigate customer issues, and you’ll need different tricks.
fifilura 1 hours ago [-]
If you build your queries with CTEs, it is easy to pull them out in a console and query each step separately to find the issue.
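A sketch of what that looks like, reusing the tables from upthread (the `active` flag is hypothetical):
```sql
WITH active_users AS (
  SELECT u.user_id FROM "user" u WHERE u.active
),
users_with_addr AS (
  SELECT au.user_id
  FROM active_users au
  JOIN user_addr ua ON ua.user_id = au.user_id
)
SELECT * FROM users_with_addr;
-- Empty? Paste just the body of active_users into a console and run it
-- on its own; repeat step by step until one step comes back empty.
```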
al3rez 3 hours ago [-]
I implemented something similar for a 3M/day ad tech platform. We created a few materialized views for my boss, myself, and the DevOps team to monitor, instead of querying over 100 tables. I used stored procedures in PostgreSQL, which made it fast, efficient, and non-blocking. Not to mention we avoided setting up a complex Grafana/UI admin dashboard. Boss: "I want to know x, y, z." Okay -> TablePlus -> export CSV, boom!
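A sketch of that setup, with hypothetical table and column names:
```sql
-- One materialized view replaces ad-hoc queries over many tables.
CREATE MATERIALIZED VIEW daily_campaign_stats AS
SELECT i.campaign_id,
       count(*)           AS impressions,
       sum(i.cost_micros) AS spend_micros
FROM impressions i
WHERE i.created_at >= date_trunc('day', now())
GROUP BY i.campaign_id;

-- A unique index lets REFRESH ... CONCURRENTLY run without
-- blocking readers.
CREATE UNIQUE INDEX ON daily_campaign_stats (campaign_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_campaign_stats;
```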
ndriscoll 4 hours ago [-]
The next logical step is to realize that using these views to drive your actual application has great potential both to vastly simplify the logic and to make it super high-performance. You can even take this so far as to make a view that spits out XML (or, I guess, JSON): you basically get SSR for free, with easy-to-inspect, composable queries.
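A sketch of the JSON flavor, reusing the tables from upthread (the `street` column is hypothetical):
```sql
-- The application can serve this body almost verbatim: SSR-ish
-- rendering, but composed and inspected as an ordinary view.
CREATE VIEW user_profile_json AS
SELECT u.user_id,
       json_build_object(
         'id',      u.user_id,
         'address', ua.street
       ) AS body
FROM "user" u
LEFT JOIN user_addr ua ON ua.user_id = u.user_id;
```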
jerf 4 hours ago [-]
"Obviously querying over project_shorthand = '@unison/cloud' isn't going to be able to use an index, so isn't going to be the most performant query"
If you know you're going to be querying on username and project shorthand a lot, you're just a handful more lines away from instantiating that as a database function, which can take your shorthand apart with string functions and get back to a high-performance query. Functions can return rowsets, which you can then further filter with WHERE clauses.
Although in that case I think I'd just take the first and second parameters separately anyhow, since `debug_contribution("sophie", "@unison/cloud")` and `debug_contribution("sophie", "unison", "cloud")` isn't a big enough difference to be worth writing string-parsing code. But do as you like.
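A sketch of the string-parsing variant, against a hypothetical `contributions` table:
```sql
-- Split '@unison/cloud' into owner/project inside the function, then
-- query with plain equality predicates that can use indexes.
CREATE FUNCTION debug_contribution(p_username text, p_shorthand text)
RETURNS SETOF contributions
LANGUAGE sql STABLE AS $$
  SELECT c.*
  FROM contributions c
  WHERE c.username = p_username
    AND c.owner   = split_part(ltrim(p_shorthand, '@'), '/', 1)
    AND c.project = split_part(ltrim(p_shorthand, '@'), '/', 2);
$$;

-- The returned rowset composes with further WHERE clauses:
SELECT * FROM debug_contribution('sophie', '@unison/cloud')
WHERE created_at > now() - interval '7 days';
```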
ibejoeb 3 hours ago [-]
If you're using it regularly, you can make it a derived column and index it in most RDBSs. Then it'll work with predicate push-down and you can do other fancy things like store it on alternative partitions to optimize block reads when you're not using it.
The other obvious benefit is that it is no different in semantics than any other column, so there's no advance knowledge necessary for other users vs using a function.
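In PostgreSQL that could be a stored generated column (same hypothetical table as above):
```sql
-- The derived column behaves like any other: indexable, visible to
-- predicate push-down, no special knowledge needed to query it.
ALTER TABLE contributions
  ADD COLUMN project_shorthand text
  GENERATED ALWAYS AS ('@' || owner || '/' || project) STORED;

CREATE INDEX ON contributions (project_shorthand);

-- Now the "slow" query from upthread can use the index:
SELECT * FROM contributions WHERE project_shorthand = '@unison/cloud';
```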
noisy_boy 2 hours ago [-]
I was very confused by the term "debug" view. We just called them views and used them normally. The main challenge was predicate pushdown, which usually worked but depended on how smart the optimizer was. Things could also get tricky when you start building views on top of views: soon you're not sure why a simple-looking query selecting from a simple-looking view became slow, until you see that it's views all the way down. As with most things, balance is the key.
sodapopcan 4 hours ago [-]
I get the sentiment, but it's an odd one: we'd consider adding a name to a list of tables (which can be sorted and filtered, and which we generally look at less than our code*) more clutter than adding SQL to our codebase.
* I realize this is not true for everyone
imACalorie12345 2 hours ago [-]
I do the same, but it typically stays in my personal sphere, or undocumented, until you get to something like Retool or an admin UI as a common place.
Ozzie_osman 4 hours ago [-]
Or write the query as a function in code; that way, everyone knows it exists, can call it easily, and can change it along with any other changes to your logic or schema.
vincekerrazzi 4 hours ago [-]
I agree. So I did. And over the span of a year I was the only one that used them. ¯\_(ツ)_/¯
QuantumSeed 2 hours ago [-]
I address that problem by scheduling a brown bag lunch and inviting devs from related teams to join in so I can present my cool new tools and techniques to everyone. Sometimes it's wasted effort, but sometimes it does result in wider usage
morkalork 3 hours ago [-]
This plus some automated assertion checks on whatever invariants you can think of makes for a very cheap and scalable canary.
See an alert? Just query the view and you should have all the client/job/whatever IDs you need to trace the problem.
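A sketch of such an assertion (the view and columns are hypothetical): any rows returned are both the alert and the trace starting point.
```sql
-- Invariant: nothing ships before it's paid. A non-empty result
-- trips the canary and carries the IDs needed to investigate.
SELECT o.job_id, o.client_id
FROM debug_orders o
WHERE o.shipped_at IS NOT NULL
  AND o.paid_at IS NULL;
```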