MySQL
You'll get the best performance if you forget the where clause and place all conditions in the ON expression.
I think this is because the query first has to join the tables and then runs the where clause on that, so if you can reduce what is required to join then that's the fasted way to get the results/do the update.
Example
Scenario
You have a table of users. They can log in using their username
or email
or phone_number
. These accounts can be active (1) or inactive (0). This table has 50000 rows
You then have a table of users to disable (blacklist) at one go because you find out they've all done something bad.
A script runs that checks for users in the blacklist_users
table and disables them in the users
table.
This blacklist_users
table however, only has one column with usernames
, emails
and account numbers
all mixed together.
The blacklist_users
table also has a "has_run" indicator which needs to be set to 1 (true) when it has been run so it can be skipped in future queries.
So if you have a WHERE clause here, internally, the results are getting brought back in the join and then the WHERE clause is being queried against that dataset. Instead, we can move all the where clause conditions into the join, and internally, remove the second query completely.
Therefore, this is the most optimal query to avoid needless lookups of the users
table...
Query
UPDATE users User
INNER JOIN
blacklist_users BlacklistUser
ON
(
User.username = BlacklistUser.account_ref
OR
User.email = BlacklistedUser.account_ref
OR
User.phone_number = BlacklistUser.account_ref
AND
User.is_active = 1
AND
BlacklistUser.has_run = 0
)
SET
User.is_active = 0,
BlacklistUser.has_run = 1;
Reasoning
If we had to join on just the OR conditions it would essentially need to check each row 4 times (1 for email
, 1 for phone_number
, 1 for username
, 1 for is_active
) to see if it should join, and potentially return a lot more rows. However, by giving it more conditions it can "skip" every row we're not updating.
Bonus
It's more readable. All the conditions are in one place and the rows to update are in another place.
I hope all that makes sense.