Nice article – repartitioning and shuffling are always favorite topics for Spark interviews😄
One doubt I've always had: what happens when coalesce() is (mistakenly) used to INCREASE the number of partitions instead?
Given that this isn't recommended, I would expect the Catalyst Optimizer to intervene by converting coalesce() into repartition() when generating the logical plan. If that happened, coalesce() would in fact behave as a wide transformation in this specific case.
Let me know if you can shed some light on this :D