This def mirrors my experience too. The vast majority of Spark jobs were easily ported to SQL/dbt, and the remaining ones are in PySpark. I used to use a lot of Scala Spark for backend data processing in 2016, but now it's almost down to zero.
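To make "easily ported to SQL" concrete, here's a minimal sketch of the kind of job that converts cleanly: a groupby-aggregation that needs none of Spark's API. Table and column names are hypothetical, and sqlite3 stands in for the warehouse engine — in practice this would be a dbt model running on Presto/Athena or similar:

```python
import sqlite3

# Hypothetical stand-in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2023-01-01", "US", 10.0),
        ("2023-01-01", "US", 5.0),
        ("2023-01-01", "CA", 7.5),
    ],
)

# The whole "Spark job" is just an aggregation -- i.e. a trivial SQL/dbt model,
# versus df.groupBy("order_date", "country").agg(F.sum("amount")) in PySpark.
rows = conn.execute(
    """
    SELECT order_date, country, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date, country
    ORDER BY country
    """
).fetchall()
print(rows)  # [('2023-01-01', 'CA', 7.5), ('2023-01-01', 'US', 15.0)]
```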
I've rolled out Scala-based Spark interfaces to non-programmers in Databricks notebooks, so it's definitely possible, but only if you stick with the basic language features.
I think Scala Spark (using 10% of the language's features) is the better technical decision (it provides big benefits like fat JARs, shading, and better text editor support), but the worse overall choice for most organizations, because people are generally terrified of Scala.
They'd rather do nothing than write Scala code. I can empathize with their position.
> Scala is a real big impediment to making data processing accessible to the general public in your company
Ding ding ding! Presto/Athena is now becoming huge in the BI ecosystem. We don't really use Spark for ad-hoc BI anymore; we use it for data science and large, repetitive workloads.
Scala is a real big impediment to making data processing accessible to the general public in your company. The order of preference now at my company is:
1. SQL
2. PySpark
3. Java Spark
4. Scala Spark
e.g. Shopify found that 70% of their PySpark could be converted to just SQL: https://shopify.engineering/build-production-grade-workflow-...