You plan to use a Fabric notebook and PySpark to read sales data and save the data as a Delta table named Sales. The table must be partitioned by Year and Quarter. You load the sales data to a DataFrame named df that contains a Year column and a Quarter column. Which option describes the correct approach to write this as a partitioned Delta table?

Prepare for the DP-600 Fabric Analytics Engineer Exam. Test your knowledge with multiple choice questions and detailed explanations. Gear up for your success now!

Multiple Choice


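When writing a Delta table, you can improve query performance by partitioning the data on the columns that queries commonly filter on. Partitioning on Year and Quarter makes Spark write the data files into nested subdirectories, one per Year/Quarter combination (for example Year=2024/Quarter=1). Queries that filter on Year or Quarter can then skip non-matching directories entirely (partition pruning), which reduces the amount of data scanned.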
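The directory layout that partitionBy produces, and how pruning uses it, can be sketched in plain Python. This is a simplified model, not Spark's implementation. It assumes Hive-style column=value directory names, which is how partitioned Delta tables are laid out by default. The sample rows and the helper names partition_path, layout and prune are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; only Year and Quarter affect the layout.
rows = [
    {"Year": 2023, "Quarter": 4, "Amount": 120.0},
    {"Year": 2024, "Quarter": 1, "Amount": 75.5},
    {"Year": 2024, "Quarter": 1, "Amount": 30.0},
    {"Year": 2024, "Quarter": 2, "Amount": 210.0},
]

PARTITION_COLS = ("Year", "Quarter")


def partition_path(row, cols=PARTITION_COLS):
    """Build a Hive-style relative path such as 'Year=2024/Quarter=1'."""
    return "/".join(f"{c}={row[c]}" for c in cols)


def layout(rows, cols=PARTITION_COLS):
    """Group rows by the partition directory they would be written to."""
    dirs = defaultdict(list)
    for row in rows:
        dirs[partition_path(row, cols)].append(row)
    return dict(dirs)


def prune(paths, **filters):
    """Keep only directories whose partition values match every filter.

    This mimics partition pruning: directories that cannot contain
    matching rows are skipped without reading their files.
    """
    kept = []
    for path in paths:
        values = dict(part.split("=", 1) for part in path.split("/"))
        if all(values.get(col) == str(val) for col, val in filters.items()):
            kept.append(path)
    return kept


dirs = layout(rows)
print(sorted(dirs))
print(prune(dirs, Year=2024))
print(prune(dirs, Year=2024, Quarter=1))
```

With this sample data, a filter on Year = 2024 keeps two of the three directories, and adding Quarter = 1 narrows the scan to a single directory.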

To achieve this with PySpark in Fabric, write the DataFrame using Delta format and specify the partitioning columns, then save as a table named Sales. For example:

df.write.partitionBy("Year", "Quarter").format("delta").saveAsTable("Sales")

This approach ensures that the data is stored as a Delta table named Sales and is physically partitioned by Year and Quarter, which meets both requirements. Writing as Parquet, even with partitionBy, would produce plain Parquet files without a Delta transaction log, so the result would not be a Delta table. Saving as Delta without partitionBy would create the Sales table but would not meet the requirement that it be partitioned by Year and Quarter.
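The difference between these options is visible in the table folder: a Delta table keeps a _delta_log transaction-log folder at its root, which a plain Parquet write does not create, and partitionBy adds column=value subfolders. The sketch below checks a local folder for both. The function inspect_table_dir is hypothetical, it looks only at folder names, and the demo builds empty folders rather than a real table written by Spark.

```python
import os
import tempfile


def inspect_table_dir(root):
    """Report whether root looks like a Delta table and list its leaf
    Hive-style partition directories (column=value/...).

    Heuristic only: it checks for a _delta_log folder and does not read
    the transaction log or any data files.
    """
    is_delta = os.path.isdir(os.path.join(root, "_delta_log"))
    partitions = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Do not descend into the transaction log folder.
        dirnames[:] = [d for d in dirnames if d != "_delta_log"]
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            continue
        parts = rel.split(os.sep)
        # Record leaf directories where every path level is column=value.
        if not dirnames and all("=" in p for p in parts):
            partitions.append("/".join(parts))
    return {"is_delta": is_delta, "partitions": sorted(partitions)}


# Usage: build a throwaway folder shaped like a Delta table partitioned by
# Year and Quarter (empty folders only, no real data files).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "_delta_log"))
    for year, quarter in [(2024, 1), (2024, 2)]:
        os.makedirs(os.path.join(root, f"Year={year}", f"Quarter={quarter}"))
    demo_result = inspect_table_dir(root)

print(demo_result)
```

The same partition folders without a _delta_log folder would be reported as not Delta, which corresponds to the partitioned Parquet option.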

