Exploring the Spark Connect gRPC API more

Goal of this post

In this post we continue looking at the gRPC API, specifically the AnalyzePlan method, which takes a plan and analyzes it. To be honest, I expected this post to be longer, but I decided to cover only AnalyzePlan. There are a few more APIs, such as ReleaseExecute, InterruptAsync, and ReattachExecute, that I was going to cover, but I changed my mind, so consider this part of the last post :).

AnalyzePlan

This call is fairly self-explanatory and easy to make. We pass it a plan and it returns the analyzed version:

var channel = GrpcChannel.ForAddress("http://localhost:15002", new GrpcChannelOptions(){});
await channel.ConnectAsync();

var client = new SparkConnectService.SparkConnectServiceClient(channel);
var sessionId = Guid.NewGuid().ToString();

var response = client.AnalyzePlan(new AnalyzePlanRequest()
{
    ClientType = ".NET Awesome",
    SessionId = sessionId,
    Explain = new AnalyzePlanRequest.Types.Explain()
    {
        ExplainMode = AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Extended,
        Plan = new Plan()
        {
            Root = new Relation()
            {
                ShowString = new ShowString()
                {
                    Input = new Relation()
                    {
                        Range = new Range()
                        {
                            Start = 0, End = 100, Step = 2, NumPartitions = 1
                        }
                    }
                }
            }
        }
    }
});

Console.WriteLine(response.Explain.ExplainString);

When run, this produces:

== Parsed Logical Plan ==
LocalRelation [show_string#76]

== Analyzed Logical Plan ==
show_string: string
LocalRelation [show_string#76]

== Optimized Logical Plan ==
LocalRelation [show_string#76]

== Physical Plan ==
LocalTableScan [show_string#76]

The more complicated the plan you pass in, the more output you will see. The kind of explain output we get back is controlled by the ExplainMode, which can be any of:

  • AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Simple
  • AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Extended
  • AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Codegen
  • AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Cost
  • AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Formatted
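
To compare the modes side by side, one option is to loop over them and send the same plan each time. The following is a minimal sketch, not a definitive implementation. It assumes a Spark Connect server is listening on localhost:15002 as in the example above, and that the classes generated from the Spark Connect .proto files live in the Spark.Connect namespace (adjust the using lines if your generated code differs). The BuildPlan helper is hypothetical and only exists to keep the loop short.

```csharp
using System;
using Grpc.Net.Client;
using Spark.Connect; // assumption: namespace of the generated protobuf classes
using Range = Spark.Connect.Range; // avoids ambiguity with System.Range
using ExplainMode = Spark.Connect.AnalyzePlanRequest.Types.Explain.Types.ExplainMode;

var channel = GrpcChannel.ForAddress("http://localhost:15002");
var client = new SparkConnectService.SparkConnectServiceClient(channel);
var sessionId = Guid.NewGuid().ToString();

// The five modes listed above (the generated enum also has an Unspecified value, skipped here).
var modes = new[]
{
    ExplainMode.Simple,
    ExplainMode.Extended,
    ExplainMode.Codegen,
    ExplainMode.Cost,
    ExplainMode.Formatted
};

foreach (var mode in modes)
{
    // Same request shape as the earlier example; only ExplainMode changes.
    var response = client.AnalyzePlan(new AnalyzePlanRequest
    {
        ClientType = ".NET Awesome",
        SessionId = sessionId,
        Explain = new AnalyzePlanRequest.Types.Explain
        {
            ExplainMode = mode,
            Plan = BuildPlan()
        }
    });

    Console.WriteLine($"--- {mode} ---");
    Console.WriteLine(response.Explain.ExplainString);
}

// Hypothetical helper: builds a plain Range plan (0 to 100, step 2) for each request.
static Plan BuildPlan() => new Plan
{
    Root = new Relation
    {
        Range = new Range { Start = 0, End = 100, Step = 2, NumPartitions = 1 }
    }
};
```

Reusing one session ID across the calls keeps all five requests in the same Spark session. The plan here is a bare Range rather than the ShowString used earlier, so the output will differ from the listing above.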