Exploring the Spark Connect gRPC API more
Goal of this post
In this post we will continue looking at the gRPC API, specifically the AnalyzePlan method, which takes a plan and analyzes it. To be honest, I expected this post to be longer but decided to cover just AnalyzePlan. There are a few more APIs, like ReleaseExecute, InterruptAsync, and ReattachExecute, that I was going to cover, but I changed my mind, so consider this part of the last post :).
AnalyzePlan
This call is fairly self-explanatory and easy to make. We pass it a plan and it returns the analyzed version of it:
using Grpc.Net.Client;
using Spark.Connect; // assumption: the generated Spark Connect classes live in this namespace

// Connect to a local Spark Connect server and create the gRPC client
var channel = GrpcChannel.ForAddress("http://localhost:15002", new GrpcChannelOptions(){});
await channel.ConnectAsync();
var client = new SparkConnectService.SparkConnectServiceClient(channel);
var sessionId = Guid.NewGuid().ToString();

// Ask the server to explain a plan: ShowString over a simple Range
var response = client.AnalyzePlan(new AnalyzePlanRequest()
{
    ClientType = ".NET Awesome",
    SessionId = sessionId,
    Explain = new AnalyzePlanRequest.Types.Explain()
    {
        ExplainMode = AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Extended,
        Plan = new Plan()
        {
            Root = new Relation()
            {
                ShowString = new ShowString()
                {
                    Input = new Relation()
                    {
                        // Fully qualified so it cannot be confused with System.Range
                        Range = new Spark.Connect.Range()
                        {
                            Start = 0, End = 100, Step = 2, NumPartitions = 1
                        }
                    }
                }
            }
        }
    }
});

Console.WriteLine(response.Explain.ExplainString);
Running this produces:
== Parsed Logical Plan ==
LocalRelation [show_string#76]
== Analyzed Logical Plan ==
show_string: string
LocalRelation [show_string#76]
== Optimized Logical Plan ==
LocalRelation [show_string#76]
== Physical Plan ==
LocalTableScan [show_string#76]
Obviously, the more complicated the plan you pass in, the more output you will see. There are a few different kinds of explain output that we can get back, controlled by the ExplainMode, which can be any of:
AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Simple
AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Extended
AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Codegen
AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Cost
AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Formatted
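To see how the modes differ, one option is to send the same plan once per mode and print each result. The following is only a rough sketch: it assumes the same generated classes as the example above (in the Spark.Connect namespace), and PrintAllExplainModes is a hypothetical helper name of my own, not something the API provides. The client, session id and plan are passed in by the caller.

```csharp
using System;
using Spark.Connect; // assumption: namespace of the generated Spark Connect classes

public static class ExplainModes
{
    // Hypothetical helper (not part of Spark Connect): sends one AnalyzePlan
    // Explain request per mode and prints whatever explain string comes back.
    public static void PrintAllExplainModes(
        SparkConnectService.SparkConnectServiceClient client,
        string sessionId,
        Plan plan)
    {
        // The five modes listed above
        var modes = new[]
        {
            AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Simple,
            AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Extended,
            AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Codegen,
            AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Cost,
            AnalyzePlanRequest.Types.Explain.Types.ExplainMode.Formatted
        };

        foreach (var mode in modes)
        {
            var response = client.AnalyzePlan(new AnalyzePlanRequest
            {
                SessionId = sessionId,
                Explain = new AnalyzePlanRequest.Types.Explain
                {
                    ExplainMode = mode,
                    Plan = plan
                }
            });

            // Label each block so the outputs are easy to tell apart
            Console.WriteLine($"--- {mode} ---");
            Console.WriteLine(response.Explain.ExplainString);
        }
    }
}
```

Calling it with the client, sessionId and the Plan from the earlier example should print one block per mode. I have left the output out because it will vary with the Spark version and the plan you pass in.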