
OpenAI API Protocol Complete Guide - The Universal Standard for LLM Application Development

Introduction

“The quality of your prompt determines the model’s output, but whether the model outputs correctly and how good the output is also depends on other parameters.”

This point trips up many developers who are new to AI application development. Everyone knows that writing good prompts matters, but the parameters of the API call itself often remain a mystery.

In this article, I’ll guide you through the OpenAI API protocol in detail - the standard interface that has become the foundation for almost all LLM application development. Whether you use Spring AI, LangChain, or other agent frameworks, understanding this protocol is essential for advancing your AI development skills.


1. What is the OpenAI API Protocol

The OpenAI API protocol is a set of HTTP API interface specifications defined by OpenAI for interacting with large language models. Due to its first-mover advantage and wide range of applications, this protocol has become the de facto industry standard.

Why is it so important?

1. Industry Standard Status

Most mainstream LLM providers now offer endpoints compatible with OpenAI’s interface specification, although support for individual parameters can vary by provider. This means:

  • Learn one API, and you can call almost all LLMs on the market
  • Very low code migration cost: in most cases you only change base_url, api_key, and the model name
  • Various agent frameworks are built on top of this protocol

2. Framework Ecosystem Support

Mainstream agent frameworks have deeply encapsulated this protocol:

  • Spring AI / Spring AI Alibaba: First choice for Java ecosystem
  • LangChain: Most popular in Python/JavaScript ecosystem
  • AutoGen: Microsoft’s open-source multi-agent framework
  • LlamaIndex: Focused on RAG application development

But regardless of how frameworks encapsulate it, it all boils down to sending a formatted HTTP request to the LLM’s API endpoint. In fact, you can even develop agent applications using just an HttpClient.

3. API Gateway Ecosystem

Mainstream API gateways are all compatible with the OpenAI protocol:

  • New API: Open-source one-stop API management/distribution system
  • One API: Predecessor of New API
  • Various commercial relay services

2. Protocol Basics: API Endpoints and Authentication

Complete API Path Format

https://domain-address/v1/chat/completions

However, configuration varies slightly across different frameworks:

Framework/Platform | Configuration Method | Concatenation Rule
LangChain, AutoGen, etc. | Configure up to /v1 | Framework auto-concatenates /chat/completions
Spring AI | Configure the part before /v1 | Framework auto-concatenates /v1/chat/completions
Direct HTTP call | Full path | No auto-concatenation
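
To make the two concatenation rules concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical and do not belong to any framework; the sketch only mirrors the rules in the table above.

```java
import java.net.URI;

/** Hypothetical helper (not part of any framework): resolves the chat completions URL. */
final class EndpointResolver {

    /** For clients that expect base_url to already end in /v1 (the LangChain-style rule). */
    static URI fromVersionedBase(String baseUrl) {
        return URI.create(stripTrailingSlash(baseUrl) + "/chat/completions");
    }

    /** For clients that expect base_url without /v1 (the Spring AI-style rule). */
    static URI fromHostBase(String baseUrl) {
        return URI.create(stripTrailingSlash(baseUrl) + "/v1/chat/completions");
    }

    private static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        // Both calls resolve to https://api.openai.com/v1/chat/completions
        System.out.println(fromVersionedBase("https://api.openai.com/v1/"));
        System.out.println(fromHostBase("https://api.openai.com"));
    }
}
```

If a request returns 404, a doubled or missing /v1 segment caused by mixing up these two rules is a likely first thing to check.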

Common Model base_urls

Model Provider | base_url
OpenAI ChatGPT | https://api.openai.com/v1
Alibaba Cloud Qwen | https://dashscope.aliyuncs.com/compatible-mode/v1
DeepSeek | https://api.deepseek.com/v1
Zhipu AI (GLM) | https://open.bigmodel.cn/api/paas/v4
Moonshot (Kimi) | https://api.moonshot.cn/v1
Ollama Local Model | http://localhost:11434/v1
vLLM Local Deployment | http://localhost:8000/v1

Authentication Method

All requests need to carry an API Key in the HTTP request header:

Authorization: Bearer <your-api-key>

API Key is used for:

  • Identity verification
  • Access authorization
  • Billing statistics
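
As a small illustration of the header above, the following sketch builds (but does not send) a request with java.net.http. The class name is hypothetical, and reading the key from the OPENAI_API_KEY environment variable is an assumption chosen to keep the key out of source code.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical example class: shows where the API key goes in a request. */
final class AuthHeaderExample {

    /** Builds a POST request carrying the bearer token; nothing is sent over the network. */
    static HttpRequest buildRequest(String url, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Assumption: the key is provided via an environment variable.
        String apiKey = System.getenv().getOrDefault("OPENAI_API_KEY", "sk-placeholder");
        HttpRequest request = buildRequest("https://api.openai.com/v1/chat/completions", apiKey, "{}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Authorization header set: "
                + request.headers().firstValue("Authorization").isPresent());
    }
}
```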

3. Input Parameters Detailed Explanation

Complete Parameter Table

Parameter Name | Required | Data Type | Default | Range | Purpose | Description
model | ✅ Required | String | - | - | Specify Model | Determines which specific LLM to use, e.g., gpt-4, qwen-plus, deepseek-chat
messages | ✅ Required | Array | - | - | Message List | Conversation history or instructions passed to the model; each item contains role and content
temperature | ❌ Optional | Float | 1.0 | 0.0 - 2.0 | Temperature Coefficient | Controls output randomness: low values are more stable, high values are more creative
top_p | ❌ Optional | Float | 1.0 | 0.0 - 1.0 | Nucleus Sampling | Samples only from the most probable tokens whose cumulative probability reaches top_p; adjust either this or temperature, not both
max_tokens | ❌ Optional | Integer | Model Max | - | Max Output Length | Limits the maximum number of tokens in a single response
n | ❌ Optional | Integer | 1 | 1-N | Generation Count | Generates n different responses for the same prompt
stream | ❌ Optional | Boolean | false | true/false | Streaming Output | Whether to return tokens in streaming mode
stop | ❌ Optional | String/Array | null | - | Stop Sequence | Stops generation immediately when a specified string is encountered
presence_penalty | ❌ Optional | Float | 0 | -2.0 - 2.0 | Presence Penalty | Positive values encourage new topics; negative values make the model more likely to stay on topics already mentioned
frequency_penalty | ❌ Optional | Float | 0 | -2.0 - 2.0 | Frequency Penalty | Positive values reduce repetition; negative values make repetition more likely
seed | ❌ Optional | Integer | null | - | Reproducibility Seed | Improves result reproducibility (not 100% guaranteed)
tools | ❌ Optional | Array | null | - | Tool Definition | Defines callable external tools/functions
tool_choice | ❌ Optional | String/Object | auto (when tools are provided) | - | Tool Selection | Controls tool calling behavior
response_format | ❌ Optional | Object | null | - | Response Format | Specifies an output format such as JSON
user | ❌ Optional | String | null | - | User Identifier | Used to track and identify end users
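
To show what top_p (nucleus sampling) in the table above does, here is a minimal sketch of the textbook formulation: keep the most probable tokens until their cumulative probability reaches top_p, then sample only among those. The class name and the probabilities are made up for illustration, and real providers may implement the details differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical demo of nucleus (top_p) filtering over a toy token distribution. */
final class TopPDemo {

    /** Returns the indices of the tokens kept by nucleus filtering, most probable first. */
    static List<Integer> nucleus(double[] probs, double topP) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) order.add(i);
        // Sort token indices by descending probability.
        order.sort(Comparator.comparingDouble((Integer i) -> -probs[i]));

        List<Integer> kept = new ArrayList<>();
        double cumulative = 0;
        for (int index : order) {
            kept.add(index);
            cumulative += probs[index];
            if (cumulative >= topP) break; // the nucleus is complete
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.3, 0.15, 0.05}; // made-up next-token probabilities
        System.out.println(nucleus(probs, 0.9)); // keeps the three most probable tokens
        System.out.println(nucleus(probs, 0.5)); // keeps only the most probable token
    }
}
```

A lower top_p shrinks the candidate set, which is why lowering it has an effect similar to lowering temperature.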

Core Parameters Detailed

1. messages (Message List)

This is the most important parameter, defining the complete conversation context with the model. messages is an array where each element is a message object containing two core fields: role and content.

role field: Used to identify who said a message and its role in the conversation. The three most common roles are system, user, and assistant.

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I help you?"},
    {"role": "user", "content": "Please introduce yourself."}
  ]
}

Role Types Detailed Explanation:

Role | Purpose | Usage Scenario | Example
system | Set model behavior, role, and constraints | Place first in messages, define AI’s “persona” and response style | "You are a professional Python programming assistant. Please provide code examples in your responses."
user | User input or questions | Add to end of messages each time user asks a question | "Please write a quicksort algorithm for me."
assistant | Model’s historical responses | Save previous AI responses in multi-turn conversations to maintain context | "Sure, here's the Python implementation of quicksort..."
tool | Tool call return results | ⚠️ Only for Function Calling, must immediately follow assistant message containing tool_calls | {"role": "tool", "tool_call_id": "xxx", "content": "Beijing is sunny today, 25℃"}

⚠️ Important Note: The tool role cannot be used alone. It must be a response to a previous assistant message’s tool_calls. If an assistant message doesn’t have tool_calls, you cannot follow it with a tool role message, otherwise it will cause an error.

Three Core Roles Explained:

🔹 system (System Role)

system messages are used to set the model’s “persona” and behavior specifications before the conversation begins. It’s like giving the AI a “job description”.

{"role": "system", "content": "You are a senior frontend engineer, proficient in React, Vue, TypeScript.\nWhen answering questions:\n1. Prioritize code examples\n2. Explain key concepts\n3. Provide best practice suggestions"}

Usage Tips:

  • Usually only one system message is placed at the beginning of the messages array
  • Can define role, tone, output format, prohibited items, etc.
  • The more specific the system prompt, the more the model’s output matches expectations
  • Models differ in how closely they follow system messages, so test and adjust as needed

Common system prompt templates:

Scenario | System Prompt Example
Programming Assistant | You are a {language} programming expert. Please provide code examples and comments in your responses.
Translation Assistant | You are a professional translator. Please translate user input into {target language}, maintaining the original tone and style.
Copywriting | You are a marketing copywriting expert. Please write copy in {style} style, keeping it within {word count} words.
Data Analysis | You are a data analyst. Please describe analysis data with clear tables and charts, providing professional insights.
AI Agent | You are an intelligent assistant that can call tools based on user needs. Available tools: {tool list}

🔹 user (User Role)

user messages represent user input, the content the model needs to respond to. Each time a user asks a question, add the new message to the end of the messages array.

{"role": "user", "content": "Please explain what polymorphism is in Java?"}

Usage Tips:

  • Clear, specific questions get better answers
  • You can provide context or examples in user messages
  • Complex tasks can be broken into multiple user messages for step-by-step guidance

🔹 assistant (Assistant Role)

assistant messages represent the model’s historical responses. In multi-turn conversations, you need to add previous assistant responses to messages so the model “remembers” the conversation context.

{"role": "assistant", "content": "Polymorphism is one of the core concepts of object-oriented programming..."}

Usage Scenarios:

  1. Multi-turn Conversation: Maintain conversation coherence
[
  {"role": "user", "content": "What is Java's polymorphism?"},
  {"role": "assistant", "content": "Polymorphism refers to the same operation acting on different objects..."},
  {"role": "user", "content": "Can you give me a code example?"}  // From the history, the model knows the example should be about polymorphism
]
  2. Few-shot Prompting: Guide model output format through examples
[
  {"role": "system", "content": "You are a sentiment analysis assistant, judge the sentiment tendency of user input."},
  {"role": "user", "content": "The weather is great today!"},
  {"role": "assistant", "content": "Positive sentiment"},
  {"role": "user", "content": "This movie is so boring."},
  {"role": "assistant", "content": "Negative sentiment"},
  {"role": "user", "content": "I just completed a project."}  // Model follows the demonstrated format, e.g. "Positive sentiment" or "Neutral sentiment"
]
  3. AI Agent Tool Calling Flow: assistant returns tool_calls, tool returns results
[
  {"role": "user", "content": "How's the weather in Beijing today?"},
  {"role": "assistant", "content": null, "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Beijing\"}"}}]},
  {"role": "tool", "tool_call_id": "call_1", "content": "Beijing is sunny today, temperature 25℃"},
  {"role": "assistant", "content": "Beijing is sunny today with a temperature of 25 degrees Celsius, suitable for outdoor activities."}
]

⚠️ tool Role Usage Guidelines:

The tool role can only be used in Function Calling scenarios and must meet these requirements:

Requirement | Description
Prerequisite | A preceding assistant message must contain the tool_calls field
Required Fields | role: "tool", tool_call_id (corresponding to id in tool_calls), content (tool return result)
Position Requirement | Must immediately follow the assistant message containing tool_calls; when that message requests several calls, the tool messages follow it one after another, one per tool_call_id

Wrong Example (will cause error):

// ❌ Wrong: tool message not preceded by tool_calls
[
  {"role": "user", "content": "Query weather"},
  {"role": "tool", "content": "Beijing sunny, 25℃"}  // Error! No assistant tool_calls before it
]

Correct Example:

// ✅ Correct: tool message immediately follows tool_calls
[
  {"role": "user", "content": "Query Beijing weather"},
  {"role": "assistant", "content": null, "tool_calls": [{"id": "call_123", ...}]},
  {"role": "tool", "tool_call_id": "call_123", "content": "Beijing sunny, 25℃"}  // Correct!
]
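
The ordering rule can also be checked before a request is sent. The following is a minimal sketch under simplifying assumptions: messages are reduced to a hypothetical Msg record (Java 17+), and only the rule "every tool message must answer a pending tool call" is validated. The class and its methods are illustrative and not part of any SDK.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical validator for the position rule of tool messages. */
final class ToolMessageValidator {

    /** Simplified message: toolCallIds is non-empty only for assistant messages that request tools. */
    record Msg(String role, List<String> toolCallIds, String toolCallId) {
        static Msg of(String role) { return new Msg(role, List.of(), null); }
        static Msg assistantCalling(String... ids) { return new Msg("assistant", List.of(ids), null); }
        static Msg toolResult(String id) { return new Msg("tool", List.of(), id); }
    }

    /** Returns true if every tool message answers a call from the closest preceding assistant message. */
    static boolean isValid(List<Msg> messages) {
        Set<String> pending = new HashSet<>();
        for (Msg m : messages) {
            if (m.role().equals("tool")) {
                // A tool message must reference an id that is still waiting for a result.
                if (!pending.remove(m.toolCallId())) return false;
            } else {
                // Any other message closes the window in which tool results may appear.
                pending.clear();
                pending.addAll(m.toolCallIds());
            }
        }
        return true; // Note: unanswered tool calls are not reported by this sketch.
    }

    public static void main(String[] args) {
        // Mirrors the wrong example: prints false
        System.out.println(isValid(List.of(Msg.of("user"), Msg.toolResult("call_123"))));
        // Mirrors the correct example: prints true
        System.out.println(isValid(List.of(
                Msg.of("user"), Msg.assistantCalling("call_123"), Msg.toolResult("call_123"))));
    }
}
```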

2. temperature (Temperature Coefficient)

This is the key parameter controlling output diversity:

Temperature Value | Applicable Scenarios | Characteristics
0.1 - 0.3 | Code generation, factual Q&A, legal text | High certainty, predictable
0.5 - 0.7 | General conversation, translation, summarization | Balance between creativity and accuracy
0.8 - 1.0 | Creative writing, brainstorming, marketing copy | High creativity, diversity
> 1.0 | Extreme creativity scenarios | Output may be unpredictable

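Recommendation: 0.7 is a common starting point for general use; lower it to around 0.3 when you need more stable output. Note that the API default is 1.0 if you omit the parameter.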
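
Why do low values give more predictable output? A minimal sketch of the textbook mechanism: token scores (logits) are divided by the temperature before being turned into probabilities with softmax. The class name and the logits are made up for illustration, and how a given provider implements sampling internally is an assumption here.

```java
import java.util.Arrays;

/** Hypothetical demo: how temperature reshapes a toy next-token distribution. */
final class TemperatureDemo {

    /** Converts raw scores (logits) into probabilities after dividing by the temperature. */
    static double[] softmax(double[] logits, double temperature) {
        // Temperature 0 is usually treated as greedy decoding and is not covered by this sketch.
        if (temperature <= 0) throw new IllegalArgumentException("temperature must be > 0");
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) max = Math.max(max, logit / temperature);

        double sum = 0;
        double[] probs = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature - max); // subtract max for numerical stability
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5}; // made-up scores for three candidate tokens
        for (double t : new double[]{0.3, 0.7, 1.5}) {
            System.out.println("temperature " + t + " -> " + Arrays.toString(softmax(logits, t)));
        }
    }
}
```

At low temperature almost all probability mass lands on the top-scoring token; at high temperature the distribution flattens, so less likely tokens are sampled more often.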

3. stream (Streaming Output)

This parameter is crucial for user experience:

Value | Behavior | Applicable Scenarios
false | Wait for complete response then return all at once | Batch processing, background tasks
true | Return tokens one by one in streaming mode | Real-time conversation, improve user experience

Advantages of Streaming Output:

  • Users can see generated content in real-time, reducing waiting anxiety
  • Can interrupt unsatisfactory content early
  • More natural feel like human conversation

4. tools / tool_choice (Tool Calling)

These are the core parameters for AI Agent development, used for Function Calling:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

tool_choice Options:

  • auto: Model automatically decides whether to call tools (default)
  • none: Do not call any tools
  • required: Must call tools
  • {"type": "function", "function": {"name": "xxx"}}: Force call specified tool

Parameter Usage Recommendations

Most common parameter combination for daily development:

{
  "model": "qwen-plus",
  "messages": [...],
  "stream": true,
  "temperature": 0.7
}

Scenarios requiring stable output:

{
  "model": "qwen-plus",
  "messages": [...],
  "stream": false,
  "temperature": 0.3,
  "seed": 42
}

AI Agent development scenarios:

{
  "model": "qwen-plus",
  "messages": [...],
  "tools": [...],
  "tool_choice": "auto",
  "stream": false
}

4. Output Parameters Detailed Explanation

Non-streaming Output Structure

When stream: false, the API returns a complete JSON response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am an AI assistant...",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}

Non-streaming Output Parameter Table

Parameter Name | Data Type | Purpose | Description
id | String | Unique Identifier | Unique ID for this API request
object | String | Response Object Type | Fixed as chat.completion
model | String | Model Name | Actual model name used
created | Integer | Timestamp | Unix timestamp when API response was created
choices | Array | Output List | Contains all response options generated by the model
choices[i].index | Integer | Option Index | Index starting from 0
choices[i].message | Object | Message Content | Contains complete response information
choices[i].message.role | String | Role | Fixed as assistant
choices[i].message.content | String | Generated Text | All text content generated by the model
choices[i].message.tool_calls | Array | Tool Calls | AI Agent core: contains function names and parameters to call
choices[i].finish_reason | String | Stop Reason | stop (normal end), length (reached max_tokens), tool_calls (call tools)
usage | Object | Token Statistics | Core basis for billing
usage.prompt_tokens | Integer | Input Tokens | Number of tokens consumed by input messages
usage.completion_tokens | Integer | Output Tokens | Number of tokens consumed by model output
usage.total_tokens | Integer | Total Tokens | Total tokens consumed

Streaming Output Structure

When stream: true, the API returns data chunks through SSE (Server-Sent Events):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":20,"completion_tokens":3,"total_tokens":23}}

data: [DONE]

Streaming Output Parameter Table

Parameter Name | Data Type | Purpose | Description
id | String | Unique Identifier | Same in each chunk, identifies the same request
object | String | Response Object Type | Fixed as chat.completion.chunk
model | String | Model Name | Actual model name used
created | Integer | Timestamp | Unix timestamp when API response was created
choices | Array | Chunk Output List | Contains incremental content for this chunk
choices[i].index | Integer | Option Index | Index starting from 0
choices[i].delta | Object | Incremental Content | New content added in this data chunk
choices[i].delta.role | String | Role | Only appears in the first chunk
choices[i].delta.content | String | Incremental Text | Small piece of text added by this chunk; the client concatenates the pieces
choices[i].delta.tool_calls | Array | Tool Call Incremental | Streaming tool calls, returned incrementally
choices[i].finish_reason | String | Stop Reason | null in every chunk except the last one for that choice
usage | Object | Token Statistics | Usually returned in the last chunk; with OpenAI, only when the request sets stream_options: {"include_usage": true}
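
Before the fields above can be read, the JSON payloads have to be separated from the SSE framing. The following is a minimal sketch with a hypothetical class name; it assumes the stream has already been split into lines and deliberately leaves JSON parsing to a library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts the JSON payloads from the lines of an SSE stream. */
final class SseLines {

    /** Returns the payload of each data line, stopping at the [DONE] sentinel. */
    static List<String> payloads(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("data:")) continue;   // skip blank separator lines and other fields
            String payload = line.substring(5).trim(); // tolerate "data:" with or without a space
            if (payload.equals("[DONE]")) break;       // end of stream
            result.add(payload);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "data: {\"choices\":[{\"delta\":{\"content\":\"Hello\"}}]}",
                "",
                "data: {\"choices\":[{\"delta\":{\"content\":\"!\"}}]}",
                "",
                "data: [DONE]");
        payloads(lines).forEach(System.out::println); // prints the two JSON payloads
    }
}
```

Each returned payload is one chat.completion.chunk object; appending every choices[i].delta.content in order reconstructs the full reply.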

finish_reason Complete Explanation

Value | Meaning | Handling Suggestion
stop | Normal completion | Output complete, can use directly
length | Reached max_tokens limit | Output truncated, may need to continue
tool_calls | Model requests to call tools | AI Agent core, need to execute tools and return results
content_filter | Content filtered by security | Input or output triggered security policy
function_call | Legacy function calling (deprecated) | Use tool_calls instead
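
In client code, the table above often turns into a small dispatch step. Here is a sketch of that idea; the Action enum and the class name are hypothetical, and what each action actually does is left to the application.

```java
/** Hypothetical mapping from finish_reason values to the next step a client might take. */
final class FinishReasonHandler {

    enum Action { USE_OUTPUT, CONTINUE_OR_RAISE_LIMIT, EXECUTE_TOOLS, REPORT_FILTERED, UNKNOWN }

    static Action actionFor(String finishReason) {
        if (finishReason == null) return Action.UNKNOWN; // e.g. a mid-stream chunk
        return switch (finishReason) {
            case "stop" -> Action.USE_OUTPUT;
            case "length" -> Action.CONTINUE_OR_RAISE_LIMIT;
            case "tool_calls", "function_call" -> Action.EXECUTE_TOOLS;
            case "content_filter" -> Action.REPORT_FILTERED;
            default -> Action.UNKNOWN; // providers may add values not listed in the table
        };
    }

    public static void main(String[] args) {
        System.out.println(actionFor("length"));     // prints CONTINUE_OR_RAISE_LIMIT
        System.out.println(actionFor("tool_calls")); // prints EXECUTE_TOOLS
    }
}
```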

5. Streaming vs Non-streaming Comparison

Comparison Dimension | Streaming Output (stream: true) | Non-streaming Output (stream: false)
Response Method | Return token by token | Return complete result at once
First Token Latency | Low, fast display | High, need to wait for complete generation
User Experience | Good, real-time feedback | Poor, need to wait
Implementation Complexity | Higher, need to handle SSE | Simple, normal HTTP request
Token Statistics | Returned in last chunk | Returned with response
Error Handling | Complex, may fail mid-stream | Simple, unified handling
Applicable Scenarios | Real-time conversation, chat applications | Batch processing, background tasks
Tool Calling | Incremental return | Complete return

6. Core Parameters in AI Agent Development

In AI Agent development, the following parameters are most critical:

1. messages - Conversation Context

Agents need to maintain complete conversation history, including:

  • System prompt (defining Agent behavior)
  • User messages
  • Assistant responses
  • Tool call records
  • Tool return results

2. tools / tool_calls - Tool Calling

This is the core for Agent interaction with the external world:

// Model requests to call tool
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\": \"Beijing\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

3. Multi-turn Conversation Flow

User Input → Model Judgment → Return tool_calls → Execute Tool →
Add Results to messages → Call Model Again → Return Final Answer
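
The flow above can be sketched as a loop. Everything in this example is hypothetical scaffolding (Java 17+): the Model interface stands in for the HTTP call to /v1/chat/completions, the records are simplified messages, and the fake model in demo() replaces a real LLM so the loop can run offline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the agent loop: call the model, run requested tools, call again. */
final class AgentLoopSketch {

    record ToolCall(String id, String name, String arguments) {}

    /** Simplified message; toolCalls is empty unless the assistant requests tools. */
    record Message(String role, String content, List<ToolCall> toolCalls, String toolCallId) {}

    /** Stand-in for the HTTP request to the chat completions endpoint. */
    interface Model { Message complete(List<Message> messages); }

    static String run(Model model, Map<String, Function<String, String>> tools,
                      String userInput, int maxRounds) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message("user", userInput, List.of(), null));
        for (int round = 0; round < maxRounds; round++) {
            Message reply = model.complete(messages);
            messages.add(reply); // keep the assistant turn, including its tool_calls, in history
            if (reply.toolCalls().isEmpty()) return reply.content(); // final answer
            for (ToolCall call : reply.toolCalls()) {
                Function<String, String> tool = tools.get(call.name());
                String result = tool == null ? "Unknown tool: " + call.name()
                                             : tool.apply(call.arguments());
                // Each result goes back as a tool message tied to the call id.
                messages.add(new Message("tool", result, List.of(), call.id()));
            }
        }
        throw new IllegalStateException("No final answer after " + maxRounds + " rounds");
    }

    /** Runs the loop against a fake model that requests the weather once, then answers. */
    static String demo() {
        Model fake = history -> {
            Message last = history.get(history.size() - 1);
            if (last.role().equals("tool")) {
                return new Message("assistant", "Weather report: " + last.content(), List.of(), null);
            }
            return new Message("assistant", null,
                    List.of(new ToolCall("call_1", "get_weather", "{\"city\":\"Beijing\"}")), null);
        };
        Map<String, Function<String, String>> tools =
                Map.of("get_weather", json -> "sunny, 25 degrees");
        return run(fake, tools, "How's the weather in Beijing today?", 5);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Weather report: sunny, 25 degrees"
    }
}
```

The maxRounds limit is a design choice of this sketch: it keeps a model that repeatedly requests tools from looping forever.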

7. Spring AI and Spring AI Alibaba Integration

Spring AI Basic Configuration

# application.yml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      base-url: https://api.openai.com
      chat:
        options:
          model: gpt-4
          temperature: 0.7

Spring AI Alibaba Configuration

# application.yml
spring:
  ai:
    dashscope:
      api-key: ${DASHSCOPE_API_KEY}
      chat:
        options:
          model: qwen-plus
          temperature: 0.7

Code Example Comparison

Spring AI Call Example:

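import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController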
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    // Non-streaming call
    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }

    // Streaming call
    @GetMapping("/chat/stream")
    public Flux<String> chatStream(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content();
    }
}

Spring AI Complete Configuration Example:

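import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.ChatOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration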
public class AIConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("You are a helpful AI assistant.")
                .defaultOptions(ChatOptions.builder()
                        .model("qwen-plus")
                        .temperature(0.7)
                        .maxTokens(2000)
                        .build())
                .build();
    }
}

8. Native Java HTTP Call Examples

Non-streaming Call Example

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OpenAIClient {

    private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
    private static final String API_KEY = "your-api-key";

    private final HttpClient httpClient;

    public OpenAIClient() {
        this.httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    /**
     * Non-streaming call
     */
    public String chat(String userMessage) throws Exception {
        // Build request body
        String requestBody = """
            {
              "model": "qwen-plus",
              "messages": [
                {"role": "system", "content": "You are a helpful AI assistant."},
                {"role": "user", "content": "%s"}
              ],
              "temperature": 0.7,
              "stream": false
            }
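            """.formatted(userMessage
                    // Escape backslashes, quotes, and newlines so the body stays valid JSON
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n"));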

        // Build request
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + API_KEY)
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        // Send request
        HttpResponse<String> response = httpClient.send(
                request,
                HttpResponse.BodyHandlers.ofString()
        );

        // Parse response
        if (response.statusCode() == 200) {
            return parseContent(response.body());
        } else {
            throw new RuntimeException("API call failed (HTTP " + response.statusCode() + "): " + response.body());
        }
    }

    /**
     * Parse non-streaming response
     */
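    private String parseContent(String responseBody) {
        // Naive parsing for illustration only (use Jackson/Gson in actual projects).
        // It assumes there is no whitespace between "content": and the opening quote.
        String key = "\"content\":\"";
        int keyIndex = responseBody.indexOf(key);
        if (keyIndex == -1) {
            throw new IllegalStateException("No string content field in response: " + responseBody);
        }
        int contentStart = keyIndex + key.length();
        // Find the closing quote, skipping escaped characters such as \" and \\
        int contentEnd = contentStart;
        while (contentEnd < responseBody.length() && responseBody.charAt(contentEnd) != '"') {
            contentEnd += (responseBody.charAt(contentEnd) == '\\') ? 2 : 1;
        }
        // Escape sequences are returned undecoded; a JSON library would decode them
        return responseBody.substring(contentStart, Math.min(contentEnd, responseBody.length()));
    }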

    public static void main(String[] args) throws Exception {
        OpenAIClient client = new OpenAIClient();
        String response = client.chat("Hello, please introduce yourself.");
        System.out.println(response);
    }
}

Streaming Call Example

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Consumer;

public class OpenAIStreamClient {

    private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
    private static final String API_KEY = "your-api-key";

    private final HttpClient httpClient;

    public OpenAIStreamClient() {
        this.httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    /**
     * Streaming call
     * @param userMessage User message
     * @param onContent Callback for each content received
     */
    public void chatStream(String userMessage, Consumer<String> onContent) throws Exception {
        // Build request body
        String requestBody = """
            {
              "model": "qwen-plus",
              "messages": [
                {"role": "system", "content": "You are a helpful AI assistant."},
                {"role": "user", "content": "%s"}
              ],
              "temperature": 0.7,
              "stream": true
            }
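            """.formatted(userMessage
                    // Escape backslashes, quotes, and newlines so the body stays valid JSON
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n"));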

        // Build request
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + API_KEY)
                .header("Accept", "text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        // Send streaming request
        HttpResponse<java.util.stream.Stream<String>> response = httpClient.send(
                request,
                HttpResponse.BodyHandlers.ofLines()
        );

        // Handle streaming response
        response.body().forEach(line -> {
            if (line.startsWith("data: ") && !line.equals("data: [DONE]")) {
                String jsonData = line.substring(6);
                String content = extractDeltaContent(jsonData);
                if (content != null && !content.isEmpty()) {
                    onContent.accept(content);
                }
            }
        });
    }

    /**
     * Extract delta content from SSE data
     */
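    private String extractDeltaContent(String jsonData) {
        // Naive parsing for illustration only (use Jackson/Gson in actual projects)
        int deltaStart = jsonData.indexOf("\"delta\":");
        if (deltaStart == -1) return null;

        String key = "\"content\":\"";
        int keyIndex = jsonData.indexOf(key, deltaStart);
        if (keyIndex == -1) return null;

        // Decode in a single pass so that an escaped quote does not end the string early
        // and sequences such as \\n are not unescaped twice.
        StringBuilder content = new StringBuilder();
        for (int i = keyIndex + key.length(); i < jsonData.length(); i++) {
            char c = jsonData.charAt(i);
            if (c == '"') break; // an unescaped quote closes the string
            if (c == '\\' && i + 1 < jsonData.length()) {
                char next = jsonData.charAt(++i);
                // \uXXXX escapes are not decoded here; a JSON library would handle them
                content.append(next == 'n' ? '\n' : next == 't' ? '\t' : next);
            } else {
                content.append(c);
            }
        }
        return content.toString();
    }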

    public static void main(String[] args) throws Exception {
        OpenAIStreamClient client = new OpenAIStreamClient();

        System.out.println("AI Response:");
        client.chatStream("Please write a short poem about spring.", content -> {
            System.out.print(content);  // Real-time print
        });
        System.out.println("\n--- Done ---");
    }
}

Complete Example Using Jackson

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class OpenAIJsonClient {

    private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
    private static final String API_KEY = "your-api-key";

    private final HttpClient httpClient;
    private final ObjectMapper objectMapper;

    public OpenAIJsonClient() {
        this.httpClient = HttpClient.newHttpClient();
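        // Ignore response fields that are not modeled below; providers often return extra fields
        this.objectMapper = new ObjectMapper().configure(
                com.fasterxml.jackson.databind.DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);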
    }

    // Request object
    public static class ChatRequest {
        public String model;
        public List<Message> messages;
        public Double temperature;
        public Boolean stream;

        public static class Message {
            public String role;
            public String content;

            public Message(String role, String content) {
                this.role = role;
                this.content = content;
            }
        }
    }

    // Response object
    public static class ChatResponse {
        public String id;
        public String object;
        public Long created;
        public String model;
        public List<Choice> choices;
        public Usage usage;

        public static class Choice {
            public Integer index;
            public Message message;
            public String finish_reason;

            public static class Message {
                public String role;
                public String content;
                @JsonProperty("tool_calls")
                public List<ToolCall> toolCalls;
            }
        }

        public static class Usage {
            @JsonProperty("prompt_tokens")
            public Integer promptTokens;
            @JsonProperty("completion_tokens")
            public Integer completionTokens;
            @JsonProperty("total_tokens")
            public Integer totalTokens;
        }

        public static class ToolCall {
            public String id;
            public String type;
            public Function function;

            public static class Function {
                public String name;
                public String arguments;
            }
        }
    }

    /**
     * Non-streaming call (using strongly typed objects)
     */
    public ChatResponse chat(String userMessage) throws Exception {
        // Build request
        ChatRequest request = new ChatRequest();
        request.model = "qwen-plus";
        request.temperature = 0.7;
        request.stream = false;
        request.messages = List.of(
                new ChatRequest.Message("system", "You are a helpful AI assistant."),
                new ChatRequest.Message("user", userMessage)
        );

        String requestBody = objectMapper.writeValueAsString(request);

        HttpRequest httpRequest = HttpRequest.newBuilder()
                .uri(URI.create(API_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + API_KEY)
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        HttpResponse<String> response = httpClient.send(
                httpRequest,
                HttpResponse.BodyHandlers.ofString()
        );

        if (response.statusCode() == 200) {
            return objectMapper.readValue(response.body(), ChatResponse.class);
        } else {
            throw new RuntimeException("API call failed (HTTP " + response.statusCode() + "): " + response.body());
        }
    }

    public static void main(String[] args) throws Exception {
        OpenAIJsonClient client = new OpenAIJsonClient();

        ChatResponse response = client.chat("What is Java's polymorphism?");

        System.out.println("Model: " + response.model);
        System.out.println("Response: " + response.choices.get(0).message.content);
        System.out.println("Token Consumption: " + response.usage.totalTokens);
    }
}

9. New API Gateway and Anthropic Claude

New API Gateway

New API is a widely used open-source API management/distribution system that is compatible with the OpenAI API protocol:

Core Features:

  • Supports multiple LLMs (OpenAI, Claude, Gemini, domestic models, etc.)
  • Unified OpenAI protocol interface
  • API Key management and billing
  • Channel management and load balancing

Usage:

Just point base_url to the New API server address:

String baseUrl = "https://your-new-api-server/v1";
// Or, for a local deployment:
// String baseUrl = "http://localhost:3000/v1";

Anthropic Claude’s Independent System

It’s worth noting that Anthropic’s Claude series models use a self-contained API protocol that differs from the OpenAI protocol:

Comparison Item | OpenAI API | Anthropic API
Endpoint | /v1/chat/completions | /v1/messages
Message Format | messages array | messages + system separated
System Prompt | Placed in messages | Separate system parameter
Streaming Field | choices[0].delta.content | delta.text
Tool Calling | tool_calls | tool_use blocks in content

Anthropic API Example:

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "system": "You are a helpful AI assistant.",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}
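
The "System Prompt" difference is the one a gateway has to bridge first. Below is a minimal sketch of that single step, assuming plain text messages only; the records and the class name are hypothetical, and tool messages, images, and the other differences in the table are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: moves OpenAI-style system messages into a separate system field. */
final class AnthropicRequestSketch {

    record Message(String role, String content) {}

    /** Simplified request shape: system prompt separated from the conversation messages. */
    record AnthropicRequest(String system, List<Message> messages) {}

    static AnthropicRequest fromOpenAiMessages(List<Message> openAiMessages) {
        List<String> systemParts = new ArrayList<>();
        List<Message> conversation = new ArrayList<>();
        for (Message m : openAiMessages) {
            if (m.role().equals("system")) {
                systemParts.add(m.content()); // collected into the top-level system parameter
            } else {
                conversation.add(m);          // user and assistant turns stay in messages
            }
        }
        String system = systemParts.isEmpty() ? null : String.join("\n", systemParts);
        return new AnthropicRequest(system, conversation);
    }

    public static void main(String[] args) {
        AnthropicRequest request = fromOpenAiMessages(List.of(
                new Message("system", "You are a helpful AI assistant."),
                new Message("user", "Hello!")));
        System.out.println(request.system());          // prints the system prompt
        System.out.println(request.messages().size()); // prints 1
    }
}
```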

Compatibility Solution:

Most gateways (like New API) will automatically convert protocols, allowing you to call Claude using the OpenAI protocol. However, if you call the Anthropic official API directly, you need to follow its native protocol.


10. Best Practices Summary

Parameter Configuration Recommendations

Scenario | temperature | stream | Other Recommendations
Daily conversation | 0.7 | true | Default is fine
Code generation | 0.3 | false | Set seed for stability
Creative writing | 0.9 | true | Can combine with top_p
JSON output | 0.3 | false | Use response_format
AI Agent | 0.7 | false | Configure tools

Development Recommendations

  1. Prioritize streaming output: Significantly improves user experience
  2. Set temperature reasonably: Choose based on scenario, don’t blindly use default values
  3. Monitor token consumption: usage field is the billing basis, monitor it well
  4. Use seed parameter wisely: Use when stable output is needed
  5. Framework encapsulation first: Frameworks like Spring AI simplify development, but understanding the underlying protocol is important

Error Handling

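try {
    ChatResponse response = client.chat(message);
    // Handle response
} catch (Exception e) {
    // Common error handling. Matching on the message only works if the exception
    // message includes the HTTP status code; checking response.statusCode()
    // directly is more reliable.
    String error = String.valueOf(e.getMessage());
    if (error.contains("401")) {
        System.err.println("Invalid API Key");
    } else if (error.contains("429")) {
        System.err.println("Rate limit exceeded, please retry later");
    } else if (error.contains("500")) {
        System.err.println("Model service error");
    } else {
        System.err.println("Request failed: " + error);
    }
}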

Summary

The OpenAI API protocol has become the de facto standard for LLM application development. Mastering this protocol’s input parameters, output parameters, and streaming/non-streaming output is key to understanding the underlying principles of AI Agent development.

Core takeaways:

  • messages is the most important input parameter, defining conversation context
  • stream determines streaming or non-streaming output, affecting user experience
  • tools/tool_calls is the core of AI Agent development
  • choices[].message.content and usage are the most important output fields
  • Frameworks like Spring AI and LangChain all encapsulate this protocol
  • Gateways like New API enable unified access to multiple models
