OpenAI API Protocol Complete Guide - The Universal Standard for LLM Application Development
3/26/2026
Introduction
“The quality of your prompt determines the model’s output, but whether the model outputs correctly and how good the output is also depends on other parameters.”
This is a common point of confusion for developers new to AI application development. Everyone knows that good prompts matter, but the various parameters in API calls remain a mystery to many.
In this article, I’ll guide you through the OpenAI API protocol in detail - the standard interface that has become the foundation for almost all LLM application development. Whether you use Spring AI, LangChain, or other agent frameworks, understanding this protocol is essential for advancing your AI development skills.

1. What is the OpenAI API Protocol
The OpenAI API protocol is a set of HTTP API interface specifications defined by OpenAI for interacting with large language models. Due to its first-mover advantage and wide range of applications, this protocol has become the de facto industry standard.
Why is it so important?
1. Industry Standard Status
Most mainstream LLM providers now offer endpoints compatible with OpenAI’s interface specifications. This means:
- Learn one API, and you can call almost all LLMs on the market
- Extremely low code migration cost - just change base_url and api_key
- Various agent frameworks are built on top of this protocol
2. Framework Ecosystem Support
Mainstream agent frameworks have deeply encapsulated this protocol:
- Spring AI / Spring AI Alibaba: First choice for Java ecosystem
- LangChain: Most popular in Python/JavaScript ecosystem
- AutoGen: Microsoft’s open-source multi-agent framework
- LlamaIndex: Focused on RAG application development
But regardless of how frameworks encapsulate it, it all boils down to sending a formatted HTTP request to the LLM’s API endpoint. In fact, you can even develop agent applications using just an HttpClient.
3. API Gateway Ecosystem
Mainstream API gateways are all compatible with the OpenAI protocol:
- New API: Open-source one-stop API management/distribution system
- One API: Predecessor of New API
- Various commercial relay services
2. Protocol Basics: API Endpoints and Authentication
Complete API Path Format
https://domain-address/v1/chat/completions
However, configuration varies slightly across different frameworks:
| Framework/Platform | Configuration Method | Concatenation Rule |
|---|---|---|
| LangChain, AutoGen, etc. | Configure to /v1/ | Framework auto-concatenates chat/completions |
| Spring AI | Configure before /v1 | Framework auto-concatenates /v1/chat/completions |
| Direct HTTP call | Full path | No auto-concatenation |
Common Model base_urls
| Model Provider | base_url |
|---|---|
| OpenAI ChatGPT | https://api.openai.com/v1 |
| Alibaba Cloud Qwen | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| DeepSeek | https://api.deepseek.com/v1 |
| Zhipu AI (GLM) | https://open.bigmodel.cn/api/paas/v4 |
| Moonshot (Kimi) | https://api.moonshot.cn/v1 |
| Ollama Local Model | http://localhost:11434/v1 |
| vLLM Local Deployment | http://localhost:8000/v1 |
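As a rough illustration of the concatenation rule for frameworks that take a base_url ending in /v1, the sketch below joins a base_url with the chat completions path. The class and method names (EndpointUrls, chatCompletionsUrl) are hypothetical helpers, not part of any framework.

```java
class EndpointUrls {
    /** Joins a base_url (already ending in /v1 or a vendor equivalent) with the chat completions path. */
    static String chatCompletionsUrl(String baseUrl) {
        // Strip trailing slashes so ".../v1" and ".../v1/" give the same result
        String trimmed = baseUrl.replaceAll("/+$", "");
        return trimmed + "/chat/completions";
    }

    public static void main(String[] args) {
        System.out.println(chatCompletionsUrl("https://api.openai.com/v1/"));
        System.out.println(chatCompletionsUrl("https://open.bigmodel.cn/api/paas/v4"));
    }
}
```

Normalizing the trailing slash avoids the double-slash or missing-slash URLs that are a frequent cause of 404 errors when switching providers.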
Authentication Method
All requests need to carry an API Key in the HTTP request header:
Authorization: Bearer <your-api-key>
API Key is used for:
- Identity verification
- Access authorization
- Billing statistics
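A minimal sketch of attaching the header with Java's built-in HttpClient API follows. The helper name buildRequest, the OPENAI_API_KEY environment variable, and the placeholder key are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class AuthHeaderExample {
    /** Builds a POST request carrying the API key as a Bearer token. */
    static HttpRequest buildRequest(String url, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Reading the key from the environment keeps it out of source control
        String apiKey = System.getenv().getOrDefault("OPENAI_API_KEY", "sk-placeholder");
        HttpRequest request = buildRequest("https://api.openai.com/v1/chat/completions", apiKey, "{}");
        System.out.println(request.headers().firstValue("Authorization").isPresent());
    }
}
```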
3. Input Parameters Detailed Explanation
Complete Parameter Table
| Parameter Name | Required | Data Type | Default | Range | Purpose | Description |
|---|---|---|---|---|---|---|
| model | ✅ Required | String | - | - | Specify Model | Determines which specific LLM to use, e.g., gpt-4, qwen-plus, deepseek-chat |
| messages | ✅ Required | Array | - | - | Message List | Conversation history or instructions passed to the model, containing role and content |
| temperature | ❌ Optional | Float | 1.0 | 0.0 - 2.0 | Temperature Coefficient | Controls output randomness, low values are stable, high values are creative |
| top_p | ❌ Optional | Float | 1.0 | 0.0 - 1.0 | Nucleus Sampling | Similar to temperature but different algorithm, recommend adjusting only one |
| max_tokens | ❌ Optional | Integer | Model Max | - | Max Output Length | Limits the maximum number of tokens in a single response |
| n | ❌ Optional | Integer | 1 | 1-N | Generation Count | Generate n different responses for the same prompt |
| stream | ❌ Optional | Boolean | false | true/false | Streaming Output | Whether to return tokens in streaming mode |
| stop | ❌ Optional | String/Array | null | - | Stop Sequence | Stop generation immediately when specified string is encountered |
| presence_penalty | ❌ Optional | Float | 0 | -2.0 - 2.0 | Presence Penalty | Positive values encourage new topics, negative values reduce new topics |
| frequency_penalty | ❌ Optional | Float | 0 | -2.0 - 2.0 | Frequency Penalty | Positive values reduce verbatim repetition, negative values make repetition more likely |
| seed | ❌ Optional | Integer | null | - | Reproducibility Seed | Improves result reproducibility (not 100% guaranteed) |
| tools | ❌ Optional | Array | null | - | Tool Definition | Defines callable external tools/functions |
| tool_choice | ❌ Optional | String/Object | auto (when tools are provided) | - | Tool Selection | Controls tool calling behavior |
| response_format | ❌ Optional | Object | null | - | Response Format | Specifies output format like JSON |
| user | ❌ Optional | String | null | - | User Identifier | Used to track and identify end users |
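To make the top_p row more concrete, here is a small sketch of nucleus filtering as it is commonly described: keep the most probable tokens until their cumulative probability reaches top_p, then sample only from that set. How a given provider implements it internally is not specified by the protocol, so treat this as an assumption; NucleusFilter and keptIndices are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NucleusFilter {
    /** Returns indices of the smallest set of tokens whose cumulative probability reaches topP. */
    static List<Integer> keptIndices(double[] probs, double topP) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) order.add(i);
        // Most probable tokens first
        order.sort(Comparator.comparingDouble((Integer i) -> probs[i]).reversed());
        List<Integer> kept = new ArrayList<>();
        double cumulative = 0;
        for (int i : order) {
            kept.add(i);
            cumulative += probs[i];
            if (cumulative >= topP) break; // nucleus is complete
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] probs = {0.05, 0.5, 0.15, 0.3};
        System.out.println(keptIndices(probs, 0.7));
    }
}
```

With top_p at 1.0 every token stays eligible, which is why the default leaves sampling unrestricted.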
Core Parameters Detailed
1. messages (Message List)
This is the most important parameter, defining the complete conversation context with the model. messages is an array where each element is a message object containing two core fields: role and content.
role field: Used to identify who said a message and its role in the conversation. The three most common roles are system, user, and assistant.
{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hello! How can I help you?"},
{"role": "user", "content": "Please introduce yourself."}
]
}
Role Types Detailed Explanation:
| Role | Purpose | Usage Scenario | Example |
|---|---|---|---|
| system | Set model behavior, role, and constraints | Place first in messages, define AI’s “persona” and response style | "You are a professional Python programming assistant. Please provide code examples in your responses." |
| user | User input or questions | Add to end of messages each time user asks a question | "Please write a quicksort algorithm for me." |
| assistant | Model’s historical responses | Save previous AI responses in multi-turn conversations to maintain context | "Sure, here's the Python implementation of quicksort..." |
| tool | Tool call return results | ⚠️ Only for Function Calling, must immediately follow assistant message containing tool_calls | {"role": "tool", "tool_call_id": "xxx", "content": "Beijing is sunny today, 25℃"} |
⚠️ Important Note: The tool role cannot be used alone. It must be a response to a previous assistant message’s tool_calls. If an assistant message doesn’t have tool_calls, you cannot follow it with a tool role message; doing so will cause an error.
Three Core Roles Explained:
🔹 system (System Role)
system messages are used to set the model’s “persona” and behavior specifications before the conversation begins. It’s like giving the AI a “job description”.
{"role": "system", "content": "You are a senior frontend engineer, proficient in React, Vue, TypeScript.\nWhen answering questions:\n1. Prioritize code examples\n2. Explain key concepts\n3. Provide best practice suggestions"}
Usage Tips:
- Usually only one system message is placed at the beginning of the messages array
- Can define role, tone, output format, prohibited items, etc.
- The more specific the system prompt, the more the model’s output matches expectations
- Models differ in how closely they follow system messages, so test and adjust as needed
Common system prompt templates:
| Scenario | System Prompt Example |
|---|---|
| Programming Assistant | You are a {language} programming expert. Please provide code examples and comments in your responses. |
| Translation Assistant | You are a professional translator. Please translate user input into {target language}, maintaining the original tone and style. |
| Copywriting | You are a marketing copywriting expert. Please write copy in {style} style, keeping it within {word count} words. |
| Data Analysis | You are a data analyst. Please describe analysis data with clear tables and charts, providing professional insights. |
| AI Agent | You are an intelligent assistant that can call tools based on user needs. Available tools: {tool list} |
🔹 user (User Role)
user messages represent user input, the content the model needs to respond to. Each time a user asks a question, add the new message to the end of the messages array.
{"role": "user", "content": "Please explain what polymorphism is in Java?"}
Usage Tips:
- Clear, specific questions get better answers
- You can provide context or examples in user messages
- Complex tasks can be broken into multiple user messages for step-by-step guidance
🔹 assistant (Assistant Role)
assistant messages represent the model’s historical responses. In multi-turn conversations, you need to add previous assistant responses to messages so the model “remembers” the conversation context.
{"role": "assistant", "content": "Polymorphism is one of the core concepts of object-oriented programming..."}
Usage Scenarios:
- Multi-turn Conversation: Maintain conversation coherence
[
{"role": "user", "content": "What is Java's polymorphism?"},
{"role": "assistant", "content": "Polymorphism refers to the same operation acting on different objects..."},
{"role": "user", "content": "Can you give me a code example?"} // Model knows the example should be about polymorphism
]
- Few-shot Prompting: Guide model output format through examples
[
{"role": "system", "content": "You are a sentiment analysis assistant, judge the sentiment tendency of user input."},
{"role": "user", "content": "The weather is great today!"},
{"role": "assistant", "content": "Positive sentiment"},
{"role": "user", "content": "This movie is so boring."},
{"role": "assistant", "content": "Negative sentiment"},
{"role": "user", "content": "I just completed a project."} // Model will learn to output "Neutral sentiment" or similar format
]
- AI Agent Tool Calling Flow: assistant returns tool_calls, tool returns results
[
{"role": "user", "content": "How's the weather in Beijing today?"},
{"role": "assistant", "content": null, "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Beijing\"}"}}]},
{"role": "tool", "tool_call_id": "call_1", "content": "Beijing is sunny today, temperature 25℃"},
{"role": "assistant", "content": "Beijing is sunny today with a temperature of 25 degrees Celsius, suitable for outdoor activities."}
]
⚠️ tool Role Usage Guidelines:
The tool role can only be used in Function Calling scenarios and must meet these requirements:
| Requirement | Description |
|---|---|
| Prerequisite | The previous message must be assistant and contain tool_calls field |
| Required Fields | role: "tool", tool_call_id (corresponding to id in tool_calls), content (tool return result) |
| Position Requirement | Must immediately follow the assistant message containing tool_calls |
Wrong Example (will cause error):
// ❌ Wrong: tool message not preceded by tool_calls
[
{"role": "user", "content": "Query weather"},
{"role": "tool", "content": "Beijing sunny, 25℃"} // Error! No assistant tool_calls before it
]
Correct Example:
// ✅ Correct: tool message immediately follows tool_calls
[
{"role": "user", "content": "Query Beijing weather"},
{"role": "assistant", "content": null, "tool_calls": [{"id": "call_123", ...}]},
{"role": "tool", "tool_call_id": "call_123", "content": "Beijing sunny, 25℃"} // Correct!
]
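The ordering rule above can be checked on the client before sending a request. The sketch below is one possible validator under a deliberately simplified message model (the Msg record and its factory methods are hypothetical); it only checks that every tool message answers a tool call announced by the nearest preceding assistant message.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToolMessageValidator {
    /** Hypothetical minimal message model: only the fields the ordering rule needs. */
    record Msg(String role, List<String> toolCallIds, String toolCallId) {
        static Msg of(String role) { return new Msg(role, List.of(), null); }
        static Msg assistantCalling(String... ids) { return new Msg("assistant", List.of(ids), null); }
        static Msg tool(String id) { return new Msg("tool", List.of(), id); }
    }

    static boolean isValid(List<Msg> messages) {
        Set<String> pending = new HashSet<>(); // ids announced by the latest assistant tool_calls
        for (Msg m : messages) {
            if (m.role().equals("tool")) {
                // A tool message must answer a still-unanswered tool call id
                if (!pending.remove(m.toolCallId())) return false;
            } else {
                // Any other role closes the window in which tool replies are allowed
                pending.clear();
                if (m.role().equals("assistant")) pending.addAll(m.toolCallIds());
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(List.of(Msg.of("user"), Msg.tool("call_123"))));
        System.out.println(isValid(List.of(Msg.of("user"), Msg.assistantCalling("call_123"), Msg.tool("call_123"))));
    }
}
```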
2. temperature (Temperature Coefficient)
This is the key parameter controlling output diversity:
| Temperature Value | Applicable Scenarios | Characteristics |
|---|---|---|
| 0.1 - 0.3 | Code generation, factual Q&A, legal text | High certainty, predictable |
| 0.5 - 0.7 | General conversation, translation, summarization | Balance between creativity and accuracy |
| 0.8 - 1.0 | Creative writing, brainstorming, marketing copy | High creativity, diversity |
| > 1.0 | Extreme creativity scenarios | Output may be unpredictable |
Recommendation: 0.7 is a reasonable starting point for most applications (note that the API default is 1.0); set it to 0.3 for more stable output.
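The protocol does not say how a provider applies temperature internally. The sketch below assumes the standard formulation, in which logits are divided by the temperature before a softmax, and shows why low values concentrate probability on the top token. TemperatureDemo and the sample logits are illustrative only, and the temperature is assumed to be greater than zero.

```java
class TemperatureDemo {
    /** Softmax over logits divided by the temperature (temperature must be > 0). */
    static double[] softmax(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) max = Math.max(max, l / temperature);
        double sum = 0;
        double[] probs = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the max keeps exp() numerically stable
            probs[i] = Math.exp(logits[i] / temperature - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.0};
        for (double t : new double[]{0.3, 1.0, 2.0}) {
            System.out.printf("T=%.1f -> top token probability %.3f%n", t, softmax(logits, t)[0]);
        }
    }
}
```

Running it shows the top token's probability shrinking as the temperature rises, which matches the table: lower values are more predictable, higher values more diverse.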
3. stream (Streaming Output)
This parameter is crucial for user experience:
| Value | Behavior | Applicable Scenarios |
|---|---|---|
| false | Wait for complete response then return all at once | Batch processing, background tasks |
| true | Return tokens one by one in streaming mode | Real-time conversation, improve user experience |
Advantages of Streaming Output:
- Users can see generated content in real-time, reducing waiting anxiety
- Can interrupt unsatisfactory content early
- More natural feel like human conversation
4. tools / tool_choice (Tool Calling)
This is the core parameter for AI Agent development, used for Function Calling:
{
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather information for a specified city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name"
}
},
"required": ["city"]
}
}
}
],
"tool_choice": "auto"
}
tool_choice Options:
- auto: Model automatically decides whether to call tools (default)
- none: Do not call any tools
- required: Must call tools
- {"type": "function", "function": {"name": "xxx"}}: Force call specified tool
Parameter Usage Recommendations
Most common parameter combination for daily development:
{
"model": "qwen-plus",
"messages": [...],
"stream": true,
"temperature": 0.7
}
Scenarios requiring stable output:
{
"model": "qwen-plus",
"messages": [...],
"stream": false,
"temperature": 0.3,
"seed": 42
}
AI Agent development scenarios:
{
"model": "qwen-plus",
"messages": [...],
"tools": [...],
"tool_choice": "auto",
"stream": false
}
4. Output Parameters Detailed Explanation
Non-streaming Output Structure
When stream: false, the API returns a complete JSON response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1699000000,
"model": "qwen-plus",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I am an AI assistant...",
"tool_calls": null
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 50,
"total_tokens": 70
}
}
Non-streaming Output Parameter Table
| Parameter Name | Data Type | Purpose | Description |
|---|---|---|---|
| id | String | Unique Identifier | Unique ID for this API request |
| object | String | Response Object Type | Fixed as chat.completion |
| model | String | Model Name | Actual model name used |
| created | Integer | Timestamp | Unix timestamp when API response was created |
| choices | Array | Output List | Contains all response options generated by the model |
| choices[i].index | Integer | Option Index | Index starting from 0 |
| choices[i].message | Object | Message Content | Contains complete response information |
| choices[i].message.role | String | Role | Fixed as assistant |
| choices[i].message.content | String | Generated Text | All text content generated by the model |
| choices[i].message.tool_calls | Array | Tool Calls | AI Agent core: contains function names and parameters to call |
| choices[i].finish_reason | String | Stop Reason | stop (normal end), length (reached max_tokens), tool_calls (call tools) |
| usage | Object | Token Statistics | Core basis for billing |
| usage.prompt_tokens | Integer | Input Tokens | Number of tokens consumed by input messages |
| usage.completion_tokens | Integer | Output Tokens | Number of tokens consumed by model output |
| usage.total_tokens | Integer | Total Tokens | Total tokens consumed |
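Since usage is the basis for billing, a cost estimate is simple arithmetic over the two token counts. The sketch below assumes per-million-token pricing with separate input and output rates; the prices used are made-up placeholders, not any provider's real rates.

```java
class UsageCost {
    /** Estimates the cost of one call from its usage block, given prices per million tokens. */
    static double estimateCost(int promptTokens, int completionTokens,
                               double inputPricePerMillion, double outputPricePerMillion) {
        return promptTokens / 1_000_000.0 * inputPricePerMillion
                + completionTokens / 1_000_000.0 * outputPricePerMillion;
    }

    public static void main(String[] args) {
        // Token counts from the sample response above; prices are hypothetical
        System.out.println(estimateCost(20, 50, 1.0, 4.0));
    }
}
```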
Streaming Output Structure
When stream: true, the API returns data chunks through SSE (Server-Sent Events):
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":20,"completion_tokens":3,"total_tokens":23}}
data: [DONE]
Streaming Output Parameter Table
| Parameter Name | Data Type | Purpose | Description |
|---|---|---|---|
| id | String | Unique Identifier | Same in each chunk, identifies the same request |
| object | String | Response Object Type | Fixed as chat.completion.chunk |
| model | String | Model Name | Actual model name used |
| created | Integer | Timestamp | Unix timestamp when API response was created |
| choices | Array | Chunk Output List | Contains incremental content for this chunk |
| choices[i].index | Integer | Option Index | Index starting from 0 |
| choices[i].delta | Object | Incremental Content | New content added in this data chunk |
| choices[i].delta.role | String | Role | Only appears in the first chunk |
| choices[i].delta.content | String | Incremental Text | Tiny portion of text added this time, needs concatenation |
| choices[i].delta.tool_calls | Array | Tool Call Incremental | Streaming tool calls, returned incrementally |
| choices[i].finish_reason | String | Stop Reason | Only returned in the last chunk |
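| usage | Object | Token Statistics | Returned at the end of the stream when the provider supports it; with OpenAI, only when the request sets stream_options.include_usage to true |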
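Because delta.tool_calls arrives in fragments, the client has to reassemble each call before executing it. The sketch below assumes the OpenAI streaming convention in which each fragment carries an index, the id and function name appear only in the first fragment, and arguments arrive as string pieces to concatenate. Other providers may differ, and ToolCallAccumulator is a hypothetical helper.

```java
import java.util.Map;
import java.util.TreeMap;

class ToolCallAccumulator {
    /** One tool call being assembled from streamed fragments. */
    static final class Partial {
        String id;
        String name;
        final StringBuilder arguments = new StringBuilder();
    }

    private final Map<Integer, Partial> calls = new TreeMap<>();

    /** Feeds one delta.tool_calls entry; id and name may be null on later fragments. */
    void add(int index, String id, String name, String argumentsFragment) {
        Partial p = calls.computeIfAbsent(index, k -> new Partial());
        if (id != null) p.id = id;
        if (name != null) p.name = name;
        if (argumentsFragment != null) p.arguments.append(argumentsFragment);
    }

    String nameOf(int index) { return calls.get(index).name; }

    String argumentsOf(int index) { return calls.get(index).arguments.toString(); }

    public static void main(String[] args) {
        ToolCallAccumulator acc = new ToolCallAccumulator();
        acc.add(0, "call_abc123", "get_weather", "{\"ci");
        acc.add(0, null, null, "ty\": \"Beijing\"}");
        System.out.println(acc.nameOf(0) + " " + acc.argumentsOf(0));
    }
}
```

The arguments string should only be parsed as JSON once the chunk with finish_reason set to tool_calls has arrived, since earlier fragments are not valid JSON on their own.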
finish_reason Complete Explanation
| Value | Meaning | Handling Suggestion |
|---|---|---|
| stop | Normal completion | Output complete, can use directly |
| length | Reached max_tokens limit | Output truncated, may need to continue |
| tool_calls | Model requests to call tools | AI Agent core, need to execute tools and return results |
| content_filter | Content filtered by security | Input or output triggered security policy |
| function_call | Legacy function calling (deprecated) | Use tool_calls instead |
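The handling suggestions in this table map naturally onto a dispatch on the finish_reason string. The following is a minimal sketch; the Action enum and its constant names are hypothetical and would be replaced by whatever your application does in each case.

```java
class FinishReasonHandler {
    enum Action { USE_OUTPUT, CONTINUE_GENERATION, EXECUTE_TOOLS, REPORT_FILTERED, UNKNOWN }

    /** Maps a finish_reason value to the follow-up step suggested in the table. */
    static Action actionFor(String finishReason) {
        if (finishReason == null) return Action.UNKNOWN; // still streaming, or field missing
        return switch (finishReason) {
            case "stop" -> Action.USE_OUTPUT;
            case "length" -> Action.CONTINUE_GENERATION;
            case "tool_calls", "function_call" -> Action.EXECUTE_TOOLS;
            case "content_filter" -> Action.REPORT_FILTERED;
            default -> Action.UNKNOWN; // providers may add their own values
        };
    }

    public static void main(String[] args) {
        System.out.println(actionFor("length"));
    }
}
```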
5. Streaming vs Non-streaming Comparison
| Comparison Dimension | Streaming Output (stream: true) | Non-streaming Output (stream: false) |
|---|---|---|
| Response Method | Return token by token | Return complete result at once |
| First Token Latency | Low, fast display | High, need to wait for complete generation |
| User Experience | Good, real-time feedback | Poor, need to wait |
| Implementation Complexity | Higher, need to handle SSE | Simple, normal HTTP request |
| Token Statistics | Returned in last chunk | Returned with response |
| Error Handling | Complex, may fail mid-stream | Simple, unified handling |
| Applicable Scenarios | Real-time conversation, chat applications | Batch processing, background tasks |
| Tool Calling | Incremental return | Complete return |
6. Core Parameters in AI Agent Development
In AI Agent development, the following parameters are most critical:
1. messages - Conversation Context
Agents need to maintain complete conversation history, including:
- System prompt (defining Agent behavior)
- User messages
- Assistant responses
- Tool call records
- Tool return results
2. tools / tool_calls - Tool Calling
This is the core for Agent interaction with the external world:
// Model requests to call tool
{
"choices": [{
"message": {
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\": \"Beijing\"}"
}
}]
},
"finish_reason": "tool_calls"
}]
}
3. Multi-turn Conversation Flow
User Input → Model Judgment → Return tool_calls → Execute Tool →
Add Results to messages → Call Model Again → Return Final Answer
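The flow above can be sketched as a loop. To keep the example runnable without network access, the HTTP call is replaced by a Model interface and a fake implementation; the Message record, the Model interface, and the single-tool-call-per-turn simplification are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class AgentLoopSketch {
    /** Hypothetical simplified message: toolName/toolArgs are set only when the model requests a tool. */
    record Message(String role, String content, String toolCallId, String toolName, String toolArgs) {}

    /** Stand-in for the HTTP call to the chat completions endpoint. */
    interface Model { Message complete(List<Message> messages); }

    static String run(Model model, Map<String, Function<String, String>> tools, String userInput, int maxSteps) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message("user", userInput, null, null, null));
        for (int step = 0; step < maxSteps; step++) {
            Message reply = model.complete(messages);
            messages.add(reply); // keep the assistant turn in the history
            if (reply.toolName() == null) return reply.content(); // final answer
            // Execute the requested tool and append its result as a tool message
            Function<String, String> tool = tools.getOrDefault(reply.toolName(), a -> "Unknown tool");
            messages.add(new Message("tool", tool.apply(reply.toolArgs()), reply.toolCallId(), null, null));
        }
        throw new IllegalStateException("No final answer after " + maxSteps + " steps");
    }

    /** Runs the loop against a fake model that requests the weather tool once, then answers. */
    static String demo() {
        Model fake = msgs -> {
            Message last = msgs.get(msgs.size() - 1);
            return last.role().equals("tool")
                    ? new Message("assistant", "Answer: " + last.content(), null, null, null)
                    : new Message("assistant", null, "call_1", "get_weather", "{\"city\":\"Beijing\"}");
        };
        Map<String, Function<String, String>> tools = Map.of("get_weather", a -> "Sunny, 25 C");
        return run(fake, tools, "How's the weather in Beijing today?", 5);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The maxSteps cap is a design choice worth keeping in real agents: it prevents a model that keeps requesting tools from looping indefinitely.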
7. Spring AI and Spring AI Alibaba Integration
Spring AI Basic Configuration
# application.yml
spring:
ai:
openai:
api-key: ${OPENAI_API_KEY}
base-url: https://api.openai.com
chat:
options:
model: gpt-4
temperature: 0.7
Spring AI Alibaba Configuration
# application.yml
spring:
ai:
dashscope:
api-key: ${DASHSCOPE_API_KEY}
chat:
options:
model: qwen-plus
temperature: 0.7
Code Example Comparison
Spring AI Call Example:
@RestController
public class ChatController {
private final ChatClient chatClient;
public ChatController(ChatClient.Builder chatClientBuilder) {
this.chatClient = chatClientBuilder.build();
}
// Non-streaming call
@GetMapping("/chat")
public String chat(@RequestParam String message) {
return chatClient.prompt()
.user(message)
.call()
.content();
}
// Streaming call
@GetMapping("/chat/stream")
public Flux<String> chatStream(@RequestParam String message) {
return chatClient.prompt()
.user(message)
.stream()
.content();
}
}
Spring AI Complete Configuration Example:
@Configuration
public class AIConfig {
@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
return builder
.defaultSystem("You are a helpful AI assistant.")
.defaultOptions(ChatOptions.builder()
.model("qwen-plus")
.temperature(0.7)
.maxTokens(2000)
.build())
.build();
}
}
8. Native Java HTTP Call Examples
Non-streaming Call Example
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
public class OpenAIClient {
private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
private static final String API_KEY = "your-api-key";
private final HttpClient httpClient;
public OpenAIClient() {
this.httpClient = HttpClient.newBuilder()
.connectTimeout(Duration.ofSeconds(30))
.build();
}
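/**
* Escape a string for use inside a JSON string literal.
* Covers the common cases; a JSON library handles the full spec.
*/
private static String escapeJson(String s) {
return s.replace("\\", "\\\\")
.replace("\"", "\\\"")
.replace("\n", "\\n")
.replace("\r", "\\r")
.replace("\t", "\\t");
}
/**
* Non-streaming call
*/
public String chat(String userMessage) throws Exception {
// Build request body (the user message is escaped so quotes or newlines cannot break the JSON)
String requestBody = """
{
"model": "qwen-plus",
"messages": [
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "%s"}
],
"temperature": 0.7,
"stream": false
}
""".formatted(escapeJson(userMessage));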
// Build request
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(API_URL))
.header("Content-Type", "application/json")
.header("Authorization", "Bearer " + API_KEY)
.POST(HttpRequest.BodyPublishers.ofString(requestBody))
.build();
// Send request
HttpResponse<String> response = httpClient.send(
request,
HttpResponse.BodyHandlers.ofString()
);
// Parse response
if (response.statusCode() == 200) {
return parseContent(response.body());
} else {
throw new RuntimeException("API call failed: HTTP " + response.statusCode() + " " + response.body());
}
}
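/**
* Parse non-streaming response
*/
private String parseContent(String responseBody) {
// Simple parsing (recommend using Jackson/Gson in actual projects)
String marker = "\"content\":\"";
int markerPos = responseBody.indexOf(marker);
if (markerPos == -1) {
throw new IllegalStateException("No string content field in response: " + responseBody);
}
int contentStart = markerPos + marker.length();
// Find the closing quote, stepping over escaped characters such as \"
int contentEnd = contentStart;
while (contentEnd < responseBody.length() && responseBody.charAt(contentEnd) != '"') {
contentEnd += (responseBody.charAt(contentEnd) == '\\') ? 2 : 1;
}
// Escape sequences are left undecoded here; a JSON library handles them properly
return responseBody.substring(contentStart, Math.min(contentEnd, responseBody.length()));
}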
public static void main(String[] args) throws Exception {
OpenAIClient client = new OpenAIClient();
String response = client.chat("Hello, please introduce yourself.");
System.out.println(response);
}
}
Streaming Call Example
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Consumer;
public class OpenAIStreamClient {
private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
private static final String API_KEY = "your-api-key";
private final HttpClient httpClient;
public OpenAIStreamClient() {
this.httpClient = HttpClient.newBuilder()
.connectTimeout(Duration.ofSeconds(30))
.build();
}
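/**
* Escape a string for use inside a JSON string literal.
* Covers the common cases; a JSON library handles the full spec.
*/
private static String escapeJson(String s) {
return s.replace("\\", "\\\\")
.replace("\"", "\\\"")
.replace("\n", "\\n")
.replace("\r", "\\r")
.replace("\t", "\\t");
}
/**
* Streaming call
* @param userMessage User message
* @param onContent Callback for each content received
*/
public void chatStream(String userMessage, Consumer<String> onContent) throws Exception {
// Build request body (the user message is escaped so quotes or newlines cannot break the JSON)
String requestBody = """
{
"model": "qwen-plus",
"messages": [
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "%s"}
],
"temperature": 0.7,
"stream": true
}
""".formatted(escapeJson(userMessage));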
// Build request
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(API_URL))
.header("Content-Type", "application/json")
.header("Authorization", "Bearer " + API_KEY)
.header("Accept", "text/event-stream")
.POST(HttpRequest.BodyPublishers.ofString(requestBody))
.build();
// Send streaming request
HttpResponse<java.util.stream.Stream<String>> response = httpClient.send(
request,
HttpResponse.BodyHandlers.ofLines()
);
// Handle streaming response
response.body().forEach(line -> {
if (line.startsWith("data: ") && !line.equals("data: [DONE]")) {
String jsonData = line.substring(6);
String content = extractDeltaContent(jsonData);
if (content != null && !content.isEmpty()) {
onContent.accept(content);
}
}
});
}
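/**
* Extract delta content from SSE data
*/
private String extractDeltaContent(String jsonData) {
// Simple parsing (recommend using Jackson/Gson in actual projects)
try {
int deltaStart = jsonData.indexOf("\"delta\":");
if (deltaStart == -1) return null;
int contentStart = jsonData.indexOf("\"content\":\"", deltaStart);
if (contentStart == -1) return null;
contentStart += 11;
// Decode in a single pass so escaped quotes and backslashes are handled correctly
StringBuilder sb = new StringBuilder();
for (int i = contentStart; i < jsonData.length(); i++) {
char c = jsonData.charAt(i);
if (c == '"') break; // an unescaped quote ends the string
if (c == '\\' && i + 1 < jsonData.length()) {
char next = jsonData.charAt(++i);
switch (next) {
case 'n' -> sb.append('\n');
case 't' -> sb.append('\t');
case 'r' -> sb.append('\r');
case 'b' -> sb.append('\b');
case 'f' -> sb.append('\f');
// Unicode escape: four hex digits follow the letter u
case 'u' -> { sb.append((char) Integer.parseInt(jsonData.substring(i + 1, i + 5), 16)); i += 4; }
default -> sb.append(next); // escaped quote, backslash, or slash
}
} else {
sb.append(c);
}
}
return sb.toString();
} catch (Exception e) {
return null;
}
}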
public static void main(String[] args) throws Exception {
OpenAIStreamClient client = new OpenAIStreamClient();
System.out.println("AI Response:");
client.chatStream("Please write a short poem about spring.", content -> {
System.out.print(content); // Real-time print
});
System.out.println("\n--- Done ---");
}
}
Complete Example Using Jackson
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
public class OpenAIJsonClient {
private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
private static final String API_KEY = "your-api-key";
private final HttpClient httpClient;
private final ObjectMapper objectMapper;
public OpenAIJsonClient() {
this.httpClient = HttpClient.newHttpClient();
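this.objectMapper = new ObjectMapper()
// Tolerate response fields that are not modeled in ChatResponse below
.configure(com.fasterxml.jackson.databind.DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);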
}
// Request object
public static class ChatRequest {
public String model;
public List<Message> messages;
public Double temperature;
public Boolean stream;
public static class Message {
public String role;
public String content;
public Message(String role, String content) {
this.role = role;
this.content = content;
}
}
}
// Response object
public static class ChatResponse {
public String id;
public String object;
public Long created;
public String model;
public List<Choice> choices;
public Usage usage;
public static class Choice {
public Integer index;
public Message message;
public String finish_reason;
public static class Message {
public String role;
public String content;
@JsonProperty("tool_calls")
public List<ToolCall> toolCalls;
}
}
public static class Usage {
@JsonProperty("prompt_tokens")
public Integer promptTokens;
@JsonProperty("completion_tokens")
public Integer completionTokens;
@JsonProperty("total_tokens")
public Integer totalTokens;
}
public static class ToolCall {
public String id;
public String type;
public Function function;
public static class Function {
public String name;
public String arguments;
}
}
}
/**
* Non-streaming call (using strongly typed objects)
*/
public ChatResponse chat(String userMessage) throws Exception {
// Build request
ChatRequest request = new ChatRequest();
request.model = "qwen-plus";
request.temperature = 0.7;
request.stream = false;
request.messages = List.of(
new ChatRequest.Message("system", "You are a helpful AI assistant."),
new ChatRequest.Message("user", userMessage)
);
String requestBody = objectMapper.writeValueAsString(request);
HttpRequest httpRequest = HttpRequest.newBuilder()
.uri(URI.create(API_URL))
.header("Content-Type", "application/json")
.header("Authorization", "Bearer " + API_KEY)
.POST(HttpRequest.BodyPublishers.ofString(requestBody))
.build();
HttpResponse<String> response = httpClient.send(
httpRequest,
HttpResponse.BodyHandlers.ofString()
);
if (response.statusCode() == 200) {
return objectMapper.readValue(response.body(), ChatResponse.class);
} else {
throw new RuntimeException("API call failed: HTTP " + response.statusCode() + " " + response.body());
}
}
public static void main(String[] args) throws Exception {
OpenAIJsonClient client = new OpenAIJsonClient();
ChatResponse response = client.chat("What is Java's polymorphism?");
System.out.println("Model: " + response.model);
System.out.println("Response: " + response.choices.get(0).message.content);
System.out.println("Token Consumption: " + response.usage.totalTokens);
}
}
9. New API Gateway and Anthropic Claude
New API Gateway
New API is currently the most popular open-source API management/distribution system, fully compatible with the OpenAI API protocol:
Core Features:
- Supports multiple LLMs (OpenAI, Claude, Gemini, domestic models, etc.)
- Unified OpenAI protocol interface
- API Key management and billing
- Channel management and load balancing
Usage:
Just point base_url to the New API server address:
String baseUrl = "https://your-new-api-server/v1";
// Or, for a local deployment:
// String baseUrl = "http://localhost:3000/v1";
Anthropic Claude’s Independent System
It’s worth noting that Anthropic’s Claude series models use a self-contained API protocol that differs from the OpenAI protocol:
| Comparison Item | OpenAI API | Anthropic API |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/messages |
| Message Format | messages array | messages + system separated |
| System Prompt | Placed in messages | Separate system parameter |
| Streaming Field | choices[0].delta.content | delta.text |
| Tool Calling | tool_calls | tool_use blocks in content |
Anthropic API Example:
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"system": "You are a helpful AI assistant.",
"messages": [
{"role": "user", "content": "Hello!"}
]
}
Compatibility Solution:
Most gateways (like New API) will automatically convert protocols, allowing you to call Claude using the OpenAI protocol. However, if you call the Anthropic official API directly, you need to follow its native protocol.
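To give a feel for what such a conversion involves, the sketch below handles only the system prompt difference from the table: system messages are pulled out of the OpenAI-style messages array and joined into the separate system parameter. Tool calls, streaming, and multimodal content need further mapping that is not shown, and the record and method names are hypothetical rather than taken from any gateway.

```java
import java.util.ArrayList;
import java.util.List;

class AnthropicConverter {
    record Message(String role, String content) {}

    /** Hypothetical container for the two parts of an Anthropic-style request body. */
    record AnthropicRequest(String system, List<Message> messages) {}

    static AnthropicRequest fromOpenAiMessages(List<Message> openAiMessages) {
        List<String> systemParts = new ArrayList<>();
        List<Message> rest = new ArrayList<>();
        for (Message m : openAiMessages) {
            if (m.role().equals("system")) systemParts.add(m.content()); // moves to the system parameter
            else rest.add(m); // user and assistant turns stay in messages
        }
        String system = systemParts.isEmpty() ? null : String.join("\n\n", systemParts);
        return new AnthropicRequest(system, rest);
    }

    public static void main(String[] args) {
        AnthropicRequest request = fromOpenAiMessages(List.of(
                new Message("system", "You are a helpful AI assistant."),
                new Message("user", "Hello!")));
        System.out.println(request.system() + " | " + request.messages().size() + " message(s)");
    }
}
```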
10. Best Practices Summary
Parameter Configuration Recommendations
| Scenario | temperature | stream | Other Recommendations |
|---|---|---|---|
| Daily conversation | 0.7 | true | Default is fine |
| Code generation | 0.3 | false | Set seed for stability |
| Creative writing | 0.9 | true | Can combine with top_p |
| JSON output | 0.3 | false | Use response_format |
| AI Agent | 0.7 | false | Configure tools |
Development Recommendations
- Prioritize streaming output: Significantly improves user experience
- Set temperature reasonably: Choose based on scenario, don’t blindly use default values
- Monitor token consumption: usage field is the billing basis, monitor it well
- Use seed parameter wisely: Use when stable output is needed
- Framework encapsulation first: Frameworks like Spring AI simplify development, but understanding the underlying protocol is important
Error Handling
try {
ChatResponse response = client.chat(message);
// Handle response
} catch (Exception e) {
// Common error handling (matching on the message assumes the exception text includes the HTTP status code)
if (e.getMessage().contains("401")) {
System.err.println("Invalid API Key");
} else if (e.getMessage().contains("429")) {
System.err.println("Rate limit exceeded, please retry later");
} else if (e.getMessage().contains("500")) {
System.err.println("Model service error");
}
}
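For 429 and 5xx responses, retrying after a growing delay is a common complement to the handling above. The sketch below shows one possible policy; the base delay, the cap, and the names RetryPolicy, isRetryable, and backoffMillis are illustrative assumptions, and a provider's own rate limit guidance should take precedence.

```java
class RetryPolicy {
    /** Rate limits (429) and server errors (5xx) are worth retrying; 401 and other 4xx are not. */
    static boolean isRetryable(int status) {
        return status == 429 || (status >= 500 && status < 600);
    }

    /** Exponential backoff: base, 2x base, 4x base, ... capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " -> wait " + backoffMillis(attempt, 500, 8000) + " ms");
        }
    }
}
```

Adding a small random jitter to each delay is a frequent refinement when many clients may retry at the same moment.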
Summary
The OpenAI API protocol has become the de facto standard for LLM application development. Mastering this protocol’s input parameters, output parameters, and streaming/non-streaming output is key to understanding the underlying principles of AI Agent development.
Core takeaways:
- messages is the most important input parameter, defining conversation context
- stream determines streaming or non-streaming output, affecting user experience
- tools/tool_calls is the core of AI Agent development
- choices[].message.content and usage are the most important output fields
- Frameworks like Spring AI and LangChain all encapsulate this protocol
- Gateways like New API enable unified access to multiple models
Welcome to follow the WeChat Official Account FishTech Notes for more discussions!