OpenAI API Protocol Complete Guide - The Universal Standard for LLM Application Development

Introduction

“The quality of your prompt determines the model’s output, but whether the model outputs correctly and how good the output is also depends on other parameters.”

This trips up many developers who are new to AI application development. Everyone knows that good prompts matter, but the parameters of the API call itself often remain a mystery.

In this article, I’ll guide you through the OpenAI API protocol in detail - the standard interface that has become the foundation for almost all LLM application development. Whether you use Spring AI, LangChain, or other agent frameworks, understanding this protocol is essential for advancing your AI development skills.

1. What is the OpenAI API Protocol

The OpenAI API protocol is a set of HTTP API interface specifications defined by OpenAI for interacting with large language models. Due to its first-mover advantage and wide range of applications, this protocol has become the de facto industry standard.

Why is it so important?

1. Industry Standard Status

Most mainstream LLM providers now offer endpoints compatible with OpenAI’s interface specification. This means:

Learn one API, and you can call most LLMs on the market
Very low code migration cost: often you only change base_url, api_key, and the model name
Various agent frameworks are built on top of this protocol

2. Framework Ecosystem Support

Mainstream agent frameworks have deeply encapsulated this protocol:

Spring AI / Spring AI Alibaba: First choice for Java ecosystem
LangChain: Most popular in Python/JavaScript ecosystem
AutoGen: Microsoft’s open-source multi-agent framework
LlamaIndex: Focused on RAG application development

But regardless of how frameworks encapsulate it, it all boils down to sending a formatted HTTP request to the LLM’s API endpoint. In fact, you can even develop agent applications using just an HttpClient.

3. API Gateway Ecosystem

Mainstream API gateways are all compatible with the OpenAI protocol:

New API: Open-source one-stop API management/distribution system
One API: Predecessor of New API
Various commercial relay services

2. Protocol Basics: API Endpoints and Authentication

Complete API Path Format

https://domain-address/v1/chat/completions

However, configuration varies slightly across different frameworks:

Framework/Platform	Configuration Method	Concatenation Rule
LangChain, AutoGen, etc.	Configure to `/v1/`	Framework auto-concatenates `chat/completions`
Spring AI	Configure before `/v1`	Framework auto-concatenates `/v1/chat/completions`
Direct HTTP call	Full path	No auto-concatenation
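
The concatenation rules in the table can be sketched as a small helper. This is only an illustration of the two conventions, not code from any framework; the class and method names (`EndpointResolver`, `fromV1Base`, `fromHostBase`) are hypothetical.

```java
final class EndpointResolver {

    private static final String CHAT_PATH = "chat/completions";

    // Removes trailing slashes so that joining never produces "//".
    static String stripTrailingSlashes(String url) {
        int end = url.length();
        while (end > 0 && url.charAt(end - 1) == '/') {
            end--;
        }
        return url.substring(0, end);
    }

    // LangChain-style: the configured base_url already ends with the version segment.
    static String fromV1Base(String baseUrl) {
        return stripTrailingSlashes(baseUrl) + "/" + CHAT_PATH;
    }

    // Spring AI-style: the configured base-url stops before /v1.
    static String fromHostBase(String baseUrl) {
        return stripTrailingSlashes(baseUrl) + "/v1/" + CHAT_PATH;
    }

    public static void main(String[] args) {
        System.out.println(fromV1Base("https://api.openai.com/v1/"));
        System.out.println(fromHostBase("https://api.openai.com"));
    }
}
```

Both calls in `main` resolve to the same full path, which is why a base URL copied from one framework's configuration into another often produces a 404.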

Common Model base_urls

Model Provider	base_url
OpenAI ChatGPT	https://api.openai.com/v1
Alibaba Cloud Qwen	https://dashscope.aliyuncs.com/compatible-mode/v1
DeepSeek	https://api.deepseek.com/v1
Zhipu AI (GLM)	https://open.bigmodel.cn/api/paas/v4
Moonshot (Kimi)	https://api.moonshot.cn/v1
Ollama Local Model	http://localhost:11434/v1
vLLM Local Deployment	http://localhost:8000/v1

Authentication Method

All requests need to carry an API Key in the HTTP request header:

Authorization: Bearer <your-api-key>

API Key is used for:

Identity verification
Access authorization
Billing statistics
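
As a minimal sketch of the header in Java, the request below is built but never sent. The class name `AuthHeaderExample` and the `OPENAI_API_KEY` environment variable are assumptions made for illustration; use whatever secret storage your project already has.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AuthHeaderExample {

    // Builds (but does not send) a POST request carrying the Bearer token.
    static HttpRequest build(String url, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Read the key from the environment instead of hard-coding it in source.
        String apiKey = System.getenv().getOrDefault("OPENAI_API_KEY", "sk-placeholder");
        HttpRequest request = build("https://api.openai.com/v1/chat/completions", apiKey, "{}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Authorization header present: "
                + request.headers().firstValue("Authorization").isPresent());
    }
}
```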

3. Input Parameters Detailed Explanation

Complete Parameter Table

Parameter Name	Required	Data Type	Default	Range	Purpose	Description
model	✅ Required	String	-	-	Specify Model	Determines which specific LLM to use, e.g., `gpt-4`, `qwen-plus`, `deepseek-chat`
messages	✅ Required	Array	-	-	Message List	Conversation history or instructions passed to the model, containing `role` and `content`
temperature	❌ Optional	Float	1.0	0.0 - 2.0	Temperature Coefficient	Controls output randomness, low values are stable, high values are creative
top_p	❌ Optional	Float	1.0	0.0 - 1.0	Nucleus Sampling	Similar to temperature but different algorithm, recommend adjusting only one
max_tokens	❌ Optional	Integer	Model Max	-	Max Output Length	Limits the maximum number of tokens in a single response
n	❌ Optional	Integer	1	1-N	Generation Count	Generate n different responses for the same prompt
stream	❌ Optional	Boolean	false	true/false	Streaming Output	Whether to return tokens in streaming mode
stop	❌ Optional	String/Array	null	-	Stop Sequence	Stop generation immediately when specified string is encountered
presence_penalty	❌ Optional	Float	0	-2.0 - 2.0	Presence Penalty	Positive values encourage new topics, negative values reduce new topics
frequency_penalty	❌ Optional	Float	0	-2.0 - 2.0	Frequency Penalty	Positive values penalize tokens in proportion to how often they have already appeared, reducing repetition; negative values make repetition more likely
seed	❌ Optional	Integer	null	-	Reproducibility Seed	Improves result reproducibility (not 100% guaranteed)
tools	❌ Optional	Array	null	-	Tool Definition	Defines callable external tools/functions
tool_choice	❌ Optional	String/Object	auto (when tools are set)	-	Tool Selection	Controls tool calling behavior; defaults to none when no tools are provided
response_format	❌ Optional	Object	null	-	Response Format	Specifies output format like JSON
user	❌ Optional	String	null	-	User Identifier	Used to track and identify end users

Core Parameters Detailed

1. messages (Message List)

This is the most important parameter, defining the complete conversation context with the model. messages is an array where each element is a message object containing two core fields: role and content.

role field: Used to identify who said a message and its role in the conversation. The three most common roles are system, user, and assistant.

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I help you?"},
    {"role": "user", "content": "Please introduce yourself."}
  ]
}
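
When a body like this is assembled by hand in Java, the content strings need JSON escaping, otherwise a quote or newline in user input breaks the request. The sketch below shows one way to do that without a library; `MessagesJson` and its methods are hypothetical helpers, and a JSON library such as Jackson is the safer choice in real projects.

```java
import java.util.ArrayList;
import java.util.List;

final class MessagesJson {

    // Escapes a Java string for use inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serializes one message object with the two core fields.
    static String message(String role, String content) {
        return "{\"role\":\"" + escape(role) + "\",\"content\":\"" + escape(content) + "\"}";
    }

    // Wraps already-serialized messages into a minimal request body.
    static String body(String model, List<String> messages) {
        return "{\"model\":\"" + escape(model) + "\",\"messages\":["
                + String.join(",", messages) + "]}";
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        messages.add(message("system", "You are a helpful assistant."));
        messages.add(message("user", "Say \"hello\"\nin two lines."));
        System.out.println(body("qwen-plus", messages));
    }
}
```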

Role Types Detailed Explanation:

Role	Purpose	Usage Scenario	Example
system	Set model behavior, role, and constraints	Place first in messages, define AI’s “persona” and response style	`"You are a professional Python programming assistant. Please provide code examples in your responses."`
user	User input or questions	Add to end of messages each time user asks a question	`"Please write a quicksort algorithm for me."`
assistant	Model’s historical responses	Save previous AI responses in multi-turn conversations to maintain context	`"Sure, here's the Python implementation of quicksort..."`
tool	Tool call return results	⚠️ Only for Function Calling, must immediately follow assistant message containing tool_calls	`{"role": "tool", "tool_call_id": "xxx", "content": "Beijing is sunny today, 25℃"}`

⚠️ Important Note: The tool role cannot be used alone. It must be a response to a previous assistant message’s tool_calls. If an assistant message doesn’t have tool_calls, you cannot follow it with a tool role message, otherwise it will cause an error.

Three Core Roles Explained:

🔹 system (System Role)

system messages are used to set the model’s “persona” and behavior specifications before the conversation begins. It’s like giving the AI a “job description”.

{"role": "system", "content": "You are a senior frontend engineer, proficient in React, Vue, TypeScript.\nWhen answering questions:\n1. Prioritize code examples\n2. Explain key concepts\n3. Provide best practice suggestions"}

Usage Tips:

Usually only one system message is placed at the beginning of the messages array
Can define role, tone, output format, prohibited items, etc.
The more specific the system prompt, the more the model’s output matches expectations
Models differ in how closely they follow system messages, so test and adjust as needed

Common system prompt templates:

Scenario	System Prompt Example
Programming Assistant	`You are a {language} programming expert. Please provide code examples and comments in your responses.`
Translation Assistant	`You are a professional translator. Please translate user input into {target language}, maintaining the original tone and style.`
Copywriting	`You are a marketing copywriting expert. Please write copy in {style} style, keeping it within {word count} words.`
Data Analysis	`You are a data analyst. Please describe analysis data with clear tables and charts, providing professional insights.`
AI Agent	`You are an intelligent assistant that can call tools based on user needs. Available tools: {tool list}`

🔹 user (User Role)

user messages represent user input, the content the model needs to respond to. Each time a user asks a question, add the new message to the end of the messages array.

{"role": "user", "content": "Please explain what polymorphism is in Java?"}

Usage Tips:

Clear, specific questions get better answers
You can provide context or examples in user messages
Complex tasks can be broken into multiple user messages for step-by-step guidance

🔹 assistant (Assistant Role)

assistant messages represent the model’s historical responses. In multi-turn conversations, you need to add previous assistant responses to messages so the model “remembers” the conversation context.

{"role": "assistant", "content": "Polymorphism is one of the core concepts of object-oriented programming..."}

Usage Scenarios:

Multi-turn Conversation: Maintain conversation coherence

[
  {"role": "user", "content": "What is Java's polymorphism?"},
  {"role": "assistant", "content": "Polymorphism refers to the same operation acting on different objects..."},
  {"role": "user", "content": "Can you give me a code example?"}  // The earlier turns tell the model the example should be about polymorphism
]

Few-shot Prompting: Guide model output format through examples

[
  {"role": "system", "content": "You are a sentiment analysis assistant, judge the sentiment tendency of user input."},
  {"role": "user", "content": "The weather is great today!"},
  {"role": "assistant", "content": "Positive sentiment"},
  {"role": "user", "content": "This movie is so boring."},
  {"role": "assistant", "content": "Negative sentiment"},
  {"role": "user", "content": "I just completed a project."}  // Model follows the format set by the examples, e.g. "Positive sentiment"
]

AI Agent Tool Calling Flow: assistant returns tool_calls, tool returns results

[
  {"role": "user", "content": "How's the weather in Beijing today?"},
  {"role": "assistant", "content": null, "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Beijing\"}"}}]},
  {"role": "tool", "tool_call_id": "call_1", "content": "Beijing is sunny today, temperature 25℃"},
  {"role": "assistant", "content": "Beijing is sunny today with a temperature of 25 degrees Celsius, suitable for outdoor activities."}
]

⚠️ tool Role Usage Guidelines:

The tool role can only be used in Function Calling scenarios and must meet these requirements:

Requirement	Description
Prerequisite	The previous message must be assistant and contain `tool_calls` field
Required Fields	`role: "tool"`, `tool_call_id` (corresponding to id in tool_calls), `content` (tool return result)
Position Requirement	Must directly follow the assistant message containing tool_calls; when that message requests several calls, one tool message per call follows in sequence

Wrong Example (will cause error):

// ❌ Wrong: tool message not preceded by tool_calls
[
  {"role": "user", "content": "Query weather"},
  {"role": "tool", "content": "Beijing sunny, 25℃"}  // Error! No assistant tool_calls before it
]

Correct Example:

// ✅ Correct: tool message immediately follows tool_calls
[
  {"role": "user", "content": "Query Beijing weather"},
  {"role": "assistant", "content": null, "tool_calls": [{"id": "call_123", ...}]},
  {"role": "tool", "tool_call_id": "call_123", "content": "Beijing sunny, 25℃"}  // Correct!
]
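
The ordering rule can be checked locally before a request is sent. Below is a minimal sketch using a simplified message model; `ToolMessageValidator` and its `Msg` type are hypothetical and not part of any SDK. It encodes only the rule described above: every tool message must answer an id requested by the most recent assistant message.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ToolMessageValidator {

    // Simplified message model; field names are illustrative only.
    static final class Msg {
        final String role;
        final List<String> toolCallIds; // ids requested by an assistant message (may be empty)
        final String toolCallId;        // id answered by a tool message (null otherwise)

        Msg(String role, List<String> toolCallIds, String toolCallId) {
            this.role = role;
            this.toolCallIds = toolCallIds;
            this.toolCallId = toolCallId;
        }
    }

    static Msg user() { return new Msg("user", List.of(), null); }
    static Msg assistantCalling(String... ids) { return new Msg("assistant", List.of(ids), null); }
    static Msg tool(String id) { return new Msg("tool", List.of(), id); }

    // Returns false as soon as a tool message has no matching pending tool call.
    static boolean isValid(List<Msg> messages) {
        Set<String> pending = new HashSet<>();
        for (Msg m : messages) {
            if ("tool".equals(m.role)) {
                // remove() returns false when the id was never requested.
                if (m.toolCallId == null || !pending.remove(m.toolCallId)) {
                    return false;
                }
            } else {
                // Any non-tool message replaces the set of answerable ids.
                pending.clear();
                pending.addAll(m.toolCallIds);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(List.of(user(), tool("call_123"))));
        System.out.println(isValid(List.of(user(), assistantCalling("call_123"), tool("call_123"))));
    }
}
```

The first call in `main` corresponds to the wrong example and the second to the correct one.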

2. temperature (Temperature Coefficient)

This is the key parameter controlling output diversity:

Temperature Value	Applicable Scenarios	Characteristics
0.1 - 0.3	Code generation, factual Q&A, legal text	High certainty, predictable
0.5 - 0.7	General conversation, translation, summarization	Balance between creativity and accuracy
0.8 - 1.0	Creative writing, brainstorming, marketing copy	High creativity, diversity
> 1.0	Extreme creativity scenarios	Output may be unpredictable

Recommendation: A value around 0.7 is a reasonable starting point for general use (note that the API default is 1.0); lower it to about 0.3 when you need more stable output.
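
To see why lower values are more predictable, it helps to look at the usual formulation of temperature sampling: each logit z_i is divided by the temperature T before the softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T). The sketch below applies that formula to made-up logits; the class name and the logit values are illustrative, and individual providers may implement sampling differently in detail.

```java
final class TemperatureSoftmax {

    // Softmax over logits scaled by 1/temperature, computed in a numerically stable way.
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0 in this sketch");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z / temperature);
        }
        double[] p = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp(logits[i] / temperature - max); // subtract max to avoid overflow
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.1}; // hypothetical scores for three candidate tokens
        for (double t : new double[] {0.3, 1.0, 1.5}) {
            System.out.println("T=" + t + " -> " + java.util.Arrays.toString(softmax(logits, t)));
        }
    }
}
```

At low temperature almost all probability mass sits on the top-scoring token; at higher temperature the distribution flattens, so less likely tokens are sampled more often.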

3. stream (Streaming Output)

This parameter is crucial for user experience:

Value	Behavior	Applicable Scenarios
false	Wait for complete response then return all at once	Batch processing, background tasks
true	Return tokens one by one in streaming mode	Real-time conversation, improve user experience

Advantages of Streaming Output:

Users can see generated content in real-time, reducing waiting anxiety
Can interrupt unsatisfactory content early
More natural feel like human conversation

4. tools / tool_choice (Tool Calling)

This is the core parameter for AI Agent development, used for Function Calling:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

tool_choice Options:

auto: Model automatically decides whether to call tools (default)
none: Do not call any tools
required: Must call tools
{"type": "function", "function": {"name": "xxx"}}: Force call specified tool

Parameter Usage Recommendations

Most common parameter combination for daily development:

{
  "model": "qwen-plus",
  "messages": [...],
  "stream": true,
  "temperature": 0.7
}

Scenarios requiring stable output:

{
  "model": "qwen-plus",
  "messages": [...],
  "stream": false,
  "temperature": 0.3,
  "seed": 42
}

AI Agent development scenarios:

{
  "model": "qwen-plus",
  "messages": [...],
  "tools": [...],
  "tool_choice": "auto",
  "stream": false
}

4. Output Parameters Detailed Explanation

Non-streaming Output Structure

When stream: false, the API returns a complete JSON response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am an AI assistant...",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}

Non-streaming Output Parameter Table

Parameter Name	Data Type	Purpose	Description
id	String	Unique Identifier	Unique ID for this API request
object	String	Response Object Type	Fixed as `chat.completion`
model	String	Model Name	Actual model name used
created	Integer	Timestamp	Unix timestamp when API response was created
choices	Array	Output List	Contains all response options generated by the model
choices[i].index	Integer	Option Index	Index starting from 0
choices[i].message	Object	Message Content	Contains complete response information
choices[i].message.role	String	Role	Fixed as `assistant`
choices[i].message.content	String	Generated Text	All text content generated by the model
choices[i].message.tool_calls	Array	Tool Calls	AI Agent core: contains function names and parameters to call
choices[i].finish_reason	String	Stop Reason	`stop` (normal end), `length` (reached max_tokens), `tool_calls` (call tools)
usage	Object	Token Statistics	Core basis for billing
usage.prompt_tokens	Integer	Input Tokens	Number of tokens consumed by input messages
usage.completion_tokens	Integer	Output Tokens	Number of tokens consumed by model output
usage.total_tokens	Integer	Total Tokens	Total tokens consumed
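
Since usage is the basis for billing, a per-request cost estimate is simple arithmetic over these fields. The sketch below assumes per-million-token pricing; the prices in `main` are hypothetical placeholders, not any provider's real rates, and `UsageCost` is an illustrative name.

```java
final class UsageCost {

    // Cost of one request, given prices per one million tokens.
    static double cost(int promptTokens, int completionTokens,
                       double inputPricePerMillion, double outputPricePerMillion) {
        return promptTokens / 1_000_000.0 * inputPricePerMillion
                + completionTokens / 1_000_000.0 * outputPricePerMillion;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL prices, for illustration only.
        double inputPrice = 1.0;
        double outputPrice = 2.0;
        // Token counts taken from the sample response above.
        System.out.println(cost(20, 50, inputPrice, outputPrice));
    }
}
```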

Streaming Output Structure

When stream: true, the API returns data chunks through SSE (Server-Sent Events):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"qwen-plus","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":20,"completion_tokens":3,"total_tokens":23}}

data: [DONE]
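
Before any JSON is parsed, the stream has to be split into events. A minimal sketch of that framing step follows, assuming the response is read line by line; `SseLines` is a hypothetical helper that handles only the `data:` field and the `[DONE]` sentinel shown above.

```java
import java.util.List;
import java.util.Optional;

final class SseLines {

    static final String DONE = "[DONE]";

    // Returns the payload of a "data:" line, or empty for blank lines, comments and other fields.
    static Optional<String> payload(String line) {
        if (!line.startsWith("data:")) {
            return Optional.empty();
        }
        String value = line.substring(5);
        if (value.startsWith(" ")) {
            value = value.substring(1); // SSE drops a single space after the colon
        }
        return Optional.of(value);
    }

    // Counts JSON chunks up to (not including) the [DONE] sentinel.
    static int countChunks(List<String> lines) {
        int chunks = 0;
        for (String line : lines) {
            Optional<String> p = payload(line);
            if (p.isEmpty()) {
                continue;
            }
            if (DONE.equals(p.get())) {
                break;
            }
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("data: {\"id\":\"chatcmpl-abc123\"}", "", "data: [DONE]");
        System.out.println(countChunks(lines));
    }
}
```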

Streaming Output Parameter Table

Parameter Name	Data Type	Purpose	Description
id	String	Unique Identifier	Same in each chunk, identifies the same request
object	String	Response Object Type	Fixed as `chat.completion.chunk`
model	String	Model Name	Actual model name used
created	Integer	Timestamp	Unix timestamp when API response was created
choices	Array	Chunk Output List	Contains incremental content for this chunk
choices[i].index	Integer	Option Index	Index starting from 0
choices[i].delta	Object	Incremental Content	New content added in this data chunk
choices[i].delta.role	String	Role	Only appears in the first chunk
choices[i].delta.content	String	Incremental Text	Tiny portion of text added this time, needs concatenation
choices[i].delta.tool_calls	Array	Tool Call Incremental	Streaming tool calls, returned incrementally
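choices[i].finish_reason	String	Stop Reason	null in intermediate chunks; set only in the final content chunk
usage	Object	Token Statistics	Returned near the end of the stream if the provider sends it; OpenAI sends it only when `stream_options: {"include_usage": true}` is set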
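
Streaming tool calls need more than string concatenation of content. In OpenAI's streaming format each `delta.tool_calls` entry carries an `index`; the `id` and `function.name` typically arrive in the first fragment and `function.arguments` arrives as string pieces that must be joined per index. The sketch below assumes the chunk JSON has already been parsed; `ToolCallAccumulator` is a hypothetical helper, and other providers may fragment differently.

```java
import java.util.Map;
import java.util.TreeMap;

final class ToolCallAccumulator {

    // One tool call being assembled from fragments.
    static final class PartialCall {
        String id;
        String name;
        final StringBuilder arguments = new StringBuilder();
    }

    private final Map<Integer, PartialCall> calls = new TreeMap<>();

    // Applies one parsed delta.tool_calls entry; null means the field was absent in this chunk.
    void apply(int index, String id, String name, String argumentsFragment) {
        PartialCall call = calls.computeIfAbsent(index, i -> new PartialCall());
        if (id != null) {
            call.id = id;
        }
        if (name != null) {
            call.name = name;
        }
        if (argumentsFragment != null) {
            call.arguments.append(argumentsFragment);
        }
    }

    Map<Integer, PartialCall> calls() {
        return calls;
    }

    public static void main(String[] args) {
        ToolCallAccumulator acc = new ToolCallAccumulator();
        acc.apply(0, "call_abc123", "get_weather", "");
        acc.apply(0, null, null, "{\"city\":");
        acc.apply(0, null, null, " \"Beijing\"}");
        PartialCall call = acc.calls().get(0);
        System.out.println(call.name + " " + call.arguments);
    }
}
```

The arguments string is only valid JSON once the stream reports `finish_reason: "tool_calls"`, so parse it after the final chunk rather than per fragment.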

finish_reason Complete Explanation

Value	Meaning	Handling Suggestion
stop	Normal completion	Output complete, can use directly
length	Reached max_tokens limit	Output truncated, may need to continue
tool_calls	Model requests to call tools	AI Agent core, need to execute tools and return results
content_filter	Content filtered by security	Input or output triggered security policy
function_call	Legacy function calling (deprecated)	Use tool_calls instead
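
The handling suggestions in this table map naturally onto a dispatch on finish_reason. A minimal sketch follows; the `Action` names are invented for illustration, and unknown or missing values fall through to a safe default because providers may add values of their own.

```java
final class FinishReasonHandler {

    enum Action { USE_OUTPUT, HANDLE_TRUNCATION, EXECUTE_TOOLS, REPORT_FILTERED, UNKNOWN }

    // Maps a finish_reason string to the follow-up step suggested in the table.
    static Action actionFor(String finishReason) {
        if (finishReason == null) {
            return Action.UNKNOWN; // e.g. an intermediate streaming chunk
        }
        switch (finishReason) {
            case "stop":
                return Action.USE_OUTPUT;
            case "length":
                return Action.HANDLE_TRUNCATION;
            case "tool_calls":
            case "function_call": // legacy name, treated the same way here
                return Action.EXECUTE_TOOLS;
            case "content_filter":
                return Action.REPORT_FILTERED;
            default:
                return Action.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(actionFor("stop"));
        System.out.println(actionFor("tool_calls"));
    }
}
```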

5. Streaming vs Non-streaming Comparison

Comparison Dimension	Streaming Output (stream: true)	Non-streaming Output (stream: false)
Response Method	Return token by token	Return complete result at once
First Token Latency	Low, fast display	High, need to wait for complete generation
User Experience	Good, real-time feedback	Poor, need to wait
Implementation Complexity	Higher, need to handle SSE	Simple, normal HTTP request
Token Statistics	Returned in the final chunk, if the provider sends it	Returned with response
Error Handling	Complex, may fail mid-stream	Simple, unified handling
Applicable Scenarios	Real-time conversation, chat applications	Batch processing, background tasks
Tool Calling	Incremental return	Complete return

6. Core Parameters in AI Agent Development

In AI Agent development, the following parameters are most critical:

1. messages - Conversation Context

Agents need to maintain complete conversation history, including:

System prompt (defining Agent behavior)
User messages
Assistant responses
Tool call records
Tool return results

2. tools / tool_calls - Tool Calling

This is the core for Agent interaction with the external world:

// Model requests to call tool
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\": \"Beijing\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

3. Multi-turn Conversation Flow

User Input → Model Judgment → Return tool_calls → Execute Tool →
Add Results to messages → Call Model Again → Return Final Answer
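
This flow can be sketched as a loop. The example below replaces the HTTP call with a stub so it runs offline; `AgentLoopSketch`, `Model`, and `Reply` are hypothetical names, messages are kept as plain strings for brevity, and a step limit guards against a model that never stops requesting tools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class AgentLoopSketch {

    // One model turn: either a final answer or a request to run a tool.
    static final class Reply {
        final String content;       // set for a final answer
        final String toolName;      // set when the model asks for a tool
        final String toolArguments; // JSON string, as in function.arguments

        private Reply(String content, String toolName, String toolArguments) {
            this.content = content;
            this.toolName = toolName;
            this.toolArguments = toolArguments;
        }

        static Reply answer(String content) { return new Reply(content, null, null); }
        static Reply toolCall(String name, String arguments) { return new Reply(null, name, arguments); }
    }

    // Stand-in for the HTTP call to the chat completions endpoint.
    interface Model {
        Reply complete(List<String> messages);
    }

    static String run(Model model, Map<String, Function<String, String>> tools,
                      String userInput, int maxSteps) {
        List<String> messages = new ArrayList<>();
        messages.add("user: " + userInput);
        for (int step = 0; step < maxSteps; step++) {
            Reply reply = model.complete(messages);
            if (reply.toolName == null) {
                return reply.content; // final answer
            }
            Function<String, String> tool = tools.get(reply.toolName);
            String result = (tool == null)
                    ? "error: unknown tool " + reply.toolName
                    : tool.apply(reply.toolArguments);
            // Record both the request and the result, then ask the model again.
            messages.add("assistant: tool_calls " + reply.toolName + " " + reply.toolArguments);
            messages.add("tool: " + result);
        }
        throw new IllegalStateException("No final answer after " + maxSteps + " steps");
    }

    public static void main(String[] args) {
        Function<String, String> getWeather = arguments -> "Beijing is sunny today";
        Model fakeModel = messages -> {
            String last = messages.get(messages.size() - 1);
            return last.startsWith("tool: ")
                    ? Reply.answer("Weather report: " + last.substring(6))
                    : Reply.toolCall("get_weather", "{\"city\":\"Beijing\"}");
        };
        System.out.println(run(fakeModel, Map.of("get_weather", getWeather),
                "How's the weather in Beijing today?", 3));
    }
}
```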

7. Spring AI and Spring AI Alibaba Integration

Spring AI Basic Configuration

# application.yml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      base-url: https://api.openai.com
      chat:
        options:
          model: gpt-4
          temperature: 0.7

Spring AI Alibaba Configuration

# application.yml
spring:
  ai:
    dashscope:
      api-key: ${DASHSCOPE_API_KEY}
      chat:
        options:
          model: qwen-plus
          temperature: 0.7

Code Example Comparison

Spring AI Call Example:

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    // Non-streaming call
    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }

    // Streaming call
    @GetMapping("/chat/stream")
    public Flux<String> chatStream(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content();
    }
}

Spring AI Complete Configuration Example:

@Configuration
public class AIConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("You are a helpful AI assistant.")
                .defaultOptions(ChatOptions.builder()
                        .model("qwen-plus")
                        .temperature(0.7)
                        .maxTokens(2000)
                        .build())
                .build();
    }
}

8. Native Java HTTP Call Examples

Non-streaming Call Example

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OpenAIClient {

    private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
    private static final String API_KEY = "your-api-key";

    private final HttpClient httpClient;

    public OpenAIClient() {
        this.httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    /**
     * Non-streaming call
     */
    public String chat(String userMessage) throws Exception {
        // Build request body
        String requestBody = """
            {
              "model": "qwen-plus",
              "messages": [
                {"role": "system", "content": "You are a helpful AI assistant."},
                {"role": "user", "content": "%s"}
              ],
              "temperature": 0.7,
              "stream": false
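            }
            """.formatted(userMessage
                    .replace("\\", "\\\\")   // minimal JSON escaping; use a JSON library in real projects
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n"));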

        // Build request
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + API_KEY)
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        // Send request
        HttpResponse<String> response = httpClient.send(
                request,
                HttpResponse.BodyHandlers.ofString()
        );

        // Parse response
        if (response.statusCode() == 200) {
            return parseContent(response.body());
        } else {
            throw new RuntimeException("API call failed: HTTP " + response.statusCode() + ": " + response.body());
        }
    }

    /**
     * Parse non-streaming response
     */
    private String parseContent(String responseBody) {
        // Simple parsing (recommend using Jackson/Gson in actual projects)
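        int marker = responseBody.indexOf("\"content\":\"");
        if (marker == -1) return "";  // e.g. content is null when the model returns tool_calls
        int contentStart = marker + 11;
        int contentEnd = responseBody.indexOf("\"", contentStart);
        // Escaped quotes inside the content are not handled here; use a JSON library for that
        return responseBody.substring(contentStart, contentEnd);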
    }

    public static void main(String[] args) throws Exception {
        OpenAIClient client = new OpenAIClient();
        String response = client.chat("Hello, please introduce yourself.");
        System.out.println(response);
    }
}

Streaming Call Example

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Consumer;

public class OpenAIStreamClient {

    private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
    private static final String API_KEY = "your-api-key";

    private final HttpClient httpClient;

    public OpenAIStreamClient() {
        this.httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    /**
     * Streaming call
     * @param userMessage User message
     * @param onContent Callback for each content received
     */
    public void chatStream(String userMessage, Consumer<String> onContent) throws Exception {
        // Build request body
        String requestBody = """
            {
              "model": "qwen-plus",
              "messages": [
                {"role": "system", "content": "You are a helpful AI assistant."},
                {"role": "user", "content": "%s"}
              ],
              "temperature": 0.7,
              "stream": true
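            }
            """.formatted(userMessage
                    .replace("\\", "\\\\")   // minimal JSON escaping; use a JSON library in real projects
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n"));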

        // Build request
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + API_KEY)
                .header("Accept", "text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        // Send streaming request
        HttpResponse<java.util.stream.Stream<String>> response = httpClient.send(
                request,
                HttpResponse.BodyHandlers.ofLines()
        );

        // Handle streaming response
        response.body().forEach(line -> {
            if (line.startsWith("data: ") && !line.equals("data: [DONE]")) {
                String jsonData = line.substring(6);
                String content = extractDeltaContent(jsonData);
                if (content != null && !content.isEmpty()) {
                    onContent.accept(content);
                }
            }
        });
    }

    /**
     * Extract delta content from SSE data
     */
    private String extractDeltaContent(String jsonData) {
        // Simple parsing (recommend using Jackson/Gson in actual projects)
        try {
            int deltaStart = jsonData.indexOf("\"delta\":");
            if (deltaStart == -1) return null;

            int contentStart = jsonData.indexOf("\"content\":\"", deltaStart);
            if (contentStart == -1) return null;

            contentStart += 11;
            int contentEnd = jsonData.indexOf("\"", contentStart);

            return jsonData.substring(contentStart, contentEnd)
                    .replace("\\n", "\n")
                    .replace("\\\"", "\"")
                    .replace("\\\\", "\\");
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        OpenAIStreamClient client = new OpenAIStreamClient();

        System.out.println("AI Response:");
        client.chatStream("Please write a short poem about spring.", content -> {
            System.out.print(content);  // Real-time print
        });
        System.out.println("\n--- Done ---");
    }
}

Complete Example Using Jackson

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class OpenAIJsonClient {

    private static final String API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";
    private static final String API_KEY = "your-api-key";

    private final HttpClient httpClient;
    private final ObjectMapper objectMapper;

    public OpenAIJsonClient() {
        this.httpClient = HttpClient.newHttpClient();
        // Real responses carry fields not modeled below; ignore them instead of failing
        this.objectMapper = new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
    }

    // Request object
    public static class ChatRequest {
        public String model;
        public List<Message> messages;
        public Double temperature;
        public Boolean stream;

        public static class Message {
            public String role;
            public String content;

            public Message(String role, String content) {
                this.role = role;
                this.content = content;
            }
        }
    }

    // Response object
    public static class ChatResponse {
        public String id;
        public String object;
        public Long created;
        public String model;
        public List<Choice> choices;
        public Usage usage;

        public static class Choice {
            public Integer index;
            public Message message;
            public String finish_reason;

            public static class Message {
                public String role;
                public String content;
                @JsonProperty("tool_calls")
                public List<ToolCall> toolCalls;
            }
        }

        public static class Usage {
            @JsonProperty("prompt_tokens")
            public Integer promptTokens;
            @JsonProperty("completion_tokens")
            public Integer completionTokens;
            @JsonProperty("total_tokens")
            public Integer totalTokens;
        }

        public static class ToolCall {
            public String id;
            public String type;
            public Function function;

            public static class Function {
                public String name;
                public String arguments;
            }
        }
    }

    /**
     * Non-streaming call (using strongly typed objects)
     */
    public ChatResponse chat(String userMessage) throws Exception {
        // Build request
        ChatRequest request = new ChatRequest();
        request.model = "qwen-plus";
        request.temperature = 0.7;
        request.stream = false;
        request.messages = List.of(
                new ChatRequest.Message("system", "You are a helpful AI assistant."),
                new ChatRequest.Message("user", userMessage)
        );

        String requestBody = objectMapper.writeValueAsString(request);

        HttpRequest httpRequest = HttpRequest.newBuilder()
                .uri(URI.create(API_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + API_KEY)
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        HttpResponse<String> response = httpClient.send(
                httpRequest,
                HttpResponse.BodyHandlers.ofString()
        );

        if (response.statusCode() == 200) {
            return objectMapper.readValue(response.body(), ChatResponse.class);
        } else {
            throw new RuntimeException("API call failed: HTTP " + response.statusCode() + ": " + response.body());
        }
    }

    public static void main(String[] args) throws Exception {
        OpenAIJsonClient client = new OpenAIJsonClient();

        ChatResponse response = client.chat("What is Java's polymorphism?");

        System.out.println("Model: " + response.model);
        System.out.println("Response: " + response.choices.get(0).message.content);
        System.out.println("Token Consumption: " + response.usage.totalTokens);
    }
}

9. New API Gateway and Anthropic Claude

New API Gateway

New API is a widely used open-source API management and distribution system that is compatible with the OpenAI API protocol:

Core Features:

Supports multiple LLMs (OpenAI, Claude, Gemini, domestic models, etc.)
Unified OpenAI protocol interface
API Key management and billing
Channel management and load balancing

Usage:

Just point base_url to the New API server address:

String baseUrl = "https://your-new-api-server/v1";
// Or, for a local deployment:
// String baseUrl = "http://localhost:3000/v1";

Anthropic Claude’s Independent System

It’s worth noting that Anthropic’s Claude series models use a self-contained API protocol that differs from the OpenAI protocol:

Comparison Item	OpenAI API	Anthropic API
Endpoint	`/v1/chat/completions`	`/v1/messages`
Message Format	`messages` array	`messages` + `system` separated
System Prompt	Placed in messages	Separate `system` parameter
Streaming Field	`choices[0].delta.content`	`delta.text`
Tool Calling	`tool_calls`	`tool_use` blocks in `content`

Anthropic API Example:

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "system": "You are a helpful AI assistant.",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}

Compatibility Solution:

Most gateways (like New API) will automatically convert protocols, allowing you to call Claude using the OpenAI protocol. However, if you call the Anthropic official API directly, you need to follow its native protocol.
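
To give a feel for what such a conversion involves, the sketch below covers only the "System Prompt" row of the table: it lifts system messages out of an OpenAI-style list into a separate string. It is not how New API or any other gateway implements the conversion; `SystemPromptSplitter` and its types are hypothetical, and tool calls and streaming fields would need their own mapping.

```java
import java.util.ArrayList;
import java.util.List;

final class SystemPromptSplitter {

    static final class Msg {
        final String role;
        final String content;

        Msg(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    // Result in the shape the Anthropic request expects: system text plus remaining messages.
    static final class Split {
        final String system;
        final List<Msg> messages;

        Split(String system, List<Msg> messages) {
            this.system = system;
            this.messages = messages;
        }
    }

    // Collects system messages into one string and keeps all other messages in order.
    static Split split(List<Msg> openAiMessages) {
        List<String> systemParts = new ArrayList<>();
        List<Msg> rest = new ArrayList<>();
        for (Msg m : openAiMessages) {
            if ("system".equals(m.role)) {
                systemParts.add(m.content);
            } else {
                rest.add(m);
            }
        }
        return new Split(String.join("\n\n", systemParts), rest);
    }

    public static void main(String[] args) {
        Split s = split(List.of(
                new Msg("system", "You are a helpful AI assistant."),
                new Msg("user", "Hello!")));
        System.out.println("system = " + s.system);
        System.out.println("messages = " + s.messages.size());
    }
}
```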

10. Best Practices Summary

Parameter Configuration Recommendations

Scenario	temperature	stream	Other Recommendations
Daily conversation	0.7	true	Default is fine
Code generation	0.3	false	Set seed for stability
Creative writing	0.9	true	Can combine with top_p
JSON output	0.3	false	Use response_format
AI Agent	0.7	false	Configure tools

Development Recommendations

Prioritize streaming output: Significantly improves user experience
Set temperature reasonably: Choose based on scenario, don’t blindly use default values
Monitor token consumption: usage field is the billing basis, monitor it well
Use seed parameter wisely: Use when stable output is needed
Framework encapsulation first: Frameworks like Spring AI simplify development, but understanding the underlying protocol is important

Error Handling

try {
    ChatResponse response = client.chat(message);
    // Handle response
} catch (Exception e) {
    // Matching on message text only works if the thrown exception includes the HTTP status code;
    // in real code, branch on response.statusCode() before throwing.
    String msg = String.valueOf(e.getMessage());
    if (msg.contains("401")) {
        System.err.println("Invalid API Key");
    } else if (msg.contains("429")) {
        System.err.println("Rate limit exceeded, please retry later");
    } else if (msg.contains("500")) {
        System.err.println("Model service error");
    }
}
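
Rate limits (429) and server errors (5xx) are usually worth retrying, while authentication errors are not. A minimal sketch of that decision plus an exponential backoff schedule follows; `RetryPolicy`, the base delay, and the cap are illustrative choices, and production code would normally add random jitter and honor any retry hints the provider returns.

```java
final class RetryPolicy {

    // 429 and 5xx are treated as transient; everything else fails immediately.
    static boolean isRetryable(int status) {
        return status == 429 || (status >= 500 && status <= 599);
    }

    // Exponential backoff without jitter: base * 2^attempt, capped at capMillis.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + backoffMillis(attempt, 500, 8000) + " ms");
        }
        System.out.println("retry 401? " + isRetryable(401));
    }
}
```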

Summary

The OpenAI API protocol has become the de facto standard for LLM application development. Mastering this protocol’s input parameters, output parameters, and streaming/non-streaming output is key to understanding the underlying principles of AI Agent development.

Core takeaways:

messages is the most important input parameter, defining conversation context
stream determines streaming or non-streaming output, affecting user experience
tools/tool_calls is the core of AI Agent development
choices[].message.content and usage are the most important output fields
Frameworks like Spring AI and LangChain all encapsulate this protocol
Gateways like New API enable unified access to multiple models
