Billing overview

Activating Alibaba Cloud Model Studio is free of charge. You are charged only for model inference, for example when calling large language models (LLMs) for text generation.

View bills: go to Bill Details and Cost Analysis.
View calling statistics: go to Model Observation.
Billable items

Model inference (calling)
Model inference (calling)

Overview & free quota

For a complete list of prices and free quotas, see Models. For performance limits, see Throttling. You can view the number of calls and the token consumption for a specific model in Model Observation.

Free quota

For how to claim the free quota and check the remaining amount, see Free quota for new users.

Flagship models

For the prices and free quotas of other models, see Models.
Batch discount

The text generation models qwen-max, qwen-plus, and qwen-turbo support batch calling, which costs 50% of the real-time price. Batch calling cannot be combined with other discounts such as the free quota or context cache. You submit batch tasks as files for asynchronous execution; the system processes the data offline during off-peak hours and returns the results when the task completes or the maximum wait time is reached. You can submit batch inference tasks through the console or the API.

Context cache

Context cache is enabled at no additional charge. If the system determines that a request hits the cache, the cached tokens are billed at a discounted rate rather than the standard input price. The context cache discount is not available in Batch mode. For more information, see Context Cache.
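To illustrate how the 50% batch discount affects a bill, here is a minimal sketch. The `inference_cost` helper and the per-token prices are hypothetical placeholders, not actual Model Studio prices; check the Models page for real rates.

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float,
                   batch: bool = False) -> float:
    """Cost of one call. Prices are per 1,000 tokens (placeholder values).
    Batch calling is billed at 50% of the real-time price."""
    cost = (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price
    return cost * 0.5 if batch else cost

# Hypothetical prices, per 1,000 tokens (NOT real Model Studio prices).
realtime = inference_cost(10_000, 2_000, input_price=0.02, output_price=0.06)
batched  = inference_cost(10_000, 2_000, input_price=0.02, output_price=0.06, batch=True)
print(realtime, batched)  # the batch cost is exactly half the real-time cost
```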
FAQ

Billing rules

How to calculate token count?

Tokens are the basic units a model uses to represent natural language text; you can think of them as "characters" or "words". Different models may use different tokenization methods. You can use the SDK to run the Qwen tokenizer locally.
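If you only need a rough figure without the SDK, a crude heuristic can approximate the count. The `estimate_tokens` function below is hypothetical and for illustration only; real tokenizers segment text differently, so treat its output as a ballpark estimate.

```python
import re

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~1 token per English word or number,
    ~1 token per CJK character, ~1 token per punctuation mark.
    Real tokenizers (including Qwen's) will differ."""
    cjk = re.findall(r'[\u4e00-\u9fff]', text)                 # CJK characters
    words = re.findall(r'[A-Za-z0-9]+', text)                  # word-like runs
    punct = re.findall(r'[^\sA-Za-z0-9\u4e00-\u9fff]', text)   # other symbols
    return len(cjk) + len(words) + len(punct)

print(estimate_tokens("Hello, world"))  # 3
```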
The SDK's local tokenizer lets you estimate the token count of your text, but the result may not exactly match the server-side count. If you are interested in the details of the Qwen tokenizer, see Tokenization.

How to view calling statistics?

You can check the call count and token consumption for a specific model on the Model Observation page of the console.

How is multi-round conversation billed?

In a multi-round conversation, the input and output of all previous rounds are billed again as new input tokens.

I created an LLM application and never used it. Am I billed for the application?

No. Creating an application does not incur charges by itself; you are billed for model inference only when you test or call the application.

Cost management

How to pay?

If your balance is insufficient or a payment is overdue while using Model Studio, go to Expenses and Costs to pay.

How to set a monthly consumption alert?

You can set a quota alert in the Expenses and Costs center.

How to stop pay-as-you-go billing?

Pay-as-you-go billing cannot be turned off, but you incur no fees as long as you stop using Model Studio features. To prevent unexpected API invocation fees, you can delete all of your API keys. You can also set a monthly consumption alert so that you are notified immediately of unexpected charges.

About bills

View the costs of Model Studio
View the costs of model inference
View the inference costs of a specific model
How to allocate costs based on payment details?

Bills generated after September 7, 2024 can be allocated by workspace ID, model name, input/output type, and calling channel.

About API

API errors: service activation or account balance

1. Service not activated: use your Alibaba Cloud account to log on to Expenses and Costs, activate Model Studio, and claim the free quota.

2. Insufficient account balance: go to Expenses and Costs to pay and top up your balance.
3. Set a consumption alert to prevent repeated errors: set a quota alert in the Expenses and Costs center so that you are notified before your balance runs out.
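As a worked sketch of the multi-round billing rule described in the FAQ above (each round's input is billed together with the full prior history), assuming hypothetical per-round token counts:

```python
def billed_input_tokens(rounds):
    """rounds: list of (input_tokens, output_tokens) per conversation round.
    In each round, the inputs and outputs of all previous rounds are
    re-billed as new input tokens along with the new input."""
    billed = []
    history = 0
    for inp, out in rounds:
        billed.append(history + inp)   # prior turns are counted as input again
        history += inp + out           # the conversation grows every round
    return billed

# Hypothetical 3-round conversation: (input, output) token counts per round.
print(billed_input_tokens([(100, 50), (80, 40), (60, 30)]))  # [100, 230, 330]
```

Note how the billed input grows each round even though the new input shrinks: this is why long multi-round conversations consume tokens faster than their new content alone would suggest.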