Pricing
Charge model for text prompting and output?
Charged per Token in and Token out
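The per-token charge model above can be sketched as a small cost function. The rates and token counts below are illustrative placeholders, not real AWS prices:

```python
# Sketch: token-based billing for a text model.
# price_in_per_1k / price_out_per_1k are hypothetical rates, not actual AWS pricing.
def text_cost(tokens_in: int, tokens_out: int,
              price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Input and output tokens are billed at separate per-1K-token rates."""
    return (tokens_in / 1000) * price_in_per_1k + (tokens_out / 1000) * price_out_per_1k

# e.g. 2,000 input tokens and 500 output tokens at $0.003 / $0.015 per 1K tokens
print(round(text_cost(2000, 500, 0.003, 0.015), 4))  # 0.0135
```

Note that output tokens are typically priced higher than input tokens, which is why verbose responses drive cost.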
Pricing
Charge model for embedding models?
Charged for every input token processed
Pricing
Charge model for image models?
Charge for each image generated
Pricing
How does Batch inference pricing compare to on-demand?
Batch can be up to 50% cheaper
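The "up to 50%" discount above is a simple multiplier on the on-demand price. A minimal sketch, assuming the maximum stated discount applies:

```python
# Sketch: batch inference cost at an assumed discount vs. on-demand.
# The 50% figure is the maximum stated discount; actual savings vary by model.
def batch_cost(on_demand_cost: float, discount: float = 0.50) -> float:
    """Apply the batch discount to an on-demand cost."""
    return on_demand_cost * (1 - discount)

# e.g. a workload costing $10.00 on-demand
print(batch_cost(10.00))  # 5.0
```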
Pricing
How does Provisioned Throughput work?
Purchase model units for a certain time (1 month, 6 months, …)
Pricing
What do you get with Provisioned Throughput?
Guaranteed throughput (max tokens in/out per minute)
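The two cards above imply a fixed-fee model: you pay per model unit for the commitment term regardless of actual usage. A sketch with hypothetical rates and terms:

```python
# Sketch: Provisioned Throughput is a fixed fee per model unit for the whole
# commitment term, independent of tokens actually processed.
# The hourly rate and hours are illustrative, not real AWS figures.
def provisioned_cost(model_units: int, hourly_rate: float, hours_committed: int) -> float:
    """Fixed cost = units x rate x committed hours; usage does not change it."""
    return model_units * hourly_rate * hours_committed

# e.g. 2 model units at a hypothetical $20/hour for a 1-month (730 h) commitment
print(provisioned_cost(2, 20.0, 730))  # 29200.0
```

This fixed-fee structure is why a later card notes that Provisioned Throughput cannot itself save costs: the fee is owed whether the capacity is used or idle.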
Pricing
What types of models support Provisioned Throughput?
Base, Fine-tuned, and custom models
Pricing
Cheapest of the Model Improvement techniques?
Prompt engineering – no model training and no extra infrastructure required
Pricing
Next-cheapest of the Model Improvement techniques?
RAG: knowledge is stored externally, so the model doesn’t need to be trained on it
Pricing
Next cheapest of the model improvement techniques after RAG?
Instruction-based fine-tuning
Pricing
Most expensive of the Model Improvement techniques?
Domain adaptation fine-tuning
Pricing
How can Provisioned Throughput save costs?
It can’t – it’s reserved capacity billed at a fixed fee whether you use it or not
Pricing
How can Temperature save costs?
It can’t – temperature only changes output randomness, not token usage
Pricing
How can Top-K / Top-P save costs?
They can’t – they only shape the sampling distribution, not token usage
Pricing
Main cost driver for LLMs?
Number of input and output tokens