Article·ai.georgeliu.com
llm · infrastructure · automation · devops · open-source · lm-studio · lm-studio's-headless-cli · claude-code
Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code
Run Google's Gemma 4 LLM locally using LM Studio's headless CLI for privacy, cost savings, and offline development. Automate LLM interactions directly on your machine for rapid prototyping and custom agent building.
intermediate · 30 min · 4 steps
The play
- Install LM Studio: Download and install the LM Studio application for your operating system from the official website. This provides both the GUI for initial model downloads and the headless CLI functionality.
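- Download Gemma 4 Model: Open the LM Studio application. Use the built-in search to find and download a compatible Gemma 4 model as a quantized GGUF file. Filenames follow a pattern like 'gemma-2b-it-q4_k_m.gguf'; that particular name comes from an earlier Gemma release, so the Gemma 4 filename will differ. Ensure the model is fully downloaded before proceeding.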
- Start LM Studio Headless Server: Open your terminal or command prompt and make sure the LM Studio CLI is on your PATH (or change to the directory where LM Studio is installed). Start the headless server, specifying the downloaded Gemma 4 model and the desired port. Replace `path/to/gemma-4-model.gguf` with the actual path to your downloaded model file and `1234` with your preferred port.
- Interact with Gemma 4 via API: Once the LM Studio server is running, send API requests to your local Gemma 4 instance. Use a tool like `curl` or any HTTP client to interact with the model endpoint. The example below sends a simple chat completion request to the default `/v1/chat/completions` endpoint.
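Between steps 3 and 4, the server may need a moment before it accepts requests. A minimal sketch of a readiness check follows. The function name `wait_for_server` and its arguments are hypothetical, and it assumes the server answers `GET /v1/models`, as OpenAI-compatible servers commonly do; the article itself does not state this.

```shell
# Poll the server until it answers, or give up after a number of tries.
# Assumption: GET <base_url>/v1/models succeeds once the server is ready.
# Arguments (all optional): base URL, max tries, delay in seconds.
wait_for_server() {
  local base_url="${1:-http://localhost:1234}"
  local max_tries="${2:-30}"
  local delay="${3:-1}"
  local i=1
  while [ "$i" -le "$max_tries" ]; do
    # --fail makes curl exit non-zero on HTTP errors as well as connection errors.
    if curl --silent --fail --max-time 2 --output /dev/null "$base_url/v1/models"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (not executed here):
#   wait_for_server "http://localhost:1234" 30 1 && echo "server is up"
```

Polling a cheap read-only endpoint avoids sending a real completion request just to find out whether the server is listening.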
Starter code
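# Assumption: LM Studio's CLI is invoked as `lms`; run `lms --help` to confirm
# the subcommands in your version, and `lms ls` to see how your model is listed.
lms load "path/to/gemma-4-model.gguf" && lms server start --port 1234 && \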
curl -X POST http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma-4-model.gguf",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a short story."}
],
"temperature": 0.7
}'
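To send different prompts without hand-editing the JSON, the request body can be built from shell variables. The sketch below is illustrative only: `json_escape` and `build_payload` are hypothetical helper names, the system message from the starter code is left out for brevity, and the escaping covers only backslashes and double quotes (not newlines or other control characters).

```shell
# Escape backslashes and double quotes so a string can sit inside a JSON string.
# Limitation: newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a chat-completions request body.
# $1 = model name as the server knows it, $2 = user prompt.
build_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"temperature":0.7}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage (not executed here):
#   curl -X POST http://localhost:1234/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d "$(build_payload "gemma-4-model.gguf" "Tell me a short story.")"
```

For prompts that may contain newlines or arbitrary user input, a real JSON tool is a safer choice than hand-rolled escaping.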