Article·ai.georgeliu.com

llm, infrastructure, automation, devops, open-source, lm-studio, lm-studio's-headless-cli, claude-code

Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code

Run Google's Gemma 4 LLM locally using LM Studio's headless CLI for privacy, cost savings, and offline development. Automate LLM interactions directly on your machine for rapid prototyping and custom agent building.

intermediate · 30 min · 4 steps

The play

Install LM Studio
Download and install the LM Studio application for your operating system from their official website. This provides the GUI for initial model downloads and the headless CLI functionality.
Download Gemma 4 Model
Open the LM Studio application. Use the built-in search to find a Gemma 4 model in GGUF format and download a quantization that fits your available memory (a 4-bit build such as Q4_K_M is a common starting point). Make sure the download has finished before proceeding.
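One way to confirm the download from a terminal is to list the models LM Studio knows about and look for a Gemma entry. The sketch below assumes the `lms` command-line tool that ships with LM Studio is on your PATH and that `lms ls` prints downloaded models one per line. `has_model` is a hypothetical helper name, and the search term `gemma` is only a guess at how your model will be labeled.

```shell
#!/bin/sh
# has_model PATTERN: succeed if any line of `lms ls` contains PATTERN
# (case-insensitive). Assumption: `lms ls` lists downloaded models.
has_model() {
  lms ls 2>/dev/null | grep -qi -- "$1"
}

# Usage (requires LM Studio's CLI to be installed):
#   has_model gemma && echo "Gemma model found" || echo "Gemma model not found yet"
```

If the check fails, compare the pattern with what `lms ls` actually prints on your machine before assuming the download is incomplete.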
Start LM Studio Headless Server
Open a terminal. Make sure LM Studio's command-line tool is on your PATH (or run it from the directory where LM Studio installed it). Start the headless server on your chosen port and load the downloaded Gemma 4 model. In the starter code, replace `path/to/gemma-4-model.gguf` with the path or key of your downloaded model and `1234` with your preferred port.
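The server can take a moment to come up, so a script that starts it should wait before sending requests. A minimal sketch, assuming the server answers the OpenAI-style `GET /v1/models` route on the chosen port. `wait_for_server` and its default of 30 attempts are hypothetical choices made for illustration.

```shell
#!/bin/sh
# wait_for_server URL [TRIES]: poll URL once per second until it answers
# with a successful HTTP status, or give up after TRIES attempts (default 30).
wait_for_server() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: quiet, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_server "http://localhost:1234/v1/models" && echo "server is ready"
```

Polling a cheap read-only route avoids spending a model call just to learn whether the port is open.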
Interact with Gemma 4 via API
Once the LM Studio server is running, send API requests to your local Gemma 4 instance. Use `curl` or any other HTTP client to call the model endpoint. The starter code below sends a simple chat completion request to the OpenAI-compatible `/v1/chat/completions` endpoint.

Starter code

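# Assumes the `lms` command that ships with LM Studio; confirm subcommands and flags with `lms --help`.
# Start the local API server on port 1234. The command returns once the server is up.
lms server start --port 1234

# Load the model. The path is a placeholder for the entry that `lms ls` shows.
# --identifier sets the name that API requests pass in the "model" field.
lms load "path/to/gemma-4-model.gguf" --identifier "gemma-4-model.gguf"

# Send a chat completion request to the local endpoint.
curl -X POST http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-model.gguf",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a short story."}
    ],
    "temperature": 0.7
  }'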
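
To reuse the request with different prompts, the JSON body can be generated instead of written by hand. The sketch below is one way to do that in plain shell. `json_escape`, `chat_payload`, `ask`, and the `LMSTUDIO_URL` variable are hypothetical names introduced here, and the escaping covers only backslashes and double quotes, so prompts with newlines or control characters need a real JSON tool.

```shell
#!/bin/sh
# json_escape TEXT: escape backslashes and double quotes so TEXT can sit
# inside a JSON string. Newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload MODEL PROMPT: print a chat completion request body.
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "temperature": 0.7}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# ask MODEL PROMPT: POST the payload to the local server and print the raw
# JSON response. LMSTUDIO_URL is an assumed override for the base URL.
ask() {
  chat_payload "$1" "$2" |
    curl -s -X POST "${LMSTUDIO_URL:-http://localhost:1234}/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d @-
}

# Usage (requires the server to be running):
#   ask "gemma-4-model.gguf" "Tell me a short story."
```

Keeping payload construction separate from the HTTP call makes it possible to inspect the request body without a running server.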
