Ollama

“Only Apple can do this”
Variously attributed to Tim Cook

Apple introduced Apple Intelligence at WWDC 2024.
After waiting almost a year for Apple to,
in Craig Federighi’s words, “get it right”,
its promise of “AI for the rest of us” feels just as distant as ever.

While we wait for Apple Intelligence to arrive on our devices,
something remarkable is already running on our Macs.
Think of it as a locavore approach to artificial intelligence:
homegrown, sustainable, and available year-round.

This week on NSHipster,
we’ll look at how you can use Ollama to run
LLMs locally on your Mac —
both as an end-user and as a developer.


First, download and install Ollama with Homebrew
or directly from their website.
Then pull and run llama3.2 (2GB):

$ brew install --cask ollama
$ ollama run llama3.2
>>> Tell me a joke about Swift programming.
What's an Apple developer's favorite drink? 
The Kool-Aid.

Under the hood,
Ollama is powered by llama.cpp.
But where llama.cpp provides the engine,
Ollama gives you a vehicle you’d actually want to drive —
handling all the complexity of model management, optimization, and inference.

Similar to how Dockerfiles define container images,
Ollama uses Modelfiles to configure model behavior:

FROM mistral:latest
PARAMETER temperature 0.7
TEMPLATE """
You are a helpful assistant.

User: {{ .Prompt }}
Assistant: """
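
You can then build a model from that Modelfile and run it
(the tag name my-assistant here is arbitrary):

$ ollama create my-assistant -f Modelfile
$ ollama run my-assistant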

Ollama uses the Open Container Initiative (OCI)
standard to distribute models.
Each model is split into layers and described by a manifest,
the same approach used by Docker containers:

{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.ollama.image.config.v1+json",
    "digest": "sha256:..."
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.layer.v1+json",
      "digest": "sha256:...",
      "size": 4019248935
    }
  ]
}
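
In practice, you rarely deal with these manifests directly;
they're fetched and verified for you when you pull a model,
and the downloaded layers are cached locally for reuse:

$ ollama pull mistral:latest   # fetches the manifest, then each layer by digest
$ ollama list                  # lists downloaded models and their sizes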

Overall, Ollama’s approach is thoughtful and well-engineered.
And best of all, it just works.

Jevons paradox states that,
as something becomes more efficient, we tend to use more of it, not less.

Having AI on your own device changes everything.
When computation becomes essentially free,
you start to see intelligence differently.

While frontier models like GPT-4 and Claude are undeniably miraculous,
there’s something to be said for the small miracle of running open models locally.

  • Privacy:
    Your data never leaves your device.
    Essential for working with sensitive information.
  • Cost:
    Run 24/7 without usage meters ticking.
    No more rationing prompts like ’90s cell phone minutes.
    Just a fixed, up-front cost for unlimited inference.
  • Latency:
    No network round-trips means faster responses.
    Your /M\d Mac((Book( Pro| Air)?)|Mini|Studio)/ can easily generate dozens of tokens per second.
    (Try to keep up!)
  • Control:
    No black-box RLHF or censorship.
    The AI works for you, not the other way around.
  • Reliability:
    No outages or API quota limits.
    100% uptime for your exocortex.
    Like having Wikipedia on a thumb drive.

Ollama exposes an HTTP API on port 11434
(leetspeak for llama 🦙).
This makes it easy to integrate with any programming language or tool.
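
For example, you can request a completion with a single curl command
(setting "stream": false returns one JSON response instead of a stream of tokens):

$ curl http://localhost:11434/api/generate -d '{
    "model": "llama3.2",
    "prompt": "Tell me a joke about Swift programming.",
    "stream": false
  }'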

To that end, we’ve created the Ollama Swift package
to help developers integrate Ollama into their apps.
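
As a minimal sketch of what that looks like
(assuming the package's Client.default entry point and generate method):

import Ollama

// Ask a locally running model for a completion.
let client = Client.default
let response = try await client.generate(
    model: "llama3.2",
    prompt: "Tell me a joke about Swift programming."
)
print(response.response)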

Embeddings
convert text into high-dimensional vectors that capture semantic meaning.
These vectors can be used to find similar content or perform semantic search.

For example, if you wanted to find documents similar to a user’s query:

import Foundation
import Ollama

let client = Client.default  // ollama-swift client (assumed entry point)
let documents: [String] = [/* your documents to search */]

// Convert text into vectors we can compare for similarity
let embeddings = try await client.embeddings(
    model: "nomic-embed-text", 
    texts: documents
)

/// Finds relevant documents
func findRelevantDocuments(
    for query: String, 
    threshold: Float = 0.7, // cutoff for matching, tunable
    limit: Int = 5
) async throws -> [String] {
    // Get an embedding for the query, using the same model as the documents
    let queryEmbeddings = try await client.embeddings(
        model: "nomic-embed-text",
        texts: [query]
    )
    guard let queryEmbedding = queryEmbeddings.first else { return [] }
 
    // See: https://en.wikipedia.org/wiki/Cosine_similarity
    func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
        let dotProduct = zip(a, b).map(*).reduce(0, +)
        let magnitude: ([Float]) -> Float = { sqrt($0.map { $0 * $0 }.reduce(0, +)) }
        return dotProduct / (magnitude(a) * magnitude(b))
    }
    
    // Find documents above similarity threshold
    let rankedDocuments = zip(embeddings, documents)
        .map { embedding, document in
            (similarity: cosineSimilarity(embedding, queryEmbedding),
             document: document)
        }
        .filter { $0.similarity >= threshold }
        .sorted { $0.similarity > $1.similarity }
        .prefix(limit)
    
    return rankedDocuments.map(\.document)
}

Nominate
is a macOS app that uses Ollama to intelligently rename PDF files based on their contents.

Like many of us striving for a paperless lifestyle,
you might find yourself scanning documents only to end up with
cryptically-named PDFs like Scan2025-02-03_123456.pdf.
Nominate solves this by combining AI with traditional NLP techniques
to automatically generate descriptive filenames based on document contents.

The app leverages several technologies we’ve discussed:

  • Ollama’s API for content analysis via the ollama-swift package
  • Apple’s PDFKit for OCR
  • The Natural Language framework for text processing
  • Foundation’s DateFormatter for parsing dates
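
A rough sketch of how those pieces might fit together
(this is not Nominate's actual implementation; the prompt and helper function are hypothetical):

import PDFKit
import Ollama

// Hypothetical sketch: extract a PDF's text with PDFKit,
// then ask a local model to suggest a descriptive filename.
func suggestFilename(for pdfURL: URL) async throws -> String? {
    guard let document = PDFDocument(url: pdfURL),
          let text = document.string
    else { return nil }

    let response = try await Client.default.generate(
        model: "llama3.2",
        prompt: """
        Suggest a short, descriptive filename (without extension) \
        for a document with the following contents:

        \(text.prefix(2000))
        """
    )
    return response.response
        .trimmingCharacters(in: .whitespacesAndNewlines)
}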
