
ArthRoute Jyotish Assistant

A simple, classical, scholarly Vedic Astrology companion grounded in 155 classical Jyotish texts and powered by a local LLM with a multi-stage RAG pipeline — free from superstition, low on hallucination, and classical-text-based in its reasoning.

Llama 3.1 8b · Ollama · ChromaDB · BM25 + Embeddings · Streamlit · Python · RAG Pipeline

  • 155 — Jyotish PDF Books
  • 77K+ — Text Chunks
  • 8 — RAG Pipeline Stages

🎯 Aim of the Assistant

🌕 The Aim of the Jyotish Assistant

When I started this project, the intention was simple yet profound: I wanted to build an LLM/RAG project to familiarise myself with all the steps, processes, and terminology involved in such projects. My other intention was to create a training and familiarisation use case for my team members — and at the end, I should have a usable product. I am currently studying Astrology (Jyotish), so building a Jyotish Assistant under this project seemed like a very logical choice. And that's how I ended up creating a classical, scholarly Jyotish companion — grounded in specific classic Jyotish texts to provide ancient wisdom with modern clarity, free from superstition and transparent in its reasoning.

This Jyotish Assistant is not a tool for prediction. It is a guide, designed for seekers, students, and practitioners who value depth over gimmick.

This is not a generic chatbot. It is a carefully architected system where each stage contributes to reliability, interpretability, and scholarly integrity.

Anyone interested in a technical deep dive into this assistant's development process and the steps I followed can click the Technical Deep Dive button above, next to the Feature & UI button.

🖥️ Interface Walkthrough

🌒 The User Interface

⌨️ Query Input

A clean, minimal input box invites natural questions like: "Explain Mars in the 7th house" or "What is Neecha Bhanga Raja Yoga?". The design is intentionally uncluttered so as not to intimidate users, keeping attention on the conversation.

🎚️ Answer Style Modes — Beginner & Scholar

The interface supports Answer Style modes that adapt to the tone and depth of responses:

  • Beginner Mode — gentle, explanatory, with more analogies, slower pacing, and minimal jargon.
  • Scholar Mode — denser, more technical, with Sanskrit terms, sutra‑style phrasing, and tighter commentary.

Both modes remain non‑fatalistic and text‑grounded; the difference is in how the insight is narrated, not in the underlying logic.

📜 Final Answer Panel

The assistant responds in a calm, classical tone with structured sections such as Core Principle, Classical References, Interpretive Meaning, and Practical Insight. Citations like [BPHS, p. 112] appear inline, and Sanskrit terms are explained clearly.

🧩 Fused Context Viewer

This panel shows how multiple classical texts were merged into a single coherent context. It reveals the "reasoning" behind the answer, building trust through transparency.

🔍 Compressed Context Viewer

A distilled, laser‑focused summary of the most relevant excerpts. This is the essence of the classical material that the assistant uses to answer the question.

📚 Source Summaries

Each classical text (BPHS, Saravali, Phaladeepika, Jaimini Sutras, etc.) gets its own fused summary. This preserves the voice and philosophy of each author and highlights where they agree, differ, or add unique nuance.

🕉️ Sanskrit Glossary

The assistant automatically extracts Sanskrit and technical Jyotish terms from the answer and generates a concise glossary. Terms like Neecha, Drishti, or Shadbala are defined in 1–2 classical, non‑fatalistic sentences.

🔎 Search Past Queries

A built‑in search capability lets the user revisit past queries and their answers — a quick way to look up earlier explorations like "Saturn in the 8th" or "Argala in Jaimini" without re-asking.

This turns the assistant into a personal Jyotish notebook, where the journey of questions becomes a searchable knowledge trail.

🗂️ Categorized History & Learning

Past queries are saved under categories such as Grahas, Bhavas, Yogas, Dashas, or Remedies. Over time, the system can observe patterns in queries and study habits.

This enables a gentle form of learning from history — not by changing classical meanings, but by prioritizing explanations, examples, and cross‑links that match evolving interests and depth.

🖼️ Main Interface

A visual snapshot of the primary interface — query input, answer panel, and context viewers arranged in a minimal, focused layout. You can also notice the option available in the UI to select answer type from Beginner or Scholar mode — same grounding, different narrative depth.

Jyotish Assistant main interface screenshot

🎭 Answer Styles — Beginner & Scholar

This screenshot showcases how the final answer is output by the assistant. The length and depth of the final answer depend upon the chosen mode — Beginner vs Scholar. The assistant also provides panels for Fused Context, Compressed Context, Source Summaries, and a Sanskrit Glossary.

Jyotish Assistant answer styles screenshot

🕉️ Glossary — for Sanskrit Terms in Classics

This screenshot showcases the Glossary feature, which is very useful due to the heavy usage of Sanskrit terms in ancient classics and modern notes.

Sanskrit glossary feature screenshot

🔎 Learn from Past Queries

This screenshot showcases the Past Query feature — storing and retrieving past queries. The assistant also learns from past queries, making every query a stepping stone.

Past queries feature screenshot
⭐ Feature Highlights

🌓 Key Features

⚖️ Hybrid Retrieval

Combines BM25 (lexical) and embeddings (semantic) to handle both Sanskrit terms and conceptual similarity. This ensures that classical terms and slightly nuanced questions are both captured effectively.

🎯 Cross‑Encoder Re‑Ranking

A cross‑encoder model re‑ranks the retrieved chunks, improving the quality of the top‑k context. This reduces noise and sharpens the relevance of the data used for answering.

📖 Source‑Aware Fusion

Each classical text is fused independently before cross‑source synthesis. This preserves authorial voice and avoids mixing interpretations prematurely.

🧬 Cross‑Source Synthesis

A meta‑fusion step synthesizes the fused summaries from multiple texts, highlighting agreements, differences, and unique contributions — like a scholarly commentary.

🪶 Context Compression

The fused context is compressed into a concise, technically accurate summary. This improves answer quality and latency while keeping the reasoning grounded in the original texts.

📎 Citation‑Aware Answers

The final prompt includes an explicit list of allowed citations (e.g., BPHS, p. 112), ensuring that the assistant never hallucinates references and only cites what truly exists in the corpus.

🧪 Example Interactions

🌔 Example Queries

🔥 "Explain Mars in the 7th house."

The assistant explains the core principle (fire element in relationships), classical references, interpretive meaning, conditions (dignity, aspects), and practical insights — with citations and glossary entries for key terms.

👑 "What is Neecha Bhanga Raja Yoga?"

The assistant defines the yoga, lists classical rules for cancellation of debility, explains conditions, and offers structured interpretation — grounded in classical sources.

🪐 "Give remedies for Shani in the 8th."

The assistant responds in a non‑fatalistic, practical way, avoiding superstition and fear. It focuses on mindset, discipline, and constructive engagement with Saturnian themes, rather than deterministic predictions.

🧭 Design Philosophy

🌕 The Philosophy Behind the Assistant

This assistant is built on five core principles:

  • Clarity — no jargon without explanation.
  • Classical grounding — rooted in authoritative Jyotish texts.
  • Non‑fatalism — possibilities, not certainties.
  • Transparency — context, sources, and reasoning are visible.
  • Minimalism — calm, intentional, and free of noise.

It reflects the ethos of ArthRoute Studios: engineering precision, philosophical depth, and a deep respect for both tradition and the present moment.

📑 Technical Deep Dive — Quick Navigation

🧭 Mini Table of Contents

A quick navigation guide to all major engineering components of the Jyotish Assistant. Each item corresponds to a major section in the Technical Deep Dive.

  • 📁 Project Layout — Folder structure and module responsibilities.
  • 🤖 LLM Selection — Basis and steps for the selection of the LLM used for this assistant.
  • 🛠️ Building the RAG Pipeline — Steps describing how the RAG system was constructed.
  • 🧠 Architecture Overview — High‑level pipeline diagram.
  • 🧩 Detailed Technical Breakdown — Stage‑by‑stage engineering explanation.
  • 🧬 Model Choices — Embeddings, BM25, cross‑encoder, LLM.
  • 🧱 Build Journey — Step‑by‑step construction of the RAG system.
  • 📚 Teaching Value — What learners gain from this architecture.
  • 🔮 Future Directions — Planned extensions and improvements.
  • 🧘 Closing — Final reflections on design philosophy.
📁 Project Layout

🌓 Folder Structure

Project Structure
project/
│
├── app/
│   ├── rag_core.py        ← main RAG engine
│   ├── config.py          ← model + DB settings
│   ├── ingestion.py       ← PDF → chunks → embeddings
│   ├── utils.py           ← helpers
│   └── __init__.py
│
├── data/
│   └── chroma/            ← vector DB
│
├── ui_app.py              ← Streamlit UI
└── requirements.txt
🤖 LLM Selection & Installation

🧠 How the LLM Model Was Selected & Installed

Before building the RAG pipeline, I needed a local LLM that I could install and run on my laptop and that could serve as the "reasoning engine" of the assistant. This section explains how I went about selecting the LLM, what criteria were considered, and how it was installed for offline, private use.

🎯 1. Defining the Selection Criteria

The model needed to satisfy several constraints specific to a classical Jyotish assistant:

  • Local execution — full privacy, no cloud dependency.
  • Strong reasoning — able to follow structured prompts.
  • Low hallucination tendency — an essential requirement for classical Jyotish texts.
  • Efficient — must run smoothly on the laptop CPU/GPU I had (i7, RTX 5060 with 8 GB VRAM, 32 GB RAM).
  • Good instruction following — for persona, citations, glossary.
  • Stable output — predictable behavior across queries.

These constraints ruled out extremely large models and pointed toward optimized, instruction‑tuned models that balance quality and performance.

🤖 2. Model Selected — Llama‑3.1 8b (via Ollama)

After evaluating several options (Mistral, Gemma, Phi, Qwen), and after consulting a few chatbots, the chosen model was:

Llama‑3.1 8b (instruct variant), served locally through Ollama.

Why Llama‑3.1 8b: It has nothing to do with my intelligence — for my requirements, ChatGPT, Gemini, and Claude all suggested this model, so I went with it. I did, however, give them my requirements before asking for the recommendation.

💻 3. Installing the Model on the Laptop

The installation process was initially a bit tricky, as I was not aware of some commands and syntax — and again the chatbots came to my help. Now that I know them, it looks incredibly simple. The steps I followed were:

  • Install Ollama — the local LLM runtime.
  • Pull the Llama‑3 model using a single command (ollama pull llama3).
  • Verify the model by running a test prompt (ollama run llama3).
  • Integrate it with Python using a lightweight wrapper.

Once installed, the model could be invoked instantly from the RAG pipeline without any internet connection.
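A minimal sketch of such a lightweight wrapper, using only the Python standard library and Ollama's local HTTP endpoint (`/api/generate` on port 11434). The function names here are illustrative, not the project's actual code; the low default temperature reflects the hallucination-reduction choice described below.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "llama3", temperature: float = 0.2) -> dict:
    """Assemble the request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a token stream
        "options": {"temperature": temperature},
    }

def ask_llm(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the locally running Ollama server and return its reply."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the Ollama daemon running, `ask_llm("What is a graha?")` returns the model's text with no internet connection involved.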

🧩 4. Important Engineering Considerations

Several practical decisions ensured the model behaved consistently and efficiently:

  • Quantization — using Q4/Q5 versions for speed without losing quality.
  • Temperature tuning — kept low to reduce hallucinations.
  • Max token limits — optimized for long classical context.
  • Prompt templates — enforced a scholarly, Vedic tone and citation rules.

These choices ensured the assistant remained reliable, classical, and grounded — even on modest hardware.

Selecting and installing the LLM was the foundation of the entire system. Once the model was ready, the next major step was constructing the RAG pipeline — ingestion, chunking, embeddings, vector storage, retrieval, fusion, and compression.

🧱 Building the RAG Pipeline

🛠️ How the RAG System Was Constructed Once the LLM Was Ready

Once the local LLM (Llama‑3 via Ollama) was installed, the next major milestone was building the entire Retrieval‑Augmented Generation (RAG) pipeline. This stage transformed the assistant from a generic language model into a scholarly, citation‑aware Jyotish system.

📚 1. Collecting Classical Jyotish Texts

The first step was gathering authoritative Jyotish sources. This was the easiest step for me as I already have a large collection of Jyotish classics and books in PDF format. I finally chose a total of 155 PDF books consisting of classics like BPHS, Saravali, Phaladeepika, Jaimini Sutras, Uttara Kalamrita, and curated notes. These PDFs became the raw knowledge base for the assistant.

Why this matters: The LLM alone does not "know" Jyotish. It needs classical text context.

📄 2. PDF Ingestion & Text Extraction

Using Python's pdfplumber library (built on top of pdfminer.six, but with cleaner APIs and more consistent text extraction), each PDF was parsed into clean text while preserving:

  • page numbers
  • source name
  • Sanskrit terms
  • paragraph boundaries

This metadata later enabled accurate citations like [BPHS, p. 112]. I read a lot about this topic on the internet before performing this step — that's how I know the terminology involved.
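A rough sketch of the per-page extraction step. This is illustrative only — the record fields mirror the metadata listed above, the function names are my own, and `pdfplumber` must be installed separately (`pip install pdfplumber`):

```python
def make_page_record(source: str, page_number: int, text: str) -> dict:
    """Package one page of extracted text with the metadata needed for citations."""
    return {
        "source": source,   # e.g. "BPHS" — what makes [BPHS, p. 112] possible later
        "page": page_number,
        "text": text.strip(),
    }

def extract_pages(pdf_path: str, source: str) -> list:
    """Extract every page of a PDF into metadata-tagged records."""
    import pdfplumber  # third-party library, imported lazily
    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""  # pages without a text layer return None
            if text.strip():
                records.append(make_page_record(source, i, text))
    return records
```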

🧩 3. Chunking the Text into Meaningful Units

As many of you may already know, chunking is the "hidden superpower" of any AI assistant. I initially tried libraries like LangChain's RecursiveCharacterTextSplitter, but it produced broken Sanskrit words, chunks that ignored page boundaries, inconsistent overlap, and loss of metadata. To avoid this, I researched and found guidance on building a custom chunker specifically for this type of application.

The chunking strategy used was:

  • Step 1 — Extract page‑level text using pdfplumber.
  • Step 2 — Normalize whitespace: remove double spaces, fix line breaks, preserve Sanskrit diacritics.
  • Step 3 — Split into paragraphs using blank lines and punctuation to detect natural boundaries.
  • Step 4 — Build chunks of ~250–350 tokens with an overlap of ~50 tokens.
  • Step 5 — Attach metadata: source, page, chunk_id, text, token_count.
  • Step 6 — Store in ChromaDB. Each chunk became a vector + metadata entry.

Final result: 77,059 chunks across all classical texts, with an average of ~280 tokens per chunk — often cited as a sweet spot for RAG with models like Llama 3.

Metadata stored per chunk: source, page, chunk_id, text.
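The chunking steps above can be sketched roughly as follows. This is a simplified illustration, not the project's chunker: token counts are approximated by whitespace-separated words, and the paragraph-boundary detection of Step 3 is omitted for brevity.

```python
import re

def chunk_pages(pages, target=300, overlap=50):
    """Build overlapping ~300-token chunks from page records, keeping metadata.

    `pages` is a list of dicts with "source", "page", and "text" keys.
    """
    chunks = []
    for page in pages:
        # Step 2 — normalize runs of spaces/tabs without touching Sanskrit diacritics
        text = re.sub(r"[ \t]+", " ", page["text"]).strip()
        words = text.split()
        step = max(target - overlap, 1)  # slide forward, re-including `overlap` words
        for start in range(0, len(words), step):
            piece = words[start:start + target]
            if not piece:
                break
            # Step 5 — attach the metadata that later powers citations
            chunks.append({
                "source": page["source"],
                "page": page["page"],
                "chunk_id": f'{page["source"]}_p{page["page"]}_c{start // step}',
                "text": " ".join(piece),
                "token_count": len(piece),
            })
    return chunks
```

A 700-word page with `target=300, overlap=50` yields three chunks of 300, 300, and 200 words, each starting 250 words after the previous one.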

🧬 4. Generating Embeddings for Each Chunk

Embedding is the process of converting text into a dense numerical vector — a list of floating‑point numbers — that captures the meaning of the text. Think of it as:

  • a mathematical fingerprint of meaning
  • a way to compare text semantically
  • the foundation of retrieval in RAG

Two texts with similar meaning → vectors close together. Two unrelated texts → vectors far apart. This is what allows the assistant to retrieve the right classical passages even when the user phrases the question differently.

I used a SentenceTransformer model "MiniLM‑L6‑v2" from the sentence-transformers library. This is a compact, fast, high‑quality embedding model ideal for local RAG. Each chunk was converted into a dense vector representing its meaning.

Why MiniLM: fast, accurate, lightweight — ideal for local RAG.

🗄️ 5. Storing Everything in ChromaDB

All chunks and embeddings were stored in ChromaDB, a local vector database. Each entry included: embedding vector, chunk text, source, page number, and chunk_id.

This created the searchable knowledge base that powers the assistant's classical grounding.
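A minimal sketch of the storage step, assuming ChromaDB's `PersistentClient` API (chromadb 0.4+). The helper that shapes the data is separated out so the batch format Chroma expects is visible; names are illustrative:

```python
def to_chroma_batch(chunks):
    """Split chunk dicts into the parallel lists Chroma's collection.add expects."""
    ids = [c["chunk_id"] for c in chunks]
    documents = [c["text"] for c in chunks]
    metadatas = [{"source": c["source"], "page": c["page"]} for c in chunks]
    return ids, documents, metadatas

def store_chunks(chunks, path="data/chroma", name="jyotish"):
    """Persist chunks into a local ChromaDB collection (requires chromadb)."""
    import chromadb  # third-party library, imported lazily
    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(name)
    ids, documents, metadatas = to_chroma_batch(chunks)
    collection.add(ids=ids, documents=documents, metadatas=metadatas)
    return collection
```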

⚖️ 6. Implementing Hybrid Retrieval (BM25 + Embeddings)

My initial iterations were producing some unexpected results due to the large amount of Hindi and Sanskrit text. Research suggested combining:

  • BM25 for lexical matches (e.g., "Neecha", "Drishti", "Shadbala").
  • Embedding similarity for semantic matches.

This hybrid approach ensures the assistant reliably retrieves the right classical passages.
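One simple way to merge the two signals is min-max normalisation followed by a weighted sum — a sketch of the idea, not necessarily the exact fusion scheme used in this project:

```python
def fuse_scores(bm25_scores, embed_scores, alpha=0.5):
    """Merge per-chunk BM25 and embedding-similarity scores into one ranking.

    Each argument maps chunk_id -> raw score; `alpha` weights the lexical side.
    Returns chunk_ids sorted from most to least relevant.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {k: (v - lo) / span for k, v in scores.items()}

    b, e = normalize(bm25_scores), normalize(embed_scores)
    # A chunk found by only one retriever gets 0 for the missing signal
    fused = {k: alpha * b.get(k, 0.0) + (1 - alpha) * e.get(k, 0.0)
             for k in set(b) | set(e)}
    return sorted(fused, key=fused.get, reverse=True)
```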

🎯 7. Cross‑Encoder Re‑Ranking

Using cross‑encoder/ms-marco-MiniLM-L-6-v2, all retrieved chunks were scored against the query. This step removed noise and sharpened relevance.

Outcome: Only the top‑k most relevant classical excerpts move forward.
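A sketch of this re-ranking step, assuming the `CrossEncoder` class from the `sentence-transformers` library; `rerank` and `top_k` are illustrative names:

```python
def top_k(chunks, scores, k):
    """Pair chunks with scores and return the k highest-scoring chunks."""
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def rerank(query, chunks, k=5):
    """Score (query, chunk) pairs with a cross-encoder and keep the top-k chunks.

    Requires sentence-transformers (pip install sentence-transformers).
    """
    from sentence_transformers import CrossEncoder  # third-party, imported lazily
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, c["text"]) for c in chunks])
    return top_k(chunks, scores, k)
```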

🔗 8. Fusion & Synthesis Pipeline

The ranked chunks were then processed through a multi‑stage fusion pipeline:

  • Source‑aware grouping (BPHS separate from Saravali, etc.)
  • Per‑source fusion (LLM summarizes each text independently)
  • Cross‑source synthesis (LLM compares and merges sources)
  • Context compression (LLM produces a concise, dense summary)

This is what gives the assistant its classical, scholarly, multi‑text perspective.

🧱 9. Building the Final Prompt for the LLM

The prompt builder assembled: user query, compressed context, allowed citations, persona rules, and glossary extraction instructions.

This ensured the LLM produced reasonable, classical, non‑fatalistic answers.

🖥️ 10. Integrating the RAG Pipeline with the UI

Finally, the entire RAG pipeline was connected to the Streamlit UI, enabling: answer panel, fused context viewer, compressed context viewer, Sanskrit glossary, past query search, and Beginner/Scholar answer styles.

Building the RAG pipeline is what transformed the assistant from a generic LLM into a classical Jyotish scholar capable of citing texts, comparing sources, and teaching with clarity.

🧠 System Architecture

🌒 Architecture Overview

High‑Level Flow
User Query
   ↓
Query Classifier
   ↓
Hybrid Retrieval (BM25 + Embeddings)
   ↓
Cross‑Encoder Re‑Ranking
   ↓
Source‑Aware Grouping
   ↓
Per‑Source Fusion
   ↓
Cross‑Source Fusion
   ↓
Context Compression
   ↓
Prompt Builder (with citations)
   ↓
LLM Answer
   ↓
Sanskrit Term Extraction
   ↓
Glossary Generation
   ↓
UI Rendering
🧩 Detailed Technical Breakdown — Stage‑By‑Stage Pipeline

🛠️ How the Entire System Works Internally

This section explains, in a technical yet beginner‑friendly way, how each stage of the Jyotish Assistant pipeline works — including tools, libraries, algorithms, data flow, prompt construction, and storage.

🧭 1. Query Classifier

Purpose: Identify what type of question the user is asking.

  • Uses an LLM call (Ollama Llama‑3) to classify the query.
  • Categories include: Graha, Bhava, Yoga, Dasha, Philosophy, Remedies, Misc.
  • Output is used to guide retrieval and later categorization.

Data Flow: User query → classifier → category label.
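A sketch of how such a classifier can be prompted, and how its reply can be parsed defensively (the prompt wording and function names are illustrative, not the project's actual prompt):

```python
CATEGORIES = ["Graha", "Bhava", "Yoga", "Dasha", "Philosophy", "Remedies", "Misc"]

def classification_prompt(query: str) -> str:
    """Ask the LLM to answer with exactly one category label."""
    return (
        "Classify this Jyotish question into exactly one category from: "
        + ", ".join(CATEGORIES)
        + ". Reply with the category name only.\n\nQuestion: " + query
    )

def parse_category(reply: str) -> str:
    """Map a possibly chatty LLM reply onto a known category, defaulting to Misc."""
    reply_lower = reply.lower()
    for cat in CATEGORIES:
        if cat.lower() in reply_lower:
            return cat
    return "Misc"
```

The defensive parse matters because even with "Reply with the category name only", small local models occasionally answer with a full sentence.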

🔍 2. Hybrid Retrieval (BM25 + Embeddings)

Libraries Used: rank_bm25, sentence-transformers, ChromaDB.

  • BM25 retrieves lexical matches (Sanskrit terms, sutras, shlokas).
  • Embeddings (MiniLM L6-v2) retrieve semantic matches.
  • ChromaDB stores vectors and metadata (source, page, chunk ID).

Data Flow: Query → BM25 + Embedding search → merged candidate chunks.

🎯 3. Cross‑Encoder Re‑Ranking

Model: cross-encoder/ms-marco-MiniLM-L-6-v2.

Each candidate chunk is paired with the query and scored. This step removes noise and ensures only the most relevant classical text excerpts are used.

Data Flow: Candidate chunks → cross‑encoder → top‑k ranked chunks.

📚 4. Source‑Aware Grouping

Retrieved chunks are grouped by source (BPHS, Saravali, Phaladeepika, Jaimini, etc.). This prevents mixing authors prematurely.

Data Flow: Ranked chunks → grouped by source.

🧬 5. Per‑Source Fusion

For each classical text, the LLM produces a fused summary of all relevant chunks from that text.

Data Flow: Grouped chunks → LLM → fused summaries per source.

🔗 6. Cross‑Source Synthesis

The fused summaries from all sources are combined into a single scholarly synthesis. The LLM is instructed to highlight agreements, differences, and unique contributions.

Data Flow: Per‑source summaries → LLM → unified synthesis.

🪶 7. Context Compression

The unified synthesis is compressed into a short, dense, technically accurate summary. This improves answer quality and reduces token usage.

Data Flow: Synthesis → LLM → compressed context.

🧱 8. Prompt Builder (with Citations)

Modules Used: rag_core.py, config.py.

  • Injects: user query, compressed context, allowed citations, persona rules.
  • Ensures the LLM cannot hallucinate page numbers.
  • Includes instructions for glossary extraction.

Data Flow: Context + citations + query → final prompt.
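A sketch of what such a prompt builder might look like. The persona wording and function signature are illustrative, not the project's actual template; the key idea is the explicit citation whitelist derived from chunk metadata:

```python
def build_prompt(query, compressed_context, citations, style="Scholar"):
    """Assemble the final prompt: persona, citation whitelist, context, and query.

    `citations` is a list of (source, page) tuples taken from chunk metadata,
    so the model can only cite pages that actually exist in the corpus.
    """
    allowed = "; ".join(f"[{src}, p. {page}]" for src, page in citations)
    persona = (
        "You are a classical, scholarly, non-fatalistic Jyotish assistant. "
        f"Answer in {style} style. Cite ONLY from the allowed citations list. "
        "After the answer, list the Sanskrit terms you used, for a glossary."
    )
    return (
        f"{persona}\n\n"
        f"Allowed citations: {allowed}\n\n"
        f"Context:\n{compressed_context}\n\n"
        f"Question: {query}"
    )
```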

🤖 9. LLM Answer Generation

Model: Llama‑3 (via Ollama).

The LLM produces a structured answer with: Core Principle, Classical References, Interpretive Meaning, Practical Insight, and Citations.

🕉️ 10. Sanskrit Glossary Extraction

The LLM extracts Sanskrit terms from the answer and generates concise definitions. This makes the assistant accessible to beginners.

🗂️ 11. Past Query Storage

Storage: local JSON file (past_queries.json).

  • Stores query, answer, timestamp, category, embedding vector.
  • Used for search and personalization.
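A minimal sketch of this storage scheme (the embedding-vector field is omitted for brevity, and the function name is illustrative):

```python
import json
import time
from pathlib import Path

def save_query(path, query, answer, category):
    """Append one interaction to the past-queries JSON file and return the history."""
    p = Path(path)
    history = json.loads(p.read_text(encoding="utf-8")) if p.exists() else []
    history.append({
        "query": query,
        "answer": answer,
        "category": category,
        "timestamp": time.time(),  # enables chronological display in the UI
    })
    p.write_text(json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8")
    return history
```

`ensure_ascii=False` keeps Sanskrit diacritics readable in the stored file instead of escaping them to `\uXXXX` sequences.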

🔎 12. Search Past Queries

Search Method: Hybrid (BM25 + embedding similarity).

This allows the assistant to retrieve earlier questions and build continuity across sessions.

🖥️ 13. UI Rendering

Framework: Streamlit.

  • Renders answer panel, fused context, compressed context, glossary.
  • Displays categories and past query history.
  • Implements Beginner/Scholar answer style toggle.
🧬 Model Choices

🌔 Models Used & Why

🧱 SentenceTransformer Embeddings

A MiniLM‑based SentenceTransformer model is used for embeddings. It offers a strong balance between speed and semantic quality, ideal for local or resource‑constrained environments.

📎 BM25 Lexical Retrieval

BM25 handles exact and near‑exact matches, especially important for Sanskrit and technical terms. It complements embeddings by capturing lexical signals that pure semantic models may miss.

🎯 Cross‑Encoder Re‑Ranker

A cross‑encoder model re‑ranks candidate chunks based on query‑document pairs. This significantly improves the quality of the top‑k context, reducing noise and sharpening relevance.

🤖 LLM via Ollama

The LLM is served locally via Ollama, providing control, privacy, and predictable behavior. The prompt is carefully engineered to enforce a classical, non‑fatalistic, scholarly persona.

🧱 Build Journey

🌕 Step‑By‑Step Build Process

📄 Step 1 — PDF Ingestion

Challenge: PDFs had inconsistent formatting, mixed Sanskrit and English, and varying page layouts.

Solution: Custom chunking with page‑level metadata (including source and page).

Why: Citations like [BPHS, p. 112] require reliable page‑level information.

⚖️ Step 2 — Hybrid Retrieval

Challenge: Embeddings alone sometimes missed classical terms or transliterations.

Solution: Combine BM25 (lexical) with embeddings (semantic).

Why: Hybrid retrieval captures both exact terms and conceptual similarity, crucial for Jyotish terminology.

🎯 Step 3 — Re‑Ranking

Challenge: Even hybrid retrieval returned some noisy or tangential chunks.

Solution: Use a cross‑encoder to re‑rank the retrieved chunks based on query‑document relevance.

Why: This improves the quality of the top‑k context, which directly affects answer quality.

📖 Step 4 — Source‑Aware Fusion

Challenge: Mixing content from different texts too early blurred authorial voices and caused subtle hallucinations.

Solution: Group chunks by source and fuse each text independently.

Why: This preserves the intent and style of each classical author.

🧬 Step 5 — Cross‑Source Synthesis

Challenge: Classical texts sometimes differ or emphasize different aspects.

Solution: A meta‑fusion step synthesizes per‑source summaries, explicitly instructed to respect differences and highlight unique contributions.

Why: This mirrors how a human scholar would compare texts.

🪶 Step 6 — Context Compression

Challenge: Fused context could still be long and redundant.

Solution: Use the LLM to compress the fused context into a concise, technically accurate summary.

Why: Shorter, focused context improves answer quality and latency.

📎 Step 7 — Citation‑Aware Answer Generation

Challenge: The LLM could hallucinate citations or invent page numbers.

Solution: Build a citation list from metadata and pass it explicitly into the final prompt, instructing the model to only use those citations.

Why: This enforces traceability and scholarly integrity.

🕉️ Step 8 — Sanskrit Glossary Extraction

Challenge: Sanskrit and technical terms could be opaque to beginners.

Solution: Extract terms from the final answer and generate a concise glossary section.

Why: This makes the assistant accessible to both beginners and intermediate students.

📚 Teaching Value

🌖 Lessons Learned

Retrieval quality matters more than model size.

Fusion architecture determines coherence and faithfulness.

Transparency (showing context, sources, and citations) builds trust.

Classical texts require careful prompting.

Minimalism in UI and behavior leads to clarity and focus.

This project doubles as a training blueprint for anyone who wants to learn how to build LLM‑powered assistants with real‑world constraints and real classical content.

🔮 Future Directions

🌗 Future Extensions

  • Prashna‑style interactive flows.
  • Confidence scoring for retrieval and answers.
  • Multi‑lingual support (e.g., Hindi + English).

Each of these can be layered on top of the existing architecture without breaking its core design principles.

🧘 Closing

🌘 Conclusion

This Jyotish Assistant is a living example of how modern AI techniques, readily available coding assistants, an eagerness to learn and build, and intentional design can come together.

On a deeper level, this Jyotish Assistant is a bridge between ancient wisdom and modern intelligence — built not as a gimmick, but as a thoughtful companion for serious seekers. Beyond that, it has certainly served its purpose as a demonstration use case for anyone aspiring to build assistants with LLMs and RAG pipelines.


Explore More Projects 🛠️

See the other products being built under the ArthRoute Studios banner, or reach out to discuss this project.