A simple, classical, scholarly Vedic Astrology companion, grounded in 155 classical Jyotish texts, powered by a local LLM and a multi-stage RAG pipeline. Free from superstition, low on hallucination, and anchored to the classical texts in its reasoning.
When I started this project, the intention was simple yet profound: I wanted to build an LLM/RAG project to familiarise myself with all the steps, processes, and terminology involved in such projects. My other intention was to create a training and familiarisation use case for my team members, and at the end I would have a usable product. I am currently studying Astrology (Jyotish), and building a Jyotish Assistant under this project seemed like a very logical choice. That is how I ended up creating a classical, scholarly Jyotish companion, grounded in specific Jyotish classics to provide ancient wisdom with modern clarity, free from superstition and transparent in its reasoning.
This Jyotish Assistant is not a tool for prediction. It is a guide, designed for seekers, students, and practitioners who value depth over gimmick.
This is not a generic chatbot. It is a carefully architected system where each stage contributes to reliability, interpretability, and scholarly integrity.
Anyone interested in a technical deep dive into how this assistant was developed, and the steps I followed, can click the Technical Deep Dive button above, next to the Feature & UI button right above this section.
A clean, minimal input box invites natural questions like: "Explain Mars in the 7th house" or "What is Neecha Bhanga Raja Yoga?". The design is intentionally kept uncluttered so as not to overwhelm users and to keep attention on the conversation.
The interface supports Answer Style modes that adapt to the tone and depth of responses:
Both modes remain non‑fatalistic and text‑grounded; the difference is in how the insight is narrated, not in the underlying logic.
The assistant responds in a calm, classical tone with structured sections such as Core Principle, Classical References, Interpretive Meaning, and Practical Insight. Citations like [BPHS, p. 112] appear inline, and Sanskrit terms are explained clearly.
This panel shows how multiple classical texts were merged into a single coherent context. It reveals the "reasoning" behind the answer, building trust through transparency.
A distilled, laser‑focused summary of the most relevant excerpts. This is the essence of the classical material that the assistant uses to answer the question.
Each classical text (BPHS, Saravali, Phaladeepika, Jaimini Sutras, etc.) gets its own fused summary. This preserves the voice and philosophy of each author and highlights where they agree, differ, or add unique nuance.
The assistant automatically extracts Sanskrit and technical Jyotish terms from the answer and generates a concise glossary. Terms like Neecha, Drishti, or Shadbala are defined in 1–2 classical, non‑fatalistic sentences.
A built‑in search capability lets the user revisit past queries and their answers. Built to quickly look up earlier explorations like "Saturn in the 8th" or "Argala in Jaimini" without re-asking.
This turns the assistant into a personal Jyotish notebook, where the journey of questions becomes a searchable knowledge trail.
Past queries are saved under categories such as Grahas, Bhavas, Yogas, Dashas, or Remedies. Over time, the system can observe patterns in queries and study habits.
This enables a gentle form of learning from history — not by changing classical meanings, but by prioritizing explanations, examples, and cross‑links that match evolving interests and depth.
A visual snapshot of the primary interface — query input, answer panel, and context viewers arranged in a minimal, focused layout. You can also notice the option available in the UI to select answer type from Beginner or Scholar mode — same grounding, different narrative depth.
This screenshot showcases how the assistant presents the final answer. The length and depth of the answer depend on the chosen mode (Beginner vs Scholar). The assistant also provides Focused, Compressed, Source Summaries, and Sanskrit Glossary views.
This screenshot showcases the Glossary feature, which is very useful due to the heavy usage of Sanskrit terms in ancient classics and modern notes.
This screenshot showcases the Past Query feature, which stores and retrieves past queries. The assistant also learns from past queries, making every query a stepping stone.
Combines BM25 (lexical) and embeddings (semantic) to handle both Sanskrit terms and conceptual similarity. This ensures that classical terms and slightly nuanced questions are both captured effectively.
A cross‑encoder model re‑ranks the retrieved chunks, improving the quality of the top‑k context. This reduces noise and sharpens the relevance of the data used for answering.
Each classical text is fused independently before cross‑source synthesis. This preserves authorial voice and avoids mixing interpretations prematurely.
A meta‑fusion step synthesizes the fused summaries from multiple texts, highlighting agreements, differences, and unique contributions — like a scholarly commentary.
The fused context is compressed into a concise, technically accurate summary. This improves answer quality and latency while keeping the reasoning grounded in the original texts.
The final prompt includes an explicit list of allowed citations (e.g., BPHS, p. 112), ensuring that the assistant never hallucinates references and only cites what truly exists in the corpus.
The assistant explains the core principle (fire element in relationships), classical references, interpretive meaning, conditions (dignity, aspects), and practical insights — with citations and glossary entries for key terms.
The assistant defines the yoga, lists classical rules for cancellation of debility, explains conditions, and offers structured interpretation — grounded in classical sources.
The assistant responds in a non‑fatalistic, practical way, avoiding superstition and fear. It focuses on mindset, discipline, and constructive engagement with Saturnian themes, rather than deterministic predictions.
This assistant is built on five core principles:
It reflects the ethos of ArthRoute Studios: engineering precision, philosophical depth, and a deep respect for both tradition and the present moment.
A quick navigation guide to all major engineering components of the Jyotish Assistant. Each item corresponds to a major section in the Technical Deep Dive.
project/
│
├── app/
│   ├── rag_core.py     ← main RAG engine
│   ├── config.py       ← model + DB settings
│   ├── ingestion.py    ← PDF → chunks → embeddings
│   ├── utils.py        ← helpers
│   └── __init__.py
│
├── data/
│   └── chroma/         ← vector DB
│
├── ui_app.py           ← Streamlit UI
└── requirements.txt
Before building the RAG pipeline, I needed a local LLM that I could install and run from my laptop and which could serve as the "reasoning engine" of the assistant. This section explains how I went about selecting the LLM model, what criteria were considered, and how it was installed on the laptop for offline, private use.
The model needed to satisfy several constraints specific to a classical Jyotish assistant:
These constraints ruled out extremely large models and pointed toward optimized, instruction‑tuned models that balance quality and performance.
After evaluating several options (Mistral, Gemma, Phi, Qwen), and also consulting a few chatbots, the chosen model was:
Llama‑3.1 8b (instruct variant), served locally through Ollama.
Why Llama‑3.1 8b: Well, it has nothing to do with my own brilliance. For my stated requirements, ChatGPT, Gemini, and Claude all suggested this model, so I went with it. I did, of course, provide them my requirements before accepting this recommendation.
The installation process was initially a bit tricky, as I was not aware of some commands and syntax, and again the chatbots came to my rescue. Now that I know it, it looks incredibly simple. The steps I followed were:
1. Install Ollama on the laptop.
2. Pull the model locally (ollama pull llama3).
3. Run and verify it from the terminal (ollama run llama3).

Once installed, the model could be invoked instantly from the RAG pipeline without any internet connection.
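Once the model is running, Ollama serves a local HTTP API (by default at http://localhost:11434), which is how a pipeline can invoke it without any cloud dependency. A minimal sketch, assuming the default endpoint; the prompt text and temperature here are illustrative, not the project's actual values:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt: str, model: str = "llama3") -> dict:
    """Build the JSON body for a non-streaming generation request."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one complete JSON response
        "options": {"temperature": 0.2},  # low temperature for consistent, grounded answers
    }

def ask_ollama(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local Ollama server and return the answer text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The same call works identically offline, which is what makes the fully local setup possible.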
Several practical decisions ensured the model behaved consistently and efficiently:
These choices ensured the assistant remained reliable, classical, and grounded — even on modest hardware.
Selecting and installing the LLM was the foundation of the entire system. Once the model was ready, the next major step was constructing the RAG pipeline — ingestion, chunking, embeddings, vector storage, retrieval, fusion, and compression.
Once the local LLM (Llama‑3 via Ollama) was installed, the next major milestone was building the entire Retrieval‑Augmented Generation (RAG) pipeline. This stage transformed the assistant from a generic language model into a scholarly, citation‑aware Jyotish system.
The first step was gathering authoritative Jyotish sources. This was the easiest step for me as I already have a large collection of Jyotish classics and books in PDF format. I finally chose a total of 155 PDF books consisting of classics like BPHS, Saravali, Phaladeepika, Jaimini Sutras, Uttara Kalamrita, and curated notes. These PDFs became the raw knowledge base for the assistant.
Why this matters: The LLM alone does not "know" Jyotish. It needs classical text context.
Using Python's pdfplumber library (built on top of pdfminer.six, but with cleaner APIs and more consistent text extraction), each PDF was parsed into clean text while preserving:
This metadata later enabled accurate citations like [BPHS, p. 112]. I read a lot about this topic on the internet before actually performing this step; that is how I learned the terminology involved.
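The extraction step above can be sketched as follows. The `extract_pdf` function assumes pdfplumber is installed; `make_page_record` is a hypothetical helper showing the metadata shape that later powers citations (field names are illustrative):

```python
def make_page_record(source: str, page_number: int, text: str) -> dict:
    """Wrap one page of extracted text with the metadata needed for citations."""
    return {"source": source, "page": page_number, "text": (text or "").strip()}

def extract_pdf(path: str, source_name: str) -> list[dict]:
    """Extract every page of a PDF into citation-ready records."""
    import pdfplumber  # imported here so the helper above stays dependency-free
    records = []
    with pdfplumber.open(path) as pdf:
        for i, page in enumerate(pdf.pages, start=1):
            records.append(make_page_record(source_name, i, page.extract_text() or ""))
    return records
```

Keeping source and page attached to every page of text from the very first step is what makes [BPHS, p. 112]-style citations trivially verifiable later.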
As many of you already know, chunking is the "hidden superpower" of any AI assistant. I initially tried libraries like LangChain's RecursiveCharacterTextSplitter, but it produced broken Sanskrit words, chunks that ignored page boundaries, inconsistent overlap, and loss of metadata. To avoid this, I researched and found guidance on building a custom chunker specifically for this type of application.
The chunking strategy used was:
Final result: 77,059 chunks across all classical texts with an average of ~280 tokens per chunk — widely cited as the sweet spot for RAG with models like Llama 3.
Metadata stored per chunk: source, page, chunk_id, text.
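A simplified sketch of what a page-aware chunker with overlap might look like, keeping exactly the metadata fields listed above. The word counts and overlap size are illustrative stand-ins (the real chunker also handles Sanskrit word boundaries and targets roughly ~280 tokens per chunk):

```python
def chunk_page(text: str, source: str, page: int,
               max_words: int = 220, overlap: int = 30) -> list[dict]:
    """Split one page into overlapping chunks, never crossing the page boundary,
    so every chunk keeps an exact (source, page) pair for citations."""
    words = text.split()
    chunks, start, idx = [], 0, 0
    while start < len(words):
        window = words[start:start + max_words]
        chunks.append({
            "source": source,
            "page": page,
            "chunk_id": f"{source}-p{page}-c{idx}",
            "text": " ".join(window),
        })
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # slide forward, repeating `overlap` words
        idx += 1
    return chunks
```

Because the loop runs per page, no chunk can ever span a page break, which is the property the off-the-shelf splitters failed to guarantee.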
Embedding is the process of converting text into a dense numerical vector — a list of floating‑point numbers — that captures the meaning of the text. Think of it as:
Two texts with similar meaning → vectors close together. Two unrelated texts → vectors far apart. This is what allows the assistant to retrieve the right classical passages even when the user phrases the question differently.
I used a SentenceTransformer model "MiniLM‑L6‑v2" from the sentence-transformers library. This is a compact, fast, high‑quality embedding model ideal for local RAG. Each chunk was converted into a dense vector representing its meaning.
Why MiniLM: fast, accurate, lightweight — ideal for local RAG.
All chunks and embeddings were stored in ChromaDB, a local vector database. Each entry included: embedding vector, chunk text, source, page number, and chunk_id.
This created the searchable knowledge base that powers the assistant's classical grounding.
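The indexing step can be sketched as below. `to_chroma_batch` is a hypothetical helper showing the parallel-list shape that ChromaDB's `collection.add` expects; `index_chunks` assumes the chromadb library is installed and uses its persistent on-disk client:

```python
def to_chroma_batch(chunks: list[dict], embeddings: list[list[float]]) -> dict:
    """Reshape chunk records into the parallel lists ChromaDB's add() expects."""
    return {
        "ids": [c["chunk_id"] for c in chunks],
        "documents": [c["text"] for c in chunks],
        "embeddings": embeddings,
        "metadatas": [{"source": c["source"], "page": c["page"]} for c in chunks],
    }

def index_chunks(chunks: list[dict], embeddings: list[list[float]]) -> None:
    """Persist chunks and their vectors into a local Chroma collection."""
    import chromadb  # assumed installed; PersistentClient stores data on disk
    client = chromadb.PersistentClient(path="data/chroma")
    collection = client.get_or_create_collection("jyotish")
    collection.add(**to_chroma_batch(chunks, embeddings))
```

Storing source and page in the metadatas list is what lets every retrieved chunk carry its citation all the way to the final answer.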
My initial iterations were producing some unexpected results due to the large amount of Hindi and Sanskrit text. Research suggested combining BM25 (lexical) retrieval with embedding-based (semantic) retrieval.
This hybrid approach ensured the assistant reliably retrieves the right classical passages.
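One common way to merge the two ranked result lists is Reciprocal Rank Fusion (RRF). This is a generic sketch of the technique, not a claim about the exact merging formula the assistant uses:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids: each list contributes
    1 / (k + rank) per id, so items ranked high by either method rise."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits     = ["c3", "c1", "c7"]   # lexical matches (exact Sanskrit terms)
semantic_hits = ["c1", "c9", "c3"]   # embedding matches (conceptual similarity)
merged = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

A chunk that appears in both lists gets score contributions from both, so passages that match the exact Sanskrit term and the concept naturally float to the top.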
Using cross‑encoder/ms-marco-MiniLM-L-6-v2, all retrieved chunks were scored against the query. This step removed noise and sharpened relevance.
Outcome: Only the top‑k most relevant classical excerpts move forward.
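The re-ranking step can be sketched generically as below. Here `score_fn` is a stand-in for the cross-encoder's scoring call (with sentence-transformers installed it would be the `predict` method of a `CrossEncoder` loaded with `cross-encoder/ms-marco-MiniLM-L-6-v2`):

```python
from typing import Callable

def rerank(query: str, chunks: list[dict],
           score_fn: Callable, top_k: int = 5) -> list[dict]:
    """Score every (query, chunk) pair and keep only the top_k most relevant."""
    pairs = [(query, c["text"]) for c in chunks]
    scores = score_fn(pairs)  # one relevance score per pair
    ranked = sorted(zip(scores, chunks), key=lambda sc: sc[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```

Because the cross-encoder sees the query and chunk together (rather than as separate vectors), it can judge relevance far more precisely than the first-stage retrieval.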
The ranked chunks were then processed through a multi‑stage fusion pipeline:
This is what gives the assistant its classical, scholarly, multi‑text perspective.
The prompt builder assembled: user query, compressed context, allowed citations, persona rules, and glossary extraction instructions.
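A sketch of how such a prompt might be assembled; the section wording and persona text below are illustrative, not the project's actual prompt:

```python
def build_prompt(query: str, context: str, allowed_citations: list[str]) -> str:
    """Assemble the final prompt, pinning the model to an explicit citation list."""
    citations = "\n".join(f"- {c}" for c in allowed_citations)
    return (
        "You are a classical, scholarly, non-fatalistic Jyotish assistant.\n"
        "Answer ONLY from the context below. Cite ONLY from the allowed list.\n\n"
        f"Allowed citations:\n{citations}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Structure the answer as: Core Principle, Classical References, "
        "Interpretive Meaning, Practical Insight. End with a Sanskrit glossary."
    )
```

Passing the citation allow-list as literal text, rather than trusting the model's memory, is what keeps references traceable to the corpus.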
This ensured the LLM produced reasonable, classical, non‑fatalistic answers.
Finally, the entire RAG pipeline was connected to the Streamlit UI, enabling: answer panel, fused context viewer, compressed context viewer, Sanskrit glossary, past query search, and Beginner/Scholar answer styles.
Building the RAG pipeline is what transformed the assistant from a generic LLM into a classical Jyotish scholar capable of citing texts, comparing sources, and teaching with clarity.
User Query
   ↓
Query Classifier
   ↓
Hybrid Retrieval (BM25 + Embeddings)
   ↓
Cross‑Encoder Re‑Ranking
   ↓
Source‑Aware Grouping
   ↓
Per‑Source Fusion
   ↓
Cross‑Source Fusion
   ↓
Context Compression
   ↓
Prompt Builder (with citations)
   ↓
LLM Answer
   ↓
Sanskrit Term Extraction
   ↓
Glossary Generation
   ↓
UI Rendering
This section explains, in a technical yet beginner‑friendly way, how each stage of the Jyotish Assistant pipeline works — including tools, libraries, algorithms, data flow, prompt construction, and storage.
Purpose: Identify what type of question the user is asking.
Data Flow: User query → classifier → category label.
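A minimal keyword-based sketch of such a classifier; the category keywords below are a small invented subset for illustration, not the project's actual rules:

```python
CATEGORY_KEYWORDS = {
    "Grahas":   ["mars", "saturn", "jupiter", "graha", "planet"],
    "Bhavas":   ["house", "bhava", "7th", "8th"],
    "Yogas":    ["yoga", "raja", "neecha bhanga"],
    "Dashas":   ["dasha", "antardasha", "period"],
    "Remedies": ["remedies", "remedy", "mantra"],
}

def classify_query(query: str) -> str:
    """Return the category whose keywords match the query most often."""
    q = query.lower()
    best, best_hits = "General", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for kw in keywords if kw in q)
        if hits > best_hits:
            best, best_hits = category, hits
    return best
```

A real classifier could equally be an embedding-similarity or LLM call; the point is only that the category label then steers retrieval and storage downstream.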
Libraries Used: rank_bm25, sentence-transformers, ChromaDB.
Data Flow: Query → BM25 + Embedding search → merged candidate chunks.
Model: cross-encoder/ms-marco-MiniLM-L-6-v2.
Each candidate chunk is paired with the query and scored. This step removes noise and ensures only the most relevant classical text excerpts are used.
Data Flow: Candidate chunks → cross‑encoder → top‑k ranked chunks.
Retrieved chunks are grouped by source (BPHS, Saravali, Phaladeepika, Jaimini, etc.). This prevents mixing authors prematurely.
Data Flow: Ranked chunks → grouped by source.
For each classical text, the LLM produces a fused summary of all relevant chunks from that text.
Data Flow: Grouped chunks → LLM → fused summaries per source.
The fused summaries from all sources are combined into a single scholarly synthesis. The LLM is instructed to highlight agreements, differences, and unique contributions.
Data Flow: Per‑source summaries → LLM → unified synthesis.
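A sketch of how the per-source summaries might be packed into the meta-fusion instruction; the wording and section markers are illustrative, not the project's actual prompt:

```python
def build_fusion_prompt(query: str, per_source: dict[str, str]) -> str:
    """Combine per-source summaries into one synthesis instruction,
    keeping each classical text clearly attributed."""
    sections = "\n\n".join(
        f"### {source}\n{summary}" for source, summary in per_source.items()
    )
    return (
        "Synthesize the following per-source summaries into one scholarly answer.\n"
        "Explicitly note where the texts agree, where they differ, and what each "
        "uniquely contributes. Do not merge attributions.\n\n"
        f"{sections}\n\nQuestion: {query}"
    )
```

Keeping each summary under its own source header is what lets the model (and the reader) see exactly which text said what.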
The unified synthesis is compressed into a short, dense, technically accurate summary. This improves answer quality and reduces token usage.
Data Flow: Synthesis → LLM → compressed context.
Modules Used: rag_core.py, config.py.
Data Flow: Context + citations + query → final prompt.
Model: Llama‑3 (via Ollama).
The LLM produces a structured answer with: Core Principle, Classical References, Interpretive Meaning, Practical Insight, and Citations.
The LLM extracts Sanskrit terms from the answer and generates concise definitions. This makes the assistant accessible to beginners.
Storage: local JSON file (past_queries.json).
Search Method: Hybrid (BM25 + embedding similarity).
This allows the assistant to retrieve earlier questions and build continuity across sessions.
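A simplified sketch of the storage and lookup, using plain word overlap in place of the real BM25 + embedding scoring (the file name matches the one mentioned above; the record fields are illustrative):

```python
import json
from pathlib import Path

STORE = Path("past_queries.json")

def save_query(query: str, answer: str, category: str) -> None:
    """Append one Q&A record to the local JSON store."""
    records = json.loads(STORE.read_text()) if STORE.exists() else []
    records.append({"query": query, "answer": answer, "category": category})
    STORE.write_text(json.dumps(records, indent=2))

def search_past(query: str, records: list[dict], top_k: int = 3) -> list[dict]:
    """Rank stored queries by word overlap with the new query
    (a stand-in for the real BM25 + embedding similarity)."""
    q_words = set(query.lower().split())
    scored = sorted(
        records,
        key=lambda r: len(q_words & set(r["query"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```

Because the store is a plain local JSON file, the "personal Jyotish notebook" stays private and portable alongside the rest of the offline setup.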
Framework: Streamlit.
A MiniLM‑based SentenceTransformer model is used for embeddings. It offers a strong balance between speed and semantic quality, ideal for local or resource‑constrained environments.
BM25 handles exact and near‑exact matches, especially important for Sanskrit and technical terms. It complements embeddings by capturing lexical signals that pure semantic models may miss.
A cross‑encoder model re‑ranks candidate chunks based on query‑document pairs. This significantly improves the quality of the top‑k context, reducing noise and sharpening relevance.
The LLM is served locally via Ollama, providing control, privacy, and predictable behavior. The prompt is carefully engineered to enforce a classical, non‑fatalistic, scholarly persona.
Challenge: PDFs had inconsistent formatting, mixed Sanskrit and English, and varying page layouts.
Solution: Custom chunking with page‑level metadata (including source and page).
Why: Citations like [BPHS, p. 112] require reliable page‑level information.
Challenge: Embeddings alone sometimes missed classical terms or transliterations.
Solution: Combine BM25 (lexical) with embeddings (semantic).
Why: Hybrid retrieval captures both exact terms and conceptual similarity, crucial for Jyotish terminology.
Challenge: Even hybrid retrieval returned some noisy or tangential chunks.
Solution: Use a cross‑encoder to re‑rank the retrieved chunks based on query‑document relevance.
Why: This improves the quality of the top‑k context, which directly affects answer quality.
Challenge: Mixing content from different texts too early blurred authorial voices and caused subtle hallucinations.
Solution: Group chunks by source and fuse each text independently.
Why: This preserves the intent and style of each classical author.
Challenge: Classical texts sometimes differ or emphasize different aspects.
Solution: A meta‑fusion step synthesizes per‑source summaries, explicitly instructed to respect differences and highlight unique contributions.
Why: This mirrors how a human scholar would compare texts.
Challenge: Fused context could still be long and redundant.
Solution: Use the LLM to compress the fused context into a concise, technically accurate summary.
Why: Shorter, focused context improves answer quality and latency.
Challenge: The LLM could hallucinate citations or invent page numbers.
Solution: Build a citation list from metadata and pass it explicitly into the final prompt, instructing the model to only use those citations.
Why: This enforces traceability and scholarly integrity.
Challenge: Sanskrit and technical terms could be opaque to beginners.
Solution: Extract terms from the final answer and generate a concise glossary section.
Why: This makes the assistant accessible to both beginners and intermediate students.
Retrieval quality matters more than model size.
Fusion architecture determines coherence and faithfulness.
Transparency (showing context, sources, and citations) builds trust.
Classical texts require careful prompting.
Minimalism in UI and behavior leads to clarity and focus.
This project doubles as a training blueprint for anyone who wants to learn how to build LLM‑powered assistants with real‑world constraints and real classical content.
Each of these can be layered on top of the existing architecture without breaking its core design principles.
This Jyotish assistant is a living example of how modern AI techniques, available coding assistants, eagerness to learn & build and intentional design can come together.
On a deeper level, this Jyotish Assistant is a bridge between ancient wisdom and modern intelligence — built not as a gimmick, but as a thoughtful companion for serious seekers. Apart from that it definitely served its purpose as a demonstrator use case for anyone aspiring to build assistants with LLMs and RAG pipelines.
See the other products being built under the ArthRoute Studios banner, or reach out to discuss this project.