Machine Learning Book Recommender (Portfolio)

An end‑to‑end project exploring data cleaning, recommendation modeling, and deployment.

Project Note: This app uses the Book‑Crossing dataset as a starting point. The raw data contained significant noise and inconsistencies. I implemented extensive cleaning and enrichment (e.g., standardized metadata, subject mapping), but a small number of errors may remain. The app exists to showcase the ML pipeline and engineering, not to provide perfect catalog data.

What This Demonstrates

How to Use the App

  1. Create Your Profile: Sign up and select your favorite genres. These are used to generate your first set of recommendations.
  2. Explore Initial Recommendations: Until you've rated at least 10 books, suggestions are based mostly on your favorite genres. You can update your favorite genres anytime from your profile.
  3. Browse Popular Books: Visit the Search page without entering a title to see books ranked by popularity (using a Bayesian formula). This is a great way to find books you’ve already read and rate them quickly.
  4. Rate at Least 10 Books: Once you’ve rated 10 or more books, the system unlocks fully personalized recommendations based on your own preferences.
  5. Keep Exploring and Rating: The more books you rate, the better your recommendations become. You can also find similar books on any book's detail page.
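
The popularity ranking in step 3 and the 10-rating threshold in steps 2 and 4 can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the exact Bayesian formula is not given here, so the sketch assumes a common "Bayesian average" that shrinks each book's mean rating toward the global mean. The names BookStats, bayesian_score, rank_by_popularity, recommendation_mode, and the prior_weight value are all hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the steps above; the constant name is hypothetical.
MIN_RATINGS_FOR_PERSONALIZATION = 10


@dataclass
class BookStats:
    title: str
    rating_count: int   # number of ratings the book has received
    rating_mean: float  # mean of those ratings


def bayesian_score(book: BookStats, global_mean: float, prior_weight: float) -> float:
    """Assumed formula: (v * R + m * C) / (v + m).

    v = rating_count, R = rating_mean, C = global_mean,
    m = prior_weight (a hypothetical tuning knob).
    """
    v, r = book.rating_count, book.rating_mean
    return (v * r + prior_weight * global_mean) / (v + prior_weight)


def rank_by_popularity(books: list[BookStats], prior_weight: float = 25.0) -> list[BookStats]:
    """Sort books by the assumed Bayesian score, highest first."""
    total = sum(b.rating_count for b in books)
    if total == 0:
        return list(books)  # no ratings at all: keep the input order
    # Global mean rating, weighted by how many ratings each book has.
    global_mean = sum(b.rating_count * b.rating_mean for b in books) / total
    return sorted(
        books,
        key=lambda b: bayesian_score(b, global_mean, prior_weight),
        reverse=True,
    )


def recommendation_mode(num_rated: int) -> str:
    """Pick the strategy described in steps 2 and 4."""
    if num_rated >= MIN_RATINGS_FOR_PERSONALIZATION:
        return "personalized"
    return "genre_based"


if __name__ == "__main__":
    catalog = [
        BookStats("A", rating_count=2, rating_mean=10.0),
        BookStats("B", rating_count=500, rating_mean=8.0),
        BookStats("C", rating_count=100, rating_mean=5.0),
    ]
    for book in rank_by_popularity(catalog):
        print(book.title)
    print(recommendation_mode(9), recommendation_mode(10))
```

Under this assumed formula, a book with two perfect ratings does not outrank a book with hundreds of good ones, because the small sample is pulled toward the global mean. A larger prior_weight means a book needs more ratings before its own average dominates the score.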

Chat with the Librarian (Demo)

The chatbot is a demo assistant with two capabilities: it can browse the web with a limited set of tools to suggest books, and it can read internal documentation about this website to explain features and onboard new users.

What it does now
  • Uses a small set of web tools to find book suggestions and quick references.
  • Answers onboarding questions using internal docs about this site (navigation, features, tips).
  • Makes profile-aware suggestions using your interactions and favorite subjects.
Current demo limits
  • No access to internal ML tools yet (ALS, FAISS, subject embeddings).
  • Basic rate limits may apply (portfolio demo).
Roadmap (planned)
  • Direct access to internal ML engines (ALS, FAISS, attention-pooled embeddings).
  • A true RAG-style agent that retrieves from the site’s catalog and models, then explains the picks.

Behind the Scenes