FeedMe: Ingredient Prediction

GitHub Repository

Slide Deck

When you visit a mom-and-pop restaurant or another small establishment, dietary restrictions can make ordering food confusing, and perhaps even stressful. I've experienced this myself as a vegetarian. To help mitigate this issue, Jacqueline Zhang and I built a set of neural networks to answer a simple question:

By using a dish’s name and description, which may (but won’t always) contain hints to the ingredients, can we predict if that dish contains ingredients that may violate dietary restrictions?

Audience

As I mentioned earlier, I am vegetarian. However, there are many other groups who can also benefit from this tool:

  • Vegans
  • People with food allergies
  • Pescatarians
  • Even dogs and other pets! Did you know dogs can't eat grapes?

Data

Our data came from an organization called PepperridgeAPI. The files were hosted on Kaggle but were later delisted and removed. The dataset contained several high-quality CSV files describing restaurants in Boston, MA. From this set, we used three tables describing different dishes, their menu descriptions, and their ingredients.

Preprocessing

While the dataset was already quite clean, we still did some preprocessing. We removed rows with empty values in required fields and subsampled the data to 20,000 records. After that, we removed stop words such as "the", "is", and "at". Then we stemmed the remaining words to shrink the space of inputs to the model (for example, "prepared" and "preparation" both stem to "prepar"). Finally, we created an indicator column for each ingredient we wanted to predict: peanuts, eggs, sesame, fish, shellfish, soy, and meat. This list combines common allergens with other dietary restrictions, and can easily be expanded later on.
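The cleaning steps above can be sketched in plain Python. This is an illustrative, dependency-free sketch, not code taken from the project: the stop-word list is abbreviated, the suffix stripper is a toy stand-in for a real stemmer such as Porter's, and all helper names are hypothetical. `indicator_row` also assumes each dish's ingredient list already uses the seven category names; mapping raw ingredients (say, shrimp to shellfish) would need an extra lookup table.

```python
import random
import re

# Abbreviated, hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "is", "at", "a", "an", "and", "of", "with", "in", "on"}

# The seven target ingredients named above.
INGREDIENTS = ["peanuts", "eggs", "sesame", "fish", "shellfish", "soy", "meat"]

# Toy suffix list, checked in order. This is NOT the Porter stemmer.
SUFFIXES = ["ation", "ing", "ed", "es", "s"]


def drop_incomplete(rows, required):
    """Keep only records whose required fields are all non-empty."""
    return [r for r in rows if all(r.get(field) for field in required)]


def subsample(rows, n=20000, seed=0):
    """Reproducibly sample n records, or return all of them if there are fewer."""
    if len(rows) <= n:
        return list(rows)
    return random.Random(seed).sample(rows, n)


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def indicator_row(dish_ingredients):
    """Return a 0/1 column per target ingredient for one dish."""
    present = {name.lower() for name in dish_ingredients}
    return {name: int(name in present) for name in INGREDIENTS}


print(stem("prepared"), stem("preparation"))  # prepar prepar
print(clean_text("The chicken is prepared at the grill"))  # ['chicken', 'prepar', 'grill']
```

With pandas, `DataFrame.dropna(subset=...)` and `DataFrame.sample(n=20000, random_state=0)` cover the first two steps in one line each.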

To encode the textual data (menu names, menu descriptions, and menu sections) as numerical vectors of fixed length, we used a weighting scheme called term frequency-inverse document frequency (tf-idf).
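To make the encoding concrete, here is a small tf-idf sketch in plain Python. It illustrates the technique rather than the project's exact configuration: it uses tf(t, d) = (count of t in d) / (length of d) and idf(t) = ln(N / df(t)), where N is the number of training dishes and df(t) is the number of dishes containing t. It also assumes the three text fields are concatenated into one token list per dish; the sample tokens are made up.

```python
import math
from collections import Counter


def fit_idf(documents):
    """Build the vocabulary and idf(t) = ln(N / df(t)) from tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term at most once per document
    vocab = sorted(doc_freq)
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Map one tokenized document to a vector with one slot per vocabulary term."""
    counts = Counter(doc)
    length = len(doc) or 1  # avoid dividing by zero on an empty document
    # Terms outside the vocabulary get no slot, so every vector has
    # exactly len(vocab) entries regardless of the input.
    return [counts[term] / length * idf[term] for term in vocab]


# Hypothetical, already-cleaned dishes (name, description, and section tokens concatenated).
train_docs = [
    ["grill", "chicken", "main"],
    ["grill", "tofu", "main"],
    ["fri", "egg", "breakfast"],
]
vocab, idf = fit_idf(train_docs)
print(vocab)
print([round(v, 3) for v in tfidf_vector(["grill", "chicken", "pizza"], vocab, idf)])
```

Library implementations differ in the details. scikit-learn's `TfidfVectorizer`, for example, uses raw counts for tf, a smoothed idf of ln((1 + N) / (1 + df(t))) + 1, and L2-normalizes each row by default, so its numbers will not match this sketch exactly.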

Prediction and Results

We tried a variety of approaches to predict the ingredients of the dishes; they are summarized in the Slide Deck.

In the end, a simple set of neural networks, one network per ingredient, proved to be the most effective. Below are the validation metrics:
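The one-network-per-ingredient layout can be sketched as follows. The architecture is not described here, so this hypothetical example uses the smallest possible network, a single sigmoid unit (equivalent to logistic regression) trained by full-batch gradient descent on binary cross-entropy, purely to show the structure: each ingredient gets its own independently trained binary classifier over the same feature vectors. The features and labels are invented toy data, and every name is illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SigmoidUnit:
    """A one-neuron network: weighted sum of features passed through a sigmoid."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
        self.b = 0.0

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X, y, lr=0.5, epochs=200):
        """Full-batch gradient descent on mean binary cross-entropy."""
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target  # d(loss)/d(logit)
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wj - lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
        return self


def train_per_ingredient(X, labels, ingredients):
    """Train one independent binary classifier per ingredient column."""
    return {
        ing: SigmoidUnit(len(X[0])).fit(X, [row[ing] for row in labels])
        for ing in ingredients
    }


def predict(models, x, threshold=0.5):
    """Flag each ingredient whose predicted probability meets the threshold."""
    return {ing: int(m.predict_proba(x) >= threshold) for ing, m in models.items()}


# Toy data: two features (say, tf-idf weights for "bacon" and "tofu").
X = [[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.9]]
labels = [
    {"meat": 1, "soy": 0},
    {"meat": 1, "soy": 0},
    {"meat": 0, "soy": 1},
    {"meat": 0, "soy": 1},
]
models = train_per_ingredient(X, labels, ["meat", "soy"])
print(predict(models, [0.9, 0.0]))
```

Keeping the models independent means each ingredient's decision threshold can be tuned separately (for a severe allergen, a threshold below 0.5 trades more false alarms for fewer missed detections), and adding an ingredient to the list only requires training one more model.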

Future Improvements

This project is meant for quick, simple, everyday use, which is not one of Jupyter Notebook’s strong suits. We’d like to eventually package it as a smartphone app for on-the-go use. If we do, one potential improvement is to use the phone’s camera to photograph the menu and run text recognition on it, capturing the needed data automatically.