You shouldn’t do it like this.

Model prediction is about as heavy a backend task as it gets. If you run it directly inside the request handler, you hand the responsibility for queuing requests over to the operating system or to the front HTTP server such as nginx, and under load most of that queue will simply be dropped.
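To make the problem concrete, here is a minimal sketch of the pattern being warned against, assuming a Flask front server; the names (`run_model`, the 5-second sleep) are illustrative placeholders, not anything from the original post. Every request ties up a server worker for the full inference time, and everything else piles up in nginx or the OS backlog.

```python
# Anti-pattern sketch (assumed names): the prediction runs inside the HTTP handler,
# so each request blocks a server worker for the whole inference time.
import time

from flask import Flask, jsonify, request

app = Flask(__name__)

def run_model(data):
    time.sleep(5)  # stand-in for a slow model.predict(...) call
    return {"prediction": "..."}

@app.route("/predict", methods=["POST"])
def predict_inline():
    # Blocks here until inference finishes; excess requests queue up outside the app.
    return jsonify(run_model(request.get_json()))
```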

To make this production-ready you need to:

  1. Run a background worker that listens to the request queue.
  2. When a request reaches the front server, the server pushes its parameters and data onto the request queue.
  3. The worker picks up the new task and runs the prediction; the result is saved to a result queue (or store).
  4. The front server either waits for the client to ask for the result (polling, "ping" style) or, with a realtime connection such as WebSockets, pushes the result to the client as soon as it is ready.

This way you can see the load on your queue, add more workers when it grows, and every request will eventually be served.
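As a rough illustration of steps 1–4, here is a minimal sketch assuming Redis as the queue backend and Flask as the front server (polling variant of step 4). The queue names, endpoints, and the `predict` placeholder are assumptions for the example, not part of the original answer.

```python
# front_server.py -- accepts requests, enqueues them, lets the client poll for results.
# Assumes a local Redis instance and a separately running worker (see worker.py below).
import json
import uuid

import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
queue = redis.Redis()

@app.route("/predict", methods=["POST"])
def enqueue_prediction():
    task_id = str(uuid.uuid4())
    # Step 2: push params and data onto the request queue instead of predicting inline.
    queue.rpush("request_queue", json.dumps({"id": task_id, "data": request.get_json()}))
    return jsonify({"task_id": task_id}), 202

@app.route("/result/<task_id>")
def poll_result(task_id):
    # Step 4 (polling variant): the client "pings" until the worker has stored a result.
    result = queue.get(f"result:{task_id}")
    if result is None:
        return jsonify({"status": "pending"}), 202
    return jsonify({"status": "done", "result": json.loads(result)})
```

```python
# worker.py -- background worker listening on the request queue (steps 1 and 3).
import json

import redis

def predict(data):
    # Placeholder for the actual model call, e.g. model.predict(data).
    return {"echo": data}

queue = redis.Redis()

while True:
    # Blocking pop: waits until a task arrives on the request queue.
    _, raw = queue.blpop("request_queue")
    task = json.loads(raw)
    result = predict(task["data"])
    # Store the result where the front server can find it (the "result queue").
    queue.set(f"result:{task['id']}", json.dumps(result), ex=3600)
```

The same shape works with a task framework such as Celery or RQ instead of hand-rolled Redis lists; the point is only that the queue is explicit, so you can watch its length and scale workers against it.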
