Python redis and celery too many clients, different errors on each execution | Tasks connect to MySQL using...

I am currently working on an app that has to process several long-running tasks.
I am using Python 3, Flask, Celery, and Redis.



I have a working solution on localhost, but on Heroku there are many errors, and every execution of the app triggers a different set of errors. I know it can't be random, so I am trying to figure out where to start looking.



I have a feeling something must be wrong with Redis. I am trying to understand what clients are and where they come from, but I am not able to find official documentation or an explanation on this topic.



Question:



When the Redis server is started (even on localhost), many clients are already connected, even though I haven't done anything yet. On Heroku (I am using heroku-redis) I always have 6 clients, on localhost 11 clients.



I have done some research and I am able to display them with:



import os
import redis

if 'DYNO' in os.environ:
    redis_db = redis.StrictRedis(host='HOST', port=15249, password='REDISDBPW')
else:
    redis_db = redis.StrictRedis()

# see what keys are in Redis
all_keys = redis_db.keys()
print(all_keys)

all_clients = redis_db.client_list()
print(all_clients)


I see all these clients but the information there doesn't help me at all. What are they? Why are they there? Where are they coming from?



All the Heroku Redis add-ons have a client limit, so I need to understand and optimize this. First I thought the number of clients equals the number of tasks, but that's not it.



In total I have 12 tasks defined, but I am testing now with 2 tasks (both finish in less than 30 seconds).



When I execute the tasks on localhost, the number of clients increases from 11 to 16. If I execute them once again, it goes from 16 to 18, and after that it always stays at 18, no matter how often I execute the tasks.



So what is going on here? I have 2 tasks; why do the clients increase from 11 to 16 and then from 16 to 18? And why are they not closed after the task is finished?



I have been struggling with this issue for a few days now (even though it always works perfectly on localhost), so any help or ideas are welcome. I need to start looking somewhere, so currently I am trying to understand the clients.



EDIT:



I installed Flower and tried to monitor the 2 tasks on localhost. Everything looks good: it processes the two tasks, both succeed in a few seconds, and the return value is correct (but it always worked well on localhost).



Still, after I started Flower the number of clients jumped to 30. I still have no clue what these clients are. With the number of clients I generate, I would need a $100 add-on just to process two tasks that take a few seconds to finish. That can't be right, so I still think something is wrong with Redis, even on localhost.



My redis setup is pretty simple:



if 'DYNO' in os.environ:
    app.config['CELERY_BROKER_URL'] = 'redis://[the full URL from the redis add-on]'
    app.config['CELERY_RESULT_BACKEND'] = 'redis://[the full URL from the redis add-on]'
else:
    app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
    app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost'

celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'], backend=app.config['CELERY_RESULT_BACKEND'])


Here is an example of a task:



@celery.task(bind=True)
def get_users_deregistrations_task(self, g_start_date, g_end_date):

    start_date = datetime.strptime(g_start_date, '%d-%m-%Y')
    end_date = datetime.strptime(g_end_date, '%d-%m-%Y')

    a1 = db_session.query(func.sum(UsersTransactionsVK.amount)).filter(UsersTransactionsVK.date_added >= start_date, UsersTransactionsVK.date_added <= end_date, UsersTransactionsVK.payed == 'Yes').scalar()
    a2 = db_session.query(func.sum(UsersTransactionsStripe.amount)).filter(UsersTransactionsStripe.date_added >= start_date, UsersTransactionsStripe.date_added <= end_date, UsersTransactionsStripe.payed == 'Yes').scalar()
    a3 = db_session.query(func.sum(UsersTransactions.amount)).filter(UsersTransactions.date_added >= start_date, UsersTransactions.date_added <= end_date, UsersTransactions.on_hold == 'No').scalar()

    if a1 is None:
        a1 = 0

    if a2 is None:
        a2 = 0

    if a3 is None:
        a3 = 0

    amount = a1 + a2 + a3

    return {'some_value': amount}

# Selects user deregistrations between selected dates
@app.route('/get-users-deregistration', methods=["POST"])
@basic_auth.required
@check_verified
def get_users_deregistrations():
    if request.method == "POST":

        # init task
        task = get_users_deregistrations_task.apply_async([session['g_start_date'], session['g_end_date']])
        return json.dumps({}), 202, {'Location': url_for('taskstatus_get_users_deregistrations', task_id=task.id)}

@app.route('/status/<task_id>')
def taskstatus_get_users_deregistrations(task_id):
    task = get_users_deregistrations_task.AsyncResult(task_id)
    if task.state == 'PENDING':
        response = {
            'state': task.state,
            'current': 0,
            'total': 1,
            'status': 'Pending...'
        }
    elif task.state != 'FAILURE':
        response = {
            'state': task.state,
            'current': task.info['current'],
            'total': task.info['total'],
            'status': 'Finished',
            'statistic': task.info['statistic'],
            'final_dataset': task.info
        }
        if 'result' in task.info:
            response['result'] = task.info['result']
    else:
        print('in else')
        # something went wrong in the background job
        response = {
            'state': task.state,
            'current': 1,
            'total': 1,
            'status': str(task.info),  # this is the exception raised
        }
    return json.dumps(response)


EDIT:



Here is my Procfile for Heroku:



web: gunicorn stats_main:app
worker: celery worker -A stats_main.celery --loglevel=info


EDIT



I am thinking the issue might be the connection pool (on the redis side), which I am not using properly.



I have also found some configurations for celery and added them:



celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'], backend=app.config['CELERY_RESULT_BACKEND'],
                redis_max_connections=20,
                BROKER_TRANSPORT_OPTIONS={'max_connections': 20},
                broker_pool_limit=None)


I uploaded everything to Heroku again with these configurations. I am still testing with only 2 tasks, both of which are fast.



I executed the tasks on Heroku 10 times in a row; 7 times they worked. 3 times it looked like they finished too early: the returned result was wrong (for example, the correct result is 30000 and it returned 18000 three times).



The clients quickly jumped to 20, but they never went above 20, so at least the max-clients error and the lost-connection-to-Redis error are resolved.



The big issue now is that the tasks can finish too early. It is very important that the returned results are correct; performance is not important at all.



EDIT



Never mind, nothing is solved, everything seems random.
I added two print() calls in one of the tasks for further debugging and uploaded to Heroku. After 2 executions I again see "connection to redis lost: maximum number of clients reached" (even though my Redis monitor add-on shows that the clients never went above 20).



EDIT



The high number of clients might be caused by idle clients, which for some reason are never closed (found in a blog post by Heroku):




By default, Redis will never close idle connections, which means that
if you don't close your Redis connections explicitly, you will lock
yourself out of your instance.



To ensure this doesn't happen, Heroku Redis sets a default connection
timeout of 300 seconds. This timeout doesn’t apply to
non-publish/subscribe clients, and other blocking operations.




I have now added a function that kills idle clients right before every one of my tasks:



def kill_idle_clients():
    if 'DYNO' in os.environ:
        redis_db = redis.StrictRedis(host='HOST', port=15249, password='REDISDBPW')
    else:
        redis_db = redis.StrictRedis()

    all_clients = redis_db.client_list()
    counter = 0
    for client in all_clients:
        if int(client['idle']) >= 15:
            redis_db.client_kill(client['addr'])
            counter += 1

    print('killing idle clients:', counter)


Before a task starts, this closes all clients that have been idle for more than 15 seconds. It works again on localhost (no surprise, it always worked on localhost) and I have fewer clients, but on Heroku it now worked only 2 times out of 10; the other 8 times the tasks finished too early again. Maybe the idle clients were not really idle, I have no clue.



It is also almost impossible to test, as every execution of the tasks has a different outcome (loses connection to Redis, reaches the client limit, finishes too early, or works perfectly).



EDIT



It seems the Celery settings were being ignored all along. I was suspicious about this the whole time and decided to test it by adding some random arguments and changing values to nonsense. I restarted the Celery worker, of course.



I expected to see some errors, but it works as if nothing happened.



Everything works as before with these nonsense configurations:



celery = Celery(app.name, broker=app.config['REDIS_URL'], backend=app.config['REDIS_URL'], redis_max_connections='pups', BROKER_TRANSPORT_OPTIONS = {
'max_connections': 20,
}, broker_pool_limit=None, broker_connection_timeout='pups', pups="pups")
celery.conf.broker_transport_options = {'visibility_timeout': 'pups'}
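In case the keyword-argument route really is being silently ignored, the usual way to apply such settings is through the app's configuration object rather than as constructor keyword arguments. A minimal sketch, assuming Celery 4's lowercase setting names:

celery = Celery(app.name, broker=app.config['REDIS_URL'], backend=app.config['REDIS_URL'])

# Settings applied on the config object take effect; unrecognised constructor
# keyword arguments may simply be dropped.
celery.conf.update(
    redis_max_connections=20,
    broker_pool_limit=None,
    broker_transport_options={'max_connections': 20, 'visibility_timeout': 18000},
)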


EDIT



I changed the way I load the configuration for Celery (from a separate config file). It seems to work now, but the issues remain the same.



celery_task = Celery(broker=app.config['REDIS_URL'], backend=app.config['REDIS_URL'])
celery_task.config_from_object('celeryconfig')


EDIT



With these configurations I managed to cap the number of clients on localhost at 18 for all tasks (I tried all 12 tasks). However, on Heroku it only "somehow" works: there are fewer clients, but the number reached 20 once, even though I thought it could not exceed 18 (I tested on Heroku with 4 tasks).



Testing on Heroku with all 12 tasks triggers many different SQL errors. I am now more confused than before. It seems the same task is executed multiple times, even though I see only 12 task URLs.



I think that because the SQL errors are, for example:



sqlalchemy.exc.InternalError: (pymysql.err.InternalError) Packet sequence number wrong - got 117 expected 1


or



sqlalchemy.exc.InterfaceError: (pymysql.err.InterfaceError) (0, '')


or



Multiple rows were found for one()


I tested a few times on Heroku with 4 tasks, and there were times the task results were returned, but the results were very strange.



This time the tasks did not finish too early but returned inflated values; it looked like task A returned its value twice and the results were summed.



Example: task A must return 10k, but it returned 20k, so the task was executed twice and the results were summed.



Here are my current configurations. I still don't understand the math 100%, but I think the number of clients is:



max_concurrency * CELERYD_MAX_TASKS_PER_CHILD


On localhost I found a new CLI command to inspect worker stats, and I had max_concurrency=3 and CELERYD_MAX_TASKS_PER_CHILD=6.

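With those values that would be 3 * 6 = 18 connections, which matches the cap of 18 clients observed on localhost above.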


CLI command:



celery -A stats_main.celery_task inspect stats


My current configurations:



worker start:



celery worker -A stats_main.celery_task --loglevel=info --autoscale=10,3


config:



CELERY_REDIS_MAX_CONNECTIONS=20
BROKER_POOL_LIMIT=None
CELERYD_WORKER_LOST_WAIT=20
CELERYD_MAX_TASKS_PER_CHILD=6
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 18000} # 5 hours
CELERY_RESULT_DB_SHORT_LIVED_SESSIONS = True  # useful for intermittent errors like (OperationalError) (2006, 'MySQL server has gone away')


EDIT



Seeing all these SQL errors, I decided to research in a completely different direction. My new theory is that it could be a MySQL issue.



I adjusted my connection to the MySQL server as described in the answer to this question.



I also found out that pymysql has threadsafety=1. I don't know yet whether this could be an issue, but it seems MySQL has something to do with connections and connection pools.

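For reference, a common pattern with SQLAlchemy and forked Celery workers is to make sure each worker process opens its own MySQL connections instead of reusing the parent's. This is only a sketch of that idea, not necessarily what the linked answer does; the engine name and connection URL are placeholders for whatever the app actually defines:

from celery.signals import worker_process_init
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# 'mysql+pymysql://user:pw@host/db' is a placeholder URL.
engine = create_engine('mysql+pymysql://user:pw@host/db',
                       pool_pre_ping=True,   # test connections before handing them out
                       pool_recycle=280)     # recycle before server-side idle timeouts
db_session = scoped_session(sessionmaker(bind=engine))

@worker_process_init.connect
def reset_db_pool(**kwargs):
    # A forked worker process must not reuse the parent's MySQL sockets;
    # dispose the pool so this child opens fresh connections.
    engine.dispose()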


At the moment I can also say that memory cannot be the issue, because if the packets were too big it shouldn't work on localhost either; I left max_allowed_packet at the default value, which is around 4MB.



I have also created 3 dummy tasks, which do some simple calculations without connecting to an external MySQL DB. I have now executed them 5 times on Heroku and there were no errors and the results were always correct, so I assume the issue is not Celery or Redis but MySQL, even though I have no clue why it would then work on localhost. Maybe it's a combination of all 3 that leads to the issues on Heroku.

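A hypothetical sketch of what one of these dummy tasks looks like (no database access, only a simple calculation, so that only Celery and Redis are exercised):

@celery_task.task(bind=True)
def dummy_sum_task(self, n):
    # Pure computation, no MySQL involved.
    return {'some_value': sum(range(n))}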


EDIT



I adjusted my JS file. Now every task is called one after another, which means they are not async (I still use Celery's apply_async because apply did not work).



So it's a crude workaround. I simply created a variable for each task, e.g. var task_1_rdy = false;



I also created a function that runs every 2 seconds and checks whether a task is ready; if it is ready, it starts the next task. I think it's easy to understand what I did here.



I tested this on Heroku and had no errors at all, even with multiple tasks, so the issue is maybe solved. I need to do more tests, but it looks very promising. Of course I am not using the async functionality, and running task after task will probably have the worst performance, but it works now. I will benchmark the performance difference and update the question on Monday.



EDIT



I have done a lot of testing today. The time it takes until the tasks complete is the same (sync vs. async). I don't know why, but it is the same.



Working with all 12 tasks on Heroku and selecting a huge time range (huge time range = tasks take longer, because there is more data to process):



Again the task results are not precise. The returned values are wrong, only slightly wrong, but wrong and therefore not reliable; e.g. task A must return 20k and on Heroku it returned 19500. I don't know how it is possible that data is lost / a task returns too early, but after 2 weeks I will give up and try a completely different system.



  • Can you provide some of the SQL statements involved?

    – Rick James
    Nov 23 '18 at 3:13











  • I am using SQLalchemy and there is an example of a calculation in the example task above.

    – Roman
    Nov 26 '18 at 6:55

2 Answers

It sounds like you have a REST API using Celery workers with Redis as the message queue.
Here is the checklist:

1. In your client, did you close the connection after the logic finishes?

2. Celery will spawn new workers; the workers may cause trouble, so try monitoring Celery with flower.

3. Make sure your client finishes the task; try debugging by printing something. Sometimes staging and local setups have network issues which stop the Celery task from ending.

4. If you are using Redis as the Celery message queue, try monitoring the number of queues; maybe they auto-scale up? (A minimal sketch of how to watch this follows below.)
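Not part of the original answer, but a minimal sketch of one way to watch these numbers with redis-py; the broker URL and the default 'celery' queue name are assumptions:

# A minimal sketch (assumed) of watching Redis client and queue counts with redis-py.
import redis

r = redis.StrictRedis.from_url('redis://localhost:6379/0')   # assumed broker URL

print(r.info('clients')['connected_clients'])   # how many clients are connected right now
print(r.llen('celery'))                         # messages waiting on the default Celery queue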






answered Nov 15 '18 at 14:09 by 陈海栋 (edited Nov 15 '18 at 14:13)
  • Useful hints, but what are clients? Why are there so many clients connected on startup, even before a task is executed?

    – Roman
    Nov 15 '18 at 14:11











  • If you use Redis as the message queue, it will set up a queue just for messages. Are you using Redis only as the message queue, or also for other cache purposes?

    – 陈海栋
    Nov 15 '18 at 14:15











  • Seems the problem is in celery: I suggest you set up flower, flower.readthedocs.io/en/latest

    – 陈海栋
    Nov 15 '18 at 14:20











  • I'll check out flower; right now I am looking at everything celery does on the console, and there everything looks okay.

    – Roman
    Nov 15 '18 at 14:21











  • I have updated my question. Thanks so far!

    – Roman
    Nov 16 '18 at 9:00





















Now I am 60% sure that it is your task that is taking too long and the server cannot respond within the default web request return time. The 70% / 30% thing applies when you are on a local machine, where the network is very fast. On a cloud platform latency is the problem and sometimes it affects your program. On top of that, if a celery worker fails, gunicorn and celery will automatically create another worker to finish the unfinished job, which causes the increase in connections.



So the solution is:



  • Option 1: make your task finish faster

  • Option 2: return an acknowledgement first, calculate in the background, and make another API call to send back the results (see the sketch after this list)
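Not part of the original answer, but a minimal sketch of what Option 2 could look like with Flask and Celery, reusing the app, celery, and task names from the question; the route paths are made up:

# A minimal sketch (assumed) of Option 2: acknowledge at once, compute in the
# background, and let the client fetch the result with a second API call.
from celery.result import AsyncResult
from flask import jsonify

@app.route('/reports/<start>/<end>', methods=['POST'])      # hypothetical route
def start_report(start, end):
    task = get_users_deregistrations_task.apply_async(args=[start, end])
    return jsonify({'task_id': task.id}), 202               # acknowledge immediately

@app.route('/reports/status/<task_id>')                     # hypothetical route
def report_status(task_id):
    result = AsyncResult(task_id, app=celery)
    if not result.ready():
        return jsonify({'state': result.state}), 202         # still running
    return jsonify({'state': result.state, 'value': result.result})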







answered Nov 17 '18 at 11:13 by 陈海栋 (edited Nov 26 '18 at 8:52 by will-hart)
  • Thanks again for looking at my issue. I will investigate this, but my first thought is: your idea sounds logical, but the 2 tasks I am testing with both take only a few seconds to finish.

    – Roman
    Nov 19 '18 at 7:36











  • From the docs: ...client has a configurable timeout — the deadline in seconds for replies to arrive in. This timeout defaults to one second. It could be what you said.

    – Roman
    Nov 19 '18 at 7:38











  • Can you try replacing the business logic with a dummy, i.e. do nothing in the API, and see if things get better? Thanks for the Stack Overflow points, BTW.

    – 陈海栋
    Nov 20 '18 at 5:26











  • Sorry @Roman, most of the credits and bounties were stolen by some random person.

    – 陈海栋
    Nov 27 '18 at 1:40










