14:31:30 #startmeeting Pulp Triage 2020-04-21
14:31:30 !start
14:31:30 Meeting started Tue Apr 21 14:31:30 2020 UTC. The chair is daviddavis. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:31:30 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:31:30 The meeting name has been set to 'pulp_triage_2020-04-21'
14:31:30 #info daviddavis has joined triage
14:31:30 daviddavis: daviddavis has joined triage
14:31:38 ah holiday
14:31:41 #info ttereshc has joined triage
14:31:41 !here
14:31:41 ttereshc: ttereshc has joined triage
14:31:45 #info dalley has joined triage
14:31:45 !here
14:31:45 dalley: dalley has joined triage
14:31:53 #info x9c4 has joined triage
14:31:53 !here
14:31:53 x9c4: x9c4 has joined triage
14:32:14 #info dkliban has joined triage
14:32:14 !here
14:32:14 dkliban: dkliban has joined triage
14:32:17 !next
14:32:18 daviddavis: 5 issues left to triage: 6534, 6533, 6521, 6520, 6463
14:32:18 #topic https://pulp.plan.io/issues/6534
14:32:19 RM 6534 - ttereshc - NEW - Having same content in one batch can cause issues in _post_save of ContentSaver
14:32:20 https://pulp.plan.io/issues/6534
14:32:46 #info bmbouter has joined triage
14:32:46 !here
14:32:46 bmbouter: bmbouter has joined triage
14:33:10 I put this on the agenda for the pulpcore meeting
14:33:20 just accept?
14:33:25 I think so
14:33:25 skip?
14:33:27 :)
14:33:28 skip
14:33:30 ok
14:33:36 !skip
14:33:37 #topic https://pulp.plan.io/issues/6533
14:33:37 daviddavis: 4 issues left to triage: 6533, 6521, 6520, 6463
14:33:38 RM 6533 - ipanova@redhat.com - NEW - Task get stuck in 'running' state
14:33:39 https://pulp.plan.io/issues/6533
14:33:57 #info ggainey has joined triage
14:33:57 !here
14:33:57 ggainey: ggainey has joined triage
14:34:00 this is the issue i was experiencing on friday
14:34:05 we need to accept and add to sprint
14:34:12 I think it might be the same issue with rq
14:34:24 #idea Proposed for #6533: accept and add to sprint
14:34:24 !propose other accept and add to sprint
14:34:24 daviddavis: Proposed for #6533: accept and add to sprint
14:34:24 #info ipanova has joined triage
14:34:24 !here
14:34:25 ipanova: ipanova has joined triage
14:34:33 +1
14:34:38 +1
14:34:47 #agreed accept and add to sprint
14:34:47 !accept
14:34:47 daviddavis: Current proposal accepted: accept and add to sprint
14:34:48 #topic https://pulp.plan.io/issues/6521
14:34:48 daviddavis: 3 issues left to triage: 6521, 6520, 6463
14:34:49 RM 6521 - lmjachky - NEW - An internal server error is raised when creating a new content using repository_version instead of repository
14:34:50 when I experienced the issue, I had 68 queues and some work was dispatched to the stalled/old workers
14:34:51 https://pulp.plan.io/issues/6521
14:35:05 #idea Proposed for #6521: accept and add to sprint
14:35:05 !propose other accept and add to sprint
14:35:05 ipanova: Proposed for #6521: accept and add to sprint
14:35:10 +1
14:35:20 +1
14:35:33 +1
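The "68 queues and some work dispatched to the stalled/old workers" observation above can be checked directly against Redis. Below is a minimal sketch (not from the meeting) using the rq library, which Pulp 3's tasking system was built on at the time; the localhost Redis connection settings are assumptions and should be adjusted for the actual deployment.

```python
# Minimal sketch (not from the meeting): list RQ workers and queues to spot
# stalled or leftover workers. Assumes Redis on localhost:6379, the default.
import redis
from rq import Queue, Worker

conn = redis.Redis(host="localhost", port=6379)

# Every worker registered in Redis, including ones whose processes may
# already be gone ("stalled/old" workers).
for worker in Worker.all(connection=conn):
    print(worker.name, worker.get_state(), [q.name for q in worker.queues])

# Every queue currently known to Redis and how many jobs it holds.
for queue in Queue.all(connection=conn):
    print(queue.name, len(queue))
```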
14:35:35 ttereshc: but did the tasks in your case have a 'waiting' state or 'running' state?
14:35:54 first running and after subsequent restarts waiting
14:36:02 until full clean up
14:36:03 I've seen that before
14:36:14 interesting
14:36:16 whenever that happens, for me, redis-cli FLUSHALL usually fixes it
14:36:31 the subsequent restarts waiting bit I mean
14:36:31 we have this bug from bin-li and the reproducer from david also
14:36:53 if you restart redis, postgresql holds onto the task and it doesn't cancel correctly but it's lost from RQ
14:37:11 maybe we chat more at open floor about it and skip for now
14:37:14 ok
14:37:16 ok
14:37:18 #agreed accept and add to sprint
14:37:18 !accept
14:37:18 daviddavis: Current proposal accepted: accept and add to sprint
14:37:19 daviddavis: 2 issues left to triage: 6520, 6463
14:37:19 #topic https://pulp.plan.io/issues/6520
14:37:20 RM 6520 - ipanova@redhat.com - POST - Regression: publishing an empty ISO repo no longer publishes PULP_MANIFEST
14:37:21 https://pulp.plan.io/issues/6520
14:37:22 I'm ok to add to sprint also but without a reproducer who can take it
14:37:32 at POST, accept?
14:37:46 yes, actually should be modified
14:37:57 will move
14:37:57 +1
14:38:10 can you accept it too?
14:38:14 this is confusing ... is this pulp 2?
14:38:15 #idea Proposed for #6520: Leave the issue as-is, accepting its current state.
14:38:15 !propose accept
14:38:15 daviddavis: Proposed for #6520: Leave the issue as-is, accepting its current state.
14:38:18 it is pulp2
14:38:19 dkliban: yes
14:38:21 ok
14:38:22 it's tagged pulp 2
14:38:29 i see that now
14:38:31 thanks
14:38:33 cool
14:38:39 done
14:38:45 #agreed Leave the issue as-is, accepting its current state.
14:38:45 !accept
14:38:45 daviddavis: Current proposal accepted: Leave the issue as-is, accepting its current state.
14:38:46 #topic https://pulp.plan.io/issues/6463
14:38:46 daviddavis: 1 issue left to triage: 6463
14:38:47 RM 6463 - binlinf0 - NEW - pulp 3.2.1 duplicate key error when sync
14:38:48 https://pulp.plan.io/issues/6463
14:39:00 let's skip again
14:39:05 !skip
14:39:06 daviddavis: No issues to triage.
14:39:08 agreed, we emailed asking for more info again today
14:39:23 dkliban: he replied on the list that re-creating repos fixes the issue
14:39:43 i know ... i think he must have created the repos originally when there was a bug related to this
14:40:00 can be
14:40:07 cause he's been using pulpcore since 3.0.0
14:40:21 since before then, but i suspect he did a rebuild at 3.0.0
14:40:50 i am going to try to find a related issue ... if there is one
14:41:02 #info mikedep333 has joined triage
14:41:02 !here
14:41:02 mikedep333: mikedep333 has joined triage
14:41:52 do we want to discuss https://pulp.plan.io/issues/6533 more?
14:42:44 yes
14:43:00 !issue 6533
14:43:01 #topic https://pulp.plan.io/issues/6533
14:43:01 RM 6533 - ipanova@redhat.com - NEW - Task get stuck in 'running' state
14:43:02 https://pulp.plan.io/issues/6533
14:43:16 i have been able to reproduce consistently by running the migration plan and then trying to sync a migrated repository
14:43:28 this is on EL7 and python 3.6.8
14:43:49 if there is a reproducer then someone could take it
14:43:56 bmbouter: there is a reproducer
14:43:57 i'll post it on the issue
14:44:10 i have provided all the steps there
14:44:40 ipanova: oh yeah ... you have the exact same reproduction steps i had in mind
14:45:15 I wonder why the migration plugin task triggers that issue :/
14:46:01 yeah
14:46:02 yea strange
14:46:07 so accept and add to sprint?
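The `redis-cli FLUSHALL` workaround mentioned above can also be run from Python; a minimal sketch with the redis-py client follows. The host and port are assumptions (Redis defaults). As noted in the discussion, this only clears RQ's state in Redis; Pulp's task records live in PostgreSQL and may still show the lost tasks as running.

```python
# Minimal sketch (assumes Redis on localhost:6379, the default): the Python
# equivalent of the `redis-cli FLUSHALL` workaround mentioned above.
import redis

conn = redis.Redis(host="localhost", port=6379)

# FLUSHALL removes every key in Redis, i.e. all RQ queues, job data and
# worker registrations. Only use it when losing queued work is acceptable.
conn.flushall()

# Task records in Pulp's PostgreSQL database are not touched by this; tasks
# stuck in 'running' there still need to be cancelled or cleaned up on the
# Pulp side.
```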
14:46:11 yes please
14:46:13 +1
14:46:15 +1
14:46:23 +1
14:46:24 +1 if someone has capacity to investigate that
14:46:26 so here's more I want to share about it
14:46:27 #idea Proposed for #6533: accept and add to sprint
14:46:27 !propose other accept and add to sprint
14:46:27 daviddavis: Proposed for #6533: accept and add to sprint
14:46:30 #agreed accept and add to sprint
14:46:30 !accept
14:46:30 daviddavis: Current proposal accepted: accept and add to sprint
14:46:31 daviddavis: No issues to triage.
14:46:34 and I'll write the same on the issue, but here for more visibility
14:46:52 we tried py-spy and it didn't yield results showing what the task is doing, which is ... strange
14:47:10 weird
14:47:25 so I recommend a gdb core dump of the child process (the process that is forked from the RQ parent for each task)
14:47:37 that will for sure show each thread and where in the python code the interpreter is "stuck"
14:47:53 and we should take a few core dumps over a few seconds and compare them to see if it's "halted" or "looping in one area"
14:48:13 we need insight into what it's doing to learn more
14:48:27 I'll write this on the issue also along w/ some links and commands on how to do this
14:48:28 cool, sounds like a plan
14:48:50 There is also a builtin async debug option in python.
14:49:20 To show if any coroutine blocks the processor too long or if futures are not awaited.
14:51:39 interesting, did not know that
14:51:59 we should write up a guide to debugging stuck tasks
14:52:08 capture all this info
14:52:18 Put a note in the issue.
14:52:38 +1
14:52:49 alright, last call for triage
14:53:27 daviddavis: agreed, if someone filed it I could write such docs
14:53:33 ok, I'll do it
14:53:39 x9c4: is this what you were talking about https://docs.python.org/3.8/library/asyncio-dev.html
14:53:48 Yes!
14:53:52 awesome
14:53:59 i linked it in the issue.
14:54:04 great
14:54:13 #endmeeting
14:54:13 !end
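For reference, below is a minimal sketch of the builtin asyncio debug mode x9c4 mentions (documented at the asyncio-dev link above), together with the standard-library faulthandler module as a lighter-weight complement to the gdb core-dump approach for seeing where each thread is stuck. The slow-callback threshold and the SIGUSR1 signal are illustrative choices, not settings from the meeting or from Pulp.

```python
# Sketch of the debugging aids discussed above. The slow-callback threshold
# and the choice of SIGUSR1 are illustrative, not Pulp's actual settings.
import asyncio
import faulthandler
import signal

# asyncio debug mode (available on Python 3.6, which the reproducer uses):
# logs coroutines that were never awaited and callbacks that block the event
# loop for longer than slow_callback_duration seconds. It can also be
# enabled process-wide with the PYTHONASYNCIODEBUG=1 environment variable.
loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.slow_callback_duration = 0.1  # seconds

# faulthandler: sending SIGUSR1 to the stuck worker process dumps the
# current Python traceback of every thread to stderr, similar in spirit to
# inspecting a core dump in gdb, without stopping the process.
faulthandler.register(signal.SIGUSR1, all_threads=True)
```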