14:31:30 <daviddavis> #startmeeting Pulp Triage 2020-04-21
14:31:30 <daviddavis> !start
14:31:30 <pulpbot> Meeting started Tue Apr 21 14:31:30 2020 UTC.  The chair is daviddavis. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:31:30 <pulpbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:31:30 <pulpbot> The meeting name has been set to 'pulp_triage_2020-04-21'
14:31:30 <daviddavis> #info daviddavis has joined triage
14:31:30 <pulpbot> daviddavis: daviddavis has joined triage
14:31:38 <ttereshc> ah holiday
14:31:41 <ttereshc> #info ttereshc has joined triage
14:31:41 <ttereshc> !here
14:31:41 <pulpbot> ttereshc: ttereshc has joined triage
14:31:45 <dalley> #info dalley has joined triage
14:31:45 <dalley> !here
14:31:45 <pulpbot> dalley: dalley has joined triage
14:31:53 <x9c4> #info x9c4 has joined triage
14:31:53 <x9c4> !here
14:31:53 <pulpbot> x9c4: x9c4 has joined triage
14:32:14 <dkliban> #info dkliban has joined triage
14:32:14 <dkliban> !here
14:32:14 <pulpbot> dkliban: dkliban has joined triage
14:32:17 <daviddavis> !next
14:32:18 <pulpbot> daviddavis: 5 issues left to triage: 6534, 6533, 6521, 6520, 6463
14:32:18 <daviddavis> #topic https://pulp.plan.io/issues/6534
14:32:19 <pulpbot> RM 6534 - ttereshc - NEW - Having same content in one batch can cause issues in _post_save of ContentSaver
14:32:20 <pulpbot> https://pulp.plan.io/issues/6534
14:32:46 <bmbouter> #info bmbouter has joined triage
14:32:46 <bmbouter> !here
14:32:46 <pulpbot> bmbouter: bmbouter has joined triage
14:33:10 <bmbouter> I put this on the agenda for the pulpcore meeting
14:33:20 <dkliban> just accept?
14:33:25 <bmbouter> I think so
14:33:25 <ttereshc> skip?
14:33:27 <ttereshc> :)
14:33:28 <bmbouter> skip
14:33:30 <dkliban> ok
14:33:36 <daviddavis> !skip
14:33:37 <daviddavis> #topic https://pulp.plan.io/issues/6533
14:33:37 <pulpbot> daviddavis: 4 issues left to triage: 6533, 6521, 6520, 6463
14:33:38 <pulpbot> RM 6533 - ipanova@redhat.com - NEW - Task get stuck in 'running' state
14:33:39 <pulpbot> https://pulp.plan.io/issues/6533
14:33:57 <ggainey> #info ggainey has joined triage
14:33:57 <ggainey> !here
14:33:57 <pulpbot> ggainey: ggainey has joined triage
14:34:00 <dkliban> this is the issue i was experiencing on friday
14:34:05 <dkliban> we need to accept and add to sprint
14:34:12 <ttereshc> I think it might be the same issue with rq
14:34:24 <daviddavis> #idea Proposed for #6533: accept and add to sprint
14:34:24 <daviddavis> !propose other accept and add to sprint
14:34:24 <pulpbot> daviddavis: Proposed for #6533: accept and add to sprint
14:34:24 <ipanova> #info ipanova has joined triage
14:34:24 <ipanova> !here
14:34:25 <pulpbot> ipanova: ipanova has joined triage
14:34:33 <x9c4> +1
14:34:38 <ipanova> +1
14:34:47 <daviddavis> #agreed accept and add to sprint
14:34:47 <daviddavis> !accept
14:34:47 <pulpbot> daviddavis: Current proposal accepted: accept and add to sprint
14:34:48 <daviddavis> #topic https://pulp.plan.io/issues/6521
14:34:48 <pulpbot> daviddavis: 3 issues left to triage: 6521, 6520, 6463
14:34:49 <pulpbot> RM 6521 - lmjachky - NEW - An internal server error is raised when creating a new content using repository_version instead of repository
14:34:50 <ttereshc> when I experienced the issue, I had 68 queues and some work was dispatched to the stalled/old workers
14:34:51 <pulpbot> https://pulp.plan.io/issues/6521
14:35:05 <ipanova> #idea Proposed for #6521: accept and add to sprint
14:35:05 <ipanova> !propose other accept and add to sprint
14:35:05 <pulpbot> ipanova: Proposed for #6521: accept and add to sprint
14:35:10 <dkliban> +1
14:35:20 <x9c4> +1
14:35:33 <ttereshc> +1
14:35:35 <dkliban> ttereshc: but did the tasks in your case have a 'waiting' state or 'running' state?
14:35:54 <ttereshc> first running, and after subsequent restarts, waiting
14:36:02 <ttereshc> until full clean up
14:36:03 <dalley> I've seen that before
14:36:14 <daviddavis> interesting
14:36:16 <dalley> whenever that happens, for me, redis-cli FLUSHALL usually fixes it
14:36:31 <dalley> the subsequent restarts waiting bit I mean
14:36:31 <bmbouter> we have this bug from bin-li and the reproducer from david also
14:36:53 <bmbouter> if you restart redis, postgresql holds onto the task and it doesn't cancel correctly, but it's lost from RQ
14:37:11 <bmbouter> maybe we chat more at open floor about it and skip for now
14:37:14 <daviddavis> ok
14:37:16 <dkliban> ok
14:37:18 <daviddavis> #agreed accept and add to sprint
14:37:18 <daviddavis> !accept
14:37:18 <pulpbot> daviddavis: Current proposal accepted: accept and add to sprint
14:37:19 <pulpbot> daviddavis: 2 issues left to triage: 6520, 6463
14:37:19 <daviddavis> #topic https://pulp.plan.io/issues/6520
14:37:20 <pulpbot> RM 6520 - ipanova@redhat.com - POST - Regression: publishing an empty ISO repo no longer publishes PULP_MANIFEST
14:37:21 <pulpbot> https://pulp.plan.io/issues/6520
14:37:22 <bmbouter> I'm ok to add it to the sprint also, but without a reproducer, who can take it?
14:37:32 <bmbouter> at POST, accept?
14:37:46 <ipanova> yes, actually it should be modified
14:37:57 <ipanova> will move
14:37:57 <daviddavis> +1
14:38:10 <daviddavis> can you accept it too?
14:38:14 <dkliban> this is confusing ... is this pulp 2?
14:38:15 <daviddavis> #idea Proposed for #6520: Leave the issue as-is, accepting its current state.
14:38:15 <daviddavis> !propose accept
14:38:15 <pulpbot> daviddavis: Proposed for #6520: Leave the issue as-is, accepting its current state.
14:38:18 <ipanova> it is pulp2
14:38:19 <daviddavis> dkliban: yes
14:38:21 <dkliban> ok
14:38:22 <daviddavis> it's tagged pulp 2
14:38:29 <dkliban> i see that now
14:38:31 <dkliban> thanks
14:38:33 <daviddavis> cool
14:38:39 <ipanova> done
14:38:45 <daviddavis> #agreed Leave the issue as-is, accepting its current state.
14:38:45 <daviddavis> !accept
14:38:45 <pulpbot> daviddavis: Current proposal accepted: Leave the issue as-is, accepting its current state.
14:38:46 <daviddavis> #topic https://pulp.plan.io/issues/6463
14:38:46 <pulpbot> daviddavis: 1 issues left to triage: 6463
14:38:47 <pulpbot> RM 6463 - binlinf0 - NEW - pulp 3.2.1 duplicate key error when sync
14:38:48 <pulpbot> https://pulp.plan.io/issues/6463
14:39:00 <dkliban> let's skip again
14:39:05 <daviddavis> !skip
14:39:06 <pulpbot> daviddavis: No issues to triage.
14:39:08 <bmbouter> agreed, we emailed asking for more info again today
14:39:23 <ipanova> dkliban: he replied on the list that re-creating repos fixes the issue
14:39:43 <dkliban> i know ... i think he must have created the repos originally when there was a bug related to this
14:40:00 <ipanova> can be
14:40:07 <dkliban> cause he's been using pulpcore since 3.0.0
14:40:21 <dkliban> since before then, but i suspect he did a rebuild at 3.0.0
14:40:50 <dkliban> i am going to try to find a related issue ... if there is one
14:41:02 <mikedep333> #info mikedep333 has joined triage
14:41:02 <mikedep333> !here
14:41:02 <pulpbot> mikedep333: mikedep333 has joined triage
14:41:52 <daviddavis> do we want to discuss https://pulp.plan.io/issues/6533 more?
14:42:44 <dkliban> yes
14:43:00 <daviddavis> !issue 6533
14:43:01 <daviddavis> #topic https://pulp.plan.io/issues/6533
14:43:01 <pulpbot> RM 6533 - ipanova@redhat.com - NEW - Task get stuck in 'running' state
14:43:02 <pulpbot> https://pulp.plan.io/issues/6533
14:43:16 <dkliban> i have been able to reproduce consistently by running the migration plan and then trying to sync a migrated repository
14:43:28 <dkliban> this is on EL7 and python 3.6.8
14:43:49 <bmbouter> if there is a reproducer then someone could take it
14:43:56 <ipanova> bmbouter: there is a reproducer
14:43:57 <dkliban> i'll post it on the issue
14:44:10 <ipanova> i have provided all the steps there
14:44:40 <dkliban> ipanova: oh yeah ... you have the exact same reproduction steps i had in mind
14:45:15 <ttereshc> I wonder why the migration plugin task triggers that issue :/
14:46:01 <ipanova> yeah
14:46:02 <daviddavis> yea strange
14:46:07 <daviddavis> so accept and add to sprint?
14:46:11 <dkliban> yes please
14:46:13 <x9c4> +1
14:46:15 <ipanova> +1
14:46:23 <dalley> +1
14:46:24 <ttereshc> +1 if someone has capacity to investigate that
14:46:26 <bmbouter> so here's more I want to share about it
14:46:27 <daviddavis> #idea Proposed for #6533: accept and add to sprint
14:46:27 <daviddavis> !propose other accept and add to sprint
14:46:27 <pulpbot> daviddavis: Proposed for #6533: accept and add to sprint
14:46:30 <daviddavis> #agreed accept and add to sprint
14:46:30 <daviddavis> !accept
14:46:30 <pulpbot> daviddavis: Current proposal accepted: accept and add to sprint
14:46:31 <pulpbot> daviddavis: No issues to triage.
14:46:34 <bmbouter> and I'll write on the issue the same but here for more visibility
14:46:52 <bmbouter> we tried py-spy and it didn't yield results showing what the task is doing, which is ... strange
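For context, the kind of py-spy check described here looks roughly like the following; the PID is a hypothetical stand-in for the stuck RQ worker child process:

    py-spy dump --pid 12345   # print the current Python stack of the running process
    py-spy top --pid 12345    # sample over time to see where the process spends its time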
14:47:10 <daviddavis> weird
14:47:25 <bmbouter> so I recommend a gdb coredump of the child process (the process that is forked from the RQ parent for each task)
14:47:37 <bmbouter> that will for sure show each thread and where in the python code the interpreter is "stuck"
14:47:53 <bmbouter> and we should take a few core dumps over a few seconds and compare them to see if it's "halted" or "looping in one area"
14:48:13 <bmbouter> we need insight into what it's doing to learn more
14:48:27 <bmbouter> I'll write this on the issue also along w/ some links and commands on how to do this
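A minimal sketch of the coredump workflow described above, assuming gdb plus the python debuginfo/gdb extensions are installed and that 12345 is the PID of the forked task process (both assumptions, not details from the issue):

    gcore -o /tmp/task-core 12345            # take a core dump without killing the process
    gdb /usr/bin/python3 /tmp/task-core.12345
    (gdb) thread apply all bt                # C-level stack for every thread
    (gdb) py-bt                              # Python-level stack (needs the libpython gdb script)
    # repeat after a few seconds and diff the backtraces to tell "halted" from "looping in one area"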
14:48:28 <daviddavis> cool, sounds like a plan
14:48:50 <x9c4> There is also a built-in async debug option in python.
14:49:20 <x9c4> It shows if any coroutine blocks the event loop for too long or if futures are never awaited.
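For reference, the asyncio debug mode mentioned here can be enabled like this (a minimal generic sketch, not Pulp code); it logs coroutines that were never awaited and callbacks that block the event loop longer than slow_callback_duration:

    import asyncio
    import logging

    logging.basicConfig(level=logging.DEBUG)

    async def main():
        ...  # placeholder for the task code under investigation

    # Python 3.7+: debug=True enables asyncio debug mode;
    # PYTHONASYNCIODEBUG=1 or -X dev have the same effect.
    # On Python 3.6 (the EL7 interpreter mentioned above), use
    # loop = asyncio.get_event_loop(); loop.set_debug(True) instead.
    asyncio.run(main(), debug=True)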
14:51:39 <daviddavis> interesting, did not know that
14:51:59 <daviddavis> we should write up a guide to debugging stuck tasks
14:52:08 <daviddavis> capture all this info
14:52:18 <x9c4> Put a note in the issue.
14:52:38 <daviddavis> +1
14:52:49 <daviddavis> alright, last call for triage
14:53:27 <bmbouter> daviddavis: agreed, if someone filed it I could write such docs
14:53:33 <daviddavis> ok, I'll do it
14:53:39 <daviddavis> x9c4: is this what you were talking about https://docs.python.org/3.8/library/asyncio-dev.html
14:53:48 <x9c4> Yes!
14:53:52 <daviddavis> awesome
14:53:59 <x9c4> i linked it in the issue.
14:54:04 <daviddavis> great
14:54:13 <daviddavis> #endmeeting