15:31:21 <fao89> #startmeeting Pulp Triage 2020-12-01
15:31:21 <fao89> #info fao89 has joined triage
15:31:21 <fao89> !start
15:31:21 <pulpbot> Meeting started Tue Dec 1 15:31:21 2020 UTC. The chair is fao89. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:31:21 <pulpbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:31:21 <pulpbot> The meeting name has been set to 'pulp_triage_2020-12-01'
15:31:21 <pulpbot> fao89: fao89 has joined triage
15:31:28 <fao89> Open floor!
15:31:46 <bmbouter> #info bmbouter has joined triage
15:31:46 <bmbouter> !here
15:31:48 <pulpbot> bmbouter: bmbouter has joined triage
15:31:50 <ttereshc> #info ttereshc has joined triage
15:31:50 <ttereshc> !here
15:31:50 <pulpbot> ttereshc: ttereshc has joined triage
15:31:56 <fao89> no items on the agenda so far
15:32:08 <daviddavis> #info daviddavis has joined triage
15:32:08 <daviddavis> !here
15:32:08 <pulpbot> daviddavis: daviddavis has joined triage
15:32:31 <fao89> should we start triage?
15:32:51 <daviddavis> I think so
15:33:23 <fao89> !next
15:33:24 <fao89> #topic https://pulp.plan.io/issues/7909
15:33:24 <pulpbot> fao89: 7 issues left to triage: 7909, 7908, 7907, 7904, 7876, 7857, 5502
15:33:25 <pulpbot> RM 7909 - ipanova@redhat.com - POST - Downloader map from repair feature contains only core downloaders
15:33:26 <pulpbot> https://pulp.plan.io/issues/7909
15:33:32 <ipanova> #info ipanova has joined triage
15:33:32 <ipanova> !here
15:33:32 <pulpbot> ipanova: ipanova has joined triage
15:33:49 <fao89> #idea Proposed for #7909: accept and add to sprint
15:33:49 <fao89> !propose other accept and add to sprint
15:33:49 <pulpbot> fao89: Proposed for #7909: accept and add to sprint
15:34:00 <ipanova> +1
15:34:09 <ttereshc> +1
15:34:19 <fao89> #agreed accept and add to sprint
15:34:19 <fao89> !accept
15:34:19 <pulpbot> fao89: Current proposal accepted: accept and add to sprint
15:34:20 <fao89> #topic https://pulp.plan.io/issues/7908
15:34:20 <pulpbot> fao89: 6 issues left to triage: 7908, 7907, 7904, 7876, 7857, 5502
15:34:21 <pulpbot> RM 7908 - ggainey - NEW - Make sure all exceptions live in pulpcore.plugin.exceptions
15:34:22 <pulpbot> https://pulp.plan.io/issues/7908
15:34:33 <ggainey> #info ggainey has joined triage
15:34:33 <ggainey> !here
15:34:33 <pulpbot> ggainey: ggainey has joined triage
15:34:48 <mikedep333> #info mikedep333 has joined triage
15:34:48 <mikedep333> !here
15:34:49 <pulpbot> mikedep333: mikedep333 has joined triage
15:35:03 <fao89> story?
15:35:05 <ttereshc> why is deprecation needed?
15:35:05 <ggainey> 7908 is to clean up some misplaced exceptions - it'll require a deprecation cycle to complete cleanly
15:35:25 <ggainey> ttereshc: because we're moving an exception, anything using it where it is now would break
15:35:46 <ggainey> fao89: maybe task?
15:35:48 <ttereshc> plugins are not allowed to use anything outside plugin api
15:35:56 <ttereshc> is it moving within plugin api?
15:36:02 <fao89> #idea Proposed for #7908: convert to task
15:36:02 <fao89> !propose other convert to task
15:36:02 <pulpbot> fao89: Proposed for #7908: convert to task
15:37:05 <ggainey> ttereshc: that's a good point, I think the one exception I *know* is involved should be in plugins but isn't currently
15:37:13 <daviddavis> I see at least one UnsupportedDigestValidationError
15:37:21 <daviddavis> it's in pulpcore.plugin.models
15:37:30 <ggainey> ah right, ok
15:37:34 <ggainey> daviddavis++
15:37:34 <pulpbot> ggainey: daviddavis's karma is now 401
15:37:34 <ttereshc> I'm all for moving it into one place
15:37:39 <ttereshc> just without deprecation
15:38:20 <ggainey> ttereshc: daviddavis put his finger on the problem
15:39:03 <ttereshc> what do you mean by that?
15:39:56 <ggainey> ttereshc: the current location is exposed to plugins in pulpcore.plugin.models, and it needs to move from there
15:40:07 <ttereshc> ah ok
15:40:16 <daviddavis> so like I'm imagining we add the exception(s) to their final place in 3.9 and then remove them from their old place in 3.10
15:40:35 <ggainey> daviddavis: yupyup, sounds right
15:40:46 <ttereshc> yeah
15:40:48 <daviddavis> would you be able to do this this week? https://pulp.plan.io/issues/7908
15:41:04 <ggainey> daviddavis: yup, it's on my list for 'asap'
15:41:11 <daviddavis> ggainey++
15:41:11 <pulpbot> daviddavis: ggainey's karma is now 64
15:41:16 <ggainey> precisely so it can get into 3.9
15:41:26 <daviddavis> !propose convert to task and add to sprint
15:41:26 <pulpbot> daviddavis: Error: "propose" is not a valid command.
15:41:32 <daviddavis> I'll add the 3.9 milestone
15:41:36 <ggainey> thanks
15:41:56 <bmbouter> why not as the pr introducing the new exception?
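The 3.9/3.10 plan daviddavis describes is a standard Python deprecation shim: the exception gains its final home, while the old import location keeps working for one release but warns. A minimal sketch of that pattern — the module layout and helper name here are hypothetical, not pulpcore's actual code:

```python
import warnings

# Final home of the exception (standing in for pulpcore.plugin.exceptions):
class UnsupportedDigestValidationError(Exception):
    """Raised when content uses a digest type that cannot be validated."""


def _deprecated_alias(new_cls, old_path):
    """Build a subclass that warns when constructed via the old import path.

    Because it subclasses the real exception, existing `except` clauses
    keep catching it during the deprecation window.
    """
    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_path} is deprecated; import {new_cls.__name__} "
                "from its new location instead",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = new_cls.__name__
    return _Deprecated


# What the old module (standing in for pulpcore.plugin.models) would
# re-export in 3.9, before deletion in 3.10:
OldUnsupportedDigestValidationError = _deprecated_alias(
    UnsupportedDigestValidationError,
    "pulpcore.plugin.models.UnsupportedDigestValidationError",
)
```

Plugin code importing from the old path then keeps working in 3.9 with a visible warning, which is exactly the window the 3.9 milestone buys.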
15:41:57 <fao89> #idea Proposed for #7908: convert to task and add to sprint
15:41:57 <fao89> !propose other convert to task and add to sprint
15:41:57 <pulpbot> fao89: Proposed for #7908: convert to task and add to sprint
15:42:20 <ggainey> concur
15:42:22 <bmbouter> or maybe this is to move the other ones?
15:42:43 <ipanova> bmbouter: the other ones as well
15:42:43 <bmbouter> or maybe the value of this other issue is to have a plugin writer release note introducing it?
15:42:58 <bmbouter> that makes sense, ty
15:43:35 <fao89> #agreed convert to task and add to sprint
15:43:35 <fao89> !accept
15:43:35 <pulpbot> fao89: Current proposal accepted: convert to task and add to sprint
15:43:36 <fao89> #topic https://pulp.plan.io/issues/7907
15:43:37 <pulpbot> fao89: 5 issues left to triage: 7907, 7904, 7876, 7857, 5502
15:43:38 <pulpbot> RM 7907 - osapryki - NEW - Failed curate_synclist_repository task did not clean up properly resource reservations
15:43:39 <pulpbot> https://pulp.plan.io/issues/7907
15:44:26 <bmbouter> oh yeah I got contacted on slack this morning about this one
15:44:43 <ggainey> ow, painful
15:44:56 <ggainey> (the issue, not necessarily slack-contact :) )
15:44:59 <fao89> accept or accept and add?
15:45:42 <daviddavis> I lean towards accept and add. maybe even high severity?
15:45:49 <daviddavis> this puts the system in an unusable state
15:45:53 <bmbouter> agreed, but the issue is what to dooooo
15:46:04 <fao89> #idea Proposed for #7907: accept and add to sprint
15:46:04 <fao89> !propose other accept and add to sprint
15:46:04 <pulpbot> fao89: Proposed for #7907: accept and add to sprint
15:46:05 <dkliban> yeah
15:46:07 <bmbouter> we know that redis queues are unstable
15:46:08 <dkliban> #info dkliban has joined triage
15:46:08 <dkliban> !here
15:46:08 <pulpbot> dkliban: dkliban has joined triage
15:46:18 <bmbouter> this is not new information
15:46:34 <ipanova> apparently this happens only when there is 1 worker
15:46:45 <ttereshc> any cleanup we can do on task failure? or does it happen at another point?
15:46:55 <bmbouter> we know that RQ doesn't support clustered redis to guard against redis failure
15:47:03 <fao89> should we make workers=2 the default?
15:47:05 <bmbouter> the issue is the task has not failed...
15:47:13 <bmbouter> worker count is not part of the issue I believe
15:47:20 <ttereshc> I guess the full blockage is easier to reproduce with one worker
15:47:29 <ttereshc> but eventually all can be blocked probably
15:47:31 <bmbouter> true
15:48:07 <bmbouter> so here's the issue ... redis fails and the task is lost from redis, the worker stays running and heartbeating so pulp can never know that there was a problem
15:48:13 <bmbouter> the heartbeats flow to postgresql
15:48:48 <bmbouter> this is one main motivation why I think we should eliminate the resource manager and move to tasking recorded entirely in postgresql
15:49:11 <bmbouter> because our correctness is "split" across two systems, and note RQ doesn't support clustered redis so redis is a single point of failure
15:49:29 <bmbouter> so one of those systems is unreliable and that is beyond our control (or influence)
15:49:50 <dkliban> makes sense
15:50:00 <bmbouter> fwiw I think this is a dupe of https://pulp.plan.io/issues/7386
15:50:42 <bmbouter> so this is triage and we can move on but the thing is we can't just "put it on the sprint without more action"
15:50:57 <ttereshc> can we periodically check with redis if the task is there and act accordingly? It's not the solution but maybe a less invasive workaround while we have a resource_manager.
15:51:46 <bmbouter> we could probably have the resource manager do that checking
15:51:57 <bmbouter> but then we'll get another bug I suspect: "pulp mysteriously cancels my tasks"
15:52:23 <ttereshc> heh, yeah
15:52:33 <bmbouter> the args and kwargs live in the redis task alone so we can't redispatch
15:52:37 <ipanova> let's just triage it for now, and have more discussion on the issue?
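ttereshc's workaround could take roughly this shape: Pulp's database knows which tasks it believes are running, and RQ keeps each job under a Redis key of the form `rq:job:<job_id>`, so a missing key means Redis lost the job. A minimal sketch with hypothetical helper names — the real resource manager code would look different, and as bmbouter notes, acting on the result risks cancelling tasks spuriously during a brief Redis outage:

```python
# RQ stores each job's data in a Redis hash keyed "rq:job:<job_id>".
RQ_JOB_KEY = "rq:job:{job_id}"


def job_lost(redis_conn, job_id):
    """Return True if the RQ job hash is gone from Redis."""
    return redis_conn.exists(RQ_JOB_KEY.format(job_id=job_id)) == 0


def find_lost_tasks(redis_conn, running_task_ids):
    """Return IDs of tasks marked running in postgresql but missing in Redis.

    A periodic check in the resource manager could mark these failed,
    since the worker's heartbeats alone will never reveal the loss --
    the heartbeats flow to postgresql, not Redis.
    """
    return [tid for tid in running_task_ids if job_lost(redis_conn, tid)]
```

Note the limitation raised in the discussion still applies: the task's args and kwargs lived only in the lost Redis entry, so detection allows failing the task, not redispatching it.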
15:52:54 <ggainey> sounds fair
15:52:54 <bmbouter> the problem is we're getting "stern talkings-to" about this
15:53:05 <dalley> bmbouter, an alternative is to have the heartbeats record in redis
15:53:25 <bmbouter> that's possible, but I think that further splits our system
15:53:40 <dalley> indeed
15:53:51 <bmbouter> I can schedule a 30 min call to discuss if folks would want to join
15:54:00 <bmbouter> or discuss on the issue
15:54:21 <bmbouter> the only viable road I see is resource manager elimination and removal of redis from the architecture
15:54:30 <dalley> I don't have any objection to a queue design based on postgresql, we don't have high throughput requirements (where high is measured in terms of tasks/second)
15:54:31 <bmbouter> I can sufficiently motivate this on the issue, I guess that would be a good next step
15:55:00 <bmbouter> let me post on the issue and send out to pulp-dev for feedback, how does that sound?
15:55:06 <ggainey> +1
15:55:10 <ttereshc> I'm also not against removing the resource manager
15:55:16 <ttereshc> +1 thank you bmbouter
15:55:22 <daviddavis> +1
15:55:31 <ttereshc> I was more thinking about what can be done if we need to fix it asap
15:55:31 <bmbouter> yeah I perceive that, the proposal of a less invasive workaround is a good one
15:55:53 <ipanova> +1 to also having a call to talk about the resource manager elimination
15:56:06 <bmbouter> I will include the workaround in my post also, perhaps we'll do both
15:56:20 <bmbouter> and schedule a call following the pulp-dev posting
15:56:31 <ipanova> ty
15:56:39 <fao89> which action should I take? accept? skip? accept and add?
15:56:48 <bmbouter> accept w/ high prio and I'll post on it
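A postgresql-only queue of the kind dalley and bmbouter discuss is commonly built on `SELECT ... FOR UPDATE SKIP LOCKED`, which lets concurrent workers claim tasks atomically without a broker between them. A minimal sketch — the table and column names are hypothetical, not Pulp's schema, and this is one possible design rather than what Pulp later shipped:

```python
# Workers race to claim the oldest waiting task. SKIP LOCKED makes each
# worker skip rows another transaction has already locked, so no two
# workers can claim the same task, and postgresql remains the single
# source of truth (no Redis, no resource manager).
CLAIM_NEXT_TASK_SQL = """
UPDATE tasks
   SET state = 'running', worker = %(worker)s
 WHERE id = (
        SELECT id
          FROM tasks
         WHERE state = 'waiting'
         ORDER BY created_at
           FOR UPDATE SKIP LOCKED
         LIMIT 1
       )
RETURNING id, name, args
"""


def claim_next_task(cursor, worker_name):
    """Atomically claim one waiting task; return None if the queue is empty.

    Expects a DB-API cursor (e.g. psycopg2) inside an open transaction;
    args/kwargs live in the claimed row, so a crashed worker's task can
    be requeued from the database alone.
    """
    cursor.execute(CLAIM_NEXT_TASK_SQL, {"worker": worker_name})
    return cursor.fetchone()
```

Keeping the task's args in the row addresses the redispatch problem from the #7907 discussion: nothing is lost if the queue's only backing store is the database the heartbeats already flow to.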
15:57:05 <dalley> minorly related to https://pulp.plan.io/issues/4343 also perhaps
15:57:18 <fao89> #idea Proposed for #7907: accept with high prio
15:57:18 <fao89> !propose other accept with high prio
15:57:18 <pulpbot> fao89: Proposed for #7907: accept with high prio
15:57:25 <fao89> #agreed accept with high prio
15:57:25 <fao89> !accept
15:57:25 <pulpbot> fao89: Current proposal accepted: accept with high prio
15:57:26 <fao89> #topic https://pulp.plan.io/issues/7904
15:57:27 <pulpbot> fao89: 4 issues left to triage: 7904, 7876, 7857, 5502
15:57:28 <pulpbot> RM 7904 - ggainey - NEW - PulpImport can deadlock when importing Centos*-base and app-stream in one import file
15:57:29 <pulpbot> https://pulp.plan.io/issues/7904
15:57:47 <daviddavis> accept and add to sprint I think
15:57:51 <ggainey> I hit this yesterday - opened the issue to record the scenario and failure-output
15:57:58 <ggainey> aye
15:58:12 <fao89> #idea Proposed for #7904: accept and add to sprint
15:58:12 <fao89> !propose other accept and add to sprint
15:58:12 <pulpbot> fao89: Proposed for #7904: accept and add to sprint
15:58:24 <fao89> #agreed accept and add to sprint
15:58:24 <fao89> !accept
15:58:24 <pulpbot> fao89: Current proposal accepted: accept and add to sprint
15:58:25 <fao89> #topic https://pulp.plan.io/issues/7876
15:58:25 <pulpbot> fao89: 3 issues left to triage: 7876, 7857, 5502
15:58:26 <pulpbot> RM 7876 - adam.winberg@smhi.se - NEW - 'NoneType' object has no attribute 'pk'
15:58:27 <pulpbot> https://pulp.plan.io/issues/7876
15:59:11 <ttereshc> we discussed this one last week at the pulpcore meeting and agreed to fix it in pulpcore
15:59:33 <ttereshc> the issue can be reproduced using the migration plugin
15:59:34 <fao89> #idea Proposed for #7876: Leave the issue as-is, accepting its current state.
15:59:34 <fao89> !propose accept
15:59:34 <pulpbot> fao89: Proposed for #7876: Leave the issue as-is, accepting its current state.
15:59:43 <dkliban> +1
15:59:47 <ggainey> +1
15:59:56 <dkliban> ttereshc: add to sprint?
16:00:49 <ipanova> +1 not against adding to sprint
16:00:59 <ttereshc> can be
16:01:04 <ipanova> the fix should be small and easy if i remember correctly
16:01:16 <ttereshc> it's not the highest priority but it needs to be fixed
16:01:22 <fao89> #idea Proposed for #7876: accept and add to sprint
16:01:22 <fao89> !propose other accept and add to sprint
16:01:22 <pulpbot> fao89: Proposed for #7876: accept and add to sprint
16:01:26 <fao89> #agreed accept and add to sprint
16:01:26 <fao89> !accept
16:01:27 <pulpbot> fao89: Current proposal accepted: accept and add to sprint
16:01:27 <fao89> #topic https://pulp.plan.io/issues/7857
16:01:28 <pulpbot> fao89: 2 issues left to triage: 7857, 5502
16:01:29 <pulpbot> RM 7857 - jsherril@redhat.com - NEW - 502 proxy error when trying to yum install a package
16:01:30 <pulpbot> https://pulp.plan.io/issues/7857
16:03:14 <ggainey> not a lot to go on in the issue
16:03:37 <fao89> we skipped it last time, but I don't remember if we had an AI
16:04:57 <fao89> skip again?
16:06:03 <dkliban> could this be related to the django cleanup integration we had?
16:06:06 <ttereshc> we can close it, asking to provide more info to reproduce if someone encounters it again
16:06:20 <ttereshc> dkliban, it was introduced in 3.7
16:06:23 <dkliban> gotcha
16:06:24 <ttereshc> we ruled it out last time
16:06:45 <dkliban> yeah ... let's close it out for now
16:07:04 <ttereshc> fao89, I can do it if you want
16:07:20 <fao89> please, do it
16:07:24 <ipanova> i found in the logs that we meant to ping jsherrill and ask for more info about the setup
16:07:49 <ttereshc> we can close and still do it I think
16:07:50 <fao89> #idea Proposed for #7857: close and ask for more info
16:07:50 <fao89> !propose other close and ask for more info
16:07:50 <pulpbot> fao89: Proposed for #7857: close and ask for more info
16:07:59 <ipanova> ok
16:08:26 <fao89> #agreed close and ask for more info
16:08:26 <fao89> !accept
16:08:26 <pulpbot> fao89: Current proposal accepted: close and ask for more info
16:08:27 <pulpbot> fao89: 1 issue left to triage: 5502
16:08:27 <fao89> #topic https://pulp.plan.io/issues/5502
16:08:28 <pulpbot> RM 5502 - mihai.ibanescu@gmail.com - ASSIGNED - worker fails to mark a task as failed when critical conditions are encountered
16:08:29 <pulpbot> https://pulp.plan.io/issues/5502
16:09:10 <fao89> bmbouter: ^
16:09:17 <ipanova> this already has a pr
16:09:50 <ipanova> just need to update the commit message
16:09:56 <ipanova> let's triage it and put it on the sprint
16:10:08 * bmbouter looks
16:10:23 <fao89> #idea Proposed for #5502: accept and add to sprint
16:10:23 <fao89> !propose other accept and add to sprint
16:10:23 <pulpbot> fao89: Proposed for #5502: accept and add to sprint
16:10:29 <bmbouter> yes this PR is open
16:10:52 <ttereshc> +1
16:10:53 <fao89> #agreed accept and add to sprint
16:10:53 <fao89> !accept
16:10:53 <pulpbot> fao89: Current proposal accepted: accept and add to sprint
16:10:54 <pulpbot> fao89: No issues to triage.
16:10:54 <bmbouter> I need to add a changelog to it
16:10:57 <bmbouter> +1
16:11:11 <fao89> #endmeeting