14:32:00 #startmeeting Pulp Triage 2020-07-28 14:32:00 !start 14:32:00 #info fao89 has joined triage 14:32:00 Meeting started Tue Jul 28 14:32:00 2020 UTC. The chair is fao89. Information about MeetBot at http://wiki.debian.org/MeetBot. 14:32:00 Useful Commands: #action #agreed #help #info #idea #link #topic. 14:32:00 The meeting name has been set to 'pulp_triage_2020-07-28' 14:32:00 fao89: fao89 has joined triage 14:32:04 #info dkliban has joined triage 14:32:04 !here 14:32:04 dkliban: dkliban has joined triage 14:32:05 #info daviddavis has joined triage 14:32:05 !here 14:32:06 daviddavis: daviddavis has joined triage 14:32:07 #info ppicka has joined triage 14:32:07 !here 14:32:07 ppicka: ppicka has joined triage 14:32:12 !next 14:32:13 #topic https://pulp.plan.io/issues/7220 14:32:13 fao89: 7 issues left to triage: 7220, 7212, 7205, 7185, 7178, 7147, 7110 14:32:15 RM 7220 - OnceUponALoop - NEW - Synchronization Failure due to RAM/Swap exhaustion 14:32:16 https://pulp.plan.io/issues/7220 14:32:30 #info ggainey has joined triage 14:32:30 !here 14:32:31 ggainey: ggainey has joined triage 14:32:41 #idea Proposed for #7220: move to rpm plugin 14:32:41 !propose other move to rpm plugin 14:32:41 fao89: Proposed for #7220: move to rpm plugin 14:32:45 I think this is the problem that we could/should fix https://pulp.plan.io/issues/7220#note-1 14:32:54 #info x9c4 has joined triage 14:32:54 !here 14:32:54 x9c4: x9c4 has joined triage 14:33:10 yeah ... the sync failed in the middle and then the user can't sync again 14:33:25 there's nothing we can do if there's not enough RAM/swap but optimally we shouldn't leave a repo in a bad state when a sync fails 14:33:30 yup 14:33:37 daviddavis: is it fixable tho? sync didn't fail so much as have its brains blown out by OOMKiller 14:33:48 yeah 14:33:59 the problem is that on repo, the next_version wasn't incremented 14:34:01 ggainey: we could do some kind of cleanup on th next sync 14:34:08 but there was a repo version with next_number 14:34:11 dkliban: ah, ok makes sense 14:34:12 #info bmbouter has joined triage 14:34:12 !here 14:34:12 bmbouter: bmbouter has joined triage 14:34:14 so when the ram/swap problem was fixed 14:34:20 he still couldn't sync 14:34:34 i'll comment on the issue 14:34:37 the repo verison shouldn't be complete 14:34:41 it wasn't 14:34:46 daviddavis: so make "create next repo-version" and "increment next_number" be atomic 14:34:46 but it still existed 14:34:48 I expecte complete=False 14:34:57 bmbouter: yea, that's correct 14:34:59 we are not doing some kind of cleanup 14:35:07 ggainey: that's one solution 14:35:11 yeah ok, sounds like there's def a path fwd 14:35:14 #info ttereshc has joined triage 14:35:14 !here 14:35:14 ttereshc: ttereshc has joined triage 14:35:18 one way or t'other 14:35:38 we create a new repo version before sync but only update the next_version on repo after sync is done 14:35:45 so this should stay in core, yeah? 14:35:50 yes I think so 14:36:03 let's accept and let dkliban comment 14:36:08 +1 14:36:13 i just left a coment 14:36:18 +1 14:36:23 and yes all this is in core 14:36:26 #idea Proposed for #7220: Leave the issue as-is, accepting its current state. 14:36:26 !propose accept 14:36:26 fao89: Proposed for #7220: Leave the issue as-is, accepting its current state. 14:36:39 https://pulp.plan.io/issues/7220#note-2 14:36:40 #agreed Leave the issue as-is, accepting its current state. 14:36:40 !accept 14:36:40 fao89: Current proposal accepted: Leave the issue as-is, accepting its current state. 14:36:41 #topic https://pulp.plan.io/issues/7212 14:36:41 fao89: 6 issues left to triage: 7212, 7205, 7185, 7178, 7147, 7110 14:36:43 RM 7212 - bmbouter - ASSIGNED - Adjust download concurrency 14:36:44 https://pulp.plan.io/issues/7212 14:36:53 we are going to discuss this at open floor 14:36:56 +1 14:36:57 yeah 14:36:59 but i think we should accept and add to sprint 14:37:01 #info mikedep333 has joined triage 14:37:01 !here 14:37:01 mikedep333: mikedep333 has joined triage 14:37:04 +1 14:37:09 to accept-and-add 14:37:26 #idea Proposed for #7212: accept and add to sprint 14:37:26 !propose other accept and add to sprint 14:37:27 fao89: Proposed for #7212: accept and add to sprint 14:37:30 +1 14:37:40 #agreed accept and add to sprint 14:37:40 !accept 14:37:40 fao89: Current proposal accepted: accept and add to sprint 14:37:41 #topic https://pulp.plan.io/issues/7205 14:37:41 fao89: 5 issues left to triage: 7205, 7185, 7178, 7147, 7110 14:37:42 RM 7205 - pc - NEW - ClientConnectorSSLError during remote sync with cdn.redhat.com 14:37:43 https://pulp.plan.io/issues/7205 14:37:59 this is all related. the user has not replied to the previous comment yet 14:38:12 skip for now? 14:38:15 +1 14:38:19 +1 14:38:20 +1 14:38:21 !skip 14:38:22 #topic https://pulp.plan.io/issues/7185 14:38:22 fao89: 4 issues left to triage: 7185, 7178, 7147, 7110 14:38:24 RM 7185 - yuzheng - NEW - force_full rsync publish is done unnecessarily in some cases 14:38:25 https://pulp.plan.io/issues/7185 14:38:38 this is a pulp2 issue 14:38:43 I believe I saw a PR for this too 14:38:43 yes 14:39:00 let's accept 14:39:06 ok 14:39:11 #idea Proposed for #7185: Leave the issue as-is, accepting its current state. 14:39:11 !propose accept 14:39:11 fao89: Proposed for #7185: Leave the issue as-is, accepting its current state. 14:39:20 #agreed Leave the issue as-is, accepting its current state. 14:39:20 !accept 14:39:20 fao89: Current proposal accepted: Leave the issue as-is, accepting its current state. 14:39:21 #topic https://pulp.plan.io/issues/7178 14:39:21 fao89: 3 issues left to triage: 7178, 7147, 7110 14:39:22 RM 7178 - ekohl - POST - Recommended installation layout 14:39:23 https://pulp.plan.io/issues/7178 14:39:50 convert to task? 14:40:31 yes 14:40:44 #idea Proposed for #7178: convert to task 14:40:44 !propose other convert to task 14:40:44 fao89: Proposed for #7178: convert to task 14:40:48 #agreed convert to task 14:40:48 !accept 14:40:48 fao89: Current proposal accepted: convert to task 14:40:49 #topic https://pulp.plan.io/issues/7147 14:40:49 fao89: 2 issues left to triage: 7147, 7110 14:40:50 RM 7147 - ttereshc - NEW - AttributeError: 'Package' object has no attribute '_remote_artifact_saver_cas' 14:40:51 https://pulp.plan.io/issues/7147 14:41:00 accept and add to sprint 14:41:06 +1 14:41:12 +1 14:41:27 +1 and thanks 14:41:27 #idea Proposed for #7147: accept and add to sprint 14:41:27 !propose other accept and add to sprint 14:41:27 fao89: Proposed for #7147: accept and add to sprint 14:41:29 #agreed accept and add to sprint 14:41:29 !accept 14:41:29 fao89: Current proposal accepted: accept and add to sprint 14:41:30 #topic https://pulp.plan.io/issues/7110 14:41:30 fao89: 1 issues left to triage: 7110 14:41:32 RM 7110 - wibbit - NEW - pulpcore-client - packaging does not state that python 2.7 is not supported 14:41:33 https://pulp.plan.io/issues/7110 14:41:42 i updated this issue yesterday 14:42:07 your proposal sounds good to me 14:42:09 we should update teh packaging for pulpcore-client and plugins so that it shows on PyPI that only python 3.6 and later is supported 14:42:11 accept and add to sprint? 14:42:14 + 14:42:15 +1 14:42:16 1 14:42:21 +1 14:42:33 +1 14:42:44 #idea Proposed for #7110: accept and add to sprint 14:42:44 !propose other accept and add to sprint 14:42:44 fao89: Proposed for #7110: accept and add to sprint 14:42:45 #agreed accept and add to sprint 14:42:45 !accept 14:42:45 fao89: Current proposal accepted: accept and add to sprint 14:42:47 fao89: No issues to triage. 14:43:01 Open floor 14:43:13 https://hackmd.io/@pulp/triage/edit 14:43:18 https://hackmd.io/hIOjFsFiSkGJR7VqtAJ8eQ 14:43:22 sorry wrong window 14:43:28 CI status check 14:43:40 ooo lots of ci failures 14:43:56 pulp_rpm, pulp_container, pulp-openapi-generator, pulp-operator are failing 14:44:02 ow 14:44:09 pulp-rpm - ppicka and i are working on resolving that one 14:44:15 slowly, but we are working on it 14:44:18 rpm - cdn test, container - cherrypick 14:44:23 +1 14:44:47 openapi-generator and operator already have PRs 14:44:58 who can look into pulp-container? 14:45:04 x9c4: could you? 14:45:17 I can tomorrow 14:45:19 we may want to disable cherry-picking there for now 14:45:22 sounds good x9c4 14:45:30 x9c4++ 14:45:30 daviddavis: x9c4's karma is now 57 14:46:00 ok, so I think we're good 14:46:26 topic: Should we lower the download concurrency level? 14:46:28 pulp_container is cherrypick failed. 14:46:43 https://hackmd.io/@ggainey/pulp3_sync_concurrency 14:46:58 we discussed this a bit at the pulpcore team meeting 14:47:17 I think the top candidates were 10 and 5 14:47:19 bmbouter: you were hesitant to lower it to 5, what about to 10? 14:47:30 ggainey++ 14:47:30 fao89: ggainey's karma is now 38 14:47:38 I would be comfortable with 10 14:47:39 nice report! 14:47:43 thanks :) 14:47:50 so I think https://pulp.plan.io/issues/7186#note-4 motivates we should do something 14:47:53 bmbouter++ 14:47:53 daviddavis: bmbouter's karma is now 281 14:47:56 err 14:47:59 if we lower it and lower it again later that's not the worst thing in the world 14:48:02 ggainey++ 14:48:02 daviddavis: ggainey's karma is now 39 14:48:04 lol thanks daviddavis!!! 14:48:06 haha 14:48:11 hehe 14:48:29 yea, it sounds like 10 might work at least in https://pulp.plan.io/issues/7186#note-2 14:48:50 granted it's the default we're discussing, so remotes created with an old default will keep it 14:48:54 yeah ... let's go with 10 ... and if katello wants to go with lower defaults, they can do that also 14:48:56 yeah, 10 cuts the demand in half for only a ~30% hit (more or less) 14:49:07 +1 to 10 14:49:15 and do we want to provide a migration? 14:49:18 30% in exchange for reliability is a good trade 14:49:18 kk - can we add this issue to sprint 78, I'll take it 14:49:22 to lower 20 to 10? 14:49:29 I think a migration would be nice 14:49:30 in existing remotes 14:49:30 dkliban: ooo, good point 14:49:35 hm 14:49:41 so, fwiw I think we should also have an upper limit 14:49:47 making is parametrizable is not an option? 14:49:57 s/is/it 14:50:03 I like the idea of a migration, but somewhat hesitant to change params out from under the admin 14:50:12 fao89: right now users can choose a number, but we provide 20 as the default value if they dont' provide one 14:50:25 ggainey: I conquer 14:50:25 fao89: or are you asking about the migration? 14:50:25 also practically speaking it would be a migration for each plugin right? 14:50:26 i am not sure a migration is needed. if user would want he would update this on his own, thoughts? 14:50:40 ipanova: I conquer 14:50:42 we could migrate only if the user didn't change it from the default of 20? 14:50:46 also i do not think we should have an upper limit 14:50:55 s/conquer/concur/ 14:51:13 haha 14:51:26 ipanova: yeah, the more I think about it the less I want to change things w/a migration, since update is an option for the user 14:51:28 I agree no upper limit, if users want to ddos their are ample tools they can use Pulp isn't making the world less safe by restricting our users 14:51:31 let's not do a migration and simply provide release notes that advise users to change on their own 14:51:42 that sounds like a great plan 14:51:46 yep 14:51:48 dalley: we're assuming '20' means 'user didn't know what they were doing and took the default' 14:51:52 which may not be the case 14:51:57 dkliban: +1 14:52:06 ok but let's see how this plays out with katello w/o a migration 14:52:42 they will all have remotes that are 20 then 14:52:42 one option: they could write their own migration 14:52:48 they could or we could provide one 14:52:53 yup 14:53:09 I hear not changing it from underneath admins, but it would only be in cases where admins didn't set their expectations to start with 14:53:18 my concern is the practical challenge of shipping a migration only 14:53:21 is there an easy way fro users to adjust it in bulk? 14:53:38 there isn't, the PUT/PATCH must be one by one :/ 14:53:43 pulpcore-manager -c ... 14:53:50 I think you can only change it one remote at a time (unless you go into the db and do it via SQL UPDATE directly) 14:54:20 that seems to be a painful user experience, maybe we can provide a script example or something 14:54:26 I think ggainey report should be turned into a blog post, and we refer to it on the docs 14:54:34 +1 14:54:39 +1 14:54:47 +1 to blog post, can we have a migration instead of additional docs/scripts 14:55:16 this would be a migration in pulpcore, yeah? (since download_concurrency is held in core_remote) 14:55:17 migration that only changes 20 to 10? 14:55:25 yes only 20 to 10 14:55:35 if users set other than 20, they've expressed intent and we should not adjust 14:55:40 concur 14:55:45 conquer 14:55:50 concur even 14:55:52 lol 14:55:55 heh 14:56:00 i am not opposed 14:56:31 the tricky part to me is I think we'll have to ship one for each plugin? 14:56:37 or could we make a fancy pulpcore one... 14:56:52 the field lives in pulpcore so we don't need one for every plugin 14:56:57 it's on the master model 14:57:02 oh that doesn't end up on the detail table 14:57:09 yup 14:57:13 excellentttt 14:57:20 having concurrency set to 20 does not 100% mean sync will fail, that's why i think we should not provide any migration and let the user change on this own when he sees any sync failures. 14:57:33 I agree 14:57:35 yupyup 14:57:37 ipanova: true but with that thinking we can leave it at 20 14:57:39 it's like a one line update: pulpcore-manager shell_plus -c "Remote.objects.all().filter(download_concurrency=20).update(download_concurrency=10)" 14:57:51 as in why change it at all 14:58:14 bmbouter: we are changing to default of 10 for the best practices 14:58:30 I can't see how it's right to change for the future but not existing objects 14:59:05 because changing it for the future won't interfere with data that users may explicitly want 14:59:17 but they by definition don't explicitly want it 14:59:28 we don't know that for sure 14:59:32 well, for future objects we *know* "user took defaults" - for current objects, we're guessing that 20 means "didn't set a concurrency value" 14:59:44 yeah 14:59:49 bmbouter: i was just highlighting reasons of not doing a migration if we find this painful 14:59:53 i think it's pretty safe to change 20 to 10 for existing objects 15:00:12 I think it's pretty safe but I don't think we should 15:00:16 I don't have astrong opinion either way RE migration, but I *do* think 20 as default is...a bit greedy of us 15:00:26 since user took no action. however, we shouhld once again provide docs to say - here is how you can change it back to 20 15:00:51 +1 15:01:00 maybe I should disagree and commit but I don't understand the motivation for leaving it at a value we identified as unsafe 15:01:19 we don't have to do it my way we should do it the way we all feel is best 15:01:23 it's not always unsafe. I've used 20 numerous times without problems. 15:01:27 it's not necessarily unsafe, but it is greedy/excessive and can cause problems in some cases 15:01:31 yeah 15:01:42 then maybe we should leave the default alone? 15:01:57 no, I think 10 is a better default 15:01:59 it's higher performing by 30% 15:02:09 if you all don't mind, I'm going to ask the bandersnatch dev cooperlees, who works in networking at facebook, if he has any input on # of max connections 15:02:34 no, I think 20 is not a great default. the only problem I have w/a migration, is we do not *know* that a user w/a concurrency of 20 "just took the default" - that might be exactly what they want. A migration is guessing 15:02:47 dalley: +1 15:02:49 dalley: that's good but that's just the server side perspective, we need to balance that against the client side throughput experience 15:03:21 from the server side perspective the number will be 1, 2, or 3 15:04:17 true 15:04:52 ok let me ask you all, can we take a quick poll on the yes/no migration 15:05:05 sure, yea 15:05:17 what about changing to 10 only, no migration. in the release notes/docs write " the default has changed, if you experience issues with sync then downgrade to 10 on existing remote" 15:05:28 bmbouter: fine with me 15:05:34 ok, my US$0.02 here is, 20 is excessive for a default from us, 10 is more reasonable, a migration would be useful except where it isn't desired and we can't really know the difference a-priori - but I don't feel strongly about the migration-or-not 15:05:43 or even provide the command I wrote above 15:05:51 ok so let's poll on the default change 15:05:53 +1 to ipanova suggestion 15:05:57 +1 15:06:00 I am in favor of 10 as the default 15:06:08 me too 15:06:09 the release note could link to the blog post 15:06:31 ipanova: logically that doesn't make sense to me from a reasoning perspective we could just leave it at 10 and just put that line in our docs 15:06:41 leave it at 20 15:06:43 tbh, I still think we should have a conservative default and have individual plugins adjust the # higher if their remote backend can handle it 15:07:11 bmbouter: we're changing the default for the folks who never read the docs 15:07:19 dalley: that would 300% slow down pulp 15:07:37 only if plugins didn't adjust things 15:07:42 +1 to 10 + good docs , I'm also +1 to a migration because the majority are not consciously using that concurrency level 15:07:42 yes, some of those already have remotes, at 20, that they won't change unless/until they have aproblem and ask or search out this issue (in which case it's good to have anote in the docs) 15:07:43 anyway, 10 is OK with me if it's the consensus 15:07:47 +0 15:08:13 +1 to migration with the change to 10 15:08:31 +1 with docs (no migration) and change to 10 15:08:48 +1 to migraiton w/ change to 10 15:08:51 +1 to 10 and no migration 15:09:09 ha :) someone needs to count votes :) 15:09:40 I'm putting the items at hackmd, so you can put +1 there 15:09:42 agreed it would be great if we could reach an agreement (whatever that is) 15:09:45 ahh great 15:10:17 but I don't remember all the options 15:10:23 ipanova fa089 ggainey ddavis ttereshc 10-default no migration, dkliban bmbouter 10-default migration, dalley more-conservative default no opinion on migration 15:10:40 I thin kthat's what we all said, but it's been a busy discussion :) 15:11:01 did I miss anyone? 15:12:07 I think from the user experience perspective, it's better if they reach out or see in the docs how to potentially improve performance by experimenting with concurrency levels, rather than they run into an error and search for a solution. 15:12:31 this is my reason to +1 the migraiton 15:12:59 I'm still unsure about the precedent we set by changing a users data without their consent 15:13:08 that seems like a bad practice 15:13:09 there is another side of the coin from user prespective my sync for this remote was ok and was taking 5 mins and now it is taking 10 mins? 15:13:14 ttereshc: good thing we've moved to the agenda-doc, I see I misread your vote, sorry about that 15:13:14 ttereshc: ^ 15:14:02 ipanova, it could happen anyway with some features or adjustments we make 15:14:22 this is the case when they can learn about the concurrency levels and adjust if needed 15:14:36 + they will be warned what kind of issues they can run into 15:15:05 daviddavis, I hear you on the adjusting user data 15:15:29 not sure how to gauge if it's worth it or not, it's subjective 15:15:46 I believe we all agreed with the blog post, so I think we can do it for now, and discuss the migration(or not) in the future 15:16:04 that sounds good to me 15:16:06 fao89: I'd like to have that flattened so we can actually address the issue 15:16:07 we don't have to call the decision now, but I'd like to have a plan before we make an adjustment 15:16:10 same 15:16:24 also, I want to discuss more with a bigger audience like pulp-dev 15:16:33 we have 4 'migrate', 3 'not migrate 15:16:41 daviddavis: +1 on that 15:16:50 is this worth the amount of time we're spending here? 15:16:58 maybe we will get more insights 15:17:07 let's send a note to pulp-dev after this meeting 15:17:14 and come back to it on friday 15:17:17 +1 I can do it 15:17:21 daviddavis++ 15:17:21 dkliban: daviddavis's karma is now 340 15:17:29 sounds good to me 15:17:47 next topic? 15:17:49 ggainey: not sure about this, but in the end of the day it is important that everyone got heard 15:17:57 sure 15:18:32 ggainey: very often we spend more time on discussing something then just going ahead and doing it 15:18:51 fao89: yes please! 15:19:07 topic: Feedback on OpenAPI v3 - is everything working? 15:19:25 i haven't seen any problems yet, still waiting on katello to provide feedback 15:19:44 jsherril mentioned some failing tests 15:19:52 but he had not had a chance to look into it 15:19:55 it is strange not seeing bugs reported so far 15:20:15 give it a little more time 15:20:23 exactly 15:20:26 and i am sure after 3.6.0 is released we will get some 15:20:33 aye 15:20:55 ok, moving to the next topic 15:21:07 topic: Python bindings docs - host on read the docs or pulpproject.org? 15:21:24 I have a PR open, with a preview: https://github.com/pulp/pulpcore/pull/808 15:21:36 yes, i saw these PRs 15:21:43 for both pulpcore and pulp_file 15:21:46 oh yeah me too 15:21:52 I'm very excited about these docs 15:21:57 the right link: https://github.com/pulp/pulpcore/pull/805 15:21:58 we had previously discussed publishing to pulpproject.org instead of read the docs 15:22:10 well to version control 15:22:12 and i had an action item to find out what it would take 15:22:20 https://pulpcore-client.readthedocs.io/en/latest/ 15:22:48 I have a 3rd proposal as well. 15:23:01 besides read the docs and pulpproject.org 15:23:01 the biggest difference between the two approaches is that one requires comitting these docs to git (read the docs) and the other doesnt (pulpproject.org) 15:23:17 daviddavis: what is that option?? 15:23:39