14:32:19 #startmeeting Pulp Triage 2020-07-14
14:32:19 !start
14:32:19 #info fao89 has joined triage
14:32:19 Meeting started Tue Jul 14 14:32:19 2020 UTC. The chair is fao89. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:32:19 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:32:20 The meeting name has been set to 'pulp_triage_2020-07-14'
14:32:20 fao89: fao89 has joined triage
14:32:36 #info ppicka has joined triage
14:32:36 !here
14:32:36 ppicka: ppicka has joined triage
14:32:47 #info ggainey has joined triage
14:32:47 !here
14:32:47 ggainey: ggainey has joined triage
14:33:01 #info dkliban has joined triage
14:33:01 !here
14:33:01 dkliban: dkliban has joined triage
14:33:05 !next
14:33:06 #topic https://pulp.plan.io/issues/7127
14:33:06 fao89: 2 issues left to triage: 7127, 7110
14:33:07 RM 7127 - newswangerd - NEW - Add object labels
14:33:08 https://pulp.plan.io/issues/7127
14:33:18 #idea Proposed for #7127: convert to story
14:33:18 !propose other convert to story
14:33:18 dkliban: Proposed for #7127: convert to story
14:33:24 #info daviddavis has joined triage
14:33:24 !here
14:33:24 daviddavis: daviddavis has joined triage
14:33:27 #info ttereshc has joined triage
14:33:27 !here
14:33:27 ttereshc: ttereshc has joined triage
14:33:35 +1 to story
14:33:41 #agreed convert to story
14:33:41 !accept
14:33:41 fao89: Current proposal accepted: convert to story
14:33:42 #topic https://pulp.plan.io/issues/7110
14:33:42 fao89: 1 issue left to triage: 7110
14:33:43 RM 7110 - wibbit - NEW - pulpcore-client - Artifact upload fails when using python27
14:33:44 https://pulp.plan.io/issues/7110
14:33:54 I was supposed to comment on this one and close it
14:33:57 I'll do that now
14:34:02 +1
14:34:08 #idea Proposed for #7110: dkliban to comment and close as WONT FIX
14:34:08 !propose other dkliban to comment and close as WONT FIX
14:34:08 dkliban: Proposed for #7110: dkliban to comment and close as WONT FIX
14:34:13 #agreed dkliban to comment and close as WONT FIX
14:34:13 !accept
14:34:13 fao89: Current proposal accepted: dkliban to comment and close as WONT FIX
14:34:14 thanks
14:34:15 fao89: No issues to triage.
14:34:18 so we're not supporting py27 for bindings?
14:34:22 Open floor!
14:34:24 that's right
14:34:34 #info bmbouter has joined triage
14:34:34 !here
14:34:34 bmbouter: bmbouter has joined triage
14:34:36 +1
14:34:41 https://hackmd.io/SVCMjpwXTfOMqF2OeyyLRw
14:34:59 topic: zero downtime upgrades - https://pulp.plan.io/issues/7120
14:35:12 we talked about this last time, I wrote the issue up like we discussed
14:35:15 so here it is
14:35:45 #info x9c4 has joined triage
14:35:45 !here
14:35:45 x9c4: x9c4 has joined triage
14:36:28 bmbouter, is stopping some services acceptable?
14:37:38 ttereshc, you mean stop the workers for the upgrade?
14:38:01 ttereshc: the goal is to not stop any services until it's time to reload the code for them
14:38:11 so this could occur one server at a time
14:38:39 and the goal is to have old code be compatible with the latest db
14:38:48 yeah, I understand
14:38:49 old code = previous release
14:39:14 ttereshc: we want to stop no services except to restart them one by one in a clustered install
14:39:53 what about existing migrations, which are not split into additive and destructive changes?
14:40:14 we will support zero-downtime migrations after a certain release?
14:40:22 yes, after a certain release
14:40:39 how far does "old code" reach?
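For readers skimming the log: the exchange above is about splitting schema changes into additive and destructive migrations so that the previous release keeps working against the upgraded database. A minimal sketch of that split, assuming Django-style migrations with invented model and field names (not pulpcore code), might look like this:

```python
# A minimal sketch (not pulpcore code) of the additive/destructive split discussed
# above. Model and field names are invented. In a real project each class below
# would be `class Migration` in its own migrations file.

from django.db import migrations, models


class AdditiveMigration(migrations.Migration):
    """Ships in x.y.0: only adds schema, so nodes still running the previous
    release keep working against the already-migrated database."""

    dependencies = [("core", "0001_initial")]

    operations = [
        # Adding a nullable column is safe; old code simply never reads it.
        migrations.AddField(
            model_name="task",
            name="label",
            field=models.TextField(null=True),
        ),
    ]


class DestructiveMigration(migrations.Migration):
    """Deferred to a later release (e.g. x.y.1 or x.(y+1).0), applied once no
    running node still references the old column."""

    dependencies = [("core", "0002_add_label")]

    operations = [
        migrations.RemoveField(model_name="task", name="old_label"),
    ]
```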
14:40:41 whenever the CI is in place to check for them and the docs state the policy requirement
14:41:05 old code? like existing migrations
14:41:07 ?
14:41:30 no
14:41:34 no, can you upgrade from x.y.* to x.(y+1).0 ?
14:42:05 the no-downtime migration can only apply to x.y.z to x.y.z+1
14:42:19 actually ...
14:42:21 I thought it could go all the way
14:42:24 x.y to x.y+1
14:42:43 oh yeah that's right
14:42:46 dkliban: you're right
14:43:17 so users will have to go through each y version to safely upgrade with no downtime
14:43:22 the migrations would apply properly in all cases, it's just that the running application servers would break unexpectedly
14:43:32 yep
14:43:43 So we must prevent this.
14:43:54 do we?
14:44:14 x9c4: would that be an installer feature?
14:44:56 bmbouter, is it suggested to use this project https://pypi.org/project/django-pg-zero-downtime-migrations/ or is it only for understanding how things can be done?
14:44:59 like a preflight check? I thought maybe the migration itself refuses to apply if there is a resource using the db that is too old.
14:45:18 ttereshc: it's debatable if we should actually use it...
14:45:35 ttereshc: it primarily provides row-lock features at upgrade time, I'm not sure pulp actually needs those
14:45:57 x9c4: I don't think pulp migrations can be that smart
14:46:09 but the pre-flight maybe could be
14:46:21 yeah, the first look at it doesn't look promising, it's also not maintained much and in beta status. I'm just trying to understand what is proposed, to read more about it later
14:46:33 I'm not proposing to use that project
14:46:36 I think we could hook into the pulpcore migrate script and do some sanity checks.
14:46:51 I am proposing our CI use the regex scripts in that article though
14:47:07 I am all for CI checks in this area
14:47:17 which look for single-operation migrations that are zero-downtime-unsafe
14:48:01 Another thought. If we did the remove part of the migration only on x.y.1, we could allow upgrades from any z-stream to the next y-release.
14:48:08 x9c4: these checks in practice I think will be difficult... the db is one thing, application versions on a cluster are a plurality
14:48:19 x9c4: agreed!
14:49:28 And more importantly, you would not need to go through all z's for a bugfix.
14:49:34 yeah I think that's a must
14:50:00 yep
14:50:15 you can then safely specify the version as ~=x.y.0
14:51:46 about checking the running application parts. We have some info about running components in the status api.
14:51:58 yep
14:52:13 the installer could definitely use that
14:52:32 And what is currently not running will then refuse to start if the migration level is not met.
14:53:25 x9c4: that sounds doable actually but it doesn't prevent the concerning case from earlier with an already running app
14:53:32 As in "If it's not currently running, it's not subject to zero downtime anyway."
14:54:25 agreeeeed
14:54:27 if the already running app is reflected in the status api, it prevents the migration from running.
14:54:46 x9c4: in what cases would we not do the upgrade?
14:54:48 the issue is the status API only shows you what's on "that specific server" not the entire cluster
14:54:50 Until it is upgraded and restarted
14:55:08 I see.
14:55:14 bmbouter: we need to fix that
14:55:16 architecturally we just don't have the info
14:55:37 I agree but the "http workers" are not represented in there but they probably should be....
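The CI check bmbouter proposes above (regex scripts that flag single-operation migrations that are zero-downtime-unsafe) could look roughly like the sketch below; the operation list and script are illustrative assumptions, not the actual scripts from the referenced article or pulpcore tooling:

```python
# Rough sketch of a CI check that flags migration files containing operations
# generally considered unsafe for zero-downtime upgrades. Names and the set of
# operations are assumptions for illustration only.

import re
import sys
from pathlib import Path

# Operations that drop or rewrite things old application code may still rely on.
UNSAFE_OPERATIONS = re.compile(
    r"migrations\.(RemoveField|DeleteModel|RenameField|RenameModel|AlterField)\("
)


def check_migrations(root: str) -> int:
    """Return the number of potentially unsafe operations found under root."""
    failures = 0
    for path in Path(root).rglob("migrations/*.py"):
        text = path.read_text()
        for match in UNSAFE_OPERATIONS.finditer(text):
            print(f"{path}: potentially zero-downtime-unsafe operation {match.group(1)}")
            failures += 1
    return failures


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    sys.exit(1 if check_migrations(root) else 0)
```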
14:55:42 report on the installed profiles
14:56:05 so here's my take, there is a *lot* we are discussing here, what can we do to get to simple zero-downtime upgrades for the common case?
14:56:09 like the installer precheck
14:56:49 we need to add checks to the CI, we need to write up docs,
14:57:06 we need to add the installer check
14:57:22 those are the 3 things we need at a minimum
14:58:37 I would call the installer check an additional safety net, when we state in the docs that you need to have a certain version installed on all nodes in the cluster.
14:59:04 hey, we've spent 25 min discussing this topic and it seems like only a handful of people are involved
14:59:13 one step before the migrations is deciding whether to upgrade pulpcore, as each plugin has different release dates
15:00:37 daviddavis: what do you suggest as the next step?
15:01:11 this topic is somewhat related to upgrade tests
15:01:28 I agree
15:01:44 daviddavis: I'm ok w/ not everyone being involved as long as everyone is willing to accept the change
15:02:00 that last part is what I'm here to learn about actually
15:02:20 is the goal to accept/decide now?
15:02:33 * ttereshc would like to read about it
15:02:38 so I think we need to discuss it when Ina is back
15:02:47 yup, socializing the idea slowly is the approach, not to decide now
15:03:04 sweet
15:03:06 not sure if open floor, mailing list or specific meeting
15:03:29 we can advertise on the mailing list if you like and then follow up at open floor on Friday after Ina returns?
15:03:53 Ina is not back until next week
15:04:08 yup, I would even put it on Friday of next week to give her more time to catch up
15:04:30 this doesn't need to happen fast, it just needs a decision in the next few weeks to unblock the operator's roadmap
15:04:37 sounds good to me
15:04:45 +1 to advertise on a mailing list to give more people a heads up on reading and prepping for a discussion
15:04:53 +1
15:05:13 I can take the AI to send to the list and revise the issue a bit from this convo
15:05:22 bmbouter: that would be great
15:05:27 +1
15:06:03 will do
15:06:05 thanks
15:06:07 next topic?
15:06:18 yes
15:06:22 topic: do we use SessionAuthentication? https://github.com/pulp/pulpcore/blob/master/pulpcore/app/settings.py#L132
15:06:34 it is the last barrier for openapi v3
15:06:41 I think we do :/
15:06:58 there is a bug in the ruby client related to cookie auth
15:07:04 however, it can be resolved.
15:07:12 how is it used?
15:07:13 bmbouter: where do we provide a URL to start a session?
15:07:16 galaxy_ng I believe uses it; also I believe django-admin would break if we removed it, django-admin logs in once and then doesn't
15:07:20 django-admin login views
15:07:25 galaxy_ng login views
15:07:25 ah cool
15:07:30 that makes sense
15:07:37 ah
15:07:37 * bmbouter eats a cookie
15:07:45 we are going to get a fix for this in the openapi-generator
15:07:57 can we hide it from the schema-generator?
15:08:06 the problem now is: openapi generator has a bug generating cookieauth in the ruby bindings
15:08:07 x9c4: not without a hack
15:08:27 no one uses ruby
15:08:34 jk
15:08:43 ruby tuesday?
15:08:46 can we fix the bug?
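The setting under discussion is the DRF authentication list linked above. The snippet below shows the general shape of such a configuration; it is a sketch of the pattern, and the exact contents of pulpcore's settings.py may differ:

```python
# General shape of the DRF setting being discussed (a sketch, not a verbatim copy
# of pulpcore/app/settings.py).

REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": (
        # Session/cookie auth: used by login views such as django-admin's (and,
        # per the discussion, galaxy_ng's), and the reason cookie auth appears
        # in the generated OpenAPI schema.
        "rest_framework.authentication.SessionAuthentication",
        "rest_framework.authentication.BasicAuthentication",
    ),
}
```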
15:08:55 fao89: we should open up a PR against openapi-generator and in the meantime we should use a custom template in pulp-openapi-generator
15:09:01 daviddavis: yes
15:09:07 the bug: https://paste.centos.org/view/5fdddab4
15:09:27 line 14, openapi generator does not fill it
15:09:28 it's a bug in the template for the ruby configuration object
15:09:34 ah
15:09:57 +1 to dkliban's suggestion
15:10:02 concur
15:10:03 fao89: gerrod is already working on adding a custom template for the configuration.rb and configuration.py so we will be able to fix this ourselves
15:10:35 the template that gerrod is adding is going to address https://pulp.plan.io/issues/5932
15:10:41 +1 to fixing bugs.
15:10:56 and in that template we will address this bug
15:11:00 also
15:11:09 sounds good to me
15:11:12 +1
15:11:24 awesome
15:11:35 with this, I say the OpenAPI v3 PRs are ready for review
15:11:44 * ggainey cheers wildly
15:11:48 https://github.com/pulp/pulpcore/pull/785
15:11:51 fao89: I'll take a look soon
15:13:03 * x9c4 dancing
15:13:11 next topic: call for more input: API designs for RBAC
15:13:18 I just emailed this out
15:13:57 I am thinking it would just be more usable to have the sub-resource named 'access_policy', that would be /pulp/api/v3/remotes/file/file/access_policy/ for example
15:14:30 we could *also* have a /pulp/api/v3/access_policy/ so that users could see all policies in one place as a helper for audit purposes
15:14:35 this is kind of a hybrid pitch
15:14:48 that second proposal seems more usable to me
15:15:13 even if it's just a list
15:16:36 it's great for read purposes
15:16:42 and maybe even write purposes ...
15:16:52 I think I'm stuck on the "url" field that would be in those models
15:17:00 model instances
15:17:20 is there ever a time when a pulp system has its API prefix modified?
15:17:45 hmm
15:17:48 I don't think so
15:17:49 could we provide some strong validation to avoid mistakes and for users to have a list of urls to pick from? then option 2 will be fine I think
15:18:30 ttereshc: well that's the other issue, I don't think we can do that
15:19:06 well it could be possible if plugin writers provided all of them explicitly and pulpcore aggregated them ...
15:19:14 so yes, yes we could do that :)
15:19:18 :)
15:19:44 why can't we get all the urls for viewsets? is it that we can only get full endpoints?
15:20:03 from the router or from some kind of introspection
15:20:06 maybe we can
15:20:30 plugins will need to ship a default one though, so they couldn't do that unless they allow_list that url in the CHOICES field
15:20:44 a default access policy, and create the access policy instance via a data migration
15:21:11 part of the issue of auto-gathering them is that not all urls may be access guarded, e.g. /pulp/api/v3/status isn't?
15:21:15 but maybe it should be?
15:21:44 this gives me enough to work with I think though. go with all the access_policy instances at one API, and have a solid CHOICES field
15:22:27 I gotta go
15:22:35 bmbouter, on the other hand, if we accept only patch and no post, users can't create wrong/bad urls
15:22:41 oops, put
15:22:50 true! I hadn't thought about that
15:23:21 alright, I'm making the single-viewset option because it's easier to make, we know it's useful, and we can add more to it later
15:23:22 so maybe option 2 is good as is
15:23:23 sound ok?
15:23:42 yes, sounds great
15:30:20 #endmeeting
15:30:20 !end
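As a postscript, a rough sketch of the single-viewset option agreed at the end of the meeting: one /pulp/api/v3/access_policy/ endpoint that lists every policy and allows PATCH/PUT but not POST or DELETE, so users cannot create policies for wrong/bad viewset urls. All class and field names here are hypothetical, not pulpcore's actual implementation:

```python
# Illustrative sketch only (class and field names invented) of the single-viewset
# option discussed above: a global listing of access policies that can be edited
# in place but never created or deleted through the API.

from django.db import models
from rest_framework import mixins, serializers, viewsets


class AccessPolicy(models.Model):
    # One row per guarded viewset; default rows would be created by a data migration.
    viewset_name = models.CharField(max_length=128, unique=True)
    statements = models.JSONField(default=list)

    class Meta:
        app_label = "core"


class AccessPolicySerializer(serializers.ModelSerializer):
    class Meta:
        model = AccessPolicy
        fields = ["viewset_name", "statements"]
        # Users edit the policy statements, not which viewset the policy targets.
        read_only_fields = ["viewset_name"]


class AccessPolicyViewSet(
    mixins.ListModelMixin,
    mixins.RetrieveModelMixin,
    mixins.UpdateModelMixin,  # provides PATCH/PUT; no create or destroy mixins
    viewsets.GenericViewSet,
):
    queryset = AccessPolicy.objects.all()
    serializer_class = AccessPolicySerializer
```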