14:32:19 <fao89> #startmeeting Pulp Triage 2020-07-14
14:32:19 <fao89> !start
14:32:19 <fao89> #info fao89 has joined triage
14:32:19 <pulpbot> Meeting started Tue Jul 14 14:32:19 2020 UTC.  The chair is fao89. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:32:19 <pulpbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:32:20 <pulpbot> The meeting name has been set to 'pulp_triage_2020-07-14'
14:32:20 <pulpbot> fao89: fao89 has joined triage
14:32:36 <ppicka> #info ppicka has joined triage
14:32:36 <ppicka> !here
14:32:36 <pulpbot> ppicka: ppicka has joined triage
14:32:47 <ggainey> #info ggainey has joined triage
14:32:47 <ggainey> !here
14:32:47 <pulpbot> ggainey: ggainey has joined triage
14:33:01 <dkliban> #info dkliban has joined triage
14:33:01 <dkliban> !here
14:33:01 <pulpbot> dkliban: dkliban has joined triage
14:33:05 <fao89> !next
14:33:06 <fao89> #topic https://pulp.plan.io/issues/7127
14:33:06 <pulpbot> fao89: 2 issues left to triage: 7127, 7110
14:33:07 <pulpbot> RM 7127 - newswangerd - NEW - Add object labels
14:33:08 <pulpbot> https://pulp.plan.io/issues/7127
14:33:18 <dkliban> #idea Proposed for #7127: convert to story
14:33:18 <dkliban> !propose other convert to story
14:33:18 <pulpbot> dkliban: Proposed for #7127: convert to story
14:33:24 <daviddavis> #info daviddavis has joined triage
14:33:24 <daviddavis> !here
14:33:24 <pulpbot> daviddavis: daviddavis has joined triage
14:33:27 <ttereshc> #info ttereshc has joined triage
14:33:27 <ttereshc> !here
14:33:27 <pulpbot> ttereshc: ttereshc has joined triage
14:33:35 <daviddavis> +1 to story
14:33:41 <fao89> #agreed convert to story
14:33:41 <fao89> !accept
14:33:41 <pulpbot> fao89: Current proposal accepted: convert to story
14:33:42 <fao89> #topic https://pulp.plan.io/issues/7110
14:33:42 <pulpbot> fao89: 1 issues left to triage: 7110
14:33:43 <pulpbot> RM 7110 - wibbit - NEW - pulpcore-client - Artifact upload fails when using python27
14:33:44 <pulpbot> https://pulp.plan.io/issues/7110
14:33:54 <dkliban> i was supposed to comment on this one and close it
14:33:57 <dkliban> i'll do that now
14:34:02 <daviddavis> +1
14:34:08 <dkliban> #idea Proposed for #7110: dkliban to comment and close as WONT FIX
14:34:08 <dkliban> !propose other dkliban to comment and close as WONT FIX
14:34:08 <pulpbot> dkliban: Proposed for #7110: dkliban to comment and close as WONT FIX
14:34:13 <fao89> #agreed dkliban to comment and close as WONT FIX
14:34:13 <fao89> !accept
14:34:13 <pulpbot> fao89: Current proposal accepted: dkliban to comment and close as WONT FIX
14:34:14 <ttereshc> thanks
14:34:15 <pulpbot> fao89: No issues to triage.
14:34:18 <ppicka> so we're not supporting py27 for bindings?
14:34:22 <fao89> Open floor!
14:34:24 <dkliban> that's right
14:34:34 <bmbouter> #info bmbouter has joined triage
14:34:34 <bmbouter> !here
14:34:34 <pulpbot> bmbouter: bmbouter has joined triage
14:34:36 <ppicka> +1
14:34:41 <fao89> https://hackmd.io/SVCMjpwXTfOMqF2OeyyLRw
14:34:59 <fao89> topic: zero-downtime upgrades - https://pulp.plan.io/issues/7120
14:35:12 <bmbouter> we talked about this last time, I wrote the issue up like we discussed
14:35:15 <bmbouter> so here it is
14:35:45 <x9c4> #info x9c4 has joined triage
14:35:45 <x9c4> !here
14:35:45 <pulpbot> x9c4: x9c4 has joined triage
14:36:28 <ttereshc> bmbouter, is stopping some services acceptable?
14:37:38 <x9c4> ttereshc, you mean stop the workers for the upgrade?
14:38:01 <dkliban> ttereshc: the goal is to not stop any services until it's time to reload the code for them
14:38:11 <dkliban> so this could occur one server at a time
14:38:39 <dkliban> and the goal is to have old code be compatible with the latest db
14:38:48 <ttereshc> yeah, I understand
14:38:49 <dkliban> old code = previous release
14:39:14 <bmbouter> ttereshc: we want to stop no services except to restart them one by one in a clustered install
14:39:53 <ttereshc> what about existing migrations, which are not split into additive and destructive changes?
14:40:14 <ttereshc> will we support zero-downtime migrations after a certain release?
14:40:22 <bmbouter> yes after a certain release
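For illustration, the additive/destructive split being discussed could look like the Django migration sketch below: the addition ships in one release, and the destructive cleanup ships in a later one, so nodes still running the previous code keep working against the new schema. The app label, migration names, model, and field here are hypothetical, not taken from the issue.

    # Release x.y ships only the additive half; old code keeps working because
    # it simply never reads the new nullable column.
    from django.db import migrations, models


    class Migration(migrations.Migration):

        dependencies = [("core", "0042_previous")]  # hypothetical predecessor

        operations = [
            migrations.AddField(
                model_name="repository",
                name="new_optional_field",  # hypothetical field
                field=models.TextField(null=True),
            ),
        ]

    # The destructive half, e.g. migrations.RemoveField() for whatever column the
    # new one replaces, would ship in a later release, once no node in the
    # cluster is still running the old code.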
14:40:39 <x9c4> how far does "old code" reach?
14:40:41 <bmbouter> whenever the CI is in place to check for them and the docs state the policy requirement
14:41:05 <bmbouter> old code? like existing migrations
14:41:07 <bmbouter> ?
14:41:30 <dkliban> no
14:41:34 <x9c4> no, can you upgrade from x.y.* to x.(y+1).0?
14:42:05 <dkliban> the no-downtime migration can only apply from x.y.z to x.y.z+1
14:42:19 <dkliban> actually ...
14:42:21 <bmbouter> I thought it could go all the way
14:42:24 <dkliban> x.y to x.y+1
14:42:43 <bmbouter> oh yeah that's right
14:42:46 <bmbouter> dkliban: you right
14:43:17 <dkliban> so users will have to go through each y version to safely upgrade with no downtime
14:43:22 <bmbouter> the migrations would apply properly in all cases, it's just that the running application servers would break unexpectedly
14:43:32 <dkliban> yep
14:43:43 <x9c4> So we must prevent this.
14:43:54 <bmbouter> do we?
14:44:14 <bmbouter> x9c4: would that be an installer feature?
14:44:56 <ttereshc> bmbouter, is it suggested to use this project https://pypi.org/project/django-pg-zero-downtime-migrations/  or is it only for understanding how things can be done?
14:44:59 <x9c4> like a preflight check? I thought maybe the migration itself refuses to apply if there is a resource using the db that is too old.
14:45:18 <bmbouter> ttereshc: it's debatable if we should actually use it...
14:45:35 <bmbouter> ttereshc: it primarily provides row-lock features at upgrade time, I'm not sure pulp actually needs those
14:45:57 <bmbouter> x9c4: I don't think pulp migrations can be that smart
14:46:09 <bmbouter> but the pre-flight maybe could be
14:46:21 <ttereshc> yeah, at first look it doesn't seem promising; it's also not maintained much and is in beta status. I'm just trying to understand what is proposed, so I can read more about it later
14:46:33 <bmbouter> I'm not proposing to use that project
14:46:36 <x9c4> I think we could hook into the pulpcore migrate script and do some sanity checks.
14:46:51 <bmbouter> I am proposing our CI use the regex scripts in that article though
14:47:07 <dkliban> i am all for CI checks in this area
14:47:17 <bmbouter> which look for single-operation migrations that are zero-downtime-unsafe
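A rough sketch of the kind of CI check mentioned here: scan the migration files for destructive operations and fail the build if any are found. The set of "unsafe" operations and the directory layout are assumptions; the regexes in the referenced article may differ.

    #!/usr/bin/env python3
    # Sketch of a CI gate: flag migrations containing destructive operations,
    # which would break servers still running the previous release.
    import pathlib
    import re
    import sys

    # Assumed set of operations treated as zero-downtime-unsafe.
    DESTRUCTIVE = re.compile(
        r"migrations\.(RemoveField|DeleteModel|RenameField|RenameModel|AlterField)\("
    )


    def main(migration_dir="pulpcore/app/migrations"):
        unsafe = [
            str(path)
            for path in pathlib.Path(migration_dir).glob("0*.py")
            if DESTRUCTIVE.search(path.read_text())
        ]
        if unsafe:
            print("zero-downtime-unsafe operations found in:")
            print("\n".join(unsafe))
            return 1
        return 0


    if __name__ == "__main__":
        sys.exit(main())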
14:48:01 <x9c4> Another thought. If we did the remove part of the migration only on x.y.1, we could allow upgrades from any z-stream to the next y-Release.
14:48:08 <bmbouter> x9c4: these checks in practice I think will be difficult... the db is a single thing, but the application versions on a cluster are many
14:48:19 <bmbouter> x9c4: agreed!
14:49:28 <x9c4> And more importantly, you would not need to go through all z's for a bugfix.
14:49:34 <bmbouter> yeah I think that's a must
14:50:00 <dkliban> yep
14:50:15 <x9c4> you can then safely specify the version as ~=x.y.0
14:51:46 <x9c4> about checking the running application parts. We have some info about running components in the status api.
14:51:58 <dkliban> yep
14:52:13 <dkliban> the installer could definitely use that
14:52:32 <x9c4> And what is currently not running will then refuse to start if the migration level is not met.
14:53:25 <bmbouter> x9c4: that sounds do-able actually but it doesn't prevent the concerning case from earlier with an already running app
14:53:32 <x9c4> As in "If it's not currently running, it's not subject to zero downtime anyway."
14:54:25 <bmbouter> agreeeeed
14:54:27 <x9c4> if the already running app is reflected in the status api, it prevents the migration from running.
14:54:46 <dkliban> x9c4: in what cases would we not do the upgrade?
14:54:48 <bmbouter> the issue is the status API only shows you what's on "that specific server" not the entire cluster
14:54:50 <x9c4> Until it is upgraded and restarted
14:55:08 <x9c4> I see.
14:55:14 <dkliban> bmbouter: we need to fix that
14:55:16 <bmbouter> architecturally we just don't have the info
14:55:37 <bmbouter> I agree but the "http workers" are not represented in there, but they probably should be....
14:55:42 <bmbouter> report on the installed profiles
14:56:05 <bmbouter> so here's my take, there is a *lot* we are discussing here, what can we do to get to simple zero-downtime upgrades for the common case?
14:56:09 <bmbouter> like the installer precheck
14:56:49 <dkliban> we need to add checks to the CI, we need to write up docs,
14:57:06 <dkliban> we need to add the installer check
14:57:22 <dkliban> those are the 3 things we need at a minimum
14:58:37 <x9c4> I would call the installer check an additional safety net, when we state in the docs that you need to have a certain version installed on all nodes in the cluster.
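As a sketch of what such an installer/preflight check could do with the status API: read /pulp/api/v3/status/, compare the reported component versions against a minimum required for the upgrade, and refuse to migrate otherwise. The version floor, port, and failure handling here are assumptions, and as noted above the status API only describes the node it is served from, not the whole cluster.

    # Sketch of a preflight check against the status API before running migrations.
    import requests
    from packaging.version import Version

    MINIMUM = {"pulpcore": Version("3.5.0")}  # hypothetical floor for this upgrade


    def preflight(base_url="http://localhost:24817"):  # assumed default API port
        status = requests.get(f"{base_url}/pulp/api/v3/status/").json()
        for component in status.get("versions", []):
            name = component["component"]
            version = Version(component["version"])
            floor = MINIMUM.get(name)
            if floor and version < floor:
                raise SystemExit(
                    f"{name} {version} is older than {floor}; upgrade it before migrating."
                )


    if __name__ == "__main__":
        preflight()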
14:59:04 <daviddavis> hey, we've spent 25 min discussing this topic and it seems like only a handful of people are involved
14:59:13 <fao89> one step before the migrations is deciding whether to upgrade pulpcore, as each plugin has different release dates
15:00:37 <dkliban> daviddavis: what do you suggest as the next step?
15:01:11 <fao89> this topic is somewhat related to upgrade tests
15:01:28 <dkliban> i agree
15:01:44 <bmbouter> daviddavis: I'm ok w/ not everyone being involved as long as everyone is willing to accept the change
15:02:00 <bmbouter> that last part is what I'm here to learn about actually
15:02:20 <ttereshc> is the goal to accept/decide now?
15:02:33 * ttereshc would like to read about it
15:02:38 <fao89> so I think we need to discuss it when Ina is back
15:02:47 <bmbouter> yup socializing the idea slowly is the approach not to decide now
15:03:04 <ttereshc> sweet
15:03:06 <fao89> not sure if open floor, mailing list or specific meeting
15:03:29 <bmbouter> we can advertise on the mailing list if you like and then follow up at open floor on friday after ina returns?
15:03:53 <daviddavis> ina is not back until next week
15:04:08 <bmbouter> yup I would even put it on friday of next week to give her more time to catch up
15:04:30 <bmbouter> this doesn't need to happen fast it just needs a decision in the next few weeks to unblock the operator's roadmap
15:04:37 <fao89> sounds good to me
15:04:45 <ttereshc> +1 to advertise on a mailing list to give more people heads up on reading and prepping for a discussion
15:04:53 <ggainey> +1
15:05:13 <bmbouter> I can take the AI to send to the list and revise the issue a bit from this convo
15:05:22 <dkliban> bmbouter: that would be great
15:05:27 <x9c4> +1
15:06:03 <bmbouter> will do
15:06:05 <ttereshc> thanks
15:06:07 <bmbouter> next topic?
15:06:18 <dkliban> yes
15:06:22 <fao89> topic: do we use SessionAuthentication? https://github.com/pulp/pulpcore/blob/master/pulpcore/app/settings.py#L132
15:06:34 <fao89> it is the last barrier for openapi v3
15:06:41 <bmbouter> I think we do :/
15:06:58 <dkliban> there is a bug in the ruby client related to cookie auth
15:07:04 <dkliban> however, it can be resolved.
15:07:12 <daviddavis> how is it used?
15:07:13 <dkliban> bmbouter: where do we provide a URL to start a session?
15:07:16 <bmbouter> galaxy_ng I believe uses it; also I believe django-admin would break if we removed it, since django-admin logs in once and then doesn't log in again
15:07:20 <bmbouter> django-admin login views
15:07:25 <bmbouter> galaxy_ng login views
15:07:25 <dkliban> ah cool
15:07:30 <dkliban> that makes sense
15:07:37 <daviddavis> ah
15:07:37 * bmbouter eats a cookie
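For reference, the setting being discussed is DRF's DEFAULT_AUTHENTICATION_CLASSES in pulpcore's settings.py; paraphrased from memory rather than quoted from the linked line, it looks roughly like this:

    # Paraphrased sketch of the relevant block in pulpcore/app/settings.py.
    # SessionAuthentication (cookie auth) is what the django-admin and galaxy_ng
    # login views rely on; removing it would break those flows.
    REST_FRAMEWORK = {
        "DEFAULT_AUTHENTICATION_CLASSES": (
            "rest_framework.authentication.SessionAuthentication",
            "rest_framework.authentication.BasicAuthentication",
        ),
        # ...other DRF settings omitted...
    }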
15:07:45 <dkliban> we are going to get a fix for this in the openapi-generator
15:07:57 <x9c4> can we hide it from the schema-generator?
15:08:06 <fao89> the problem now is: the openapi generator has a bug when generating cookie auth in the ruby bindings
15:08:07 <dkliban> x9c4: not without a hack
15:08:27 <daviddavis> no one uses ruby
15:08:34 <daviddavis> jk
15:08:43 <x9c4> ruby tuesday?
15:08:46 <daviddavis> can we fix the bug?
15:08:55 <dkliban> fao89: we should open up a PR against openapi-generator and in the meantime we should use a custom template in pulp-openapi-generator
15:09:01 <dkliban> daviddavis: yes
15:09:07 <fao89> the bug: https://paste.centos.org/view/5fdddab4
15:09:27 <fao89> line 14, openapi generator does not fill it
15:09:28 <dkliban> it's a bug in the template for the ruby configuration object
15:09:34 <daviddavis> ah
15:09:57 <daviddavis> +1 to dkliban's suggestion
15:10:02 <ggainey> concur
15:10:03 <dkliban> fao89: gerrod is already working on adding a custom template for the configuration.rb and configuration.py so we will be able to fix this ourselves
15:10:35 <dkliban> the template that gerrod is adding is going to address https://pulp.plan.io/issues/5932
15:10:41 <x9c4> +1 to fixing bugs.
15:10:56 <dkliban> and in that template we will address this bug
15:11:00 <dkliban> also
15:11:09 <fao89> sounds good to me
15:11:12 <bmbouter> +1
15:11:24 <dkliban> awesome
15:11:35 <fao89> with this, I say the OpenAPI v3 PRs are ready for review
15:11:44 * ggainey cheers wildly
15:11:48 <fao89> https://github.com/pulp/pulpcore/pull/785
15:11:51 <dkliban> fao89: i'll take a look soon
15:13:03 * x9c4 dancing
15:13:11 <fao89> next topic: call for more input: API designs for RBAC
15:13:18 <bmbouter> I just emailed this out
15:13:57 <bmbouter> I am thinking it would just be more usable to have the sub-resource named 'access_policy', which would be /pulp/api/v3/remotes/file/file/access_policy/ for example
15:14:30 <bmbouter> we could *also* have a /pulp/api/v3/access_policy/ so that users could see all policies in one place as a helper for audit purposes
15:14:35 <bmbouter> this is kind of a hybrid pitch
15:14:48 <daviddavis> that second proposal seems more usable to me
15:15:13 <daviddavis> even if it's just a list
15:16:36 <bmbouter> it works great for read purposes
15:16:42 <bmbouter> and maybe even write purposes ...
15:16:52 <bmbouter> I think I'm stuck on the "url" field that would be in those models
15:17:00 <bmbouter> model instances
15:17:20 <bmbouter> is there ever a time when a pulp system has its API prefix modified?
15:17:45 <dkliban> hmm
15:17:48 <dkliban> i don't think so
15:17:49 <ttereshc> could we provide some strong validation to avoid mistakes and for users to have a list of urls to pick from? then option 2 will be fine I think
15:18:30 <bmbouter> ttereshc: well that's the other issue I don't think we can do that
15:19:06 <bmbouter> well it could be possible if plugin writers provided all of them explicitly and pulpcore aggregated them ...
15:19:14 <bmbouter> so yes, yes we could do that :)
15:19:18 <ttereshc> :)
15:19:44 <ttereshc> why can't we get all the urls for viewsets? is it that we can get only full endpoints?
15:20:03 <ttereshc> from the router or from some kind of introspection
15:20:06 <bmbouter> maybe we can
15:20:30 <bmbouter> plugins will need to ship a default one though so they couldn't do that unless they allow_list that url in the CHOICES field
15:20:44 <bmbouter> a default access policy, and create the access policy instance via a data migration
15:21:11 <bmbouter> part of the issue of auto-gathering them is that not all urls may be access guarded, e.g. /pulp/api/v3/status isn't?
15:21:15 <bmbouter> but maybe it should be?
15:21:44 <bmbouter> this gives me enough to work with I think though. go with all the access_policy instances at one API, and have a solid CHOICES field
15:22:27 <dkliban> i gotta go
15:22:35 <ttereshc> bmbouter,  on the other hand, if we accept only patch and no post, users can't create wrong/bad urls
15:22:41 <ttereshc> oops put
15:22:50 <bmbouter> true! I hadn't thought about that
15:23:21 <bmbouter> alright I'm making the single-viewset option because it's easier to make, we know it's useful, and we can add more to it later
15:23:22 <ttereshc> so maybe option 2 is good as is
15:23:23 <bmbouter> sound ok?
15:23:42 <ttereshc> yes, sounds great
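To make the agreed direction concrete, a rough DRF sketch of the single-viewset option: one global endpoint listing every access policy, with only read and partial update allowed so users cannot create or delete entries (and therefore cannot introduce bad viewset URLs). The AccessPolicy model, its fields, and the import path are assumptions at this stage, not a settled design.

    from rest_framework import mixins, serializers, viewsets

    from pulpcore.app.models import AccessPolicy  # assumed model name and location


    class AccessPolicySerializer(serializers.ModelSerializer):
        class Meta:
            model = AccessPolicy
            fields = ["viewset_name", "statements"]  # illustrative fields only
            read_only_fields = ["viewset_name"]  # identity is fixed; only statements change


    class AccessPolicyViewSet(
        mixins.ListModelMixin,
        mixins.RetrieveModelMixin,
        mixins.UpdateModelMixin,  # supplies PATCH/PUT; no Create or Destroy mixins
        viewsets.GenericViewSet,
    ):
        queryset = AccessPolicy.objects.all()
        serializer_class = AccessPolicySerializer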
15:30:20 <fao89> #endmeeting
15:30:20 <fao89> !end