Decent IAM

Decentralizing Identity & Access Management and Authorization

Identifying Myself

January 1, 2024

I’m Dr. Rohit Khare, a computer scientist trying to make Identity & Access Management (IAM) easier and more effective. As a Google product manager, I launched IAM with “only” a few hundred controls. Now there are almost 50,000 entries on Permissions.cloud across Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure!

With the tools we have today, it’s no surprise customers can’t configure such complicated controls correctly — yet Cloud providers rely on a “shared responsibility” model to blame breaches on developers. I played my part in this stubborn status quo, so I’d like to atone for my sins, so to speak.

I’ve led product development at cybersecurity startups commercializing open-source approaches to automating governance, risk, and compliance for Cloud IAM, at Stacklet.io and Noq.dev. I’m excited about new efforts emerging from the Authorization (AuthZ) community, so I’ve started volunteering to help curate news clippings and convene a conference in 2024.

Even though I don't have any specific solutions to start from, I’m going to use DecentIAM.com to understand the problem space, track new technologies, and learn from new leaders. I also hope it’s helpful to you, dear reader, since I’d like to learn from your experiences, too!

Encryption is Hard Easy!

January 2, 2024

I’ve grown so old that I had to do the math to confirm that I was still a teenager when I sold my first software package. Confidante (not this one from UW) was an encryption plug-in for NeXTMail, for a domestic security agency. I became an international arms dealer, as elite a cypherpunk as I could dream of, back in those days of ITAR export controls for RSA BSAFE floppy discs!

I was almost as lucky to learn about crypto as an undergrad as I was to focus my career on the World Wide Web so early I could still trade tips on hacking together a WYSIWYG editor for HTML with the original inventors at CERN. Eventually, my first IETF Standard was another minor milestone at the intersection of the Web and Security (h/t to Scott Lawrence!).

The lasting impact of RFC 2817 was creating a registry of tokens for ways to upgrade a basic HTTP connection to speak alternative protocols, such as WebSockets today. That, in turn, was necessary so TLS could advance up the Standards track itself.

I was too embarrassed to publish this entry without encryption, so I finally dug into what it would take to turn HTTPS on for my own Web server. Boy, am I glad I waited: I never dreamed how easy EFF’s Certbot would make it to deploy Let's Encrypt certificates on my ancient, jury-rigged Apache setup. Kudos to everyone involved for an excellent user experience!

What comes before Death, but after Taxes? IIW #38!

January 5, 2024

What a long, strange trip from helping organize catering and sponsoring the first IIWs to buying my own ticket as an individual civilian nineteen years later! I just registered for the Internet Identity Workshop and recommend that anyone else who can make it to Mountain View from April 16-18, 2024 register by the early-bird deadline next week.

This time around, I'm excited about the rise of authorization on the agenda, arguably a return to the original emphasis on OAuth as “Open AuthZ” after decades spent convening the right community to nail down Internet-scale AuthN with OIDC and advocacy from the sovereign identity folks.

Forced to brainstorm on-the-spot, here were a few un-conference topics I thought up before my shopping-cart expired:

What topics are you planning to present about or lead a discussion about at this IIW?

Decentralized Authorization
resources in other apps/services/systems. "If you can't read a file, you shouldn't read its backups"
Policies in Plain English
Converting natural language business goals into operational access control policies and guarding who can join or leave groups
CedarCamp
Discuss current extension proposals, trade implementation tips, and find contract work
Zed-in-a-Box
How might we improve developer experiences for embedding AuthZ in GUIs and APIs?
PolicyPedia
How might we use generative AI to document actual business policies and exception processes?
CSI:AuthZ
Forensic-grade reconstruction of authorization policy to test and prevent misconfiguration: how can we "backup" policies in a usable format?
Fediverify
How can end users predict where their personal information flows in the Fediverse? How can they recall it?
Federated Learning from Private Configs and Logs
How can operators safely share enough about the authorization policy configurations to still learn useful models of "what permissions go together" or the "best way to spell test/prod/dev to tag assets" or collaboratively score risky domains for a "no-fly list" of sorts?

What are you hoping to learn about or hear a presentation about at IIW?

What are the critical questions about user-centric identity and data you hope to discuss with peers at IIW?

“Git, or it didn’t happen!”

January 10, 2024

It’s crazy we can’t test changes to Cloud security policies. We wouldn’t change code in production without version controlled releases, automated regression testing, and gradual rollouts — yet none of those are common practices in IAM today.

Infrastructure-as-Code (IaC) enabled the revolution in DevOps affairs by replacing interactive Cloud configuration through GUI clicks and API calls with text files that could be checked in and reviewed just like ordinary code. From Chef to Helm charts, all sorts of new domain-specific languages have emerged for describing “what” to deploy, instead of “how” to deploy it.

Yet when it comes to creating a new identity for an intern, making them a member of the team, or allowing them access, very few of those changes are captured in a file format that’s easy to analyze, approve, audit, or apply.

Putting all of those IAMOps under Git version control was why we invented IAMbic.org at Noq: “IAM, but in code.” Now, Terraform, CDK, and many other approaches all include constructs that represent IAM roles, policies, and users, so there’s all sorts of appropriate arguments about what features make a new language necessary (or not). IAMbic can also define directory entries, to control group memberships, for example.

Administrators benefit from all the advantages that accrue from applying all changes as code:

Reviews
All changesets come together as coherent patches, each of which can be evaluated by the people who need to pre-approve. If they aren’t engineers who grok GitHub, then message managers over Slack. The basic building blocks for User Access Reviews (UAR) and other compliance challenges become easier to enforce in advance than to audit after the fact.
Logs
Speaking of audits, as long as direct access to the Cloud control plane is also abstracted away by an ‘apply’ action, placing all changes under version control also establishes forensic-grade activity logs. (As long as we automate the checks that ensure there aren’t any backdoors, either.)
Tests
Once IAM operations are represented as text, they can also be checked for errors, from static spelling checks (like an intern’s account name) to scheduling conflicts (so there’s always at least one administrator on-call each shift) to dynamic regression testing in the future (like applying new conditions to old logs to estimate how many past approvals might have been denied if the changes had been in place).
Branches
Versioning is the value added by any source-code control for rollbacks, cherry-picks, staging releases from dev to test then prod, or patches applicable to alternative releases.
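
To make the “Tests” item concrete, here is a minimal sketch of the three kinds of checks it mentions: a static name check, an on-call coverage check, and a replay of old log entries against a proposed condition. Everything in it is an assumption for illustration: the Grant record, the account list, the shift names, and the log format are hypothetical and are not the format of IAMbic, Terraform, or any Cloud provider.

```python
from dataclasses import dataclass


# Hypothetical, simplified record of one proposed change; real formats differ.
@dataclass(frozen=True)
class Grant:
    account: str   # who receives access
    role: str      # what they may do
    shift: str     # on-call shift the grant covers


KNOWN_ACCOUNTS = {"alice", "bob", "ivy-intern"}   # stand-in for a directory lookup
SHIFTS = ("day", "night")


def unknown_accounts(grants):
    """Static check: flag account names that are not in the directory."""
    return sorted({g.account for g in grants} - KNOWN_ACCOUNTS)


def uncovered_shifts(grants):
    """Scheduling check: every shift needs at least one administrator."""
    covered = {g.shift for g in grants if g.role == "admin"}
    return [s for s in SHIFTS if s not in covered]


def newly_denied(past_approvals, new_condition):
    """Regression check: replay old log entries against a proposed condition
    and return the ones that would no longer be approved."""
    return [entry for entry in past_approvals if not new_condition(entry)]


proposed = [
    Grant("alice", "admin", "day"),
    Grant("ivy-intren", "viewer", "day"),   # typo in the intern's account name
]
log = [
    {"actor": "alice", "action": "issueRefund", "amount": 40},
    {"actor": "bob", "action": "issueRefund", "amount": 900},
]

print(unknown_accounts(proposed))                        # ['ivy-intren']
print(uncovered_shifts(proposed))                        # ['night']
print(newly_denied(log, lambda e: e["amount"] <= 500))   # only bob's 900 refund
```

Checks like these could run in the same pipeline that reviews the patch, so a typo or an uncovered shift would block the merge instead of surfacing in production.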

Since testing is the beginning of enlightenment in engineering, it seems hard for me to imagine our industry ignoring first-class support for something like Git before making changes to Cloud configurations for IAMOps. Or, for that matter, for FinOps (accounting, budgets, quotas, etc), PrivacyOps (geographic jurisdictions, support investigations, derived data leaks, etc), or other Governance policies (supply chains, build horizons, launch reviews, etc).

Only once everything’s in Git can we claim to create a true “digital twin” for testing Cloud configurations. Only once we can replicate and simulate the impact in advance can we begin to tackle the two other pillars of a Decent IAM: automated reasoning, and the higher-level policies written in natural language that it enforces.

Footnote: To be sure, Cloud providers typically store the past few revisions to an IAM role or policy and include APIs to roll them back, as well as audit logs of those changes. However, those identifiers aren’t aligned to revisions from the customer’s code repositories, nor are individual changes attributable across transactional changes as they might be if the primary abstraction to the control plane were “just Git.”

Policy-as-Code only works when coders care about policy

January 11, 2024

It’s crazy that policies aren’t in plain English, especially with all the hype around generative AI! After all, the stakeholders that set policies — from legal officers to line managers to government regulators and accounting auditors — already document their requirements and rationales in natural language. Sure, it might not sound natural, written in legalese laden with industry-specific jargon, but that’s still far from JSON. And even if there were a Google Translate for compiling plain text back into code, would a collaboration platform for policymakers work more like Wikipedia than GitHub or Jira?

Can Coders Write Policies?

The original sin of Policy-as-Code (PaC) is that coders don’t care about (most) policies. “Setting budgets to cap intra-day exposure to trading with counter-parties” isn’t as easy to enforce as “only admins can issue refunds.” Whether the notion of administrator-ness were implemented in role-based (RBAC), attribute-based (ABAC), relationship-based (ReBAC), or any other ACL, it’s still a straightforward analogy to guard the code that issues refunds with an if-statement that checks for an issueRefund permission.
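
As a rough illustration of that if-statement, here is a minimal sketch of a refund function guarded by an issueRefund permission. The role table and the function names are hypothetical; the lookup could just as well be backed by RBAC, ABAC, or ReBAC without changing the guard.

```python
# Hypothetical role table, assumed for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"issueRefund", "viewOrder"},
    "support": {"viewOrder"},
}


def has_permission(roles, permission):
    """True if any of the caller's roles carries the permission."""
    return any(permission in ROLE_PERMISSIONS.get(r, set()) for r in roles)


def issue_refund(caller_roles, order_id, amount):
    # The guard: one if-statement in front of the sensitive code path.
    if not has_permission(caller_roles, "issueRefund"):
        raise PermissionError("issueRefund permission required")
    return {"order": order_id, "refunded": amount}


print(issue_refund(["admin"], "order-1", 25))   # {'order': 'order-1', 'refunded': 25}
```

This is the easy case: the permission name mirrors the method name, so the coder never has to think about why refunds are restricted.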

Coders that care about avoiding crashes can also appreciate boolean formulas for blaming callers. The vaunted “Shared Responsibility” model of Cloud security that redirects most of the real responsibility onto Cloud customers also appears to recur within IT, between separate services and teams. Adopting a modern authorization approach by documenting your own users, resources, and permissions (“names, nouns, and verbs”) is easiest when the nouns are exactly what you defined in your database schema, the verbs are exactly the methods you defined in your API, and the complexity of configuring it correctly is left as an exercise for the operators.

Can Policy Writers Code?

The promise of Policy-as-Code is that extracting all the business logic for making risky decisions from the application logic and placing it on a separate control plane would allow all the stakeholders to review and revise the company’s policies, without having to individually adapt all their apps. When that business logic gets reduced to “I can has Cheezburger,” that’s pretty simple code to check for getCheezburger permission. What about the real-world, when “Any admin can issue refunds” becomes “Any admin can issue refunds, except to themselves, other employees, or family members; and then only up to $XXX and less than YY% of daily transactions; and never for any SKUs marked as non-refundable by law or discontinued; and never in a different currency than the original form of payment …”?
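
To show how quickly that longer sentence outgrows a single permission check, here is a hedged sketch that spells out each clause as its own condition. All names are hypothetical, the limits are left as required parameters because the policy above never states them, and I’m assuming the “YY%” clause refers to the value (not the count) of daily transactions.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RefundLimits:
    max_amount: Decimal        # stands in for the "$XXX" cap
    max_daily_share: Decimal   # stands in for "YY%", as a fraction of daily value


@dataclass(frozen=True)
class RefundRequest:
    admin_id: str
    payee_id: str
    amount: Decimal
    currency: str
    original_currency: str
    sku_flags: frozenset = frozenset()   # e.g. {"non-refundable", "discontinued"}


def refund_violations(req, limits, employees, family_of, daily_total):
    """Return the clauses the request violates; an empty list means allowed."""
    reasons = []
    if req.payee_id == req.admin_id:
        reasons.append("refund to self")
    if req.payee_id in employees:
        reasons.append("refund to employee")
    if req.payee_id in family_of.get(req.admin_id, set()):
        reasons.append("refund to family member")
    if req.amount > limits.max_amount:
        reasons.append("over per-refund cap")
    if req.amount >= limits.max_daily_share * daily_total:
        reasons.append("over share of daily transactions")
    if req.sku_flags & {"non-refundable", "discontinued"}:
        reasons.append("SKU not refundable")
    if req.currency != req.original_currency:
        reasons.append("currency mismatch")
    return reasons


# Arbitrary demo values, not real policy limits.
limits = RefundLimits(max_amount=Decimal("100"), max_daily_share=Decimal("0.05"))
req = RefundRequest("adm-1", "cust-9", Decimal("250"), "EUR", "USD")
print(refund_violations(req, limits, employees={"adm-1"}, family_of={},
                        daily_total=Decimal("10000")))
# ['over per-refund cap', 'currency mismatch']
```

Even this sketch needs data the application may not have on hand (who counts as family, what the day’s transactions total), which is part of why the sentence is harder to enforce than it is to write.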

Look, I’ll admit that’s a laughable example of a natural language policy. Still, if I had a nickel for every cybersecurity startup sales pitch slide deck where one side showed the goals of a policy and the other half had some cryptic code in their domain-specific language that didn’t really reflect that in reality… then I might have enough nickels to buy a donut, one with a hole big enough to fit all the loopholes in that logic.

Are Policies in Plain English a (PiPE) Dream?

What I wish we could do is say, “Interns have no business looking at customers’ data in production” and turn that into limitations on which databases, buckets, folders, and functions interns are allowed to access. When, inevitably, an intern gets assigned to the Customer Success team trying to write a tool to recover lost records, there should be an equally clear, plain-English policy for exceptions: to request privileged access, to justify which customers have consented, to grant a (temporary) escalation, and to archive and audit all of their actions afterwards.

Can we get there from here using any of today’s popular proposals for policy programming languages? Would it be just another “low-code” success story for turning prompts into JSON-formatted IAM changes? Could we use ChatGPT as a co-pilot for writing with existing PaC languages? Or would it be even more unreasonable to turn over responsibility for cybersecurity to error-prone, hallucination-prone, and jailbreak-prone AIs?

Will AI Coach Writers to Code — or Coders to Write?

I’m optimistic that an LLM might understand “intern” well enough to identify the ‘summer-2024@’ group in a directory system, and could probably map “production” to the “env=prod” tag found on Cloud resources, but I’m not at all confident any AI could map out a SaaS code base well enough to identify all the pathways “customer data” flows through. It will probably require a dialog with all those folks-who-don’t-code to write up what kinds of private information their app processes, where it’s located, and when various derived data products, like analytics about app usage, are de-identified enough to not constitute “customer data” anymore.
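
Here is a small sketch of that split between terms that can be grounded and terms that need a human answer. It does no language understanding at all: I’m assuming the terms have already been picked out of the policy sentence, and the glossary, its entries, and the function names are hypothetical.

```python
# Hypothetical glossary, curated by operators and policy writers together.
GLOSSARY = {
    "intern": ("group", "summer-2024@"),
    "production": ("tag", "env=prod"),
    # "customer data" is deliberately absent: it needs a human answer first.
}


def resolve_terms(terms):
    """Split terms into those the glossary can ground and those it cannot."""
    grounded = {t: GLOSSARY[t] for t in terms if t in GLOSSARY}
    unresolved = [t for t in terms if t not in GLOSSARY]
    return grounded, unresolved


def is_denied(grounded, user_groups, resource_tags):
    """Deny when the user is in the grounded group and the resource
    carries the grounded tag."""
    group = grounded["intern"][1]
    tag = grounded["production"][1]
    return group in user_groups and tag in resource_tags


grounded, unresolved = resolve_terms(["intern", "production", "customer data"])
print(unresolved)                                             # ['customer data']
print(is_denied(grounded, {"summer-2024@"}, {"env=prod"}))    # True
print(is_denied(grounded, {"summer-2024@"}, {"env=dev"}))     # False
```

The unresolved list is the interesting output: it is the agenda for the conversation with the people who know where customer data actually lives.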

Today, though? Even restricted to asking for assistance with IAM permissions defined before an LLM’s training cutoff date, a textual approach appears insufficient to connect the dots between all the attack paths that we are still uncovering in misconfigurations of IAM. I strongly suspect that LLMs will have to be coupled with automated reasoning, the third pillar of my approach to building a more-decent IAM… so stay tuned to this blog!

Elastic Authorization

January 31, 2024

Another idiosyncratic approach I take to AuthZ is imagining quotas and budgets as additional aspects of authorization. Instead of treating them as entirely orthogonal operations that are invoked after checking permissions to further filter, delay, or reject proposed actions, I think the same policymakers and stakeholders that want to protect Interns from exposure to private end-user information in production also want to protect Interns from the consequences of buggy actions that could run up huge compute bills or send out spam emails. Rather than only issuing black-and-white, Yes-or-No, Grant-or-Deny decisions, I’d claim that any DecentIAM service ought to make elastic decisions depending on the resources and capacity available.

Here’s how a fourth factor might fit in context with the other three essential elements of any DecentIAM system, in my opinion:

  1. Permission: How would this action be authorized?

    Automated reasoning must show how this actor has permission to take this action on this resource, at this time, under these current conditions: Why is this possible?

  2. Justification: Why should this action be authorized?

    Natural language policy translations must be accurate enough to also clearly summarize the rationale: Why will this be wise?

  3. Attribution: How could this action be authorized?

    Version-controlled, human-readable, and complete configurations of IAM with auditable change logs must explain how policies were approved: Why was this allowed?

  4. Capacity: Can this authorized action actually occur?

    Real-time evaluation of costs and available capacity against policies limiting rates, budgets, locations, and other quotas makes it possible to determine, in advance, whether sufficient resources are available to take an action: Is this affordable?

Bringing economics into the fold might appear to add more complexity than it’s worth. But I’m optimistic a unified theory (and policy language) for authorization may make DecentIAM more attractive to developers and more effective for business leaders.
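
As a thought experiment, an elastic decision could look something like the sketch below, which folds the Permission and Capacity elements into a single answer that can be “yes, but less than you asked for.” The Decision record, the decide function, and its parameters are all hypothetical, and the sketch assumes budgets and quotas can be read in real time, which is a strong assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Decision:
    allowed: bool
    granted_units: int   # may be less than requested: the "elastic" part
    reason: str


def decide(has_permission, requested_units, unit_cost, budget_remaining, quota_remaining):
    """Combine a permission check with budget and quota into one decision.
    Assumes unit_cost is greater than zero."""
    if not has_permission:
        return Decision(False, 0, "no permission")
    affordable = int(budget_remaining // unit_cost)   # whole units the budget covers
    granted = max(0, min(requested_units, affordable, quota_remaining))
    if granted == 0:
        return Decision(False, 0, "no capacity or budget left")
    if granted < requested_units:
        return Decision(True, granted, "scaled down to fit budget and quota")
    return Decision(True, granted, "granted in full")


# An intern asks for 100 units at 0.50 each with 20 left in the budget.
print(decide(True, 100, Decimal("0.50"), Decimal("20"), 80))
# Decision(allowed=True, granted_units=40, reason='scaled down to fit budget and quota')
```

The point of returning a quantity and a reason, instead of a bare boolean, is that the caller can proceed at a smaller scale or explain the shortfall to a human.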

From an implementation perspective, inside the hyperscale Cloud providers, these are entirely different control planes, of course! Quotas are enforced by the service mesh routing layers that rate-limit, redirect, fill, and drain in-flight requests between computing clusters operating at varying code release levels. And Billing is another service entirely, often post-processed to avoid impairing availability and sometimes unable to compute total job costs until the work’s almost completed, making any budget caps softer and less tied to real-time clearing or credit risk than one might imagine.


Contact: rkhare@gmail.com or LinkedIn