Part 1 : Adds RLS and CLS control Policies #2048
Conversation
Proposal in Apache Polaris for the policy spec: https://lists.apache.org/thread/rf2zsgk9qh36z3s63gx6dgtl0s4cwngr
cc @laurentgo
-  PolicyEntity policyEntity, boolean inherited) {
+  private boolean filterApplicablePolicy(PolicyEntity policyEntity) {
     // check the type
     if (policyEntity.getPolicyType().equals(ACCESS_CONTROL)) {
Need a PolicyEntityAugmentor which could conditionally modify the object, let me think about it more.
Force-pushed from decd92b to 85ba526
I'm a bit surprised to see a "ready for review (and merge)" PR for this.
Some preliminary comments :)
// Optional; if present, the policy is applicable to the role
private String principalRole;

// TODO: model them as iceberg transforms
Is there a plan to redo this policy definition after merging or before merging this PR?
We can amend them in another version if a release goes out with this; policies support versioning: https://polaris.apache.org/in-dev/unreleased/policy/
My only concern with not using Iceberg transforms right now is that they are currently limited. Yes, existing column projection can be modeled as an Iceberg identity transform, but if we want to support data masking then transforms would need to contain references to Iceberg UDFs, and that support does not exist yet, hence I refrained. Open to it if we want to model them that way.
SGTM 👍
public class AccessControlPolicyContent implements PolicyContent {

  // Optional; if present, the policy is applicable to the role
  private String principalRole;
How does this policy apply to a role? What is the mechanism? I could not find this in the linked doc 🤔
This is purely to support people who have only role-based access control (providers like MS, AWS, etc.); they can simply store their row filters / projections against the role in the applicable policy.
I go into the details here: https://docs.google.com/document/d/12nhS0GX1U1PqEBKp74bIBZsL9kB5duDlN9diHJAhJsM/edit?tab=t.0#bookmark=id.ij6iuno9gsic
What I mean is: is this role a Polaris Principal role? If we have multiple policies, how do we find the set of applicable policies? (I'm not sure I saw details on this in the doc 😅)
More broadly: is the binding to roles actually part of the policy?
This is not an objection... more of a point to think about.
"If we have multiple policies, how do we find the set of applicable policies? (I'm not sure I saw details on this in the doc 😅)"
This is an existing feature of the policy store: you can retrieve all policies applicable to a given entity. Each policy can correspond to one role.
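For illustration, a minimal self-contained sketch of "retrieve the attached policies, then keep those matching the caller's roles". `RolePolicy` and the matching rule are hypothetical stand-ins for `PolicyEntity` and `AccessControlPolicyContent.principalRole`, not actual Polaris types:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ApplicablePolicyFilter {
    // Hypothetical stand-in for a policy whose content names one principal role.
    public record RolePolicy(String name, String principalRole) {}

    /**
     * Keep a policy when it names no role (applies to everyone) or when the
     * caller was activated with that role.
     */
    public static List<RolePolicy> applicable(List<RolePolicy> attached, Set<String> callerRoles) {
        return attached.stream()
            .filter(p -> p.principalRole() == null || callerRoles.contains(p.principalRole()))
            .collect(Collectors.toList());
    }
}
```

With policies for ANALYST, ADMIN, and "any role" attached to a table, a caller holding only ANALYST would get back two applicable policies.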
// Use a custom deserializer for the list of Iceberg Expressions
@JsonDeserialize(using = IcebergExpressionListDeserializer.class)
@JsonSerialize(using = IcebergExpressionListSerializer.class)
private List<Expression> rowFilters;
How can this work "without context functions" (code comment above)? How will Polaris code interface with these expressions?
They're used to generate a view, right?
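For illustration only: one way an engine could consume row filters is to AND them into the WHERE clause of a secure view. The SQL rendering below is a hypothetical sketch over pre-rendered predicate strings, not how Polaris or Iceberg actually represents `Expression` objects:

```java
import java.util.List;

public class RowFilterViewSketch {
    /**
     * Combine already-rendered filter predicates into a single secure-view
     * definition. Real code would render Iceberg Expression objects instead
     * of taking raw strings.
     */
    public static String secureView(String viewName, String tableName, List<String> rowFilters) {
        String where = rowFilters.isEmpty()
            ? "true"
            : rowFilters.stream()
                .map(f -> "(" + f + ")")
                .reduce((a, b) -> a + " AND " + b)
                .get();
        return "CREATE VIEW " + viewName + " AS SELECT * FROM " + tableName + " WHERE " + where;
    }
}
```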
private List<Expression> rowFilters;

private static final String DEFAULT_POLICY_SCHEMA_VERSION = "2025-02-03";
private static final Set<String> POLICY_SCHEMA_VERSIONS = Set.of(DEFAULT_POLICY_SCHEMA_VERSION);
unused?
@@ -28,7 +28,8 @@ public enum PredefinedPolicyTypes implements PolicyType {
   DATA_COMPACTION(0, "system.data-compaction", true),
   METADATA_COMPACTION(1, "system.metadata-compaction", true),
   ORPHAN_FILE_REMOVAL(2, "system.orphan-file-removal", true),
-  SNAPSHOT_EXPIRY(3, "system.snapshot-expiry", true);
+  SNAPSHOT_EXPIRY(3, "system.snapshot-expiry", true),
+  ACCESS_CONTROL(4, "system.access-control", false);
The term ACCESS_CONTROL is too generic IMHO. How about TABLE_DATA_ACCESS_EXPRESSIONS?
The "expressions" part relates to the fact that this policy uses Iceberg expressions to represent filters.
IIUC we'll use the same policy to also have non-expression-based filtering, but I think that something like TABLE_ACCESS or TABLE_DATA_ACCESS is a good idea.
How can the same policy type support different contents? What is the approach to processing different contents within the same policy type?
Agree. I modeled all 3 policies to be represented in a single policy spec:
- columnProjections - column hiding (only authorized against RBAC)
- column projections (DDM -> to be applied with the UDFs)
- rowFilter expressions (Iceberg expressions which can contain UDF references)

Please let me know if you prefer it otherwise.
I do not have a firm opinion on the specific form of these policies (yet).
However, I'd like to make this system extensible. That is, if we have another kind of policy for row filtering, assigning the generic ACCESS_CONTROL name to the current one will make the new policy kind marginal. This is why I propose making policy type names more specific up front.
public static String replaceContextVariable(
    String content, PolicyType policyType, AuthenticatedPolarisPrincipal authenticatedPrincipal) {
  if (policyType == ACCESS_CONTROL) {
Why force all policies to go through this if? Would it be possible to refactor the code to leverage Java types for invoking type-specific replacements?
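A sketch of the type-dispatch idea being suggested: register one content rewriter per policy type so new types never touch a shared if-check. All names here are hypothetical, not actual Polaris classes:

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class ContextRewriters {
    // Hypothetical registry; the access-control rewriter substitutes a
    // ${principal} placeholder, all other types pass content through unchanged.
    private static final Map<String, UnaryOperator<String>> BY_TYPE = Map.of(
        "system.access-control",
        content -> content.replace("${principal}", currentPrincipalName()));

    private static String currentPrincipalName() {
        // Stand-in for resolving AuthenticatedPolarisPrincipal server-side.
        return "alice";
    }

    public static String rewrite(String policyTypeName, String content) {
        return BY_TYPE.getOrDefault(policyTypeName, UnaryOperator.identity()).apply(content);
    }
}
```

The registry keeps the dispatch out of PolicyUtil: adding a new policy type means adding one map entry rather than extending a shared conditional.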
-  .setContent(policyEntity.getContent())
+  .setContent(
+      PolicyUtil.replaceContextVariable(
+          policyEntity.getContent(), policyEntity.getPolicyType(), authenticatedPrincipal))
It would be nice to have a more flexible / extensible approach to connecting policies with the security context than a simple reference to AuthenticatedPolarisPrincipal. The Principal may not be the only factor in making access decisions.
// Iceberg expressions without context functions for now.
// Use a custom deserializer for the list of Iceberg Expressions
@JsonDeserialize(using = IcebergExpressionListDeserializer.class)
@JsonSerialize(using = IcebergExpressionListSerializer.class)
This binds the Polaris API to internal serialization code in Iceberg. Iceberg changes in Expression serialization will affect Polaris APIs. I'd like to avoid this dependency.
This seems fine to me given that Expression is also defined in the IRC spec: https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2162. So the Iceberg side needs to make sure any changes there don't violate the spec. We also already depend on CatalogHandler, RestResponse, TableIdentifier, and other classes/models from Iceberg. We can change to a custom one later if we need customizations.
@@ -98,6 +102,10 @@ public static boolean canAttach(PolicyEntity policy, PolarisEntity targetEntity)
   case ORPHAN_FILE_REMOVAL:
     return BaseMaintenancePolicyValidator.INSTANCE.canAttach(entityType, entitySubType);

+  case ACCESS_CONTROL:
+    // TODO: Add validator for attaching this only to table
What's the reason for deferring this?
policyContent.setRowFilters(evaluatedRowFilterExpressions);
return AccessControlPolicyContent.toString(policyContent);
} catch (Exception e) {
(design) is it the right thing to return the original content if an exception occurs?
import org.apache.iceberg.expressions.Expression;
import org.apache.polaris.core.policy.validator.InvalidPolicyException;

public class AccessControlPolicyContent implements PolicyContent {
(design) It may be seen as a preference, but the overall language community is moving towards immutable objects as data carriers (like Java records); I wonder if this is something we should adopt here as well.
I think we can have a separate discussion in the community about adopting immutable objects in the future. For this PR, being consistent with the existing data structures is fine.
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.ExpressionParser;

public class IcebergExpressionListDeserializer extends JsonDeserializer<List<Expression>> {
(design) I wonder why a serializer/deserializer pair for the list is needed vs. having one for the element type (Expression)?
@@ -30,6 +31,8 @@ private static ObjectMapper configureMapper() {
   mapper.configure(DeserializationFeature.FAIL_ON_MISSING_CREATOR_PROPERTIES, true);
   // Fails if a required field is present but explicitly null, e.g., {"enable": null}
   mapper.configure(DeserializationFeature.FAIL_ON_NULL_CREATOR_PROPERTIES, true);
+  // This will make sure all the Iceberg parsers are loaded.
+  RESTSerializers.registerAll(mapper);
Could that cause conflicts?
Why do we need all the REST specific types?
Namespace parentNamespace = policyEntity.getParentNamespace();

return ApplicablePolicy.builder()
    .setPolicyType(policyEntity.getPolicyType().getName())
    .setInheritable(policyEntity.getPolicyType().isInheritable())
    .setName(policyEntity.getName())
    .setDescription(policyEntity.getDescription())
-   .setContent(policyEntity.getContent())
+   .setContent(
I'm not familiar with PolicyCatalog's exact role, but it seems it is used for CRUD operations on the policies. As such, replacing the content of some policies on the fly may not be the right thing to do, because people would not be able to access the source of truth.
(design) IMHO this seems like a leakage of an access-control mechanism outside the realm of PolarisAuthorizer.
We are only resolving context in get-applicable-policies. It's a key point to resolve identity server-side; the client engine doesn't even need to know about this, as it might not even know the user's identity (it just has a token).
If the concern is knowing the exact policy, one can simply load it directly; we don't resolve context there. Please have a look at this API:
https://github.com/apache/polaris/blob/main/spec/polaris-catalog-service.yaml#L153
So IMHO this concern is addressed.

Regarding "this seems like a leakage of an access control mechanism outside of the realm of PolarisAuthorizer":
I thought about this really hard and even talked to folks internally. So far the PolarisAuthorizer interface just authorizes based on RBAC; having the authorizer return FGAC predicates and projections would require more thorough thought.
I just followed the existing way of retrieving policies from persistence and checking which policies apply to the caller.
If we want to go the route of supporting GetAccessControlPolicy via the Authorizer interface, here is what I had in mind:
public interface FGACAwarePolarisAuthorizer extends PolarisAuthorizer {

  default AuthorizationContext authorizeOrThrowWithFGAC(
      @Nonnull CallContext callContext,
      @Nonnull AuthenticatedPolarisPrincipal authenticatedPrincipal,
      @Nonnull Set<PolarisBaseEntity> activatedEntities,
      @Nonnull PolarisAuthorizableOperation authzOp,
      @Nullable PolarisResolvedPathWrapper target,
      @Nullable PolarisResolvedPathWrapper secondary) {
    authorizeOrThrow(callContext, authenticatedPrincipal, activatedEntities, authzOp, target, secondary);
    // query policies from persistence and give back FGAC policy
    // create AuthZ context
  }

  default AuthorizationContext authorizeOrThrowWithFGAC(
      @Nonnull CallContext callContext,
      @Nonnull AuthenticatedPolarisPrincipal authenticatedPrincipal,
      @Nonnull Set<PolarisBaseEntity> activatedEntities,
      @Nonnull PolarisAuthorizableOperation authzOp,
      @Nullable List<PolarisResolvedPathWrapper> targets,
      @Nullable List<PolarisResolvedPathWrapper> secondaries) {
    authorizeOrThrow(callContext, authenticatedPrincipal, activatedEntities, authzOp, targets, secondaries);
    // query policies from persistence and give back FGAC policy
    // create AuthZ context
  }
}
AuthorizationContext would carry:
- AccessControlPolicyContent
- Map<Object, Object> additionalParams

wdyt?
It seems a good conversation starter, thanks for proposing it. I don't know if it is best to try to extend the APIs (which are very generic, whereas FGAC only applies to a couple of permissions) or to have an extra separate method that you combine with a previous check (so 2 calls to the authorizer).
For AccessControlPolicyContent, would it be like this?
public record AccessControlProcessingInstruction(String type, String expression) {}
public record RowFilter(AccessControlProcessingInstruction filter) {}
public record ColumnTransformation(int id, String name, AccessControlProcessingInstruction transformation) {}
public record AccessControlPolicyContent(String rowFilter, List<ColumnTransformation> columnTransformations) {}
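To make the record proposal concrete, here is a self-contained, compilable sketch that restates those records and builds a sample content object. All names mirror the proposal above; none of these types exist in Polaris, and the mask expression is purely illustrative:

```java
import java.util.List;

public class AccessControlContentDemo {
    // The records proposed in the review, restated so this sketch compiles
    // on its own. Being records, the content is immutable once constructed.
    public record AccessControlProcessingInstruction(String type, String expression) {}
    public record ColumnTransformation(
        int id, String name, AccessControlProcessingInstruction transformation) {}
    public record AccessControlPolicyContent(
        String rowFilter, List<ColumnTransformation> columnTransformations) {}

    public static AccessControlPolicyContent sample() {
        // Hypothetical masking instruction for a credit-card column.
        var mask = new AccessControlProcessingInstruction("mask", "'***' || substr(cc_num, -3)");
        return new AccessControlPolicyContent(
            "region = 'EU'",
            List.of(new ColumnTransformation(4, "cc_num", mask)));
    }
}
```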
  return content;
}

public static boolean filterApplicablePolicy(
(design) this method doesn't have any javadoc, but shouldn't this belong to some dedicated class related to FGAC instead?
(Replying to the later edited PR description)
We have not agreed on a Polaris implementation in an Iceberg community sync. (Note: the Apache policy is: "If it didn't happen on the mailing list, it never happened.")
Left a couple of technical comments on this PR.
I think we should explore all legitimate options to manage FGAC rules in Polaris. Iceberg expressions are one option. Another option is leveraging a standard like SQL:92, SQL:1999, or SQL:2003. Substrait is a mature project with a diverse and active community and an option as well.
My concern about Iceberg expressions is their limitations regarding available functions and expressions. Some examples: COALESCE, LPAD, RPAD, NULLIF, CASE, CAST, INTERVAL (...).
I think we lose a lot of flexibility without these functions. Maybe I haven't looked into it enough, but I cannot figure out a way with Iceberg expressions to mask a credit card number like ***234. But I guess that's what should not go into Iceberg expressions? What's the representation of those Strings?
Row-level filter expressions benefit a lot from CASE, CAST, COALESCE, etc. Not sure whether it's worth it to not have those available to users.
The Iceberg expressions implementation is an interpreter, but that's rather a secondary concern.
All Iceberg query engines I know are based on SQL and can natively handle and optimize SQL functions. Not sure whether it's worth having an (intermediate) representation.
Spark datasets/frames are not SQL, but FGAC with those APIs is a different topic (LogicalPlan).
I think we should start with how row filters and column transformations are exposed to query engines, and then figure out the best way those are managed in Polaris ("top-down approach").
@Override
public List<Expression> deserialize(JsonParser p, DeserializationContext ctxt)
    throws IOException {
  ObjectMapper mapper = (ObjectMapper) p.getCodec();
This cast assumes a Jackson implementation detail and can likely break with Jackson updates.
Therefore please remove this cast.
As ObjectCodec does have a readTree(JsonParser) method, is the cast even necessary?
List<Expression> expressions = new ArrayList<>();
if (node.isArray()) {
  for (JsonNode element : node) {
    // Convert each JSON element back to a string and pass it to ExpressionParser.fromJson
    expressions.add(ExpressionParser.fromJson(mapper.writeValueAsString(element)));
  }
}
Why is this necessary at all?
jsonGenerator.writeStartArray();
if (expressions != null) {
  for (Expression expression : expressions) {
    jsonGenerator.writeString(ExpressionParser.toJson(expression));
Why "string" and not proper JSON?
"{\n"
    + "  \"principalRole\": \"ANALYST\",\n"
    + "  \"extraField\": \"someValue\"\n" // Extra field
What's the "extra field"?
Indeed, the ability to express something like "if X, then allow; else, deny" seems essential to row filters. I don't see any ternary operator in the current Iceberg expressions. @singhpk234 is the intent to add more expressions to Iceberg to enable row filters?
@snazy @adutra thank you for sharing your feedback; I want to walk you through my thought process.

Why Iceberg expressions and not SQL: we did explore dialect-agnostic SQL such as SQL-92, and for most of it we kept coming back to dialect-specific requirements. If it were SQL, engines would want to store their dialect-specific constructs directly in the policy, and hence in persistence, making the policy workable only for the policy definer's dialect. What would the behavior be if I don't understand the dialect? What I see is that we can expand Iceberg expressions to contain UDF references; via UDFs you can model all your dialect-specific logic, but the bottom line is that the policy definition operates only in Iceberg expressions.

Also, heads up: Iceberg expressions are going to be expanded soon due to the constraints work for v4 that Anton is driving (an uber-level idea was discussed in some of the syncs): essentially storing table constraints in Iceberg metadata so they can be retrieved and enforced by the calling engine, with the storage still being Iceberg expressions. So Iceberg expressions are what they plan to use for interoperability.

Note: I checked in the last catalog community sync, and Iceberg expressions with UDFs seem like the right direction. I am not saying we are pinned on it, just presenting what the wider community thinks on this aspect. I understand UDFs are not there yet and will take some time; meanwhile, using Iceberg expressions to store RLS seems like a good step IMHO.

Why not Substrait? If we want Substrait, we can make UDFs return Substrait directly, but in the Iceberg community at least there is no consensus on the IR, as similar discussions have been brought up in the past for Iceberg views. I would request driving that in Iceberg first, and then we can incorporate it into the policy. Also note that engines like Snowflake / Redshift don't support Substrait; unless it is established as a standard IR in Iceberg, IMHO we should not take a dependency on it.

Yes, non-SQL requires a more thorough discussion IMHO; for example, how do we model SQL written in one dialect being parsed by another? I hope I was able to explain my thought process here. I really appreciate you taking a look.
Managing FGAC rules and exposing protection instructions are two different domains. I think we should not restrict our users to the limitations of Iceberg expressions.
Thanks for the PR @singhpk234, super interesting to see RLS and CLS policies for governing Iceberg tables. Do you know if there's a way I can quickly set this up and do a proof of concept locally? I think it would be helpful to document "getting started with policies" in Polaris so folks who are interested can get some hands-on experience with the proposal.
@snazy I understand that, and in my design I want to leverage the existing policy store implementation to manage FGAC rules; what you are calling protection instructions, i.e. the result of policy evaluation, is in my view SQL.
I am a bit confused about your conclusion here: in one place you suggest Iceberg needs to set the standard for the policy evaluation result, but at the same time we are discarding Iceberg expressions, which are first-class citizens of the Iceberg world (portable and used in other languages like iceberg-rust / iceberg-python)? IMHO we should not go into dialect-specific SQL, as it makes evaluation harder, especially given that we already have a path forward with Iceberg expressions and UDFs. Please let me know your thoughts, and let's have more discussions!
@kevinjqliu thank you for the interest in the PR. Unfortunately there is no getting-started guide for policies that I am aware of (let me create a ticket for it); here are some integration tests -
Iceberg expressions, as of today, IMHO cannot be used to represent a bunch of the simple use cases, and none of the more complex ones. AFAIU it is not possible to:
I would love to use a portable standard, but as of today (as of Jan 2022, actually) Iceberg expressions are not yet suitable for this use case. Until we have a way to express even simple, standard use cases, I strongly believe we have to support SQL. Which means that requiring Iceberg expressions, as proposed here, is IMHO not a viable option to satisfy the needs of our users today.
As a side note, I also believe that the "front end" use case (providing protection instructions) should drive the "CRUD" parts, not the other way around. It is completely fine when users rely only on Iceberg expressions, but it would IMHO be quite a mistake to intentionally make it impossible for users to have a much more flexible way.
I also do not think that UDFs are the answer to everything, especially since Iceberg UDFs are neither concrete nor implemented yet. But even if Iceberg UDFs were there, you would still have to deal with data types, casting, and null handling. None of these three aspects is currently covered by Iceberg expressions.
Force-pushed from 85ba526 to 11a087e
Update: we talked offline. It seems like a decent option to have Iceberg expressions; supporting a SQL dialect alongside is an ongoing discussion.
There's no consensus on this particular approach.
Particularly, there's no consensus on:
- whether security policies can be attached to multiple tables/views or just one
- whether security policies shall be attachable to only tables or tables and views
- how Polaris stores security policies
All this requires a broader discussion and should IMHO not be set in stone at this time.
Could you clarify which of the points above are directly related to this PR? Or is this a general objection to any work on FGAC until the above questions have all been resolved? From my impression, this PR just covers defining policies using Iceberg expressions and resolving context variables in said policies. The question of policy attachment is essentially ignored in this PR, and I'm not sure I see anything here that changes the persistence layer or would block changes to it in the future.
Right - this PR directs the approach for how FGAC policies are managed in Polaris. I really prefer to avoid having multiple approaches to the same thing in Polaris.
It is not. This PR uses the existing policy mechanism, which might not be the right way for FGAC.
@@ -98,6 +102,10 @@ public static boolean canAttach(PolicyEntity policy, PolarisEntity targetEntity)
   case ORPHAN_FILE_REMOVAL:
     return BaseMaintenancePolicyValidator.INSTANCE.canAttach(entityType, entitySubType);

+  case ACCESS_CONTROL:
+    // TODO: Add validator for attaching this only to table
+    return true;
We probably need to pass in the callContext, or at least the configuration store, to make this default to false so it can be manually turned on for PoC purposes, until we implement the validator for attaching this policy only to tables.
IMHO, even if the attachment is the concern, this PR still does a good job of focusing on the policy definition itself. As an additional step, I suggest we disable attachments of ACCESS_CONTROL policies by default while still allowing them to be manually enabled for testing/PoC purposes. That way we can ensure no invalid attachment happens before we finalize the attachment requirements.
That's a nice option @HonahX (this is why I added a TODO above for attachment), and it would help address @snazy's feedback. Let me hide it behind a feature flag for E2E testing, and also rebase this PR.
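A minimal sketch of what gating attachment behind a flag could look like; the flag name and the map-based lookup are hypothetical stand-ins, not the actual Polaris `FeatureConfiguration` API:

```java
import java.util.Map;

public class AttachmentGate {
  // Hypothetical flag name; the real one would live in FeatureConfiguration.
  static final String ENABLE_ACCESS_CONTROL_POLICY = "ENABLE_ACCESS_CONTROL_POLICY";

  // Returns whether a policy of the given type may be attached, defaulting
  // ACCESS_CONTROL to "off" unless the feature flag is explicitly enabled.
  static boolean canAttach(String policyType, Map<String, Boolean> featureFlags) {
    if ("ACCESS_CONTROL".equals(policyType)) {
      return featureFlags.getOrDefault(ENABLE_ACCESS_CONTROL_POLICY, false);
    }
    return true;
  }
}
```

With this shape, existing policy types attach as before, while ACCESS_CONTROL stays opt-in until the table-only validator lands.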
Left a few comments, overall looks reasonable to me.
As for the objection that "there's no consensus on this particular approach", that's what the PR is for. If you have specific objections to aspects of this approach, please do raise them here so that they can be addressed. In the absence of such objections, it seems that we have a consensus.
public class AccessControlPolicyContent implements PolicyContent {

  // Optional; if present, the policy is applicable to this role
  private String principalRole;
If we have multiple policies, how do we find the set of applicable policies? (I'm not sure I saw details on this in the doc 😅 )
This is an existing feature of the policy store: you can retrieve all policies applicable to a given entity. Each policy can correspond to one role.
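Putting those two answers together, the role-based narrowing could be sketched like this; the `Policy` record and method names are illustrative, assuming a policy with no role applies to every caller:

```java
import java.util.List;

public class ApplicablePolicies {
  // Illustrative model: each attached policy optionally names one principal role.
  record Policy(String name, String principalRole) {}

  // Keep only the policies whose role (if any) is among the caller's activated roles.
  static List<Policy> forCaller(List<Policy> attached, List<String> activatedRoles) {
    return attached.stream()
        .filter(p -> p.principalRole() == null || activatedRoles.contains(p.principalRole()))
        .toList();
  }
}
```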
// Use a custom deserializer for the list of Iceberg Expressions
@JsonDeserialize(using = IcebergExpressionListDeserializer.class)
@JsonSerialize(using = IcebergExpressionListSerializer.class)
private List<Expression> rowFilters;
They're used to generate a view, right?
@JsonSerialize(using = IcebergExpressionListSerializer.class)
private List<Expression> rowFilters;

private static final String DEFAULT_POLICY_SCHEMA_VERSION = "2025-02-03";
Like for other policies, this is a sort of policy "version"
Clarifying: how FGAC policies are managed deserves a public discussion on the dev mailing list, considering all the UX, implementation, and runtime implications of the very specific approach implemented in this PR.
There has been a dev thread for this since 06/25: [1] https://lists.apache.org/thread/rf2zsgk9qh36z3s63gx6dgtl0s4cwngr
Discussions around how FGAC is exposed are still ongoing and things can still change. We do not have a comprehensive list of requirements, meaning the whole user experience is undefined, including how Iceberg expressions evolve and what UDF support and UDFs themselves will eventually look like. "Secure views", as mentioned in the linked docs and in this PR, are IMHO not the right way: they change the expected behavior, require engines to hide parts of their query plan from users, and prevent people from updating their tables (because those would be represented as views). More concerns around "secure views" were raised in community meetings (Iceberg + Polaris). We also need to define, for Polaris, how all the required pieces play together, including policy evaluation and input from AuthZ sources (OAuth et al.). The users' needs for Polaris are very important. Technically, we need a performant way to expose the right protection instructions, and that cannot even be designed until the Iceberg API/expression and Polaris requirements discussions have settled. I'd prefer a top-down approach, not a bottom-up approach.
Again, could you clarify which of the points above are directly related to this PR? This PR just defines policies and is not related to how protection instructions are exposed (via view text or a structured object as part of loadTable). Nevertheless, Iceberg expressions are part of both the IRC spec [1] and the Iceberg SDK [2]; this just adheres to that. I would prefer to avoid bringing unrelated discussions into the PR. [1] https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2162
Again, this PR sets the way the policies are persisted pretty much in stone. I think there's an assumption that policies are managed in Polaris only as Iceberg expressions, and I'm convinced that this is not the consensus.
About the PR
This PR adds the policy spec proposed in [OSS] Row and Column Based Access Control: Policy Definitions.
This uses Iceberg expressions to define row-level filters, and uses column projections as the way to project. For now we only deal with plain columns (waiting for UDFs in Apache Iceberg to standardize). Maybe we can use transforms for the projections? Open to it!
Note: in the last Apache Iceberg community sync, all were generally aligned that using Iceberg expressions, extended to support references to Iceberg UDFs, was the right way to go!
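To make the shape concrete, one ACCESS_CONTROL policy content instance could look roughly like this (field names and nesting are illustrative, not the exact serialized form in this PR):

```json
{
  "version": "2025-02-03",
  "principal_role": "analyst",
  "row_filters": [
    { "type": "eq", "term": "region", "value": "emea" }
  ],
  "column_projections": ["id", "region", "amount"]
}
```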
This additionally introduces 2 context variables which are resolved at the catalog end based on the caller:
$current_principal: checks whether the current principal is the policy's principal; if yes, this resolves to true.
$current_principal_role: checks whether the underlying role is one of the activated principal roles in the current caller context.
A policy defined with these context variables has them replaced inline, and the result is evaluated using the Iceberg expression SDK.
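The inline substitution described above can be sketched as follows; this is an illustrative string-level version only (the actual resolution happens catalog-side before the Iceberg expression SDK evaluates the filter, and the method and parameter names here are assumptions):

```java
import java.util.Set;

public class ContextVarResolution {
  // Resolve the two context variables to boolean literals for the current caller.
  static String resolve(String filterExpr, String callerPrincipal, Set<String> activatedRoles,
                        String policyPrincipal, String policyRole) {
    // Replace the longer token first so "$current_principal" does not
    // clobber the "$current_principal_role" token.
    return filterExpr
        .replace("$current_principal_role", String.valueOf(activatedRoles.contains(policyRole)))
        .replace("$current_principal", String.valueOf(callerPrincipal.equals(policyPrincipal)));
  }
}
```

Note the replacement order: since one variable name is a prefix of the other, resolving `$current_principal` first would corrupt `$current_principal_role`.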
So whenever the caller calls /get-applicable-policies, it gets back the row and column policies with the context variables already resolved.
TODO: policy merging
Please see the tests for E2E coverage.
Note: this is for engines that want to get the policies directly (rather than getting the secure view) and are willing to integrate with the Polaris Policy Store directly.
check this combination
https://docs.google.com/document/d/1AJicez7xPhzwKXenGZ19h0hngxrwAg3rSajDV1v0x-s/edit?tab=t.0#bookmark=id.j29shahtycb8