Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add using () statement for automatic disposing #368

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
399 changes: 399 additions & 0 deletions Draft-Accepted/RFCXXXX-Using-IDisposables.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,399 @@
---
RFC: RFCNNNN-Using-IDisposables.md
Author: Jordan Borean
Status: Draft
SupercededBy:
Version: 1.0
Area: engine
Comments Due: July 1, 2024
Plan to implement: Yes
---

# Using Statement Blocks for IDisposables

## Motivation

As a powershell developer/user,
I can enclose my disposable instances in a block,
so that they are disposed automatically when they are no longer needed.

## User Experience

PowerShell can interact with many instance types that might hold onto system resources which are tied to the lifetime of that instance.
Examples of such types are the [FileStream](https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream?view=net-8.0), [RegistryKey](https://learn.microsoft.com/en-us/dotnet/api/microsoft.win32.registrykey?view=net-8.0), [Process](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process?view=net-8.0) which contain operating system (OS) handles/identifiers.
On Windows, some of these handles could place a lock which can impact future operations, like trying to open a new handle on a file locked for reading, unload a registry hive with a key still open, etc.

A safe way to deal with these objects is to wrap the code that creates the instance in a `try/finally` block.
For example the below PowerShell script interacts with various streams that should be disposed when no longer needed to free up OS resources or internal memory buffers:

```powershell
$reader = [System.IO.File]::OpenText("numbers.txt")
try {
[Console]::WriteLine($reader.ReadLine())
}
finally {
$reader.Dispose()
}
```

The drawbacks here are

+ The user needs to use a `try/finally` block to ensure it is cleaned up
+ The `finally` block can be interrupted with an engine stop request
+ The user need to explicitly write code not only to pre-define the vars, set the vars, then dispose of them with a null check

C# developers can use the [using statement](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/using) that allows them to scope an `IDisposable` instance in an automatic `try/finally` block like the above:

```csharp
using (StreamReader reader = File.OpenText("numbers.txt"))
{
Console.WriteLine(reader.ReadLine());
} // reader is disposed here
```

This RFC proposes adding an equivalent `using` statement in PowerShell that can do the same thing as C#.
For example the same C# `StreamReader` example in PowerShell can now be:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there may be some collisions with implementations of a using function (I have one in my profile), so I think this would qualify as a breaking change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there may be some collisions with implementations of a using function (I have one in my profile), so I think this would qualify as a breaking change.

using is thankfully already a keyword, so you wouldn't be able to use a command named using without & 'using' otherwise you'd get a parse error. use is a common alias/function for this, but should be unaffected

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea the fact that using is already a keyword is another reason why I chose to use it over something like dispose/clean. People can't do using () {} as a command without the call operator so it's a nice way to avoid a breaking change.

```powershell
using ($reader = [System.IO.File]::OpenRead("numbers.txt")) {
[Console]::WriteLine($reader.ReadLine())
}
```

Another more complex example where the user might have multiple disposable objects that should be disposed is:

```powershell
$fs = $cryptoStream = $sr = $null
try {
$fs = [System.IO.File]::OpenRead($Path)
$cryptoStream = [System.Security.CryptoGraphy.CryptoStream]::new(
$fs,
[System.Security.Cryptography.ToBase64Transform]::new(),
[System.Security.Cryptography.CryptoStreamMode]::Read)
$sr = [System.IO.StreamReader]::new($cryptoStream, [System.Text.Encoding]::ASCII)
$sr.ReadToEnd()
}
finally {
${sr}?.Dispose()
${cryptoStream}?.Dispose()
${fs}?.Dispose()
}
```

This is a more complicated version as now the finally block needs to check that each of the variables to be disposed are actually set and they are disposed in the opposite order they were defined.
The proposed syntax for supporting this scenario would be:

```powershell
using (
$fs = [System.IO.File]::OpenRead($Path)
$cryptoStream = [System.Security.CryptoGraphy.CryptoStream]::new(
$fs,
[System.Security.Cryptography.ToBase64Transform]::new(),
[System.Security.Cryptography.CryptoStreamMode]::Read)
$sr = [System.IO.StreamReader]::new($cryptoStream, [System.Text.Encoding]::ASCII)
) {
$sr.ReadToEnd()
}
```

In this scenario the `using ()` statement supports multiple `AssignmentStatementAst` statements allowing it to track multiple variables to dispose.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would happen in the case of:

[int]$a = 1
using ( $a = Get-Process -id $PID) {
$a.Kill()
}

my assumption is that this is a runtime error, but wanted to be explicit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's part of the error handling question later in the RFC. But I would have this just error like usual and the using branch would just be ignored, similar to how this code runs

[int]$a = 1
try {
    $a = Get-Process -id $PID
    $a.Kill()
}
finally {
    if ($a -is [IDisposable]) { $a.Dispose() }
}

Essentially the dispose block is just syntatic sugar to ensure that the var(s) inside the using block are automatically wrapped in a try/finally to dispose it if available.

The equivalent syntax in C# would be this example:

```csharp
using (FileStream fs = File.OpenRead(path))
using (CryptoStream cryptoStream = new CryptoStream(...))
using (StreamReader sr = new StreamReader(cryptoStream, Encoding.ASCII))
{
sr.ReadToEnd();
}
```

## Specification

The syntax for the proposed `Using` statement would be:

```
using ($<disposable>; ...) {
<Statement list>
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a reasonable alternative

using (<assignmentAst>)
statement-block

and allow a single assignment only, rather than a collection of assignments

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called out later on under Multiple Disposables. The reason why I'm initially in favour of allowing a collection is if you are wrapping multiple disposable blocks you would essentially need to indent each using which is annoying to do.

```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a new scope is not created, it means that any variables created in the statement list would persist after the execution of the block. Is that right? If a new scope is created, then the usual variable reaping would occur, wouldn't it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be my assumption. I would have implemented it like an if/while/for/foreach/etc where the variables are still in the same scope and persist beyond it.

The part inside the parenthesis is a variable to dispose after the `<Statement list>` ends.
It can contain multiple variables as well as the code used to define the variable.
Some examples of this would be:

```powershell
# Creates the instance separately from the using statement
$key = Get-Item 'HKLM:\System\CurrentControlSet\Control\Session Manager\Environment'
using ($key) {
$key.GetValue('PATH', '', 'DoNotExpandEnvironmentNames')
}

# Creates the variable inside the parenthesis
using ($key = Get-Item 'HKLM:\System\CurrentControlSet\Control\Session Manager\Environment') {
$key.GetValue('PATH', '', 'DoNotExpandEnvironmentNames')
}

# Disposes multiple values at the end of the block
$machineKey = Get-Item 'HKLM:\System\CurrentControlSet\Control\Session Manager\Environment'
$userKey = Get-Item 'HKCU:\Environment'

using ($machineKey; $userKey) {
$m = $machineKey.GetValue('PATH', '', 'DoNotExpandEnvironmentNames')
$u = $userKey.GetValue('PATH', '', 'DoNotExpandEnvironmentNames')
"${m};${u}"
}

# Same as the above but with the vars defined in the parenthesis
using (
$machineKey = Get-Item 'HKLM:\System\CurrentControlSet\Control\Session Manager\Environment'
$userKey = Get-Item 'HKCU:\Environment'
) {
$m = $machineKey.GetValue('PATH', '', 'DoNotExpandEnvironmentNames')
$u = $userKey.GetValue('PATH', '', 'DoNotExpandEnvironmentNames')
"${m};${u}"
}
```

## Implementation Details

I propose adding a new AST type `UsingDisposableStatementAst` that contains

+ `PipelineBaseAst[]` - Disposables
+ `StatementBlockAst` - Body

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$a = <disposable>
using ($b = $a) {
statement list
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry not sure what you are asking here, wouldn't like be an AssignmentStatementAst and $b is the one to be disposed at the end of the statement list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more about expectation - in some cases (if it's a reference type) I assume that $b gets disposed, but $a would too which might lead to unmet expectations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this particular scenario is covered by https://github.com/PowerShell/PowerShell-RFC/blob/92b6a481b21f84dab7a08a490664b819c291074b/Draft-Accepted/RFCXXXX-Using-IDisposables.md#variable-or-instance. It's the same question as doing

$a = <disposable 1>
using ($a) {
    $a = <disposable 2>
}

Should the object being disposed be 1 or 2, C# will dispose 1. I agree that in either scenario, and yours, It's a perfectly valid point to bring up but I'm not sure how we could really avoid such a thing except for making sure the documentation covers such scenarios.

When visited the Ast will visit the `Disposables` to find any `AssignmentStatementAst` or `VariableExpressionAst` which it can use to keep track of variables to dispose.
The compiler will then emit an Expression that wraps the `Body` with the below for each `Disposables` (in definition order):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there remoting issues that will cause difficulties?
I see the utility of:

using ($s = new-psssession) {
invoke-command -session $s { <statement list> }
}

are there situations where something unexpected may occur?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this would affect it as Invoke-Command is a blocking call and the session isn't a disposable object anyway. Even if it was I picture your above example to be the equivalent to the following pwsh code

try {
    $s = new-pssession
    invoke-command -session $s { <statement list> }
}
finally {
    if ($s) { $s.Dispose() }
}


```powershell
$v = ... # Value from assignment/variable
try {
...
}
finally {
${v}?.Dispose()
}
```

## Alternate Proposals and Considerations

Some questions that would need consensus from the maintainers are:

+ The name of the statement
+ Whether to support multiple disposables in one block
+ Should it dispose the variable or actual instance captured before the body is run
+ Support only `IDisposable` vs duck typing with `Dispose()` method
+ What to do in case of an error in the dispose
+ Future support for `using` declarations without the block/braces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of omitting the braces or otherwise not creating a block. There's very little in PS that allows you to omit the braces. I don't believe we should start here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it's definitely not part of this RFC but it is something I brought up to consider as possible future work or something that should at least be considering in case we need to adjust the behaviour here to accomodate that in the future.


### Name of the Statement

This RFC proposes using `using` as the statement name to try and provide some familiarity with how this type of operation works in C#.
The downside to `using` is that PowerShell already has a [using statement](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_using?view=powershell-7.4).
The existing `using` statements are handled at parse time and support the following:

```powershell
using assembly <.NET-assembly-path>
using module <module-name>
using namespace <.NET-namespace>
```

It should be possible to differentiate between these three parse time statements (and future ones) by checking to see if the next token after `using` starts with `(` rather than `assembly`, `module`, `namespace`, and future additions.
I believe this is a more complex topic for more advanced developers so it would be ideal to align with how C# works.

There is also a `$using:varName` "scope" used in remoting situations but as that is part of the variable name itself rather than a statement I do not think it should be an issue.

Alternate names could be:

+ `clean` - Aligns with the [clean block](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_functions_advanced_methods?view=powershell-7.4#clean) introduced in 7.3
+ `dispose` - Associated with the `Dispose()` method and the action being done
+ Something else to be proposed

A con to the above options over `using` is that this is currently valid syntax in PowerShell:

```powershell
function dispose {}

dispose ($var = 'abc') { 'foo' }
```

If a keyword that is not reserved is chosen, it could potentially break scripts that have defined that keyword as a command.

### Multiple Disposables

The current proposal aims to support disposing multiple variables either specified as a `VariableExpressionAst` or `AssignmentStatementAst` and the variables will be disposed in the reverse order they were defined in the `using` statement.
It could be desirable to only allow one variable or assignment in the statement to simplify the definition.
The downside of this is increased indentation of the code when trying to dispose of multiple objects, for example:

```powershell
using ($var1 = ...) {
using ($var2 = ...) {
...
}
}
```

If only 1 disposable could be defined in each block it might be nice to support a `using ()` expression without an expression body like so that the subsequent vars do not need to be indented:

```powershell
using ($var1 = ...)
using ($var2 = ...) {
...
}
```

I feel that it is more PowerShell like to support multiple statements in the `using (...)` rather than multiple `using (...)` calls in the same line.
By allowing multiple statements the parser should also be simpler to implement.

### Variable or Instance

The C# implementation is set to dispose the instance value as it was at the start of the using statement rather than at the end.
Take for example this C# code:

```csharp
Foo v = new Foo();
using (v)
{
v = new Foo();
Console.WriteLine(v);
}
```

When compiled this code is treated as:

```csharp
Foo v = new Foo();
Foo vTemp = v;
try
{
v = new Foo();
Console.WriteLine(v);
}
finally
{
if (vTemp != null)
{
((IDisposable)vTemp).Dispose();
}
}
```

Only the original value of `v` when the `using (v)` block starts will be the one disposed.
This could be confusing for end users who might expect that the newly assigned `v` is disposed and vice versa.
This implementation would have to make the choice of;

+ Should it follow the C# behaviour and dispose the instance set at the start of the block - my recommendation
+ Should it dispose the result of the variable/assignment as it is set at the end of the block
+ Should it attempt to stop variable reassignment inside the block
+ This may not be possible to stop with `Set-Variable` but assignments could potentially be tracked/stopped at parse time

### IDisposable vs Duck typing

Should any variables specified by the using statement only support `IDisposable` types or should it also support any method with a well defined method like `Dispose()`.
The C# implementation only works with `IDisposable` types but with the dynamic nature of PowerShell it might be useful to support anything with a `Dispose()` method.
For example:

```powershell
# Dispose() is met as part of the IDisposable contract
class MyDisposableClass : IDisposable {
[void] Dispose() {}
}

# Dispose() is just a method on the type
class MyDuckTypedClass {
[void] Dispose() {}
}

# Dispose() is part of an ETS member
$obj = [PSCustomObject]@{}
$obj | Add-Member -MemberType ScriptMethod -Name Dispose -Value { ... }

# Dispose() is overridden by ETS member
$fs = [System.IO.File]::OpenRead($path)
$fs | Add-Member -MemberType ScriptMethod -Name Dispose -Value { ... }
```

Should this new implementation work with all the above examples or should it try and limit itself to specific scenarios.
Should PowerShell attempt to support `IEnumerable`, `Array` types (or not at all) and enumerate the variable values to see what ones to dispose.
From an implementation perspective it would be simpler to support only `IDisposable` types as the expression tree would not have to go through the binder to support the ETS members but it is less flexible.

### Error Handling

How should exceptions that are thrown in a potential `Dispose()` call be treated, should they just be written as any normal `MethodInvocationException` in that statement or should they be ignored.
Having the exception be raised like normal makes the most sense as callers can still wrap the `using ()` block in their own `try/catch` and silently ignoring errors is not ideal.

```powershell
try {
# $var.Dispose() in this scenario throws an exception.
using ($var = Get-Disposable) {
...
}
}
catch {
$_ # MethodInvocationException ErrorRecord
}
```

In the case when multiple using statements are used and multiple `Dispose()` calls raise an exception, the outermost block is the one that will be in the catch block.
This follows the convention in C# where the output of the below is `first` from the outermost `using` block dispose.

```csharp
using System;

public class Program
{
public static void Main()
{
try
{
using (D a = new D("first"))
using (D b = new D("second"))
{
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
}

public class D : IDisposable
{
private string _msg;

public D(String msg) => _msg = msg;

public void Dispose()
{
throw new Exception(_msg);
}
}
```

It would make sense for PowerShell to follow along with this convention but one possible alternative is to raise an [AggregateException](https://learn.microsoft.com/en-us/dotnet/api/system.aggregateexception?view=net-8.0) instead.

Another error condition that needs to be considered is how should non disposable objects be treated.
As the parser has no knowledge of the instance type or whether it has a `Dispose()` method, should this be silently ignored, or should an error/exception (which one) at runtime.

### Using Without Block

While not part of this RFC, C# supports a using statement without a block, for example:

```csharp
public void Main(bool flag)
{
if (flag)
{
using Foo foo = new();
// Foo is disposed here as it is no longer in scope
}

Console.WriteLine("end");
}
```

As PowerShell has a more complex scoping setup it is most likely not feasible to implement in PowerShell but it should be considered in case someone wants to revisit it in the future.
How should this syntax be done in PowerShell, will it have any impact on the syntax proposed in this RFC.
Possible options that should work with this RFC is to support an explicit `using $var = ...` without the parenthesis.
The presence of `$` after `using` should be able to differentiate between the existing `using ...` statements and the new one proposed in this RFC.