- 
                Notifications
    You must be signed in to change notification settings 
- Fork 53
Read out the supported and unsupported attribute of a device at configure #101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
         elupus
  
      
      
      commented
      
            elupus
  
      
      
      commented
        Jul 26, 2024 
      
    
  
- Read out the supported and unsupported attribute of a device at configure to be able to only list available attributes in GUI.
- Don't swallow exceptions during configure of clusters in debug logs.
| f"Failed to write attribute {name}={value}: {record.status}", | ||
| ) | ||
|  | ||
| async def _discover_attributes_all( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wan't sure if this should go here in ZHA or in ZIGPY. The cache of unsupported attributes does exist in ZIGPY.
|  | ||
| async def discover_unsupported_attributes(self): | ||
| """Discover the list of unsupported attributes from the device.""" | ||
| attribute_info = await self._discover_attributes_all() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also i'm not fully sure what type of errors devices might respond with here. We may want to gracefully handle some things. For example, the GeneralCommand.Default_Response response is sent by some quirks here which seem utterly wrong.
| await ch_specific_cfg() | ||
|  | ||
| self.debug("Discovering available attributes") | ||
| await self.discover_unsupported_attributes() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the name of the method and the comment as written are opposites (available vs unsupported)
        
          
                zha/zigbee/endpoint.py
              
                Outdated
          
        
      | for cluster_handler, outcome in zip(cluster_handlers, results): | ||
| if isinstance(outcome, Exception): | ||
| cluster_handler.debug( | ||
| cluster_handler.warning( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change? assume this was for quick visual in log during development?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes unrelated. But valid. Should not hide full exceptions of general type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the reconfigure dialog we show the results of the calls with success / failure. I need to update the joining UI to do something similar. It has the capability to flag failures w/ color and messaging it just isn't working atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheJulianJES we should never generically hide base Exception types. That hides programming errors (as i found out the hard way). Error in programming caused an attribute error that was hard to track down due to this.
Its fine to swallow know errors if we know their cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ps. I move this out separate.
| I believe this is already handled by zigpy, no? When we read an attribute, it is recorded as unsupported if the device responds as such (https://github.com/zigpy/zigpy/blob/37f4ee65ec19f7b8af318d4ed2dc5325029020d9/zigpy/zcl/__init__.py#L553-L554). ZHA cluster handlers list out the attributes to be read and react appropriately. Or do you mean for the frontend's cluster configuration interface? If so, this change would probably need to be moved into a special debug mode, as every extra attribute read increases the chances of initialization failing, especially for battery-powered devices. | 
| Some devices, like Hue motion sensors, can report an attribute as "unsupported" during and shortly after pairing. This affects the occupancy timeout attribute they use, for example. It should not be hidden. The clusters menu is also a debug UI. Personally, I wouldn't hide any attributes. They can also become available after a firmware update. Although we could add a symbol or change the color depending on if an attribute is known available or unsupported. Reading out all attributes on a cluster during pairing also seems like it could cause issues with battery-powered devices that go to sleep fast. We could miss the endpoints/attributes we actually care about. Furthermore, quirks can define new attributes after a device is already paired. We don't reinterview the device then, so whether the added attribute is shown or not might be inconsistent depending on when a device was paired. Lastly, many devices fail on cluster configuration, as they simply don't support some things. That's why it's a debug message at the moment. I'd keep it that way. | 
| 
 would this be a good use for the scan device functionality? we could schedule it post join and attempt it n many times until it succeeds or we give up and make it manually triggered in the UI? | 
| Yes its mostly for frontend. But i dont see why we read each attribute to find out if its supported instead of asking the device what is supported. | 
| 
 It would probably require some extra support for quirked devices with fake ZCL attributes. The device doesn't know we show them to zigpy/zha. | 
| 
 I agree with this. We just need to figure out where it makes sense to put it. @puddly's point on device pairing is spot on. Especially for battery powered devices. | 
| @TheJulianJES i think it is unlikely HUE would skip report those attributes in the discover call even at start. That list is very likely a static list in the firmware so using this service is better than looking at the response value of the attribute read. It is also much faster to use this service since it will report many services in one call compared to trying to read them. We could wrap the discover call in zigpy to take quirked attributes into account to force them to be included in the list. @puddly i could not find a place where we iterate over ALL attributes in the clusters to figure out what the device support. Doing that would also be really bad due to the number of requests. | 
This allow us to only display available attributes.
| Still some issues with tests. Some tests seem to be using same mock on multiple clusters. Either that or we are configuring same cluster multiple times for some reason. | 
| https://github.com/mdeweerd/zha-toolkit/blob/main/custom_components/zha_toolkit/scan_device.py We should implement something like this to handle this functionality in ZHA. We can potentially implement this either before or after the device has completed initial configuration and initialization. I'd suggest running it in a background task here if we want it before config / init: zha/zha/application/gateway.py Line 397 in e897841 or we can implement it here in a chained task if we think after makes more sense: zha/zha/application/gateway.py Line 422 in e897841 
 we also need to differentiate between a new join of a new device and a device just returning to the network (zigbee devices can leave and rejoin the network at will). The way this is currently being implemented here will affect new joins and rejoins. On rejoins there is no need to do this again. We also will need to figure out how to handle rescans post firmware upgrades where we should force a complete rediscovery and reconfiguration of the device as signatures (clusters, attrs etc) often change with firmware upgrades. Changing where this is being implemented will also give us finer grained control over logging so we can avoid the flying blind issue you experienced and we can suppress logs where we deem them not helpful. | 
| 
 If we want to do this before the device is fully initialized we may want to move this into Zigpy to avoid race conditions between events. not sure if this quoted suggestion would cause a race or not but it may be possible. Would really need to poke at it to be sure. | 
| 
 Debugged this... it's failing because @patch it patching request on the cluster class not the instances and as a result devices with identify clusters on multiple endpoints end up w/ more calls in the list. Context from docs: With this change disco tests pass: Anyways, I think we need to discuss where this functionality belongs as outlined above. | 
| So i've moved this to async_initialize, as far as i can tell that is only done when we initially join. I have some tests i need to figure out. I don't think we need to chain it to any other task? | 
| 
 Initialize is called at startup, integration reload, and device pairing. For example: https://github.com/elupus/zha/blob/744a031a47694c426b78bd467b63b373b3066a0b/zha/application/gateway.py#L363 I think we want this completely separate from device joining because this can fail and be run in the background until it works without impacting device functionality. | 
| 
 In theory it can effect function since some entities depend on which attributes are marked as supported (ie inverse of unsupported). I dont think it should be background..its a fast and quick thing as soon as you have connectivity. During pairing we should be in quick poll (are we?) on the devices so this should not have any major effect on startup. | 
| Codecov ReportAll modified and coverable lines are covered by tests ✅ 
 Additional details and impacted files@@            Coverage Diff             @@
##              dev     #101      +/-   ##
==========================================
+ Coverage   95.65%   95.66%   +0.01%     
==========================================
  Files          61       61              
  Lines        9255     9283      +28     
==========================================
+ Hits         8853     8881      +28     
  Misses        402      402              ☔ View full report in Codecov by Sentry. | 
| 
 Fast polling can only be enabled for devices that have a Poll Control cluster. I don't remember off the top of my head how common this is but it's definitely only something that started appearing in newer Zigbee 3.0 devices. Many devices don't support it and once joined, they enter long polling mode and are difficult to reach. We definitely should enable fast polling for a minute or two for any joining devices that support the cluster (and potentially every time we want to communicate with the device?). This needs to be done within zigpy, as it needs to happen as early as possible (possibly before this). This sort of requires a communication overhaul for zigpy: we need a more reliable way to actually communicate with sleepy end devices. Right now, we rely on radio retries and it is not reliable for devices that poll infrequently. | 

