Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: request failure during deployment switch mid iteration still left the EVC active; it didn't rollback #555

Open
viniarck opened this issue Oct 31, 2024 · 0 comments
Labels

Comments

@viniarck
Copy link
Member

viniarck commented Oct 31, 2024

Context: I was verifying if mef_eline was fully handling the error in the following case when trying to deploy a path:

  • It succeeds removing the current flows with force True, flow_manager still responsive when handling the request.
  • But then keep a middle switch of the patch not connected to simulate a connection error, at this point mef_eline should undo/remove this path since it failed mid deployment iteration
  • Finally, it was expected that mef_eline would remove the path, but it didn't since it the first step current_path was set to empty. So, it left the EVC as active and didn't rollback

The potential problem when this case happens seems two-fold:

  1. deploy_to_path on this line, should probably be using remove_path_flows(use_path), which was the path that failed mid deployment, plus maybe no harm still also calling remove_current_flows() for extra good measure
  2. Once the prior step is fixed, then when removing flows here the request can fail again, let's say when flow_manager returns HTTP 500 during a server request overload, very rare and our NApps are behaving well but it can happen. In this case, it should also deactive the EVC and follow one of the approaches of error handling async/consistent recovery that's being documented on Tackling epic_mef_eline_flows_error_handling #515

How to reproduce

  • Use the a ring topology

  • Create an EVC with the following payload (save it as epl_static_payload.json):

cat epl_static_payload.json | http http://localhost:8181/api/kytos/mef_eline/v2/evc/                         
{
    "name": "epl_static",
    "dynamic_backup_path": false,
    "uni_a": {
        "interface_id": "00:00:00:00:00:00:00:01:1"
    },
    "uni_z": {
        "interface_id": "00:00:00:00:00:00:00:03:1"
    },
    "primary_path": [
        {
            "endpoint_a": {
                "id": "00:00:00:00:00:00:00:01:3"
            },
            "endpoint_b": {
                "id": "00:00:00:00:00:00:00:02:2"
            }
        },
        {
            "endpoint_a": {
                "id": "00:00:00:00:00:00:00:02:3"
            },
            "endpoint_b": {
                "id": "00:00:00:00:00:00:00:03:2"
            }
        }
    ]
}
  • Temporarily change [this line to self.remove_current_flows(force=False, sync=True)]https://github.com/kytos-ng/mef_eline/blob/master/models/evc.py#L885) on mef_eline. The force=False here is just to facilitate to simulate as if flow_manager would return error in its request, it'll return 424 if the switch isn't connected, but since an error HTTP 500 is handled similarly we can use this to our favor to simulate it easily.

  • Set switch s3 as disconnected sudo ovs-vsctl set-controller s3 tcp:127.0.0.1:6654

  • Restart kytosd, just so the temporary code change you made on the prior step gets loaded and will be ready to fail when flows gets pushed to switch s3

  • Request a EVC redeploy http PATCH http://0.0.0.0:8181/api/kytos/mef_eline/v2/evc/<evc_id>/redeploy

  • Notice that request will be retried, will fail, but it won't completly rollback and keep the EVC active

Here's the entire example:

❯ cat epl_static_payload.json | http http://localhost:8181/api/kytos/mef_eline/v2/evc/                        
HTTP/1.1 201 Created
content-length: 47
content-type: application/json
date: Thu, 31 Oct 2024 14:31:57 GMT
server: uvicorn

{
    "circuit_id": "e3a4db1490244c",
    "deployed": true
}



❯ sudo ovs-vsctl set-controller s3 tcp:127.0.0.1:6654


❯ http PATCH http://0.0.0.0:8181/api/kytos/mef_eline/v2/evc/e3a4db1490244c/redeploy
HTTP/1.1 409 Conflict
content-length: 50
content-type: application/json
date: Thu, 31 Oct 2024 14:33:26 GMT
server: uvicorn

{
    "response": "Circuit e3a4db1490244c is disabled."
}

2024-10-31 11:32:20,732 - INFO [kytos.core.atcp_server] (MainThread) Connection lost with client 127.0.0.1:50990. Reason: Request closed by client
2024-10-31 11:32:36,849 - INFO [kytos.napps.kytos/of_lldp] (MainThread) LLDP loop stopped on switch: 00:00:00:00:00:00:00:03, interfaces: ['s3-eth5', 's3-eth6'], port_numbers: [5, 6]
kytos $> 

2024-10-31 11:33:26,947 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:02, command: delete, force: True, total_length
: 1,  flows[0, 1]: [{'cookie': 12313867067010851916, 'cookie_mask': 18446744073709551615, 'owner': 'mef_eline'}]
2024-10-31 11:33:26,954 - INFO [uvicorn.access] (MainThread) 127.0.0.1:51034 - "POST /api/kytos/flow_manager/v2/delete/00%3A00%3A00%3A00%3A00%3A00%3A00%3A02 HTTP/1.1" 202
2024-10-31 11:33:26,961 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:01, command: delete, force: True, total_length
: 1,  flows[0, 1]: [{'cookie': 12313867067010851916, 'cookie_mask': 18446744073709551615, 'owner': 'mef_eline'}]
2024-10-31 11:33:26,968 - INFO [uvicorn.access] (MainThread) 127.0.0.1:51042 - "POST /api/kytos/flow_manager/v2/delete/00%3A00%3A00%3A00%3A00%3A00%3A00%3A01 HTTP/1.1" 202
2024-10-31 11:33:26,975 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:03, command: delete, force: True, total_length
: 1,  flows[0, 1]: [{'cookie': 12313867067010851916, 'cookie_mask': 18446744073709551615, 'owner': 'mef_eline'}]
2024-10-31 11:33:26,980 - INFO [uvicorn.access] (MainThread) 127.0.0.1:51054 - "POST /api/kytos/flow_manager/v2/delete/00%3A00%3A00%3A00%3A00%3A00%3A00%3A03 HTTP/1.1" 202
2024-10-31 11:33:26,989 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:02, command: add, force: False, total_length: 
2,  flows[0, 2]: [{'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'set_vlan', 'vlan_id': 1}, {'action_type': 'output', 'port': 3}], '
owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}, {'match': {'in_port': 3, 'dl_vlan': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'set_v
lan', 'vlan_id': 1}, {'action_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2024-10-31 11:33:26,996 - INFO [uvicorn.access] (MainThread) 127.0.0.1:51058 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A02 HTTP/1.1" 202
2024-10-31 11:33:27,010 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:01, command: add, force: False, total_length: 
2,  flows[0, 2]: [{'match': {'in_port': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}, {'actio
n_type': 'output', 'port': 3}], 'owner': 'mef_eline', 'table_group': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 3, 'dl_vlan': 1}, 'cookie': 12313867067010851916, 'a
ctions': [{'action_type': 'pop_vlan'}, {'action_type': 'output', 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2024-10-31 11:33:27,025 - INFO [uvicorn.access] (MainThread) 127.0.0.1:51060 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A01 HTTP/1.1" 202
2024-10-31 11:33:27,035 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:03, command: add, force: False, total_length: 
2,  flows[0, 2]: [{'match': {'in_port': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}, {'actio
n_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_group': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 12313867067010851916, 'a
ctions': [{'action_type': 'pop_vlan'}, {'action_type': 'output', 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2024-10-31 11:33:27,041 - INFO [uvicorn.access] (MainThread) 127.0.0.1:51070 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A03 HTTP/1.1" 424
2024-10-31 11:33:27,041 - WARNING [kytos.core.retry] (AnyIO worker thread) Retry #1 for _send_flow_mods, args: ('00:00:00:00:00:00:00:03', [{'match': {'in_port': 1}, 'cookie': 123138670
67010851916, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}, {'action_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_gr
oup': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'pop_vlan'}, {'action_type': 'output'
, 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]), kwargs: {}, seconds since start: 0.01
kytos $> 

kytos $> 2024-10-31 11:33:33,655 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:03, command: add, force: False, total
_length: 2,  flows[0, 2]: [{'match': {'in_port': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}
, {'action_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_group': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 123138670670108
51916, 'actions': [{'action_type': 'pop_vlan'}, {'action_type': 'output', 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2024-10-31 11:33:33,662 - INFO [uvicorn.access] (MainThread) 127.0.0.1:50576 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A03 HTTP/1.1" 424
2024-10-31 11:33:33,664 - WARNING [kytos.core.retry] (AnyIO worker thread) Retry #2 for _send_flow_mods, args: ('00:00:00:00:00:00:00:03', [{'match': {'in_port': 1}, 'cookie': 123138670
67010851916, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}, {'action_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_gr
oup': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'pop_vlan'}, {'action_type': 'output'
, 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]), kwargs: {}, seconds since start: 6.64
kytos $> 

kytos $> 2024-10-31 11:33:41,762 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request dpid: 00:00:00:00:00:00:00:03, command: add, force: False, total
_length: 2,  flows[0, 2]: [{'match': {'in_port': 1}, 'cookie': 12313867067010851916, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}
, {'action_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_group': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 123138670670108
51916, 'actions': [{'action_type': 'pop_vlan'}, {'action_type': 'output', 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2024-10-31 11:33:41,768 - INFO [uvicorn.access] (MainThread) 127.0.0.1:50340 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A03 HTTP/1.1" 424
2024-10-31 11:33:41,769 - ERROR [kytos.napps.kytos/mef_eline] (AnyIO worker thread) Error deploying EVC EVC(e3a4db1490244c, epl_static) when calling flow_manager: In 00:00:00:00:00:00:0
0:03, {"description":"switch 00:00:00:00:00:00:00:03 isn't connected","code":424}
2024-10-31 11:33:41,769 - INFO [uvicorn.access] (MainThread) 127.0.0.1:51030 - "PATCH /api/kytos/mef_eline/v2/evc/e3a4db1490244c/redeploy HTTP/1.1" 409

Before deployment 6 flows:

rs0 [direct: primary] napps> db.flows.find({"flow.cookie": Decimal128("12313867067010851916"), "state": "installed"}).count()
6
rs0 [direct: primary] napps> 

After deployment 4 installed flows (sw3 flows still pending, consistency would fix it, but still at it is it left the current path)

rs0 [direct: primary] napps> db.flows.find({"flow.cookie": Decimal128("12313867067010851916"), "state": "installed"}).count()
4
rs0 [direct: primary] napps> 

Here's the EVC still active despite a failed deployment:

❯ http http://0.0.0.0:8181/api/kytos/mef_eline/v2/evc/e3a4db1490244c/        
HTTP/1.1 200 OK
content-length: 4390
content-type: application/json
date: Thu, 31 Oct 2024 14:37:59 GMT
server: uvicorn

{
    "active": true,
    "archived": false,
    "backup_links": [],
    "backup_path": [],
    "bandwidth": 0,
    "circuit_scheduler": [],
    "creation_time": "2024-10-31T14:31:58",
    "current_path": [
        {
            "active": true,
            "enabled": true,
            "endpoint_a": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:01:3",
                "link": "78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0",
                "lldp": true,
                "mac": "c6:36:74:db:08:ac",
                "metadata": {},
                "name": "s1-eth3",
                "nni": true,
                "port_number": 3,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:01",
                "type": "interface",
                "uni": false
            },
            "endpoint_b": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:02:2",
                "link": "78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0",
                "lldp": true,
                "mac": "82:2b:28:da:b7:92",
                "metadata": {},
                "name": "s2-eth2",
                "nni": true,
                "port_number": 2,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:02",
                "type": "interface",
                "uni": false
            },
            "id": "78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0",
            "metadata": {
                "s_vlan": {
                    "tag_type": "vlan",
                    "value": 1
                }
            },
            "status": "UP",
            "status_reason": []
        },
        {
            "active": true,
            "enabled": true,
            "endpoint_a": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:02:3",
                "link": "4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30",
                "lldp": true,
                "mac": "36:49:e6:8a:3c:10",
                "metadata": {},
                "name": "s2-eth3",
                "nni": true,
                "port_number": 3,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:02",
                "type": "interface",
                "uni": false
            },
            "endpoint_b": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:03:2",
                "link": "4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30",
                "lldp": true,
                "mac": "3a:e0:7c:e8:9e:3a",
                "metadata": {},
                "name": "s3-eth2",
                "nni": true,
                "port_number": 2,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:03",
                "type": "interface",
                "uni": false
            },
            "id": "4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30",
            "metadata": {
                "s_vlan": {
                    "tag_type": "vlan",
                    "value": 1
                }
            },
            "status": "UP",
            "status_reason": []
        }
    ],
    "dynamic_backup_path": false,
    "enabled": true,
    "end_date": null,
    "execution_rounds": 0,
    "failover_path": [],
    "flow_removed_at": null,
    "id": "e3a4db1490244c",
    "metadata": {},
    "name": "epl_static",
    "owner": null,
    "primary_constraints": {},
    "primary_links": [],
    "primary_path": [
        {
            "active": true,
            "enabled": true,
            "endpoint_a": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:01:3",
                "link": "78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0",
                "lldp": true,
                "mac": "c6:36:74:db:08:ac",
                "metadata": {},
                "name": "s1-eth3",
                "nni": true,
                "port_number": 3,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:01",
                "type": "interface",
                "uni": false
            },
            "endpoint_b": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:02:2",
                "link": "78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0",
                "lldp": true,
                "mac": "82:2b:28:da:b7:92",
                "metadata": {},
                "name": "s2-eth2",
                "nni": true,
                "port_number": 2,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:02",
                "type": "interface",
                "uni": false
            },
            "id": "78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0",
            "metadata": {
                "s_vlan": {
                    "tag_type": "vlan",
                    "value": 1
                }
            },
            "status": "UP",
            "status_reason": []
        },
        {
            "active": true,
            "enabled": true,
            "endpoint_a": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:02:3",
                "link": "4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30",
                "lldp": true,
                "mac": "36:49:e6:8a:3c:10",
                "metadata": {},
                "name": "s2-eth3",
                "nni": true,
                "port_number": 3,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:02",
                "type": "interface",
                "uni": false
            },
            "endpoint_b": {
                "active": true,
                "enabled": true,
                "id": "00:00:00:00:00:00:00:03:2",
                "link": "4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30",
                "lldp": true,
                "mac": "3a:e0:7c:e8:9e:3a",
                "metadata": {},
                "name": "s3-eth2",
                "nni": true,
                "port_number": 2,
                "speed": 1250000000.0,
                "status": "UP",
                "status_reason": [],
                "switch": "00:00:00:00:00:00:00:03",
                "type": "interface",
                "uni": false
            },
            "id": "4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30",
            "metadata": {
                "s_vlan": {
                    "tag_type": "vlan",
                    "value": 1
                }
            },
            "status": "UP",
            "status_reason": []
        }
    ],
    "queue_id": -1,
    "request_time": "2024-10-31T14:31:58",
    "sb_priority": null,
    "secondary_constraints": {},
    "service_level": 0,
    "start_date": "2024-10-31T14:31:58",
    "uni_a": {
        "interface_id": "00:00:00:00:00:00:00:01:1"
    },
    "uni_z": {
        "interface_id": "00:00:00:00:00:00:00:03:1"
    },
    "updated_at": "2024-10-31T14:31:58"
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant