Android Port #1

Open
fqrouter opened this issue Mar 4, 2013 · 7 comments
Comments

@fqrouter

fqrouter commented Mar 4, 2013

Based on the theory you provided, I have successfully made a PoC work on an Android device using Python. The next step is to actually compile Janus and make it really work on Android. The source code of the PoC is located at:

https://gist.github.com/fqrouter/5083321

@evilaliv3
Owner

Thank you for sharing, @fqrouter. Have you found Janus useful? Have you deployed it on some platform? I'm going to review and test your PoC and include it in the main repository.

@fqrouter
Author

fqrouter commented Mar 5, 2013

I found it very useful! Thank you! Are you still actively working on SniffJoke? I am working on something very similar to SniffJoke, aimed at turning an Android mobile into an anti-GFW device (a.k.a. fqrouter). The mobile will serve as both a Wi-Fi STA and AP, so it is a Wi-Fi repeater. This is where Janus comes in with its unique feature: it lets me get the packet after it has been masqueraded by iptables. With NFQUEUE, by comparison, there is no way to intercept the packet after it has gone through -j MASQUERADE. Relying on this feature, I can send the SYN through a secret tunnel, which will blind the GFW.
However, I found that my Python implementation is maxing out the CPU. Profiling suggests the Python version spends a lot of time parsing packets to correct the checksum (the intercepted packets do not have a correct checksum filled in). So I have two choices: get the C version working to see if it is faster, or make the interception selective. In your experience with the C version, how much lower is the throughput than the original? Is it bound by the NIC sniffing packets, or by the CPU? Did you try to make the interception selective?
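For readers following along, the checksum correction mentioned here can be sketched without a full packet parser. This is only an illustration of the standard Internet checksum (RFC 1071) applied to an IPv4 header, not the code from the PoC gist; the function names `internet_checksum` and `fix_ipv4_header_checksum` are made up for this example.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fix_ipv4_header_checksum(packet: bytes) -> bytes:
    """Return a copy of the packet with the IPv4 header checksum recomputed."""
    ihl = (packet[0] & 0x0F) * 4          # header length in bytes
    header = bytearray(packet[:ihl])
    header[10:12] = b"\x00\x00"           # checksum field must be zero while summing
    struct.pack_into("!H", header, 10, internet_checksum(bytes(header)))
    return bytes(header) + packet[ihl:]
```

Touching only the header bytes with `struct` avoids building a full packet object for every intercepted packet. A TCP checksum would additionally need the pseudo-header and the whole segment, which this sketch does not cover.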

@evilaliv3
Owner

This sounds really cool. Actually @vecna and I are working on some other projects (globaleaks/tor2web and others) and unfortunately human time is very limited. :) Happy to know something is keeping the idea alive. It would be really cool if you could make Janus work on Android (I did it on a Fonera using OpenWrt, but things were simpler), and I'm really interested in collaborating with you on this.

Yes, our implementation is certainly faster thanks to libevent and libpcap, but it is also CPU intensive like yours. Regardless of this, the throughput is about the same.

In a first implementation I did some tests selectively intercepting packets by IP address, but with this simple trick (based on ARP) you can't do anything more. In an earlier version we used a different trick based on tun devices: using a tun device and some iptables rules, we acted as a fake VPN, which let us mangle outgoing and incoming packets. Perhaps you could try this as well? (You will find this trick in previous versions of Janus and of SniffJoke up to 2.0.)

Where can I find some documentation about fqrouter, the anti-GFW project? Is it a public project, and what are the use cases?

@fqrouter
Author

fqrouter commented Mar 5, 2013

Thanks for your reply! So, you say your implementation is also CPU intensive; I guess that is because of the checksum calculation as well? I cannot figure out a way to send packets at layer 2 and still offload the checksum to the NIC.
I was also thinking about tun + iptables + mark + advanced routing.
My use case is anti-GFW. GFW stands for the Great Firewall of China, which is well understood publicly by now. It is an open-source project, but everything I've written is in Chinese. The basic theory is that, through packet mangling, we can keep the GFW (acting as a NIDS) from rebuilding the stream correctly, or drop the fake packets injected by the GFW. The primary techniques are:

  1. Drop the fake DNS answer, so the right answer will be accepted.
  2. Intercept the SYN and send it through some secret tunnel (such as via gtalk); the GFW will not start listening if the SYN is missing.
  3. Inject a TCP packet with wrong data but the same seq using a short TTL; this will blind the GFW as well.
    I tested all of them on a Linux desktop. If I can make this work on an Android mobile running as a Wi-Fi repeater, then we will be able to carry it around to provide service to other devices.
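Technique 2, combined with the idea of selective interception, needs a cheap way to recognize the first SYN of a connection. A minimal sketch follows, assuming the interceptor hands over raw IPv4 packets as `bytes`; the name `is_initial_syn` is hypothetical and this is not taken from fqrouter or Janus.

```python
import struct

TCP_PROTOCOL = 6   # IPv4 protocol number for TCP
FLAG_SYN = 0x02
FLAG_ACK = 0x10


def is_initial_syn(ip_packet: bytes) -> bool:
    """True for an IPv4/TCP packet with SYN set and ACK clear."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return False                       # too short, or not IPv4
    ihl = (ip_packet[0] & 0x0F) * 4        # IPv4 header length in bytes
    if ihl < 20 or ip_packet[9] != TCP_PROTOCOL:
        return False
    # A non-zero fragment offset means this packet carries no TCP header.
    if struct.unpack_from("!H", ip_packet, 6)[0] & 0x1FFF:
        return False
    if len(ip_packet) < ihl + 14:
        return False                       # TCP flags byte is missing
    flags = ip_packet[ihl + 13]
    return (flags & FLAG_SYN) != 0 and (flags & FLAG_ACK) == 0
```

Only packets for which this returns `True` would need to go through the slow path; everything else could be left to the normal stack.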

@evilaliv3
Owner

:) Thanks. I find your project really interesting and worthy of respect, and I want to introduce you to my group: logioshermes.org.

We are actually involved in some projects related to transparency and censorship evasion, partly through research funding, and it would be really interesting to collaborate with you (for example by contributing our technical skills, promoting the project, or working on an effective case study). You can find some ways to contact me here: https://www.evilaliv3.org

@vecna
Contributor

vecna commented Mar 6, 2013

Hi Qin Fen,

I'm still thinking about the analysis you laid out (points 1..3), and that would be a nice discussion, but it's not the actual topic here. I simply don't believe that the IP/TCP checksum can exhaust the CPU like that. If you try ignoring the checksum (and yes, having your packets discarded during the test), I'm quite sure you won't perceive a performance improvement.

Checksum computation is a linear-complexity operation; it is present and needs to be taken into account, but the processors on mobile devices are still strong. I believe the main performance issue is that the "divert hack" composed of "tun+ipt+mark+adv" causes tons of context switching between user and kernel, and the amount of copying between those two memories became

Normal TCP/IP Stack

  • a packet is written to a socket file descriptor, with a user-kernel copy
  • the packet is sent
  • a packet is received by the kernel, which triggers an event or completes a blocking operation, performing a kernel-user copy
  • the packet is read by the app

net of 2 context switches

Divert hack

  • a packet is written to a socket file descriptor, with a user-kernel copy
  • the packet is sent by the kernel and routed to a virtual device
  • the packet is received by the virtual device in the kernel, which triggers an event or completes a blocking operation, performing a kernel-user copy
  • the packet is read by the app, then modified or kept
  • the packet is written to a raw socket, performing a user-kernel copy
  • (the sending operation alone involves 3 context switches)
  • a packet is received by the kernel and dropped by the firewall
  • a listening datalink-layer socket reads all the packets (performing a kernel-user context switch)
  • the application loops over those packets, performing selection and sometimes modification
  • all the packets need to be rewritten to a localhost socket (user-kernel context switch)
  • all the packets are received by the associated application (kernel-user context switch)
  • (the receiving operation involves 3 context switches)

net of 6 context switches for I/O, plus some O(N) work such as packet analysis.

So, my final analysis is that if the hacks you're planning do not require flow desync but simply injection and packet fragmentation, you can avoid the divert in the "incoming packet" section, and this may bring it down to 4 context switches (3 output, 1 input).
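The counts above can be tallied mechanically. The sketch below only restates the step lists from this comment as data, marking which steps cross the user/kernel boundary; it measures nothing, and the variable names are made up for illustration.

```python
# Each entry: (step, crosses the user/kernel boundary?)
NORMAL_SEND = [
    ("app writes packet to socket (user-kernel copy)", True),
    ("kernel sends packet", False),
]
NORMAL_RECV = [
    ("kernel receives packet, copies it to the app (kernel-user copy)", True),
    ("app reads packet", False),
]
DIVERT_SEND = [
    ("app writes packet to socket (user-kernel copy)", True),
    ("kernel routes packet to virtual device", False),
    ("virtual device hands packet to divert process (kernel-user copy)", True),
    ("divert process modifies or keeps packet", False),
    ("divert process writes packet to raw socket (user-kernel copy)", True),
]
DIVERT_RECV = [
    ("kernel receives packet, firewall drops it", False),
    ("datalink socket delivers packet to divert process (kernel-user)", True),
    ("divert process selects and sometimes modifies packet", False),
    ("divert process rewrites packet to localhost socket (user-kernel)", True),
    ("associated application receives packet (kernel-user)", True),
]


def crossings(steps):
    """Count the steps that cross the user/kernel boundary."""
    return sum(1 for _, crosses in steps if crosses)


normal = crossings(NORMAL_SEND) + crossings(NORMAL_RECV)
divert = crossings(DIVERT_SEND) + crossings(DIVERT_RECV)
# Divert only on output, normal stack on input.
output_only = crossings(DIVERT_SEND) + crossings(NORMAL_RECV)
print(normal, divert, output_only)
```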

By the way, I believe the GFW bypassing techniques you're planning (using gtalk, for example) may make the software a little hard to scale to "non-hackish" people. I appreciate the idea anyway, and if you have some analysis or a PoC of this mechanism, I'm glad to analyze and comment on it.

@fqrouter
Author

fqrouter commented Mar 6, 2013

Your analysis is right: the majority of the time should be spent on context switching. What I observed is that when I intercept only the upstream traffic, testing with a large file download, the upstream traffic is really light, with very small packets. The checksum was originally done using scapy, which leads to high CPU usage, not just from the checksum calculation itself but also from the heavyweight unpacking and packing done by scapy. Then I switched from scapy to dpkt, which has a much faster unpacking and packing process, and the CPU usage is lower than before. Comparing the two results, the dpkt version is a lot faster in terms of download speed.
But if I intercept traffic in both directions, again using a large file download as the test, the majority of the time is not spent on the Python side but on waiting for the socket. I suspect that a large file download under the divert hack causes a lot of data to be copied from kernel to userland in an awkward way, which might be slow. However, I only did Python profiling, so I can only tell that the time is spent waiting for the socket. I do not have the skills to dig deeper and find out what the kernel is doing.
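The kind of Python-level profiling described here can be reproduced with the standard `cProfile` and `pstats` modules. In the sketch below, `wait_for_packet` and `mangle` are hypothetical stand-ins (a `time.sleep` plays the role of the blocking socket read); it only shows how time spent waiting shows up separately from time spent in Python code.

```python
import cProfile
import io
import pstats
import time


def wait_for_packet():
    """Stand-in for a blocking socket read (assumption: ~10 ms of waiting)."""
    time.sleep(0.01)
    return bytes(64)


def mangle(packet):
    """Stand-in for the Python-side packet processing."""
    return sum(packet) & 0xFFFF


def run_loop(count):
    for _ in range(count):
        mangle(wait_for_packet())


def profile_to_text(func, *args, top=5):
    """Profile func(*args) and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile_to_text(run_loop, 3)
print(report)
```

A report like this stops at the system call boundary: it can show that the time goes into the blocking call, but not what the kernel does during that call.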
