Live-blog from CoNEXT 2013 - day 2

Wed Dec 11 2013 00:00:00 GMT+0000 (GMT) ( CoNEXT )

Good morning! Today is the second day of CoNEXT 2013 and the sessions today are about trains, wires, tools and falling water. I will be blogging some of the sessions again today, and as before the official conference liveblog is on layer9.org.

Session 5: Trains, Lanes and Autobalancing

I skipped the session to patch my own presentation using my colleagues' feedback, but these are the presented papers:

Bullet Trains: A study of NIC burst behavior at microsecond timescales

Rishi Kapoor (UC San Diego), Alex C. Snoeren (UC San Diego), Geoffrey M. Voelker (UC San Diego), George Porter (UC San Diego)

FasTrak: Enabling Express Lanes in Multi-Tenant Data Centers

Radhika Niranjan Mysore (UC San Diego), George Porter (UC San Diego), Amin Vahdat (UC San Diego and Google)

Siddhartha Sen (Princeton University), David Shue (Princeton University), Sunghwan Ihm (Princeton University), Michael J. Freedman (Princeton University)

Session 6: Less Wires, More Movement

I am blogging it on layer9.org, but copying the posts here as well:

SoftCell: Scalable and Flexible Cellular Core Network Architecture

Presenter: Xin Jin

Authors: Xin Jin (Princeton University), Li Erran Li (Bell Labs), Laurent Vanbever (Princeton University), Jennifer Rexford (Princeton University)

Cellular core networks are not flexible and most of functionality is implemented at packet data network gateway (content filtering, app identification, firewalls, etc). Combining functionality of different components from different vendors is not feasible.

The main question: can we make cellular networks like data center networks? Yes, with softcell.

SoftCell runs on commodity hardware as a controller for all the different components. Challenge: scalable support of fine-grained service policies, e.g. combination of various filters, firewalls QoS guarantees. Packet classification has to work with millions of flows. To scale it up, packets are classified at the edge and encoded in source port / destination port because classification at the edge gateway does not scale. Classification is then also piggybacked on destination address/port.

For traffic steering using the policies, aggregate across multiple dimensions: policy id, base station id and user equipment id. The forwarding is then simple based on just the tags.

Control plane load can also be a problem, therefore use a Local Agent at each base station.

Using LTE workload characteristics, each basestation handles a few hundreds of UE arrivals and handoffs, which are sufficiently low for the approach to scale. Commodity switches also have enough memory for policy storage, because generally there are not many.

Overall the softcell architecture runs on commodity hardware sufficiently well for existing workloads.

Q: what is fundamental to cellular networks with respect to traffic steering? What is different from

A: Dominant traffic pattern is North-South, between clients and gateways and few gateway switches.

Q: In cellular networks user traffic is tunnelled using GTP, so there is already aggregation - why can you not just use GTP headers?

A: For different policies you want to route to different middleboxes

Robust Assessment of Changes in Cellular Networks

Presenter: Ajay Mahimkar

Authors: Ajay Mahimkar (AT&T Labs - Research), Zihui Ge (AT&T Labs - Research), Jennifer Yates (AT&T Labs - Research), Chris Hristov (AT&T Mobility Services), Vincent Cordaro (AT&T Mobility Services), Shane Smith (AT&T Mobility Services), Jing Xu (AT&T Mobility Services), Mark Stockert (AT&T Mobility Services)

Changes in the sense of software upgrades, configuration, hardware deployment... The question is how those affect user perception of service quality - accessibility, retainability, throughput, minutes of usage, etc. But no lab can fully replicate scale, complexity and diversity of operational networks! To measure these details also need to take into account external factors - seasonal changes (leaves on trees!), weather (worst when coincide with configuration changes), traffic pattern changes, other network events such as outages...

Idea: Litmus - compare performance between study and control group:

Going to discuss the methodology of selection of the groups. Spatial regression in study and sampled control group to learn the coefficients and compare the differences before a change and after it. Using domain knowledge to select control group: select control group subject to same external factors and sharing similar properties with study group. Geo distance, topological structure of the cell net, etc.

Evaluation: Litmus outperforms study-group only analysis because of robustness to external factors. Also outperforms Difference in Differences analysis. Some operational experiences: self optimizing network doing automated load balancing, neighbor discovery, etc. - how did it perform during hurricane Sandy. Both study and control group were impacted due to Sandy (everything went down!), but study group did better than control - faster recovery, so the feature was rolled out network-wide.

Q: There is a way to do A/B testing in offline manner if you log enough data without actually running experiments - do you think it > might be applicable?>

A: Definitely would be, but the question is how to select the control group - e.g. completely random selection might not work

Q: How do you identify external factors?

A: It is very hard, but with this analysis you don't need to know what external factor is there, just automatically discount their impact. But external factor identification is not plausible without additional information.

3GOL: Power-boosting ADSL using 3G OnLoading

Presenter: Vijay Erramilli

Authors: Claudio Rossi (Politecnico di Torino), Narseo Vallina-Rodriguez (Univ. of Cambridge), Vijay Erramilli (Telefonica Research), Yan Grunenberger (Telefonica Research), Laszlo Gyarmati (Telefonica Research), Nikolaos Laoutaris (Telefonica Research), Rade Stanojevic (Telefonica Research), Konstantina Papagiannaki (Telefonica Research), Pablo Rodriguez (Telefonica Research)

The authors propose a very innovative idea: selectively onload a part of the traffic from the wired network onto the wireless network in order to overcome slow ADSL connections at home. Indeed, the authors propose to onload rather offload data to the cellular network!

Of course there are many challenges to be faced and the authors prove the feasibility of the proposed solution by measuring the capacity available on the cellular network, by trace driven analysis and by studying the problem of data caps on cellular data connections.

A fully application level prototype is implemented and evaluated in the wild in residential environment with two bandwidth hungry applications: VoD and picture upload. Up to a 4x speedup is obtained for downlink and up to 6x speedup for uplink, which demonstrate the great potential of this proposal, which is fully over-the-top. The solution is also compared with MPTCP achieving better result.

However, a network integrated solution is needed to apply this proposal at scale without harming the existing traffic on the cellular network.

Q: As a consumer - the reason people don't go high on the cap is that they get nervous. So as a user I'll start getting higher in my cap and...

A: In the paper we propose a simpler estimator and a guard parameter that the

Q: when we did experiments on 3G HSDP, we noticed that the network splits the bandwidth across multiple users and I assume similar scheme for uplink. Do you use multiple operators?

A: All users were only using one network and in most cases on a single cell and we still saw improvements

Q: can you give some information about the way you do scheduling between interfaces?

A: we use a greedy scheduler, only focusing on HTTP.

Capturing Mobile Experience in the Wild: A Tale of Two Apps

Presenter: Ashish Patro

Authors: Ashish Patro (University of Wisconsin Madison), Shravan Rayanchu (University of Wisconsin Madison), Michael Griepentrog (University of Wisconsin Madison), Yadi Ma (University of Wisconsin Madison), Suman Banerjee (University of Wisconsin Madison)

The author study the deployment and usage of mobile application in order to understand the factors that impact the application experience. How a developer can capture that across all users?

The authors propose to use application as a vantage point. Developer can include a toolkit (including a library) that provides an API to get contextual measurements, device and user infos, and network state.

They study 2 applications: a MMORPG, named Parallel Kindom) (PK) and StudyBlue (SB). For example, the impact of device on user interactivity is studied (user actions/times) showing that this increases with screen size. Moreover, battery consumption was higher with cellular with respect to wi-fi and it has an effect also on the session length, which was lower with higher battery consumption.

Furthermore, the impact on network performance of the toolkit are studied in terms of latency ans cellular usage. For example, the interactivity is sensitive to latency, and decreases with latency increase.

Other measurements are shown along with their impact on application usage, including revenue! This new toolkit can give great insight for developers! Check-it out.

Q: Would it be possible to colocate your insight server with the app server?

A: Definitely, but we prefered to keep it module to reduce developer effort

Q: is privacy a concern at all?

A: Users are informed about the type of data collected

Q: how is online profiling itself impacting battery life?

A: In this case we have some performance overhead numbers in the paper, but it was not more than 2%. The measurements were very lightweight and infrequent - mostly values reported by the OS.>

Q: so how do you attribute energy consumption to apps?

A: (some clever filtering which was not clear, I assume more details in the paper)

Q: most of the mobile handsets have system on chip, so you can't easily isolate different resources and each has different mechanisms - > have you looked at this issue?

A: we did not look at very low level details because we do not have access to them from the framework. We could also do some > crowdsourcing to built up better models.

Q: What are the activities in the app that were meant to generate revenue?

A: "food" in the application that can be bought for real money and hence the food consumption is a direct proxy of revenues

Q: What part of what you built google analytics would tell me?

A: GA more about business analytics, while we focused more on performance

Silent TCP Connection Closure for Cellular Networks

Presenter: Feng Qian

Authors: Feng Qian (AT&T Labs - Research), Subhabrata Sen (AT&T Labs - Research), Oliver Spatscheck (AT&T Labs - Research)

Talking about the problem of reusing TCP connections and closing them after a timeout when device radio is asleep. Proposes STC - silent TCP connection closure: client and server negotiate a timeout value for connection closing and after a timeout period + guard period both ends close the connection silently, without exchanging FINs. Also backward-compatible, but needs to modify the TCP stack, socket API and the client applications.

Trace-driven evaluation - 10 days, LTE, 600K user sessions. Overall radio energy saving is 11.3% and reduces signaling by 6% - just looking at late FIN packets.

Q: I'm wondering if your problem is solved if you go back to HTTP 1.0?

A: Immediate close

Q: so you might be changing semantics and you might have many problems..

Q: what is the implication on SPDY e.g. where connections will be maintained for a long time?

A: the delayed closure program still exists, even if mitigated

Q: why do you think you have those timeouts even e.g. with Google's apps, when they control both sides? What we've seen is that in 95% > of cases there is no reuse

A: (missed the question - taking offline)

Q: security implications?

A: by injecting FINs you can already force-close connections, so there is no difference

Session 7: Tools

Direct Code Execution: Revisiting Library OS Architecture for Reproducible Network Experiments

Hajime Tazaki (University of Tokyo, Japan), Frederic Urbani (INRIA, France), Emilio Mancini (INRIA, France), Mathieu Lacage (Alcmeon, France), Daniel Camara (INRIA, France), Thierry Turletti (INRIA, France), Walid Dabbous (INRIA, France)

Presented by Hajime Tazaki

Container based emulation [CoNEXT '12] provides lightweight emulation, but timing is still limited by hardware resources and there is no support for debugging. There are several proposals introducing virtual time to address the problem... Network simulators are not completely realistic functionally. Motivation is therefore to improve on the two issues using "Direct Code Execution"

The key point is running real code (POSIX apps, kernel network stacks), integrating with ns-3 to provide virtual clock and improve debuggability by running everything in userspace.

Got lost in the details about the architecture.

To verify reproducibility, picked a previous paper from NSDI and replicated goodput measurement with their proposed DCE + ns-3 (LTE/Wi-FI), Linux MPTCP and iperf.

Q: I'd like to thank you for this nice work, because there is a lot of system work behind this. Something you did not mention in the paper is whether there are any drawbacks?

A: Sometimes it requires protocol implementation modification...

Q: NS3 includes network simulation cradle, which allows to include real TCP stacks

A: That is an ancestor of our work, so we use some of the ideas. The objectives and goals are almost the same

Pythia: Yet Another Active Probing Technique for Alias Resolution

Pietro Marchetta (University of Napoli Federico II), Valerio Persico (University of Napoli Federico II), Antonio Pescapè (University of Napoli Federico II)

Presented by Pietro Marchetta

Accurate knowledge of the Internet topology is essential for deep understanding of the ecosystem, but ISPs hide the data. Traceroute only provides IP-level view, but for performing robustness analysis need to perform alias resolution to group addresses owned by the same router.

All alias resolution techniques are imperfect, but IP timestamp probing has been explored before. Order matters [Sherry10], multiple ICMP echo request packets to different addreses. Use this fact and UDP timestamp option to infer Alias.

Evaluation: ground ruth obtained with IGMP probing, compare against "Motu" and "Palmtree" - able to classify higher proportion of addresses at similar accuracy, slightly lower than Motu.

Q: can you tell if a router is pair-interfaced IP router?

A: yes, an initial step based on the number of collected timestamps

Q: how often does ICMP response not come back?

A: timestamp option has an impact on responsiveness, but it also depends on vantage point and type of probes. ICMP has 30% reduction of responsiveness

An Automated System for Emulated Network Experimentation

Simon Knight (University of Adelaide/Cisco Systems), Hung X Nguyen (University of Adelaide), Iain Phillips (Loughborough University), Olaf Maennel (Loughborough University), Randy Bush (IIJ), Nickolas Falkner (University of Adelaide), Matthew Roughan (University of Adelaide)

Inferring Multilateral Peering Agreements

Vasileios Giotsas (University College London), Shi Zhou (University College London), Matthew Luckie (CAIDA / UC San Diego), KC Claffy (CAIDA / UC San Diego)