Migrate DDoS Defense to the Cloud: eBay’s Journey With Cloud Armor and Reblaze  (Cloud Next ’19)

[MUSIC PLAYING] EMIL KINER: Thank you
for joining us today. My name is Emil Kiner. I see some people are filtering
in, but I’ll get started. I’m a product manager
on the networking team. And today we’ll be talking
about DDoS protection as well as a web application firewall. And we’ll talk
about eBay’s journey, how they’ve evaluated and migrated to the GCP network, and our ability to protect against application attacks. I’m joined today
by Derek Chamorro. He’s a security
architect at eBay. And I’m also joined by
Tzury Bar Yochay, the CTO and founder of Reblaze. They’ll both be talking
about their solutions and how they’ve
integrated with GCP. So today, what
we’re going to do is I’ll go over and give a
kind of brief introduction about the Google
network and talk about its scale and the
infrastructure that supports it, and then we’ll dive
into a little bit more about our Cloud Armor solution,
which is our DDoS protection service, as well as a
web application firewall. We’ll go over some customer
use cases and most typical use patterns. And after that we’ll hand it off
to Tzury to talk about Reblaze and how they’ve
integrated with GCP. And after that, Derek from eBay will come onstage and talk about some penetration and stress testing that they’ve done on the platform to evaluate migrating eBay workflows to GCP. And I hear they got
some good results. We’ll have plenty of time
for questions at the end. And when we do, we ask that
you use the microphones here in the middle for questions. OK. So Cloud Armor is
deployed actually at the edge of Google’s
global network. It’s part of our global load balancing infrastructure. Before I begin, I do
want to kind of give an overview of what our
network looks like to give you an idea of the scale and scope. And it goes a long way toward explaining how we’re able to provide the protections that we do. So we actually have
19 regions or data centers distributed globally. You can see them here in blue. We also have future ones coming online, and those are the white dots. In addition, we have
another 134 edge POPs, or points of presence. These are co-located facilities
within our ISP partners’ data centers. And what that means is we’re
able to receive client traffic and put them on
the Google network as close as possible
to the clients. So once a request is
made to a Google IP, it gets routed to
the nearest edge POP, and then it’s on
Google’s backbone instead of being routed
through the public internet. Speaking of the backbone,
we have 13 subsea cable investments that are either
wholly or partially owned by us. All right. Those are all the blue lines
that are being strung together. On a daily basis, the amount of data that we
move across our backbone is orders of magnitude
larger than the data that traverses the public internet. What that means is
Google’s network capacity is so large that we are
actually able to effectively absorb and dissipate most of
the most common distributed denial of service
attacks without impacting any availability or reliability. So most of our
customers don’t even know that they’ve been attacked. Like on other clouds, security
is a shared model on GCP. And when it comes to network
security and security in general, we work
to secure our networks and infrastructure, and
we provide all the tools and capabilities
for customers to do the same in their environments. In fact, we enable and
encourage customers to follow a defense
in depth approach, where you deploy various
security solutions at various stages of the
stack that you can see here– oh, I’m kind of in the way. And at every point
in the stack, we also have room for partner
solutions to deliver capabilities that may be
more specialized than we offer natively. Today we’ll be talking about
Cloud Armor, up at the top end, for application protection
as well as our global load balancing infrastructure. But I will start off
with kind of giving an overview of the network
security controls that we have. All right. So we have three different types
of network security controls that customers often
deploy in unison. In red, we’ve got VPC firewalls. These are your traditional
firewalls using layer 3, layer 4 semantics. You would use these to define
the perimeter of a VPC, north-south as well as east-west, but you can even get
as granular as per VM. So you can use VPC firewalls to control and define ACLs on a per-VM basis if necessary.
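As a hedged illustration of that per-VM granularity, here is a sketch that drives the gcloud CLI from Python to create a tag-scoped ingress rule. The network name, tag, and address range are placeholders, not anything from the talk:

```python
import subprocess

# Hypothetical example: allow HTTPS only to VMs tagged "web-frontend",
# and only from one (made-up) corporate range.
subprocess.run([
    "gcloud", "compute", "firewall-rules", "create", "allow-corp-https",
    "--network=my-vpc",                 # hypothetical VPC name
    "--direction=INGRESS",
    "--allow=tcp:443",
    "--source-ranges=203.0.113.0/24",   # hypothetical corporate range
    "--target-tags=web-frontend",       # scopes the ACL to the tagged VMs
], check=True)
```

They have an analog in blue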
called VPC Service Controls. VPC Service Controls allow
you to define effectively a virtual perimeter
around GCP-based services like managed APIs, things like
Google Cloud Storage, BigTable, BigQuery. The point is you can use VPC
Service Controls to ensure that only authorized users, from authorized locations, using authorized machines, can access or retrieve the data within those
API-based services to mitigate things
like data exfiltration and really just make sure that
only permitted functions are executing on your resources. And finally, in green,
we’ve got Cloud Armor, which is instrumented in part
of our global load balancing infrastructure. And that’s able to
filter and drop traffic coming from the public
internet into GCP. So as I’ll dive in
a little deeper, we’re able to do layer 3
through layer 7 filtering. And I’ll kind of explain how that works. Before doing that, understanding
how our global load balancing infrastructure
works is valuable. So the GCLB is not a DNS-based
global load balancer. As many of you know, we actually publish an anycast VIP. And this virtual IP is
accessible from anywhere in the world. And when a client coming
from California, or New York, or Singapore goes and
accesses that same IP, the edge POP location
nearest to them will receive that
traffic, all right, and will then bring that traffic onto Google’s network, where it will be routed to the
nearest region wherever your applications
may be deployed. So if you only have one– your application is only
hosted in one region, that’s where the traffic will go. But if you have
your applications across different regions
distributed globally, then we will route the
traffic where necessary. So our load balancers
are SSL terminating. All right. And what that means is
customers can do SSL offload. You upload your certificates
into your global load balancer instance, and then we decrypt
the traffic on the way in, and then we’ll
re-encrypt it on the way back out to your back end. But in that small
sliver of space where it is in
plain text, that’s where Cloud Armor is actually
able to do the inspection and evaluate and
drop the traffic. Similarly, other
points in the stack will filter out and drop
bad traffic, however that’s defined. Layer 3, layer 4
attacks, malformed packets and things like
that will get scrubbed out at various stages of
the stack, all of which happens way upstream of
the customer back ends. So as I said, Cloud Armor is
a part of our load balancing infrastructure. Together with HTTP load
balancer and Cloud Armor, customers are
automatically protected from the most common types of
distributed denial of service attacks, things like SYN floods,
ACK floods, DNS amplifications. These are the types of attacks that hit the news most often. These are the ones that have
terabit-level bandwidth and things like that. And all that is protected
and blocked automatically. Like I said earlier,
most of our customers don’t even know that
they’ve been attacked. And we are working on
exposing that visibility, so in the future you will
be able to get a report and do sort of after-action
investigation as necessary. Cloud Armor proper, as I said,
we are in the sliver of space where the traffic
is in plain text, so we can do full
layer 7 inspection. We don’t look at the body, but we do look at all the request headers and the cookies, so you’re actually able to define ACLs today with IPs, right? We went generally available for IP allow and deny. But in the future,
currently in an alpha, we’ve got geo-based access
control, a full set of WAF capabilities, as well as
a custom rules language to allow you to
define your own rules. And all of this should
go without saying, but it’s good to repeat it. All of this is built on
top of infrastructure that we have designed and built
for Google over the past 20 years to protect against
these common attacks to make sure that
things like search, and ads, and Gmail, and YouTube
all stay up and available. So it’s that same
expertise that we’ve developed protecting our
own applications that allows us to go and
enable customers to leverage that investment. Very big numbers there. So as I mentioned,
Cloud Armor is deployed at the edge of the network. And what that means
is it’s actually able to enforce the
security policies upstream without consuming
your own compute. So Cloud Armor enables
customers to protect applications from DDoS, filter
incoming requests by Geo, as well as most
layer 7 parameters. And as a web
application firewall, we’re also going to
ship pre-canned rules to protect against the web’s
kind of most common application attacks. So we are porting over
the ModSecurity core rule set, if you’re familiar with what that is. And we’re starting with SQL injection and cross-site scripting, but the plan is to port over and
support the full core rule set. And critically, we also
have real-time telemetry for your own monitoring
and security needs. I’ll show what that looks
like a little bit later. But the point is you’ve got
logs going to Stackdriver, and you’ve also got a
monitoring dashboard that helps you detect and react
when there is something to do. So as I mentioned, the Cloud
Armor security policies are used to customize access
to protected resources. And it could be as flexible
and as uniform or granular as you need. So although the
protection happens at the very edge
of the network, you can define the security policy– define access controls that
could be applied consistently to all of your applications
in a single project. There’s kind of a one-to-many
relationship between policy and backend service. Or, if the business
dictates, you can actually have
customized controls where you would
have one security policy per backend service. And all that matters is
what you would allow or deny into that application. Once traffic comes into
a backend service that is protected by a
security policy, the rules within that policy
are evaluated in priority order. Much like most application security appliances out there, although Cloud Armor is not an appliance, we walk down the list of rules in priority order. Each rule has a match condition. And if traffic matches that match condition, then the action associated with that rule will be taken. The action can be allow or deny. In the future, the action will also be throttle, if you need to do some rate limiting. And if none of the traffic matches, if none of the rules hit, then the default action will be taken. And, again, that could be allow, deny, or, in the future, throttle. And that provides the ability to create very flexible, very granular controls. One best practice recommendation, of course, is, since we terminate evaluation after the first match, it’s best to have all of your deny rules at higher priority than your allow rules. So if you do need to drop traffic, that happens first.
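To make the first-match semantics concrete, here is a minimal Python sketch of that evaluation model. It is purely illustrative of the priority-ordered, first-match-wins behavior described above, not Cloud Armor’s implementation; the rules and request shape are made up:

```python
import ipaddress

# Illustrative only: a policy is a list of rules evaluated in priority
# order, lowest number first; the first match wins.
rules = [
    {"priority": 100, "action": "deny",  "src_ranges": ["198.51.100.0/24"]},
    {"priority": 200, "action": "allow", "src_ranges": ["203.0.113.0/24"]},
]
default_action = "deny"  # taken when no rule matches

def evaluate(src_ip: str) -> str:
    """Walk the rules in priority order and return the first matching action."""
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if any(ip in ipaddress.ip_network(net) for net in rule["src_ranges"]):
            return rule["action"]
    return default_action

print(evaluate("203.0.113.7"))  # allow
print(evaluate("192.0.2.1"))    # deny (falls through to the default)
```

Note how the deny rule sits at a higher priority (a lower number) than the allow rule, matching the best practice just described.

In terms of overall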
features and status– so I’ve mentioned this
as I’ve talked before, but we do have a rich and
comprehensive feature set. Some of it is generally
available today. We announced GA back
at RSA last month for the granular policy framework, the overall distributed denial of service protection (the layer 3, layer 4 stuff),
as well as the ability to create IP lists to
allow or deny traffic based on the IP in those policies. That’s GA, as is all
of our telemetry. Currently in alpha
we have the rest of the WAF slate of features. That includes Geo
access controls, that includes our
custom rules language on top of which we’ve built
[INAUDIBLE] security rules. SQL injection and cross-site
scripting are up first. Shortly thereafter, we’ll follow
with things like remote code execution, local file inclusion,
and remote file inclusion, and then custom rules. So in addition to us shipping
these pre-baked rules, customers will be able
to use our rules language to select on any layer
3 through layer 7 attribute of the
traffic in order to filter out based
on business needs. So now going over some
of the most typical use cases that we’ve got. As I mentioned before, we’ve
got DDoS protection, right? That’s first and foremost. You get that as soon as you
deploy behind the HTTP load balancer in Cloud Armor. The idea is we will
filter out and scrub out any bad layer 3, layer
4 traffic to the extent that you’re only serving
HTTP or TCP traffic, and only good requests
will get to you. All right. So this is very common. Most customers love this. And you’ll hear from
Reblaze and eBay talking about their thoughts on this. Here’s an example of our real
time monitoring dashboard. All right. So we provide
granular visibility both at an overall level about
all your incoming traffic and the percentages and the
actual numbers of what’s allowed and what’s blocked. But critically, we
also have the ability to deploy rules in preview mode. So this is akin to sort of
passive rules in older WAF deployments. The idea is you would model
the impact of the rule without it actually taking action on your production traffic. So rules in preview mode log their action to Stackdriver, but they don’t actually stop the traffic. And then it’s up to you to decide to take them off preview mode.
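If it helps, here is a hedged sketch of that staging workflow with the gcloud CLI driven from Python; the policy name, priority, and range are hypothetical:

```python
import subprocess

# Hypothetical: stage a deny rule in preview mode, watch its logged matches
# in Stackdriver, then promote it once you trust it.
subprocess.run([
    "gcloud", "compute", "security-policies", "rules", "create", "1000",
    "--security-policy=my-policy",      # hypothetical policy name
    "--src-ip-ranges=198.51.100.0/24",  # hypothetical suspect range
    "--action=deny-403",
    "--preview",                        # log the action, don't enforce it
], check=True)

# Later, once the logs confirm the rule matches only bad traffic:
subprocess.run([
    "gcloud", "compute", "security-policies", "rules", "update", "1000",
    "--security-policy=my-policy",
    "--no-preview",                     # start enforcing
], check=True)
```

So you’re able to monitor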
the impact of previewed rules in these managed monitoring
dashboards as well. You can visualize it. And most importantly,
you can set up custom alerting policies. Using any of the metrics that
are exposed in the monitoring dashboards, you would set
up custom alerting policies to trigger any of your
incident response processes, notify the right
people, and then you can go and take action. One of the most common use
cases is restricting source IPs. Many of our customers
have a business need to filter their traffic
through an upstream proxy and yet still want to have
their projects in GCP. And the HTTP load balancer will
expose a publicly available IP address. So what they’ll do is they’ll
use Cloud Armor to define a default deny all rule and
then create two higher priority rules that would
allow the source IPs of their upstream
proxy and perhaps their own corporate
range of IPs. And the effect would be
then that only traffic from the upstream proxy is
allowed through the load balancer to the
backend services. Meanwhile, anyone
else from the internet that hits the public IP
directly would get denied. All right. That’s only if you
need to force traffic to come through
an upstream proxy.
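A rough sketch of that policy shape in Python, reusing the toy first-match model from earlier; all addresses here are placeholders:

```python
import ipaddress

# Toy model of the "upstream proxy only" pattern: two higher-priority
# allow rules, then a default deny. Ranges are hypothetical.
rules = [
    {"priority": 100, "action": "allow", "src_ranges": ["198.51.100.0/24"]},  # upstream proxy
    {"priority": 200, "action": "allow", "src_ranges": ["203.0.113.0/24"]},   # corporate range
]
default_action = "deny"

def admitted(src_ip: str) -> bool:
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if any(ip in ipaddress.ip_network(net) for net in rule["src_ranges"]):
            return rule["action"] == "allow"
    return default_action == "allow"

assert admitted("198.51.100.10")   # came through the proxy: allowed
assert not admitted("192.0.2.99")  # hit the public VIP directly: denied
```

And the final use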
case I want to talk about today is the idea
of downstream detection and upstream enforcement. So the way that
this generally works is customers will deploy some
basic Cloud Armor security policies. All right. And they’ll have
their applications within GCP as backend services. And they also have
their own sort of custom monitoring solutions. They’re doing monitoring
and analytics, looking at the application
logs, but they’re also bringing in data from a lot
of other sources, like– There we go. They’re bringing in data from other sources, maybe the load balancer logs, maybe anything else going to their own [INAUDIBLE], any events going to their SIEMs, things like that. On top of that they’ll have
threat or fraud detection algorithms running,
or abuse detection, and any number of things by which they’ll identify something that’s bad, based on things that are unique to their applications, taking all the context necessary. And then they’ll
define a signature and then push that signature
as a Cloud Armor rule upstream. And the idea is,
once you’ve decided that some set of traffic is
bad, you can go and enforce that upstream in Cloud
Armor, and that you would have near real-time
propagation measured in minutes to globally enforce
these security policies.
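As a hedged sketch of what that push step could look like, here is Python shelling out to the gcloud CLI to turn a detected signature into a Cloud Armor deny rule. The policy name, priority, and offending range are hypothetical, and the detection pipeline feeding it is yours:

```python
import subprocess

def push_block_rule(policy: str, priority: int, cidr: str) -> None:
    """Push a detected-bad source range upstream as a Cloud Armor deny rule.

    Hypothetical glue code: assumes the gcloud CLI is installed and
    authenticated, and that `policy` already exists.
    """
    subprocess.run([
        "gcloud", "compute", "security-policies", "rules", "create", str(priority),
        "--security-policy", policy,
        "--src-ip-ranges", cidr,
        "--action", "deny-403",
        "--description", "auto-pushed by downstream detection",
    ], check=True)

# e.g. a fraud-detection job decided this (made-up) range is abusive:
push_block_rule("my-policy", 5000, "198.51.100.0/24")
```

With that, I’d like to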
hand it off to Tzury, invite him back on stage,
to talk about Reblaze and their integration with GCP. [APPLAUSE] TZURY BAR YOCHAY:
Yeah, give a hand to Emil for the great
presentation about Cloud Armor. So before diving
into my slides, I’d like to take this opportunity to thank both the team at Google and the team at eBay– first of all, for your trust, and for this amazing journey we did together over the last several months. So Reblaze is a comprehensive
security platform, or to be more precise,
comprehensive application security platform
deployed in the cloud. We’re about to explore Reblaze, the features of Reblaze, as well as how Reblaze, integrated into GCP, leverages GCP products and platforms: Cloud Armor, already mentioned, but also a few other products such as Cloud Security Command Center, BigQuery, and machine learning. Reblaze is a cloud
native solution that provides you with nearly
everything you need to control and command your application. When we’re looking at
today’s threats map, we think it will be
wise to divide it into four different pillars. DDoS: well known, common on a daily basis. Bot management: basically a subset of DDoS, or at least it shows some similarities with DDoS in terms of automated traffic run across a large network. But instead of flooding your network and your resources with incoming traffic, the attacks a bot management solution will protect your platform from are things such as account takeover, brute-force login, even credit card fraud, and simple data scraping. If your business is your data, and vice versa, scraping is an issue you have to deal with literally every minute. The next is web
application firewall. While DDoS and bot management focus on traffic– behaviors and patterns within the traffic– what a web application firewall focuses on is the data, the actual data submitted to the server. Does the data contain any malicious vectors, [INAUDIBLE], SQL injections, [INAUDIBLE] file inclusions, et cetera? And last but not least,
especially nowadays, API security. And one might think, what makes
API security a special case? It probably can be described as just another web application platform– it’s just simple HTTP requests back and forth, reading data and responding to clients. However, today’s APIs, or as they’re called, modern APIs, while originally designed to enable computers and services to communicate with each other, are by design serving humans, and bots, and machines at the same time. If you have an API and you
built your web application with Angular, or React,
or any other framework, the user interface, interacting within a browser or within a mobile device, is actually using the API underneath. And the very same API will be used when you release your native mobile application. And at the same time, your clients or affiliates will use the same set of APIs to communicate and to transfer data between their services and yours. So with Reblaze, one of the
key strengths of Reblaze is enabling you to– and, in most cases, automatically– profile each and every use case and assign the designated policy optimized for that case. So a human-accessed API will have human behavior enforced and monitored along the way, while machines– B2B APIs, server-to-server APIs– will run under a different, completely different policy. Also, by definition, the fact that an
API is all about transmitting data back and forth, API is– eventually, each
operation puts some data, puts some stream of bytes
either in a database or in their drive. And that suggests direct
access to a database somehow, and therefore, security
became much more critical. And last, but from the service
we take from our clients as well as from– our analysts are
actually reporting. The greatest challenge with
API today is, where are they? So it used to be a [? slash ?]
API or API subdomain to run your API, but today’s
within organizations, APIs are becoming publicly
available on a daily basis, literally. Every cloud function
you put out, every microservice you release
or you consume and utilize, that’s actually in APIs. So for the system administrator,
for the security engineer within an organization,
there is no way they actually know and familiar
with the location and purpose of each and every API, because
developers are simply releasing them on a daily basis. So let’s talk about
the setup of Reblaze and its relationship with Cloud Armor and other cloud security products. So in a typical GCP project, you will have– and Emil has actually covered some of it– a global CDN in front. The global CDN will be working in conjunction with a global load balancer in GCP and with Cloud Armor. So the CDN will be handling
content delivery in the most efficient way. The load balancer will handle everything related to SSL termination and ensuring high availability. And Cloud Armor is processing and handling the security. So you have those three products and platforms actually becoming your front shield for your global public
front-end for your traffic. Now, Reblaze stands
right next to it. So instead of the load balancer communicating with your scale group, with your [INAUDIBLE] directly, the load balancer will communicate with Reblaze. There Reblaze enforces the DDoS protection, blocks all the bots, operates the web application firewall, and provides the API security. All of what Reblaze does– protecting, blocking, dropping, or letting traffic in– happens in real time. Reblaze also pushes all the data to BigQuery. That’s not real time– it’s about a 10 to 12 second delay– but it’s nearly real time. Now, this is where all the fun stuff happens, because the data in BigQuery is continuously analyzed by Reblaze– every incoming request and every session is analyzed continuously. And as we detect attacks or misbehavior from a user or a session– attacks which you cannot block based upon a given request; rather, you need some track record, you need to see a pattern– as we detect an attack source, the Reblaze proxies immediately get updated, and Cloud Armor gets updated immediately as well. So there is immediate action taking place as attacks are detected.
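For a flavor of what that continuous analysis might look like, here is a hedged Python sketch that queries a request-log table in BigQuery for sources whose pattern suggests automation. The dataset, table, and column names are invented; Reblaze’s actual schema and detection logic are not shown in this talk:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Hypothetical schema: one row per proxied request.
sql = """
SELECT src_ip,
       COUNT(*) AS requests,
       COUNTIF(status = 403) AS blocked,
       COUNT(DISTINCT path) AS distinct_paths
FROM `my-project.reblaze_logs.requests`      -- hypothetical table
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 15 MINUTE)
GROUP BY src_ip
HAVING requests > 1000 AND blocked / requests > 0.5
ORDER BY requests DESC
"""

for row in client.query(sql).result():
    # In the real pipeline, this is where the proxies and Cloud Armor
    # would be updated with the detected attack source.
    print(row.src_ip, row.requests, row.blocked, row.distinct_paths)
```

This setup, the way we set it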
up, enabled us to provide you, the users, with four features– or, as some call them, principles– which we believe are key requirements in every security solution. The first is absolute visibility. That is a given. You cannot protect against
something you’re not seeing. At web scale, when you process millions or billions of requests a day, you have to have the tools to analyze and get to each and every record and track each and every session. But at Reblaze, we do more
than just data analysis in real time. The second principle
is immediate action. As I mentioned, action is taken on every request. Offending requests are dropped immediately. We detect the source of the attack and update immediately– the proxies update immediately, and so does Cloud Armor. Not only that, we have also integrated with Google’s SIEM-style product, Cloud Security Command Center. So we also push alerts and all the events to your command center, so you can get, in one screen, all the recent attacks. And if an event is
taking place, you’ll be able to take
action immediately straight from the portal. The third feature
is recommendation. We help do the thinking for you. It’s really hard to keep
in mind all the areas, all the policies
required, all the changes within your application. As I’ll show you soon, Reblaze provides recommendations where security policies might be better off fine-tuned or access restricted– and API discovery, as I explained. Nobody knows today
where their APIs are. With Reblaze, given the capacity and the availability of BigQuery, every day we analyze the entire traffic, and we spot areas in which we detected API activity. And we say, OK, here’s an API section within your application. It is not profiled yet as an API. Would you like to, in one click, set it as an API? So you will enjoy all the
capabilities of API security. And last is our biometric
behavioral analysis. So this is a capability which, in recent months, became the tool with which we are actually fighting and detecting the most sophisticated bots. And I will just explain to you how the biometrics work. So in addition to the session graph that we hold and process in Reblaze– it’s per user, for each user– there is another graph which we maintain, and that is the human interaction, the human presence, along the session. So for applications protected by Reblaze, every click and every keyboard tap– every tap, every scroll, every event that happens on the client side– is monitored and reported to Reblaze and analyzed in BigQuery. So when we inspect and analyze a session– and each and every session gets analyzed– not only do we have all the data and all the transactions performed by this session, we also have: was there a human at all? Are we dealing with a human, or just an emulated or headless browser, or any type of sophisticated bot attack? And that wouldn’t be possible
without the BigQuery. Because given we’re
processing billions, literally billions and billions
of requests a day, BigQuery gives us the
ability to add on top of it several billion more mouse clicks and key presses and yet provide you as a user, and our platforms, our systems, immediate analysis and immediate information about each and every session.
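A toy illustration of the idea: classify a session as human-backed or not from its client-side interaction events. This is a deliberately naive Python heuristic, nothing like Reblaze’s actual models; the event shape is invented:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    requests: int = 0
    # Client-side interaction events for this session (hypothetical shape),
    # e.g. ("mousemove", 1.2) or ("keydown", 3.4).
    events: list = field(default_factory=list)

def looks_human(session: Session) -> bool:
    """Naive check: real users generate interaction events; headless
    browsers and scripted clients typically generate none."""
    moves = sum(1 for kind, _ in session.events if kind == "mousemove")
    keys = sum(1 for kind, _ in session.events if kind in ("keydown", "touchstart", "scroll"))
    # A session issuing many requests with zero human signals is suspect.
    return (moves + keys) > 0 or session.requests < 5

bot = Session(requests=200, events=[])
human = Session(requests=40, events=[("mousemove", 1.2), ("scroll", 3.4)])
assert not looks_human(bot) and looks_human(human)
```

When we talk about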
visibility, if I may take a few more
seconds to expand the visibility of Reblaze– So in addition to the fact that
every request, every header, every cookie, every
authentication action, every API call
is stored in Reblaze and analyzed in Reblaze,
we enrich every request with metadata, which gives
us a better profiling of each and every session. For instance, every request
in Reblaze is marked. Is it a human, or headless
browser, or a bot? Is this session operating
within the Tor network, or behind a proxy,
or behind a VPN? Is it a server on a VPS on
a cloud somewhere or a data center? Or is it a domestic or office link connection? And those profiles– so absolute visibility means all this data is always available to make decisions, to take action when needed. Another case for which BigQuery
became critical for us is detecting bot-driven brute-force attacks, such as logins or account takeover. For instance, anybody
recognize what this is? So this is one of the
things we found by working with Derek and his team. Basically, as you
probably– if you know, with Gmail, your mailbox,
on the left to the @ sign, you can spell it however you
want with any number of dots, any position. It will always be
the same inbox. So if you have a
mechanism that says, validate your account
by sending you an email, and you’ll link on– you go, you click on the
link, and that’s all. Your account is validated. What we have found in this
case, in this particular case– and there’s thousands
of cases like this– we found 697 different
variations of this email, as you saw. Now, those cases, we
found them by mistake. We had no plan to go and seek
for this particular attack. But given the machine
learning and the analysis around BigQuery
continuously, the algorithm detected too much
of a similarity between hundreds of
hundreds of accounts and said, take a look at this. We spot something. So Reblaze has more
features than that, but time doesn’t
allow us to do so. And this case, I will
hand over to Derek so he can share with us
his experience with Reblaze and Cloud Armor. [APPLAUSE] DEREK CHAMORRO:
Thank you, Tzury. So first I want to
thank our legal team for allowing me to talk about
this, because in most cases we’re not allowed to. So some of you might be
wondering why I’m up here. Given the nature of eBay, we
kind of build our own servers and we put them all around
our global data centers. But like most of you,
I like to be prepared. As a security
professional, too, I’m extremely paranoid
about whatever the next big attack is. And when I think about
some of our brands potentially going into an
area that we don’t own, and if we don’t have a
proper strategy in place, then that potentially is
a threat vector for us. So part of this use
case was to determine, how could we replicate our
existing known perimeter security controls
into an environment that essentially we don’t know? So some of you might
think of– or know eBay as being just eBay.com. And it’s an online marketplace
with over 175 million users, with over 1.1 billion
live listings at any time. But I like to think
of eBay as kind of like the roof of a house. And under that roof,
we have many brands that fit our marketplace
strategy, whether it be StubHub, which is the
world’s largest online ticket marketplace, or a lot of
our other global brands that help people
around the world find what they’re looking for
in their local communities. So what do they all
share in common? Well, we as a security team,
we treat them all equally. And we want to ensure
that we’re protecting not only their brand image
but their reputation. So some of the goals
that we had with this is that we wanted
to first experiment with how we could extend
this perimeter protection strategy no matter where
our data could live. Second, we wanted
to kind of preserve the same level of visibility
that we currently have within our on-premise systems. Ideally, we want to
be able to correlate that data with our
on-premise systems with whatever we were able
to achieve in an environment that essentially we didn’t own. And third, we just wanted to
see if we could break anything. I mean, the inner hacker in all of us– we wanted to see where we could
induce some kind of failure, whether it be within something
we found within Google, or something we
found within Reblaze, or something we found within
the systems we provisioned. And on a side note, when we asked our customer engineer– we just said, hey, listen, we want to DDoS Google– I expected some kind of delay. And I was actually
really surprised when he said, yeah, let’s do it. So it made me pretty
happy about that. So a lot of you who have
developed a DDoS security program may be
familiar with this. What this is is part of the approach of having a successful defensive strategy: to create tooling for common DDoS threat vectors. So we put it across three
pillars– anomaly detection, visibility and forensics,
and attack mitigation. And in most cases, all
of these work together to mitigate attacks. So you might have a
small attack that sneaks under configured thresholds. So to mitigate that
potential impact, you would first look at some kind of anomaly detection alert. And with this alert, you would
use your existing visibility and forensics tools to
determine whether or not the attack was real
and then look for ways to be able to block the attack. And then you would use your
attack mitigation tools to implement some kind
of filter in order to be able to filter that out. So with this in
mind, we built a set of requirements that
could mirror the existing tooling that we had on premise
and do kind of like a feature parity comparison
with the existing controls available
through Google as well as through
their trusted partners. So we decided to
build a test site. And while the test
setup wasn’t a replica of what our traditional
sites would look like, we wanted to build
a project that would represent all
aspects of what we wanted to accomplish from this test. And we discussed actually
building a mirror site, but what we realized was
that we would probably blow through our security
budget for the entire year. So this is an example
of what we had. And first, we wanted
to build something that was fully automated. If we encountered
any failure, we wanted to just be
able to destroy it and rebuild it as it is. So it kind of shows a
reflection of the attack traffic that we would expect in. So first, on the
right-hand side, before we engage with
any other third parties, we decided to build
some attack VMs and kind of create this hairpin
traffic that would come in through the internet. And it first hit the
Google front-end. So we automatically knew that we had the L3 and L4 protection that was managed by Google. So once it hit that layer, after
that it hit the specific test project, where we
parked a domain. We didn’t want it to have anything to do with eBay. We just wanted it to be something separate. That way we didn’t have any liability if anything ever happened to us. But what it was, it was an external Cloud DNS managed zone. We had a static IP assigned to the load balancer frontend and attached a Let’s Encrypt managed certificate to it that automatically rotated. And then it did our
SSL off-loading for us. So the next hop would then
be the Reblaze proxy tier. So this Reblaze proxy tier was a managed instance group, which used an instance template of a managed image from Reblaze. It was using n1-standard-16 instance types, which is 16 vCPUs and 60 GB of RAM each. And it autoscaled based off of demand. So in reality, for the test, we just kept two instances running at all times. And during the testing, it would autoscale based off of the specific metrics we had provisioned for it.
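For readers reproducing a similar tier, here is a hedged sketch of provisioning such an autoscaled managed instance group with the gcloud CLI from Python. The template and group names are placeholders, and the Reblaze image reference is hypothetical:

```python
import subprocess

def run(args):
    subprocess.run(args, check=True)

# Hypothetical names throughout; the actual Reblaze image would come
# from the vendor's project.
run(["gcloud", "compute", "instance-templates", "create", "reblaze-proxy-tmpl",
     "--machine-type=n1-standard-16",     # 16 vCPUs, 60 GB RAM
     "--image=reblaze-proxy-image",       # hypothetical managed image
     "--image-project=reblaze-images"])   # hypothetical vendor project

run(["gcloud", "compute", "instance-groups", "managed", "create", "reblaze-proxies",
     "--template=reblaze-proxy-tmpl",
     "--size=2",                          # two instances at all times
     "--zone=us-central1-a"])

run(["gcloud", "compute", "instance-groups", "managed", "set-autoscaling",
     "reblaze-proxies",
     "--zone=us-central1-a",
     "--min-num-replicas=2",
     "--max-num-replicas=10",
     "--target-cpu-utilization=0.6"])     # scale out on demand
```

And then we had the BigQuery–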
the logs go to BigQuery. So it was a structured data
set that was IAM-controlled. We had access to them. Nobody else did. And that was part of
the reasons why we wanted to do this experiment. We wanted to maintain some
kind of control of the data that we were actually creating. From there, it would
forward it to another cloud DNS-managed zone that
was considered internal. And then it sent it to
this specific tier of VMs that we provisioned. These are vulnerable Docker instances that we created. They were susceptible to things like OWASP Top 10 attacks. The idea is that we wanted
to keep them to the point where we could break them. So if anything penetrated
here, we wanted to see– monitor this
specific tier and see if anything failed within
the upper part of the stack. So we partnered up with a third-party DDoS company that specializes in kind of test-driving what your security controls and your DDoS mitigation solutions are, to simulate the true scale of a distributed attack. And we wanted to focus on
specific DDoS types that were considered common
types from layer 3 to layer 7 for our test cycle. So the layer 3, layer 4 attacks listed were more focused on megabits per second. As an example, take a UDP flood, which uses UDP datagrams containing IP packets to flood random ports on a target network. The layer 7 attacks, meanwhile, were more connections-per-second focused, like an HTTP flood, which is an attack that uses a large number of HTTP GET or POST requests to target an application or web server. So the sequence of events
was the vendor initiated each attack individually. During each attack we searched
for impacts to availability through a series of scripted
curl requests and web page reloads and then monitored
the performance on the VMs. And then following each attack, we cleared out any dynamic IP blacklists before the next attack started.
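A simple availability probe in that spirit might look like the following Python sketch: issue a request on an interval and record status and latency. The target URL and thresholds are placeholders, and eBay’s actual scripts used curl:

```python
import time
import urllib.request

TARGET = "https://test-site.example.com/"  # placeholder; the real test domain was parked

def probe(timeout=5.0):
    """One availability sample: (ok, latency_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(TARGET, timeout=timeout) as resp:
            return resp.status == 200, time.monotonic() - start
    except Exception:
        return False, time.monotonic() - start

failures = 0
for _ in range(60):            # one sample per second for a minute
    ok, latency = probe()
    failures += (not ok)
    print(f"ok={ok} latency={latency:.3f}s")
    time.sleep(1)
print(f"availability: {(60 - failures) / 60:.1%}")
```

So here are the results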
of the test, as follows. So a pass means good in our case– no visible impact to any of the backend service VMs. And a fail would mean there was visible impact to the backend service VMs. So as expected,
all layer 3 attacks were mitigated at the Google
front-end via Cloud Armor. Any of the non-TCP
traffic was discarded by Google’s
distributed firewall. And this left literally only the
TCP connection floods and TCP flag attacks that could
reach the front-end VIP. So here are some
of our findings. This is kind of
like the feedback. Mind you, when we did this test– we did this late last year– we worked really well with Google to say, hey, listen, this is some of what we found. And some of the product
that you see today and that was actually
released at RSA is from some of the
feedback that we were able to give them
to make the product a lot more resilient. So at the time, all the
attacks were silently dropped. So we really didn’t get
any visibility into that. We didn’t really have the rich
logs that we wanted to see. Ideally, you want to be
able to correlate that across all your properties. So you weren’t able to do things
like short term log analysis, or short term threat analysis
or long term threat analysis. And that’s kind of some
of the feedback we gave. We wanted to see
some kind of summary logs so that we could be aware
of any anomalous activity that was happening in Google,
and at the same time, so we could start correlating
that and training that across all of our properties. So as you can see, all
the layer 7 attacks were mitigated within Reblaze,
either through dynamic rule or bot challenges. And following each test, we
needed to clear the dynamic IP blacklist so that
the next set of tests could successfully run. And from a visibility
perspective, again, some of the things
that we encountered is that the load balancer provided some rudimentary access logs, and not in a form that was easy to analyze. And at the time, they only
contained IP blacklist logging. From the Reblaze
perspective, all the data was through BigQuery. Some of the canned
table visualizations were difficult to pivot. We gave that feedback
to Reblaze and they made them a lot better. And again, since everything
is logged to BigQuery, you can kind of build
your own analysis tools if you want to for visibility. These are enabled
defenses that we had. So the global ACL is a dynamic– it’s a network-based ACL. It does things like IP blocking, ASN blocking, or blocking by country. The two that we primarily used were the global DR XDeny, which is a dynamic IP blacklist, and the Deny Bot. Deny Anonymous Proxies and
Deny Tor are self-explanatory. Those lists are
automatically updated for us. But a Deny Bot is interesting,
because with a lot of vendors, what they normally do is their response is just injected JavaScript. But we found Reblaze to be a lot more intelligent, where it uses a lot of human detection components in order to be able to detect that it’s actually a bot. So some of the techniques
were environment validation, proof of work, human
behavior interaction, which is literally like mouse clicks, moves, and scrolls to validate the connection. And then we had a default eBay WAF policy, which is where we whitelisted the HTTP methods GET and POST and denied all others; that way we could allow certain arguments in
while denying everything else. So what you see here are
the Reblaze traffic graphs. And this is the graph of
the scripted attack test before, during, and after. The y-axis represents the connections per second, while the bottom shows the time range. And so this is a scripted
HTTP-based web flood. And so you can see
that Reblaze starts by challenging each
request and asking the bot to re-request the same page. Since the bot never
re-requests the same page, its request is not
passed through. And these bots get caught by
the web page dynamic rules, and then they’re temporarily blacklisted. And again, you can kind of see the trend in the bandwidth as well. As it starts sending challenges, the bandwidth increases. And as soon as the bots are moved to the dynamic blacklist, the bandwidth drops.
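The challenge-and-re-request logic is easy to model. Here is a toy Python sketch of that state machine, under the assumption (from the talk) that a client is blacklisted once it keeps issuing new requests without ever answering a challenge; the strike threshold is made up:

```python
# Toy model of challenge-based bot filtering: a client must re-request
# the same page after being challenged; clients that never do get
# dynamically blacklisted.
challenged: dict[str, str] = {}   # client -> path it must re-request
strikes: dict[str, int] = {}
blacklist: set[str] = set()

def handle(client: str, path: str) -> str:
    if client in blacklist:
        return "drop"
    if challenged.get(client) == path:      # answered the challenge
        del challenged[client]
        return "pass"
    strikes[client] = strikes.get(client, 0) + 1
    if strikes[client] > 3:                 # never re-requests: it's a bot
        blacklist.add(client)
        return "blacklist"
    challenged[client] = path
    return "challenge"

print(handle("bot", "/a"), handle("bot", "/b"), handle("bot", "/c"), handle("bot", "/d"))
# challenge challenge challenge blacklist
print(handle("user", "/a"), handle("user", "/a"))   # challenge pass
```

And as Emil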
mentioned previously, these dynamic blacklists can
be configured to auto-escalate to Cloud Armor. And again, this is the
simulated browser attack. And we had some connection
[INAUDIBLE] during the test. But as with the
scripted attacks, you can see that the
same thing played out. The challenges went out. And after being challenged without a response several times, the bots got moved into a dynamic blacklist. And again, the same happened with the bandwidth to the Reblaze proxy. So the verdict for
us, it was a pass. Cloud Armor provides
scaled out defenses against volumetric DDoS attacks. And Google’s reputation
precedes itself with mitigating these
types of attacks. There were a few blips that we found, such as some of the telemetry, but given the feedback we were able to give, we see that addressed in the product that exists today. And again, with Reblaze’s bot
detection and dynamic rules, it was amazing at being able
to stop a lot of the layer 7 attacks that we encountered. At the same time, from the
visibility perspective, we were actually able to
get a lot of the telemetry that we wanted. So overall, Google Cloud Armor and Reblaze were a sound solution for DDoS attacks of all kinds. And that’s that. [APPLAUSE] Also, we have– there’s
a survey on your app. So if you could,
please take some time to give some feedback. And thank you. [APPLAUSE] [MUSIC PLAYING]
