cartomancy labs futurecast
Fraud Frameworks & Foundations Part III: The Cards we’ve been De@LT
Defenses at Layer eighT, Getting Inside the Product/Game, and Centering Users in our Models of Defense
Extending Product Security: Life at Layer Eight
Ok fam, so far we’ve reviewed a draft maturity framework (CARTO) and a reference model for risk decisioning systems (CoRDRA). You know what we haven’t talked about yet? How teams think about fraud/abuse itself - analyzing abuse cases and understanding the options we have for addressing them (and how that might differ from problems affecting cyber). Let’s get into a bit of that: how we model out fraud & anti-abuse problems.
Just like detection tech, there’s a lot we can borrow from existing models and approaches in cyber. But just like detection tech, there’s some tweaking we need to do. Let’s take a look at the cards we’ve been De@LT* in fraud & abuse.
Coming to terms with who’s in charge — The users decide: You can design all kinds of cool MVP products and features, but ultimately, it’s the users who decide what’s interesting and useful about your product/platform. They are the ones who figure out what features they are going to use, and how.
The more creative your users are, the more likely you are going to find they don’t use your product the way you expected.
Brainstorming is a great way to get to features (WHAT features to introduce), and it’s also very helpful to brainstorm after the feature is designed (HOW the feature might be used).
Note to product managers: Collect telemetry that will give you insights not just on whether your feature is hitting KPIs, but on how the feature is being used - that may also provide helpful feedback on whether the feature is being used “creatively” or abused.
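As a sketch of what that might look like, here’s a minimal, hypothetical feature-usage event - the field names and the `bulk_invite` feature are invented for illustration, not a real schema:

```python
import json
import time

def feature_event(user_id, feature, action, context=None):
    """Build a feature-usage telemetry event.

    Beyond counting KPI hits, the free-form `context` dict is where
    "creative" usage patterns tend to show up later in analysis.
    """
    return {
        "ts": time.time(),
        "user_id": user_id,
        "feature": feature,        # e.g. "bulk_invite"
        "action": action,          # e.g. "submitted", "rate_limited"
        "context": context or {},  # e.g. {"invite_count": 4000}
    }

# A suspiciously large bulk invite, recorded for later review
event = feature_event("u_123", "bulk_invite", "submitted",
                      {"invite_count": 4000})
print(json.dumps(event))
```

The point is less the schema and more the habit: every feature interaction emits an event with enough context attached that an analyst can later ask “who is using this in a way we never expected?”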
“This is fine” aka everything is WAI: Some fraud attacks might leverage security issues and vulnerabilities, but in most systems, there are ongoing abuse problems that exist “above” the code layer. Meaning: your application and application security controls are working as intended (WAI). It’s just that the features of your system (moving money, communicating with others, displaying advertisements, playing a game) were used with bad intent.
In a very large scale product at a very large scale internet company, I noticed that any time there was a problem or error, we would drop what we were doing to understand: were we looking at a bug, or was the system WAI? Bugs got fixed; WAIs were noted, clarified, and considered.
Note: Where product manager teams use personas to model for user experience, I recommend they also consider including variants of adversarial users to test the edges of their functionality.
Probabilistic vs systematic controls: Unlike cryptography, most anti-abuse controls are probabilistic in nature. Meaning there is a known and expected failure rate. This is true with all rules and scores (by design), and none of the other mechanisms are “perfect”, either.
Layers of less than perfect defense: Like in cyber, these controls are layered in order to improve confidence and weed out bad actors, but at the end of the day (given what we have to work with) we know that to keep legitimate activity flowing, we’re going to need to accept some level of bad actor activity, too.
Some cyber examples: We do see this in cyber in much “detection”-driven tooling, whether the detection is implemented in a preventative or detective (flagging after the fact) manner. Anti-virus is a good example of how this works in cyber. Another might be tuning the accuracy of biometric authentication: in high-control systems we may be able to decide how we prefer to balance false positives vs false negatives.
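To make the probabilistic point concrete, here’s a toy sketch (the scores and labels are made up) of how sliding a single decision threshold trades false positives against false negatives - there’s no setting where both go to zero:

```python
def confusion(scores, labels, threshold):
    """Count errors when flagging every score >= threshold as 'bad'.

    labels: True = actually bad, False = actually good.
    Returns (false_positives, false_negatives).
    """
    fp = sum(1 for s, bad in zip(scores, labels) if s >= threshold and not bad)
    fn = sum(1 for s, bad in zip(scores, labels) if s < threshold and bad)
    return fp, fn

# Toy risk scores: good users cluster low, bad actors cluster high -
# but the distributions overlap, so no threshold is error-free.
scores = [0.05, 0.10, 0.20, 0.40, 0.55, 0.60, 0.75, 0.90]
labels = [False, False, False, True, False, True,  True,  True]

for t in (0.3, 0.5, 0.7):
    fp, fn = confusion(scores, labels, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

Loosen the threshold and you annoy good users (FPs); tighten it and you let bad actors through (FNs). That’s the known, expected failure rate in action.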
Protecting users & customers vs protecting the mothership: Perhaps the biggest difference between cyber and fraud is that, in general, cyber deals with the insides of a system - servers and code and employees - while abuse happens in customer-facing contexts: how the code behaves externally, and the customers themselves. It turns out that a company can create policies that employees must abide by, but companies cannot force customers to do anything - they can only apply as much friction as those customers will accept. In truth, there are incentives designed into most companies to keep friction as low as possible - because without maximizing customer activity and revenue, the company ceases to exist. This dynamic - friction vs revenue - is very real, and quantifiable, when talking about fraud and other customer abuse cases.
Getting in the Game
If we take a look at cyber attack frameworks, we see that the journey (roughly) moves from left to right and (primarily) centers the bad actor’s (attacker’s) experience. And it’s important to note that while there may be a logical progression of an attack process, in cyber, attackers do not sit still and move through flows designed for them.
At Layer 8, in business logic, customers and users DO move through the applicable flows. Below I show some of the common elements of a checkout flow; this is not quite a wireframe, but it would be easy to create a more detailed understanding of the business logic flow by linking wireframes together. (How very object oriented of us.)
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/9d8544c4-7381-4d91-b3b9-e23623bb6cb7/image.png?t=1733409668)
User walks into a checkout flow, slides from my 2018 talk @ LocoMocoSec
Note that users are not hopping layers, here. Even attackers would follow the flow as laid out - attackers would be manipulating their way through flows that are (largely) WAI. These are the cards we (Layer 8 defenders) are De@LT.
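One way to “link wireframes together” into a business-logic model is to treat each flow step as a node with its allowed transitions - a minimal sketch (the step names follow the generic checkout example, not any real product):

```python
# Each step in the checkout flow points at the steps a user (or attacker)
# can legitimately reach next. Attackers manipulate the flow, but they
# still move across this same board.
CHECKOUT_FLOW = {
    "cart":         ["shipping"],
    "shipping":     ["payment", "cart"],
    "payment":      ["review", "shipping"],
    "review":       ["confirmation", "payment"],
    "confirmation": [],  # terminal node
}

def is_valid_path(flow, path):
    """Check that a sequence of steps only uses WAI transitions."""
    return all(nxt in flow.get(cur, []) for cur, nxt in zip(path, path[1:]))

print(is_valid_path(
    CHECKOUT_FLOW,
    ["cart", "shipping", "payment", "review", "confirmation"]))  # True
print(is_valid_path(
    CHECKOUT_FLOW,
    ["cart", "confirmation"]))  # False: layer-hopping isn't in the flow
```

Even this crude adjacency map is enough to start annotating: which transitions throw telemetry, which ones deserve a control, which ones an attacker will try to race through.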
Because the “game” of fraud and abuse happens within Layer 8 where code is WAI, many teams that work on fraud and abuse practice a method of threat modeling that focuses on the product / code / UX itself (the game board, perhaps). Now, an advanced bad actor will certainly think outside of the box. But they will still largely act on the game board built by your product and development teams. (Note: Here too is where we note that how things are designed isn’t always how they are implemented, hearkening back to the top of this post. Risk/fraud/abuse folks have known that “the code IS the policy” forever.)
To clarify, this isn’t a critique or even a comment on cyber threat modeling. I’ve seen some amazing threat models constructed on very sophisticated flows that work from Layer 8 all the way back down the stack. It’s more a clarification of how the exercise in fraud/abuse that is most analogous to threat modeling differs, and I think the biggest point of differentiation is that both the threat issues AND the potential solution set are (largely) native to the UX and product design itself. (See the discussion of CoRDRA.)
User-Centered Risk Modeling
So if we’re going to be working from within product flows, how might we investigate and model different types of attacks? We can still use variations of Threat Modeling (TM), but I like the “Attack & Defend” method, which was introduced to me while working at a fintech on product design and anti-fraud techniques. Here’s a high-level example of an “attack and defend” model overlaid on a common user flow.
We take the user flow (the checkout flow from above works fine) and start thinking about all the different ways the flow - WAI - could go wrong. (I think of this as the threat actors playing the cards in their hand.)
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/9936b9a1-7613-41b4-92ef-567d5207e1d7/image.png?t=1733411418)
Step 1: Adding the “Attack” side of the model (this is about 15 minutes into the LocoMocoSec talk)
Step 2 is to figure out what actions or features we could add into the flow to counter the abuse cases. (It’s our turn to play OUR cards.)
Those are steps one and two, but like many threat modeling exercises (especially ones where controls are probabilistic and therefore must be layered to provide reliable prevention), this process will be iterated. At some point each abuse scenario will likely have its own Ishikawa diagram of layers of controls! (If we were using playing cards, there would be stacks across our “game board”.)
Keep in mind that the examples above show the UX-integrated preventative controls; part of this attack-and-defend model may also be behind-the-scenes detective controls and identity challenges. (INVISIBLE cards - this is a weird game.)
The goal in the example above is to stop fraud while also avoiding the big preventative control of a checkout or payments flow: a decline. We put friction in front of risky users and try to weed out the bad actors, so that good users who look risky to us will still be able to get through the flow. Every successful completion of a risk control/friction point increases our confidence, and hopefully we get our confidence up enough that the final node is an approval, rather than a decline. You’ll see that we included the Confirmation as the terminal node of the diagram - this was mostly for simplicity; there will be a whole additional set of controls to consider if the Checkout needs to be terminated by a decline at or before Confirmation. [Note: Ask me sometime about merchants that terminate AFTER Confirmation]
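A toy sketch of that confidence-accumulation idea - the challenge names, confidence boosts, and approval threshold are all invented for illustration:

```python
def decide(initial_confidence, challenges_passed, approve_at=0.8):
    """Accumulate confidence as a risky user clears friction points.

    Each cleared control nudges confidence toward approval; the goal is
    that a good-but-risky user ends at 'approve', not 'decline'.
    """
    confidence = initial_confidence
    for name, boost in challenges_passed:
        confidence = min(1.0, confidence + boost)
        print(f"passed {name}: confidence now {confidence:.2f}")
    return "approve" if confidence >= approve_at else "decline"

# A risky-looking but legitimate shopper clears two friction points
# and climbs from 0.45 to above the approval bar.
result = decide(0.45, [("3ds_challenge", 0.25),
                       ("email_verification", 0.15)])
print(result)
```

A real decision engine would be far richer (negative signals, step-up logic, invisible detective checks), but the shape is the same: friction points are opportunities to earn an approval.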
A Game that Never Ends & Getting Beyond “Don’t Get Popped”
Okay, once we’ve got a working Attack & Defend model to incorporate into our design of the UX, a couple of things to consider:
A user flow must be designed to optimize the success of good users (in getting through the flow)
A defensible user flow must consider both the happy path (good user successful) as well as a variety of resolvable unhappy paths
A user flow represents multiple interaction points and considerations for the designer:
Each interaction point can be wired to throw off telemetry and learn more about the user in the flow
Each interaction point can be a potential place to identify problems and to layer in controls
Each interaction point can represent a decision point, and decision points can be optimized [FORESHADOWING]
Performance of defending the flow likely affects performance of the flow itself
Re-routing to less-happy paths will likely affect both bad actors and good customers (imagine your product managers screaming about friction and checkout abandonment)
Getting it wrong with too much defense depresses bad actor success, but also good user success (example: decline rate too high, bye bye revenue)
Getting it wrong with too little defense enables good user success, but also enables bad user success (example: decline rate too low, losses soar - bye bye profitability)
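The two failure modes above can be put into toy numbers - a crude expected-profit model under different decline rates (every figure here is invented; only the shape of the tradeoff matters):

```python
def expected_profit(n_orders, fraud_rate, decline_rate, precision,
                    margin_per_order, loss_per_fraud):
    """Crude expected-profit model for tuning how aggressively we decline.

    `precision` = the fraction of declined orders that really were fraud.
    All parameters are illustrative, not real-world numbers.
    """
    total_fraud = n_orders * fraud_rate
    declined = n_orders * decline_rate
    fraud_stopped = min(declined * precision, total_fraud)
    good_declined = declined - fraud_stopped      # revenue left on the table
    fraud_approved = total_fraud - fraud_stopped  # fraud losses
    good_approved = (n_orders - total_fraud) - good_declined
    return good_approved * margin_per_order - fraud_approved * loss_per_fraud

for rate in (0.0, 0.10, 0.50):
    p = expected_profit(10_000, 0.02, rate, precision=0.4,
                        margin_per_order=5.0, loss_per_fraud=100.0)
    print(f"decline_rate={rate:.0%}: expected profit ${p:,.0f}")
```

With these made-up inputs, declining nothing bleeds fraud losses, declining half the book bleeds revenue, and the profit-maximizing point sits somewhere in between - exactly the "too little" vs "too much" defense tension.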
If we continue the card analogy, let’s note that the game is not won or lost on a particular hand. Not only are defenders in this game playing many, many rounds, they must play in a way that keeps the good players coming back while defeating as many bad actors as possible.
A note for the card sharks: When Risk Impact becomes Dollars
I’m not a great poker player. I play more on vibes and intuition, and my friend, poker is not a game of vibes and intuition. Highly skilled poker players, I understand, are running numbers in their head, they’re considering conditional probabilities, they’re keeping track of all the cards they see and also working through mental models about the cards they haven’t seen - basically, highly skilled poker players are able to quantify the value of the cards in their hand, and proceed accordingly. I mention this because we are also able (to a certain extent) to understand the “value” of our hand (our defensive design) when it comes to an abuse problem like fraud.
And this is one of the things I’ve always loved about working in fraud management: the ability to quantify. The ability to have a sense of the value of our “hand” - our strategy, our infrastructure, our data, our workflows. A simple gains chart helps visualize this a bit (a sequence of gains charts is shown below). We can talk about the “lift” of a model (comparing one model to another, or a model to no model at all), and the curve jutting out to the left represents “the best we can do”. It’s like our decision quality horizon. And if we can choose to live anywhere on the curve we like, where do we choose?
Do we choose the super-cushy approval rate that lets all the holiday shoppers through? (and many of the bad actors)
Or do we choose the mad-stringent approval rate that stymies all fraudsters? (and frustrates a bunch of innocent shoppers)
And friends, that’s where we can start to do things like choose the most profitable place on the curve. That, my friends, changes the discussion with stakeholders a lot.
![](https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5d224454-8884-41de-91cf-a44ca461eb48/image.png?t=1733505493)
1) Generic gains chart, 2) gains over random, 3) gains moving from one model to a better model. I start talking about this around 38:15 in the LocoMocoSec video.
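For the curious, a gains curve is straightforward to compute: rank events by model score (riskiest first), then track the cumulative share of fraud caught as you work down the list. A sketch with made-up scores and labels:

```python
def gains_curve(scores, is_fraud):
    """Cumulative % of fraud caught vs % of population reviewed,
    working events in descending score order. A better model pushes
    this curve up and to the left."""
    ranked = sorted(zip(scores, is_fraud), key=lambda x: -x[0])
    total_fraud = sum(is_fraud)
    caught = 0
    curve = []
    for i, (_, fraud) in enumerate(ranked, start=1):
        caught += fraud
        curve.append((i / len(ranked), caught / total_fraud))
    return curve

# Toy scores: higher = riskier; 1 = fraud, 0 = good
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
fraud  = [1,   1,   0,   1,   0,   0,   0,   0,   0,   0]

for pop, gain in gains_curve(scores, fraud):
    print(f"reviewed {pop:.0%} of events -> caught {gain:.0%} of fraud")
```

Here the model catches 100% of the (toy) fraud within the riskiest 40% of events; a random ordering would sit on the diagonal, and the gap between the two is the model’s lift.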
Coming back to the earlier foreshadowing (just a couple of paragraphs up) - I have two caveats around this extremely exciting ability to optimize loss rate against something like profitability:
It gets more complicated to get really precise on this when you have multiple decision points, not all of which provide a signal around monetary impact of the risky bad actor / potentially good customer actions.
Generally systems that have highly predictable model performance get there because they have a lot of throughput (many events/transactions, lots of telemetry)…and maturity (feedback loop, ability to derive insights from telemetry).
That said - and here’s my favorite point I like to make about those slides and diagram:
That horizon of model performance might look familiar to finance nerds. That’s because the curve represents the “best we can do” with the current technology/information available to us - kind of like beta, maybe? Right now we can live anywhere on that curve, but if we make investments, we may be able to get more gain and shift the curve further to the left - increasing performance. This provides a great business case for investing in better technology and better data. And the need to keep shifting the curve is ongoing, because of the other thing about model performance: there is no “set it and forget it”. As soon as you launch a model, its performance starts degrading - because there are smart attackers on the other side, and they’re learning from us as much as we’re learning from them.
Wrapping it Up
Card game analogies aside, the options and goals of defenders are a little different when they are working above the appsec layer and getting into the product design and (WAI) product behaviors. It changes:
How risk analysis of the product and product design is done (threat modeling)
How defense is designed and laid into a customer experience (in-product features, happy paths & unhappy paths)
How defense considers performance and “winning” (expected transactional loss rates, model performance, balancing false positives & false negatives, thinking about impact on operations and profitability)
Reflecting, both cyber and anti-abuse have been De@LT more complicated hands as of late. Where cyber has the complexity of working up and down an ever-changing stack and a ballooning exposure surface area, in anti-abuse the products themselves are getting more complex, and we are not just dealing with transactional risk anymore - we’re dealing with expanding lifecycles with our customers and partners.
Good news / Bad news: It’s interesting times in this card game, and continues to get more interesting.
Endnotes
*Regarding “De@LT” as a stand-in for “Defense at Layer eighT” - maybe it IS cheating to create an acronym using the last letter of a word instead of the first, but who’s to say?
In this write-up I’ve used some slides and concepts from a talk I gave at the first LocoMocoSec in 2018; if you’d like to watch/listen to that presentation, you can find it here: