deploying RPKI based Origin Validation
I think the Netherlands is going to become the world leader in routing security! How far along are you with your implementations of RPKI Origin Validation? Does anyone need help?

Regards,

Job

----- Forwarded message from Job Snijders <job@ntt.net> -----

Date: Thu, 12 Jul 2018 17:50:29 +0000
From: Job Snijders <job@ntt.net>
To: nanog@nanog.org
Subject: deploying RPKI based Origin Validation

Hi all,

I wanted to share with you that a ton of activity is taking place in the Dutch networker community to deploy RPKI based BGP Origin Validation. The mantra is "invalid == reject" on all EBGP sessions.

What's of note here is that we're now seeing the first commercial ISPs doing Origin Validation. This is a significant step forward compared to what we observed so far (it seemed OV was mostly limited to academic institutions & toy networks). But six months ago Amsio (https://www.amsio.com/en/) made the jump, and today Fusix deployed (https://fusix.nl/deploying-rpki/).

We've also seen an uptake of Origin Validation at Internet Exchange route servers: AMS-IX and FranceIX have already deployed. I've read that RPKI OV is under consideration at a number of other exchanges.

Other cool news is that Cloudflare launched a Certificate Transparency initiative to help keep everyone honest. Announcement at: https://twitter.com/grittygrease/status/1017224762542587907 Certificate Transparency is a fascinating tool, really a necessity to build confidence in any PKI system.

Anyone here working to deploy RPKI based Origin Validation in their network and reject invalid announcements? Anything of note to share?

Kind regards,

Job

----- End forwarded message -----
Hi Job, nice to see some more noise around OV. You (and a few others) already know it, the wider community probably not ... so here we go (again).
How far along are you with your implementations of RPKI Origin Validation?
AS286 is "prepared", but not yet rejecting anything. Once the customer cone is clean (or customers have had enough time to get their own, their customers', or their customers' customers' invalids corrected), reject will be enabled (and not disabled again). About a dozen downstream invalids remain ...

For peers: rollout stopped with reject enabled at 95% of geographical coverage ... then some got too scared of reachability issues for own space and single-homed customers, and of what to do when someone complains. To some extent it is an "is there a process" issue, with a "who will take the workload" discussion, paired with "as long as the risk of getting blamed for breaking connectivity is significantly higher than the coolness or honour factor, better be careful". So the more hype the market makes about it, the more likely a quick deployment -with reject- will be.

Yes, there are some (not just a few) invalids out there that are not covered by not-invalid less specifics: https://as286.net/data/ana-invalids.txt (note: not a perfect report, limited to AS286 peer routes, downstream routes excluded, potential wrong altpfx for <12 prefixes [...])

I haven't yet done any analysis of what's in there - eyeballs, customers, known and famous pages or whatever (and am unlikely to do so soon). Anyone willing to share their experience with invalid == reject and broken reachability?

I already proposed to Job to bring into being -like World IPv6 Day- a "No Invalid" day (or hours). As long as enough larger networks participate, it might wake people up to get their ROAs fixed (some of them are not even aware ...).
Does anyone need help?
What I'm still struggling a bit with is: RV (OV) vs RTBH, selective BH and AS286's rtdsCoS. I guess it's very unlikely that networks will publish /32 (or slightly larger) or /128 (or slightly larger) ROAs.

Falling back to IRR might be OK to some extent for RTBH (it just breaks connectivity), but with sBH and rtsdCoS it allows, to some extent, traffic hijacking.

Another idea is taking valid ROAs, building filters up to /32|/128 out of them, and only allowing RTBH, sBH and rtsdCoS for space for which ROAs are published. At least it might encourage more networks to publish ROAs. But that still needs some fine tuning, e.g. with 10.0.0.0/8-16,AS1 and 10.0.10.0/24,AS2 - is AS1 allowed to BH 10.0.10.0/32, or only AS2? And not everyone might be able to create ROAs for all the address space they have "easily".

How do the networks that already reject invalids and offer BH/sBH or the like handle it? How would you expect your peer/upstream to handle your BH announcements if they don't validate (as there's no valid /32 ROA for them)?

Cheers,
Markus

--
FvD, Markus Weber, AS286
KPN EuroRings Germany B.V. Rüsselsheimerstr. 22, DE-60326 Frankfurt
Amtsgericht Frankfurt HR99781, GF Jesus Martinez & Hugo van den Akker
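The AS1/AS2 question above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the two policy functions and their names are hypothetical, ROAs are modelled as simple (prefix, max_length, origin ASN) tuples, and neither policy is claimed to be what AS286 or any other network implements.

```python
import ipaddress

# ROA set from the example above, as (prefix, max_length, origin_asn) tuples.
ROAS = [
    (ipaddress.ip_network("10.0.0.0/8"), 16, 1),
    (ipaddress.ip_network("10.0.10.0/24"), 24, 2),
]

def covering_roas(route, roas):
    """Return every ROA whose prefix covers the route, ignoring max_length."""
    return [r for r in roas
            if r[0].version == route.version and route.subnet_of(r[0])]

def blackhole_origins_any(route, roas):
    """Hypothetical policy A: any AS holding a covering ROA may blackhole."""
    return {asn for _, _, asn in covering_roas(route, roas)}

def blackhole_origins_most_specific(route, roas):
    """Hypothetical policy B: only the AS(es) behind the longest covering ROA prefix."""
    cov = covering_roas(route, roas)
    if not cov:
        return set()
    longest = max(prefix.prefixlen for prefix, _, _ in cov)
    return {asn for prefix, _, asn in cov if prefix.prefixlen == longest}

host = ipaddress.ip_network("10.0.10.0/32")
print(sorted(blackhole_origins_any(host, ROAS)))            # [1, 2]
print(sorted(blackhole_origins_most_specific(host, ROAS)))  # [2]
```

Under policy A both AS1 and AS2 could blackhole 10.0.10.0/32; under policy B only AS2 could. Which of the two is appropriate is exactly the fine tuning left open above.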
On Thu, Jul 12, 2018 at 09:09:44PM +0000, Weber, Markus wrote:
How far along are you with your implementations of RPKI Origin Validation?
AS286 is "prepared", but not yet rejecting anything.
Once the customer cone is clean (or customers had enough time to get their or their customer's or their customer customer's invalids corrected), reject will be enabled (and not disabled again). Round about a dozen of invalids of downstreams remain ...
From my perspective you are almost squeaky clean! I see two invalids: 88.159.27.0/24 (invalid, but covered by valid route 88.159.0.0/16) and 94.103.31.0/24 (also covered by valid 94.103.16.0/20). I'm sure there are more, but if you drop these two prefixes it shouldn't result in loss of connectivity because there are covering valid routes.
Yes, there are some (not just a few) invalids out there, not covered by not-invalid less specifics: https://as286.net/data/ana-invalids.txt (note: not a perfect report, limited to AS286 peer routes, downstream routes excluded, potential wrong altpfx for <12 prefixes [...])
We should compare notes on how you generate this. I looked today and only 2,200 prefixes become unreachable due to misconfigured RPKI ROAs. http://instituut.net/~job/rpki-report-2018.07.12.txt - I think the majority of invalid more-specifics simply are attempts at traffic-engineering and not critical.
I haven't done yet any analysis what's in there - eyeballs, customers, know and famous pages or whatever (and unlikely will do it soon).
Anyone willing to share their experience with invalid == reject and broken reachability?
I hosted peeringdb.com for a while inside my own ASN which does RPKI Origin Validation - this resulted in about 1 complaint per year. I've CCed Sebastiaan Koetsier, he'll be able to comment more on what he has seen in the last 6 months.
I already proposed to Job to bring into being -like the World IPv6 day- a "No Invalid" day (or hours) - as long as sufficient larger networks participate it might wake up people to get their ROAs fixed (some of them are not even aware ...).
I think it'll be quite challenging to get everyone lined up timing-wise, but we can keep this idea in the back of our heads should we fail to make progress through other means.
Does anyone need help?
What I'm still struggling a bit with is: RV (OV) vs RTBH, selective BH and AS286's rtdsCoS.
Guess it's very unlikely networks will publish /32 (or little larger) or /128 (or little larger) ROA.
Yes, agreed. We should not ask this from customers.
Falling back to IRR might be OK to some extent for RTBH (it just breaks connectivity), but with sBH and rtsdCoS it allows, to some extent, traffic hijacking.
Another idea is taking valid ROAs, building filters up to /32|/128 out of them, and only allowing RTBH, sBH and rtsdCoS for space for which ROAs are published. At least it might encourage more networks to publish ROAs. But that still needs some fine tuning, e.g. with 10.0.0.0/8-16,AS1 and 10.0.10.0/24,AS2 - is AS1 allowed to BH 10.0.10.0/32, or only AS2? And not everyone might be able to create ROAs for all the address space they have "easily".
How do the networks that already reject invalids and offer BH/sBH or the like handle it? How would you expect your peer/upstream to handle your BH announcements if they don't validate (as there's no valid /32 ROA for them)?
I was thinking that for blackhole routes you pretend the MaxLength attribute of the ROA is 32 (or 128). This way you can still do _origin_ validation for blackhole routes, but the prefix-length check will fail. This definitely needs a bit more thinking, but we don't need to solve it immediately. If you want to move forward fast you can just let blackholes function as they function today (based on IRR). Kind regards, Job
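A minimal sketch of the "pretend MaxLength is 32 (or 128)" idea, assuming an RFC 6811-style validator written from scratch for illustration. The `validate` function, its `blackhole` flag, and the documentation prefix and ASNs below are hypothetical and not taken from any router or validator implementation.

```python
import ipaddress

def validate(prefix, origin_asn, roas, blackhole=False):
    """RFC 6811-style origin validation over (roa_prefix, max_length, asn) tuples.

    With blackhole=True, every covering ROA is treated as if its MaxLength
    were the full address length (32 for IPv4, 128 for IPv6), so only the
    origin AS is effectively checked.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the route
        covered = True
        if blackhole:
            max_length = route.max_prefixlen  # the assumed override
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"

# Hypothetical ROA using a documentation prefix and a documentation ASN.
roas = [("192.0.2.0/24", 24, 64500)]

print(validate("192.0.2.1/32", 64500, roas))                  # invalid
print(validate("192.0.2.1/32", 64500, roas, blackhole=True))  # valid
print(validate("192.0.2.1/32", 64501, roas, blackhole=True))  # invalid
```

In this sketch a /32 from the ROA's origin AS is accepted only when it is flagged as a blackhole route, while the same /32 from another origin AS stays invalid.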
On 12-07-18 23:25, Job Snijders wrote:
On Thu, Jul 12, 2018 at 09:09:44PM +0000, Weber, Markus wrote:
How far along are you with your implementations of RPKI Origin Validation?
AS286 is "prepared", but not yet rejecting anything.
Once the customer cone is clean (or customers had enough time to get their or their customer's or their customer customer's invalids corrected), reject will be enabled (and not disabled again). Round about a dozen of invalids of downstreams remain ...
From my perspective you are almost squeaky clean! I see two invalids: 88.159.27.0/24 (invalid, but covered by valid route 88.159.0.0/16) and 94.103.31.0/24 (also covered by valid 94.103.16.0/20). I'm sure there are more, but if you drop these two prefixes it shouldn't result in loss of connectivity because there are covering valid routes.
Apparently I needed to publish the RPKI cert for 88.159.27.0/24. Thanks for the heads-up.

<snip>

Kind regards,

Michiel Piscaer
AS39309 Edutel BV

--
Network / System Engineer
Security Officer

E-mail: m.piscaer@edutel.nl
Telephone: +31 88 787 0209
Fax: +31 88 787 0502
Mobile: +31 6 16048782
Threema: PBPCM9X3
PGP: 0x592097DB
W3: www.edutel.nl
On Thu, Jul 12, 2018 at 11:44:28PM +0200, M. Piscaer wrote:
From my perspective you are almost squeaky clean! I see two invalids: 88.159.27.0/24 (invalid, but covered by valid route 88.159.0.0/16) and 94.103.31.0/24 (also covered by valid 94.103.16.0/20). I'm sure there are more, but if you drop these two prefixes it shouldn't result in loss of connectivity because there are covering valid routes.
Apparently I needed to publish the RPKI cert for 88.159.27.0/24. Thanks for the heads-up.
Yes, it seems that adding a separate extra ROA just for the /24 is better than using "MaxLength=24". Wall of text on what that is :-) https://tools.ietf.org/html/draft-ietf-sidrops-rpkimaxlen Kind regards, Job
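To make the MaxLength point concrete, here is a small back-of-the-envelope sketch. It assumes the concern is that every prefix a ROA authorizes but the network never announces is available to someone forging the origin AS; the helper name is hypothetical.

```python
def authorized_prefix_count(prefixlen, max_length):
    """Number of distinct prefixes a single ROA makes valid for its origin AS."""
    return sum(2 ** (length - prefixlen)
               for length in range(prefixlen, max_length + 1))

# One ROA: a /16 with MaxLength=24.
loose = authorized_prefix_count(16, 24)

# Two minimal ROAs: the /16 itself plus one specific /24.
minimal = authorized_prefix_count(16, 16) + authorized_prefix_count(24, 24)

print(loose)    # 511
print(minimal)  # 2
```

With the loose ROA, 511 different announcements would validate for the origin AS even if only two of them are ever made; with the two minimal ROAs, only the two real announcements validate.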
Job wrote:
Yes, it seems that adding a separate extra ROA just for the /24 is better than using "MaxLength=24".
Wall of text on what that is :-) https://tools.ietf.org/html/draft-ietf-sidrops-rpkimaxlen
In a world doing RV I agree with the above statement.

In a world where most transit networks do not drop invalids, I think that by doing so you run a higher risk of suffering more heavily from even a "simple prefix hijack" (on purpose or not) and of not being able to react quickly: your "get at least something back" /24 announcement may not get far if your transit does RV (and especially not to the networks doing RV - the networks "claiming" to have a cleaner table, which might still be true, but is useless if there's a more specific between source and destination). Has anyone ever measured the time from publishing ROAs until they show up on their routers, and what reasonable timers might be?

So please "push" transit networks (the larger the better) and IXes to reject invalids with all the consequences (and don't blame them for the missing prefixes). Then not using MaxLength=24 as a default makes perfect sense (except in some of the cases the draft mentions).

For the time being I think every network should carefully think about what fits it best (service, customers, connectivity, ...) until RV is more widely deployed (in larger transit networks, on IXes as always-on, ...).

It might be perfect for you to follow the draft if you host Dutch content for EU-Dutch eyeballs and do 95% of your traffic via RV forced-enabled IXes or in-country transit networks doing RV. It might come with additional challenges if there are some "questionable" networks between you and your customers that don't do RV, or if the IX doesn't do RV for e.g. the party most of your traffic goes to or comes from. Or when your upstream is the only one doing RV.

Cheers,
Markus

--
FvD, Markus Weber, AS286
KPN EuroRings Germany B.V. Rüsselsheimerstr. 22, DE-60326 Frankfurt
Amtsgericht Frankfurt HR99781, GF Jesus Martinez & Hugo van den Akker
Hi Job,

| From my perspective you are almost squacky clean! I see two invalids
| 88.159.27.0/24 (invalid, but covered by valid route 88.159.0.0/16) and
| 94.103.31.0/24 (also covered by valid 94.103.16.0/20). I'm sure there
| are more, but if you drop these two prefixes it shouldn't result in loss
| of connectivity because there are covering valid routes.

From that perspective - yes, close to clean. But don't surprise and annoy customers without warning them upfront that eventually their or their customer's (TE) announcement suddently stop to work ... esp. don't give your (my) 1st line NOC a very hard time then in the need to explain it to the customer when he calls in the middle of the night (esp. when simple IRR filtering often causes headache).

|> Yes, there are some (not just a few) invalids out there, not covered
|> by not-invalid less specifics: https://as286.net/data/ana-invalids.txt
|> (note: not a perfect report, limited to AS286 peer routes, downstream
|> routes excluded, potential wrong altpfx for <12 prefixes [...])

| We should compare notes on how you generate this. I looked today and
| only 2,200 prefixes become unreachable due to misconfigured RPKI ROAs.
| http://instituut.net/~job/rpki-report-2018.07.12.txt - I think the
| majority of invalid more-specifics simply are attempts at
| traffic-engineering and not critical.

Take invalids received from eBGP peers on every PE, note prefix and path. Take "clean" (without invalids) table of DFZ. For every invalid lookup in the invalid free DFZ table if there's a valid same or a less specific. If not, altpfx=NONE and expect connectivity issue.

Your 2200 are more or less the 2277 ones in my list with altpfx=NONE
Aggregated ~876 ;-) (#of IPs remain the same).

Cheers,
Markus
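The lookup Markus describes can be sketched roughly as follows. This is an illustration only: the function names and the in-memory list standing in for the invalid-free DFZ table are assumptions, the two covered prefixes are the ones mentioned in this thread, and the 203.0.113.x prefixes are documentation space added to show the altpfx=NONE and aggregation cases (IPv4 only in this example).

```python
import ipaddress

def find_altpfx(invalid_prefix, clean_table):
    """Most specific same-or-less-specific prefix in the invalid-free table, or None."""
    route = ipaddress.ip_network(invalid_prefix)
    best = None
    for candidate in clean_table:
        net = ipaddress.ip_network(candidate)
        if net.version == route.version and route.subnet_of(net):
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return best

def report(invalids, clean_table):
    """Map each invalid prefix to its alternative prefix, or "NONE"."""
    result = {}
    for prefix in invalids:
        alt = find_altpfx(prefix, clean_table)
        result[prefix] = str(alt) if alt else "NONE"
    return result

def aggregate_unreachable(rep):
    """Collapse the altpfx=NONE prefixes into the fewest covering prefixes."""
    nets = [ipaddress.ip_network(p) for p, alt in rep.items() if alt == "NONE"]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

clean = ["88.159.0.0/16", "94.103.16.0/20"]
invalids = ["88.159.27.0/24", "94.103.31.0/24",
            "203.0.113.0/25", "203.0.113.128/25"]

rep = report(invalids, clean)
for prefix, alt in rep.items():
    print(prefix, "altpfx=" + alt)
print(aggregate_unreachable(rep))
```

The aggregation step corresponds to the "Aggregated ~876" remark: adjacent unreachable prefixes collapse into fewer entries while the number of affected addresses stays the same.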
On Thu, Jul 12, 2018 at 09:44:47PM +0000, Weber, Markus wrote:
| From my perspective you are almost squeaky clean! I see two invalids
| 88.159.27.0/24 (invalid, but covered by valid route 88.159.0.0/16) and
| 94.103.31.0/24 (also covered by valid 94.103.16.0/20). I'm sure there
| are more, but if you drop these two prefixes it shouldn't result in loss
| of connectivity because there are covering valid routes.
From that perspective - yes, close to clean. But don't surprise and annoy customers without warning them upfront that eventually their or their customer's (TE) announcements will suddenly stop working ... esp. don't give your (my) 1st line NOC a very hard time when they need to explain it to the customer who calls in the middle of the night (esp. when simple IRR filtering already often causes headaches).
Well - if you want to be kind to your customers, just deploy "invalid == reject" on your peering partners first. This way your customers benefit from a form of protection layer, while not being subjected to the same strictness you apply to peers. I bet you most BGP hijacks & misconfigurations come into AS 286 via the peering partners, as all customers already are filtered based on IRR data (if I am not mistaken?).
|> Yes, there are some (not just a few) invalids out there, not covered |> by not-invalid less specifics: https://as286.net/data/ana-invalids.txt |> (note: not a perfect report, limited to AS286 peer routes, downstream |> routes excluded, potential wrong altpfx for <12 prefixes [...])
| We should compare notes on how you generate this. I looked today and | only 2,200 prefixes become unreachable due to misconfigured RPKI ROAs. | http://instituut.net/~job/rpki-report-2018.07.12.txt - I think the | majority of invalid more-specifics simply are attempts at | traffic-engineering and not critical.
Take invalids received from eBGP peers on every PE, note prefix and path. Take "clean" (without invalids) table of DFZ. For every invalid lookup in the invalid free DFZ table if there's a valid same or a less specific. If not, altpfx=NONE and expect connectivity issue.
Your 2200 are more or less the 2277 ones in my list with altpfx=NONE Aggregated ~876 ;-) (#of IPs remain the same).
Ah, so we arrive at the same conclusion. I misunderstood your file format.

It is of paramount importance that this number is brought down. I think the only way to bring it down is to deploy Origin Validation and create the feedback loop that 'wrong roa == poor connectivity'.

Kind regards,

Job
Hi all,

Just wanted to share our (AS15703, True B.V.) experience as a hosting provider with enabling RPKI invalid filtering (invalid == reject).

We've secured (most of) our routes with ROAs since 2014, but last Tuesday we deployed filters which reject RPKI invalid routes. So far we have had a grand total of two tickets regarding users in one particular RPKI invalid prefix not being able to reach our network, but those people quickly understood that this wasn't our problem but a problem with their hosting partner. They took it up with their hosting partner and it was fixed within a day.

Overall, I would certainly recommend filtering RPKI invalids (and creating ROAs for your prefixes!!) to prevent hijacks.

--
Best regards,

Joshua Vijsma
On 12 Jul 2018, at 21:58, Job Snijders <job@ntt.net> wrote:
I think the Netherlands is going to become the world leader in routing security!
How far along are you with your implementations of RPKI Origin Validation? Does anyone need help?
Regards,
Job
----- Forwarded message from Job Snijders <job@ntt.net> -----
Date: Thu, 12 Jul 2018 17:50:29 +0000
From: Job Snijders <job@ntt.net>
To: nanog@nanog.org
Subject: deploying RPKI based Origin Validation
Hi all,
I wanted to share with you that a ton of activity is taking place in the Dutch networker community to deploy RPKI based BGP Origin Validation. The mantra is "invalid == reject" on all EBGP sessions.
What's of note here is that we're now seeing the first commercial ISPs doing Origin Validation. This is a significant step forward compared to what we observed so far (it seemed OV was mostly limited to academic institutions & toy networks). But six months ago Amsio (https://www.amsio.com/en/) made the jump, and today Fusix deployed (https://fusix.nl/deploying-rpki/).
We've also seen an uptake of Origin Validation at Internet Exchange route servers: AMS-IX and FranceIX have already deployed. I've read that RPKI OV is under consideration at a number of other exchanges.
Other cool news is that Cloudflare launched a Certificate Transparency initiative to help keep everyone honest. Announcement at: https://twitter.com/grittygrease/status/1017224762542587907 Certificate Transparency is a fascinating tool, really a necessity to build confidence in any PKI systems.
Anyone here working to deploy RPKI based Origin Validation in their network and reject invalid announcements? Anything of note to share?
Kind regards,
Job
----- End forwarded message -----