Analysis of Unique URLs Retrieval Patterns in Web Logs
This analysis delves into the retrieval patterns of unique URLs from web logs, revealing that a significant percentage of URLs are re-presented from different client IP addresses. The study identifies top repeaters and examines instances of potential proxy device usage based on AS locations.
Here's Looking at You
Geoff Huston
The Theory
We use Google Ads to deliver a test script to a very large profile of users. We measure DNS, DNSSEC, IPv6, performance, and many other aspects of the end user's view of the Internet. We have some 500,000 ads delivered per day, and each of them uses a uniquely generated URL. So, in theory, we should see each unique URL retrieved once. Right?
Here is what we see in the web logs

[22/Jan/2014:00:10:21 +0000] 120.194.53.xxx "GET /1x1.png?t10000.u3697062917.s1390349413.i333.v1794.rd.td
[22/Jan/2014:00:11:29 +0000] 221.176.4.xxx "GET /1x1.png?t10000.u3697062917.s1390349413.i333.v1794.rd.td

68 seconds later, the SAME URL is fetched from a different address in a different network:
120.194.53.xxx   Origin AS = 24445
221.176.4.xxx    Origin AS = 9808
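To make the comparison concrete, here is a minimal Python sketch that parses the two log entries shown above and checks the gap between them. The regular expression is an assumption that matches only the trimmed line layout printed on this slide, not a full web server log format, and the names `LOG_RE` and `parse_line` are illustrative.

```python
import re
from datetime import datetime

# Assumed layout, taken from the two sample lines only:
#   [timestamp] client-ip "GET url
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\]\s+(?P<ip>\S+)\s+"GET\s+(?P<url>[^\s"]+)')


def parse_line(line):
    """Return (timestamp, client_ip, url), or None if the line does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return when, m.group("ip"), m.group("url")


lines = [
    '[22/Jan/2014:00:10:21 +0000] 120.194.53.xxx "GET /1x1.png?t10000.u3697062917.s1390349413.i333.v1794.rd.td',
    '[22/Jan/2014:00:11:29 +0000] 221.176.4.xxx "GET /1x1.png?t10000.u3697062917.s1390349413.i333.v1794.rd.td',
]

first, second = (parse_line(line) for line in lines)
gap_seconds = (second[0] - first[0]).total_seconds()
same_url = first[2] == second[2]
different_ip = first[1] != second[1]

print(gap_seconds, same_url, different_ip)
```

The two entries carry the same URL, come from different client addresses, and are 68 seconds apart.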
How widespread is this?
Over 48 days in 2013, 29,171,864 unique URLs were presented to end users. 612,089 of these URLs were re-presented to us from a different client IP address. That's 2.1% of URL fetches that seem to have attracted a digital stalker!
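The counting step can be sketched as follows. This is one plausible way to find re-presented URLs, assuming the log has already been reduced to (client IP, URL) pairs; it is not necessarily how the original measurement was done. The function name `represented_urls` and the sample addresses (taken from documentation address ranges) are made up for illustration. The final lines simply check the 2.1% figure from the two totals quoted above.

```python
from collections import defaultdict


def represented_urls(fetches):
    """Map each URL seen from more than one client IP to the set of those IPs.

    fetches: iterable of (client_ip, url) pairs.
    """
    ips_by_url = defaultdict(set)
    for ip, url in fetches:
        ips_by_url[url].add(ip)
    return {url: ips for url, ips in ips_by_url.items() if len(ips) > 1}


# Hypothetical sample data, not taken from the real logs.
sample = [
    ("192.0.2.1", "/1x1.png?a"),
    ("198.51.100.7", "/1x1.png?a"),  # same URL, different client address
    ("192.0.2.2", "/1x1.png?b"),
    ("192.0.2.2", "/1x1.png?b"),     # a refetch by the same client does not count
]
repeats = represented_urls(sample)
print(sorted(repeats))

# Share of unique URLs that were re-presented, using the totals from the slide.
share = 612_089 / 29_171_864
print(f"{share:.1%}")
```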
The Top Repeaters

Rank  IP Address         Count   AS     AS Name
   1  119.147.146.xxx   11,241    4134  CHINANET-BACKBONE No.31,Jin-rong Street CN
   2  182.18.208.xxx    10,982   23944  SKYBB-AS-AP AS-SKYBroadband SKYCable Corporation PH
   3  182.18.209.xxx     5,046   23944  SKYBB-AS-AP AS-SKYBroadband SKYCable Corporation PH
   4  124.6.181.xxx      5,046    4775  GLOBE-TELECOM-AS Globe Telecoms PH
   5  112.198.64.xxx     4,641    4775  GLOBE-TELECOM-AS Globe Telecoms PH
   6  203.177.74.xxx     3,315    4775  GLOBE-TELECOM-AS Globe Telecoms PH
   7  120.28.64.xxx      3,230    4775  GLOBE-TELECOM-AS Globe Telecoms PH
   8  211.125.138.xxx    3,098    9619  SSD Sony Global Solutions Inc. JP
   9  210.94.41.xxx      1,414    6619  SAMSUNGSDS-AS-KR SamsungSDS Inc. KR
  10  222.127.223.xxx    1,269    4775  GLOBE-TELECOM-AS Globe Telecoms PH
  11  210.143.35.xxx     1,177    2516  KDDI KDDI CORPORATION JP
  12  202.156.10.xxx     1,154   10091  SCV-AS-AP StarHub Cable Vision Ltd SG
  13  14.1.193.xxx       1,128   45960  YTLCOMMS-AS-AP YTL COMMUNICATIONS SDN BHD MY
  14  183.90.103.xxx     1,069   55430  STARHUBINTERNET-AS-NGNBN Starhub Internet Pte Ltd SG
  15  202.246.252.xxx      995    2526  HITNET HITACHI,Ltd. Information Technology Division. JP
  16  192.51.44.xxx        887    2510  INFOWEB FUJITSU LIMITED JP
  17  183.90.41.xxx        774   55430  STARHUBINTERNET-AS-NGNBN Starhub Internet Pte Ltd SG
  18  110.34.0.xxx         704    4007  Subisu Cablenet (Pvt) Ltd, Baluwatar, Kathmandu, Nepal NP
  19  110.232.92.xxx       638   23679  NUSANET-AS-ID Media Antar Nusa PT. ID
  20  37.19.108.xxx        603   44143  VIPMOBILE-AS Vip mobile d.o.o. RS
  21  24.186.96.xxx        573    6128  CABLE-NET-1 - Cablevision Systems Corp. US
  22  161.53.179.xxx       535    2108  CARNET-AS Croatian Academic and Research Network HR
  23  193.254.230.xxx      534   25304  UNITBV Universitatea TRANSILVANIA Brasov RO
  24  121.54.54.xxx        500   10139  SMARTBRO-PH-AP Smart Broadband, Inc. PH
  25  77.244.114.xxx       484   42779  AZERFON Azerfon AS AZ
Web Proxies?
A strong indicator of a proxy device is that it is located in the same AS as the end client. So let's filter that list and look at those repeaters that use a different AS from the original request. And here's what we see:
Different Origin AS Repeaters

Rank  IP Address         Count   AS      AS Name
   1  119.147.146.xxx    8,886     4134  CHINANET-BACKBONE No.31,Jin-rong Street CN
   2  220.181.158.xxx      493    23724  CHINANET-IDC-BJ IDC, China Telecommunications Corporation CN
   3  123.125.161.xxx      446     4808  CHINA169-BJ CNCGROUP IP China169 Beijing Province Network CN
   4  210.133.104.xxx      285     7677  DNP Dai Nippon Printing Co., Ltd JP
   5  202.214.150.xxx      266     2497  IIJ Internet Initiative Japan Inc. JP
   6  112.65.211.xxx       248    17621  CNCGROUP-SH China Unicom Shanghai network CN
   7  221.176.4.xxx        226     9808  CMNET-GD Guangdong Mobile Communication Co.Ltd. CN
   8  62.84.94.xxx         204    16130  FiberLink Networks LB
   9  212.40.141.xxx       203    31126  SODETEL-AS SODETEL SAL LB
  10  101.69.163.xxx       163     4837  CHINA169-BACKBONE CNCGROUP China169 Backbone CN
  11  59.162.23.xxx        158     4755  TATACOMM-AS TATA Communications IN
  12  8.35.201.xxx         156    15169  GOOGLE - Google Inc. US
  13  118.186.36.xxx       149    23724  CHINANET-IDC-BJ IDC, China Telecommunications Corporation CN
  14  190.96.112.xxx       147   262150  Empresa Provincial de Energia de Cordoba AR
  15  202.155.113.xxx      143     4795  INDOSATM2-ID INDOSATM2 ASN ID
  16  118.228.151.xxx      142     4538  ERX-CERNET-BKB China Education and Research Network Center CN
  17  123.125.73.xxx       136     4808  CHINA169-BJ CNCGROUP IP China169 Beijing Province Network CN
  18  69.41.14.xxx         133    47018  CE-BGPAC - Covenant Eyes, Inc. US
  19  118.97.198.xxx       131    17974  TELKOMNET-AS2-AP PT Telekomunikasi Indonesia ID
  20  112.215.11.xxx       128    17885  JKTXLNET-AS-AP PT Excelcomindo Pratama ID
  21  122.2.0.xxx          125     9299  IPG-AS-AP Philippine Long Distance Telephone Company PH
  22  176.28.78.xxx        123   197893  ELSUHD-AS Elsuhd Net Ltd. Communications and Computer Services IQ
  23  14.139.97.xxx        120    55824  RSMANI-NKN-AS-AP National Knowledge Network IN
  24  211.155.120.xxx      116    23724  CHINANET-IDC-BJ IDC, China Telecommunications Corporation CN
  25  121.96.61.xxx        114     6648  BAYAN Bayan Telecommunications, Inc. PH
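The different-AS filter could look something like the sketch below. It assumes the fetches are already in time order and that an IP-to-origin-AS lookup is available as a plain dictionary; both are assumptions, and in practice the lookup would come from BGP routing data. The function name `different_as_repeats` is illustrative. The first pair of addresses and their AS numbers are the ones from the web log example; the `192.0.2.x` entries and AS 64500 are documentation values used as a made-up same-AS case.

```python
def different_as_repeats(fetches, origin_as):
    """Return (url, original_ip, repeater_ip) where the two IPs map to different ASes.

    fetches:   time-ordered iterable of (client_ip, url) pairs.
    origin_as: dict mapping an IP address to its origin AS number (assumed lookup).
    """
    first_ip = {}
    hits = []
    for ip, url in fetches:
        original = first_ip.setdefault(url, ip)  # the first fetcher is the end client
        if ip == original:
            continue
        as_original, as_repeater = origin_as.get(original), origin_as.get(ip)
        if as_original is None or as_repeater is None:
            continue  # no routing data: cannot decide, so skip
        if as_original != as_repeater:
            hits.append((url, original, ip))
    return hits


url = "/1x1.png?t10000.u3697062917.s1390349413.i333.v1794.rd.td"
origin_as = {
    "120.194.53.xxx": 24445,
    "221.176.4.xxx": 9808,
    "192.0.2.10": 64500,  # hypothetical: client and repeater in the same AS
    "192.0.2.20": 64500,
}
fetches = [
    ("120.194.53.xxx", url),
    ("192.0.2.10", "/1x1.png?example"),
    ("221.176.4.xxx", url),
    ("192.0.2.20", "/1x1.png?example"),
]
hits = different_as_repeats(fetches, origin_as)
print(hits)
```

Only the first URL survives the filter, because its repeater sits in a different AS from the original client; the same-AS pair looks like an ordinary proxy and is dropped.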
Maybe it's National Infrastructure
We've all heard about the Great Firewall of China, and other countries may be doing similar things. So perhaps these repeaters are the result of some form of national or regional content cache program. So let's filter this further, using geolocation information to find those cases where the original end client and the digital stalker are located in different countries.
Different Country Stalkers

Rank  IP Address         Count   AS     AS Name
   1  119.147.146.xxx    7,001    4134  CHINANET-BACKBONE No.31,Jin-rong Street CN
   2  8.35.201.xxx         156   15169  GOOGLE - Google Inc. US
   3  190.216.130.xxx       84    3549  GBLX Global Crossing Ltd. AR
   4  190.27.253.xxx        82   19429  ETB - Colombia CO
   5  61.92.16.xxx          62    9269  HKBN-AS-AP Hong Kong Broadband Network Ltd. HK
   6  208.80.194.xxx        53   13448  WEBSENSE Websense, Inc. US
   7  112.140.187.xxx       33   45634  SPARKSTATION-SG-AP 10 Science Park Road SG
   8  69.41.14.xxx          32   47018  CE-BGPAC - Covenant Eyes, Inc. US
   9  126.117.225.xxx       31   17676  GIGAINFRA Softbank BB Corp. JP
  10  113.43.175.xxx        29   17506  UCOM UCOM Corp. JP
  11  202.249.25.xxx        26    4717  AI3 WIDE Project JP
  12  139.193.204.xxx       25   23700  BM-AS-ID PT. Broadband Multimedia, Tbk ID
  13  180.13.45.xxx         22    4713  OCN NTT Communications Corporation JP
  14  201.221.124.xxx       21   27989  BANCOLOMBIA S.A CO
  15  123.125.161.xxx       21    4808  CHINA169-BJ CNCGROUP China169 Beijing Province Network CN
  16  220.181.158.xxx       17   23724  CHINANET-IDC-BJ IDC, China Telecommunications Corporation CN
  17  208.184.77.xxx        17    6461  MFNX MFN - Metromedia Fiber Network US
  18  183.179.254.xxx       16    9269  HKBN-AS-AP Hong Kong Broadband Network Ltd. HK
  19  203.192.154.xxx       16   10026  PACNET Pacnet Global Ltd JP
  20  139.193.223.xxx       13   23700  BM-AS-ID PT. Broadband Multimedia, Tbk ID
  21  175.134.140.xxx       12    2516  KDDI KDDI CORPORATION JP
  22  210.187.58.xxx        12    4788  TMNET-AS-AP TM Net, Internet Service Provider MY
  23  195.93.102.xxx        12    1668  AOL-ATDN - AOL Transit Data Network GB
  24  221.82.58.xxx         12   17676  GIGAINFRA Softbank BB Corp. JP
  25  167.205.22.xxx        12    4796  BANDUNG-NET-AS-AP Institute of Technology Bandung ID
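A ranking like the one above could be produced along these lines. This is a sketch only: it assumes the re-presented URLs have already been reduced to (original client IP, repeater IP) pairs, and that geolocation is available as a simple dictionary. The function name `cross_border_repeaters`, the addresses (documentation ranges) and the country codes `XA` and `XB` are all invented for illustration.

```python
from collections import Counter


def cross_border_repeaters(pairs, country_of):
    """Rank repeater IPs by how often they refetch a URL first seen in another country.

    pairs:      iterable of (original_ip, repeater_ip), one per re-presented URL.
    country_of: dict mapping an IP address to a country code (assumed geolocation data).
    """
    tally = Counter()
    for original, repeater in pairs:
        c_original, c_repeater = country_of.get(original), country_of.get(repeater)
        if c_original is None or c_repeater is None:
            continue  # address could not be geolocated
        if c_original != c_repeater:
            tally[repeater] += 1
    return tally.most_common()


# Hypothetical data: XA and XB stand in for two different countries.
country_of = {
    "192.0.2.1": "XA",
    "192.0.2.2": "XA",
    "192.0.2.3": "XB",
    "192.0.2.4": "XA",
    "198.51.100.7": "XB",
    "203.0.113.9": "XB",
}
pairs = [
    ("192.0.2.1", "198.51.100.7"),  # XA client, XB repeater: counted
    ("192.0.2.2", "198.51.100.7"),  # counted again for the same repeater
    ("192.0.2.3", "203.0.113.9"),   # same country: dropped
    ("192.0.2.4", "203.0.113.50"),  # repeater not geolocated: dropped
]
ranking = cross_border_repeaters(pairs, country_of)
print(ranking)
```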
Let's zoom in for a second
And look at the distribution of the clients who were stalked by 119.147.146.xxx. In which countries were the clients located?
Code  Count  Country
AE       27  United Arab Emirates
AG        2  Antigua and Barbuda
AL       32  Albania
AM       13  Armenia
AR       19  Argentina
AT        5  Austria
AU       21  Australia
AW        6  Aruba
AZ        8  Azerbaijan
BA       27  Bosnia and Herzegovina
BD        1  Bangladesh
BE       10  Belgium
BG       45  Bulgaria
BN        1  Brunei Darussalam
BO        1  Bolivia
BR       44  Brazil
BS        1  Bahamas
BY        7  Belarus
BZ        4  Belize
CA      125  Canada
CL       13  Chile
CN    4,622  China
CO       11  Colombia
CR        1  Costa Rica
CW        2  Curaçao
CY        1  Cyprus
CZ       37  Czech Republic
DE       21  Germany
DO        2  Dominican Republic
DZ       19  Algeria
EC        8  Ecuador
EG       22  Egypt
ES       38  Spain
FR       68  France
GB       45  United Kingdom
GE       12  Georgia
GR       25  Greece
GY        1  Guyana
HK      721  Hong Kong
HN        1  Honduras
HR        9  Croatia
HU       67  Hungary
ID      159  Indonesia
IE       16  Ireland
IL        8  Israel
IN       32  India
IQ       21  Iraq
IT       52  Italy
JM        5  Jamaica
JO        2  Jordan
JP    2,910  Japan
KE        1  Kenya
KG        1  Kyrgyzstan
KH       28  Cambodia
KR       27  Republic of Korea
KW        1  Kuwait
KZ       11  Kazakhstan
LA        6  Laos
LK       11  Sri Lanka
LT       12  Lithuania
LV        6  Latvia
MA        6  Morocco
MD        2  Republic of Moldova
ME        7  Montenegro
MK       69  Macedonia
MM        2  Myanmar
MN       36  Mongolia
MO       37  Macao
MP        4  Northern Mariana Islands
MT        4  Malta
MU        7  Mauritius
MX      107  Mexico
MY      375  Malaysia
NC        1  New Caledonia
NI        1  Nicaragua
NL       15  Netherlands
NO        8  Norway
NP        1  Nepal
NZ       20  New Zealand
OM        1  Oman
PA       11  Panama
PE       29  Peru
PH      166  Philippines
PK        1  Pakistan
PL      340  Poland
PR        7  Puerto Rico
PS        9  Occupied Palestinian Territory
PT        1  Portugal
RO      197  Romania
RS       62  Serbia
RU       32  Russian Federation
RW        1  Rwanda
SA       24  Saudi Arabia
SE        3  Sweden
SG       83  Singapore
SI       13  Slovenia
SK       13  Slovakia
SR        2  Suriname
SV        3  El Salvador
TH      138  Thailand
TN        3  Tunisia
TR       57  Turkey
TW    1,241  Taiwan
UA       37  Ukraine
US      371  United States of America
UZ        1  Uzbekistan
VC        1  Saint Vincent and the Grenadines
VE       16  Venezuela
VN      249  Vietnam
YE        1  Yemen
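The list is alphabetical, so the heaviest countries are easy to miss. As a small illustration, the snippet below takes the eight largest counts from the table above (a hand-picked subset, not the full data) and sorts them by count. The variable names are illustrative.

```python
# Counts copied from the country table for 119.147.146.xxx (largest entries only).
counts = {
    "CN": 4622,
    "JP": 2910,
    "TW": 1241,
    "HK": 721,
    "MY": 375,
    "US": 371,
    "PL": 340,
    "VN": 249,
}

# Sort by count, largest first, and keep the top five.
top5 = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:5]
for code, count in top5:
    print(f"{code}  {count:>5,}")
```

China, Japan, Taiwan, Hong Kong and Malaysia head the list, in that order.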
What the?
That's an impressive list of countries! And our collection of 30 million URLs across 49 days is a mere drop in the ocean of web fetches on the Internet. So are we glimpsing here the tip of some much larger program of URL stalking?
Accident? Deliberate? Something Else?
Why go to all the trouble to collect URLs but use the same IP address to perform the follow-up stalking?
Is this some kind of deliberate leakage from a middleware device?
Or the result of some kind of a virus?
Or the outcome of TOR + virus?
Or a smart, but at the same time remarkably dumb, digital stalking program?
Or <insert your favourite conspiracy theory here>