Use of Shannon Entropy Estimation for DGA Detection

 Syed Qutb      12/12/2023 10:30

For threat hunters and security researchers, Advanced Persistent Threats (APTs) often seem to be one step ahead, especially in cyber-attacks that use Domain Generation Algorithms (DGAs). With a DGA, attackers evade defenders by generating thousands of fully qualified domain names (FQDNs) from rapidly changing random seeds, which keeps malware such as a virus or ransomware in contact with its command and control (C&C) servers. Taking down these malicious FQDNs is challenging: cybersecurity professionals have to identify the domains one by one, and each domain is often live for only a limited time because the DGA seeds rotate rapidly. As a result, signature-based detection is close to impossible.

To address this problem, we used a machine learning approach built on the Shannon entropy calculator in Splunk’s URL Toolbox. The approach focuses on identifying anomalies in evolving FQDN patterns by using Shannon entropy, which is a good metric for quantifying the uncertainty (information content) of the characters in a given domain name. If a domain’s entropy score exceeds the defined threshold (> 4.2 in our case), that serves as a useful indicator of a DGA-based FQDN. In general, the higher the entropy, the more likely it is that a given domain name was algorithmically generated. The Splunk SPL query we implemented to analyze the dataset is given below:

| inputlookup dns_dga.csv
| inputlookup dns_legit_1m.csv append=t
| sample ratio=0.0001
| rex field=domain max_match=1 "(?<subdomain>.*)\..*$"
| table domain subdomain type subtype
| dedup subdomain
| `ut_shannon(subdomain)`
| `ut_meaning(subdomain)`
| eval ut_digit_ratio = 0.0
| eval ut_vowel_ratio = 0.0
| eval ut_domain_length = max(1,len(domain))
| rex field=subdomain max_match=0 "(?<digits>\d)"
| rex field=subdomain max_match=0 "(?<vowels>[aeiou])"
| eval ut_digit_ratio=if(isnull(digits),0.0,mvcount(digits) / ut_domain_length)
| eval ut_vowel_ratio=if(isnull(vowels),0.0,mvcount(vowels) / ut_domain_length)
| eval ut_consonant_ratio = max(0.0, 1.000000 - ut_digit_ratio - ut_vowel_ratio)
| eval ut_vc_ratio = ut_vowel_ratio / ut_consonant_ratio
| fields - digits - vowels
| fit TFIDF analyzer=char ngram_range=1-3 subdomain into "DNS_DGA_TIDIF_PREPROCESS_MIX_D865K_L1M"
| fit PCA subdomain_tfidf* k=3
| table * PC_*
| fields - digits - vowels - subdomain_*
| fit RandomForestClassifier n_estimators=10 type from PC_* ut_consonant_ratio ut_digit_ratio ut_domain_length ut_meaning_ratio ut_shannon ut_vc_ratio ut_vowel_ratio
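
The entropy feature that `ut_shannon` contributes to the query above can be illustrated outside Splunk. The following is a minimal Python sketch, not the URL Toolbox implementation: it assumes the score is the character-level Shannon entropy in bits (base-2 logarithm), and the names `shannon_entropy`, `looks_generated` and `ENTROPY_THRESHOLD` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter

# Threshold quoted in the article; treat it as a tunable starting point.
ENTROPY_THRESHOLD = 4.2


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character.

    H = -sum(p_c * log2(p_c)) over every distinct character c,
    where p_c is the relative frequency of c in the string.
    Assumption: base-2 logarithm, computed over single characters.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_generated(label: str, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Flag a domain label whose entropy exceeds the threshold."""
    return shannon_entropy(label) > threshold
```

Under the base-2 assumption, a string with k distinct characters has an entropy of at most log2(k). Since log2(18) is about 4.17, a label can only exceed 4.2 if it contains at least 19 distinct characters, so a threshold this high will mainly flag long, highly varied labels. This is one reason the query combines entropy with other features instead of relying on it alone.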

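The hand-crafted ratios that the `rex` and `eval` steps of the query compute can be sketched in the same way. This Python version is an approximation written for illustration, with `lexical_features` as a hypothetical function name. It follows the query in counting digits and vowels in the subdomain while dividing by the length of the full domain, and it assumes a domain without any dot yields an empty subdomain.

```python
from typing import Dict, Optional


def lexical_features(domain: str) -> Dict[str, Optional[float]]:
    """Approximate the ut_* lexical features built by the SPL query."""
    # The greedy regex in the query keeps everything before the last dot.
    # Assumption: with no dot present, the subdomain is treated as empty.
    subdomain = domain.rsplit(".", 1)[0] if "." in domain else ""

    # The query divides by the full domain length, with a floor of 1.
    length = max(1, len(domain))

    # Case-sensitive matching, like the rex patterns \d and [aeiou].
    digits = sum(ch in "0123456789" for ch in subdomain)
    vowels = sum(ch in "aeiou" for ch in subdomain)

    digit_ratio = digits / length
    vowel_ratio = vowels / length
    # Called "consonant ratio" in the query, but it is really the remainder:
    # it also covers dots, hyphens and the characters of the last label.
    consonant_ratio = max(0.0, 1.0 - digit_ratio - vowel_ratio)
    # Return None instead of dividing by zero.
    vc_ratio = vowel_ratio / consonant_ratio if consonant_ratio > 0 else None

    return {
        "ut_domain_length": float(length),
        "ut_digit_ratio": digit_ratio,
        "ut_vowel_ratio": vowel_ratio,
        "ut_consonant_ratio": consonant_ratio,
        "ut_vc_ratio": vc_ratio,
    }
```

For example, `lexical_features("abc123.com")` works on the subdomain `abc123` and a domain length of 10, giving a digit ratio of 0.3 and a vowel ratio of 0.1. In the query, these values are passed to the RandomForestClassifier together with the entropy score and the PCA components.
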
This solution helps us identify Domain Generation Algorithms (described in MITRE ATT&CK as T1568.002, formerly T1483) in real time, and it has been implemented in our EUNOMATIX MLDETECT app. For more details on the functionality of our ML-based detection framework, please contact EUNOMATIX at info@eunomatix.com.



References
https://attack.mitre.org/techniques/T1568/002/
https://redcanary.com/blog/threat-hunting-entropy/
