Work out data volumes by source type

| metadata type=sourcetypes
| noop sample_ratio=1000
| append [ search index=*  
   | eval size=length(_raw) 
   | stats avg(size) as average_event_size by sourcetype index
   ]
| stats values(totalCount) as total_events values(average_event_size) as average_event_size by sourcetype
| addinfo
| eval period_days=(info_max_time-info_min_time)/(24*60*60)
| eval totalMB_per_day=floor(total_events*average_event_size/period_days/1024/1024)
| table sourcetype totalMB_per_day

purpose:

Efficiently calculate how much data is being indexed per day by source type. Very useful for calculating enterprise security data volumes

requirements:

Requires 6.3.x or later for the event sampling feature

comments:

Combines results from | metadata for counts and then multiplies this by the average event size. Automatically accounts for time ranges. You will need to modify the sample rate to be suitable for your data volume. The metadata search turns out to be very approximate and counts the values associate with buckets, if you have buckets which are open for a very long time it will take the value for the entire period of the bucket, not the period of your search time range. Consider using tstats if this is an issue in your environment.

Show links between tags, sourcetypes, apps and datamodels.

| rest splunk_server=local count=0 /servicesNS/-/-/admin/datamodel-files 
| spath input=eai:data output=base_search path=objects{}.baseSearch 
| spath input=eai:data output=constraints path=objects{}.constraints{}.search 
| eval tag_content = mvappend(base_search,constraints) 
| rex max_match=0 field=tag_content "tag=\"?(?<tag_name>\w+)\"?" 
| mvexpand tag_name 
| rename title AS datamodel 
| append 
    [| rest splunk_server=local count=0 /servicesNS/-/-/admin/eventtypes 
    | rename eai:acl.app AS app tags AS tag_name 
    | search app="*TA*" 
    | rex max_match=0 field=search "sourcetype=\"?(?<sourcetype>[^\s\"^)]+)\"?" 
    | mvexpand sourcetype 
    | mvexpand tag_name 
    | eval app_sourcetype=mvzip(app,sourcetype,"__") 
    | stats list(tag_name) as tag_name by app, sourcetype,app_sourcetype ] 
| stats list(datamodel) as datamodel, list(app) as app, list(app_sourcetype) as app_sourcetype by tag_name 
| search datamodel=* 
| stats values(datamodel) as datamodel, values(tag_name) as tags by app_sourcetype 
| eval tags=mvdedup(tags) 
| rex max_match=0 field=app_sourcetype "\"?(?<app>.+)__\"?" 
| rex max_match=0 field=app_sourcetype "__\"?(?<sourcetype>.+)\"?" 
| fields - app_sourcetype

purpose:

This search answers the questions which dashboards will my new add-on be used for in Enterprise Security.

requirements:

comments:

This is useful to see which app populates which datamodel in Enterprise Security or any other environment which datamodels.