1
Cross-check the contribution column later and modify it
2
Datasets found: (Many of the datasets seem to fall under the category of linguistic forensics)
3
Public
Basic Data Carving Test #1
http://dftt.sourceforge.net/test11/index.html

http://dftt.sourceforge.net/

Talk about the collection
4
By request
The Real Data Corpus (RDC) is a collection of raw data extracted from data-carrying devices that were purchased on the secondary market around the world.
https://digitalcorpora.org/corpora/disk-images/real-data-corpus
5
Public
NIST is developing Computer Forensic Reference Data Sets (CFReDS) for digital evidence. These reference data sets (CFReDS) provide to an investigator documented sets of simulated digital evidence for examination.
https://www.cfreds.nist.gov/
6
Public (Polish)
Polish Corpus of Suicide Notes - forensic linguistics
http://www.pcsn.uni.wroc.pl/
7
Public
Brennan Greenstadt Obfuscation corpus (authorship of text)
https://psal.cs.drexel.edu/index.php/JStylo-Anonymouth
8
Public (by request), in Dutch
Personae Corpus (linguistic - forensic)
https://www.clips.uantwerpen.be/datasets/personae-corpus
9
Public
Key ingredient to evaluation are data. For PAN's shared tasks on digital text forensics, a number of datasets have been compiled and used to evaluate dozens of approaches. Using these datasets in your research ensures comparability.
Authorship verification
http://pan.webis.de/data.html
10
Public
Authorship verification (created by 10.1145/3098954.3104050)
https://www.dropbox.com/sh/f2mlp6u5vervx9b/AABr_c7qrmahCqUviIu3ORz6a?dl=0
11
Public
A new Dataset for People Tracking and Reidentification (created by 10.1145/2072572.2072590)
http://www.openvisor.org/3dpes.asp
12
Public
RAISE (RAw ImageS datasEt) RAISE - A Raw Images Dataset for Digital Image Forensics
(created by 10.1145/2713168.2713194)
http://mmlab.science.unitn.it/RAISE/
13
Public
DARPA Intrusion Detection Data Sets
https://ll.mit.edu/ideval/data/
14
Not available
Memcorp
15
Public?
Face recognition datasets
http://www.face-rec.org/databases/
16
Public
WikiLeaks began publishing The Global Intelligence Files – more than five million emails from the Texas-headquartered "global intelligence" company Stratfor. The emails date from between July 2004 and late
https://wikileaks.org/the-gifiles.html
17
Public sample / licensed access to full
Smartphone dataset (SherLock vs Moriarty: A Smartphone Dataset for Cybersecurity Research)
http://bigdata.ise.bgu.ac.il/sherlock/#/
18
By invitation only
VirusShare.com - Because Sharing is Caring (System currently contains 29,348,478 samples.)
https://virusshare.com/
19
Public
Copy-move forgery detection using SIFT features (Amerini et al., TIFS 2011).
https://github.com/lambertoballan/sift-forensic/blob/master/README.md
20
Public
Surveillance - CAVIAR Test Case Scenarios
http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/
21
Public
Microsoft Malware Classification Challenge (BIG 2015)
https://www.kaggle.com/c/malware-classification/data
22
By request
The Drebin Dataset - Android malware
https://www.sec.cs.tu-bs.de/~danarp/drebin/
23
By request
Forensic Voice Comparison Databases
http://databases.forensic-voice-comparison.net/
24
By request
ELSDSR - speaker recognition dataset
http://www2.imm.dtu.dk/~lfen/elsdsr/index.php?page=avl
25
Public
NOIZEUS: A noisy speech corpus for evaluation of speech enhancement algorithms
http://ecs.utdallas.edu/loizou/speech/noizeus/
26
Public
The GRID audiovisual sentence corpus
http://spandh.dcs.shef.ac.uk/gridcorpus/
27
Public
SMS corpus (forensic linguistics) - NUS corpus
https://github.com/kite1988/nus-sms-corpus
28
Public
The Blog Authorship Corpus (forensic linguistics)
http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm
29
Public
Cleaned DEFCON CTF dataset for data-driven cyber attribution research.
http://cysis.engineering.asu.edu/cyber-attribution/
30
By request
GPDS960signature database
https://figshare.com/articles/GPDS960signature_database/1287360
31
Public
Hand signatures, genuine and forgeries - SVC2004 corpus
http://www.cse.ust.hk/svc2004/download.html
32
Public
Webb spam corpus (identify web spam)
https://www.cc.gatech.edu/projects/doi/WebbSpamCorpus.html
33
Public
Yahoo Password Frequency Corpus
https://figshare.com/articles/Yahoo_Password_Frequency_Corpus/2057937
34
Public
ICFHR 2010 Signature Verification Competition (4NSigComp2010) - forensic signature analysis
http://www.iapr-tc11.org/mediawiki/index.php/ICFHR_2010_Signature_Verification_Competition_(4NSigComp2010)
35
Public
DroidWare is a synthetic dataset designed to address the problem of malware detection in Android-based environments.
https://github.com/RECOVI/DroidWare
36
Public
Botnet scenarios dataset - network traffic
https://cybervan.appcomsci.com:9000/datasets
37
Public
Kharon Malware Dataset
http://kharon.gforge.inria.fr/dataset/
38
Public
MAWILab (network traffic anomalies)
http://www.fukuda-lab.org/mawilab/
39
Public
KDD Cup 1999 Data
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
40
Public
The UNSW-NB15 data set description - malware / network
https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/
41
By request
AndroZoo - Android application package dataset (files)
https://androzoo.uni.lu/
42
Public
Mudflow - malware Android application traffic
https://www.st.cs.uni-saarland.de/appmining/mudflow/
43
Public
DIT SMS Spam Dataset
http://www.dit.ie/computing/research/resources/smsdata/
44
Public
Password data set
http://www.datasciencecentral.com/forum/topics/password-dataset-for-you-to-test-your-data-science-skills
45
Public
Phishing Websites Data Set
http://archive.ics.uci.edu/ml/datasets/Phishing+Websites
46
By request
Spam Track
http://trec.nist.gov/data/spam.html
47
Public
Spambase Data Set
http://archive.ics.uci.edu/ml/datasets/Spambase?ref=datanews.io
48
Public
WEBSPAM-UK2007 (current dataset) - spam
http://chato.cl/webspam/datasets/uk2007/
49
By request
Deceptive Opinion Spam Corpus v1.4
http://myleott.com/op_spam/
50
Public
microblogPCU Data Set - spam
https://archive.ics.uci.edu/ml/datasets/microblogPCU
51
Public
(multiple of same type) TRECVid Surveillance Event Detection - video
http://www-nlpir.nist.gov/projects/trecvid/trecvid.data.html

https://www.nist.gov/itl/iad/mig/trecvid-surveillance-event-detection-evaluation-track
52
By request
Tweets2011 dataset - social media - spam
http://trec.nist.gov/data/tweets/
53
Public
Network exercise datasets (CDX dataset)
http://www.usma.edu/crc/sitepages/datasets.aspx
54
Public
Credit Card Fraud Detection
https://www.kaggle.com/dalpozz/creditcardfraud
55
Public
ADFA IDS Datasets - network
https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-IDS-Datasets/
56
Public
Traffic Data from Kyoto University's Honeypots - network
http://www.takakura.com/Kyoto_data/
57
By request
KAIST Multispectral Pedestrian Detection Benchmark - surveillance
https://sites.google.com/site/pedestrianbenchmark/
58
Public
Fraud dataset - UCSD-FICO data mining contest 2009 dataset
https://www.cs.purdue.edu/commugrate/data/credit_card/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180893/
59
Public
ISOT Botnet Dataset - malware
http://www.uvic.ca/engineering/ece/isot/datasets/index.php
60
By request
Analyzing Web Traffic
ECML/PKDD 2007 Discovery Challenge dataset
http://www.lirmm.fr/pkdd2007-challenge/index.html#dataset
61
Public
HTTP DATASET CSIC 2010 - network
http://www.isi.csic.es/dataset/
62
Public
Masquerading User Data dataset - network
http://www.schonlau.net/intrusion.html
63
Public
default of credit card clients Data Set
https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
64
By request
Medical financial data - real
https://www.cms.gov/openpayments/explore-the-data/dataset-downloads.html#
65
Public
Digital Corpora Govdoc1
https://digitalcorpora.org/corpora
66
Public
AZSecure-data
http://www.azsecure-data.org/get-data.html
67
Public
Enron dataset
https://enrondata.readthedocs.io/en/latest/
68
Public
Caltech Pedestrian Detection Benchmark - surveillance
http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/index.html
69
Public
Edinburgh Informatics Forum Pedestrian Database
http://homepages.inf.ed.ac.uk/rbf/FORUMTRACKING/
70
Public
SARC3D - wide area surveillance
https://computervisiononline.com/dataset/1105138655

http://imagelab.ing.unimore.it/imagelab/page.asp?IdPage=17
71
By request
Violent Scenes Detection dataset - surveillance
https://computervisiononline.com/dataset/1105138641

http://org-web4.technicolor.com/en/innovation/scientific-community/scientific-data-sharing/violent-scenes-dataset/download
72
Public
QMUL underGround Re-IDentification (GRID) - surveillance
http://personal.ie.cuhk.edu.hk/~ccloy/downloads_qmul_underground_reid.html

https://computervisiononline.com/dataset/1105138639
73
By request
Daimler Pedestrian Benchmarks
https://computervisiononline.com/dataset/1105138626

http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/Daimler_Pedestrian_Segmentatio/daimler_pedestrian_segmentatio.html
74
By request
UvA Person Tracking Benchmarks
http://www.gavrila.net/Datasets/Univ__of_Amsterdam_Multi-Cam_P/UvA_Multi-Camera_Multi-Person_/uva_multi-camera_multi-person_.html

https://computervisiononline.com/dataset/1105138625
75
Public
CUHK Crowd Dataset - surveillance
http://www.ee.cuhk.edu.hk/~jshao/CUHKcrowd_files/cuhk_crowd_dataset.htm
76
Public
Cars dataset - surveillance
http://ai.stanford.edu/~jkrause/cars/car_dataset.html
77
By request
The Comprehensive Cars (CompCars) dataset
http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html
78
Public
Stanford Drone Dataset - surveillance
http://cvgl.stanford.edu/projects/uav_data/
79
Public
Network PCAP files - National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC)
http://www.netresec.com/?page=MACCDC
80
Public
ISTS 12 - PCAP - network
http://www.netresec.com/?page=ISTS
81
Public
Data Capture from National Security Agency (NSA)
http://www.westpoint.edu/crc/SitePages/DataSets.aspx
82
Public
Malware dump
http://contagiodump.blogspot.no/2013/04/collection-of-pcap-files-from-malware.html


https://www.mediafire.com/?a49l965nlayad
83
Public
Repository of PCAP files and malware (talk about it as a collection)
http://malware-traffic-analysis.net/
84
Public
2 malware datasets
http://moyix.blogspot.no/search?q=dataset
85
Public?
Repository of PCAP files
http://www.pcapr.net/browse/protos
86
By request
The CAIDA "DDoS Attack 2007" Dataset
https://www.caida.org/data/passive/ddos-20070804_dataset.xml
87
Public
Network repositories; describe them as such (BreachDB / PCAPs / DBPortsDB)
https://www.evilfingers.com/repository/index.php
88
Public
Malware dataset - Special Dataset CTU-13
https://stratosphereips.org/category/dataset.html
89
By request
Malware, network, etc. Canadian Institute for Cybersecurity datasets are used around the world by universities, private industry and independent researchers.
http://www.unb.ca/cic/research/datasets/index.html
90
Compiled list (by request)
Network wireless datasets (collection)
http://crawdad.org/all-byname.html
91
Public
Capture files from 4SICS Geek Lounge
http://www.netresec.com/?page=PCAP4SICS
92
Public
S4x15 - Digital Bond's S4 conference 2015
http://www.netresec.com/?page=DigitalBond_S4
93
Public
A collection of ICS/SCADA PCAPs
https://github.com/automayt/ICS-pcap
94
Public
Image Spam Dataset
http://www.cs.jhu.edu/~mdredze/datasets/image_spam/
95
Compiled list (public)
Email datasets, etc.
http://csmining.org/index.php/spam-email-datasets-.html
96
Public
Network and memory datasets
http://traces.cs.umass.edu/index.php/CpuMem/CpuMem
97
Public
Synthetic datasets generated by the PaySim mobile money simulator (fraud)
https://www.kaggle.com/ntnu-testimon/paysim1
98
Public
Synthetic datasets generated by the BankSim payments simulator (fraud)
https://www.kaggle.com/ntnu-testimon/banksim1
99
Public
SMS Spam Collection Dataset
https://www.kaggle.com/uciml/sms-spam-collection-dataset
100
Public
CLAIR collection of "Nigerian" fraud emails (email)
https://www.kaggle.com/rtatman/fraudulent-email-corpus
101
Public
Hillary Clinton's Emails (Freedom of Information Act)
https://www.kaggle.com/kaggle/hillary-clinton-emails
102
By request
Common Crawl on AWS (network)
https://aws.amazon.com/public-datasets/common-crawl/
103
Collections
CAIDA Data - Overview of Datasets, Monitors, and Reports
https://www.caida.org/data/overview/
104
Public
Corpus containing 200 multilingual emails (Spanish, English and Portuguese) structured according to the RFC2822 specification.
https://figshare.com/articles/Corpus_200_Emails/1326662
105
106
107
108
109
110
Went through
111
http://www.re3data.org
112
https://computervisiononline.com/dataset/1105138639
113
https://www.cooldatasets.com/#Science-Datasets
114
https://datasource.kapsarc.org/pages/home/
115
https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
116
117
91 -
118
119
120
Red = will not include
121
Yellow = may include
122
Green = has been included
123
Orange = will reference (review new)
124
Compiled list
Many datasets of interest
https://computervisiononline.com/datasets

https://www.cooldatasets.com/#Science-Datasets

https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
125
Compiled list (biometric and forensic)
Biometric and Forensic Research Database Catalog
https://tsapps.nist.gov/BDbC/Search?page=1&sortOrder=Organization
126
Compiled list
Image and vision group (Digital Forgery?)
http://caiivg.weebly.com/dataset.html
127
Compiled list
Publicly available PCAP files
http://www.netresec.com/?page=PcapFiles
128
Compiled list
Test Images and Forensic Challenges
http://www.forensicfocus.com/images-and-challenges
129
Compiled list
Image and vision
http://www.vision.ee.ethz.ch/en/datasets/
130
Compiled list
Multimodal Biometric Recognition
http://www.lvc.ele.puc-rio.br/projects/Biometric_Recognition/download.html
131
132
Compiled list
Mark Dredze

http://www.cs.jhu.edu/~mdredze/code.php
133
Compiled lists
Network datasets
https://www.researchgate.net/post/What_are_the_different_datasets_available_for_network_intrusion_detection
134
Compiled list
Some financial datasets
https://relational.fit.cvut.cz/search
Discarded, as there is little information on the content of the datasets
135
136
Compiled list
DATASETS FOR CYBER FORENSICS
http://datasets.fbreitinger.de/datasets/
137
Public?
Various datasets
https://github.com/caesar0301/awesome-public-datasets
138
139
140
141
142
143
144
145
Resource for finding datasets
Welcome to Kaggle Datasets
https://www.kaggle.com/datasets?sortBy=hottest&group=featured