Row | Access / type | Dataset / description | URL | Notes |
---|---|---|---|---|
1 | Cross-check the contribution column later and modify it | |||||||||||||||||||||||||
2 | Datasets found: | (Many of the datasets seem to fall under the category of linguistic forensics) | ||||||||||||||||||||||||
3 | Public | Basic Data Carving Test #1 | http://dftt.sourceforge.net/test11/index.html http://dftt.sourceforge.net/ | Talk about the collection | ||||||||||||||||||||||
4 | By request | The Real Data Corpus (RDC) is a collection of raw data extracted from data-carrying devices that were purchased on the secondary market around the world. | https://digitalcorpora.org/corpora/disk-images/real-data-corpus | |||||||||||||||||||||||
5 | Public | NIST is developing Computer Forensic Reference Data Sets (CFReDS) for digital evidence. These reference data sets (CFReDS) provide an investigator with documented sets of simulated digital evidence for examination. | https://www.cfreds.nist.gov/ | |||||||||||||||||||||||
6 | Public (Polish) | Polish Corpus of Suicide Notes - forensic linguistics | http://www.pcsn.uni.wroc.pl/ | |||||||||||||||||||||||
7 | Public | Brennan Greenstadt Obfuscation corpus (authorship of text) | https://psal.cs.drexel.edu/index.php/JStylo-Anonymouth | |||||||||||||||||||||||
8 | Public (by request), in Dutch | Personae Corpus (linguistic - forensic) | https://www.clips.uantwerpen.be/datasets/personae-corpus | |||||||||||||||||||||||
9 | Public | Data are a key ingredient of evaluation. For PAN's shared tasks on digital text forensics, a number of datasets have been compiled and used to evaluate dozens of approaches. Using these datasets in your research ensures comparability. (Authorship verification) | http://pan.webis.de/data.html | |||||||||||||||||||||||
10 | Public | Authorship verification (created by 10.1145/3098954.3104050) | https://www.dropbox.com/sh/f2mlp6u5vervx9b/AABr_c7qrmahCqUviIu3ORz6a?dl=0 | |||||||||||||||||||||||
11 | Public | A new Dataset for People Tracking and Reidentification (created by 10.1145/2072572.2072590) | http://www.openvisor.org/3dpes.asp | |||||||||||||||||||||||
12 | Public | RAISE (RAw ImageS datasEt) RAISE - A Raw Images Dataset for Digital Image Forensics (created by 10.1145/2713168.2713194) | http://mmlab.science.unitn.it/RAISE/ | |||||||||||||||||||||||
13 | Public | DARPA Intrusion Detection Data Sets | https://ll.mit.edu/ideval/data/ | |||||||||||||||||||||||
14 | Not available | Memcorp | ||||||||||||||||||||||||
15 | Public? | Face recognition datasets | http://www.face-rec.org/databases/ | |||||||||||||||||||||||
16 | Public | WikiLeaks began publishing The Global Intelligence Files – more than five million emails from the Texas-headquartered "global intelligence" company Stratfor. The emails date from between July 2004 and late | https://wikileaks.org/the-gifiles.html | |||||||||||||||||||||||
17 | Public sample / licensed access to full | Smartphone dataset (SherLock vs Moriarty: A Smartphone Dataset for Cybersecurity Research) | http://bigdata.ise.bgu.ac.il/sherlock/#/ | |||||||||||||||||||||||
18 | By invitation only | VirusShare.com - Because Sharing is Caring (System currently contains 29,348,478 samples.) | https://virusshare.com/ | |||||||||||||||||||||||
19 | Public | Copy-move forgery detection using SIFT features (Amerini et al., TIFS 2011). | https://github.com/lambertoballan/sift-forensic/blob/master/README.md | |||||||||||||||||||||||
20 | Public | Surveillance - CAVIAR Test Case Scenarios | http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/ | |||||||||||||||||||||||
21 | Public | Microsoft Malware Classification Challenge (BIG 2015) | https://www.kaggle.com/c/malware-classification/data | |||||||||||||||||||||||
22 | By request | The Drebin Dataset - android malware | https://www.sec.cs.tu-bs.de/~danarp/drebin/ | |||||||||||||||||||||||
23 | By request | Forensic Voice Comparison Databases | http://databases.forensic-voice-comparison.net/ | |||||||||||||||||||||||
24 | By request | ELSDSR - speaker recognition dataset | http://www2.imm.dtu.dk/~lfen/elsdsr/index.php?page=avl | |||||||||||||||||||||||
25 | Public | NOIZEUS: A noisy speech corpus for evaluation of speech enhancement algorithms | http://ecs.utdallas.edu/loizou/speech/noizeus/ | |||||||||||||||||||||||
26 | Public | The GRID audiovisual sentence corpus | http://spandh.dcs.shef.ac.uk/gridcorpus/ | |||||||||||||||||||||||
27 | Public | SMS corpus (forensic linguistics) - NUS corpus | https://github.com/kite1988/nus-sms-corpus | |||||||||||||||||||||||
28 | Public | The Blog Authorship Corpus (forensic linguistics) | http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm | |||||||||||||||||||||||
29 | Public | Cleaned DEFCON CTF dataset for data-driven cyber attribution research. | http://cysis.engineering.asu.edu/cyber-attribution/ | |||||||||||||||||||||||
30 | By request | GPDS960signature database | https://figshare.com/articles/GPDS960signature_database/1287360 | |||||||||||||||||||||||
31 | Public | Genuine and forged hand signatures - SVC2004 corpus | http://www.cse.ust.hk/svc2004/download.html | |||||||||||||||||||||||
32 | Public | Webb spam corpus (identify web spam) | https://www.cc.gatech.edu/projects/doi/WebbSpamCorpus.html | |||||||||||||||||||||||
33 | Public | Yahoo Password Frequency Corpus | https://figshare.com/articles/Yahoo_Password_Frequency_Corpus/2057937 | |||||||||||||||||||||||
34 | Public | ICFHR 2010 Signature Verification Competition (4NSigComp2010) - forensic signature analysis | http://www.iapr-tc11.org/mediawiki/index.php/ICFHR_2010_Signature_Verification_Competition_(4NSigComp2010) | |||||||||||||||||||||||
35 | Public | DroidWare is a synthetic dataset designed to address the problem of malware detection in Android-based environments. | https://github.com/RECOVI/DroidWare | |||||||||||||||||||||||
36 | Public | Botnet scenarios dataset - Network traffic | https://cybervan.appcomsci.com:9000/datasets | |||||||||||||||||||||||
37 | Public | Kharon Malware Dataset | http://kharon.gforge.inria.fr/dataset/ | |||||||||||||||||||||||
38 | Public | MAWILab (network traffic anomalies) | http://www.fukuda-lab.org/mawilab/ | |||||||||||||||||||||||
39 | Public | KDD Cup 1999 Data | http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html | |||||||||||||||||||||||
40 | Public | The UNSW-NB15 data set description (Malware)/Network | https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/ | |||||||||||||||||||||||
41 | By request | AndroZoo - Android application package dataset (files) | https://androzoo.uni.lu/ | |||||||||||||||||||||||
42 | Public | Mudflow - Malware Android application traffic | https://www.st.cs.uni-saarland.de/appmining/mudflow/ | |||||||||||||||||||||||
43 | Public | DIT SMS Spam Dataset | http://www.dit.ie/computing/research/resources/smsdata/ | |||||||||||||||||||||||
44 | Public | password data set | http://www.datasciencecentral.com/forum/topics/password-dataset-for-you-to-test-your-data-science-skills | |||||||||||||||||||||||
45 | Public | Phishing Websites Data Set | http://archive.ics.uci.edu/ml/datasets/Phishing+Websites | |||||||||||||||||||||||
46 | By request | Spam Track | http://trec.nist.gov/data/spam.html | |||||||||||||||||||||||
47 | Public | Spambase Data Set | http://archive.ics.uci.edu/ml/datasets/Spambase?ref=datanews.io | |||||||||||||||||||||||
48 | Public | WEBSPAM-UK2007 (current dataset) - spam | http://chato.cl/webspam/datasets/uk2007/ | |||||||||||||||||||||||
49 | By request | Deceptive Opinion Spam Corpus v1.4 | http://myleott.com/op_spam/ | |||||||||||||||||||||||
50 | Public | microblogPCU Data Set - spam | https://archive.ics.uci.edu/ml/datasets/microblogPCU | |||||||||||||||||||||||
51 | Public | (multiple of same type) TRECVid Surveillance Event Detection - video | http://www-nlpir.nist.gov/projects/trecvid/trecvid.data.html https://www.nist.gov/itl/iad/mig/trecvid-surveillance-event-detection-evaluation-track | |||||||||||||||||||||||
52 | By request | Tweets2011 dataset - social media - spam | http://trec.nist.gov/data/tweets/ | |||||||||||||||||||||||
53 | Public | Network exercise datasets (CDX dataset) | http://www.usma.edu/crc/sitepages/datasets.aspx | |||||||||||||||||||||||
54 | Public | Credit Card Fraud Detection | https://www.kaggle.com/dalpozz/creditcardfraud | |||||||||||||||||||||||
55 | Public | ADFA IDS Datasets - Network | https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-IDS-Datasets/ | |||||||||||||||||||||||
56 | Public | Traffic Data from Kyoto University's Honeypots - Network | http://www.takakura.com/Kyoto_data/ | |||||||||||||||||||||||
57 | By request | KAIST Multispectral Pedestrian Detection Benchmark - surveillance | https://sites.google.com/site/pedestrianbenchmark/ | |||||||||||||||||||||||
58 | Public | Fraud dataset - UCSD-FICO datamining contest 2009 dataset | https://www.cs.purdue.edu/commugrate/data/credit_card/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180893/ | |||||||||||||||||||||||
59 | Public | ISOT Botnet Dataset - Malware | http://www.uvic.ca/engineering/ece/isot/datasets/index.php | |||||||||||||||||||||||
60 | By request | Analyzing Web Traffic: ECML/PKDD 2007 Discovery Challenge dataset | http://www.lirmm.fr/pkdd2007-challenge/index.html#dataset | |||||||||||||||||||||||
61 | Public | HTTP DATASET CSIC 2010 - Network | http://www.isi.csic.es/dataset/ | |||||||||||||||||||||||
62 | Public | Masquerading User Data dataset - network | http://www.schonlau.net/intrusion.html | |||||||||||||||||||||||
63 | Public | default of credit card clients Data Set | https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients | |||||||||||||||||||||||
64 | By request | Medical financial data - real | https://www.cms.gov/openpayments/explore-the-data/dataset-downloads.html# | |||||||||||||||||||||||
65 | Public | Digital Corpora Govdoc1 | https://digitalcorpora.org/corpora | |||||||||||||||||||||||
66 | Public | AZSecure-data | http://www.azsecure-data.org/get-data.html | |||||||||||||||||||||||
67 | Public | Enron dataset | https://enrondata.readthedocs.io/en/latest/ | |||||||||||||||||||||||
68 | Public | Caltech Pedestrian Detection Benchmark - surveillance | http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/index.html | |||||||||||||||||||||||
69 | Public | Edinburgh Informatics Forum Pedestrian Database | http://homepages.inf.ed.ac.uk/rbf/FORUMTRACKING/ | |||||||||||||||||||||||
70 | Public | SARC3D - wide-area surveillance | https://computervisiononline.com/dataset/1105138655 http://imagelab.ing.unimore.it/imagelab/page.asp?IdPage=17 | |||||||||||||||||||||||
71 | By request | Violent Scenes Detection dataset - surveillance | https://computervisiononline.com/dataset/1105138641 http://org-web4.technicolor.com/en/innovation/scientific-community/scientific-data-sharing/violent-scenes-dataset/download | |||||||||||||||||||||||
72 | Public | QMUL underGround Re-IDentification (GRID) - surveillance | http://personal.ie.cuhk.edu.hk/~ccloy/downloads_qmul_underground_reid.html https://computervisiononline.com/dataset/1105138639 | |||||||||||||||||||||||
73 | By request | Daimler Pedestrian Benchmarks | https://computervisiononline.com/dataset/1105138626 http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/Daimler_Pedestrian_Segmentatio/daimler_pedestrian_segmentatio.html | |||||||||||||||||||||||
74 | By request | UvA Person Tracking Benchmarks | http://www.gavrila.net/Datasets/Univ__of_Amsterdam_Multi-Cam_P/UvA_Multi-Camera_Multi-Person_/uva_multi-camera_multi-person_.html https://computervisiononline.com/dataset/1105138625 | |||||||||||||||||||||||
75 | Public | CUHK Crowd Dataset - surveillance | http://www.ee.cuhk.edu.hk/~jshao/CUHKcrowd_files/cuhk_crowd_dataset.htm | |||||||||||||||||||||||
76 | Public | Cars dataset - surveillance | http://ai.stanford.edu/~jkrause/cars/car_dataset.html | |||||||||||||||||||||||
77 | By request | The Comprehensive Cars (CompCars) dataset | http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html | |||||||||||||||||||||||
78 | Public | Stanford Drone Dataset - surveillance | http://cvgl.stanford.edu/projects/uav_data/ | |||||||||||||||||||||||
79 | Public | Network PCAP files National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) | http://www.netresec.com/?page=MACCDC | |||||||||||||||||||||||
80 | Public | ISTS 12 - PCAP - network | http://www.netresec.com/?page=ISTS | |||||||||||||||||||||||
81 | Public | Data Capture from National Security Agency (NSA) | http://www.westpoint.edu/crc/SitePages/DataSets.aspx | |||||||||||||||||||||||
82 | Public | Malware dump | http://contagiodump.blogspot.no/2013/04/collection-of-pcap-files-from-malware.html https://www.mediafire.com/?a49l965nlayad | |||||||||||||||||||||||
83 | Public | Repository of PCAP files and malware (talk about it as a collection) | http://malware-traffic-analysis.net/ | |||||||||||||||||||||||
84 | Public | 2 malware datasets | http://moyix.blogspot.no/search?q=dataset | |||||||||||||||||||||||
85 | Public? | Repository of PCAP files | http://www.pcapr.net/browse/protos | |||||||||||||||||||||||
86 | By request | The CAIDA "DDoS Attack 2007" Dataset | https://www.caida.org/data/passive/ddos-20070804_dataset.xml | |||||||||||||||||||||||
87 | Public | Network repositories describe them as such (BreachDB /PCAPs / DBPortsDB) | https://www.evilfingers.com/repository/index.php | |||||||||||||||||||||||
88 | Public | Malware dataset Special Dataset CTU-13 | https://stratosphereips.org/category/dataset.html | |||||||||||||||||||||||
89 | By request | Malware, network, etc Canadian Institute for Cybersecurity datasets are used around the world by universities, private industry and independent researchers. | http://www.unb.ca/cic/research/datasets/index.html | |||||||||||||||||||||||
90 | compiled list (by request) | Network Wireless datasets (collection) | http://crawdad.org/all-byname.html | |||||||||||||||||||||||
91 | Public | Capture files from 4SICS Geek Lounge | http://www.netresec.com/?page=PCAP4SICS | |||||||||||||||||||||||
92 | Public | S4x15 - Digital Bond's S4 conference 2015 | http://www.netresec.com/?page=DigitalBond_S4 | |||||||||||||||||||||||
93 | Public | A collection of ICS/SCADA PCAPs | https://github.com/automayt/ICS-pcap | |||||||||||||||||||||||
94 | Public | Image Spam Dataset | http://www.cs.jhu.edu/~mdredze/datasets/image_spam/ | |||||||||||||||||||||||
95 | compiled list (public) | email datasets etc | http://csmining.org/index.php/spam-email-datasets-.html | |||||||||||||||||||||||
96 | Public | Network and memory datasets | http://traces.cs.umass.edu/index.php/CpuMem/CpuMem | |||||||||||||||||||||||
97 | Public | Synthetic datasets generated by the PaySim mobile money simulator (fraud) | https://www.kaggle.com/ntnu-testimon/paysim1 | |||||||||||||||||||||||
98 | Public | Synthetic datasets generated by the BankSim payments simulator (fraud) | https://www.kaggle.com/ntnu-testimon/banksim1 | |||||||||||||||||||||||
99 | Public | SMS Spam Collection Dataset | https://www.kaggle.com/uciml/sms-spam-collection-dataset | |||||||||||||||||||||||
100 | Public | CLAIR collection of "Nigerian" fraud emails (email) | https://www.kaggle.com/rtatman/fraudulent-email-corpus | |||||||||||||||||||||||
101 | Public | Hillary Clinton's Emails (freedom of information act) | https://www.kaggle.com/kaggle/hillary-clinton-emails | |||||||||||||||||||||||
102 | By request | Common Crawl on AWS (network) | https://aws.amazon.com/public-datasets/common-crawl/ | |||||||||||||||||||||||
103 | collections | CAIDA Data - Overview of Datasets, Monitors, and Reports | https://www.caida.org/data/overview/ | |||||||||||||||||||||||
104 | Public | Corpus containing 200 multilingual emails (Spanish, English and Portuguese) structured according to the RFC2822 specification. | https://figshare.com/articles/Corpus_200_Emails/1326662 | |||||||||||||||||||||||
105 | ||||||||||||||||||||||||||
106 | ||||||||||||||||||||||||||
107 | ||||||||||||||||||||||||||
108 | ||||||||||||||||||||||||||
109 | ||||||||||||||||||||||||||
110 | Went through | |||||||||||||||||||||||||
111 | http://www.re3data.org | |||||||||||||||||||||||||
112 | https://computervisiononline.com/dataset/1105138639 | |||||||||||||||||||||||||
113 | https://www.cooldatasets.com/#Science-Datasets | |||||||||||||||||||||||||
114 | https://datasource.kapsarc.org/pages/home/ | |||||||||||||||||||||||||
115 | https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public | |||||||||||||||||||||||||
116 | ||||||||||||||||||||||||||
117 | 91 - | |||||||||||||||||||||||||
118 | ||||||||||||||||||||||||||
119 | ||||||||||||||||||||||||||
120 | Red = will not include | |||||||||||||||||||||||||
121 | Yellow = may include | |||||||||||||||||||||||||
122 | Green = has been included | |||||||||||||||||||||||||
123 | Orange = will reference (review new) | |||||||||||||||||||||||||
124 | Compiled list | Many datasets of interest | https://computervisiononline.com/datasets https://www.cooldatasets.com/#Science-Datasets https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public | |||||||||||||||||||||||
125 | Compiled list (biometric and forensic) | Biometric and Forensic Research Database Catalog | https://tsapps.nist.gov/BDbC/Search?page=1&sortOrder=Organization | |||||||||||||||||||||||
126 | Compiled list | Image and vision group (Digital Forgery?) | http://caiivg.weebly.com/dataset.html | |||||||||||||||||||||||
127 | compiled list | Publicly available PCAP files | http://www.netresec.com/?page=PcapFiles | |||||||||||||||||||||||
128 | Compiled list | Test Images and Forensic Challenges | http://www.forensicfocus.com/images-and-challenges | |||||||||||||||||||||||
129 | compiled list | image and vision | http://www.vision.ee.ethz.ch/en/datasets/ | |||||||||||||||||||||||
130 | Compiled list | Multimodal Biometric Recognition | http://www.lvc.ele.puc-rio.br/projects/Biometric_Recognition/download.html | |||||||||||||||||||||||
131 | ||||||||||||||||||||||||||
132 | Compiled list | Mark Dredze | http://www.cs.jhu.edu/~mdredze/code.php | |||||||||||||||||||||||
133 | Compiled lists | Network datasets | https://www.researchgate.net/post/What_are_the_different_datasets_available_for_network_intrusion_detection | |||||||||||||||||||||||
134 | Compiled list | Some financial datasets | https://relational.fit.cvut.cz/search | Discarded: little information on the content of the datasets | ||||||||||||||||||||||
135 | ||||||||||||||||||||||||||
136 | Compiled list | DATASETS FOR CYBER FORENSICS | http://datasets.fbreitinger.de/datasets/ | |||||||||||||||||||||||
137 | Public? | Miscellaneous datasets | https://github.com/caesar0301/awesome-public-datasets | |||||||||||||||||||||||
138 | ||||||||||||||||||||||||||
139 | ||||||||||||||||||||||||||
140 | ||||||||||||||||||||||||||
141 | ||||||||||||||||||||||||||
142 | ||||||||||||||||||||||||||
143 | ||||||||||||||||||||||||||
144 | ||||||||||||||||||||||||||
145 | Resource for finding datasets | Kaggle Datasets | https://www.kaggle.com/datasets?sortBy=hottest&group=featured |