NEW! Created the Transformer & Attention Mechanisms material for Udacity's Generative AI Nanodegree. NEW! Invited by the United Nations to deliver the opening keynote at the 2022 Innovations Dialogue.
[2021 - now] I'm focusing on large language models (LLMs), sparse mixture models, and multimodal NLP.
My M.S. research interests were in deep learning for object detection and 3D understanding, with an emphasis on perception and human-centered AI. In undergrad I specialized in operating systems, computer networks, and security.
I've been fortunate to live in many places: New York, Los Angeles, Atlanta, Philadelphia, Hong Kong, San Jose, Palo Alto, and (currently) San Francisco. In my spare time, I enjoy hiking outdoors and reading history.
Research
ScreenAI: A Vision-Language Model for UI and Infographics Understanding Anonymous. International Joint Conference on Artificial Intelligence (IJCAI), under review paper. 2024.
Explore, Establish, Exploit: Red Teaming Language Models from Scratch Stephen Casper, Jason Lin, et al. International Conference on Learning Representations (ICLR), under review paper. 2023.
Zorro: the masked multimodal Transformer Adria Recasens, Jason Lin, Joao Carreira, et al. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), under review paper. 2023.
Diffusion Models as Visual Reasoners Jason Lin*, Maya Srikanth*. AAAI Conference on Artificial Intelligence (AAAI) workshop, Creative AI Across Modalities paper. poster. 2023.
A Distribution-Aware Approach to Dense Retrieval Jason Lin, Justin Young, Simran Arora. Best Poster Award / CS224N, advised by Christopher Manning (Stanford) paper. poster. 2022.
IOTA: A Cryptographic Perspective Bryan Baek, Jason Lin. Blockchain and Cryptocurrency, advised by Vladimir Kolesnikov (Georgia Tech) paper. Technical Report 2019.
AutoML: Automatic Data Augmentation with LSTM and Policy Optimization Yousef Emam, Jason Lin. Reinforcement Learning and Adaptive Control, advised by Byron Boots (UW) paper. code. 2019.
Interactive Classification for Deep Learning Interpretation Angel Cabrera, Fred Hohman, Jason Lin, Duen Horng Chau. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Demo arXiv. code. 2018.
AdVis: Visualizing and Attributing ML Attacks to scalable Adversarial Inputs Jason Lin, Dilara Soylu. Advanced Computer Vision, advised by James Hays (Georgia Tech) poster. code. YouTube. 2018.
Detecting Graphical Regions of Interest with Gaussian Process Bayesian Optimization Jason Lin, Hakki Mert Torun, Weihua Zhu, Jingjing Pan. Probabilistic Graphical Models, advised by Faramarz Fekri (Georgia Tech) slides. 2018.
Research interests: computer vision, language reasoning, deep learning
Contact: jlin401 [at] gatech [dot] edu
Google Scholar
During 24 hours of hacking at AngelHack Hong Kong, our team developed a full-stack Android app that allowed one group of users to create bids and a different group of users to auction on them based on geolocation proximity (a bit like Tinder for shoppers). With the backend written in Node.js, it supported user login authentication, database access and modification via MongoDB, and pushed in-app listing updates to users' phones using Twilio. We marketed our sharing-economy shopping experience, "Shuber", as an ecosystem that bridges travelers with local sellers; it proved popular with participants and judges alike, and we were proud to advance as one of the 6 finalists.
TweeTINT
A Twitter sentiment analysis webapp written in Python.
Username: admin, Password: tint (the first load takes a couple of seconds)
TweeTINT is a Twitter sentiment analysis webapp that takes search queries from the user and fetches a designated number of related tweets using Twitter's REST API. It then performs natural language processing on them with a Python NLP engine and renders a bar-chart visualization of the collected sentiment scores on the webpage. Other features, such as tweet locale and country of origin, will be added over time. Technologies used: Flask, Tweepy, Pandas, Matplotlib, AlchemyAPI, Heroku
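The core loop is small enough to sketch. Below is a minimal, hedged example assuming a Tweepy v3-style API and placeholder credentials; TextBlob stands in here for the AlchemyAPI sentiment call used in the actual app.

```python
# Minimal sketch of TweeTINT's core loop (Tweepy v3-style API assumed;
# TextBlob stands in for the AlchemyAPI sentiment call).
import tweepy
import matplotlib.pyplot as plt
from textblob import TextBlob

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")  # placeholder credentials
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

def sentiment_scores(query, count=50):
    """Fetch `count` tweets matching `query` and return polarity scores in [-1, 1]."""
    tweets = api.search(q=query, count=count)
    return [TextBlob(t.text).sentiment.polarity for t in tweets]

scores = sentiment_scores("some search query")
plt.bar(range(len(scores)), scores)
plt.xlabel("Tweet")
plt.ylabel("Sentiment polarity")
plt.title("TweeTINT sentiment scores")
plt.show()
```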
Data Visualization
Visualization of quantitative and non-coordinate data using d3.js.
The decision to explore D3.js was two-part: partly to complete a Data Science specialization I'm taking, and partly out of pure interest after attending a JavaScript conference called ForwardJS. The following are some pieces I've experimented with in D3.js.
This graph visualizes the GISTEMP data for the globe and the Northern and Southern Hemispheres from 1880 to 2015. For easier comparison, multiple colored lines representing the four seasons have been plotted and annotated with a legend, highlighting the nominal differences between the seasons. In particular, since hues make a good indicator for categories, I've chosen a primary/secondary color for each season that best represents its weather: blue for winter, red for summer, green for spring, and orange for autumn.
From the line chart and the zero benchmark, we can see not only that global temperature deviations have gone from negative to positive, but also a shared trend of mean global temperature increasing in one direction over the past 50 years. Textual aids like axis labels and a title are also included to help viewers follow the narrative.
This is a bubble graph representing the most commonly occurring topics in questions asked on Piazza for a technical institute's engineering school. Each node represents a unique topic, with a color gradient from orange to light blue used to visualize ordinal differences in the number of mentions for each topic. The combined use of saturation and density allows the ordinal relationships to be compared easily. Given the same shape, the bubbles' difference in area is also a good indicator of a topic's popularity. Hovering over any node also brings up the number of mentions associated with that topic/word.
I chose this dataset because, as an engineering student myself, I've found myself posting questions on the forum more often than I expected, and the data reveals that many students are interested in learning more about the fundamental concepts rather than the advanced material. The chart is rendered with D3.js from JSON data.
Groupazza
A chatroom-enabled live discussion group webapp built on top of Piazza.
Developed a study-group webapp on top of Piazza's credentials by reverse-engineering its unofficial API at HackPiazza. It authenticates through Piazza and extracts the user's list of enrolled classes. A live chat room is also available for each discussion group created.
By obtaining a token session from OAuth2 authentication with Piazza's server, we are able to extract JSON data objects of each user's registered classes and store them in our own Firebase backend. On the frontend, we used AngularJS to two-way bind the real-time Firebase class and chatroom data to our Bootstrap 3 customized webpage, implemented in Python Flask. Besides creating new chatrooms under each class, users can also browse a data visualization of the most commonly used phrases in each chatroom, rendered with the d3.js library.
Arduino
Arduino...hacks.
Details coming in the future. Currently working on machine learning...
AceCode
Preparing you for your next coding interview
AceCode is an app in progress aimed at computer science students and software developers who are going through coding interviews. A native Android app, it supports the rendering of code samples and features a "Play" function that intelligently ranks topics based on the user's familiarity. The app is planned to hit the Google Play store in the future, supporting multiple languages.
Innovation Insight
Winner - Best Overall use of Microsoft Technology
At MHacks 6, University of Michigan's premier hackathon, our team built something revolutionary to help the visually impaired navigate their surroundings.
Code-named Stacy, Innovation Insight is a machine learning based system that redefines the way the visually impaired navigate. Instead of simply identifying major obstacles and navigating around predefined paths, Innovation Insight listens to the user's commands and helps him or her reach a specific object. Using a Kinect V2 for object recognition (OpenCV Haar cascades) and its depth stream, a user can request an object of interest by voice command. An Arduino-powered custom vibration arm sleeve containing 8 servo motors then activates the motor pointing in the direction of the object to guide the user toward it. To accommodate rotational hand gestures, we used a Myo armband to track the spatial orientation of the hand, so the vibrating motor switches to whichever one keeps facing the same direction. We also integrated the Nest API via Firebase for customizable features.
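To make the detection-to-motor mapping concrete, here is a simplified Python sketch. The cascade file name and the flat 8-sector angle mapping are assumptions for illustration; the real system fused Kinect depth data and Myo orientation rather than working from a single 2D frame.

```python
# Simplified sketch: detect the requested object with a Haar cascade and pick
# which of the 8 sleeve motors points toward it (cascade file and flat 8-sector
# mapping are illustrative assumptions).
import cv2
import math

cascade = cv2.CascadeClassifier("target_object_cascade.xml")  # hypothetical trained cascade

def motor_for_object(frame_bgr):
    """Return the index (0-7) of the sleeve motor pointing toward the detected object."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(detections) == 0:
        return None
    x, y, w, h = detections[0]
    cx, cy = x + w / 2, y + h / 2
    # Angle of the object relative to the frame center, mapped onto 8 sectors of 45 degrees.
    dx = cx - frame_bgr.shape[1] / 2
    dy = frame_bgr.shape[0] / 2 - cy
    angle = math.degrees(math.atan2(dy, dx)) % 360
    return int(angle // 45)
```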
In-lecture question asking made fun and easy. No need to be shy. Just ask.
Not everyone learns at the same rate. Students often find themselves lagging behind upcoming content in class because they were still trying to understand the previous topics. As college students ourselves, we find it tough to keep up with class content once we've gotten stuck somewhere earlier.
Professors can sign students into their lectures, and students can anonymously post questions to their professors during lecture. Professors can mark questions as answered or unanswered, while the questions are visible to other students and can be upvoted so the most popular rise to the top.
Jola
An adaptive and learning Android personal assistant always by your side.
Wouldn't it be nice to have someone to talk to who understands what you want to do based on your mood or needs? Currently most handheld systems have applications that use artificial intelligence to find results for their users. However, we noticed that these often don't store much about the user or learn from them over time, so we decided to take it a level up and make the connection more personal.
Jola is a user-habit learning app for Android that doubles as a personal assistant and a caring buddy. Unlike other speech recognition software, Jola has a unique focus on machine learning and information recycling, enabling it to predict users' similar needs in the future.
Speech recognition & synthesizer with Android TextToSpeech
Using Firebase to store key time-based user data in real time
Hosting NLP library on Heroku with Python backend (AlchemyAPI)
Integration of Uber, Yelp, Braintree, Wolfram Alpha and Spotify's APIs
Vectorizor
A native client webapp that converts pencil sketches to colorable SVG.
Vectorize your designs
Tired of converting your pencil sketch mockup into something modifiable in Photoshop? We are here to the rescue. Keep your mind focused on your designs. We'll take care of the rest.
*Currently only supports Chrome browser version 44 or above.
Vectorizor is built with designers in mind. With most of our world digitized today, creative content needs to be in digital form to be shared with others across the Internet. However, it is still a lot easier to create content on paper than on, say, an iPad. I created Vectorizor as a single-page webapp where users can upload their pencil sketches and the app digitally traces an outline of the sketch into a downloadable SVG that can be colored and digitally rendered. This computer vision project relies on custom optimizations of the OpenCV library, porting its C++ build to the web through Google's NaCl sandbox project.
Hough Line Transform using OpenCV
A line can be detected by counting the number of intersections between curves in Hough space. More intersecting curves mean that the line represented by that intersection has more supporting points. In general, we can define a threshold for the minimum number of intersections needed to detect a line. View on GitHub
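The same thresholded Hough transform looks like this in Python with OpenCV (the project itself used the C++ OpenCV build compiled through PNaCl; the filename and threshold value here are illustrative):

```python
import cv2
import numpy as np

img = cv2.imread("sketch.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)
# rho resolution = 1 px, theta resolution = 1 degree,
# threshold = minimum number of Hough-space intersections to report a line
lines = cv2.HoughLines(edges, 1, np.pi / 180, 100)

for line in (lines if lines is not None else []):
    rho, theta = line[0]
    print(f"detected line: rho={rho:.1f}, theta={np.degrees(theta):.1f} deg")
```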
Technologies used:
OpenCV
PNaCl (Portable Native Client)
RapidJSON
PepperJS
Twitter Bootstrap 3
SoundScape
UCLA IDEA Hacks 3rd Place: Distance-sensing speakers that adjust volume according to where you are.
Imagine yourself in your cozy home. You walk from one room to another and the same song keeps playing to you at the same volume, from a single source. It's almost as if the music is following you.
SoundScape is a location-aware, distance-sensing volume control system that dims the volume of individually paired speakers when the user is far away, while raising the volume of those near the user. This also means that, as a consumer, one could easily buy extra speakers to join the existing family and expand SoundScape's coverage almost without limit.
- Synchronous music is played from multiple devices connected to individual pairs of speakers positioned afar from each other
- WiFi signal strength (RSSI) is used to estimate the distance between a hotspot (i.e. a smartphone) and the WiFi-connected, Arduino-controlled speakers (see the sketch after this list)
- Transistors are wired to the input audio jack to physically modulate volume using bitwise amplified Pulse Width Modulation, while resistors are used to eliminate background noise when distance = 0.
- Arduino-controlled "Smart" speakers each operate promiscuously, though future extensions for cross-device communication are possible since they all connect to a central WiFi hotspot.
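Here is a rough sketch of the RSSI-to-volume idea using a log-distance path-loss model. The reference RSSI at 1 m, the path-loss exponent, and the cutoff distance are assumptions for illustration, not measured values from the project.

```python
# Sketch of the RSSI -> distance -> volume mapping (log-distance path-loss model;
# all constants below are illustrative assumptions).
RSSI_AT_1M = -40.0   # dBm measured one meter from the hotspot (assumed)
PATH_LOSS_N = 2.5    # path-loss exponent for an indoor environment (assumed)
MAX_DISTANCE = 10.0  # meters beyond which a speaker is fully dimmed (assumed)

def rssi_to_distance(rssi_dbm):
    """Estimate distance (meters) from received signal strength."""
    return 10 ** ((RSSI_AT_1M - rssi_dbm) / (10 * PATH_LOSS_N))

def volume_for(rssi_dbm):
    """Return a PWM duty cycle in [0, 255]: loud when near, quiet when far."""
    d = min(rssi_to_distance(rssi_dbm), MAX_DISTANCE)
    return int(255 * (1 - d / MAX_DISTANCE))
```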
Taskify
PennApps XIII: Your digital pal that knows and plans your life better than you do.
We all share the common problem of running short on time in college, especially when it comes to deadlines and leisure time. With very erratic schedules, we feel that college students like ourselves are in great need of a planner that adapts easily to us and allows for a flexible schedule.
Taskify helps you kill procrastination by automatically scheduling your "commitments" and "tasks" through a convenient and intelligent voice assistant. It looks at the fixed schedule you have so far, i.e. classes and work, and then plans your schedule ahead according to an in-house heuristic algorithm.
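To give a flavor of the scheduling step, here is one plausible greedy heuristic in that spirit (earliest-deadline-first filling of free slots); it is only a sketch, not the actual in-house algorithm.

```python
# One plausible greedy heuristic: sort tasks by deadline and pour them into
# free slots between fixed commitments (illustrative sketch only).
def schedule(free_slots, tasks):
    """free_slots: list of (start_hour, end_hour); tasks: list of dicts with
    'name', 'hours', 'deadline'. Returns a list of (task_name, start, end)."""
    plan = []
    for task in sorted(tasks, key=lambda t: t["deadline"]):
        remaining = task["hours"]
        for i, (start, end) in enumerate(free_slots):
            if remaining <= 0:
                break
            use = min(remaining, end - start)
            if use > 0:
                plan.append((task["name"], start, start + use))
                free_slots[i] = (start + use, end)
                remaining -= use
    return plan

print(schedule([(9, 11), (14, 18)],
               [{"name": "PSet 3", "hours": 3, "deadline": 2},
                {"name": "Essay", "hours": 2, "deadline": 5}]))
```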
Connecting people with the latest events around them effortlessly and socially
Have you ever wanted to instantly see what's going on around you, without sifting through different sources of information? Ever been confused about what to do when you arrive at a new attraction, city, or country?
SudoMap is a native Android app that plots nearby events on a Google Map and allows users to create and search for events by voice control or manually. Once signed in, users are able to RSVP to or bookmark events, and see the list of attendees for each event. Every event also has a chatroom that users can chat in, depending on the public/private visibility of the event. Events are filtered into 6 categories and are differentiated by pins of different colors. An in-house algorithm is also implemented to surface the most popular events by ranking the number of chat posts per event. It also supports shaking the device for trending event recommendations.
Technologies used:
Voice control & client matching with Houndify voice recognition API
Using Firebase to store events, users, and relational event-chat data in real time
Google Maps and Places APIs to input event locations and recognize place names
Facebook API to login and retrieve profile picture and email address
Custom shake detector that computes G-force from the live accelerometer feed (sketched below)
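The shake detection reduces to comparing combined acceleration against a threshold. A language-agnostic Python sketch of the idea follows (the app itself is native Android, and the 2.5 g trigger level is an illustrative assumption):

```python
# Shake detection sketch: magnitude of the accelerometer vector in g's,
# compared against an assumed trigger threshold.
import math

GRAVITY = 9.81           # m/s^2
SHAKE_THRESHOLD_G = 2.5  # assumed trigger level

def is_shake(ax, ay, az):
    """Return True if the combined acceleration exceeds the shake threshold."""
    g_force = math.sqrt(ax * ax + ay * ay + az * az) / GRAVITY
    return g_force > SHAKE_THRESHOLD_G
```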
OlympicBPM
LA Hacks Top 10: Wristbands that turn fans' heart rates into a real-time, color-coded visualization at live sporting events.
LOS ANGELES - This weekend, more than one thousand high school and college students from across the United States descended upon UCLA’s Pauley Pavilion, the site of LA 2024’s proposed volleyball venue, for LA Hacks, a collaborative computer programming event sponsored by LA 2024 that offered promising technology possibilities for 2024. Bid leaders hailed the event as a reflection of LA 2024’s approach to harnessing California's creativity, innovation and connection with youth culture.
Student programmers worked intensively around the clock on software and hardware across a variety of categories. This included LA 2024 categories on sports entertainment and fitness, with the top two entries winning courtside Los Angeles Clippers and UCLA basketball tickets. All entries that fell under the LA 2024 categories provided valuable insights into the way youth audiences engage with sporting events. LA’s bid leaders will look to how these ideas can assist them in delivering a New Games for a new era to inspire a whole new generation around the Olympic values of friendship, respect and excellence.
"USC student Jason Lin and UCLA students Kai Matsuka, Anny Lin, and Donovan Fung were runners-up in LA 2024's sports technology competition, designing wearable technology hardware which they dubbed "Olympic BPM." Olympic BPM is a wristband that measures fans' heart rates and movements during live sporting events using sensors in the band. Fans' wristbands are connected with their seats, which light up in different colors that correspond with the fans' mood (for example, an excited fan might have a red seat, while a relaxed fan might have a blue one, creating a real-time visual representation of the audience's reactions). Olympic BPM wristbands are also connected to a smartphone app, which allows users to see color-coded visual representations of audience reactions during any sporting event."
UPDATED: Featured in LA2024's Official Press Release.
FooedBar
AngelHack Hong Kong 2016: Curated restaurant recommendations & experience in VR
HONG KONG - In 24 hours I worked with Sasha, Robert, and Daniel on an Android app that features a Tinder-like card-stack swipe for food items around the user and intelligently recommends a list of restaurants he/she could experience in Virtual Reality, using an in-house training set and a backend ML algorithm. We came up with the idea because, since three of us are from the US, we recognized that Hong Kong offers a dazzling assortment of wonderful restaurants - so many that it's sometimes tough to decide what to eat. With a Google Cardboard on hand, I thought of introducing one of the hottest technologies today to the masses, especially since Hong Kong is regarded by many as the "foodie" city.
FooedBar is composed of two key components: the machine learning algorithm written in Go with its backend SQL server, and the native Android app that incorporates a Google Cardboard VR engine. Our in-house filtering algorithm ranks restaurants and curated menu items in order of decreasing preference by sampling the user's preferences as he/she plays the Tinder-like food item feed - each food item is labeled with properties like "taste", "cuisine", and "flavor" that are all recorded into our MySQL webserver instance. Our Go algorithm then ranks the restaurants and lists them to the user in a list view. The user is able to click into every restaurant and experience a 360-degree view inside the restaurant that is also Virtual Reality ready. Last but not least, once the user arrives at a restaurant, he can scan a QR code identifier and our backend server will automatically generate a curated version of the restaurant's menu based on the user's previous preferences. Above are photos of us recording a 360-degree photosphere of sample restaurants using Google Street View; after manually stitching it into a binocular photo, we render it in VR mode.
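For intuition, here is a small Python sketch of the preference-ranking idea described above (the real service is written in Go); the attribute handling follows the "taste"/"cuisine"/"flavor" labels mentioned, and the exact scoring is an assumption.

```python
# Sketch: build a preference profile from swipes, then score restaurants by how
# well their menu attributes match the profile (illustrative, not the Go service).
from collections import Counter

def build_profile(swipes):
    """swipes: list of (food_item_attributes: dict, liked: bool)."""
    profile = Counter()
    for attrs, liked in swipes:
        for key, value in attrs.items():
            profile[(key, value)] += 1 if liked else -1
    return profile

def rank_restaurants(restaurants, profile):
    """restaurants: list of dicts with 'name' and 'menu' (list of attribute dicts)."""
    def score(r):
        return sum(profile[(k, v)] for item in r["menu"] for k, v in item.items())
    return sorted(restaurants, key=score, reverse=True)
```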
Jane Street ETC San Francisco
6th place: Programming a high-frequency trading bot to compete in Jane Street's day-long programming contest.
ETC is Jane Street's day-long, invite-only programming contest. Participants form teams with the mission of building a trading bot that competes with others and the market, maximizing market-making profits with creative strategies.
The Electronic Trading Challenge is a 12-hour marathon where teams of two program a bot that can parse a custom exchange protocol, and exercise a strategy to compete with others in a simulated ETF financial market. I was in charge of network socket programming in Java and optimizing performance of our proprietary leverage arbitrage algorithm.
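The skeleton of such a bot is a simple read-parse-react loop over a socket. The sketch below is in Python rather than the Java we used, and the newline-delimited JSON message format, host, and port are assumptions for illustration rather than the actual contest protocol.

```python
# Skeleton of a trading bot's network loop (format, host, and port are assumed
# for illustration; our actual bot was written in Java).
import json
import socket

def run_bot(host="exchange.example.com", port=20000, team="TEAMNAME"):
    sock = socket.create_connection((host, port))
    reader = sock.makefile("r")
    sock.sendall((json.dumps({"type": "hello", "team": team}) + "\n").encode())
    for line in reader:
        msg = json.loads(line)
        if msg.get("type") == "book":
            # Placeholder for the strategy: compare the ETF book against the sum
            # of its underlyings and send buy/sell orders on mispricing.
            pass
```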
Textbooks and lengthy documents can be difficult to understand. Leave it to the machine and never miss a clue.
Simplify, review, go!
It sucks to be stuck behind a textbook that you don't fully understand. "Speed of Sound" is a service where you can upload a picture of a page you're having trouble with, and we'll run artificial intelligence algorithms on it to produce a simplified version that's easier to comprehend. We also identify keywords that allow us to display relevant flashcards from Quizlet, making learning more efficient and accessible.
We've developed both Web and iOS applications that work together. You can start on your commute home from an iPhone (e.g. snapping a photo of your notes after class) and continue right where you left off at home, with the comfort of a larger screen. Remember those late nights where you had to cram through everything and simply "get sh** done"? We wanted to make the lives of those who work and study hard late into the evening much easier and smarter.
Technical Specifications:
Speed of Sound -- named after the Coldplay song, inspired by our common goal to improve education -- was built using Python for the backend and a mix of Objective-C and Swift for the iOS app. The text analysis was done with Optical Character Recognition (OCR) using ABBYY, then running the result through Natural Language Processing (POS tagging, conceptual analysis, and taxonomy prediction) with AlchemyAPI, and finally summarizing all of that information by feeding our NLP output into TextTeaser. We also incorporated our custom wrapper API for searching and retrieving Quizlet flashcards relevant to each document. For the backend, we chose SQLAlchemy for the database and hosted it on Linode's high-performance Linux SSD servers.
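The rough shape of that pipeline can be sketched in a few lines. Here pytesseract stands in for the ABBYY OCR step and a naive frequency-based summarizer stands in for the AlchemyAPI + TextTeaser stage; both substitutions are assumptions for illustration only.

```python
# OCR -> summarize pipeline sketch (pytesseract and the naive summarizer are
# stand-ins for ABBYY and AlchemyAPI + TextTeaser, respectively).
from collections import Counter
import pytesseract
from PIL import Image

def ocr(image_path):
    return pytesseract.image_to_string(Image.open(image_path))

def summarize(text, n_sentences=3):
    """Score sentences by word frequency and keep the top few."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freqs = Counter(w.lower() for w in text.split())
    ranked = sorted(sentences,
                    key=lambda s: sum(freqs[w.lower()] for w in s.split()),
                    reverse=True)
    return ". ".join(ranked[:n_sentences]) + "."

page_text = ocr("textbook_page.jpg")
print(summarize(page_text))
```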
HackDartmouth: Rates driving routes based on data analysis of various safety metrics
Mentored and helped team win EverQuote API prize at Dartmouth College. Click here for project details.
VeriRoute is a unique transportation application that allows users not only to identify the quickest path to their destination, but also to see various safety metrics that characterize each route. When a route is selected, we query EverQuote's anonymized database of dangerous driving behaviors, such as phone usage, speeding, acceleration, and braking, along the latitude and longitude points of the route. We also use data on the frequency of crime-related incidents along the route, and compile all of these metrics into a subjective, weighted score that depicts the safety of each route. The aim is to provide users, particularly those unfamiliar with the areas they are venturing into, with an added measure of security.
Technology stack: We obtained a database from EverQuote containing data, collected over a period of time, that identified when people at each latitude/longitude location were speeding, accelerating, using their phone, or braking. Each of these points was anonymous, and we used JavaScript to parse the data file to obtain points within a 0.5-mile radius of the route and apply a subjective weighting system that scores the route based on the density of each of the four events described above. We then overlaid crime data in the same fashion and applied a final score based on this data.
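The weighted scoring boils down to a dot product of event densities and weights. The Python sketch below illustrates the idea (the project used JavaScript, and these weights and the 100-point scale are illustrative, not the values we tuned):

```python
# Illustration of the weighted safety score (weights and scale are assumptions).
WEIGHTS = {"phone_usage": 0.3, "speeding": 0.3, "acceleration": 0.15,
           "braking": 0.15, "crime": 0.1}

def route_safety_score(event_densities):
    """event_densities: events per mile along the route, keyed like WEIGHTS.
    Higher weighted density means more risk, so subtract it from a 100-point base."""
    risk = sum(WEIGHTS[k] * event_densities.get(k, 0.0) for k in WEIGHTS)
    return max(0.0, 100.0 - risk)

print(route_safety_score({"phone_usage": 12, "speeding": 30, "braking": 8, "crime": 5}))
```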
OSx86 Sierra PC
Hackintosh: macOS Sierra on Skylake + Z170 with integrated graphics
Having hacked around with iOS (jailbreaking) and Android systems growing up, it had always been my dream to put the UNIX-based macOS on a cost-efficient PC build one day. The major challenge at the time was the vast number of compatibility issues to overcome - with a late interest in systems, and only a couple of months after Intel's Skylake launch, I decided to conquer the task I had long hoped to accomplish.
Hardware Build:
Dual fan setup: --> --> vs. <-- --> vs. --> <-- vs. <-- <--
Software:
For the hardware build, I tried to implement a dual-fan setup, arranging the case fan (12mm diameter) and the water cooler's fan (8mm diameter) in a way that maximizes heat exhaust from within the case. I eventually settled on a push/pull CPU ventilation arrangement, where the smaller fan is sandwiched between the inner side of the case and the CPU cooler's heatsink, while the larger case fan is attached to the outer side of the case. Theoretically this allows the heatsink fan to pull hot air into the cooler's fins and the case fan to pull the air back out of them. I hoped this would result in lower temperatures than a single-fan setup - after trying solutions both with and without the outer case fan, the results verified my hypothesis, with a difference of 2-5 degrees Celsius.
To run a full-fledged macOS on a custom PC build, I took advantage of the Hackintosh community's beta support for the latest version of macOS Sierra. Because Skylake had only just been adopted in Apple's iMacs at the time, many driver issues existed. After loading the Clover bootloader and the OS, the following compatibility issues came up, many of which I manually patched by reconfiguring plists and loading kext patches through Clover:
Apple's AirPort software does not support ASRock's default ASUS WiFi module, so to get wireless Internet I had to order an Apple-hardware-compatible BCM94352. GitHub tutorials on getting the WiFi module recognized can be found here.
Graphics glitches: Apple does not natively support Skylake's integrated graphics. Configured the boot plist and EFI settings to minimize instability.
Sound configuration could not be set up: no audio output, and the system always appears muted.
Lack of native support for Bluetooth.
Technical Specifications:
CPU: Intel Core i5-6600K
Motherboard: ASRock Mini ITX Z170M-ITX/AC
Memory: Ballistix Sport LT 16GB DDR4 2400MHz x2
Cooling System: Corsair Hydro Series H60 Liquid CPU Cooler
Hard Drive: Crucial MX300 525GB SATA 2.5"
Network card: AzureWave Broadcom BCM94352HMB (compatible with Apple drivers)
Power Supply (PSU): EVGA 600W B1
Chassis: Thermaltake Core V1 Snow Edition
Artemis
An AI + IoT integrative virtual dietitian powering your everyday fitness and smart goals
Artemis is an Amazon Alexa experience that changes the way you engage in fitness and meal tracking. Log your food, caloric intake, and water consumption, and learn the breakdown of your daily diet with a simple voice command. All you have to do is tell Artemis (literally speak) that you ate something, and she'll automatically record it for you, retrieve all pertinent nutrition information, and show how it stacks up against your daily goals. Check how you're doing at any time by asking Artemis, "How am I doing?", or by looking up your stats, presented in a clear and digestible way with advanced and extensive data visualization tools.
With limited resources and an Amazon Echo in hand, we decided to take MyFitnessPal to the next level. Because the Echo is always within earshot, we shared the sentiment that manually entering our diet into our phones every day is a tedious task. With Artemis, we built a virtual nutritionist that is not only capable of processing a range of nutrition and health-related queries, but also connects to the Nutritionix API for nutritional data on almost every type of food out there (think Oreos, chicken sandwiches, dumplings). As a full-stack system, Artemis is composed of 3 components:
Advanced & adaptive Alexa chatbot skill - capable of on-going conversation with context
Smart water bottle with Arduino-powered IoT sensors that writes to AWS DynamoDB
Interactive, meaningful breadth of data visualizations (Sankey diagrams, caloric breakdown, etc.)
Shout out and big thanks to Amazon, Qualtrics, Facebook and Ravi for making this achievement a reality!
Keplerian VR
CS Senior capstone: General Purpose Collaborative Engineering Model Visualization (GP-CEMV)
High-level architecture diagram of the Unity-based KeplerianVR multiplatform engineering collaboration system
Mars rover and other space exploration 3D models (.FBX) imported from NASA's public resources library
Myo armband + Leap Motion used to map hand gestures and movement to object and camera controls
Immersive VR collaboration experience via free-hand annotation with the HTC Vive controller
Free-hand drawing and Myo arm control synchronized in action
Select and move individual objects using keyboard on browser-end spectator client
In partnership with NASA JPL, the goal of this project is to explore the frontiers of novel human-computer interaction, specifically input controls through unprecedented physical means. We essentially simulate a virtual collaborative room - a conference room, open space, etc. - where engineers and collaborators on a team can come together remotely or physically to prototype new hardware that operates in a different kind of space.
Peripheral hardware devices we use for input include the Leap Motion, the Myo armband, and others.
Millions of dollars are invested in algorithmic trading, and we decided to develop an open-source model that predicts stock market pricing from a long-term viewpoint as well as for short-term trading.
Stockpedia uses TensorFlow layered with Keras to predict future stock prices. On the generative side, Stockpedia produces insightful data visualizations with D3.js + ClojureScript to help formulate investment strategies.
Functionally speaking, Stockpedia predicts the opening and closing prices of stocks in the future. It makes use of sentiment analysis of current news feeds to tap into the mood of the market, because stock markets are largely driven by the actions and decisions of every individual participating in them, and every user's action has an impact on others' decisions and the resulting outcomes.
We trained a Recurrent Neural Network (RNN) using TensorFlow as the backend, and performed sentiment analysis on news feeds, with the results saved in a database created using the Hasura API.
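A minimal sketch of that kind of price-prediction RNN is shown below; the layer sizes, window length, and feature set are illustrative assumptions, not our exact configuration.

```python
# Minimal Keras sketch of a windowed price-prediction RNN (hyperparameters and
# features are illustrative assumptions).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

WINDOW, N_FEATURES = 30, 5  # e.g. 30 past days of [open, close, high, low, sentiment] (assumed)

model = Sequential([
    LSTM(64, input_shape=(WINDOW, N_FEATURES)),
    Dense(1),  # next-day closing price
])
model.compile(optimizer="adam", loss="mse")

X_train = np.random.rand(1000, WINDOW, N_FEATURES)  # stand-in for real training windows
y_train = np.random.rand(1000, 1)
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)
```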
Our REST API is cooked up using Hasura.io, a Firebase-like PaaS driven by PostgreSQL as opposed to NoSQL. The frontend is a ClojureScript webapp that provides an interactive visualization of the predictions and suggested strategies for the user, using pre-trained models on 7 cross-industry, representative stocks.
For future work, we hope to analyze weather to predict the trends of the commodity trading market.
Cue: Bias-busting VR
Facebook invite-only Global Hackathon 3rd Place: VR simulation to uncover implicit workplace bias
As a continuation of the Stanford TreeHacks hackathon from 9 months earlier, our team was invited to Facebook's Global Hackathon Finals to compete with 15 of Facebook's other top teams from regional and collegiate hackathons worldwide.
In under 24 hours, we built a VR conversation simulation with unconscious-bias training scenes to surface hidden prejudices, including the cultural, racial, and gender biases found in the workplace.
After discussing workplace discrimination issues with a couple of engineers at Facebook in the wake of the gender discrimination scandal at Google, our goal with Cue was to tackle workplace sexism by addressing the ineffectiveness of traditionally dry and uninteresting compliance training questionnaires.
It's disappointing that when one Googles 'VR feminism' or 'a woman's perspective from VR' today, the results are porn, so we wanted to make VR helpful by turning it into a tool to fight workplace sexism and bias. Ideally, we envisioned building a perfect simulation of what it's like to be a woman who's harassed in the workplace or told her work is not as good because she doesn't look like a man, but we scaled it down to focus on things that are conceptually easier for people to digest and easier for us to build.
For our hack this time, we chose to focus on virtual reality due to the flaws in the typical bias training a company requires: a video or PowerPoint that can easily be ignored or forgotten. The value of mixing virtual reality with mandatory training programs is that it adds an immersive, 3D perspective, and it had never been attempted before.
In terms of the technology stack, we used IBM Watson's Speech-to-Text engine to transcribe the words spoken by the user during the simulation and look for biased pronouns and key phrases that sustain prejudices in the user's responses to the pre-defined training scenes. We further ran LSTM-based sentiment analysis on the answers to look for negative tones and generated word clouds of the most discriminatory words captured in the user's speech.
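A stripped-down sketch of that transcript post-processing step might look like the following; the flagged-phrase list is purely illustrative, and the `wordcloud` package stands in for our actual visualization code.

```python
# Transcript post-processing sketch: scan for flagged phrases and render a word
# cloud (phrase list is illustrative; wordcloud stands in for our own code).
from wordcloud import WordCloud

FLAGGED_PHRASES = ["he must be the", "girls can't", "man up"]  # illustrative examples only

def flag_bias(transcript):
    lowered = transcript.lower()
    return [p for p in FLAGGED_PHRASES if p in lowered]

def save_word_cloud(transcript, path="cue_wordcloud.png"):
    WordCloud(width=800, height=400).generate(transcript).to_file(path)
```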
Original prototype supports blurring and selection from pre-defined images
Various deep learning & classical patch-based image completion (inpainting) algorithms compared
A variety of feature attribution and feature visualization techniques explored: Class Activation Maps (CAM), Graphical Saliency, DeepDream
Final conference demo version with video introduction and tutorial cards
Public trust in artificial intelligence and machine learning is essential to its prevalence and widespread acceptance. A central question for the general public and computer vision community regarding the “black-box” mystery of deep learning algorithms is whether machines see what humans see and if we interpret things the same way.
We have designed and developed an interactive system that allows users to experiment with deep learning image classifiers and explore their robustness and sensitivity. Users are able to remove selected areas of an image in real time with classical and deep learning computer vision inpainting algorithms, which allows users to ask a variety of “what if” questions by experimentally modifying images and seeing how the deep learning model reacts. These interactions reveal a wide range of surprising results ranging from spectacular failures (e.g., a “water bottle” image becomes a “concert” when removing a person) to impressive resilience (e.g., a “baseball player” image remains correctly classified even without a glove). The system also computes class activation maps for any selected class, which highlights the discriminative semantic regions of an image the model uses for classification. Combining these tools, users can develop a qualitative insight into what the model sees and what features impact an image’s classification.
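For readers curious how those class activation maps are formed, here is a minimal NumPy sketch for a classifier that ends in global average pooling; the demo itself computes the equivalent in TensorFlow.js in the browser, and the array shapes here are assumptions about a generic model.

```python
# Class activation map for a global-average-pooling classifier (NumPy sketch).
import numpy as np

def class_activation_map(conv_features, fc_weights, class_idx):
    """conv_features: (H, W, C) activations of the last conv layer;
    fc_weights: (C, num_classes) weights of the final dense layer.
    Returns an (H, W) map in [0, 1] highlighting the regions most
    responsible for the chosen class."""
    cam = conv_features @ fc_weights[:, class_idx]   # weighted sum over channels
    cam = np.maximum(cam, 0)                         # keep positive evidence only
    return cam / (cam.max() + 1e-8)
```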
To make the system accessible, we use the latest web technologies, including TensorFlow.js and React, and modern deep learning image classifiers, SqueezeNet and MobileNet, to create a fully in-browser system anyone can download and use.
Using this tool, researchers and the public could actively experiment with deep learning-based image classifiers and explore their robustness and sensitivity.
This investigation will help people explore the extent to which humans and machines think alike, and shed light on the advantages and potential pitfalls of machine learning. Our demonstration will be available for conference attendees to use live.
Graphical Models in Detecting Regions of Interest (RoI)
Probabilistic Graphical Models: optimizing graph-based Object Detection and Saliency Detection
A Region of Interest (RoI) [1] is a portion of an image that we are interested in and want to perform some further operation on. RoIs are usually regions that are likely to contain a single coherent topic or object. RoI detection lies at the core of many topics including image segmentation, object detection, localization, and web user interfaces. Among the various uses of RoI detection algorithms, we focus on two of the most significant applications of such methods, namely the objectness measure [2] and saliency detection [3, 4].
Our project aims to use Gaussian-Process-based Bayesian Optimization to find better parameters for objectness and saliency algorithms, so that we can better detect object regions in difficult images (e.g., images with small objects, or lots of clutter). Our results show qualitative and quantitative improvement over the baseline for detection, and better localization that exceeds a newer saliency algorithm despite starting from the older Graph-Based Visual Saliency [3] baseline.
Gaussian Process Bayesian Optimization
Many machine learning algorithms, including PGM-based techniques, have parameters that need to be tuned to increase performance on the task. These can be structure-level parameters, e.g., the direction of an edge in a Bayesian network or the choice of a factor function from several possibilities in undirected models, or hyperparameters present at lower levels of the algorithms. In the majority of such algorithms, finding the optimal combination of structure and hyperparameters is an exhausting problem, mainly for two reasons: the response surface has no usable gradient, which rules out gradient-based optimizers such as quasi-Newton or L-BFGS, and function queries are CPU-intensive, which makes evolutionary algorithms impractical. A possible solution for this class of black-box optimization problems is Bayesian Optimization (BO) based on Gaussian Processes.
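As an illustration of what GP-based BO looks like in practice, here is a hedged sketch using scikit-optimize's gp_minimize. The 7-weight search space mirrors the GBVS experiment described below, but the objective shown is only a smooth placeholder; in the real experiment it would run the saliency pipeline and return the negative ROC area.

```python
# GP-based Bayesian optimization over 7 channel weights with scikit-optimize
# (the objective below is a dummy placeholder, not the real saliency pipeline).
from skopt import gp_minimize
from skopt.space import Real

def negative_roc_area(weights):
    """Placeholder objective: in the real experiment this runs GBVS with the
    given channel weights and returns -ROC_area against FIGRIM fixations."""
    return -sum(w * (2.0 - w) for w in weights) / 7.0  # dummy smooth surrogate

search_space = [Real(0.0, 2.0, name=f"w{i}") for i in range(7)]
result = gp_minimize(negative_roc_area, search_space, n_calls=50, random_state=0)
print("best weights:", result.x, "best objective:", result.fun)
```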
1. Objectness
As the first RoI measure, objectness quantifies how likely it is for an image window to contain an object of any class. Objectness algorithms can sample any desired number of windows from an image, and each window has an objectness score associated with it.
One of the prime approaches is proposed in What is an Object? [2]. Its contribution is twofold. First, the authors define image cues that measure characteristics of objects. Then, they combine the four image cues in a Bayesian network. The cues are multi-scale saliency (MS), color contrast (CC), edge density (ED), and straddleness (SS).
2. Optimized Saliency Detection
For saliency detection, we use the graph-based visual saliency method (GBVS) as a baseline, jointly optimizing all 7 weights for the feature channels - intensity, orientation, contrast, flicker, motion, and color - which encode the feature representations used for activation and saliency computation. Because in the original implementation the weights need not sum to one and were initialized to 1 with a number of them disabled, we first enabled them all to compute a baseline, as shown on the left below.
For each of the 30 images we chose to optimize for, we compute an ROC score with respect to ground truth as our evaluation metric, as inspired by [3]. The score is defined as the ROC area obtained using our output saliency map together with FIGRIM fixation location labels as the target points to detect. The ROC area is a fraction out of 1, with 1 being most in alignment with ground truth. In our experiments we find that baseline GBVS achieves a range of 0.68-0.72 on our separate FIGRIM data, which agrees with the paper's results on a different dataset averaged over 3 observer subjects.
Qualitative Results
References:
[1] Gunhee Kim and Antonio Torralba. Unsupervised detection of regions of interest using iterative link analysis. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 961–969. Curran Associates, Inc., 2009.
[2] B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 73–80, June 2010.
[3] Jonathan Harel, Christof Koch, and Pietro Perona. Graph-based visual saliency. In ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 19, pages 545–552. MIT Press, 2007.
[4] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(10):1915–1926, Oct 2012.
AdVis.js
TensorFlow.js + React app: Visualizing adversarial attacks and the Fast Gradient Sign Method in real time
In this work, we explore adversarial attacks by visualizing class activation mapping (CAM) and graphical saliency detection as two feature attribution techniques. To monitor the progression of CAM formation, we employ a range of epsilon values as our independent variable, and compare the results across various ImageNet images and image classification model configurations. We also introduce AdVis, an interactive web tool for visualizing adversarial attacks in real time.
Adversarial attacks are performed by perturbing a model's input such that the perturbation causes the model to misclassify the input, while remaining imperceptible to most humans. A key parameter of the attack is the epsilon value that determines the scale of the perturbation applied by the Fast Gradient Sign Method (FGSM).
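The attack itself is only a few lines. Below is a sketch in TensorFlow (the web tool implements the equivalent in TensorFlow.js on top of MobileNetV1); the assumption here is a model that outputs softmax probabilities over classes and images scaled to [0, 1].

```python
# Fast Gradient Sign Method: step each pixel by eps in the direction that
# increases the classification loss.
import tensorflow as tf

def fgsm(model, x, true_label, eps):
    """x: batch of images in [0, 1]; true_label: integer class ids; eps: perturbation scale."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        preds = model(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(true_label, preds)
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)           # move against the model's confidence
    return tf.clip_by_value(x_adv, 0.0, 1.0)  # keep pixels in a valid range
```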
By varying the amount of perturbation, we validate a relation between pixel-salient regions and the density of perturbed pixels. We show that adversarial images generated using MobileNetV2 do not transfer well as attacks on MobileNetV1. We publish an interactive tool for experimenting with and visualizing different epsilon values and adversarial attacks in the browser, using TensorFlow.js on top of MobileNetV1. We hope our work provides a more in-depth understanding of how adversarial examples are generated and inspires future defenses against gradient-based adversarial attacks.
AdVis lets users explore adversarial attacks interactively by reflecting updates to the classification scores of an image as the user changes the epsilon value deployed for the FGSM attack. Users can also visualize the CAM overlay for a specific class by clicking on one of the rows, which is updated in real-time as they continue to tune their desired epsilon values. AdVis currently supports Fast Gradient Sign Method attacks, with future plans to support additional attack vectors against different classification models.
Georgia Tech: Grad Computer Vision
CS6476: Computer Vision coursework covering implementation of modern visual methods from feature matching to deep learning
Assignments prepared by Dr. James Hays. All implementations are programmed in MATLAB. Click on the links below to view each project's implementation writeup report.
Stereo Matching - Dense point correspondences via estimating the fundamental matrix and epipolar lines.
Dalal-Triggs style HoG face detector - Best-in-class precision at 92.6% mAP.
Scene Recognition - Finetuning pretrained VGG ConvNet to reach ~90% accuracy vs. 63.6% with Bag of Words.
Course website: https://www.cc.gatech.edu/~hays/compvision/
BubblePop AR
Stanford TreeHacks: Best Awareness Hack & Top 8: Meet people outside your opinion bubble.
They say we're simply a reflection of the five people closest to us. It's so easy to get comfortable in our own echo chambers, lost in opinions that seem so real because they're all around us. But sometimes we can't see the bigger picture unless we take a step back and empathize with people we don't normally talk to.
BubblePop presents you with a series of opinions. Chocolate is life; Psychopaths are simply less socially inhibited; The US should implement a junk food tax. Swipe up, down, left, or right – your choice. Once done, you’re connected with your furthest neighbor – a stranger who answered the most differently from you. Meet up and say hello!
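The matching step can be pictured as a furthest-neighbor search over swipe vectors. The sketch below is illustrative only: the numeric encoding of swipe directions and the Euclidean distance metric are assumptions, not the exact production logic.

```python
# "Furthest neighbor" matching sketch: encode swipes as a vector per user and
# pair each user with the most different one (encoding and metric are assumed).
import numpy as np

SWIPE_VALUES = {"up": 1.0, "down": -1.0, "left": -0.5, "right": 0.5}  # assumed encoding

def to_vector(swipes):
    """swipes: list of directions, one per opinion prompt, in a fixed order."""
    return np.array([SWIPE_VALUES[s] for s in swipes])

def furthest_neighbor(my_swipes, others):
    """others: dict of user_id -> swipe list. Returns the most different user."""
    mine = to_vector(my_swipes)
    return max(others, key=lambda uid: np.linalg.norm(mine - to_vector(others[uid])))

print(furthest_neighbor(["up", "down", "right"],
                        {"alice": ["down", "up", "left"], "bob": ["up", "down", "right"]}))
```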
With the advent of recent social media oligarchies, fake news, and extremist groups, division of opinion has received a reputation as the enemy of societal progress. This shouldn’t be the case, though. We create a place where people can anonymously proclaim their opinions on various articles regarding pressing issues. Then, we match people with differing opinions and encourage them to meet and discuss in a productive fashion. We hope to foster a diverse yet highly empathetic community.