PDF Owl

Jun. 13th, 2024 09:50 pm

A shameless promotion of my app is up on Reddit.

It generates the table of contents for PDF files using on-device computer vision AI.

Give it a like! It's a few likes short.

I worked on it late last year and early this year, and probably made every possible newbie DIY-marketing mistake, including editing the video myself and matching the beats to the footage instead of hiring someone.

I feel behind on what's going on with deepfakes and generative art, so it was interesting to see a quick presentation looking back at the progress since GANs. It was given by Shafik Quoraishee from the NYT.

It all started in 2014 with Goodfellow's GAN paper.

  • Two NNs play a game: one learns to generate an image, and the other says whether it's any good. The first is the generator, the second is the discriminator.
  • The objective amounts to minimizing the Jensen-Shannon Divergence (JSD) between the distribution of real images and the distribution the generator produces.
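For reference, here is that objective written out. This is the standard formulation from Goodfellow's paper, not something copied from the slides; G is the generator, D the discriminator, p_data the real image distribution, p_z the noise prior, and p_g the distribution of generated images.

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

% With the optimal discriminator plugged in, the generator's objective becomes
C(G) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \,\|\, p_g)
```

So the generator does best exactly when the JSD term is zero, i.e. when p_g matches p_data.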

Improvement 1: 2017, Wasserstein GAN - improved GAN for face generation (paper)

  • replaces JSD with the Wasserstein metric, which is continuous and differentiable (almost everywhere), making it easier to optimize

Improvement 2: 2018 ProGAN by NVIDIA (paper)

  • Trains the GAN at progressively higher resolutions
  • First trains at a low resolution, then adds layers to the generator and discriminator to grow to higher resolutions

Improvement 3: 2018 StyleGAN (paper)

  • learns the picture representation as a latent vector, in which some dimensions represent the style
  • e.g. an old-to-young dimension, summer/winter, porous/glossy skin, etc.

Somehow, we didn't get to Stable Diffusion; it was only mentioned that it iterates by adjusting the pixels the same way diffusion spreads.

APPLICATIONS

Inpainting - instead of generating the whole image, select a region to replace by generation.

Fast Neural Style Transfer - a recent advance that makes styling instantaneous (summer to winter, etc).

Emotion Model - transfer emotions. Face swap works the same way.

Wav2Lip - given an audio file and photo of a person, make a video with the person's lips moving according to the speech.

NVC - Neural Voice Cloning. Learns the shape of a voice.

  • The presenter gathered 6 GB of Trump speeches and generated a pretty good fake voice. The hard part was separating the target's voice from other people's voices.

FOMM - first order motion model (link)

  • Brings photos of people to life
  • Preset motions

Thin plate spline motion model 2023 (link)

  • Like FOMM, but takes a reference video for imitating the motion
  • Demo: Musk becomes Robert Downey Jr.

DeepFaceLab - complete tool for deep fakes (github)

  • Full-blown fakes creation studio that includes everything needed: images, voice, motion generation, etc.

Aitana López - AI model/influencer, the person does not exist in real life (IG)

Channel 1 news - AI generated and personalized TV news channel (website - aired yet?). Not about fake news, but about using AI generators to present personalized news broadcasts.

Other:

  • Fencing problem - generating too many fingers, like a fence. It was hard to solve, but seems to have been solved recently.

Question from the audience:

  • We're heading into elections in a third of the world. What will all of this do to our democracies?

I've been coding too much lately.
Computer science major asks:

Hey guys, CS major here. So I decided to leave my room today. What is happening on campus?

To my understand, from reading online forms, there’s a war going on in another country. What does that have to do with Columbia? Better question perhaps is: what is the expected outcome of this protest?

Please explain such that a cs major would understand.

Not so long ago, with the help of the British and Americans, Israelis wrote some code that is buggy, but kind of works and also makes a buck. Not so long ago, Palestinians tried to throw away and rewrite that code their way; Israelis got pissed and revoked the git repository access for Palestinians, just exposing some poorly documented APIs for them. Last weekend hackers got the system down via the crappy API access. Everyone is shocked that they could even do it. Israelis will stop the crappy API access and let Palestinians go back to reading paper books and playing checkers. Now, Palestinians are pissed and demand their own git repository after all. Israelis, then, argue that they will only use that repository for malicious purposes; the hackers will attempt to hack the system again, if not completely wipe out the buggy code – hence they don’t deserve their own repository. One group condemns the hackers; the other group demands a git repository.

I got into a class called "Competitive Programming," which is basically preparation for coding contests – preparation by solving and discussing lots of coding problems. Although my humble motivation is to prepare for coding interviews, which I will have to do soon, this class is still useful. The ability to solve coding problems is a lot like a muscle – you have to exercise it regularly to keep it in shape and perform well.

So far, it feels more fun than crunching LeetCode alone. I went to a couple of 5-hour contests and have been solving a problem or two every day, realizing that I'm quite out of shape at writing loops and matching indices, the thing usually called "algorithms". We also have a lecture once a week: our professor is super nerdy and quite hilarious – I miss this type in New York, where professors tend to be well-versed in everything, fast-spoken, and with a dry sense of humor.

This is how our Prof described the rating of the problems:

  • [0, 1200) - try them and see what they give you
  • [1200, 1900) - problems to nail for passing the class
  • [1900, 2400) - problems for getting a job in big tech
  • [2400, 5000) - problems for getting a job as a quant

So quants are at the top of the pyramid.

Another thing is which programming language to use. C++, Java and Python are all present, and other languages are allowed too, but our Prof stressed we had better choose C++ – it's the best language for understanding fundamental algorithms, and doing competitive programming is the best way to learn C++. Alright, I thought, I'll give it another try – it has supposedly changed a lot with the new standards. They've got the for (int element: vector) iteration syntax now, without the ugly index++-ing loop – cool... well, not really: once you need the index and not just the element in the loop, you have to roll back to the good ol' for (int i=0; i<vector.size(); i++). Fine. And here is how you pop from a (max)heap:

std::cout << "initial max heap   : " << v.front() << '\n';  // the max sits at the front
std::pop_heap(v.begin(), v.end());  // moves the max to the back of the vector
v.pop_back();                       // actually removes it

Why pop_back() can't return the popped element, I don't know. So, it is still quite ugly. But modern C++ being not really modern is only part of the problem – the other part is the code I have seen from other students who don't try to write in a modern way. But I'll keep doing it for now. I just want to update myself on modern C++. Maybe later I will rewrite the ugliest algorithms in Scala and indulge myself in coding aesthetics.
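One way around the pop_heap/pop_back dance is a tiny helper that does both steps and hands back the element. This is only a sketch: pop_max is a made-up name, not something from the standard library, and it assumes the vector is already a max-heap (e.g. built with std::make_heap).

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the standard library): removes the largest
// element from a vector that is already a max-heap and returns it.
template <typename T>
T pop_max(std::vector<T>& heap) {
    assert(!heap.empty());                    // popping an empty heap is a bug
    std::pop_heap(heap.begin(), heap.end());  // moves the max to the back
    T top = std::move(heap.back());           // grab it before it disappears
    heap.pop_back();                          // actually removes it
    return top;
}
```

With that, `int best = pop_max(v);` reads the way you'd expect. std::priority_queue has the same split, by the way: top() reads and pop() returns void; the usual explanation is exception safety.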

The problems seem to hold to a tradition of long dorky descriptions: even the easiest problems have quite a story to read. Here is an example of an easy one:

NIT Destroys the Universe:

For a collection of integers S, define mex(S) as the smallest non-negative integer that does not appear in S.

NIT, the cleaver, decides to destroy the universe. He is not so powerful as Thanos, so he can only destroy the universe by snapping his fingers several times.

The universe can be represented as a 1-indexed array a of length n. When NIT snaps his fingers, he does the following operation on the array:

  • He selects positive integers l and r such that 1 ≤ l ≤ r ≤ n. Let w = mex({a_l, a_{l+1}, ..., a_r}). Then, for all l ≤ i ≤ r, set a_i to w.

We say the universe is destroyed if and only if for all 1 ≤ i ≤ n, a_i = 0 holds.

Find the minimum number of times NIT needs to snap his fingers to destroy the universe. That is, find the minimum number of operations NIT needs to perform to make all elements in the array equal to 0.

No idea who Thanos is, or who NIT is – so, that's quite a world to explore. Definitely more amusing than crunching LeetCode alone.
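For the curious, here is a sketch of one way to reason about the NIT problem (my own reading of the statement, not the official editorial). If the array is already all zeros, no snaps are needed. If all the non-zero elements form one contiguous run, a single snap on that run works, because a segment without zeros has mex 0. Otherwise two snaps on the whole array do it: the first turns everything into the same positive number, the second turns that into 0. The function names below are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// mex(S): smallest non-negative integer that does not appear in S.
// The answer is at most S.size(), so larger values can be ignored.
int mex(const std::vector<int>& s) {
    std::vector<bool> seen(s.size() + 1, false);
    for (int x : s) {
        assert(x >= 0);  // the problem only deals with non-negative integers
        if (static_cast<std::size_t>(x) < seen.size()) seen[x] = true;
    }
    int m = 0;
    while (seen[m]) ++m;  // at least one slot in [0, size] is always free
    return m;
}

// One snap on the 1-indexed segment [l, r]: overwrite it with its own mex.
void snap(std::vector<int>& a, std::size_t l, std::size_t r) {
    assert(1 <= l && l <= r && r <= a.size());
    const auto first = a.begin() + static_cast<std::ptrdiff_t>(l - 1);
    const auto last = a.begin() + static_cast<std::ptrdiff_t>(r);
    const int w = mex(std::vector<int>(first, last));
    std::fill(first, last, w);
}

// Minimum number of snaps: count maximal runs of non-zero elements, cap at 2.
int min_snaps(const std::vector<int>& a) {
    int runs = 0;
    bool in_run = false;
    for (int x : a) {
        if (x != 0 && !in_run) ++runs;  // a new non-zero run starts here
        in_run = (x != 0);
    }
    return std::min(runs, 2);
}
```

Only min_snaps is needed to answer the problem, in a single pass over the array; mex and snap are there just to replay the operations and check the reasoning on small inputs.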


Profile

soid

Page generated May. 23rd, 2025 11:04 am