philh

6 karmaJoined Nov 2022

Comments
6

How to help crucial AI safety legislation pass with 10 minutes of effort

So, that comment just says you can do it without living in the US. It doesn't say it's a good idea, and if there are reasons it's sometimes a bad idea, it doesn't engage with those.

If you have reason to think it's not a problem in this case, can you say what that reason is?

Sorry, but from my perspective as someone living outside the US, this whole thing is really not clear to me.

How to help crucial AI safety legislation pass with 10 minutes of effort

philh 5mo 1

If you don't explicitly say whether you're a CA resident or not, how would they know?

For the website form I filled in, they could look at my IP address and see that I probably didn't fill it in from California; not reliable, but certainly evidence.

If someone sends an email... maybe if there seems to be a real name attached they'd compare that to voter registrations? I dunno if that info is available to them. If so, that also suggests not sending an anonymous or pseudonymous email.

(And maybe I gave a name when filling in the website form, idk.)

How to help crucial AI safety legislation pass with 10 minutes of effort

philh 5mo 4

I filled out the form at https://www.gov.ca.gov/contact/.

Would sending an email have been better? I'd be interested to know why if so. I was confused by the conflicting "you should attach it as a pdf" / "we can't receive email attachments" messages. (I sent it before this comment.) And creating a pdf would have been annoying. (I don't think I have a low friction way to work with .docx - a link to a google doc that people can copy, edit, and convert to pdf, might be helpful, if a pdf attachment really is better.)

After filling it out I got:

"Thank you for the message - your feedback and ideas are a priority to me and my administration. Californians like you are helping us build an even stronger Golden State, and I thank you."

I am not Californian or even American, but I never said I was and they never asked or said "only Californians should fill this out" so ¯\_(ツ)_/¯

That said, I was hesitant to do this previously because this seems like the kind of thing that maybe only Californians should be doing, or they'll ask where you live and ignore everyone who doesn't say California or something? So, in case there are other people like me, it might be worth a paragraph on the subject of non-Californians writing in. (Not just "non Californians can do it too", but answering "is it considered prosocial? / is this burning some commons of mutual self-restraint?" and "should you say you're not Californian?" and "will it have as much effect as a Californian?".)

Closing Notes on Nonlinear Investigation

philh 1y 3

So would you say that although you have less faith in Ben than before, Alice and Chloe should have more faith in him? That seems wrong to me; I feel like "faith" in context should cash out as something less interpersonal than that? Like it should be a prediction about how Ben will act in future situations. Then "Alice should have more faith in Ben than me" sounds like a prediction that in future Ben will favor team Alice over team Chris; but that's not a prediction I'd make and I don't think it's a prediction you'd make.

(It does seem reasonable to predict something like "in future, Ben will favor team person-who-was-hurt over team person-on-sidelines-who...". But I don't think that's where you're going with this either?)

Time-stamping: An urgent, neglected AI safety measure

philh 2y 1

I think having to rely on an archive makes this a lot less valuable. If I find something in 2033 and want to prove it existed in 2023, I think that's going to be much harder if I have to rely on the thing itself being archived in 2023, in an archive that still exists in 2033; compared to just relying on the thing being timestamped in 2023.

I also think if you're relying on the Internet Archive, the argument that this is urgent becomes weaker. (And honestly I didn't find it compelling to begin with, though not for legible reasons I could point at.) Consider three possibilities, for something that will be created in May 2023:

(1) We can prove it was created no later than May 2023.
(2) We can prove it was created no later than June 2023.
(3) We can prove it was created no later than June 2023; and that in June 2023, the Internet Archive claimed it was created no later than May 2023.

A one-month delay brings you from (1) to (2) if the IA isn't involved. But if they are, it brings you from (1) to (3). As long as you set it up before IA goes rogue, the cost of delay is lower.

Time-stamping: An urgent, neglected AI safety measure

philh 2y 1

So I think there's a couple levels this could be at.

There's "it's easy for someone to publish a thing and prove it was published before $time". Honestly that's pretty easy already, depending on whether you have a site you can publish to and trust not to start backdating things in future (e.g. Twitter, Reddit, LW). Making it marginally lower friction/marginally more trustless (blockchain) would be marginally good, and I think cheap and easy.

(e: actually LW wouldn't be good for that because I don't think you can see last-edited timestamps there.)
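For concreteness, here is a minimal sketch of the "publish a thing and prove it was published before $time" idea, assuming the thing being published is a SHA-256 digest of the content rather than the content itself. The function names and the example bytes are hypothetical; where the digest gets posted (a site you trust not to backdate, or a blockchain) is outside the sketch.

```python
import hashlib


def commitment(content: bytes) -> str:
    """Return the SHA-256 hex digest of the content; this is what gets published today."""
    return hashlib.sha256(content).hexdigest()


def matches(content: bytes, published_digest: str) -> bool:
    """Check whether content shown later is the same bytes that were committed to earlier."""
    return commitment(content) == published_digest


# Hypothetical example: commit to a file now, check it against the digest later.
video = b"raw bytes of some file"
digest_posted_today = commitment(video)
```

The proof is only as strong as the timestamp on wherever the digest was posted, and it only covers the exact bytes that were hashed.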

But by itself it seems not that helpful because most people don't do it. So if someone in ten years shows me a video and says it's from today, it's not weird that they can't prove it.

If we could get it to a point where lots of people start timestamping, that would be an improvement. Then it might be weird if someone in the future can't prove something was from today. And the thing Plex said in comments about being able to train on non-AI generated things becomes more feasible. But this is more a social than technical problem.

But I think what you're talking about here is doing this for all public content, whether the author knows or not. And that seems neat, but... the big problem I see here is that a lot of things get edited after publishing. So if I edit a comment on Reddit, either we somehow pick up on that and it gets re-timestamped, or we lose the ability to verify edited comments. And if imgur decides to recompress old files (AI comes up with a new compression mechanism that gives us 1/4 the size of jpg with no visible loss of quality), everything on imgur can no longer be verified, at least not to before the recompression.
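The "it gets re-timestamped" option could look roughly like the sketch below: a per-URL history of digests, where an edit appends a new entry instead of overwriting the old one. This is an assumption about how such a service might work, not a description of any existing one; `TimestampLog` and its methods are hypothetical names, and an in-memory log like this is not itself tamper-evident.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


@dataclass
class TimestampLog:
    # url -> list of (unix_time, digest), oldest first
    entries: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def record(self, url: str, content: bytes, now: int) -> bool:
        """Append a new entry if the content differs from the latest one seen.

        Returns True when a new entry was added (first sighting or an edit).
        """
        d = digest(content)
        history = self.entries.setdefault(url, [])
        if history and history[-1][1] == d:
            return False
        history.append((now, d))
        return True

    def earliest_seen(self, url: str, content: bytes) -> Optional[int]:
        """Return the earliest time these exact bytes were recorded, or None."""
        d = digest(content)
        for seen_at, recorded in self.entries.get(url, []):
            if recorded == d:
                return seen_at
        return None
```

An edited comment then verifies only back to the rescan that first saw the edit, and the pre-edit version stays verifiable only if someone kept its exact bytes.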

So there's an empirical question of how often this happens, and maybe the answer is "not much". But it seems like something that even if it's rare, the few cases where it does happen could potentially be enough to lose most of the value? Like, even if imgur doesn't recompress their files, maybe some other file host has done, and you can just tell me it was hosted there.

There's a related question of how you distinguish content from metadata: if you timestamp my blog for me, you want to pick up the contents of the individual posts but not my blog theme, which I might change even if I don't edit the posts. Certainly not any ads that will change on every page load. I can think of two near-solutions for my blog specifically:

I have an RSS feed. But I write in markdown which gets rendered to HTML for the feed. If the markdown renderer changes, bad luck. I suppose stripping out all the HTML tags and just keeping the text might be fine?
To some extent this is a problem already solved by e.g. firefox reader mode, which tries to automatically extract and normalize the content from a page. But I don't by default expect a good content extractor today to be a good content extractor in ten years. (E.g. people find ways to make their ads look like content to reader mode, so reader mode updates to avoid those.) So you're hoping that a different content-extraction tool, applied to the same content in a different wrapper, extracts the exact same result.
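A rough sketch of the "strip out all the HTML tags and just keep the text" idea, using only the Python standard library. It hashes the text nodes after collapsing whitespace, so two renderings that differ only in tags or line wrapping give the same digest. The class and function names are hypothetical, and this is much cruder than a real content extractor like reader mode: it keeps script and style text, and it can still disagree if a renderer changes where whitespace falls between block elements.

```python
import hashlib
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect only text nodes, dropping tags and attributes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def normalized_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace so line-wrapping differences do not matter.
    return " ".join("".join(parser.parts).split())


def text_digest(html: str) -> str:
    """SHA-256 of the tag-stripped, whitespace-normalized text."""
    return hashlib.sha256(normalized_text(html).encode("utf-8")).hexdigest()


# Two hypothetical renderings of the same markdown by different renderers.
old_render = "<p>Hello <em>world</em></p>"
new_render = "<p>Hello\n<i>world</i>\n</p>"
```

Here `text_digest(old_render)` and `text_digest(new_render)` agree because both normalize to the same text, while any change to the wording itself changes the digest.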

This problem goes away if you're also hosting copies of everything, but that's no longer cheap. At that point I think you're back to "addition to the internet archive" discussed in other comments; you're only really defending against the internet archive going rogue (though this still seems valuable), and there's a lot that they don't capture.

Still. I'd be interested to see someone do this and then in a year go back and check how many hashes can be recreated. And I'd also be interested in the "make it marginally easier for people to do this themselves" thing; perhaps combine a best-effort scan of the public internet, with a way for people to add their own content (which may be private), plus some kind of standard for people to point the robots at something they expect to be stable. (I could implement "RSS feed but without rendering the markdown" for my blog.) Could implement optional email alerts for "hey all your content changed when we rescanned it, did you goof?"
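The "go back in a year and check how many hashes can be recreated" experiment could be scored with something like the sketch below. It assumes the digests recorded at the first scan and the bytes fetched at the rescan are both keyed by URL; the function name and the example URLs are hypothetical, and fetching is left out entirely.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def rescan(recorded: Dict[str, str], fetched: Dict[str, bytes]) -> Tuple[float, List[str]]:
    """Compare digests recorded earlier against content fetched now.

    Returns (fraction of recorded digests that can be recreated,
             sorted URLs whose content changed or could not be fetched).
    """
    changed = sorted(
        url
        for url, old_digest in recorded.items()
        if url not in fetched or digest(fetched[url]) != old_digest
    )
    total = len(recorded)
    fraction = (total - len(changed)) / total if total else 1.0
    return fraction, changed


# Hypothetical data: one post unchanged, one edited since the first scan.
recorded = {
    "https://example.com/a": digest(b"post a"),
    "https://example.com/b": digest(b"post b"),
}
fetched = {
    "https://example.com/a": b"post a",
    "https://example.com/b": b"post b, edited",
}
fraction, changed = rescan(recorded, fetched)
```

A fraction of 0.0 for a single author's content is the case where the optional "did you goof?" email would fire.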
